BUG: pandas.read_csv incorrectly uses float.NaN instead of pandas.NA #48173

Ark-kun · 2022-08-20T07:24:18Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

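import io
import pandas

# Two-column CSV: the first row is entirely empty, the second row lacks col2.
df3 = pandas.read_csv(
    io.StringIO("""col1,col2
,
a,
""")
)
print(df3)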

  col1 col2
0  NaN  NaN
1    a  NaN

Issue Description

I load CSV user files using pandas.read_csv. Some of the values might be missing.
Pandas has a missing value object - pandas.NA.
I also see that pandas.read_csv has several mentions of NA - the na_values and keep_default_na parameters.

However, when I parse a CSV file with string data and some missing values, the missing values are replaced with float.NaN instead of pandas.NA.

Expected Behavior

I expect Pandas to use pandas.NA when values are missing, not float.NaN.

I expect Pandas to not use float.NaN in columns that consist of strings.

I expect Pandas to use float.NaN only in float columns (although even for float columns I'd expect pandas.NA).

   col1  col2
0  <NA>  <NA>
1     a  <NA>
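
For comparison, a minimal sketch of one way to get output like the above with the existing API, assuming every column should be read as text: passing dtype="string" asks read_csv for the nullable string extension type, whose missing-value marker is pandas.NA. The variable name df_str is only illustrative, and this has not been checked against the exact pandas version listed below.

```python
import io
import pandas

# Same CSV as in the reproducible example.
csv_text = """col1,col2
,
a,
"""

# Assumption: all columns hold text, so one explicit dtype is applied to every column.
df_str = pandas.read_csv(io.StringIO(csv_text), dtype="string")

print(df_str)
print(df_str.dtypes)
```

This only helps when the column types are known up front; it does not make read_csv infer nullable types on its own.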

Installed Versions

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.7.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-10-cloud-amd64
Version : #1 SMP Debian 4.19.132-1 (2020-07-24)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.1
numpy : 1.21.6
pytz : 2020.1
dateutil : 2.8.1
pip : 22.1.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.3
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.17.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.0
fastparquet : None
gcsfs : 0.7.0
matplotlib : 3.3.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 9.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.49.1

phofl · 2022-08-20T12:38:27Z

Hi, thanks for your report. There is currently no option to use nullable types in read_csv; see #36712 and associated issues.

… data Bugs: pandas-dev/pandas#48170 pandas-dev/pandas#48173 pandas-dev/pandas#48175 See also: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
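
The DataFrame.convert_dtypes page linked above points at a post-processing route. A hedged sketch of how it might be applied to the example from this issue follows; the variable names raw and converted are illustrative, and the dtype chosen for the all-missing col2 may differ between pandas versions, so the code prints it rather than assuming it.

```python
import io
import pandas

csv_text = """col1,col2
,
a,
"""

# Default parsing: missing values come back as NaN, as described in this issue.
raw = pandas.read_csv(io.StringIO(csv_text))

# convert_dtypes returns a new DataFrame whose columns use dtypes that
# support pandas.NA where pandas can pick one.
converted = raw.convert_dtypes()

print(converted)
print(converted.dtypes)
```

Because the conversion runs after parsing, the inference done by read_csv itself is unchanged; only the resulting frame is re-typed.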

Ark-kun added the Bug and Needs Triage labels Aug 20, 2022

phofl closed this as completed Aug 20, 2022

phofl added the Usage Question, IO CSV, and NA - MaskedArrays labels and removed the Bug and Needs Triage labels Aug 20, 2022

phofl mentioned this issue Aug 20, 2022

BUG: pandas.read_csv does not parse CSV with missing values in a sane way #48175

Closed

3 tasks

Ark-kun added a commit to Ark-kun/pipeline_components that referenced this issue Aug 31, 2022

fix: Pandas - Fixed Pandas data mangling for components that do not l…

8c78aae

…ook at column data Bugs: pandas-dev/pandas#48170 pandas-dev/pandas#48173 pandas-dev/pandas#48175
