
BUG: Pandas's ujson module incorrectly returns None when it reads NaN. #46627


Open
3 tasks done
Erotemic opened this issue Apr 3, 2022 · 2 comments
Labels
Bug IO JSON read_json, to_json, json_normalize Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@Erotemic
Contributor

Erotemic commented Apr 3, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import io

# There is a corner case in the ujson library vendored in pandas.

# Normally it looks like NaNs are read correctly

f1 = io.StringIO('[NaN]')
df1 = pd.read_json(f1)
print(df1)

# But that is only because pandas sees a list of floaty things and converts
# None to a floating NaN

# If we add an object so the list is no longer floaty, we can see
# that the NaN is read as a None.
f2 = io.StringIO('[NaN, {}, null, 1]')
df2 = pd.read_json(f2)
print(df2)

# The minimal bug can be demoed using the ujson vendored library
from pandas._libs import json as pd_ujson
loaded = pd_ujson.loads('[NaN]')
print('loaded = {!r}'.format(loaded))

Issue Description

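The comments in the script above explain each step. The main issue is that the ujson parser vendored in pandas returns None (the value used for JSON null) when it reads a NaN.

This is a bug, albeit a minor one: you only notice it when you load JSON that mixes NaNs with non-numeric members. When every member is numeric, pandas converts the column to float and the None becomes NaN again, which hides the problem.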

This is seen via the output of the above MWE:


# It seems like pandas reads NaNs correctly
    0
0 NaN

# But it actually doesn't
      0
0  None
1    {}
2  None
3     1

# Underlying ujson parser always replaces NaNs with Nones
loaded = [None]
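
For reference, here is a minimal sketch of how Python's standard-library json module treats the same mixed payload. It is only a point of comparison and does not touch the pandas code path; the variable names are illustrative.

```python
import json
import math

# Same mixed payload as in the reproducer.
payload = '[NaN, {}, null, 1]'
parsed = json.loads(payload)

# The stdlib parser keeps the two values distinct:
# NaN becomes a float NaN, while null becomes None.
nan_is_float_nan = isinstance(parsed[0], float) and math.isnan(parsed[0])
null_is_none = parsed[2] is None

print('parsed = {!r}'.format(parsed))
print('nan_is_float_nan = {!r}'.format(nan_is_float_nan))
print('null_is_none = {!r}'.format(null_is_none))
```

With the standard library, the first element is a float NaN and the third is None, so the distinction that the vendored ujson parser loses is preserved.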

Expected Behavior

Instead of emitting a None, the ujson parser should emit a floating-point NaN (Py_NAN) when it reads NaN. I have already fixed this in ujson itself, and I will submit a PR fixing it here.
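
The expected behavior could be expressed as a small check that accepts any loads-style callable. This is a rough sketch, not the test from the PR: `keeps_nan_distinct` and `lossy_loads` are hypothetical names, and `lossy_loads` is a stand-in that only mimics the reported behavior so the example runs without pandas.

```python
import json
import math


def keeps_nan_distinct(loads):
    """Return True if `loads` parses NaN as a float NaN and null as None."""
    nan_value, null_value = loads('[NaN, null]')
    return (
        isinstance(nan_value, float)
        and math.isnan(nan_value)
        and null_value is None
    )


def lossy_loads(text):
    # Hypothetical stand-in mimicking the reported bug:
    # every float NaN in a top-level list collapses to None.
    return [
        None if isinstance(value, float) and math.isnan(value) else value
        for value in json.loads(text)
    ]


print(keeps_nan_distinct(json.loads))   # stdlib parser keeps NaN and null apart
print(keeps_nan_distinct(lossy_loads))  # the lossy stand-in does not
```

Passing the vendored parser from the reproducer (`pd_ujson.loads`) to the same helper should show whether a given pandas build still collapses NaN to None.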

Installed Versions

INSTALLED VERSIONS
------------------
commit : 4bfe3d0
python : 3.10.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-39-generic
Version : #44-Ubuntu SMP Thu Mar 24 15:35:05 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 57.5.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.2.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Erotemic added the Bug and Needs Triage labels Apr 3, 2022
Erotemic mentioned this issue Apr 3, 2022
@rhshadrach
Member

@Erotemic - thanks for the report; can you give this issue a more informative title?

rhshadrach added the Missing-data and IO JSON labels and removed the Needs Triage label Apr 5, 2022
Erotemic changed the title from "BUG:" to "BUG: Pandas's ujson module incorrectly returns None when it reads NaN." Apr 6, 2022
@Erotemic
Contributor Author

Erotemic commented Apr 6, 2022

Must have missed filling that part in. Fixed.

Erotemic added a commit to Erotemic/pandas that referenced this issue Apr 6, 2022
Erotemic added a commit to Erotemic/pandas that referenced this issue May 31, 2022

2 participants