BUG: seemingly unexpected behavior with errors='ignore' in json_normalize() #41876

neelmraman · 2021-06-08T17:11:05Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

# raises a KeyError due to the missing 'id' in the second element, even though errors='ignore' was passed
pd.json_normalize(
    data=[{"id": 1, "outer": {"inner": [{"field": 2}]}}, {"outer": {"inner": [{"field": 3}]}}],
    record_path=["outer", "inner"],
    meta=["id"],
    errors="ignore",
)

# no issues since both elements have an 'id', and shows that it is looking for 'id' at the right level
pd.json_normalize(
    data=[{"id": 1, "outer": {"inner": [{"field": 2}]}}, {"id": 4, "outer": {"inner": [{"field": 3}]}}],
    record_path=["outer", "inner"],
    meta=["id"],
    errors="ignore",
)

# no issues when record_path has length 1
pd.json_normalize(
    data=[{"id": 1, "outer": [{"field": 2}]}, {"outer": [{"field": 3}]}],
    record_path=["outer"],
    meta=["id"],
    errors="ignore",
)

Problem description

I would expect the first case above to return the same output as the third case (i.e., with missing ids filled with np.nan). Is the absence of a try...except around the first _pull_field() call intentional?
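To make the question concrete, here is a minimal sketch of the kind of guarded lookup being asked about. The helper `pull_field` below is a hypothetical stand-in written for illustration, not pandas' actual `_pull_field()`, and it uses `math.nan` in place of `np.nan` only to stay dependency-free.

```python
import math


def pull_field(record, spec, errors="raise"):
    """Hypothetical nested-field lookup; NOT pandas' internal implementation.

    `spec` is a single key or a list of keys to walk through `record`.
    """
    keys = spec if isinstance(spec, list) else [spec]
    result = record
    try:
        for key in keys:
            result = result[key]
    except KeyError:
        # With errors="ignore", a missing key becomes NaN instead of raising.
        if errors == "ignore":
            return math.nan
        raise
    return result


present = pull_field({"id": 1}, "id", errors="ignore")
missing = pull_field({"outer": {}}, "id", errors="ignore")
```

If every meta lookup went through a guard like this, a record without `id` would contribute NaN regardless of how long `record_path` is.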

Expected Output

    field    id
0       2     1
1       3   NaN
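On affected versions, one possible workaround is to make sure every record carries the meta key before calling `json_normalize()`. The sketch below assumes the meta keys are top-level keys of each record; `fill_missing_meta` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_missing_meta(records, meta_keys, fill_value=np.nan):
    """Return shallow copies of `records` with every top-level meta key present.

    Hypothetical helper: keys already in a record keep their value,
    missing ones are set to `fill_value`.
    """
    defaults = {key: fill_value for key in meta_keys}
    return [{**defaults, **record} for record in records]


data = [
    {"id": 1, "outer": {"inner": [{"field": 2}]}},
    {"outer": {"inner": [{"field": 3}]}},  # no 'id' here
]

df = pd.json_normalize(
    data=fill_missing_meta(data, ["id"]),
    record_path=["outer", "inner"],
    meta=["id"],
)
```

Because the key is now always present, the lookup never fails and `errors="ignore"` is not needed for this case.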

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.67-ts1
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.utf8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.16.6
pytz : 2020.4
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : 5.3.1
hypothesis : 3.57.0
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.14.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.0.3
numexpr : 2.6.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pytest : 5.3.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.3.18
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : None
numba : 0.50.1


neelmraman · 2021-06-19T01:44:39Z

take


neelmraman added the Bug and Needs Triage labels Jun 8, 2021

github-actions bot assigned neelmraman Jun 19, 2021

neelmraman added a commit to neelmraman/pandas that referenced this issue Jun 21, 2021

BUG: json_normalize not consistently ignoring errors (pandas-dev#41876)

1575f2f

neelmraman mentioned this issue Jun 21, 2021

BUG: json_normalize not consistently ignoring errors (#41876) #42179

Merged

4 tasks

neelmraman added a commit to neelmraman/pandas that referenced this issue Jun 22, 2021

adding unit tests (pandas-dev#41876)

cbea941

neelmraman added a commit to neelmraman/pandas that referenced this issue Jun 23, 2021

fixing typo in unit tests (pandas-dev#41876)

1c7c10c

jreback added this to the 1.4 milestone Jun 25, 2021

jreback added the IO JSON label and removed the Needs Triage label Jun 25, 2021

neelmraman added a commit to neelmraman/pandas that referenced this issue Jun 30, 2021

add whatsnew and other minor changes (pandas-dev#41876)

4a2197b

jreback closed this as completed in #42179 Jul 15, 2021

jreback pushed a commit that referenced this issue Jul 15, 2021

BUG: json_normalize not consistently ignoring errors (#41876) (#42179)

e271712

feefladder pushed a commit to feefladder/pandas that referenced this issue Sep 7, 2021

BUG: json_normalize not consistently ignoring errors (pandas-dev#41876)…

7aa5300

… (pandas-dev#42179)
