Skip to content

BUG: Index with null value not serialized correctly to json #50400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

lithomas1
Copy link
Member

@lithomas1 lithomas1 commented Dec 22, 2022

Restores 0.25.3 behavior, which honestly feels a little weird to me, since it doesn't roundtrip. I'm not too familiar with JSON format, though (I guess the problem is that null can't be a key?).

Also serialize decimal nans correctly.

} else {
tc->type = JT_NULL;
}
Py_DECREF(is_null_obj);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a case where PyObject_CallMethod returns NULL this would try to Py_DECREF NULL

if (PyFloat_Check(item)) {
double fval = PyFloat_AS_DOUBLE(item);
is_null = npy_isnan(fval);
} else if (item == Py_None || object_is_na_type(item)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a similar expression already in missing.pyx? I wonder if there's a way for us to be consistent about our missing object evaluation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checknull is probably the equivalent. Without having a C API, I don't really think we can unify anything sadly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Not a blocker here, but it might be nice to create a separate missing.c file with the proper implementation and then can wrap that from missing.pyx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second read I think we should at least create a separate function for int is_null(PyObject *obj) here. Can just be static in the current translation unit - makes the code more readable than the nested if statements

"is_nan",
NULL);
is_null = (is_null_obj == Py_True);
Py_DECREF(is_null_obj);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will likely segfault if PyObject_CallMethod returns NULL

@lithomas1 lithomas1 marked this pull request as ready for review December 23, 2022 18:22
@lithomas1 lithomas1 added Bug IO JSON read_json, to_json, json_normalize Index Related to the Index class or subclasses labels Dec 23, 2022
} else {
tc->type = JT_NULL;
}
Py_XDECREF(is_null_obj);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using Py_XDECREF we should explicitly check the return value of PyObject_CallMethod and return an error if it is NULL

} else {
tc->type = JT_NULL;
}
if (!is_null_obj) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should move this up to 1542 before the condition of equating to Py_False. This makes for a more consistent pattern where we always check for NULL after a Py* call

if (PyFloat_Check(item)) {
double fval = PyFloat_AS_DOUBLE(item);
is_null = npy_isnan(fval);
} else if (item == Py_None || object_is_na_type(item)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second read I think we should at least create a separate function for int is_null(PyObject *obj) here. Can just be static in the current translation unit - makes the code more readable than the nested if statements

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking nice - definitely going in the right direction

@@ -276,6 +276,27 @@ static int is_simple_frame(PyObject *obj) {
Py_DECREF(mgr);
return ret;
}
/* TODO: Consider unifying with checknull and co.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a quick note in the docstring that this returns -1 on error?

@@ -276,6 +276,27 @@ static int is_simple_frame(PyObject *obj) {
Py_DECREF(mgr);
return ret;
}
/* TODO: Consider unifying with checknull and co.
in missing.pyx */
static int is_null_obj(PyObject* obj) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit but I think this should be called is_json_null since its semantics may vary from what we have elsewhere in the codebase

if (is_null == -1) {
// Something errored
// Return to let the error surface
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised we return 0 here but I see that you are just matching the pattern of the rest of the function. It should really be returning NULL in case of an error - looks to not be handled properly. But of course that is separate from this PR

@@ -1539,15 +1552,15 @@ void Object_beginTypeContext(JSOBJ _obj, JSONTypeContext *tc) {
PyObject *is_null_obj = PyObject_CallMethod(obj,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this not be replaced with the function you are introducing? Seems like it should work to keep logic consistent?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 16, 2023
@simonjayhawkins
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Index Related to the Index class or subclasses IO JSON read_json, to_json, json_normalize Stale
Projects
None yet
3 participants