-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: isin fails to correctly compare None values. #41145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
DriesSchaumont
commented
Apr 24, 2021
•
edited
Loading
edited
- closes different behaviour of df.isin() in 1.0.5/1.1.0, when df contains None #35565
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
@@ -160,17 +160,19 @@ def vec_compare(ndarray[object] left, ndarray[object] right, object op) -> ndarr | |||
for i in range(n): | |||
x = left[i] | |||
y = right[i] | |||
|
|||
if checknull(x) or checknull(y): | |||
if x is None and y is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is superfluous as checknull already checks None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this was incorrect. Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what was incorrect?
rule of thumb: let the question-asker mark the conversation as resolved
result[i] = True | ||
else: | ||
result[i] = PyObject_RichCompareBool(x, y, flag) | ||
else: | ||
for i in range(n): | ||
x = left[i] | ||
y = right[i] | ||
|
||
if checknull(x) or checknull(y): | ||
if x is None and y is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is contrary to what we do with nulls generally, does anything break when you make this change (only)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, this does not break any tests. However, I am starting to think that #35565 is not actually a bug, but intended behavior. From the docs: "One has to be mindful that in Python (and NumPy), the nan's don’t compare equal, but None's do. Note that pandas/NumPy uses the fact that np.nan != np.nan, and treats None like np.nan."
I see the same behavior in scalar_compare
:
Lines 74 to 86 in 1269bc6
if flag == Py_NE: | |
for i in range(n): | |
x = values[i] | |
if checknull(x): | |
result[i] = True | |
elif isnull_val: | |
result[i] = True | |
else: | |
try: | |
result[i] = PyObject_RichCompareBool(x, val, flag) | |
except TypeError: | |
result[i] = True | |
elif flag == Py_EQ: |
I found this related issue: #34975
Should we close this PR and the issue, then?
@DriesSchaumont can you respond to jreback comments |
Hi @jbrockmendel, I am in the middle of moving to a new place and I currently have limited internet access. I will try to come back to this PR as soon as possible. I hope this is not blocking anything? |
no, not to worry |
Seems I accidentally pushed a commit that was meant to be in another branch. Updating this PR now. I will add a comment when this is ready for review. |
if possible we should reuse libmissing.is_matching_na |
closing as stale, if you want to continue working, please ping. |