BUG: isin fails to correctly compare None values. #41145

Closed
wants to merge 12 commits into from

Conversation

DriesSchaumont
Member

DriesSchaumont commented Apr 24, 2021

@@ -160,17 +160,19 @@ def vec_compare(ndarray[object] left, ndarray[object] right, object op) -> ndarr
for i in range(n):
x = left[i]
y = right[i]

if checknull(x) or checknull(y):
if x is None and y is None:
Contributor

this is superfluous as checknull already checks None

Member Author

Yep, this was incorrect. Changed.

Member

what was incorrect?

rule of thumb: let the question-asker mark the conversation as resolved

result[i] = True
else:
result[i] = PyObject_RichCompareBool(x, y, flag)
else:
for i in range(n):
x = left[i]
y = right[i]

if checknull(x) or checknull(y):
if x is None and y is None:
Contributor

this is contrary to what we do with nulls generally, does anything break when you make this change (only)?

Member Author

AFAIK, this does not break any tests. However, I am starting to think that #35565 is not actually a bug, but intended behavior. From the docs: "One has to be mindful that in Python (and NumPy), the nan's don’t compare equal, but None's do. Note that pandas/NumPy uses the fact that np.nan != np.nan, and treats None like np.nan."
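
For readers following the docs quote, a minimal pure-Python sketch of the two conventions may help. The names `is_null`, `vec_eq`, and `vec_ne` are hypothetical and exist only for illustration; they are not pandas functions, and `is_null` only loosely stands in for `checknull` (it handles just `None` and float NaN).

```python
import math

# Plain Python: None compares equal to itself, NaN does not.
print(None == None)                  # True
print(float("nan") == float("nan"))  # False


def is_null(value):
    """Hypothetical stand-in for checknull: True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def vec_eq(left, right):
    """Elementwise ==, where any null operand makes the result False."""
    return [
        False if is_null(x) or is_null(y) else x == y
        for x, y in zip(left, right)
    ]


def vec_ne(left, right):
    """Elementwise !=, where any null operand makes the result True."""
    return [
        True if is_null(x) or is_null(y) else x != y
        for x, y in zip(left, right)
    ]
```

Under the "treat None like np.nan" convention sketched here, two `None` values compare unequal even though `None == None` is `True` in plain Python, which is the behavior this thread is debating.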

I see the same behavior in scalar_compare:

if flag == Py_NE:
    for i in range(n):
        x = values[i]
        if checknull(x):
            result[i] = True
        elif isnull_val:
            result[i] = True
        else:
            try:
                result[i] = PyObject_RichCompareBool(x, val, flag)
            except TypeError:
                result[i] = True
elif flag == Py_EQ:
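
The `Py_NE` branch quoted above can be approximated in pure Python as follows. This is only a sketch under stated assumptions: `scalar_ne` and `is_null` are hypothetical names, `is_null` covers just `None` and float NaN, and `x != val` stands in for `PyObject_RichCompareBool(x, val, Py_NE)`.

```python
import math


def is_null(value):
    """Hypothetical stand-in for checknull: True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def scalar_ne(values, val):
    """Compare each element of `values` against the scalar `val` with !=.

    Mirrors the quoted branch: a null on either side yields True, and a
    comparison that raises TypeError is also treated as "not equal".
    """
    val_is_null = is_null(val)
    result = []
    for x in values:
        if is_null(x) or val_is_null:
            result.append(True)
        else:
            try:
                result.append(bool(x != val))
            except TypeError:
                result.append(True)
    return result
```

With this sketch, `scalar_ne([None, 1, 2], None)` reports every element as "not equal", including the `None`, which matches the behavior described for the vector case.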

I found this related issue: #34975

Should we close this PR and the issue, then?

@jreback added the Algos and Missing-data labels Apr 26, 2021
@simonjayhawkins added the Bug and Regression labels May 25, 2021
@jbrockmendel
Member

@DriesSchaumont can you respond to jreback's comments?

@DriesSchaumont
Member Author

Hi @jbrockmendel, I am in the middle of moving to a new place and I currently have limited internet access. I will try to come back to this PR as soon as possible. I hope this is not blocking anything?

@jbrockmendel
Member

"I hope this is not blocking anything?"

no, not to worry

@DriesSchaumont
Member Author

DriesSchaumont commented Jul 29, 2021

Seems I accidentally pushed a commit that was meant to be in another branch. Updating this PR now. I will add a comment when this is ready for review.

@DriesSchaumont requested a review from jreback July 30, 2021 08:45
@jbrockmendel
Member

if possible we should reuse libmissing.is_matching_na
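
To illustrate the idea behind a "matching NA" check, here is a rough pure-Python sketch. It is not the pandas implementation: the thread does not show the signature or behavior of `libmissing.is_matching_na`, so `matching_na`, `is_null`, `matches`, and `isin` below are hypothetical helpers that assume two nulls match only when they are the same kind of null.

```python
import math


def is_null(value):
    """Hypothetical null check: True for None and float NaN only."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def matching_na(left, right):
    """Assumed rule: None matches None, NaN matches NaN, nothing else."""
    if left is None:
        return right is None
    if isinstance(left, float) and math.isnan(left):
        return isinstance(right, float) and math.isnan(right)
    return False


def matches(value, candidate):
    """Null-aware equality: nulls use matching_na, everything else uses ==."""
    if is_null(value) or is_null(candidate):
        return matching_na(value, candidate)
    return value == candidate


def isin(values, candidates):
    """Membership test in which None can be found among the candidates."""
    return [any(matches(v, c) for c in candidates) for v in values]
```

Under these assumptions, `isin([None, 1], [None])` marks the `None` element as present, which is the outcome the linked issue asks for, while `None` and NaN stay distinct from each other.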

@jreback
Contributor

jreback commented Oct 4, 2021

closing as stale, if you want to continue working, please ping.

@jreback closed this Oct 4, 2021
Labels
Algos, Bug, Missing-data, Regression
Development

Successfully merging this pull request may close these issues.

different behaviour of df.isin() in 1.0.5/1.1.0, when df contains None
4 participants