Skip to content

BUG: isin fails to correctly compare None values. #41145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -291,6 +291,7 @@ Styler
Other
^^^^^
- Bug in :meth:`CustomBusinessMonthBegin.__add__` (:meth:`CustomBusinessMonthEnd.__add__`) not applying the extra ``offset`` parameter when beginning (end) of the target month is already a business day (:issue:`41356`)
- Bug in :meth:`DataFrame.isin` and :meth:`Series.isin` incorrectly comparing ``None`` values to ``False`` (:issue:`35565`)

.. ***DO NOT USE THIS SECTION***

Expand Down
10 changes: 6 additions & 4 deletions pandas/_libs/ops.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -157,17 +157,19 @@ def vec_compare(ndarray[object] left, ndarray[object] right, object op) -> ndarr
for i in range(n):
x = left[i]
y = right[i]

if checknull(x) or checknull(y):
if x is None and y is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is superfluous as checknull already checks None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this was incorrect. Changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was incorrect?

rule of thumb: let the question-asker mark the conversation as resolved

result[i] = False
elif checknull(x) or checknull(y):
result[i] = True
else:
result[i] = PyObject_RichCompareBool(x, y, flag)
else:
for i in range(n):
x = left[i]
y = right[i]

if checknull(x) or checknull(y):
if x is None and y is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is contrary to what we do with nulls generally, does anything break when you make this change (only)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, this does not break any tests. However, I am starting to think that #35565 is not actually a bug, but intended behavior. From the docs: "One has to be mindful that in Python (and NumPy), the nan's don’t compare equal, but None's do. Note that pandas/NumPy uses the fact that np.nan != np.nan, and treats None like np.nan."

I see the same behavior in scalar_compare:

if flag == Py_NE:
for i in range(n):
x = values[i]
if checknull(x):
result[i] = True
elif isnull_val:
result[i] = True
else:
try:
result[i] = PyObject_RichCompareBool(x, val, flag)
except TypeError:
result[i] = True
elif flag == Py_EQ:

I found this related issue: #34975

Should we close this PR and the issue, then?

result[i] = True
elif checknull(x) or checknull(y):
result[i] = False
else:
result[i] = PyObject_RichCompareBool(x, y, flag)
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/test_algos.py
Original file line number Diff line number Diff line change
Expand Up @@ -1084,6 +1084,14 @@ def test_isin_float_df_string_search(self):
expected_false = DataFrame({"values": [False, False]})
tm.assert_frame_equal(result, expected_false)

def test_isin_none(self):
# GH35565
x = DataFrame([["foo", "bar"], [1, None]])
y = x[1].copy()
res = x.isin(y)
expected = DataFrame([[False, True], [False, True]])
tm.assert_frame_equal(res, expected)


class TestValueCounts:
def test_value_counts(self):
Expand Down