BUG: Hash and compare tuple subclasses as builtin tuples #59286
pandas/tests/libs/test_hashtable.py
assert table.get_item(nan2) == 42
with pytest.raises(KeyError, match=None) as error:
    table.get_item(other)
assert str(error.value) == str(other)
Could you use the `match` argument instead of setting it to `None`?
This line should be removed now that you have added the `match` argument.
Yeah, that makes more sense. I'll fix the other tests too while I'm at it.
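For reference, a minimal sketch of what the suggested `match` check amounts to. pytest documents `match` as a regular expression that is tested with `re.search` against the string form of the raised exception. The key `other` and the plain `dict` lookup below are stand-ins chosen for illustration, not the actual test fixtures.

```python
import re

other = (1.0, float("nan"))  # hypothetical key that is missing from the table

# A failed lookup raises KeyError(key); str() of that error is the repr of the key.
try:
    {}[other]  # a plain dict stands in for the hash table here
except KeyError as exc:
    message = str(exc)

# Parentheses and dots are regex metacharacters, so escape the expected text
# to have it matched literally.
pattern = re.escape(str(other))

# pytest.raises(KeyError, match=pattern) runs the equivalent of this search,
# which makes a separate assertion on str(error.value) unnecessary.
assert re.search(pattern, message) is not None
```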
Co-authored-by: Matthew Roeschke <[email protected]>
Thanks @matiaslindgren
Relaxing the `PyTuple_CheckExact` in khash_python.h to `PyTuple_Check` would allow the usage of types created with `collections.namedtuple` as index keys everywhere `tuple` is expected.
It is worth noting this would change the behaviour of all client code expecting custom `__eq__` and `__hash__` functions in `tuple` subclasses to be respected.
This also removes the non-deterministic behaviour in cases where `collections.namedtuple` types and `tuple`s are used interchangeably.
I traced the intermittent `KeyError`s to this part in `PyObjectHashTable`.
I did not verify this, but I suspect the custom tuple hasher leads to a scenario where builtin `tuple`s are hashed using the custom hasher, while all `tuple` subclasses are hashed with their own `__hash__`, resulting in different hash values.
For small hash tables, the resulting index might point to the correct bucket just by coincidence.
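The suspected mechanism can be sketched in pure Python. This is only an analogy under stated assumptions: `is_exact_tuple` and `is_tuple` mimic the two C-level checks, `custom_tuple_hash` is a toy stand-in for the table's tuple hasher, and `Point`, `table_hash`, and `same_bucket_rate` are hypothetical names. None of it is pandas code.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # hypothetical example type

def is_exact_tuple(obj):
    # Python-level analogue of PyTuple_CheckExact: rejects subclasses.
    return type(obj) is tuple

def is_tuple(obj):
    # Python-level analogue of PyTuple_Check: accepts subclasses too.
    return isinstance(obj, tuple)

def custom_tuple_hash(obj):
    # Toy stand-in for a table-specific tuple hasher (an assumption).
    return sum(hash(item) * 31 ** i for i, item in enumerate(obj))

def table_hash(obj, exact_only):
    # Pick the hasher the way the table would under each kind of check.
    use_custom = is_exact_tuple(obj) if exact_only else is_tuple(obj)
    return custom_tuple_hash(obj) if use_custom else hash(obj)

plain, named = (1, 2), Point(1, 2)

# By default a namedtuple compares and hashes like the equivalent tuple.
assert named == plain and hash(named) == hash(plain)

# Exact check: equal keys go through different hashers and disagree.
assert table_hash(plain, True) != table_hash(named, True)
# Relaxed check: both keys go through the same hasher and agree.
assert table_hash(plain, False) == table_hash(named, False)

keys = [(i, j) for i in range(30) for j in range(30)]

def same_bucket_rate(n_buckets):
    # Fraction of keys whose tuple and namedtuple forms share a bucket
    # even though they were hashed by different functions.
    hits = sum(
        table_hash(k, True) % n_buckets == table_hash(Point(*k), True) % n_buckets
        for k in keys
    )
    return hits / len(keys)

print(same_bucket_rate(4), same_bucket_rate(2 ** 20))
```

With only a few buckets, a noticeable share of mismatched hashes still land in the same bucket, which would be consistent with lookups succeeding only intermittently.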
A similar issue, but for different types, was reported in #39585.