-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Multiindex recurse error fix #29260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiindex recurse error fix #29260
Conversation
Hmm it doesn't appear a consensus was ever made on this, but the suggestion was to raise. Why do you think this is better than raising? |
there are join operations that modify the order of MultiIndex names, like x=pd.DataFrame(np.arange(4),pd.MultiIndex.from_product([[1,2],[3,4]],names=['a','b']))
x2=pd.DataFrame(np.arange(8),pd.MultiIndex.from_product([[1,2],[5,6],[3,4]],names=['a','c','b']))
x2.join(x, lsuffix='x1') so I thought it is not such a definitive feature and I couldn't think of a use case where it might be a problem also, this change doesn't break any of the tests, so there seems to be no agreement on the other side either anyway, raising a recursion error is definitely not an intended effect, changing it to a more informative error can also be a way to go. I just feel that it is more in line with the general way of python (and pandas) for the api to be more practical, convenient and flexible than restrictive |
pandas/core/indexes/base.py
Outdated
@@ -3549,8 +3549,12 @@ def _join_multi(self, other, how, return_indexers=True): | |||
ldrop_names = list(self_names - overlap) | |||
rdrop_names = list(other_names - overlap) | |||
|
|||
self_jnlevels = self.droplevel(ldrop_names) | |||
other_jnlevels = other.droplevel(rdrop_names) | |||
if len(ldrop_names + rdrop_names) == 0: # if only the order differs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than add a special case, is there an issue droplevel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, "issue droplevel"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you are trying to fix here. is there somthing that is actually wrong with droplevel instead of adding a special case here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing is wrong with droplevel, it just gets called twice with empty lists if len(ldrop_names + rdrop_names) == 0
is true, which seems pointless. and then because this doesn't change anything, it ends up in an infinite recursion loop. this can be stopped with checking for this condition, and the reordering of the names makes the join work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this is not idiomatic, use
if not len(....)
....
put he comment on the line above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a bug fix whatsnew in reshaping in 1.0
midx1 = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"]) | ||
midx2 = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["b", "a"]) | ||
|
||
midx1.join(midx2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert the results of the join here
pandas/core/indexes/base.py
Outdated
@@ -3549,8 +3549,12 @@ def _join_multi(self, other, how, return_indexers=True): | |||
ldrop_names = list(self_names - overlap) | |||
rdrop_names = list(other_names - overlap) | |||
|
|||
self_jnlevels = self.droplevel(ldrop_names) | |||
other_jnlevels = other.droplevel(rdrop_names) | |||
if len(ldrop_names + rdrop_names) == 0: # if only the order differs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this is not idiomatic, use
if not len(....)
....
put he comment on the line above
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -421,6 +421,7 @@ Reshaping | |||
- :func:`qcut` and :func:`cut` now handle boolean input (:issue:`20303`) | |||
- Fix to ensure all int dtypes can be used in :func:`merge_asof` when using a tolerance value. Previously every non-int64 type would raise an erroneous ``MergeError`` (:issue:`28870`). | |||
- Better error message in :func:`get_dummies` when `columns` isn't a list-like value (:issue:`28383`) | |||
- Bug in :meth:`Index.join` that caused infinite recursion error for mismatched multiindex name orders (:issue:`25760`, :issue:`28956`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this more user-facing? I somehow doubt the test case in this PR is something users would do in actual code, so this seems to be hinting at addressing some other use case(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
putting a use case into the docs or expanding the tests?
assert midx1.equals(join_idx) | ||
assert midx2.equals(join_idx) | ||
assert lidx is None | ||
tm.assert_numpy_array_equal(ridx, exp_ridx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use tm.assert_index_equal here instead of breaking up into individual arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can only do this for midx1 as Index.equals is not sensitive for the order of names, but tm.assert_index_equal is.
Anyway, I expanded both the test and the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure I totally follow; can you not set the proper expectation inclusive of the order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mdx does not equal join_idx, so midx2 just remains the same, basically, which is fine for Index.equals, but there is nothing to compare it with in this example using tm.assert_index_equal. That would require a more complicated example. I didn't add any, as the order of the indices is not agreed upon, I just corrected a bug one way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyway, since this I expanded the test a little
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - the last batch of updates look good. Do you know if this works when there are unused levels in the MultiIndex as well? A test case to cover that might be good
assert midx1.equals(join_idx) | ||
assert midx2.equals(join_idx) | ||
assert lidx is None | ||
tm.assert_numpy_array_equal(ridx, exp_ridx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure I totally follow; can you not set the proper expectation inclusive of the order?
can you merge master |
thanks @endremborza |
test and 3 line fix for a small bug mentioned in 2 issues (1 closed). test might be expanded, I only checked for the error disappearing
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff