Multiindex recurse error fix #29260

endremborza · 2019-10-28T22:43:15Z

Test and 3-line fix for a small bug mentioned in 2 issues (1 closed). The test might be expanded; I only checked that the error disappears.

closes #25760 (RecursionError when aligning DataFrames based on MultiIndex with different order of names) and #28956 (BUG: For Multi-Index, joining same level names in opposite order results in infinite recursion)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

WillAyd · 2019-10-29T03:24:12Z

Hmm, it doesn't appear a consensus was ever reached on this, but the suggestion was to raise. Why do you think this is better than raising?

endremborza · 2019-10-29T11:18:50Z

there are join operations that modify the order of MultiIndex names, like

import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(4), pd.MultiIndex.from_product([[1, 2], [3, 4]], names=['a', 'b']))
x2 = pd.DataFrame(np.arange(8), pd.MultiIndex.from_product([[1, 2], [5, 6], [3, 4]], names=['a', 'c', 'b']))
x2.join(x, lsuffix='x1')

So I thought the order of the names is not such a definitive feature, and I couldn't think of a use case where changing it might be a problem.

Also, this change doesn't break any of the tests, so there seems to be no agreement on the other side either.

Anyway, raising a RecursionError is definitely not an intended effect; changing it to a more informative error could also be a way to go. I just feel it is more in line with the general way of Python (and pandas) for the API to be practical, convenient and flexible rather than restrictive.
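To check what such a join does to the order of the index names, the names can be inspected before and after. This is a minimal sketch: the column names `left_val` and `right_val` are made up for illustration so that no suffix is needed, and the resulting order may differ between pandas versions, so no output is claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical column names, chosen so the two frames do not collide.
x = pd.DataFrame(
    {"left_val": np.arange(4)},
    index=pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"]),
)
x2 = pd.DataFrame(
    {"right_val": np.arange(8)},
    index=pd.MultiIndex.from_product(
        [[1, 2], [5, 6], [3, 4]], names=["a", "c", "b"]
    ),
)

# Join on the overlapping level names "a" and "b".
joined = x2.join(x)

# Compare the level-name order of the caller with that of the result.
print(list(x2.index.names))
print(list(joined.index.names))
```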

jreback · 2019-10-29T12:25:07Z

pandas/core/indexes/base.py

@@ -3549,8 +3549,12 @@ def _join_multi(self, other, how, return_indexers=True):
            ldrop_names = list(self_names - overlap)
            rdrop_names = list(other_names - overlap)

-            self_jnlevels = self.droplevel(ldrop_names)
-            other_jnlevels = other.droplevel(rdrop_names)
+            if len(ldrop_names + rdrop_names) == 0:  # if only the order differs


rather than add a special case, is there an issue droplevel?

I'm sorry, "issue droplevel"?

I am not sure what you are trying to fix here. Is there something actually wrong with droplevel, rather than adding a special case here?

Nothing is wrong with droplevel; it just gets called twice with empty lists when len(ldrop_names + rdrop_names) == 0, which seems pointless. Because that changes nothing, the join ends up in an infinite recursion loop. Checking for this condition stops the recursion, and reordering the names makes the join work.
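The bookkeeping described here can be modelled without pandas. The sketch below is a toy model, not the pandas implementation: `plan_multi_join` is a hypothetical helper that only works on lists of level names, and it shows why "nothing to drop" has to be handled as its own case.

```python
def plan_multi_join(self_names, other_names):
    """Toy model of the level-name logic (hypothetical, not pandas code).

    Returns (left join levels, right join levels, dropped names).
    """
    overlap = set(self_names) & set(other_names)
    if not overlap:
        raise ValueError("cannot join with no overlapping index names")

    ldrop_names = [name for name in self_names if name not in overlap]
    rdrop_names = [name for name in other_names if name not in overlap]

    # Only the order of the names differs: dropping nothing would leave both
    # sides unchanged, so reorder the right side to match the left instead.
    if not len(ldrop_names + rdrop_names):
        return list(self_names), list(self_names), []

    left_levels = [name for name in self_names if name in overlap]
    right_levels = [name for name in other_names if name in overlap]
    return left_levels, right_levels, ldrop_names + rdrop_names


# Same names in opposite order: the special case applies.
print(plan_multi_join(["a", "b"], ["b", "a"]))
# One extra level on the left: it is dropped before joining.
print(plan_multi_join(["a", "c", "b"], ["a", "b"]))
```

Without the early return, the first call would hand two unchanged inputs to the next step, which is the situation that recursed forever.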

ok, this is not idiomatic, use

if not len(....) ....

put the comment on the line above
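For plain lists the two spellings give the same answer, so the change is purely stylistic. A small sketch, using two hypothetical helper functions, that compares them on a few inputs:

```python
def nothing_to_drop_verbose(ldrop_names, rdrop_names):
    return len(ldrop_names + rdrop_names) == 0


def nothing_to_drop_idiomatic(ldrop_names, rdrop_names):
    # if only the order differs, both lists are empty
    return not len(ldrop_names + rdrop_names)


cases = [([], []), (["c"], []), ([], ["d"]), (["c"], ["d"])]
results = [
    (nothing_to_drop_verbose(left, right), nothing_to_drop_idiomatic(left, right))
    for left, right in cases
]
print(results)
```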

jreback

pls add a bug fix whatsnew in reshaping in 1.0

jreback · 2019-10-31T14:45:19Z

pandas/tests/indexes/multi/test_join.py

+    midx1 = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"])
+    midx2 = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["b", "a"])
+
+    midx1.join(midx2)


assert the results of the join here
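One way to assert on the result, sketched under the assumption of a pandas version that includes this fix and where `return_indexers=True` returns a three-element tuple. The two indexes share no keys once the names are aligned, so a left join should give back the left index; the indexer values are left unchecked here because they may vary by version.

```python
import pandas as pd

midx1 = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"])
midx2 = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["b", "a"])

# Default how="left": the result is expected to match the calling index.
join_idx, lidx, ridx = midx1.join(midx2, return_indexers=True)

assert join_idx.equals(midx1)
```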

jreback · 2019-10-31T14:45:50Z

pandas/core/indexes/base.py

@@ -3549,8 +3549,12 @@ def _join_multi(self, other, how, return_indexers=True):
            ldrop_names = list(self_names - overlap)
            rdrop_names = list(other_names - overlap)

-            self_jnlevels = self.droplevel(ldrop_names)
-            other_jnlevels = other.droplevel(rdrop_names)
+            if len(ldrop_names + rdrop_names) == 0:  # if only the order differs


ok, this is not idiomatic, use

if not len(....) ....

put the comment on the line above

WillAyd · 2019-10-31T22:42:11Z

doc/source/whatsnew/v1.0.0.rst

@@ -421,6 +421,7 @@ Reshaping
 - :func:`qcut` and :func:`cut` now handle boolean input (:issue:`20303`)
 - Fix to ensure all int dtypes can be used in :func:`merge_asof` when using a tolerance value. Previously every non-int64 type would raise an erroneous ``MergeError`` (:issue:`28870`).
 - Better error message in :func:`get_dummies` when `columns` isn't a list-like value (:issue:`28383`)
+- Bug in :meth:`Index.join` that caused infinite recursion error for mismatched multiindex name orders (:issue:`25760`, :issue:`28956`)


Can you make this more user-facing? I somehow doubt the test case in this PR is something users would do in actual code, so this seems to be hinting at addressing some other use case(s)

By putting a use case into the docs, or by expanding the tests?

WillAyd · 2019-10-31T22:42:34Z

pandas/tests/indexes/multi/test_join.py

+    assert midx1.equals(join_idx)
+    assert midx2.equals(join_idx)
+    assert lidx is None
+    tm.assert_numpy_array_equal(ridx, exp_ridx)


Can you use tm.assert_index_equal here instead of breaking up into individual arrays?

I can only do this for midx1, as Index.equals is not sensitive to the order of names, but tm.assert_index_equal is.

Anyway, I expanded both the test and the docs.

Hmm not sure I totally follow; can you not set the proper expectation inclusive of the order?

mdx does not equal join_idx, so midx2 basically just remains the same. That is fine for Index.equals, but in this example there is nothing to compare it with using tm.assert_index_equal; that would require a more complicated example. I didn't add one, as the order of the indices is not agreed upon; I just corrected the bug one way.

Anyway, since then I have expanded the test a little.
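The difference between the two checks can be seen on two indexes that hold the same values under swapped names. This sketch uses the public `pandas.testing.assert_index_equal` in place of the internal `tm` alias; the variable names are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_index_equal

left = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"])
right = left.set_names(["b", "a"])  # same values, names swapped

# Index.equals compares the values only and ignores the names.
values_match = left.equals(right)

# assert_index_equal also compares the names by default.
try:
    assert_index_equal(left, right)
    names_match = True
except AssertionError:
    names_match = False

print(values_match, names_match)
```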

WillAyd

Thanks - the last batch of updates looks good. Do you know if this works when there are unused levels in the MultiIndex as well? A test case to cover that might be good
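For reference, a MultiIndex with unused levels is easy to build by slicing, which could serve as input for such a test. A minimal sketch that only shows how the unused level values arise and how `remove_unused_levels` clears them; it does not exercise the join itself.

```python
import pandas as pd

full = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"])

# Slicing keeps the original levels, so value 2 of level "a" is now unused.
sliced = full[:2]

levels_before = [list(level) for level in sliced.levels]
levels_after = [list(level) for level in sliced.remove_unused_levels().levels]

print(levels_before)
print(levels_after)
```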

WillAyd · 2019-11-07T20:54:36Z

pandas/tests/indexes/multi/test_join.py

+    assert midx1.equals(join_idx)
+    assert midx2.equals(join_idx)
+    assert lidx is None
+    tm.assert_numpy_array_equal(ridx, exp_ridx)


Hmm not sure I totally follow; can you not set the proper expectation inclusive of the order?

jreback · 2019-11-20T13:48:59Z

can you merge master

jreback · 2020-01-01T20:46:53Z

thanks @endremborza

endremborza added 6 commits September 19, 2019 20:07

Merge pull request #1 from pandas-dev/master

79a08d7

update

Merge remote-tracking branch 'upstream/master'

9dda1ec

Merge remote-tracking branch 'upstream/master'

03529a9

add test for multi join recursion error

d32222a

fix multi join recursion error

02e2645

black formatting

8bafd6b

WillAyd added the MultiIndex label Oct 29, 2019

jreback requested changes Oct 29, 2019

jreback requested changes Oct 31, 2019

endremborza added 4 commits October 31, 2019 17:29

add assertion to midx join test

99aa8bd

style corrections

a00e80f

add assertion to midx join test

505d814

whatsnew entry for join

5f09a01

WillAyd requested changes Oct 31, 2019

endremborza added 2 commits November 1, 2019 00:39

exapnd multiindex test

f22a665

whatsnew bugfix multiindex reshaping expand

66fb17d

WillAyd requested changes Nov 7, 2019

endremborza and others added 4 commits November 20, 2019 19:41

Merge remote-tracking branch 'upstream/master'

12af280

Merge branch 'master' into multiindex-recurse-error-fix

b33cef8

Merge branch 'master' into multiindex-recurse-error-fix

18acc85

clean tests

733b1c6

jreback added this to the 1.0 milestone Jan 1, 2020

jreback added the Bug label Jan 1, 2020

jreback approved these changes Jan 1, 2020

jreback merged commit 8778760 into pandas-dev:master Jan 1, 2020

hweecat pushed a commit to hweecat/pandas that referenced this pull request Jan 1, 2020

Multiindex recurse error fix (pandas-dev#29260)

970478e
