BUG: For Multi-Index, joining same level names in opposite order results in infinite recursion #28956

Closed
MatthewMcGonagle opened this issue Oct 13, 2019 · 3 comments
Labels: Bug, Duplicate Report, MultiIndex

Comments

@MatthewMcGonagle
MatthewMcGonagle commented Oct 13, 2019

Code Sample

import pandas as pd
idx = pd.IndexSlice
import numpy as np
import sys
import traceback
trace_limit = 10

# Make the original index order.
ab_ind = pd.MultiIndex.from_product(
    [['a1', 'a2'],
     ['b1', 'b2', 'b3']],
    names = ['A', 'B'])

# Make the dataframe with levels in order ['A', 'B']
ab_df = pd.DataFrame(
    np.arange(len(ab_ind)),
    index = ab_ind,
    columns = ['val'])
print(ab_df)

# Now swap the order of the levels to ['B', 'A']
ba_df = ab_df.swaplevel('A', 'B')
print(ba_df)

# Now try adding them together.
try:
    sum_df = ab_df + ba_df
    print(sum_df)
except RecursionError:
    print('Recursion Error! Here is {limit} levels of the stack trace:'
          .format(limit = trace_limit))
    print(
        traceback.print_tb(
            sys.exc_info()[2], limit = trace_limit))

This gives the following output:

       val
A  B
a1 b1    0
   b2    1
   b3    2
a2 b1    3
   b2    4
   b3    5
       val
B  A
b1 a1    0
b2 a1    1
b3 a1    2
b1 a2    3
b2 a2    4
b3 a2    5
Recursion Error! Here is 10 levels of the stack trace:
  File "recursion_error.py", line 27, in <module>
    sum_df = ab_df + ba_df
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\ops\__init__.py", line 1493, in f
    return self._combine_frame(other, pass_op, fill_value, level)
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 5359, in _combine_frame
    this, other = self.align(other, join="outer", level=level, copy=False)
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 3939, in align
    broadcast_axis=broadcast_axis,
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 8811, in align
    fill_axis=fill_axis,
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 8850, in _align_frame
    other.index, how=join, level=level, return_indexers=True
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3522, in join
    return self._join_multi(other, how=how, return_indexers=return_indexers)
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3643, in _join_multi
    other_jnlevels, how, return_indexers=True
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3522, in join
    return self._join_multi(other, how=how, return_indexers=return_indexers)
  File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3643, in _join_multi
    other_jnlevels, how, return_indexers=True
None

Problem description

Switching the order of the levels of the Multi-Index results in infinite recursion if we use any operation that tries to join the original order of levels with the new order of levels. The desired result of ab_df + ba_df should just be the same as 2 * ab_df.

Expected Output

I would expect the same as 2 * ab_df where we choose an order of levels coming from the first operand ab_df in ab_df + ba_df. So we should get the dataframe:

       val
A  B
a1 b1    0
   b2    2
   b3    4
a2 b1    6
   b2    8
   b3    10

The actual order isn't as important as not resulting in an infinite recursion.
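Until the join itself is fixed, one possible workaround is to put the levels of the second operand back into the first operand's order before adding. The following is a minimal sketch, not a fix for the underlying bug. It assumes both frames carry exactly the same set of level names, and the variable name `ba_aligned` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the report.
ab_ind = pd.MultiIndex.from_product(
    [["a1", "a2"], ["b1", "b2", "b3"]], names=["A", "B"])
ab_df = pd.DataFrame(np.arange(len(ab_ind)), index=ab_ind, columns=["val"])
ba_df = ab_df.swaplevel("A", "B")

# Reorder the levels of ba_df to match ab_df, so both indexes have the
# same names in the same order and no multi-level join is attempted.
ba_aligned = ba_df.reorder_levels(list(ab_df.index.names))

sum_df = ab_df + ba_aligned
print(sum_df)
```

With the levels realigned, the result should match `2 * ab_df`.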

Possible Cause of Bug

In pandas.core.indexes.base.join(), there is first a check to see if self.names == other.names. This is false if the names are the same but in the opposite order.

Then in pandas.core.indexes.base._join_multi(), there is a call to pandas.core.indexes.base.join() on the set intersection of the level names. So this treats the levels as the same even if they are in the opposite order.

For the same set of names but in the wrong order, the recursion keeps alternating between pandas.core.indexes.base.join() and pandas.core.indexes.base._join_multi().
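The alternation described above can be illustrated with a toy model that works on plain lists of level names. This is only a sketch of the control flow as I understand it, not the pandas source: `toy_join` and `toy_join_multi` are hypothetical stand-ins, and the real methods operate on index objects rather than name lists.

```python
def toy_join(self_names, other_names):
    """Toy stand-in for the name check in join() (not pandas code)."""
    if self_names == other_names:
        # Same names in the same order: a simple join is possible.
        return list(self_names)
    return toy_join_multi(self_names, other_names)


def toy_join_multi(self_names, other_names):
    """Toy stand-in for _join_multi(): keep only the shared level names."""
    overlap = set(self_names) & set(other_names)
    if not overlap:
        raise ValueError("cannot join with no overlapping index names")
    # Dropping non-shared levels keeps each side's own level order,
    # so ['A', 'B'] and ['B', 'A'] come out unchanged.
    self_jn = [n for n in self_names if n in overlap]
    other_jn = [n for n in other_names if n in overlap]
    return toy_join(self_jn, other_jn)


try:
    toy_join(["A", "B"], ["B", "A"])
    outcome = "joined"
except RecursionError:
    outcome = "RecursionError"
print(outcome)  # prints "RecursionError"
```

In this model, a partial overlap such as `["A", "B", "C"]` against `["A", "B"]` terminates, because dropping the extra level leaves identical name lists. Only the case of identical names in a different order makes no progress between calls.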

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.2.1
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.6
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

@MatthewMcGonagle MatthewMcGonagle changed the title BUG: For Multi-Index, same level names in opposite order results in infinite recursion BUG: For Multi-Index, joining same level names in opposite order results in infinite recursion Oct 13, 2019
@eriknw
Contributor

eriknw commented Oct 15, 2019

This looks like a duplicate of #25760

Note: I just encountered this too!

@TomAugspurger
Contributor

Thanks Eric, this does look like a duplicate of #25760.

Not sure what the best way to fix it is though...

@MatthewMcGonagle
Author

Sorry, I did a search for "infinite recursion" and missed the original issue. I'll add a comment as to what I think is causing it there.

@jreback jreback added Bug MultiIndex Duplicate Report Duplicate issue or pull request labels Jan 1, 2020
@jreback jreback added this to the No action milestone Jan 1, 2020