RecursionError when aligning DataFrames based on MultiIndex with different order of names #25760

abirkmanis · 2019-03-18T01:40:01Z

Code Sample, a copy-pastable example if possible

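import numpy as np
import pandas as pd

# Two frames whose MultiIndex levels have the same names in opposite order
x = pd.DataFrame(np.arange(4), index=pd.MultiIndex.from_product([[1, 2], [3, 4]], names=['a', 'b']))
y = pd.DataFrame(np.arange(4), index=pd.MultiIndex.from_product([[3, 4], [1, 2]], names=['b', 'a']))
print(x + y)  # raises RecursionError on pandas 0.24.2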

Problem description

Adding the two frames raises a RecursionError inside align()/join().
Ideally, the sum would be calculated.
If different orders of names in MultiIndex are not supported by design, then a clear error message stating that would be preferable to RecursionError.

Expected Output

or

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.125-linuxkit
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.15.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.3.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: 4.7.1
html5lib: None
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

WillAyd · 2019-03-19T01:36:35Z

I'm not sure if we make any requirements around the ordering of levels in a MultiIndex - @mroeschke any thoughts here?

mroeschke · 2019-03-19T04:04:14Z

One issue is that if this arithmetic is allowed which solution should this op return?

I think this case is ambiguous, so I would opt to raise here, with a more informative error message.

abirkmanis · 2019-03-19T15:17:00Z

> One issue is that if this arithmetic is allowed which solution should this op return?

If all operations after that "align on both row and column labels", as the documentation says, then which one is returned is not important. It's like picking which order to use for lists that represent sets.
Conversely, if order is important (and not labels), then the documentation of DataFrame has to be changed.

Another option is to change the behavior of align: currently it puts common columns before others. This is a problem because even if I try to use a consistent order of columns in my application, it gets jumbled by operations that use align. E.g., if I have X and Y both with columns a and b, and Z with column b, then (X+Z)+Y will fail, as X+Z will have columns b and a (in this order).
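One way an application might keep a consistent level order, regardless of what align produces, is to re-impose that order explicitly after each operation. The following is only a sketch: `CANONICAL_ORDER` and `to_canonical_order` are hypothetical names chosen for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical application-wide level order (an assumption for this sketch)
CANONICAL_ORDER = ["a", "b"]


def to_canonical_order(obj, order=CANONICAL_ORDER):
    """Return *obj* with its index levels arranged according to *order*.

    Levels not listed in *order* keep their relative order and go last.
    Objects without a MultiIndex are returned unchanged.
    """
    if not isinstance(obj.index, pd.MultiIndex):
        return obj
    known = [n for n in order if n in obj.index.names]
    extra = [n for n in obj.index.names if n not in known]
    return obj.reorder_levels(known + extra)


# A frame whose levels arrive as (b, a) is put back into (a, b) order
z = pd.DataFrame(
    np.arange(4),
    index=pd.MultiIndex.from_product([[3, 4], [1, 2]], names=["b", "a"]),
)
z_fixed = to_canonical_order(z)
print(list(z_fixed.index.names))  # ['a', 'b']
```

Applying such a helper to every intermediate result costs a copy per call, but it means later operations always see levels in the same order.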

khaeru · 2019-10-10T07:13:29Z

Just ran into this. I ended up having to write a workaround like this:

def align_levels(ref, obj):
    """Return a copy of *obj* with common levels in the same order as *ref*."""
    # Common levels in the same order as ref
    common = [n for n in ref.index.names if n in obj.index.names]
    # Levels only appearing on obj
    unique = [n for n in obj.index.names if n not in common]
    # Return copy with new level order
    return obj.reorder_levels(common + unique)

x * align_levels(x, y)
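For reference, here is the workaround applied to the two frames from the original report, using addition as in the report. This is a sketch that repeats the helper above so the snippet runs on its own; the names `y_aligned` and `total` are only illustrative.

```python
import numpy as np
import pandas as pd


def align_levels(ref, obj):
    """Return a copy of *obj* with common levels in the same order as *ref*."""
    common = [n for n in ref.index.names if n in obj.index.names]
    unique = [n for n in obj.index.names if n not in common]
    return obj.reorder_levels(common + unique)


x = pd.DataFrame(np.arange(4), index=pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"]))
y = pd.DataFrame(np.arange(4), index=pd.MultiIndex.from_product([[3, 4], [1, 2]], names=["b", "a"]))

# After reordering, both indexes have names ['a', 'b'], so alignment is by label
y_aligned = align_levels(x, y)
total = x + y_aligned

# Sums per (a, b) label: (1, 3) -> 0, (1, 4) -> 3, (2, 3) -> 3, (2, 4) -> 6
print(total)
```

Because the helper only changes the level order of the second operand, the values are matched by their (a, b) labels rather than by position.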

MatthewMcGonagle · 2019-10-18T02:59:31Z

I looked a little bit at the source code, and I think the cause of this is the following:

Possible Cause of the Issue

In pandas.core.indexes.base.join(), there is first a check to see if self.names == other.names. This is false if the names are the same but in the opposite order.

Then in pandas.core.indexes.base._join_multi(), pandas.core.indexes.base.join() is called on the set intersection of the level names, so the levels are treated as the same even if they are in the opposite order.

For the same set of names in a different order, the recursion keeps alternating between pandas.core.indexes.base.join() and pandas.core.indexes.base._join_multi().
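The two comparisons described above can be reproduced on the indexes from the original report. This sketch only illustrates the condition (ordered comparison fails, set comparison succeeds); it does not call or reproduce the pandas internals.

```python
import pandas as pd

left = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"])
right = pd.MultiIndex.from_product([[3, 4], [1, 2]], names=["b", "a"])

# Ordered comparison of level names: the order differs, so this is False
same_order = list(left.names) == list(right.names)

# Set comparison of level names: the same names are present, so this is True
same_set = set(left.names) == set(right.names)

print(same_order, same_set)  # False True
```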

Question

Considering that there isn't any decision as to how this should work, would it be possible to check in _join_multi() whether neither of the indexes changed before calling join() again? If neither changed, then raise an error instead of allowing the infinite recursion?
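As a rough illustration of the kind of guard being asked about, a check on the level names could raise a clear error before any join is attempted. This is a sketch under assumptions: `check_level_order` is a hypothetical user-level helper, not a pandas function, and it is not necessarily how the issue was eventually resolved in pandas.

```python
import pandas as pd


def check_level_order(left, right):
    """Hypothetical guard: raise if two indexes share level names in a different order."""
    left_names, right_names = list(left.names), list(right.names)
    if left_names != right_names and set(left_names) == set(right_names):
        raise ValueError(
            "cannot align indexes whose level names are in a different order: "
            f"{left_names} vs {right_names}"
        )


left = pd.MultiIndex.from_product([[1, 2], [3, 4]], names=["a", "b"])
right = pd.MultiIndex.from_product([[3, 4], [1, 2]], names=["b", "a"])

try:
    check_level_order(left, right)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Calling the helper before an arithmetic operation turns the RecursionError into an explicit message naming both level orders.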

WillAyd added the MultiIndex label Mar 19, 2019

mroeschke added the Error Reporting Incorrect or improved errors from pandas label Mar 19, 2019

khaeru added a commit to khaeru/ixmp that referenced this issue Oct 10, 2019

Add workaround for pandas-dev/pandas#25760

ab77331

eriknw mentioned this issue Oct 15, 2019

BUG: For Multi-Index, joining same level names in opposite order results in infinite recursion #28956

Closed

TomAugspurger added this to the Contributions Welcome milestone Oct 15, 2019

endremborza mentioned this issue Oct 28, 2019

Multiindex recurse error fix #29260

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.0 Jan 1, 2020

jreback closed this as completed in #29260 Jan 1, 2020

khaeru added a commit to khaeru/genno that referenced this issue Apr 26, 2020

Add workaround for pandas-dev/pandas#25760

2e3fdf2

theemathas mentioned this issue Oct 6, 2020

BUG: Joining data frames with MultiIndex results in non-deterministic level order. #36910

Closed

3 tasks

khaeru added a commit to khaeru/genno that referenced this issue Jan 8, 2021

Add workaround for pandas-dev/pandas#25760

fbe377e

khaeru added a commit to khaeru/genno that referenced this issue Jan 8, 2021

Add workaround for pandas-dev/pandas#25760

4790888

khaeru added a commit to khaeru/genno that referenced this issue Jan 8, 2021

Add workaround for pandas-dev/pandas#25760

8ae937e

rs2 mentioned this issue Nov 2, 2021

BUG: Correctly specified broadcasting leads to a RecursionError #44290

Closed

3 tasks
