series.to_xarray() fails when MultiIndex not sorted in xarray 0.15.1 #3951

Closed
delgadom opened this issue Apr 7, 2020 · 4 comments · Fixed by #3953

@delgadom
Contributor

delgadom commented Apr 7, 2020

series.to_xarray() fails when MultiIndex not sorted in xarray 0.15.1

Summary

It seems that series.to_xarray() silently returns incorrect data in xarray 0.15.1 when the levels of the Series' MultiIndex are not sorted.

Demonstration

xarray should be able to handle MultiIndexes whose levels are unsorted. In a fresh conda environment with xarray 0.14.1, the conversion gives the expected result:

$ conda run -n py37xr14 python test.py
>>> df
alpha  B  A
num
0      1  4
1      2  5
2      3  6

>>> df.stack('alpha')
num  alpha
0    B        1
     A        4
1    B        2
     A        5
2    B        3
     A        6
dtype: int64

>>> df.stack('alpha').to_xarray()
<xarray.DataArray (num: 3, alpha: 2)>
array([[1, 4],
       [2, 5],
       [3, 6]])
Coordinates:
  * num      (num) int64 0 1 2
  * alpha    (alpha) object 'B' 'A'

This fails in xarray 0.15.1. Note that the data is not merely reordered: the values under the 'B' label are now the incorrect 4, 5, 6 rather than 1, 2, 3:

$ conda run -n py37xr15 python test.py
>>> df
alpha  B  A
num
0      1  4
1      2  5
2      3  6

>>> df.stack('alpha')
num  alpha
0    B        1
     A        4
1    B        2
     A        5
2    B        3
     A        6
dtype: int64

>>> df.stack('alpha').to_xarray()
<xarray.DataArray (num: 3, alpha: 2)>
array([[4, 1],
       [5, 2],
       [6, 3]])
Coordinates:
  * num      (num) int64 0 1 2
  * alpha    (alpha) object 'B' 'A'
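Because the wrong result above still has a plausible shape and coordinates, a label-based comparison is one way to catch it. The following is a minimal sketch, not part of the original report: it rebuilds the frame from test.py and checks each `alpha` label against the source column. The variable name `mismatches` is made up for illustration, and the sketch assumes pandas and xarray are both installed.

```python
import pandas as pd

# Same frame as in test.py: column labels deliberately not alphabetical.
df = pd.DataFrame({'B': [1, 2, 3], 'A': [4, 5, 6]})
df = df.rename_axis('num').rename_axis('alpha', axis=1)

series = df.stack('alpha')
da = series.to_xarray()  # needs xarray installed

# Compare by label rather than by position, so the check does not depend
# on the order in which the 'alpha' coordinate happens to be stored.
mismatches = [
    label for label in df.columns
    if da.sel(alpha=label).values.tolist() != df[label].tolist()
]
print("mismatched columns:", mismatches)
```

An empty list means every label maps to its original column. With the 0.15.1 output shown above, both 'B' and 'A' would be reported.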

Test setup & environment info

contents of test.py
import pandas as pd

df = pd.DataFrame({'B': [1, 2, 3], 'A': [4, 5, 6]})
df = df.rename_axis('num').rename_axis('alpha', axis=1)

print(">>> df")
print(df)

print("\n>>> df.stack('alpha')")
print(df.stack('alpha'))

print("\n>>> df.stack('alpha').to_xarray()")
print(df.stack('alpha').to_xarray())

packages in py37xr14 environment
$ conda list -n py37xr14
# packages in environment at /Users/delgadom/miniconda3/envs/py37xr14:
#
# Name                    Version                   Build  Channel
ca-certificates           2020.4.5.1           hecc5488_0    conda-forge
certifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge
libblas                   3.8.0               16_openblas    conda-forge
libcblas                  3.8.0               16_openblas    conda-forge
libcxx                    9.0.1                         2    conda-forge
libffi                    3.2.1             h4a8c4bd_1007    conda-forge
libgfortran               4.0.0                         2    conda-forge
liblapack                 3.8.0               16_openblas    conda-forge
libopenblas               0.3.9                h3d69b6c_0    conda-forge
llvm-openmp               9.0.1                h28b9765_2    conda-forge
ncurses                   6.1               h0a44026_1002    conda-forge
numpy                     1.18.1           py37h7687784_1    conda-forge
openssl                   1.1.1f               h0b31af3_0    conda-forge
pandas                    1.0.3            py37h94625e5_0    conda-forge
pip                       20.0.2                     py_2    conda-forge
python                    3.7.6           h90870a6_5_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2019.3                     py_0    conda-forge
readline                  8.0                  hcfe32e1_0    conda-forge
setuptools                46.1.3           py37hc8dfbb8_0    conda-forge
six                       1.14.0                     py_1    conda-forge
sqlite                    3.30.1               h93121df_0    conda-forge
tk                        8.6.10               hbbe82c9_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xarray                    0.14.1                     py_1    conda-forge
xz                        5.2.5                h0b31af3_0    conda-forge
zlib                      1.2.11            h0b31af3_1006    conda-forge

packages in py37xr15 environment
$ conda list -n py37xr15
# packages in environment at /Users/delgadom/miniconda3/envs/py37xr15:
#
# Name                    Version                   Build  Channel
ca-certificates           2020.4.5.1           hecc5488_0    conda-forge
certifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge
libblas                   3.8.0               16_openblas    conda-forge
libcblas                  3.8.0               16_openblas    conda-forge
libcxx                    9.0.1                         2    conda-forge
libffi                    3.2.1             h4a8c4bd_1007    conda-forge
libgfortran               4.0.0                         2    conda-forge
liblapack                 3.8.0               16_openblas    conda-forge
libopenblas               0.3.9                h3d69b6c_0    conda-forge
llvm-openmp               9.0.1                h28b9765_2    conda-forge
ncurses                   6.1               h0a44026_1002    conda-forge
numpy                     1.18.1           py37h7687784_1    conda-forge
openssl                   1.1.1f               h0b31af3_0    conda-forge
pandas                    1.0.3            py37h94625e5_0    conda-forge
pip                       20.0.2                     py_2    conda-forge
python                    3.7.6           h90870a6_5_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2019.3                     py_0    conda-forge
readline                  8.0                  hcfe32e1_0    conda-forge
setuptools                46.1.3           py37hc8dfbb8_0    conda-forge
six                       1.14.0                     py_1    conda-forge
sqlite                    3.30.1               h93121df_0    conda-forge
tk                        8.6.10               hbbe82c9_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xarray                    0.15.1                     py_0    conda-forge
xz                        5.2.5                h0b31af3_0    conda-forge
zlib                      1.2.11            h0b31af3_1006    conda-forge
@max-sixty
Collaborator

Thanks for the issue @delgadom . This looks like a fairly significant bug. I'll try and look more at it later; if anyone else can @pydata/xarray that would be great.

@delgadom
Contributor Author

delgadom commented Apr 7, 2020

Yeah, I use this pattern all the time. df.stack().to_xarray() now seems to fail unless your columns are sorted alphabetically. I'm not sure yet where this is happening, but it results in some pernicious bad-data errors that can be hard to debug, if you catch them at all.
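Going only on the observation that the failure shows up when the columns are not sorted alphabetically, one possible stopgap is to sort the columns before stacking and then restore the original order by label. This is a sketch under that assumption and has not been verified against xarray 0.15.1. The names `sorted_df`, `da_sorted` and `da_original_order` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'B': [1, 2, 3], 'A': [4, 5, 6]})
df = df.rename_axis('num').rename_axis('alpha', axis=1)

# Sort the column labels first so the stacked 'alpha' level is alphabetical.
sorted_df = df.sort_index(axis=1)
da_sorted = sorted_df.stack('alpha').to_xarray()  # needs xarray installed

# If the original column order matters, select it back by label
# (label-based selection, not positional).
da_original_order = da_sorted.sel(alpha=list(df.columns))
print(da_original_order)
```

Selecting by label keeps each value attached to its column name, whatever order the coordinate is stored in.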

@delgadom
Contributor Author

delgadom commented Apr 7, 2020

Here's a test script I'm using for this:

echo '$ conda create -n py37xr14 -c conda-forge --yes python=3.7 xarray=0.14.1'
conda create -n py37xr14 -c conda-forge --yes python=3.7 xarray=0.14.1 > /dev/null

echo '$ conda create -n py37xr15 -c conda-forge --yes python=3.7 xarray=0.15.1'
conda create -n py37xr15 -c conda-forge --yes python=3.7 xarray=0.15.1 > /dev/null

echo '$ conda run -n py37xr14 python test.py'
conda run -n py37xr14 python test.py

echo

echo '$ conda run -n py37xr15 python test.py'
conda run -n py37xr15 python test.py

echo
echo '$ conda list -n py37xr14'
conda list  -n py37xr14

echo
echo '$ conda list -n py37xr15'
conda list  -n py37xr15

conda env remove -n py37xr14 > /dev/null 2>&1
conda env remove -n py37xr15 > /dev/null 2>&1

@fujiisoup
Member

Thanks, @delgadom, for reporting this issue.
Reproduced.

I'll take a look.
