swap_dims() incorrectly changes underlying index name #3748

Closed
jaicher opened this issue Feb 4, 2020 · 1 comment · Fixed by #3752

jaicher (Contributor) commented Feb 4, 2020

MCVE Code Sample

import xarray as xr

# create data array with named dimension and named coordinate
x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")

# what's our current index? (idx, this is fine)
x.indexes
# prints "idx: Int64Index([2], dtype='int64', name='idx')"

# swap dim so that y is our dimension, what's index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='idx')"

The dimension name is swapped correctly, but the underlying pandas index keeps its old name ("idx") instead of taking the new dimension name ("y").
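One way to spot this mismatch is to compare each key of .indexes with the .name of the pandas index stored under it. The helper below, mismatched_index_names, is a hypothetical diagnostic written for this report and is not part of xarray. It skips MultiIndexes because their level names are expected to differ from the dimension name.

```python
import pandas as pd
import xarray as xr


def mismatched_index_names(obj):
    """Hypothetical diagnostic: map each coordinate key to its pandas index
    name whenever the two disagree. MultiIndexes are skipped because their
    level names legitimately differ from the dimension name."""
    mismatches = {}
    for key, index in obj.indexes.items():
        if isinstance(index, pd.MultiIndex):
            continue
        if index.name != key:
            mismatches[key] = index.name
    return mismatches


x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")
print(mismatched_index_names(x))  # {}

# {} on a release with the fix; the output reported above implies
# {'y': 'idx'} on xarray 0.15.0
print(mismatched_index_names(x.swap_dims({"idx": "y"})))
```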

Expected Output

# swap dim so that y is our dimension, what's index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='y')"
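The expected output can also be written as plain assertions, which is roughly the shape a regression test for this issue might take. This is a sketch covering both DataArray and Dataset. It is not the test added in #3752, and it assumes an xarray release that includes that fix (per the report, the first name check fails on 0.15.0).

```python
import xarray as xr

# Same construction as the MCVE above
da = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")

# DataArray: the index backing the new dimension coordinate should be named "y"
swapped = da.swap_dims({"idx": "y"})
assert swapped.indexes["y"].name == "y"
assert swapped.indexes["y"].tolist() == [3]

# Dataset: the same expectation should hold after converting with to_dataset()
ds_swapped = da.to_dataset().swap_dims({"idx": "y"})
assert ds_swapped.indexes["y"].name == "y"
```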

Problem Description

This is a problem because running x.swap_dims({"idx": "y"}).to_dataframe() gives a dataframe with columns ["x", "idx"] and index "idx". The index and a column then share the name "idx", and the name "y" is lost entirely, while the DataArray string representation gives no indication that this is happening.
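The to_dataframe() consequence can be checked directly. This sketch assumes an xarray release that includes the fix from #3752, where the index is named "y" and "idx" survives only as an ordinary column. The final rename_axis call is a hypothetical workaround for affected versions; it is not something xarray does, and on fixed versions it changes nothing.

```python
import xarray as xr

x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")
df = x.swap_dims({"idx": "y"}).to_dataframe()

# With the fix, the DataFrame index carries the new dimension name...
print(df.index.name)       # y
# ...and the old dimension coordinate is just another column.
print(sorted(df.columns))  # ['idx', 'x']

# Workaround for versions with the bug: relabel the index by hand so it no
# longer clashes with the "idx" column (a no-op when the name is already "y").
df = df.rename_axis("y")
```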

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.15.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None

dcherian added the bug label Feb 4, 2020
dcherian (Contributor) commented Feb 4, 2020

Thanks for reporting @jaicher

jaicher added a commit to jaicher/xarray that referenced this issue Feb 4, 2020
jaicher added a commit to jaicher/xarray that referenced this issue Feb 4, 2020
dcherian added a commit that referenced this issue Feb 24, 2020
* Added test for GH3748

* Rename newly created index in swap_dims() to dim name if not multiindex

Fixes GH3748

* Updated whats-new.rst with pull request information for swap_dims fix

* Move tests for GH3748 into existing swap_dims tests

+ integrated new tests for GH3748 for DataArray into existing swap_dims
  tests
+ added similar tests for Dataset
+ added test for multiindex case

Co-authored-by: Deepak Cherian <[email protected]>
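The commit message summarises the fix as renaming the newly created index to the dimension name unless it is a MultiIndex. The rule can be sketched with pandas alone; rename_index_for_dim is a hypothetical helper illustrating that rule, not the code merged in #3752.

```python
import pandas as pd


def rename_index_for_dim(index: pd.Index, dim: str) -> pd.Index:
    """Hypothetical helper: give a single-level index the name of the
    dimension it now labels. MultiIndexes are returned unchanged, because
    their level names describe the levels rather than the dimension."""
    if index.nlevels == 1:
        # Index.rename returns a new index; the input is not modified
        return index.rename(dim)
    return index


stale = pd.Index([3], name="idx")
print(rename_index_for_dim(stale, "y").name)  # y

multi = pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]], names=["one", "two"])
print(list(rename_index_for_dim(multi, "y").names))  # ['one', 'two']
```

Leaving MultiIndexes alone avoids overwriting meaningful level names, which is why the commit also adds a separate test for the multiindex case.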

2 participants