DataFrame.from_dict unexpectedly "flatten" tuples in the dictionary keys #16769
Comments
Note that it works as expected in pandas 0.19.2. |
It seems that the problem is rather in the DataFrame class itself. It happens even when there are multiple keys:
In [1]: import pandas
In [2]: pandas.DataFrame({('a',): [1], ('b',): [2]}).columns
Out[2]: Index(['a', 'b'], dtype='object')
In [3]: pandas.DataFrame({('a',): [1], 'b': [2]}).columns
Out[3]: Index([('a',), 'b'], dtype='object')
I expect
Reproduced in the current master (1265c27):
INSTALLED VERSIONS
------------------
commit: 1265c27
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.47-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0.dev+179.g1265c27f4 |
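To check an installation against the session above, the two constructions can be wrapped in a short script. This is a minimal sketch: the helper name `column_labels` is made up for illustration and is not part of pandas. No expected output is shown, since the thread itself reports that the result differs between pandas versions.

```python
import pandas


def column_labels(data):
    """Return the column labels pandas assigns to a DataFrame built from `data`."""
    return list(pandas.DataFrame(data).columns)


# The two constructions from the session above.
cases = {
    "all keys are singleton tuples": {("a",): [1], ("b",): [2]},
    "singleton tuple mixed with str": {("a",): [1], "b": [2]},
}

for name, data in cases.items():
    # Print the version next to the labels so reports are comparable.
    print(pandas.__version__, "|", name, "|", column_labels(data))
```

If the first case prints plain strings while the second keeps `('a',)`, the installation shows the inconsistency described in this comment.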
Another related strange and inconsistent behavior is that setting tuple(s) to
In [1]: import pandas
In [2]: df1 = pandas.DataFrame({'a': [1]})
In [3]: df1.columns = [('a',)]
In [4]: df1.columns
Out[4]: Index(['a'], dtype='object')
In [5]: df2 = pandas.DataFrame({'a': [1], 'b': [2]})
In [6]: df2.columns = [('a',), ('b',)]
In [7]: df2.columns
Out[7]: Index([('a',), ('b',)], dtype='object')
In [8]: df3 = pandas.DataFrame({('a',): [1], ('b',): [2]})
In [9]: df3.columns
Out[9]: Index(['a', 'b'], dtype='object')
In [10]: df3.columns = [('a',), ('b',)]
In [11]: df3.columns
Out[11]: Index([('a',), ('b',)], dtype='object')
I expect |
Here is a workaround I've found. You can avoid pandas "de-tupling" a singleton tuple by adding a dummy column:
newcolumns = [c if isinstance(c, tuple) else (c,) for c in df.columns]
dummy = object()
df[dummy] = pandas.Categorical(0)
df.columns = newcolumns + [dummy]
del df[dummy] |
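The workaround above can be packaged as a reusable function. This is a sketch under stated assumptions: the function name `with_tuple_columns` is hypothetical, it works on a copy instead of modifying `df` in place, and it fills the dummy column with a plain `0` rather than `pandas.Categorical(0)`, which is a substitution made here and not what the comment used. Whether the dummy column is needed at all depends on the pandas version.

```python
import pandas


def with_tuple_columns(df):
    """Return a copy of `df` whose column labels are all tuples.

    Non-tuple labels are wrapped into singleton tuples, as in the
    workaround above. A temporary dummy column keeps the label list
    from consisting only of tuples at the moment it is assigned.
    """
    out = df.copy()
    newcolumns = [c if isinstance(c, tuple) else (c,) for c in out.columns]
    dummy = object()  # unique sentinel label that cannot clash with real columns
    out[dummy] = 0    # assumption: any placeholder value works here
    out.columns = newcolumns + [dummy]
    del out[dummy]
    return out


df = pandas.DataFrame({"a": [1, 2]})
print(list(with_tuple_columns(df).columns))
```

Copying first means the caller's frame keeps its original labels if the result is not what was hoped for.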
@tkf a len-1 tuple as an Index entry is not allowed. This would imply
would actually return a 1-level
You can bisect if you want and see when this changed, since you said it worked in 0.19.2. |
@jreback I'm not talking about MultiIndex here. I want to use tuples of arbitrary length as keys. This is actually noted as a valid use-case in the documentation:
At the "atomic" levels, I think any hashable has to be accepted as-is. This includes a singleton tuple. |
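The point about hashables can be checked in plain Python, independent of pandas. A small illustration (the variable names are chosen here for the example): a singleton tuple is hashable and is a different dictionary key from the string it contains, which is why replacing one with the other changes the meaning of a label.

```python
from collections.abc import Hashable

singleton = ("a",)
bare = "a"

# Both objects are hashable, so both are valid dictionary keys.
assert isinstance(singleton, Hashable)
assert isinstance(bare, Hashable)

# They are distinct keys: neither overwrites the other.
labels = {singleton: 1, bare: 2}
assert singleton != bare
assert len(labels) == 2
assert labels[("a",)] == 1 and labels["a"] == 2
```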
@tkf well, you are fighting pandas here. I'll mark it, and if you can find a change that makes your test work and preserves other behavior, then we would accept it. |
Looks like this is returning the expected result on master. I suppose this could use a test:
|
Code Sample, a copy-pastable example if possible
Problem description
When (1) dictionaries with a single identical key are given to pandas.DataFrame.from_dict and (2) the key is a singleton tuple, it returns a dataframe whose column is the content of the tuple instead of the tuple itself.
Note that this problem does not happen when (1) is not the case (see In [4]) or (2) is not the case (see In [5]). This makes the case (1) & (2) inconsistent with those other cases.
Expected Output
Output of pd.show_versions()
pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.0
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
It is also reproduced with the current master branch:
pandas: 0.21.0.dev+179.g1265c27f4
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.0
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None