Pivot table drops column/index names=nan when dropna=false #14246
Input:
Output was:
Output after fix:
I think the rearrangement of columns is caused by a hash function problem similar to the one in #12679, and it needs extra work to solve.
@@ -230,7 +230,7 @@ class Categorical(PandasObject):
     _typ = 'categorical'

     def __init__(self, values, categories=None, ordered=False,
                  name=None, fastpath=False):
not a good idea to do this
@jreback So the key to the issue is passing `dropna` to `core/algorithms.factorize()`, and `factorize()` is called directly in `categorical.__init__()`. Without the `dropna` here, we lose the `dropna` value from `m = MultiIndex.from_arrays(cartesian_product(table.columns.levels), names=table.columns.names)` and here, and generate the wrong result:
z | NaN | A | B | All |
---|---|---|---|---|
y | | | | |
a | 0 | 1 | 0 | 1 |
b | 0 | 0 | 1 | 1 |
c | 0 | 0 | 0 | 1 |
All | NaN | 1 | 1 | 3 |
While the correct result should be:
z | NaN | A | B | All |
---|---|---|---|---|
y | | | | |
a | 0 | 1 | 0 | 1 |
b | 0 | 0 | 1 | 1 |
c | 1 | 0 | 0 | 1 |
All | 1 | 1 | 1 | 3 |
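To illustrate why the `NaN` column can lose its counts: by default, `pandas.factorize` gives missing values the sentinel code `-1` and leaves them out of the returned uniques, so anything rebuilt from the uniques alone no longer knows a `NaN` key existed. The snippet below is only a sketch; the `z` values are made up to match the shape of the tables above and are not the PR's actual input.

```python
import numpy as np
import pandas as pd

# Hypothetical column keys: one per row, the last one missing.
z = np.array(["A", "B", np.nan], dtype=object)

# Default behaviour: NaN is not treated as a category.
codes, uniques = pd.factorize(z)

z_codes = codes.tolist()   # position of each value in `uniques`, -1 for NaN
z_uniques = list(uniques)  # the NaN key is absent here

print(z_codes, z_uniques)
```

Any grouping that later indexes by `z_uniques` has no slot for the third row, which is consistent with the zero in the `NaN` column of the wrong result.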
@@ -4323,7 +4323,8 @@ def infer(x):
     # ----------------------------------------------------------------------
     # Merging / joining methods

-    def append(self, other, ignore_index=False, verify_integrity=False):
+    def append(self, other, ignore_index=False, verify_integrity=False,
+               dropna=True):
again not a good idea
I'll see what I can do about this one
this can be removed
Sorry, I was wrong about it. Removing `dropna` fails the test `pandas.tools.tests.test_pivot.test_margin_dropna`. When margins are appended, the missing `dropna` causes the `NaN` level to be dropped. This could also be solved by specifying the `dropna` parameter in the `MultiIndex` class constructor.
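The dropped `NaN` level can be seen directly on a `MultiIndex`. The sketch below uses invented labels (`y`, `z` and their values are assumptions chosen to mirror the tables in this thread) and shows that `NaN` is stored as code `-1` rather than as an entry in `levels`, so an index rebuilt from the Cartesian product of the levels has no `NaN` entries at all.

```python
import numpy as np
import pandas as pd

# Hypothetical index with a missing key in the second level.
mi = pd.MultiIndex.from_arrays(
    [["a", "b", "c"], ["A", "B", np.nan]], names=["y", "z"]
)

z_levels = list(mi.levels[1])          # NaN is not stored as a level
z_level_codes = mi.codes[1].tolist()   # NaN is encoded as -1 instead

# Rebuilding from the product of the levels cannot bring NaN back.
rebuilt = pd.MultiIndex.from_product(mi.levels, names=mi.names)
rebuilt_has_nan = bool(rebuilt.get_level_values("z").isna().any())

print(z_levels, z_level_codes, len(rebuilt), rebuilt_has_nan)
```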
@@ -1392,7 +1392,7 @@ def __getitem__(self, key):
         else:
             return result

-    def append(self, other):
+    def append(self, other, dropna=True):
same
The problem here is that I need to call the derived `append()` in `Multi.py`, and the `dropna` in `Multi.append()` cannot be omitted. One possible way to remove this `dropna`, as well as the one here, is to add a `dropna` parameter to `MultiIndex.__init__()`. I am not sure how you like this idea. I would appreciate it if you could tell me your thoughts on it. Thanks!

     @classmethod
-    def from_tuples(cls, tuples, sortorder=None, names=None):
+    def from_tuples(cls, tuples, sortorder=None, names=None, dropna=True):
         """
same
This is removed
And this is the same as here
@@ -1269,7 +1269,7 @@ def _get_join_keys(llab, rlab, shape, sort):

 def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
            keys=None, levels=None, names=None, verify_integrity=False,
-           copy=True):
+           copy=True, dropna=True):
same
import numpy as np


a = np.array(['foo', 'foo', 'foo', 'bar',
pls put tests with current tests
Yeah this should not be here.
@jreback I realized that this issue has a lot of overlap with Issue #3729 and PR #12607.
Current coverage is 84.77% (diff: 100%)
@@             master   #14246   diff @@
==========================================
  Files           145      145
  Lines         51129    51154    +25
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits          43343    43368    +25
  Misses         7786     7786
  Partials          0        0
@jreback All tests pass now. I can move |
Sorry I was thinking wrong. Will figure out how to limit the use of |
73c42bc to 2e679ea
can you rebase
2e679ea to 72f98e6
@jreback It seems that the appveyor exceeded time limit. Also I did a |
can you rebase and we can see where this is
72f98e6 to 2e3f8e0
@jreback I made some major changes and now the changes are restrained to |
     keys = index + columns

     if not dropna:
         key_data = np.array(data[keys], dtype='object')
what the heck is this?
@jreback This converts `NaN` values in the keys to special strings, to avoid passing `dropna` around.
that is not acceptable, we use masks if needed. converting things like that will just lead to future reader confusion and bugs.
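As a rough illustration of the mask-based alternative suggested here (not the code that was eventually merged, and using made-up keys), the missing entries can be located with a boolean mask and given their own integer code, without rewriting the data as placeholder strings:

```python
import numpy as np
import pandas as pd

# Hypothetical row and column keys mirroring the tables in this thread.
y = np.array(["a", "b", "c"], dtype=object)
z = np.array(["A", "B", np.nan], dtype=object)

y_codes, y_uniques = pd.factorize(y)
z_codes, z_uniques = pd.factorize(z)  # NaN comes back as -1

# Mask the missing keys and assign them one extra code at the end,
# leaving the original values untouched.
z_mask = z_codes == -1
z_codes_na = np.where(z_mask, len(z_uniques), z_codes)

# Count occurrences per (y, z) pair; the last column is the NaN bucket.
table = np.zeros((len(y_uniques), len(z_uniques) + 1), dtype=int)
np.add.at(table, (y_codes, z_codes_na), 1)

column_totals = table.sum(axis=0).tolist()
print(table.tolist(), column_totals)
```

Here the columns are ordered `A`, `B`, `NaN` rather than `NaN`, `A`, `B` as in the tables above; the counts themselves agree with the expected result, including the single count for row `c` under `NaN`.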
@OXPHOS can you rebase / update
I started over and am now halfway through. I'll open a new PR once I've cleaned up.
git diff upstream/master | flake8 --diff