PERF: use .values in index difference #11279
@@ -1605,12 +1605,12 @@ def difference(self, other):
         self._assert_can_do_setop(other)

         if self.equals(other):
-            return Index([], name=self.name)
+            return self._shallow_copy([])
maybe nice to add a test for this one? (that it keeps the correct class)
there are quite a number of tests in
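A test along the lines suggested here might look like the sketch below. It assumes a pandas version in which difference preserves the index class when both inputs are equal; the function name is hypothetical and is not taken from the pandas test suite.

```python
import pandas as pd


def check_difference_of_equal_index_keeps_class():
    # Hypothetical test name; not taken from the pandas test suite.
    idx = pd.period_range("2000-01", periods=5, freq="M", name="when")

    # An index minus itself is empty...
    result = idx.difference(idx)
    assert len(result) == 0

    # ...but it should still be a PeriodIndex carrying the original name,
    # rather than falling back to a plain Index.
    assert isinstance(result, pd.PeriodIndex)
    assert result.name == "when"


check_difference_of_equal_index_keeps_class()
```

The same check could be repeated for other index subclasses if the behaviour is meant to hold generally.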
OK cheers @jreback. At the moment I'm getting a number of failures similar to the one below - I think it's where this operates on
======================================================================
ERROR: test_stack_partial_multiIndex (pandas.tests.test_frame.TestDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/tests/test_frame.py", line 13998, in test_stack_partial_multiIndex
_test_stack_with_multiindex(full_multiindex[multiindex_columns])
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/tests/test_frame.py", line 13969, in _test_stack_with_multiindex
result = df.stack(level=level, dropna=False)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 3745, in stack
return stack(self, level, dropna=dropna)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/reshape.py", line 481, in stack
return _stack_multi_columns(frame, level_num=level_num, dropna=dropna)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/reshape.py", line 648, in _stack_multi_columns
result = DataFrame(new_data, index=new_index, columns=new_columns)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 227, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 322, in _init_dict
data = dict((k, v) for k, v in compat.iteritems(data)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 323, in <genexpr>
if k in columns)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/index.py", line 1116, in __contains__
return key in self._engine
File "pandas/index.pyx", line 99, in pandas.index.IndexEngine.__contains__ (pandas/index.c:2749)
File "pandas/index.pyx", line 261, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:5304)
File "pandas/index.pyx", line 267, in pandas.index.IndexEngine.initialize (pandas/index.c:5408)
File "pandas/hashtable.pyx", line 703, in pandas.hashtable.PyObjectHashTable.map_locations (pandas/hashtable.c:12850)
ValueError: Does not understand character buffer dtype format string ('w')
looks like something else is going on
any progress?
@jreback not yet - will look at it this weekend. Thanks for the ping
@MaximilianR if you'd like to update, that would be great
I had a go at debugging this, but I'm struggling, since the errors happen on the Cython side - I need to get up to speed on how to debug those.
@MaximilianR can you rebase / update
I still get this error below. I'm really not sure how to debug the pyx files - although keen to learn. Any guidance?
======================================================================
ERROR: test_stack_partial_multiIndex (pandas.tests.test_frame.TestDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/tests/test_frame.py", line 14305, in test_stack_partial_multiIndex
_test_stack_with_multiindex(full_multiindex[multiindex_columns])
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/tests/test_frame.py", line 14276, in _test_stack_with_multiindex
result = df.stack(level=level, dropna=False)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 3803, in stack
return stack(self, level, dropna=dropna)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/reshape.py", line 481, in stack
return _stack_multi_columns(frame, level_num=level_num, dropna=dropna)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/reshape.py", line 648, in _stack_multi_columns
result = DataFrame(new_data, index=new_index, columns=new_columns)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 226, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 323, in _init_dict
data = dict((k, v) for k, v in compat.iteritems(data)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/frame.py", line 324, in <genexpr>
if k in columns)
File "/Users/maximilianroos/Dropbox/workspace/pandas/pandas/core/index.py", line 1161, in __contains__
return key in self._engine
File "pandas/index.pyx", line 99, in pandas.index.IndexEngine.__contains__ (pandas/index.c:2749)
File "pandas/index.pyx", line 261, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:5304)
File "pandas/index.pyx", line 267, in pandas.index.IndexEngine.initialize (pandas/index.c:5408)
File "pandas/hashtable.pyx", line 703, in pandas.hashtable.PyObjectHashTable.map_locations (pandas/hashtable.c:12518)
ValueError: Does not understand character buffer dtype format string ('w')
----------------------------------------------------------------------
go up the stack when debugging. somehow the
OK thanks, I'll try that angle
@MaximilianR pls reopen if you would like to update
OK, I will aim to come back to this one at some point
np. just trying to keep our outstanding PRs to a minimum.
The existing .difference method 'unboxed' all the objects, which has a severe performance impact on PeriodIndex in particular... so around 2000x
I haven't worked with asv or the like - is this a case where a test like that is required?
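For reference, an asv-style benchmark for this case could be as small as the sketch below. asv discovers plain classes with a setup method and time_-prefixed methods, so nothing beyond pandas is imported here. The class name and index sizes are illustrative assumptions, not taken from the pandas benchmark suite.

```python
import pandas as pd


class PeriodIndexDifference:
    """Hypothetical asv-style benchmark; the name and sizes are illustrative."""

    def setup(self):
        # A daily PeriodIndex, plus a second index holding every other period.
        self.left = pd.period_range("2000-01-01", periods=10000, freq="D")
        self.right = self.left[::2]

    def time_difference(self):
        # asv times the body of any method whose name starts with "time_".
        self.left.difference(self.right)


# The class can also be exercised by hand, without asv installed.
bench = PeriodIndexDifference()
bench.setup()
bench.time_difference()
```

Running the same benchmark before and after the change would show whether the claimed speedup holds on a given machine.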