PERF: pandas 0.15.2 multi-indexed DataFrame sum #9049
Labels: Numeric Operations, Performance
Problem:
In certain cases, data.sum(level=...) on a multi-indexed table produces a different result
(with many NAs) than data.groupby(level=...).sum(), and it is also much slower. It seems that
the new version produces a cross join of the keys and emits NAs for every pair of keys that
has no data, which makes the result bigger and the computation significantly slower.
This happens in the following example:
Code:
-------------- pandas version: 0.15.1.dev
CPU times: user 876 ms, sys: 17 ms, total: 893 ms
Wall time: 894 ms
(392040, 1)
CPU times: user 109 ms, sys: 0 ns, total: 109 ms
Wall time: 108 ms
(198000, 1)
-------------- pandas version: 0.14.1
CPU times: user 94 ms, sys: 0 ns, total: 94 ms
Wall time: 94.2 ms
(198000, 1)
CPU times: user 120 ms, sys: 0 ns, total: 120 ms
Wall time: 120 ms
(198000, 1)
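For illustration only, here is a minimal sketch of the effect described above. It is not the original reproduction: the tiny frame, the level names `a`, `b`, `c`, and the column name `value` are all made up, and the cross product is built explicitly with `reindex` instead of calling `sum(level=...)` (which recent pandas releases no longer provide). It only shows why a cross join of the keys yields a larger result padded with NAs.

```python
import pandas as pd

# Hypothetical data: the values of level "b" depend on level "a",
# so many (a, b) combinations never occur in the index.
index = pd.MultiIndex.from_tuples(
    [("x", 1, "p"), ("x", 1, "q"), ("x", 2, "p"), ("y", 3, "p"), ("y", 3, "q")],
    names=["a", "b", "c"],
)
data = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)

# Grouping on index levels keeps only the key pairs that are present.
observed = data.groupby(level=["a", "b"]).sum()

# Reindexing onto the full cross product of the level values mimics
# the reported shape: every missing (a, b) pair becomes a NaN row.
full_index = pd.MultiIndex.from_product(
    [
        data.index.get_level_values("a").unique(),
        data.index.get_level_values("b").unique(),
    ],
    names=["a", "b"],
)
cross = observed.reindex(full_index)

print(observed.shape)                     # (3, 1)
print(cross.shape)                        # (6, 1)
print(int(cross["value"].isna().sum()))   # 3
```

In this toy case the cross product doubles the row count. In the timings above the row count grows from 198000 to 392040, which would be consistent with the same kind of padding.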