-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Rolling rank #43338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Rolling rank #43338
Conversation
rolling rank and percentile rank using skiplist
rolling rank and percentile rank using skiplist
remove unused variables, fix indentation, add comment to roll_rank()
don't fill output when nobs < minp
address lint warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this to the asvs as well.
cc @mroeschke |
will this close #9481 ? (can pull some examples fromt here) |
as a followup (can create an issue), need tests / impl for groupby.rolling.rank (it should just work though) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pr @gsiano! Some comments on the cython code
raise MemoryError on skiplist_insert failure, destroy skiplist before re-init, and other minor changes
What happens currently when there are ties?
|
implement min, max, and average rank methods
I added the |
the |
+1 on pushing |
That sounds like it will work. I'll try adding it. |
added the `ascending` flag, various cleanups, expanded tests and asv benchmark
remove unimplemented rank types
reorder parameter list to match DataFrame.rank
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, can you add
- whatsnew note (1.4 enhacements), wouldn't object to a small example if you'd like (or a single note ok)
- add in the reference section: https://github.com/pandas-dev/pandas/blob/master/doc/source/reference/window.rst somewhere
- this is just for rolling, or works for expanding? (i think it should, can you add tests)
- i think we need a doc-string in rolling.py
added tests for `Expanding`. added doc strings and whatsnew note
I added a test for expanding and updated the docs and whatsnew file. Is there a way to render the docs locally so I can see how it looks? |
|
@pytest.mark.parametrize("ascending", [True, False]) | ||
@pytest.mark.parametrize("test_data", ["default", "duplicates", "nans"]) | ||
def test_rank(window, method, pct, ascending, test_data): | ||
length = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could we test with a smaller sample (e.g. 20) and adjust window
accordingly?
pandas/tests/window/test_rolling.py
Outdated
@pytest.mark.parametrize("ascending", [True, False]) | ||
@pytest.mark.parametrize("test_data", ["default", "duplicates", "nans"]) | ||
def test_rank(window, method, pct, ascending, test_data): | ||
length = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comments, otherwise LGTM!
pandas/_libs/window/aggregations.pyx
Outdated
int64_t nobs = 0, win | ||
float64_t val | ||
skiplist_t *skiplist | ||
float64_t[::1] output = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float64_t[::1] output = None | |
float64_t[::1] output |
NBD since doesn't affect correctness, but I find this clearer since None
initialization usually used only when there's a path where the variable might not end up initialized. Also generates a bit less code :)
pandas/_libs/window/aggregations.pyx
Outdated
else: | ||
rank = NaN | ||
if nobs >= minp: | ||
output[i] = <float64_t>(rank) / nobs if percentile else rank |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the cast here necessary?
pandas/core/window/expanding.py
Outdated
) | ||
def rank( | ||
self, | ||
method: str = "average", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent this could be the same Literal
you defined in aggregations.pyi
? Maybe that could be aliased in pandas/_typing.py
, so that then if we do something like add dense
then we only need to change typing in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah these should be consisteent across the rank methods (could be a followup to fix this)
elif test_data == "duplicates": | ||
ser = Series(data=np.random.choice(3, length)) | ||
elif test_data == "nans": | ||
ser = Series(data=np.random.choice([1.0, 0.25, 0.75, np.nan], length)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you throw an inf
and -inf
in here? Not really a special case I guess with how it's implemented in rolling_rank
, but it is special cased in other rank algos and were some issues before with inf
and nan
, so covering here would be good
pandas/tests/window/test_rolling.py
Outdated
elif test_data == "duplicates": | ||
ser = Series(data=np.random.choice(3, length)) | ||
elif test_data == "nans": | ||
ser = Series(data=np.random.choice([1.0, 0.25, 0.75, np.nan], length)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above about inf
doc/source/whatsnew/v1.4.0.rst
Outdated
|
||
.. ipython:: python | ||
|
||
>>> s = pd.Series([1, 4, 2, 3, 5, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just put the commands; they will render during the doc-build (e.g. L100 w/o the '>>>' and not below)
doc/source/whatsnew/v1.4.0.rst
Outdated
5 2.0 | ||
dtype: float64 | ||
|
||
>>> s.expanding().rank() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add a comment; this works for expanding as well (or kill the expanding, up 2 you)
pandas/_libs/window/aggregations.pyx
Outdated
@@ -1139,6 +1141,120 @@ def roll_quantile(const float64_t[:] values, ndarray[int64_t] start, | |||
return output | |||
|
|||
|
|||
cdef enum RankType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the enums defined in pandas/_libs/algos.pyx? e.g. the TIEBREAK_AVERAGE. may need to move the to algox.pyd to import properly.
pandas/core/window/expanding.py
Outdated
) | ||
def rank( | ||
self, | ||
method: str = "average", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah these should be consisteent across the rank methods (could be a followup to fix this)
pandas/core/window/rolling.py
Outdated
@@ -1409,6 +1409,22 @@ def quantile(self, quantile: float, interpolation: str = "linear", **kwargs): | |||
|
|||
return self._apply(window_func, name="quantile", **kwargs) | |||
|
|||
def rank( | |||
self, | |||
method: str = "average", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here
addressing comments
add rolling rank tiebreakers dict
fix docs warnings - remove '>>>' from code block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. small comment to consider for followup.
cc @mzeitlin11 over to you
@@ -1139,6 +1143,122 @@ def roll_quantile(const float64_t[:] values, ndarray[int64_t] start, | |||
return output | |||
|
|||
|
|||
rolling_rank_tiebreakers = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible to unify these with the same in algos.pyx?
oh see you approved, ok then. |
thanks @gsiano very nice! pls create an issue for for the requested followups (PR as well if you can .....) keep em coming! |
xref #9481