ENH: Rolling rank #43338

gsiano · 2021-08-31T23:10:54Z

xref #9481

closes Implement high performance rolling_rank #9481
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

rolling rank and percentile rank using skiplist

…rank

remove unused variables, fix indentation, add comment to roll_rank()

don't fill output when nobs < minp

pep8speaks · 2021-08-31T23:10:57Z

Hello @gsiano! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-13 18:29:13 UTC

address lint warnings

jreback

can you add this to the asvs as well.

pandas/tests/window/test_rolling.py

jreback · 2021-08-31T23:51:02Z

cc @mroeschke

jreback · 2021-08-31T23:51:39Z

will this close #9481 ? (can pull some examples from there)

jreback · 2021-08-31T23:52:35Z

as a followup (can create an issue), need tests / impl for groupby.rolling.rank (it should just work though)

mzeitlin11

Thanks for the pr @gsiano! Some comments on the cython code

pandas/_libs/window/aggregations.pyi

pandas/_libs/window/aggregations.pyx

raise MemoryError on skiplist_insert failure, destroy skiplist before re-init, and other minor changes

pandas/core/window/rolling.py

mroeschke · 2021-09-01T05:33:57Z

What happens currently when there are ties?

DataFrame.rank currently has a method argument to control that behavior https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html#pandas.DataFrame.rank

implement min, max, and average rank methods

gsiano · 2021-09-01T16:30:39Z

What happens currently when there are ties?

DataFrame.rank currently has a method argument to control that behavior https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html#pandas.DataFrame.rank

I added the min, max, and average rank methods. I still have to add the dense and first methods, and I'll look into adding the other flags from DataFrame.rank

gsiano · 2021-09-01T16:41:39Z

I added the min, max, and average rank methods. I still have to add the dense and first methods, and I'll look into adding the other flags from DataFrame.rank

the first method doesn't seem relevant for rolling rank, and dense probably deserves a separate pr
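To make the three tie-breaking methods mentioned above concrete, here is a minimal sketch in plain Python. It is an illustration only, not the Cython code from this PR; `rank_last` is a hypothetical helper that ranks the last value of a window against the whole window.

```python
def rank_last(window, method="average"):
    """Return the 1-based rank of the last value in `window`.

    Hypothetical helper for illustration; it scans the window in O(n)
    instead of using a skiplist.
    """
    val = window[-1]
    less = sum(1 for x in window if x < val)    # values strictly below val
    equal = sum(1 for x in window if x == val)  # ties, including val itself
    if method == "min":
        return float(less + 1)          # lowest rank in the tie group
    if method == "max":
        return float(less + equal)      # highest rank in the tie group
    if method == "average":
        return less + (equal + 1) / 2   # mean of the tied ranks
    raise ValueError(f"unsupported method: {method}")


window = [1, 3, 2, 3]
print(rank_last(window, "min"), rank_last(window, "max"), rank_last(window, "average"))
```

In the window `[1, 3, 2, 3]` the two 3s occupy ranks 3 and 4, so the last value gets 3 under `min`, 4 under `max`, and 3.5 under `average`.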

mzeitlin11 · 2021-09-01T17:56:57Z

I added the min, max, and average rank methods. I still have to add the dense and first methods, and I'll look into adding the other flags from DataFrame.rank

the first method doesn't seem relevant for rolling rank, and dense probably deserves a separate pr

+1 on pushing dense to a separate pr. ascending would be a nice flag to have ... I wonder if something like just adding the negative of the value to the skiplist would handle flipping the order

gsiano · 2021-09-01T21:51:39Z

+1 on pushing dense to a separate pr. ascending would be a nice flag to have ... I wonder if something like just adding the negative of the value to the skiplist would handle flipping the order

That sounds like it will work. I'll try adding it.
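The negation idea can be sketched in plain Python as follows. This is an assumption-laden illustration, not the PR's implementation: a sorted list maintained with `bisect` stands in for the skiplist, only the `min` tie method is shown, and `rolling_min_rank` is a hypothetical name. NaN values and `min_periods` are not handled.

```python
from bisect import bisect_left, insort


def rolling_min_rank(values, window, ascending=True):
    """Rolling 'min' rank of each value within its trailing window."""
    sorted_win = []  # stand-in for the skiplist: window contents kept sorted
    out = []
    for i, v in enumerate(values):
        key = v if ascending else -v  # negating the value flips the order
        insort(sorted_win, key)
        if i >= window:
            # Drop the value that just left the window (same key transform).
            old = values[i - window]
            old_key = old if ascending else -old
            del sorted_win[bisect_left(sorted_win, old_key)]
        # Number of keys strictly smaller than `key`, plus one.
        out.append(bisect_left(sorted_win, key) + 1)
    return out


data = [1, 4, 2, 3, 5, 3]
print(rolling_min_rank(data, 3, ascending=True))
print(rolling_min_rank(data, 3, ascending=False))
```

Because only the key stored in the sorted structure changes, the ranking logic itself is identical for both directions.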

added the `ascending` flag, various cleanups, expanded tests and asv benchmark

remove unimplemented rank types

pandas/core/window/rolling.py

reorder parameter list to match DataFrame.rank

jreback

looks good, can you add

whatsnew note (1.4 enhancements), wouldn't object to a small example if you'd like (or a single note ok)
add in the reference section: https://github.com/pandas-dev/pandas/blob/master/doc/source/reference/window.rst somewhere
this is just for rolling, or works for expanding? (i think it should, can you add tests)
i think we need a doc-string in rolling.py

pandas/_libs/window/aggregations.pyx

added tests for `Expanding`. added doc strings and whatsnew note

gsiano · 2021-09-04T15:00:22Z

looks good, can you add

whatsnew note (1.4 enhancements), wouldn't object to a small example if you'd like (or a single note ok)

add in the reference section: https://github.com/pandas-dev/pandas/blob/master/doc/source/reference/window.rst somewhere

this is just for rolling, or works for expanding? (i think it should, can you add tests)

i think we need a doc-string in rolling.py

I added a test for expanding and updated the docs and whatsnew file. Is there a way to render the docs locally so I can see how it looks?

mroeschke · 2021-09-04T17:34:58Z

Is there a way to render the docs locally so I can see how it looks?

https://pandas.pydata.org/pandas-docs/stable/development/contributing_documentation.html#how-to-build-the-pandas-documentation

mroeschke · 2021-09-04T17:37:28Z

pandas/tests/window/test_expanding.py

+@pytest.mark.parametrize("ascending", [True, False])
+@pytest.mark.parametrize("test_data", ["default", "duplicates", "nans"])
+def test_rank(window, method, pct, ascending, test_data):
+    length = 1000


Nit: Could we test with a smaller sample (e.g. 20) and adjust window accordingly?

mroeschke · 2021-09-04T17:37:34Z

pandas/tests/window/test_rolling.py

+@pytest.mark.parametrize("ascending", [True, False])
+@pytest.mark.parametrize("test_data", ["default", "duplicates", "nans"])
+def test_rank(window, method, pct, ascending, test_data):
+    length = 1000


mzeitlin11

Small comments, otherwise LGTM!

mzeitlin11 · 2021-09-04T19:59:30Z

pandas/_libs/window/aggregations.pyx

+        int64_t nobs = 0, win
+        float64_t val
+        skiplist_t *skiplist
+        float64_t[::1] output = None


Suggested change

-        float64_t[::1] output = None
+        float64_t[::1] output

NBD since it doesn't affect correctness, but I find this clearer since None initialization is usually used only when there's a path where the variable might not end up initialized. Also generates a bit less code :)

mzeitlin11 · 2021-09-04T20:00:08Z

pandas/_libs/window/aggregations.pyx

+                    else:
+                        rank = NaN
+            if nobs >= minp:
+                output[i] = <float64_t>(rank) / nobs if percentile else rank


Is the cast here necessary?
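The logic of the excerpt above (write a result only once `nobs >= minp`, and divide by `nobs` for a percentile rank) can be sketched in Python. `finalize_rank` is a hypothetical helper, and returning NaN for under-filled windows is an assumption about what the unfilled output holds.

```python
import math


def finalize_rank(rank, nobs, minp, pct):
    """Value stored for one window position (illustrative sketch only)."""
    if nobs < minp:
        # Assumption: positions with too few observations stay NaN.
        return math.nan
    # In Python 3, `/` is always true division, so no cast is needed here.
    return rank / nobs if pct else float(rank)


print(finalize_rank(2, 4, 1, pct=True))
print(finalize_rank(3, 4, 1, pct=False))
print(finalize_rank(2, 1, 3, pct=False))
```

Whether the Cython cast is required depends on the declared type of `rank`, which the excerpt does not show.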

pandas/_libs/window/aggregations.pyx

mzeitlin11 · 2021-09-04T20:02:24Z

pandas/core/window/expanding.py

+    )
+    def rank(
+        self,
+        method: str = "average",


To be consistent this could be the same Literal you defined in aggregations.pyi? Maybe that could be aliased in pandas/_typing.py, so that then if we do something like add dense then we only need to change typing in one place.

yeah these should be consistent across the rank methods (could be a followup to fix this)
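A shared alias of the kind suggested here might look like the sketch below. The name `RankMethod` and the helper `validate_method` are hypothetical; the thread only proposes that one alias could live in pandas/_typing.py and be reused by every `rank` signature.

```python
from typing import Literal, get_args

# Hypothetical alias name; defined once so that adding e.g. "dense"
# later only requires a change in one place.
RankMethod = Literal["average", "min", "max"]


def validate_method(method: str) -> str:
    """Reject any method that is not listed in the shared alias."""
    allowed = get_args(RankMethod)
    if method not in allowed:
        raise ValueError(f"method must be one of {allowed}, got {method!r}")
    return method


print(validate_method("min"))
```

Deriving the runtime check from the alias with `get_args` keeps the type annotation and the validation from drifting apart.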

mzeitlin11 · 2021-09-04T20:04:27Z

pandas/tests/window/test_expanding.py

+    elif test_data == "duplicates":
+        ser = Series(data=np.random.choice(3, length))
+    elif test_data == "nans":
+        ser = Series(data=np.random.choice([1.0, 0.25, 0.75, np.nan], length))


Can you throw an inf and -inf in here? Not really a special case I guess with how it's implemented in rolling_rank, but it is special cased in other rank algos and there were some issues before with inf and nan, so covering it here would be good
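A small plain-Python check illustrates why infinities need no special casing in a comparison-based structure, while NaN does. This is a sketch of the general floating-point behaviour, not of the test added in this PR.

```python
import math

# inf and -inf order normally against finite floats, so a comparison-based
# ranking treats them like any other value.
values = [1.0, math.inf, 0.25, -math.inf, 0.75]
ordered = sorted(values)
ranks = [ordered.index(v) + 1 for v in values]
print(ranks)

# NaN compares False against everything, including itself, so it cannot be
# placed in a sorted structure without explicit handling.
nan = math.nan
nan_compares = (nan < 1.0, nan > 1.0, nan == nan)
print(nan_compares)
```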

mzeitlin11 · 2021-09-04T20:04:38Z

pandas/tests/window/test_rolling.py

+    elif test_data == "duplicates":
+        ser = Series(data=np.random.choice(3, length))
+    elif test_data == "nans":
+        ser = Series(data=np.random.choice([1.0, 0.25, 0.75, np.nan], length))


Same comment as above about inf

jreback · 2021-09-06T21:31:14Z

doc/source/whatsnew/v1.4.0.rst

+
+.. ipython:: python
+
+    >>> s = pd.Series([1, 4, 2, 3, 5, 3])


just put the commands; they will render during the doc-build (e.g. L100 w/o the '>>>' and not below)

jreback · 2021-09-06T21:31:49Z

doc/source/whatsnew/v1.4.0.rst

+    5    2.0
+    dtype: float64
+
+    >>> s.expanding().rank()


you could add a comment; this works for expanding as well (or kill the expanding, up 2 you)

pandas/_libs/src/skiplist.h

jreback · 2021-09-06T21:34:59Z

pandas/_libs/window/aggregations.pyx

@@ -1139,6 +1141,120 @@ def roll_quantile(const float64_t[:] values, ndarray[int64_t] start,
    return output


+cdef enum RankType:


can you use the enums defined in pandas/_libs/algos.pyx? e.g. the TIEBREAK_AVERAGE. may need to move them to algos.pxd to import properly.

jreback · 2021-09-06T21:36:06Z

pandas/core/window/expanding.py

+    )
+    def rank(
+        self,
+        method: str = "average",


yeah these should be consistent across the rank methods (could be a followup to fix this)

jreback · 2021-09-06T21:36:14Z

pandas/core/window/rolling.py

@@ -1409,6 +1409,22 @@ def quantile(self, quantile: float, interpolation: str = "linear", **kwargs):

        return self._apply(window_func, name="quantile", **kwargs)

+    def rank(
+        self,
+        method: str = "average",


same comment here

addressing comments

add rolling rank tiebreakers dict

fix docs warnings - remove '>>>' from code block

jreback

lgtm. small comment to consider for followup.

cc @mzeitlin11 over to you

jreback · 2021-09-14T13:05:37Z

pandas/_libs/window/aggregations.pyx

@@ -1139,6 +1143,122 @@ def roll_quantile(const float64_t[:] values, ndarray[int64_t] start,
    return output


+rolling_rank_tiebreakers = {


possible to unify these with the same in algos.pyx?

jreback · 2021-09-14T13:07:52Z

oh see you approved, ok then.

jreback · 2021-09-14T13:08:26Z

thanks @gsiano very nice! pls create an issue for the requested followups (PR as well if you can .....) keep em coming!

gsiano added 5 commits August 31, 2021 18:47

ENH: rolling rank

3ebf8c0

rolling rank and percentile rank using skiplist

ENH: rolling rank

ce754f7

rolling rank and percentile rank using skiplist

Merge branch 'rolling_rank' of github.com:gsiano/pandas into rolling_…

f13a720

…rank

ENH: rolling rank

874c980

remove unused variables, fix indentation, add comment to roll_rank()

ENH: rolling rank

4d06ba3

don't fill output when nobs < minp

ENH: rolling rank

1308208

address lint warnings

jreback requested changes Aug 31, 2021

pandas/tests/window/test_rolling.py (outdated, resolved)

jreback added Enhancement Window rolling, ewma, expanding labels Aug 31, 2021

jreback changed the title ~~Rolling rank~~ ENH: Rolling rank Aug 31, 2021

mzeitlin11 reviewed Sep 1, 2021

ENH: rolling rank

4caa51b

raise MemoryError on skiplist_insert failure, destroy skiplist before re-init, and other minor changes

gsiano mentioned this pull request Sep 1, 2021

BUG: skiplist memory leak in rolling functions #43339

Closed

3 tasks

mroeschke reviewed Sep 1, 2021

pandas/core/window/rolling.py (outdated, resolved)

ENH: rolling rank - rank methods

f2ee5b2

implement min, max, and average rank methods

gsiano added 2 commits September 1, 2021 18:09

ENH: rolling rank - ascending flag

b135f1e

added the `ascending` flag, various cleanups, expanded tests and asv benchmark

ENH: rolling rank

fda85b4

remove unimplemented rank types

mroeschke reviewed Sep 2, 2021

pandas/core/window/rolling.py (outdated, resolved)

gsiano added 2 commits September 1, 2021 20:30

ENH: rolling rank - reorder parameter list

e692ce3

reorder parameter list to match DataFrame.rank

ENH: rolling rank - address pre-commit errors

6b23fc0

jreback added this to the 1.4 milestone Sep 2, 2021

jreback requested changes Sep 2, 2021

pandas/_libs/window/aggregations.pyx (outdated, resolved)

ENH: rolling rank

5f7d319

added tests for `Expanding`. added doc strings and whatsnew note

ENH: rolling rank - fix pre-commit errors

63d37c5

mroeschke reviewed Sep 4, 2021

mzeitlin11 approved these changes Sep 4, 2021

jreback requested changes Sep 6, 2021

gsiano added 4 commits September 12, 2021 22:34

ENH: rolling rank

ba468c6

addressing comments

Merge branch 'master' into rolling_rank

e078119

ENH: rolling rank

bb7005f

add rolling rank tiebreakers dict

ENH: rolling rank

1470c7b

fix docs warnings - remove '>>>' from code block

jreback approved these changes Sep 14, 2021

jreback merged commit 9f90bd4 into pandas-dev:master Sep 14, 2021

gsiano mentioned this pull request Sep 15, 2021

ENH: rolling rank followups #43579

Open

4 tasks

twoertwein mentioned this pull request Nov 15, 2021

ENH: Rolling mode #36861

Open
