-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: cythonize kendall correlation #39132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jreback this is now ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm cc @jbrockmendel if any comments
A small slightly unrelated change that I made in this PR was in the docstring of the min_periods arg for corr. It used to state that min_periods was only valid for spearman and pearson correlations, however, I believe that it always worked for kendall correlation. |
|
||
@cython.boundscheck(False) | ||
@cython.wraparound(False) | ||
def nancorr_kendall(ndarray[float64_t, ndim=2] mat, Py_ssize_t minp=1) -> ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually dont mix-and-match cython-vs-python style annotations. i know this is tough bc the python-style won't allow ndarray[float64_t, ndim=2]
. not sure if there's a better alternative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the function signature for this function is the same as the one for nancorr_spearman. However, I notice that other functions have left out the -> ndarray
part(including nancorr_pearson). Should I remove that from this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right but we are currently not checking the .pyx right so this is moot for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah its not a big deal
couple nitpicks, generally looks good. i take it we have sufficient testing? |
@jbrockmendel I think testing coverage should be fine. There is a test that compares the results between scipy and our implementation. (see https://github.com/pandas-dev/pandas/blob/master/pandas/tests/frame/methods/test_cov_corr.py#L89) |
thanks @lithomas1 very nice! |
This reverts commit 57ccd2a. The Kendall implementation failed to take into account ties and was inconsistent with scipy's method
This reverts commit 57ccd2a. The Kendall implementation failed to take into account ties and was inconsistent with scipy's method
This reverts commit 57ccd2a. The Kendall implementation failed to take into account ties and was inconsistent with scipy's method
This reverts commit 57ccd2a. The Kendall implementation failed to take into account ties and was inconsistent with scipy's method
ASV's.
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.