PERF: cythonize kendall correlation #39132

Merged: 8 commits, Jan 20, 2021

Conversation

lithomas1
Member

@lithomas1 lithomas1 commented Jan 12, 2021

ASV benchmark results:

       before           after         ratio
     [a6bc6ecc]       [1eb86441]
     <perf-cythonize-kendall~2^2>       <perf-cythonize-kendall>
-         81M            57.6M     0.71  stat_ops.Correlation.peakmem_corr_wide('kendall')
-  1.41±0.02s          362±7ms     0.26  stat_ops.Correlation.time_corr_wide_nans('kendall')
-  1.57±0.04s          398±4ms     0.25  stat_ops.Correlation.time_corr_wide('kendall')
-    34.6±4ms      8.75±0.06ms     0.25  stat_ops.Correlation.time_corr('kendall')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
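
For anyone reading the table: the ratio column is the after value divided by the before value, so 0.25 is roughly a 4x speedup. A quick check of that arithmetic, with the figures copied from the output above (the dictionary and variable names are only for illustration, and each pair has been converted to a common unit by hand):

```python
# (before, after) pairs copied from the ASV output above.
# Each pair uses one unit: seconds for timings, "M" as printed for peak memory.
benchmarks = {
    "peakmem_corr_wide": (81.0, 57.6),
    "time_corr_wide_nans": (1.41, 0.362),
    "time_corr_wide": (1.57, 0.398),
    "time_corr": (0.0346, 0.00875),
}

# ratio = after / before; values below 1 mean the "after" commit is cheaper.
ratios = {
    name: round(after / before, 2)
    for name, (before, after) in benchmarks.items()
}

for name, ratio in ratios.items():
    print(f"{name}: ratio={ratio:.2f}, about {1 / ratio:.1f}x improvement")
```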

@lithomas1 lithomas1 closed this Jan 12, 2021
@lithomas1 lithomas1 reopened this Jan 12, 2021
@lithomas1 lithomas1 marked this pull request as ready for review January 20, 2021 03:45
@lithomas1 lithomas1 changed the title from "[WIP] PERF: cythonize kendall correlation" to "PERF: cythonize kendall correlation" Jan 20, 2021
@lithomas1 lithomas1 requested a review from jreback January 20, 2021 03:50
@lithomas1
Member Author

@jreback this is now ready for review.
cc @jbrockmendel and @dsaxton (if you guys are interested).

@jreback jreback added the Performance (Memory or execution speed performance) label Jan 20, 2021
@jreback jreback added this to the 1.3 milestone Jan 20, 2021
Contributor

@jreback jreback left a comment

LGTM. cc @jbrockmendel in case you have any comments.

@lithomas1
Member Author

A small, slightly unrelated change I made in this PR is to the docstring of the min_periods argument for corr. It used to state that min_periods was only valid for Spearman and Pearson correlations; however, I believe it has always worked for Kendall correlation as well.
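
To make the min_periods point concrete, here is a minimal pure-Python sketch of the semantics as I understand them: a pair of columns with fewer than min_periods jointly non-NaN observations yields NaN, whichever correlation method is used. This is not the pandas code; `corr_with_min_periods` and `kendall_no_ties` are hypothetical helpers written only for this illustration.

```python
import math


def kendall_no_ties(xs, ys):
    """Concordant-minus-discordant Kendall tau; assumes no tied values."""
    n = len(xs)
    if n < 2:
        return math.nan
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def corr_with_min_periods(x, y, corr_func, min_periods=1):
    """Apply corr_func to rows where both values are present.

    Returns NaN when fewer than min_periods such rows exist.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    if not pairs or len(pairs) < min_periods:
        return math.nan
    xs, ys = zip(*pairs)
    return corr_func(xs, ys)


x = [1.0, 2.0, 3.0, math.nan, 5.0]
y = [2.0, 1.0, math.nan, 4.0, 6.0]

# Only three rows have both values present.
enough = corr_with_min_periods(x, y, kendall_no_ties, min_periods=3)
too_few = corr_with_min_periods(x, y, kendall_no_ties, min_periods=4)
print(enough, too_few)
```

With three usable rows, min_periods=3 gives a number and min_periods=4 gives NaN, independent of which correlation function is plugged in.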


@cython.boundscheck(False)
@cython.wraparound(False)
def nancorr_kendall(ndarray[float64_t, ndim=2] mat, Py_ssize_t minp=1) -> ndarray:
Member

We usually don't mix and match Cython-style and Python-style annotations. I know this is tough because the Python style won't allow ndarray[float64_t, ndim=2]. Not sure if there's a better alternative.

Member Author

Right now, the signature of this function is the same as the one for nancorr_spearman. However, I notice that other functions (including nancorr_pearson) leave out the -> ndarray part. Should I remove it from this function?

Contributor

Right, but we are currently not checking the .pyx files, right? So this is moot for now?

Member

Yeah, it's not a big deal.

@jbrockmendel
Member

A couple of nitpicks; generally looks good. I take it we have sufficient testing?

@lithomas1
Member Author

@jbrockmendel I think test coverage should be fine. There is a test that compares the results of our implementation against scipy's (see https://github.com/pandas-dev/pandas/blob/master/pandas/tests/frame/methods/test_cov_corr.py#L89).

@jreback jreback merged commit 57ccd2a into pandas-dev:master Jan 20, 2021
@jreback
Contributor

jreback commented Jan 20, 2021

Thanks @lithomas1, very nice!

nofarm3 pushed a commit to nofarm3/pandas that referenced this pull request Jan 21, 2021
@lithomas1 lithomas1 deleted the perf-cythonize-kendall branch January 23, 2021 20:26
zrait added a commit to zrait/pandas that referenced this pull request Sep 4, 2021
This reverts commit 57ccd2a.

The Kendall implementation failed to take into account ties
and was inconsistent with scipy's method
zrait added a commit to zrait/pandas that referenced this pull request Sep 5, 2021
This reverts commit 57ccd2a.

The Kendall implementation failed to take into account ties
and was inconsistent with scipy's method
zrait added a commit to zrait/pandas that referenced this pull request Sep 6, 2021
This reverts commit 57ccd2a.

The Kendall implementation failed to take into account ties
and was inconsistent with scipy's method
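
The revert message does not say exactly how the merged implementation treated ties, so the following is only an illustration of why tie handling matters, not a reconstruction of that code. It contrasts tau-a, which divides by the total number of pairs, with tau-b, the tie-corrected variant that scipy.stats.kendalltau computes by default. All function names here are hypothetical.

```python
import math
from collections import Counter


def _score(x, y):
    """Concordant pairs minus discordant pairs; tied pairs contribute 0."""
    total = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            total += (prod > 0) - (prod < 0)
    return total


def _tied_pairs(values):
    """Number of pairs that share the same value."""
    return sum(t * (t - 1) // 2 for t in Counter(values).values())


def kendall_tau_a(x, y):
    """No tie correction: normalise by all n*(n-1)/2 pairs."""
    n0 = len(x) * (len(x) - 1) // 2
    return _score(x, y) / n0


def kendall_tau_b(x, y):
    """Tie-corrected: remove tied pairs from each side of the denominator."""
    n0 = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n0 - _tied_pairs(x)) * (n0 - _tied_pairs(y)))
    return _score(x, y) / denom if denom else math.nan


x_ties = [1, 2, 2, 3]   # one tied pair in x
y_vals = [1, 2, 3, 4]
tau_a = kendall_tau_a(x_ties, y_vals)
tau_b = kendall_tau_b(x_ties, y_vals)
print(tau_a, tau_b)     # the two variants disagree once ties are present
```

Without ties the two variants coincide, which is why a comparison against scipy on tie-free data would not surface the discrepancy.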
Labels
Performance Memory or execution speed performance
Development

Successfully merging this pull request may close these issues.

DataFrame.corr(method="kendall") calculation is slow
3 participants