ENH: Faster merge_asof() performs a single pass when joining tables (#13902) #13903
lint failed with
I will fix this in a subsequent update.
Current coverage is 85.29% (diff: 97.72%)
@@             master   #13903   diff @@
==========================================
  Files           139      139
  Lines         50143    50154    +11
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits          42776    42780     +4
- Misses         7367     7374     +7
  Partials          0        0
First attempt at Tempita. More work is still needed; this is not ready to merge yet.
While Travis is running, here are the latest benchmarks:
This can handle
it's fine to limit this for now
@jreback Alright, I've made it explicit now, both in the documentation and in the validation.
# choose appropriate function by type
on_dtype = left_values.dtype
by_dtype = left_by_values.dtype
if is_integer_dtype(by_dtype):
need a more generic way of getting these
see core/algorithms.py, basically put tuple of the args in a dict then get it
e.g.
_asof_funcs = { ('int64', 'double') : .....,
('int64', 'int64'): ....}
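The lookup suggested above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical placeholders (the actual specialized routines are not shown in this thread), and treating the tuple as (on dtype, by dtype) is an assumption.

```python
# Hypothetical stand-ins for the type-specialized join routines.
# The real generated functions are not shown in this conversation.
def join_on_int64_by_int64():
    return "on=int64, by=int64"


def join_on_int64_by_double():
    return "on=int64, by=double"


# Dispatch table keyed on a tuple of dtype names.
# Assumption: the tuple order is (on dtype, by dtype).
_asof_funcs = {
    ("int64", "int64"): join_on_int64_by_int64,
    ("int64", "double"): join_on_int64_by_double,
}


def lookup_asof_func(on_dtype_name, by_dtype_name):
    """Return the specialization registered for this dtype pair."""
    key = (on_dtype_name, by_dtype_name)
    try:
        return _asof_funcs[key]
    except KeyError:
        # Fail loudly instead of silently falling back to a wrong routine.
        raise TypeError("no asof join specialization for %r" % (key,))
```

Compared with an if/elif chain over `is_integer_dtype(...)` checks, supporting a new type combination means adding one dictionary entry.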
just merged #13925 pls rebase and can merge this
…13902) Uses Tempita for different specializations of "by" and "on" types. These parameters can only take a single label now.
@jreback Everything has been rebased and pushed. Thanks!
thanks @chrisaycock very nice! 2 things for future:
This version passes existing regression tests but is ultimately wrong
because it requires the "by" column to be a single object. A proper version
would handle int (and possibly float) columns through type differentiation.
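For readers unfamiliar with the single-pass idea in the PR title, here is a minimal pure-Python sketch of a backward as-of join with a single "by" key. It is not the code from this PR: the function name, the tuple layout `(on, by, value)`, and the use of a plain dict to remember the latest right-hand row per group are all assumptions made for illustration.

```python
def asof_join_backward(left, right):
    """Single-pass backward as-of join (illustrative sketch only).

    `left` and `right` are lists of (on, by, value) tuples, each sorted
    by `on`. For every left row, return the value of the most recent
    right row with the same `by` key and right.on <= left.on, or None
    if there is no such row.
    """
    last_seen = {}   # by key -> value of the latest right row consumed
    right_pos = 0    # cursor into `right`; it only ever moves forward
    result = []
    for left_on, left_by, _ in left:
        # Consume every right row that is not later than this left row.
        while right_pos < len(right) and right[right_pos][0] <= left_on:
            _, right_by, right_value = right[right_pos]
            last_seen[right_by] = right_value
            right_pos += 1
        result.append(last_seen.get(left_by))
    return result


left = [(1, "a", "l1"), (5, "b", "l2"), (10, "a", "l3")]
right = [(1, "a", 10), (2, "b", 20), (3, "b", 30), (7, "a", 70)]
print(asof_join_backward(left, right))  # [10, 30, 70]
```

Because the right-hand cursor never moves backward, each table is scanned once. A dict accepts any hashable key, whereas the commit message above describes type-specific handling of the "by" column, which this sketch does not attempt.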