WIP/API: add magic 'X' for selection #14209
I'm neutral to negative on this type of solution until we do a more thorough analysis of what kind of deferred expression API we might want in pandas. I feel like it might want to wait for pandas 2.0 to have time to incubate and see some hardening through use.
Current coverage is 85.21% (diff: 72.72%)
@@             master   #14209   diff @@
==========================================
  Files         140      141     +1
  Lines       50563    50684   +121
  Methods         0        0
  Messages        0        0
  Branches        0        0
==========================================
+ Hits        43103    43191    +88
- Misses       7460     7493    +33
  Partials        0        0
That's reasonable, and I'm not sure myself it should be in pandas. That said, a couple points. First, if this were included, I'd definitely mark it opt-in, experimental, etc. Second, I don't see this as a delayed API solution, just a smoothover for a particular, existing use. The deferred pieces already exist in pandas, this just provides an arguably nicer way to express it. e.g., you can already do:
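For readers unfamiliar with the existing behavior, here is a minimal sketch of passing plain callables to `[]` and `assign`. It is an illustration only, not the snippet from the original comment; the frame `df` and the column names `a`, `b`, `c` are made up for the example.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# A callable passed to [] is invoked with the frame itself,
# and its boolean result is used as the row mask.
subset = df[lambda d: d["a"] > 1]

# A callable passed to assign is likewise invoked with the frame,
# and its result becomes the new column.
with_sum = df.assign(c=lambda d: d["a"] + d["b"])

print(subset)
print(with_sum)
```

Both calls defer the column lookup until the method runs, which is the "deferred piece" a placeholder object would wrap in nicer syntax.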
Where the
What's the intended relationship between
Technically this is a bit more flexible because it could handle column names that aren't valid Python names (e.g.
I'm going to close this for now - I still do think something like the
git diff upstream/master | flake8 --diff
This is very WIP, but wanted to put it up and show the general direction. This adds essentially a modified version of pandas_ply that produces plain callables that can be passed to the existing [] / assign methods. Short demo below.

One thing that's tricky is figuring out when an expression is "complete." pandas_ply and dplython don't have to do this because they use a special method to instantiate the selection, but I'd prefer not to do this if possible, so this doesn't touch any pandas internals. There's one example below (X.c.str.upper()) that shows where the current heuristic is failing.

cc @shoyer, @jreback, @joshuahhh @dodger487, welcome any thoughts
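To make the "when is an expression complete?" problem concrete, here is a rough, dependency-free sketch of a placeholder object that records operations and replays them later. It is not the code in this PR and not how pandas_ply or dplython work internally. The names `Deferred`, `Frame`, `_resolve` and `_binary` are hypothetical, `Frame` is only a stand-in for a DataFrame, and the evaluate-versus-record rule in `__call__` is an assumed heuristic chosen for illustration.

```python
import operator
from types import SimpleNamespace


class Frame(SimpleNamespace):
    """Hypothetical stand-in for a DataFrame so the sketch runs without pandas."""


def _resolve(value, root):
    # Replay nested placeholders against the same root; pass constants through.
    return value._replay(root) if isinstance(value, Deferred) else value


def _binary(op):
    # Build an operator method that records `op` instead of applying it.
    def method(self, other):
        return Deferred(lambda root: op(self._replay(root), _resolve(other, root)))
    return method


class Deferred:
    """Records attribute access, indexing, calls and operators for later replay."""

    def __init__(self, replay=lambda root: root):
        self._replay = replay

    def __getattr__(self, name):
        if name.startswith("__"):  # leave dunder lookups (copy, pickle, ...) alone
            raise AttributeError(name)
        return Deferred(lambda root: getattr(self._replay(root), name))

    def __getitem__(self, key):
        return Deferred(lambda root: self._replay(root)[key])

    __add__ = _binary(operator.add)
    __mul__ = _binary(operator.mul)
    __gt__ = _binary(operator.gt)

    def __call__(self, *args, **kwargs):
        # Assumed heuristic: a lone positional Frame argument means "evaluate now";
        # anything else is recorded as a method call. A recorded method that
        # legitimately takes a single Frame would be misread as evaluation.
        if len(args) == 1 and not kwargs and isinstance(args[0], Frame):
            return self._replay(args[0])
        return Deferred(lambda root: self._replay(root)(*args, **kwargs))


X = Deferred()

frame = Frame(a=3, b=4, c="x")
total = (X.a + X.b)(frame)      # replays frame.a + frame.b
is_big = (X.a > 1)(frame)       # replays frame.a > 1
shouted = X.c.upper()(frame)    # the first () is recorded, the second evaluates
```

Because every intermediate object is itself callable, the same `()` syntax has to serve both as "record a method call" and as "evaluate against this frame", which is why some rule for telling the two apart is needed unless evaluation goes through a dedicated method.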