API: port the magic X from pandas_ply/dplython to pandas proper? #13133
Comments
There is also this: https://github.com/dodger487/dplython
@datnamer Thanks -- I had a feeling I was missing something! I updated my post to include discussion of dplython as well.
I have mixed thoughts. On the one hand, I agree that having to put But the I like using (Thanks for asking!)
To add onto @joshuahhh's comment calling On the other hand, I've found that I don't need to often apply functions to Overall, I agree there are some difficulties but I'm optimistic about Thanks for including me on the thread!
@dodger487 @joshuahhh thanks for sharing your thoughts! I think pandas supports method chaining enough that the inability to use arbitrary functions is OK. It occurs to me that dask.delayed contains yet another implementation of deferred evaluation that might be a useful reference.
Very intriguing. I've tested out dplython and pandas-ply based on this issue and they both look very interesting. It looks like both use X for their own functions, but it can't be used elsewhere; i.e.:
doesn't actually work with either implementation as it is. Your proposal sounds like it would allow its use there, and elsewhere; for example, I'm assuming instead of (please forgive the obviously highly contrived example):
I could instead do:
? While pandas is never going to have some of the sheer convenience of the R syntax for these types of things, this would bring it a lot closer, from what I can see.
Despite these limitations, I still think @jreback @jorisvandenbossche @TomAugspurger any opinions?
I think exposing
I hadn't seen this issue. I think I did something very similar to the X magic; see #18077.
Discussed on today's dev call and the consensus is we don't want to add to the API. Closing.
Many DataFrame methods (now including `__getitem__`) accept callables that take the DataFrame as input, e.g., `df[lambda x: x.sepal_length > 3]`.

However, this is annoyingly verbose. I recently suggested (#13040) enabling argument-free lambdas like `df[lambda: sepal_length > 3]`, but this isn't a viable solution (too much magic!) because it's impossible to implement with Python's standard scoping rules.

pandas-ply and dplython provide an alternative approach, based on a magic `X` operator, e.g.,

pandas-ply also introduces (injects onto `pandas.DataFrame`) two new DataFrame methods, `ply_select` and `ply_where`, that accept these symbolic expressions built from `X`. dplython takes a different approach, introducing its own dplyr-like API for chaining expressions instead of using method chaining. The pandas-ply approach is much closer to what makes sense for pandas proper, given that we already support method chaining.

I think we should consider introducing an object like `X` into pandas proper and supporting its use on all pandas methods that accept callables that take the DataFrame as input.

I don't think we need to port `ply_select` and `ply_where`, because support for expressions in `DataFrame.assign` and indexing is a good substitute.

So my proposed syntax (after `from pandas import X`) looks like the following:

Indexing is a little uglier than using the `ply_where` method, but otherwise this is a nice improvement.

Best of all, we don't need to do any special tricks to introduce new scopes -- we simply define `X.__getattr__` to look up attributes as columns in the DataFrame context. I expect we could even reuse the expression engines from pandas-ply or dplython directly, perhaps with a few modifications.

In my mind, this would mostly obviate the need for pandas-ply, though the alternate API provided by dplython would still be independently useful. In an ideal world, our `X` implementation in pandas would be something that could be reused by dplython.

cc @joshuahhh @dodger487
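To make the `X.__getattr__` idea concrete, here is a rough sketch of how a deferred-expression object could work. It is not the pandas-ply or dplython implementation, and nothing like it exists in pandas; the `Expr` class and its `_binary` helper are hypothetical names invented for illustration. The demo uses a plain `SimpleNamespace` as a stand-in for the DataFrame context so that it runs without pandas.

```python
import operator
from types import SimpleNamespace


class Expr:
    """Hypothetical deferred expression: calling it with a context evaluates it."""

    def __init__(self, func):
        self._func = func  # func(context) -> value

    def __call__(self, context):
        return self._func(context)

    def __getattr__(self, name):
        # Only reached for names not found normally. Refuse dunder probes
        # (copy, pickle, ...) so they are not mistaken for column lookups.
        if name.startswith("__"):
            raise AttributeError(name)
        # Defer the attribute lookup until a context is supplied.
        return Expr(lambda ctx: getattr(self(ctx), name))

    def _binary(op):
        # Build an operator method that returns a new deferred expression.
        def method(self, other):
            def evaluate(ctx):
                rhs = other(ctx) if isinstance(other, Expr) else other
                return op(self(ctx), rhs)
            return Expr(evaluate)
        return method

    __gt__ = _binary(operator.gt)
    __lt__ = _binary(operator.lt)
    __add__ = _binary(operator.add)
    __mul__ = _binary(operator.mul)
    __truediv__ = _binary(operator.truediv)


# The root expression simply returns the context itself.
X = Expr(lambda ctx: ctx)

# Stand-in for a DataFrame: any object exposing "columns" as attributes.
row = SimpleNamespace(sepal_length=5.0, sepal_width=2.0)

is_long = X.sepal_length > 3               # nothing is evaluated yet
ratio = X.sepal_length / X.sepal_width     # still deferred

print(is_long(row))
print(ratio(row))
```

Because each expression is itself a callable taking the context as its only argument, an object like this should, in principle, be usable anywhere pandas already accepts a callable, e.g. `df[X.sepal_length > 3]` or `df.assign(ratio=X.sepal_length / X.sepal_width)`. That part is an assumption and is not exercised by the sketch above.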