Add row filtering operator #5900

elyase · 2014-01-10T17:07:04Z

This would allow chaining operations like:

(pd.read_csv('imdb.txt')
   .sort(columns='year')
   .filter(lambda x: x['year'] > 1990)   # <--- this is missing in pandas
   .to_csv('filtered.csv'))

For current alternatives see:

http://stackoverflow.com/questions/11869910/pandas-filter-rows-of-dataframe-with-operator-chaining

jreback · 2014-01-10T17:10:07Z

Does this not work?

df = pd.read_csv('imdb.txt').sort(columns='year')
df[df['year']>1990].to_csv('filtered.csv')

elyase · 2014-01-10T17:12:17Z

Sure, that works, but creating the unnecessary intermediate variable df interrupts the functional flow that is so nice to have in pandas. Is there an argument against this addition that I'm not seeing?

jreback · 2014-01-10T17:21:54Z

There was a whole discussion about this in #2460, IIRC.

The problem with using the filter function is that it filters an index (and is not what you are doing).
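To illustrate the point about the existing filter method, here is a minimal sketch, assuming a reasonably recent pandas and made-up sample data (the frame and its labels are hypothetical). DataFrame.filter selects by row or column labels, not by the values stored in the rows:

```python
import pandas as pd

# Hypothetical sample data standing in for imdb.txt
df = pd.DataFrame(
    {"title": ["A", "B"], "year": [1985, 1995]},
    index=["r1", "r2"],
)

# filter() matches labels: here it keeps the column named "year"
cols = df.filter(items=["year"])

# With axis=0 it matches index labels, keeping the row labelled "r2"
rows = df.filter(like="r2", axis=0)

print(cols)
print(rows)
```

Neither call looks at the values in the year column, which is why reusing the name filter for value-based row selection would be confusing.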

However, we could potentially do something like this:

pd.read_csv('imdb.txt')
  .sort(columns='year')
  .[lambda x: x['year']>1990]
  .to_csv('filtered.csv')

or

(pd.read_csv('imdb.txt')
   .sort(columns='year')
   .loc[lambda x: x['year'] > 1990]
   .to_csv('filtered.csv'))

Or we could make filter's first argument accept a callable and then use the axis keyword to modulate the resulting selector.

So making __getitem__ and the indexers (iloc/loc/ix) accept a callable that returns a boolean indexer is not too hard.
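A minimal sketch of the callable-indexer idea, assuming a pandas version in which .loc accepts a callable (later releases do; it was not available when this comment was written). The sample data is made up, and sort_values is used because .sort(columns=...) no longer exists in current pandas:

```python
import pandas as pd

# Hypothetical sample data standing in for imdb.txt
df = pd.DataFrame({"title": ["A", "B", "C"], "year": [2005, 1985, 1995]})

result = (
    df.sort_values("year")
      # The callable receives the sorted frame and returns a boolean mask
      .loc[lambda x: x["year"] > 1990]
)

print(result)
```

The callable is evaluated against the frame at that point in the chain, so no intermediate variable is needed.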

cpcloud · 2014-01-12T23:06:38Z

Couldn't you use query as well? IMO lambdas in loc et al. are a bit of feature creep.

cpcloud · 2014-01-12T23:07:38Z

Hm, never mind, you would need the local.

naught101 · 2015-09-16T03:09:14Z

Might it be possible with patsy to make a filter method that uses a formula string?

(pd.read_csv('imdb.txt')
   .sort(columns='year')
   .filter('year > 1990')
   .to_csv('filtered.csv'))

shoyer · 2015-09-16T03:52:27Z

@naught101 Using strings to filter dataframes is already possible. The method is query, e.g.,
pd.DataFrame({'x': [1, 2, 3, 4, 5]}).query('x > 3')
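As a slightly fuller sketch of the same approach, query also fits into a method chain and can refer to a local Python variable with the @ prefix. The variable name threshold and the sample data below are made up for illustration:

```python
import pandas as pd

threshold = 3  # hypothetical local variable referenced inside the query string

result = (
    pd.DataFrame({"x": [1, 2, 3, 4, 5]})
      # "@threshold" resolves to the local Python variable above
      .query("x > @threshold")
)

print(result)
```

This keeps the chain intact without a lambda, at the cost of writing the condition as a string.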

jreback · 2015-09-16T21:17:47Z

I suppose .query could take a lambda to provide this in-line type of chaining

jreback · 2016-01-31T17:58:48Z

dupe of #11485 (which has more examples)

jreback modified the milestones: 0.15.0, 0.14.0 Feb 15, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 1, 2015

jreback added Difficulty Intermediate labels Jan 31, 2016

jreback modified the milestones: 0.18.0, Next Major Release Jan 31, 2016

jreback closed this as completed Jan 31, 2016
