read_csv arguments: can we have skipcols and userows? #15799

Closed
pseth opened this issue Mar 24, 2017 · 4 comments
Labels
API Design Duplicate Report Duplicate issue or pull request IO CSV read_csv, to_csv

Comments

@pseth

pseth commented Mar 24, 2017

Is there a reason why read_csv has a usecols and skiprows as arguments, but not skipcols and userows? Is this to avoid parameter checks or something more fundamental than that?

It would be nice to have all four options to avoid clunky inversions of the type usecols = columns.remove(unwanted_col).

@jreback
Contributor

jreback commented Mar 24, 2017

this is essentially a duplicate of the now closed: #10882

usecols accepts a callable, to allow arbitrary evaluation of which columns to use.

in a similar vein, skiprows accepts a callable as well.
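A minimal sketch of what each callable receives, assuming pandas 0.20.0 or later and a small in-memory CSV made up for illustration: the `usecols` callable is evaluated against column names, while the `skiprows` callable is evaluated against 0-based line indices of the file (the header line counts as index 0).

```python
import io

import pandas as pd

# Illustrative data only; not taken from the issue.
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    # Called once per column name; return True to keep the column.
    usecols=lambda name: name != "b",
    # Called once per line index; return True to skip that line.
    # Index 0 is the header line, so index 2 is the second data row.
    skiprows=lambda i: i == 2,
)

print(df)
```

Here column `b` and the line `4,5,6` are dropped; everything else is kept.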

The defaults make the most sense here, e.g. generally you want to keep columns (out of a larger set) and skip (a small subset of rows).

I am not against the counterparts, but each one is another keyword and added complexity.

@jreback jreback added API Design Duplicate Report Duplicate issue or pull request IO CSV read_csv, to_csv labels Mar 24, 2017
@gfyoung
Member

gfyoung commented Mar 27, 2017

Yep, I'm in agreement with @jreback here. Especially since we can accept callables for both inputs you can emulate skipcols and userows as follows:

skipcols = [...]
userows = [...]
read_csv(..., usecols=lambda x: x not in skipcols,
              skiprows=lambda x: x not in userows)

I think this should resolve your concern about "clunkiness" as you put it, so if there are no other concerns, I think this is safe to close.
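For a runnable version of the snippet above, here is a hedged sketch. The helper name `read_csv_inverted` is hypothetical and not part of pandas, and it assumes that the file has a single header line and that `userows` means 0-based positions of data rows. Because the `skiprows` callable sees file line indices, the header line (index 0) has to be kept explicitly, otherwise the first kept data row would be read as the header.

```python
import io

import pandas as pd


def read_csv_inverted(source, skipcols, userows, **kwargs):
    """Hypothetical wrapper emulating skipcols/userows via callables."""
    drop_cols = set(skipcols)
    # Always keep line 0 (the header); data row r sits on file line r + 1.
    keep_lines = {0} | {r + 1 for r in userows}
    return pd.read_csv(
        source,
        usecols=lambda name: name not in drop_cols,
        skiprows=lambda line: line not in keep_lines,
        **kwargs,
    )


# Illustrative data only.
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"
df = read_csv_inverted(io.StringIO(csv_text), skipcols=["b"], userows=[0, 2])
print(df)
```

Converting the lists to sets keeps the membership checks cheap when the callables run once per column and once per line.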

@pseth
Author

pseth commented Mar 27, 2017

@gfyoung Ah, I did not realise that was possible; the online documentation for read_csv doesn't seem to be up to date. That is indeed a more elegant solution than adding possibly conflicting arguments.

@pseth pseth closed this as completed Mar 27, 2017
@jreback
Contributor

jreback commented Mar 27, 2017

This feature is in 0.20.0, which is not released yet; the docs are in the dev-docs: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.read_csv.html?highlight=read_csv#pandas.read_csv
