Feature Request: skipcols in .read_csv #10882
Comments
So read_csv() is defined by calling _make_parser_function() which calls _read(). Any instructions would be appreciated. It's a bit confusing to me. @jreback |
parser is a bit complicated. see how |
It looks like the code related to |
Is there a spot in the code where we know all the columns before starting to parse the rows? If so you can assign |
Yeah I did something similar but stopped due to other code related to
|
Any progress on this addition? Thanks. |
ping @jreback |
if you submit a PR there will be progress |
many thanks |
In a similar vein, is there a way to read in a subset of rows? In other words, is there a counterpart to the
df = pd.read_csv("bigdata.csv")
df
# Output: Millions of rows
selection = [i for i in range(0, 1000000) if i % 2 == 0]
subset = pd.read_csv("bigdata.csv", use_rows=selection) # skip all rows except those listed
subset
# Output: only even rows for the first million |
@pylang : We now accept |
sure, maybe an example of doing that in io.rst would be helpful? |
@gfyoung I'm not sure what you have in mind. I am interested in selecting rows. An example would be helpful, thank you. |
@pylang :
>>> from io import StringIO
>>> from pandas import read_csv
>>> data = 'a,b,c\n1,2,3\n2,3,4'
>>> read_csv(StringIO(data), skiprows=lambda x: x % 2 == 1, engine='python')
   a  b  c
0  2  3  4
where |
@jreback : There are examples in the docs to illustrate |
yes that's what i mean, to show using a callable to skipcols |
@gfyoung I think your example for |
Illustrate how we can use the "usecols" argument to skip particular columns. Closes pandas-devgh-10882.
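For readers landing here, a minimal sketch of the approach that commit message describes: passing a callable to usecols so that everything except a chosen set of columns is read. The sample data and the `skip` set are made up for illustration, and this assumes a pandas version in which usecols accepts a callable.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; a real file path could be passed instead.
data = "a,b,c,d\n1,2,3,4\n5,6,7,8"

# Columns to exclude (illustrative names, not from the issue).
skip = {"b", "d"}

# usecols is called with each column name; returning True keeps the column.
df = pd.read_csv(StringIO(data), usecols=lambda name: name not in skip)

print(df.columns.tolist())
```

Because the callable only tests membership, a name in `skip` that is missing from a given file is simply ignored, which is the case the original request was worried about when files do not all share the same columns.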
Title is self-explanatory. xref #10882. Author: gfyoung <[email protected]> Closes #15059 from gfyoung/skiprows-callable and squashes the following commits: d15e3a3 [gfyoung] ENH: Accept callable for skiprows
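As a rough illustration of how a callable skiprows can stand in for the row-selection question asked earlier in the thread, the sketch below keeps only a chosen set of lines. The data and the `keep` set are invented for the example; the indices are 0-based line numbers in the file, so line 0 (the header) has to be kept explicitly.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: one header line followed by four data lines.
data = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12"

# File line numbers to keep (illustrative). Line 0 is the header.
keep = {1, 3}

# skiprows is called with each line number; returning True skips that line.
subset = pd.read_csv(
    StringIO(data),
    skiprows=lambda i: i != 0 and i not in keep,
)

print(subset)
```

Using a set for `keep` makes each membership test constant time, which matters when the callable is evaluated once per line of a large file.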
I'd like to read a set of csv files but exclude specific columns. read_csv currently has a usecols keyword, but it requires writing a list of all the columns present. This is a bit tedious and, more importantly, not all files have the same columns, so usecols would not work in general cases, whereas a complementary keyword would work. Can a skipcols keyword be added to 0.17 that accepts a list of column names and reads all but those columns into a DataFrame? Thanks.
xref #4749
xref #8985
xref #6710