BUG: if first row is short, read_csv raises exception instead of filling with NaN #4749

Closed
foobarbecue opened this issue Sep 4, 2013 · 1 comment
@foobarbecue
Please see http://nbviewer.ipython.org/6443825 . Below is an unformatted version.

pandas' read_csv is supposed to handle short rows by filling them with NaN. It does so most of the time. However, if the first row is short and you specify header=None, then you get an error. This happens whether or not you specify column names.
In [20]:

import StringIO  # Python 2 module, needed for StringIO.StringIO below
from pandas import DataFrame, read_csv

This works:
In [21]:

mydata='1,2,3\n1,2\n1,2\n'
read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])

Out[21]:
   one  two  three
0    1    2      3
1    1    2    NaN
2    1    2    NaN

But this doesn't:
In [22]:

mydata='1,2\n1,2,3\n4,5,6\n'
read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])


CParserError Traceback (most recent call last)
in ()
1 mydata='1,2\n1,2,3\n4,5,6\n'
----> 2 read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
397 buffer_lines=buffer_lines)
398
--> 399 return _read(filepath_or_buffer, kwds)
400
401 parser_f.__name__ = name

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
206
207 # Create the parser.
--> 208 parser = TextFileReader(filepath_or_buffer, **kwds)
209
210 if nrows is not None:

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
505 self.options['has_index_names'] = kwds['has_index_names']
506
--> 507 self._make_engine(self.engine)
508
509 def _get_options_with_defaults(self, engine):

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
607 def _make_engine(self, engine='c'):
608 if engine == 'c':
--> 609 self._engine = CParserWrapper(self.f, **self.options)
610 else:
611 if engine == 'python':

/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
888 # #2442
889 kwds['allow_leading_cols'] = self.index_col is not False
--> 890 self._reader = _parser.TextReader(src, **kwds)
891
892 # XXX

/usr/lib/python2.7/dist-packages/pandas/_parser.so in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3946)()

/usr/lib/python2.7/dist-packages/pandas/_parser.so in pandas._parser.TextReader._get_header (pandas/src/parser.c:5628)()

CParserError: Column names have 3 fields, data has 2 fields
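
On a version that still raises this error, one possible workaround is to pad short rows before handing the text to read_csv. The sketch below uses only the standard library. The helper name `pad_short_rows` and its parameters are hypothetical and do not come from pandas or from this thread.

```python
import csv
from io import StringIO


def pad_short_rows(text, width, fill=""):
    """Return CSV text in which every row has at least `width` fields.

    Hypothetical helper for illustration; rows that are already long
    enough are written back unchanged.
    """
    out = StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(StringIO(text)):
        # A negative multiplier yields an empty list, so long rows are untouched.
        writer.writerow(row + [fill] * (width - len(row)))
    return out.getvalue()


# Same data as the failing case: the first row has only two fields.
padded = pad_short_rows("1,2\n1,2,3\n4,5,6\n", width=3)
print(padded)
```

The padded text can then be wrapped in a StringIO and passed to read_csv as before. With the default NA handling, the empty trailing field should be read as NaN.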

@foobarbecue
Author

Never mind. This was fixed at some point between 0.10.1-1ubuntu1 and f373864.
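
For anyone checking whether their installed pandas includes the fix, here is a minimal Python 3 sketch of the failing case. It assumes `io.StringIO` in place of the Python 2 `StringIO` module used above, and it catches a broad `Exception` because the exact error class name depends on the pandas version.

```python
from io import StringIO

import pandas as pd

# Same data as the failing case above: the first row has only two fields.
mydata = "1,2\n1,2,3\n4,5,6\n"

try:
    df = pd.read_csv(StringIO(mydata), header=None, names=["one", "two", "three"])
except Exception as exc:  # affected versions raise a parser error here
    df = None
    print("still affected:", exc)
else:
    # On a fixed version the short first row is padded with NaN.
    print(df)
```

If the fix is present, the frame should have three rows and the `three` column of the first row should be NaN.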
