Please see http://nbviewer.ipython.org/6443825 . Below is an unformatted version.
pandas' read_csv is supposed to handle short rows by filling them with NaN. It does so most of the time. However, if the first row is short and you specify header=None, then you get an error. This happens whether or not you specify column names.
In [20]:
import StringIO
from pandas import DataFrame, read_csv
This works:
In [21]:
mydata='1,2,3\n1,2\n1,2\n'
read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])
Out[21]:
   one  two  three
0    1    2      3
1    1    2    NaN
2    1    2    NaN
But this doesn't:
In [22]:
mydata='1,2\n1,2,3\n4,5,6\n'
read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])
CParserError Traceback (most recent call last)
in ()
1 mydata='1,2\n1,2,3\n4,5,6\n'
----> 2 read_csv(StringIO.StringIO(mydata),header=None,names=['one','two','three'])
/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
397 buffer_lines=buffer_lines)
398
--> 399 return _read(filepath_or_buffer, kwds)
400
401 parser_f.__name__ = name
/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
206
207 # Create the parser.
--> 208 parser = TextFileReader(filepath_or_buffer, **kwds)
209
210 if nrows is not None:
/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
505 self.options['has_index_names'] = kwds['has_index_names']
506
--> 507 self._make_engine(self.engine)
508
509 def _get_options_with_defaults(self, engine):
/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
607 def _make_engine(self, engine='c'):
608 if engine == 'c':
--> 609 self._engine = CParserWrapper(self.f, **self.options)
610 else:
611 if engine == 'python':
/usr/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
888 # #2442
889 kwds['allow_leading_cols'] = self.index_col is not False
--> 890 self._reader = _parser.TextReader(src, **kwds)
891
892 # XXX
/usr/lib/python2.7/dist-packages/pandas/_parser.so in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3946)()
/usr/lib/python2.7/dist-packages/pandas/_parser.so in pandas._parser.TextReader._get_header (pandas/src/parser.c:5628)()
CParserError: Column names have 3 fields, data has 2 fields
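A possible workaround, sketched here and not part of the original report: pad every row to the expected width before handing the text to read_csv, so that a short first row no longer determines the field count. The helper name pad_short_rows and its width and fill parameters are made up for illustration. The sketch uses only the standard library and assumes Python 3 (io.StringIO); on Python 2.7, as in the session above, StringIO.StringIO would take its place.

```python
import csv
import io


def pad_short_rows(text, width, fill=""):
    """Return CSV text in which every row has at least `width` fields.

    Hypothetical helper: rows shorter than `width` get `fill` appended;
    rows that are already long enough are written back unchanged.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # A negative count yields an empty list, so long rows are untouched.
        writer.writerow(row + [fill] * (width - len(row)))
    return out.getvalue()


mydata = '1,2\n1,2,3\n4,5,6\n'
padded = pad_short_rows(mydata, 3)
```

The padded text could then be passed to read_csv with header=None and the same three names. Empty fields are normally read as NaN, but this has not been checked against the pandas version in the traceback.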