Commit 8ef5aa5

BUG: Improve error message for skipfooter malformed rows in Python engine
Python's native csv library does not respect the skipfooter parameter, so it still raises an error if one of the rows to be skipped is malformed. Closes gh-13879.
1 parent 58731c4 commit 8ef5aa5
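
The failure mode described in the commit message can be reproduced with the standard library alone. The sketch below is not pandas code: `read_rows` is a hypothetical helper, and `strict=True` is an assumption made here so that the unterminated quote in the last row is reported as an error rather than silently accepted.

```python
import csv
import io

# Same shape as the data in the new test: the final row opens a quote
# that is never closed.
DATA = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'


def read_rows(text):
    """Hypothetical helper: collect rows until the csv reader fails.

    Returns (rows_read_so_far, error_message_or_None).
    """
    # strict=True is an assumption of this sketch; it makes the reader
    # raise csv.Error on malformed input.
    reader = csv.reader(io.StringIO(text), strict=True)
    rows = []
    try:
        for row in reader:
            rows.append(row)
    except csv.Error as exc:
        return rows, str(exc)
    return rows, None


rows, error = read_rows(DATA)
print(rows)   # the two well-formed rows
print(error)  # the reader's complaint about the last row
```

The reader has no notion of a footer, so the error fires while the last row is being read. A caller that planned to drop that row afterwards never gets the chance, which is why the commit settles for a better error message.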

3 files changed, +30 -7 lines changed

doc/source/whatsnew/v0.19.2.txt

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ Bug Fixes
 - Allow ``nanoseconds`` in ``Timestamp.replace`` as a kwarg (:issue:`14621`)
 - Bug in ``pd.read_csv`` where reading files fails, if the number of headers is equal to the number of lines in the file (:issue:`14515`)
 - Bug in ``pd.read_csv`` for the Python engine in which an unhelpful error message was being raised when multi-char delimiters were not being respected with quotes (:issue:`14582`)
+- Bug in ``pd.read_csv`` for the Python engine in which an unhelpful error message was being raised when ``skipfooter`` was not being respected by Python's CSV library (:issue:`13879`)
 
 
 
pandas/io/parsers.py

Lines changed: 14 additions & 7 deletions
@@ -2327,14 +2327,21 @@ def _next_line(self):
                 try:
                     orig_line = next(self.data)
                 except csv.Error as e:
+                    msg = str(e)
+
                     if 'NULL byte' in str(e):
-                        raise csv.Error(
-                            'NULL byte detected. This byte '
-                            'cannot be processed in Python\'s '
-                            'native csv library at the moment, '
-                            'so please pass in engine=\'c\' instead.')
-                    else:
-                        raise
+                        msg = ('NULL byte detected. This byte '
+                               'cannot be processed in Python\'s '
+                               'native csv library at the moment, '
+                               'so please pass in engine=\'c\' instead')
+
+                    if self.skipfooter > 0:
+                        reason = ('Error could possibly be due to '
+                                  'skipfooter being ignored in Python\'s '
+                                  'native csv library.')
+                        msg += '. ' + reason
+
+                    raise csv.Error(msg)
                 line = self._check_comments([orig_line])[0]
                 self.pos += 1
                 if (not self.skip_blank_lines and
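
To make the new control flow easier to follow outside the parser class, here is a minimal sketch of the message-building logic as a free function. The name `build_csv_error_message` and its signature are hypothetical; only the message strings are taken from the diff above.

```python
def build_csv_error_message(original, skipfooter=0):
    """Hypothetical helper mirroring the message logic in the diff.

    `original` is the text of the csv.Error that was caught.
    """
    msg = original

    # A NULL byte gets a dedicated message pointing at the C engine.
    if 'NULL byte' in original:
        msg = ("NULL byte detected. This byte "
               "cannot be processed in Python's "
               "native csv library at the moment, "
               "so please pass in engine='c' instead")

    # With a footer requested, hint that the failing row may be one
    # the caller expected to be skipped.
    if skipfooter > 0:
        reason = ("Error could possibly be due to "
                  "skipfooter being ignored in Python's "
                  "native csv library.")
        msg += '. ' + reason

    return msg


print(build_csv_error_message('unexpected end of data', skipfooter=1))
```

One consequence of this structure is that both hints can apply at once: a NULL byte error raised while `skipfooter` is positive carries the NULL byte message followed by the skipfooter hint. Errors that previously propagated unchanged through the bare `raise` are now re-raised as a new `csv.Error` with the same text.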

pandas/io/tests/parser/python_parser_only.py

Lines changed: 15 additions & 0 deletions
@@ -221,3 +221,18 @@ def test_multi_char_sep_quotes(self):
         with tm.assertRaisesRegexp(ValueError, msg):
             self.read_csv(StringIO(data), sep=',,',
                           quoting=csv.QUOTE_NONE)
+
+    def test_skipfooter_bad_row(self):
+        # see gh-13879
+
+        data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'
+        msg = 'skipfooter being ignored'
+
+        with tm.assertRaisesRegexp(csv.Error, msg):
+            self.read_csv(StringIO(data), skipfooter=1)
+
+        # We expect no match, so there should be an assertion
+        # error out of the inner context manager.
+        with tm.assertRaises(AssertionError):
+            with tm.assertRaisesRegexp(csv.Error, msg):
+                self.read_csv(StringIO(data))
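
The nested context managers in the new test assert a negative: without `skipfooter`, an error is still raised, but its message must not contain the hint. The sketch below shows the same pattern using only the standard library's `unittest`. Here `parse` is a hypothetical stand-in for `read_csv` with the Python engine, and `strict=True` is an assumption of the sketch.

```python
import csv
import io
import unittest

DATA = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'
HINT = 'skipfooter being ignored'


def parse(text, skipfooter=0):
    """Hypothetical stand-in for read_csv with the Python engine."""
    try:
        rows = list(csv.reader(io.StringIO(text), strict=True))
    except csv.Error as exc:
        msg = str(exc)
        if skipfooter > 0:
            msg += '. Error could possibly be due to ' + HINT
        raise csv.Error(msg) from exc
    return rows[:len(rows) - skipfooter] if skipfooter else rows


class SkipfooterBadRow(unittest.TestCase):
    def test_hint_only_with_skipfooter(self):
        # With skipfooter set, the hint must appear in the message.
        with self.assertRaisesRegex(csv.Error, HINT):
            parse(DATA, skipfooter=1)

        # Without it, csv.Error is still raised but the regex does not
        # match, so the inner manager fails with AssertionError, which
        # the outer manager expects.
        with self.assertRaises(AssertionError):
            with self.assertRaisesRegex(csv.Error, HINT):
                parse(DATA)


def run_suite():
    """Run the test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(SkipfooterBadRow)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` executes the case. The outer `assertRaises(AssertionError)` would also pass if the inner block raised no exception at all, so the pattern confirms that the hint is absent, not that an error of the expected type occurred.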
