read_csv c engine accepts binary mode data and python engine rejects it #23779
Comments
cc @gfyoung
@rgeens : Thanks for opening this! Sorry that this got lost in the pile of issues that we have 😞

I strongly believe that this discrepancy is symptomatic of limitations in Python's native csv library. To illustrate my point:

    import csv

    with open('test.csv', 'w') as f:
        f.write('1,2,3\n4,5,6')

    r = csv.reader(open('test.csv', 'rb'))
    print(next(r))

On Python 3, this raises an error instead of printing the first row, because csv.reader expects text and is handed bytes.

This is beyond our control, so the best we can do is to test the behavior for the C engine (if such a test doesn't exist already). You're more than welcome to do that!
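For readers who want to see the limitation in isolation, here is a minimal sketch that uses only the standard library. It swaps the on-disk `test.csv` for in-memory buffers (`io.StringIO` and `io.BytesIO`), which is an assumption made to keep the example self-contained; the behavior of `csv.reader` should be the same as with files opened in `'r'` and `'rb'` mode on Python 3.

```python
import csv
import io

data = b"1,2,3\n4,5,6"

# Text-mode buffer: iteration yields str lines, which csv.reader parses.
text_rows = list(csv.reader(io.StringIO(data.decode("ascii"))))

# Binary-mode buffer: iteration yields bytes, which csv.reader rejects.
try:
    next(csv.reader(io.BytesIO(data)))
    binary_error = None
except csv.Error as exc:
    binary_error = exc

print(text_rows)
print(repr(binary_error))
```

The exact wording of the error message differs between Python versions, so the sketch only relies on the exception type.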
Python's native CSV library doesn't accept such files, but we do for the C parser. Closes gh-23779.
I would like to have this reopened, because it's not actually beyond pandas' control to fix this cleanly, and because right now the read_csv function doc specifies "file-like object", not "file-like object opened in ascii mode". Just conditionally wrap the object in an io.TextIOWrapper, since you know that csv is by definition an ascii format:

    import csv
    import io

    def __ascii_wrap(potentially_binary_buffer):
        try:
            return io.TextIOWrapper(potentially_binary_buffer)
        except Exception:
            return potentially_binary_buffer

    with open('test.csv', 'w') as f:
        f.write('1,2,3\n4,5,6')

    r = csv.reader(__ascii_wrap(open('test.csv', 'rb')))
    print(next(r))
@fiendish : That's a good point. You're more than welcome to open a PR to implement this.
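A slightly more explicit variant of the wrapping idea is sketched below. It is not the change that was merged into pandas; `ensure_text` is a hypothetical helper name, and the `utf-8` default encoding is an assumption. Instead of relying on `try`/`except`, it checks the buffer's type, because constructing an `io.TextIOWrapper` around a buffer that is already in text mode may not fail until the first read.

```python
import csv
import io


def ensure_text(buffer, encoding="utf-8"):
    """Return a text-mode view of `buffer` (hypothetical helper, not pandas API)."""
    if isinstance(buffer, io.TextIOBase):
        # Already text mode: nothing to do.
        return buffer
    if isinstance(buffer, (io.RawIOBase, io.BufferedIOBase)):
        # Binary mode: decode on the fly. newline="" leaves line endings
        # untranslated, as the csv module documentation recommends.
        return io.TextIOWrapper(buffer, encoding=encoding, newline="")
    # Unknown object: pass it through unchanged and let the caller decide.
    return buffer


binary_rows = list(csv.reader(ensure_text(io.BytesIO(b"1,2,3\n4,5,6"))))
text_rows = list(csv.reader(ensure_text(io.StringIO("1,2,3\n4,5,6"))))
print(binary_rows)
print(text_rows)
```

One design caveat: an `io.TextIOWrapper` closes the underlying buffer when the wrapper itself is closed or garbage collected, so code that must leave the caller's buffer open would need to call `detach()` on the wrapper once reading is finished.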
* BUG: Help python csv engine read binary buffers The file buffer given to read_csv could have been opened in binary mode, but the python csv reader errors on binary buffers. closes #23779
Code Sample
Problem description
The second read_csv call (using the C engine and a file opened in binary mode) reads the CSV correctly. The fourth read_csv call (using the Python engine and a file opened in binary mode) throws an exception stating that the file needs to be opened in text mode.
Perhaps this is intended behavior, but I found the difference between the engines surprising, and I was also surprised that binary mode was accepted at all.
Expected Output
Either the C engine rejecting binary mode files or the Python engine accepting them.
Output of pd.show_versions()
pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None