
bpo-40762: Fix writing bytes in csv #20371


Closed
wants to merge 4 commits into from

Conversation

@sidhant007 commented May 25, 2020

Changes how bytes objects are written by the csv module: instead of being written as their string representation, they are written as the bytes themselves.

Example:

    import csv

    # newline="" is what the csv docs recommend for files passed to csv.writer
    with open("abc.csv", "w", encoding="utf-8", newline="") as f:
        data = [b'A', b'A']
        w = csv.writer(f)
        w.writerow(data)

This code currently writes b'A',b'A' to abc.csv, i.e. the b-prefixed string representation of each bytes object instead of its contents, whereas the natural expectation is for the contents to be written (i.e. A,A) in UTF-8 or whatever encoding is specified in open().
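For reference, a minimal reproduction of the current behaviour that avoids the filesystem, using io.StringIO as an in-memory stand-in for abc.csv (an assumption for illustration). csv.writer converts non-string fields with str(), and str() of a bytes object is the b-prefixed literal:

```python
import csv
import io

# In-memory stand-in for abc.csv.
buf = io.StringIO()
csv.writer(buf).writerow([b'A', b'A'])

# csv.writer applies str() to non-string fields, so each bytes object
# is written as b'A' rather than as A.
print(repr(str(b'A')))       # "b'A'"
print(repr(buf.getvalue()))  # "b'A',b'A'\r\n"
```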

This unnatural behaviour has also been discussed in this Pandas issue.

https://bugs.python.org/issue40762

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@sidhant007

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@sidhant007 changed the title from "bpo:40762: Fix reading bytes in csv" to "bpo-40762: Fix reading bytes in csv" May 25, 2020
@sidhant007 force-pushed the csv-encoding-bytes-fix branch from 35cba89 to 4dc7ebe May 25, 2020 06:54
@remilapeyre (Contributor) left a comment

Hi @sidhant007, thanks for proposing this PR, but as noted on bpo, the csv module explicitly deals only with strings and numbers, and other types must be converted first. I don't think guessing the encoding of arbitrary bytes can be accepted.
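A minimal sketch of the convert-first approach described here: bytes fields are decoded explicitly, with an encoding the caller knows, before the row reaches csv.writer. The helper name decode_fields and its UTF-8 default are illustrative assumptions, not part of the csv module.

```python
import csv
import io

def decode_fields(row, encoding="utf-8"):
    # Hypothetical helper: decode bytes/bytearray fields with an encoding the
    # caller knows; leave str, numbers and other values untouched.
    return [v.decode(encoding) if isinstance(v, (bytes, bytearray)) else v
            for v in row]

buf = io.StringIO()
csv.writer(buf).writerow(decode_fields([b'A', b'A', 3]))
print(repr(buf.getvalue()))  # 'A,A,3\r\n'
```

Because the encoding is chosen by the caller, nothing is guessed: bytes that are not valid in that encoding raise UnicodeDecodeError instead of being silently mis-decoded.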

@sidhant007 force-pushed the csv-encoding-bytes-fix branch from a854fe1 to 21c28fd May 25, 2020 09:06
@sidhant007 (Author)

Hi @remilapeyre, I have updated the docs in this PR to state that csv should also allow writing bytes in text mode.

Also, this fix does not guess the encoding; rather, it uses the encoding passed to open(..., ..., encoding='...'), similar to how csv.reader relies on that encoding to decode a file (mentioned here).
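A rough user-level approximation of the idea being argued for here, not the PR's actual patch: bytes fields are decoded with the file object's own encoding attribute, so the encoding comes from how the file was opened rather than being guessed. The helper name writerow_with_file_encoding is hypothetical, and an io.TextIOWrapper over io.BytesIO stands in for a real file.

```python
import csv
import io

def writerow_with_file_encoding(f, row):
    # Hypothetical helper: decode bytes fields using the encoding the text
    # file was opened with, then write the row as usual.
    encoding = getattr(f, "encoding", None)
    if encoding is None:
        # e.g. io.StringIO reports encoding None, so there is nothing to use.
        raise ValueError("file object has no encoding to decode bytes with")
    row = [v.decode(encoding) if isinstance(v, (bytes, bytearray)) else v
           for v in row]
    csv.writer(f).writerow(row)

# Stand-in for open("abc.csv", "w", encoding="utf-8", newline="").
raw = io.BytesIO()
f = io.TextIOWrapper(raw, encoding="utf-8", newline="")
writerow_with_file_encoding(f, [b'A', "\u00e9".encode("utf-8")])
f.flush()
print(raw.getvalue())  # b'A,\xc3\xa9\r\n'
```

The sketch raises when the file has no encoding instead of falling back to a default, since picking one at that point would amount to the guessing objected to above.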

I would also suggest we move this discussion to the bpo issue tracker, to avoid fragmenting it across two places.

@sidhant007 changed the title from "bpo-40762: Fix reading bytes in csv" to "bpo-40762: Fix writing bytes in csv" May 25, 2020
@remilapeyre (Contributor)

Hi @sidhant007, as per the discussion on bpo, this option has been rejected in favor of a strict parameter instead, so we should probably close this PR.

@sidhant007 closed this Jun 6, 2020
4 participants