-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-115801: Only allow sequence of strings as input for difflib.unified_diff #118333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115801: Only allow sequence of strings as input for difflib.unified_diff #118333
Conversation
Lib/difflib.py
Outdated
@@ -1266,6 +1266,8 @@ def _check_types(a, b, *args): | |||
if b and not isinstance(b[0], str): | |||
raise TypeError('lines to compare must be str, not %s (%r)' % | |||
(type(b[0]).__name__, b[0])) | |||
if isinstance(a, str) or isinstance(b, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the existing code, IMO it would be great add the not %s (%r)'
phrase.
Lib/test/test_difflib.py
Outdated
@@ -273,9 +273,19 @@ def test_make_file_usascii_charset_with_nonascii_input(self): | |||
self.assertIn('ımplıcıt', output) | |||
|
|||
|
|||
class TestInputParsing(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three existing tests for types checking: test_mixed_types_content
, test_mixed_types_filenames
and test_mixed_types_dates
. I think that it is better to extract them in a separate class (they are no longer about mixing 8-bit and Unicode strings) and add a new test in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I moved the 3 existing tests to a new class and refactored the new test to be in the same style.
Lib/difflib.py
Outdated
@@ -1266,6 +1266,12 @@ def _check_types(a, b, *args): | |||
if b and not isinstance(b[0], str): | |||
raise TypeError('lines to compare must be str, not %s (%r)' % | |||
(type(b[0]).__name__, b[0])) | |||
if isinstance(a, str): | |||
raise TypeError('input must be a sequence of strings, not %s (%r)' % |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user pass the content of the whole file without splitting it on lines, all this (potentially megabites) will be included in the error message. This is not good. The content of the string does not add anything to clarify the bug, the type is enough.
It is better to include only type name instead of the repr also in all other error messages, but this is not related to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should not print the entire object. I changed it for the new error messages. Let me know whether you want me to also change the other error messages in this PR.
Thank you for your contribution @eendebakpt. There were some unrelated random test failures, so I forget to merge this at its time. I should use "auto-merge" instead. |
difflib.unified_diff
(anddifflib.context_diff
) are documented to have lists of strings as input. When called with strings as input arguments, e.g.difflib.unified_diff('one', 'two')
a diff is generated, instead of returning an error.In this PR we disallow input of pure strings to prevent users from accidentally passing a string to
difflib.unified_diff
.