Skip to content

gh-115801: Only allow sequence of strings as input for difflib.unified_diff #118333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 10, 2024

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Apr 26, 2024

difflib.unified_diff (and difflib.context_diff ) are documented to have lists of strings as input. When called with strings as input arguments, e.g. difflib.unified_diff('one', 'two') a diff is generated, instead of returning an error.

In this PR we disallow input of pure strings to prevent users from accidentally passing a string to difflib.unified_diff.

  • Two tests had to be modified, but it looks like these tests were written to test something else and not passing strings as arguments
  • We still allow other sequence types such as a tuple of strings.
  • This is a backwards incompatible change. If we do not want this, then I will modify the PR to update the documentation and add tests for this behavior.

Lib/difflib.py Outdated
@@ -1266,6 +1266,8 @@ def _check_types(a, b, *args):
if b and not isinstance(b[0], str):
raise TypeError('lines to compare must be str, not %s (%r)' %
(type(b[0]).__name__, b[0]))
if isinstance(a, str) or isinstance(b, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the existing code, IMO it would be great add the not %s (%r)' phrase.

@@ -273,9 +273,19 @@ def test_make_file_usascii_charset_with_nonascii_input(self):
self.assertIn('ımplıcıt', output)


class TestInputParsing(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three existing tests for types checking: test_mixed_types_content, test_mixed_types_filenames and test_mixed_types_dates. I think that it is better to extract them in a separate class (they are no longer about mixing 8-bit and Unicode strings) and add a new test in this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I moved the 3 existing tests to a new class and refactored the new test to be in the same style.

Lib/difflib.py Outdated
@@ -1266,6 +1266,12 @@ def _check_types(a, b, *args):
if b and not isinstance(b[0], str):
raise TypeError('lines to compare must be str, not %s (%r)' %
(type(b[0]).__name__, b[0]))
if isinstance(a, str):
raise TypeError('input must be a sequence of strings, not %s (%r)' %
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user pass the content of the whole file without splitting it on lines, all this (potentially megabites) will be included in the error message. This is not good. The content of the string does not add anything to clarify the bug, the type is enough.

It is better to include only type name instead of the repr also in all other error messages, but this is not related to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should not print the entire object. I changed it for the new error messages. Let me know whether you want me to also change the other error messages in this PR.

@serhiy-storchaka serhiy-storchaka merged commit c3b6dbf into python:main Jun 10, 2024
33 checks passed
@serhiy-storchaka
Copy link
Member

Thank you for your contribution @eendebakpt. There were some unrelated random test failures, so I forget to merge this at its time. I should use "auto-merge" instead.

mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants