gh-115801: Only allow sequence of strings as input for difflib.unified_diff #118333

eendebakpt · 2024-04-26T20:05:36Z

difflib.unified_diff (and difflib.context_diff) are documented to take lists of strings as input. When called with plain strings as arguments, e.g. difflib.unified_diff('one', 'two'), a diff is generated instead of an error being raised.
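
To illustrate why plain strings were accepted silently: a str is itself a sequence of one-character strings, so the per-element type check passes and every character is treated as a line. The following is only a minimal sketch. It spells out that implicit splitting with list() so that it behaves the same with or without this PR applied.

```python
import difflib

# A str iterates as one-character strings, so 'one' looks like three "lines".
chars_a = list('one')
chars_b = list('two')

# Passing the character lists explicitly reproduces what a plain-string call
# effectively compared before this change: characters, not lines.
diff = list(difflib.unified_diff(chars_a, chars_b))

for line in diff:
    print(repr(line))
```

Every content line after the three header lines is a prefix character followed by a single character, which is rarely what a caller who passed whole strings intended.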

In this PR we disallow input of pure strings to prevent users from accidentally passing a string to difflib.unified_diff.

Two tests had to be modified, but it looks like these tests were written to test something else, not the passing of strings as arguments.
We still allow other sequence types, such as a tuple of strings.
This is a backwards-incompatible change. If we do not want this, then I will modify the PR to update the documentation and add tests for the existing behavior.
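
A rough sketch of the intended behavior, assuming a standalone helper: check_line_sequences is a hypothetical name used only for illustration and is not the actual difflib._check_types code. The message wording is borrowed from the diff under review, with only the type name reported. A bare str is rejected, while tuples and lists of strings pass.

```python
def check_line_sequences(a, b):
    """Hypothetical stand-in for the check discussed in this PR."""
    for arg in (a, b):
        # A bare str is a sequence of characters, which is almost never intended.
        if isinstance(arg, str):
            raise TypeError('input must be a sequence of strings, not %s'
                            % type(arg).__name__)
        # Mirror the existing first-element check for the lines themselves.
        if arg and not isinstance(arg[0], str):
            raise TypeError('lines to compare must be str, not %s (%r)'
                            % (type(arg[0]).__name__, arg[0]))


# A tuple and a list of strings are both accepted.
check_line_sequences(('one\n', 'two\n'), ['one\n', 'three\n'])

rejected = None
try:
    check_line_sequences('one', 'two')
except TypeError as exc:
    rejected = str(exc)

print(rejected)
```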

Issue: difflib._check_types allows string inputs instead of sequences of strings as documented #115801

eamanu · 2024-04-27T01:39:36Z

Lib/difflib.py

@@ -1266,6 +1266,8 @@ def _check_types(a, b, *args):
    if b and not isinstance(b[0], str):
        raise TypeError('lines to compare must be str, not %s (%r)' %
                        (type(b[0]).__name__, b[0]))
+    if isinstance(a, str) or isinstance(b, str):


Following the existing code, IMO it would be great to add the "not %s (%r)" phrase to the new error message.

serhiy-storchaka · 2024-04-29T12:46:38Z

Lib/test/test_difflib.py

@@ -273,9 +273,19 @@ def test_make_file_usascii_charset_with_nonascii_input(self):
        self.assertIn('ımplıcıt', output)


+class TestInputParsing(unittest.TestCase):


There are three existing tests for type checking: test_mixed_types_content, test_mixed_types_filenames and test_mixed_types_dates. I think it is better to extract them into a separate class (they are no longer about mixing 8-bit and Unicode strings) and add the new test to that class.

Done. I moved the 3 existing tests to a new class and refactored the new test to be in the same style.
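
For readers following along, a test in that style might look roughly like the sketch below. It is not the test added in this PR: guarded_unified_diff and TestInputTypes are hypothetical names, and the guard is applied in a local wrapper so the example runs the same way whether or not the installed difflib already contains this change.

```python
import difflib
import unittest


def guarded_unified_diff(a, b, **kwargs):
    """Hypothetical wrapper: reject bare strings, then defer to difflib."""
    for arg in (a, b):
        if isinstance(arg, str):
            raise TypeError('input must be a sequence of strings, not %s'
                            % type(arg).__name__)
    return difflib.unified_diff(a, b, **kwargs)


class TestInputTypes(unittest.TestCase):
    def test_plain_string_rejected(self):
        # A bare string in either position (or both) must raise.
        for args in [('a', ['b']), (['a'], 'b'), ('a', 'b')]:
            with self.subTest(args=args):
                with self.assertRaises(TypeError):
                    list(guarded_unified_diff(*args))

    def test_tuple_of_strings_accepted(self):
        # Non-list sequences of strings are still valid input.
        lines = list(guarded_unified_diff(('a\n',), ('b\n',)))
        self.assertIn('-a\n', lines)
        self.assertIn('+b\n', lines)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestInputTypes)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```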

serhiy-storchaka · 2024-05-07T15:36:17Z

Lib/difflib.py

@@ -1266,6 +1266,12 @@ def _check_types(a, b, *args):
    if b and not isinstance(b[0], str):
        raise TypeError('lines to compare must be str, not %s (%r)' %
                        (type(b[0]).__name__, b[0]))
+    if isinstance(a, str):
+        raise TypeError('input must be a sequence of strings, not %s (%r)' %


If the user passes the content of a whole file without splitting it into lines, all of it (potentially megabytes) will be included in the error message. This is not good. The content of the string does not add anything that clarifies the bug; the type is enough.

It would be better to include only the type name instead of the repr in all the other error messages as well, but that is not related to this PR.

I agree we should not print the entire object. I changed it for the new error messages. Let me know whether you want me to also change the other error messages in this PR.
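
The size concern can be sketched as follows. The two message_* helpers are hypothetical and exist only to compare the two message formats discussed in this thread; they are not part of difflib.

```python
def message_with_repr(value):
    # Format from the earlier revision: type name plus the full repr.
    return ('input must be a sequence of strings, not %s (%r)'
            % (type(value).__name__, value))


def message_type_only(value):
    # Format after the review: type name only.
    return ('input must be a sequence of strings, not %s'
            % type(value).__name__)


# Roughly 1 MB of text, standing in for an unsplit file.
whole_file = 'line\n' * 200_000

long_msg = message_with_repr(whole_file)
short_msg = message_type_only(whole_file)

print(len(long_msg), len(short_msg))
```

The repr-based message grows with the input (and escaping newlines makes it even longer than the input), while the type-only message stays a fixed, short length.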

serhiy-storchaka · 2024-06-10T11:08:39Z

Thank you for your contribution @eendebakpt. There were some unrelated random test failures, so I forgot to merge this at the time. I should have used "auto-merge" instead.

…unified_diff (pythonGH-118333)

Only allow sequence of strings as input for difflib.unified_diff

4ee5856

bedevere-app bot mentioned this pull request Apr 26, 2024

difflib._check_types allows string inputs instead of sequences of strings as documented #115801

Closed

bedevere-app bot added the awaiting review label Apr 26, 2024

eamanu reviewed Apr 27, 2024

eendebakpt and others added 3 commits April 27, 2024 20:33

review comments

a0dedbb

📜🤖 Added by blurb_it.

72d3a13

formatting of news entry

3cc80f9

serhiy-storchaka reviewed Apr 29, 2024

refactor tests

9e523e3

serhiy-storchaka reviewed May 7, 2024

eendebakpt and others added 2 commits May 7, 2024 21:36

address review comments

cb457f6

Fix tests.

7c43495

serhiy-storchaka approved these changes May 7, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels May 7, 2024

serhiy-storchaka and others added 2 commits May 8, 2024 10:27

Merge branch 'main' into difflib_unified_diff

a48565c

Merge branch 'main' into difflib_unified_diff

48cd4cc

serhiy-storchaka merged commit c3b6dbf into python:main Jun 10, 2024
33 checks passed

bedevere-app bot removed the awaiting merge label Jun 10, 2024

mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024

pythongh-115801: Only allow sequence of strings as input for difflib.…

c793adb

…unified_diff (pythonGH-118333)

noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024

pythongh-115801: Only allow sequence of strings as input for difflib.…

1918756

…unified_diff (pythonGH-118333)

estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024

pythongh-115801: Only allow sequence of strings as input for difflib.…

896d135

…unified_diff (pythonGH-118333)
