Commit 3402e13

gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130118)
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.

The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.

Author: Greg Price <[email protected]>
Co-authored-by: Greg Price <[email protected]>
1 parent 6666b38 commit 3402e13
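
The equivalence the commit message relies on (only five "Other" and only three "Separator" general categories) can be checked against the interpreter's own Unicode tables. This is a minimal sketch, assuming the running interpreter's `unicodedata` module reflects the Unicode character database the message refers to; `subcategories` is a hypothetical helper name, not something the commit adds.

```python
import sys
import unicodedata


def subcategories(major):
    """Collect every two-letter general category whose first letter is `major`."""
    found = set()
    for codepoint in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(codepoint))
        if category[0] == major:
            found.add(category)
    return found


# Scan the whole code space once per major category.
other = subcategories("C")
separator = subcategories("Z")

print("Other:", sorted(other))
print("Separator:", sorted(separator))
```

If the two sets match the eight categories listed in the old C comment, then "everything except those eight" and "everything outside Other and Separator" describe the same characters.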

File tree

6 files changed

+34
-34
lines changed


Doc/c-api/unicode.rst

Lines changed: 2 additions & 7 deletions
@@ -256,13 +256,8 @@ the Python configuration.

 .. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UCS4 ch)

-   Return ``1`` or ``0`` depending on whether *ch* is a printable character.
-   Nonprintable characters are those characters defined in the Unicode character
-   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
-   considered printable.  (Note that printable characters in this context are
-   those which should not be escaped when :func:`repr` is invoked on a string.
-   It has no bearing on the handling of strings written to :data:`sys.stdout` or
-   :data:`sys.stderr`.)
+   Return ``1`` or ``0`` depending on whether *ch* is a printable character,
+   in the sense of :meth:`str.isprintable`.


 These APIs can be used for fast direct character conversions:

Doc/library/stdtypes.rst

Lines changed: 13 additions & 7 deletions
@@ -2012,13 +2012,19 @@ expression support in the :mod:`re` module).

 .. method:: str.isprintable()

-   Return ``True`` if all characters in the string are printable or the string is
-   empty, ``False`` otherwise.  Nonprintable characters are those characters defined
-   in the Unicode character database as "Other" or "Separator", excepting the
-   ASCII space (0x20) which is considered printable.  (Note that printable
-   characters in this context are those which should not be escaped when
-   :func:`repr` is invoked on a string.  It has no bearing on the handling of
-   strings written to :data:`sys.stdout` or :data:`sys.stderr`.)
+   Return true if all characters in the string are printable, false if it
+   contains at least one non-printable character.
+
+   Here "printable" means the character is suitable for :func:`repr` to use in
+   its output; "non-printable" means that :func:`repr` on built-in types will
+   hex-escape the character.  It has no bearing on the handling of strings
+   written to :data:`sys.stdout` or :data:`sys.stderr`.
+
+   The printable characters are those which in the Unicode character database
+   (see :mod:`unicodedata`) have a general category in group Letter, Mark,
+   Number, Punctuation, or Symbol (L, M, N, P, or S); plus the ASCII space 0x20.
+   Nonprintable characters are those in group Separator or Other (Z or C),
+   except the ASCII space.


 .. method:: str.isspace()
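
The relationship between `str.isprintable` and `repr` described in the new wording can be observed on a few characters. This is a rough sketch, not part of the commit; `is_escaped_by_repr` is a hypothetical helper, and the sample characters are chosen here for illustration. The backslash is deliberately left out of the samples: it is printable, yet `repr` still escapes it for syntactic reasons.

```python
import unicodedata


def is_escaped_by_repr(ch):
    """Return True if repr() writes `ch` as an escape sequence rather than as itself."""
    # Strip the surrounding quotes that repr() adds and compare with the character.
    return repr(ch)[1:-1] != ch


# Letter, ASCII space, accented letter, no-break space (Zs), newline (Cc),
# LINE SEPARATOR (Zl), and a symbol outside the BMP.
samples = ["a", " ", "\u00e9", "\u00a0", "\n", "\u2028", "\U0001F46F"]

for ch in samples:
    print(f"U+{ord(ch):04X} category={unicodedata.category(ch)} "
          f"isprintable={ch.isprintable()} repr={ch!r}")
```

For each sample, `is_escaped_by_repr(ch)` should be the opposite of `ch.isprintable()`. Note that the no-break space is a "Separator" and therefore non-printable, while the ASCII space is the one documented exception.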

Lib/test/test_str.py

Lines changed: 9 additions & 0 deletions
@@ -853,6 +853,15 @@ def test_isprintable(self):
         self.assertTrue('\U0001F46F'.isprintable())
         self.assertFalse('\U000E0020'.isprintable())

+    @support.requires_resource('cpu')
+    def test_isprintable_invariant(self):
+        for codepoint in range(sys.maxunicode + 1):
+            char = chr(codepoint)
+            category = unicodedata.category(char)
+            self.assertEqual(char.isprintable(),
+                             category[0] not in ('C', 'Z')
+                             or char == ' ')
+
     def test_surrogates(self):
         for s in ('a\uD800b\uDFFF', 'a\uDFFFb\uD800',
                   'a\uD800b\uDFFFa', 'a\uDFFFb\uD800a'):

Objects/clinic/unicodeobject.c.h

Lines changed: 3 additions & 4 deletions
Generated file; diff not rendered by default.

Objects/unicodectype.c

Lines changed: 4 additions & 12 deletions
@@ -142,18 +142,10 @@ int _PyUnicode_IsNumeric(Py_UCS4 ch)
     return (ctype->flags & NUMERIC_MASK) != 0;
 }

-/* Returns 1 for Unicode characters to be hex-escaped when repr()ed,
-   0 otherwise.
-   All characters except those characters defined in the Unicode character
-   database as following categories are considered printable.
-      * Cc (Other, Control)
-      * Cf (Other, Format)
-      * Cs (Other, Surrogate)
-      * Co (Other, Private Use)
-      * Cn (Other, Not Assigned)
-      * Zl Separator, Line ('\u2028', LINE SEPARATOR)
-      * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
-      * Zs (Separator, Space) other than ASCII space('\x20').
+/* Returns 1 for Unicode characters that repr() may use in its output,
+   and 0 for characters to be hex-escaped.
+
+   See documentation of `str.isprintable` for details.
 */
 int _PyUnicode_IsPrintable(Py_UCS4 ch)
 {

Objects/unicodeobject.c

Lines changed: 3 additions & 4 deletions
@@ -12452,15 +12452,14 @@ unicode_isidentifier_impl(PyObject *self)
 /*[clinic input]
 str.isprintable as unicode_isprintable

-Return True if the string is printable, False otherwise.
+Return True if all characters in the string are printable, False otherwise.

-A string is printable if all of its characters are considered printable in
-repr() or if it is empty.
+A character is printable if repr() may use it in its output.
 [clinic start generated code]*/

 static PyObject *
 unicode_isprintable_impl(PyObject *self)
-/*[clinic end generated code: output=3ab9626cd32dd1a0 input=98a0e1c2c1813209]*/
+/*[clinic end generated code: output=3ab9626cd32dd1a0 input=4e56bcc6b06ca18c]*/
 {
     Py_ssize_t i, length;
     int kind;
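
The reworded docstring no longer mentions the empty string, but the "all characters" phrasing still covers it, since a condition over zero characters is vacuously true. This is a small sketch of that reading; `isprintable_model` is a hypothetical pure-Python model written for illustration, not the C implementation.

```python
def isprintable_model(s):
    """Model str.isprintable() as 'every character is printable'."""
    # all() over an empty iterable is True, so the empty string counts as printable.
    return all(ch.isprintable() for ch in s)


for text in ["", "hello world", "line one\nline two"]:
    print(repr(text), isprintable_model(text), text.isprintable())
```

The model and the built-in method are expected to agree on every input, including the empty string.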
