Skip to content

gh-89653: Use int type for Unicode kind #92704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2022
Merged

gh-89653: Use int type for Unicode kind #92704

merged 1 commit into from
May 13, 2022

Conversation

vstinner
Copy link
Member

Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.

@vstinner
Copy link
Member Author

See also: #92705

@serhiy-storchaka
Copy link
Member

In theory, the compiler can generate more efficient code for enum than for int. In practice, many years ago I tried to replace types of all "kind" variables and parameters from int to enum, but did not get convincing results (or got negative results?). On other hand, it was on slow 32-bit Atom, results can be different on modern 64-bit platform with full optimization.

@vstinner
Copy link
Member Author

In theory, the compiler can generate more efficient code for enum than for int.

In the public C API, enum types must not be used, since it causes compatibility issues with other programming languages.

@vstinner
Copy link
Member Author

In the public C API, enum types must not be used, since it causes compatibility issues with other programming languages.

Oh sorry, I was thinking at PR #92705 which is about the Python C API. This PR is (mostly) about unicodeobject.c where we have more freedom.

Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.
@vstinner
Copy link
Member Author

unicode_fill() uses a switch/case with enum PyUnicode_Kind which now (since yesterday!) has 3 possible values (1BYTE, 2BYTE, 4BYTE).

I built Python with gcc -O3 (GCC 11.3.1 on Fedora 35). In gdb, I put a breakpoint on PyUnicode_Fill() and I run:

import _testcapi; _testcapi.unicode_copycharacters("abc", 0, "def", 0, 3)

kind is unicode_fill() first argument so the RCX register. It's called with RCX=1. I see the following x86-64 machine code (pseudo-code: I only write down the CMP/JE/JNE to highlight the control flow):

cmp cl, 0x2
jne label1
... # Py_UCS2

label1:
cmp cl, 0x4
je label2
... memset ... # Py_UCS1

label2:
... # Py_UCS4

Reaching the Py_UCS1 code path takes two CMP comparisons and two conditional jumps (JE and JNE). In short, switch/case is implemented as 3 if/else.

GCC doesn't create a jump table for this switch when using gcc -O3.

At least, the Py_UCS4 code path doesn't check again RCX register: it doesn't handle kind different than 1, 2 or 4. In release mode, default: Py_UNREACHABLE(); is replaced with default: __builtin_unreachable(); by the C preprocessor. GCC simply removes this code.

@vstinner
Copy link
Member Author

I built Python with clang -O3 (clang 13.0, also on Fedora 35). I also see two consecutive CMP and two conditional jumps (JE, JNE). Pseudo-code:

CMP edx, 0x4
JE label_UCS4

CMP edx, 0x2
JNE label_UCS1

...  # UCS2 code path

label_UCS4:
... # UCS4 code path

label_UCS1:
... memset ... # UCS1 code path
...

clang also implements this switch/case as 2 if/else.

@vstinner
Copy link
Member Author

@serhiy-storchaka:

In theory, the compiler can generate more efficient code for enum than for int. In practice, many years ago I tried to replace types of all "kind" variables and parameters from int to enum, but did not get convincing results (or got negative results?). On other hand, it was on slow 32-bit Atom, results can be different on modern 64-bit platform with full optimization.

Yep, I also have this belief that switch/case "should" be more efficient. Problem: I just tested with GCC and clang, and it's not faster, it's just the same than if/else. Moreover, I'm not sure why, but GCC and clang likes to put Py_UCS2 as the first case, whereas I would prefer to put Py_UCS1 at the beginning to optimize L1 instruction cache usage, since Py_UCS1 is the most common case.

According to that, I am not convinced that using enum provides any performance benefit. And I would even suggest replacing switch/case with explicit if/else to put Py_UCS1 as the first case. But I don't plan to replace switch/case. I don't consider that it's worth it (I'm not motivated to fine tune unicodeobject.c for that, I prefer to rely on PGO+LTO for that ;-)).

@vstinner vstinner merged commit f62ad4f into python:main May 13, 2022
@vstinner vstinner deleted the int_kind branch May 13, 2022 10:41
@serhiy-storchaka
Copy link
Member

Thank you for testing this @vstinner.

@vstinner
Copy link
Member Author

Thank you for testing this @vstinner.

I have a lot of beliefs and assumptions on how compilers are supposed to work according to me. Sadly, every time I really look into the machine code, it's very different than what I expected :-(

It would be nice to micro-optimize these switch/case, but so far, I'm not aware of portable way to micro-optimize it. Moreover, usually, the UCS1, UCS2 and UCS4 code paths are very different (on purpose). I don't think that a compiler can magically merge the 3 code paths to have unconditonal code :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants