-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-89653: Use int type for Unicode kind #92704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
See also: #92705 |
In theory, the compiler can generate more efficient code for enum than for int. In practice, many years ago I tried to replace types of all "kind" variables and parameters from int to enum, but did not get convincing results (or got negative results?). On other hand, it was on slow 32-bit Atom, results can be different on modern 64-bit platform with full optimization. |
In the public C API, enum types must not be used, since it causes compatibility issues with other programming languages. |
Oh sorry, I was thinking at PR #92705 which is about the Python C API. This PR is (mostly) about unicodeobject.c where we have more freedom. |
Use the same type that PyUnicode_FromKindAndData() kind parameter type (public C API): int.
unicode_fill() uses a switch/case with I built Python with gcc -O3 (GCC 11.3.1 on Fedora 35). In gdb, I put a breakpoint on PyUnicode_Fill() and I run: import _testcapi; _testcapi.unicode_copycharacters("abc", 0, "def", 0, 3) kind is unicode_fill() first argument so the RCX register. It's called with RCX=1. I see the following x86-64 machine code (pseudo-code: I only write down the CMP/JE/JNE to highlight the control flow):
Reaching the Py_UCS1 code path takes two CMP comparisons and two conditional jumps (JE and JNE). In short, switch/case is implemented as 3 if/else. GCC doesn't create a jump table for this switch when using gcc -O3. At least, the Py_UCS4 code path doesn't check again RCX register: it doesn't handle kind different than 1, 2 or 4. In release mode, |
I built Python with clang -O3 (clang 13.0, also on Fedora 35). I also see two consecutive CMP and two conditional jumps (JE, JNE). Pseudo-code:
clang also implements this switch/case as 2 if/else. |
Yep, I also have this belief that switch/case "should" be more efficient. Problem: I just tested with GCC and clang, and it's not faster, it's just the same than if/else. Moreover, I'm not sure why, but GCC and clang likes to put Py_UCS2 as the first case, whereas I would prefer to put Py_UCS1 at the beginning to optimize L1 instruction cache usage, since Py_UCS1 is the most common case. According to that, I am not convinced that using enum provides any performance benefit. And I would even suggest replacing switch/case with explicit if/else to put Py_UCS1 as the first case. But I don't plan to replace switch/case. I don't consider that it's worth it (I'm not motivated to fine tune unicodeobject.c for that, I prefer to rely on PGO+LTO for that ;-)). |
Thank you for testing this @vstinner. |
I have a lot of beliefs and assumptions on how compilers are supposed to work according to me. Sadly, every time I really look into the machine code, it's very different than what I expected :-( It would be nice to micro-optimize these switch/case, but so far, I'm not aware of portable way to micro-optimize it. Moreover, usually, the UCS1, UCS2 and UCS4 code paths are very different (on purpose). I don't think that a compiler can magically merge the 3 code paths to have unconditonal code :-) |
Use the same type that PyUnicode_FromKindAndData() kind parameter
type (public C API): int.