Stop using _Py_IDENTIFIER() in core CPython. #230

ericsnowcurrently · 2022-01-24T17:22:29Z

While looking into @198, I found that it may be better to get rid of _Py_IDENTIFIER() entirely.

The text was updated successfully, but these errors were encountered:

gvanrossum · 2022-01-24T20:02:11Z

(@ericsnowcurrently Seemingly you're still using the old board.)

gvanrossum · 2022-01-24T20:18:36Z

I worry that if you totally get rid of _Py_Identifier and _Py_IDENTIFIER you'd break some code, even though these aren't documented. But I'm not sure how to search for these.

(One example:
https://github.com/laurentfr/blender/blob/786763ca4c10be726bc8293fd3bb93d7d019dbc6/source/blender/python/intern/bpy_traceback.c#L54
)

ericsnowcurrently · 2022-01-24T20:54:39Z

Hmm, that's something I'll have to look into. 🙁

ericsnowcurrently · 2022-01-26T22:06:36Z

There are a few things left to do, but with what I have already done it looks reasonable. So I created https://bugs.python.org/issue46541 and python/cpython#30928.

FYI, with the change I've implemented so far (dropping use of _Py_IDENTIFIER() in the core), I get about a 1% improvement.

Results:

Slower (8):

sqlalchemy_imperative: 18.6 ms +- 0.3 ms -> 19.1 ms +- 1.1 ms: 1.03x slower
pathlib: 19.6 ms +- 0.2 ms -> 20.0 ms +- 0.4 ms: 1.02x slower
nqueens: 84.9 ms +- 0.7 ms -> 86.7 ms +- 0.7 ms: 1.02x slower
meteor_contest: 103 ms +- 2 ms -> 105 ms +- 1 ms: 1.02x slower
logging_format: 6.42 us +- 0.08 us -> 6.54 us +- 0.12 us: 1.02x slower
pickle_dict: 27.7 us +- 0.1 us -> 28.1 us +- 0.1 us: 1.02x slower
json_dumps: 12.6 ms +- 0.2 ms -> 12.8 ms +- 0.2 ms: 1.02x slower
go: 151 ms +- 1 ms -> 153 ms +- 1 ms: 1.01x slower

Faster (35):

spectral_norm: 103 ms +- 2 ms -> 97.5 ms +- 2.1 ms: 1.06x faster
scimark_sparse_mat_mult: 4.89 ms +- 0.11 ms -> 4.64 ms +- 0.14 ms: 1.05x faster
xml_etree_process: 59.2 ms +- 0.8 ms -> 56.7 ms +- 0.5 ms: 1.04x faster
xml_etree_generate: 82.9 ms +- 0.5 ms -> 79.5 ms +- 0.4 ms: 1.04x faster
pickle: 10.4 us +- 0.1 us -> 10.0 us +- 0.1 us: 1.04x faster
richards: 52.0 ms +- 1.0 ms -> 50.3 ms +- 0.7 ms: 1.03x faster
scimark_lu: 113 ms +- 3 ms -> 110 ms +- 1 ms: 1.03x faster
chameleon: 7.50 ms +- 0.09 ms -> 7.29 ms +- 0.06 ms: 1.03x faster
crypto_pyaes: 87.6 ms +- 0.9 ms -> 85.4 ms +- 0.9 ms: 1.03x faster
chaos: 73.3 ms +- 0.6 ms -> 71.5 ms +- 0.8 ms: 1.02x faster
telco: 6.75 ms +- 0.20 ms -> 6.61 ms +- 0.12 ms: 1.02x faster
django_template: 35.6 ms +- 0.6 ms -> 34.9 ms +- 0.5 ms: 1.02x faster
scimark_monte_carlo: 72.9 ms +- 1.1 ms -> 71.7 ms +- 0.5 ms: 1.02x faster
dulwich_log: 65.7 ms +- 0.4 ms -> 64.7 ms +- 0.6 ms: 1.02x faster
unpack_sequence: 44.6 ns +- 1.1 ns -> 43.9 ns +- 0.9 ns: 1.02x faster
xml_etree_parse: 157 ms +- 6 ms -> 155 ms +- 2 ms: 1.02x faster
xml_etree_iterparse: 107 ms +- 1 ms -> 106 ms +- 1 ms: 1.02x faster
regex_dna: 217 ms +- 1 ms -> 214 ms +- 1 ms: 1.01x faster
mako: 11.6 ms +- 0.1 ms -> 11.4 ms +- 0.1 ms: 1.01x faster
sympy_integrate: 21.5 ms +- 0.1 ms -> 21.2 ms +- 0.1 ms: 1.01x faster
unpickle_pure_python: 253 us +- 4 us -> 250 us +- 1 us: 1.01x faster
sympy_str: 303 ms +- 5 ms -> 299 ms +- 3 ms: 1.01x faster
scimark_sor: 126 ms +- 1 ms -> 124 ms +- 1 ms: 1.01x faster
sympy_sum: 169 ms +- 2 ms -> 167 ms +- 1 ms: 1.01x faster
unpickle: 14.3 us +- 0.1 us -> 14.2 us +- 0.4 us: 1.01x faster
nbody: 97.1 ms +- 1.7 ms -> 96.1 ms +- 1.3 ms: 1.01x faster
sqlalchemy_declarative: 147 ms +- 3 ms -> 146 ms +- 3 ms: 1.01x faster
sympy_expand: 505 ms +- 7 ms -> 500 ms +- 5 ms: 1.01x faster
scimark_fft: 329 ms +- 3 ms -> 326 ms +- 3 ms: 1.01x faster
fannkuch: 396 ms +- 3 ms -> 393 ms +- 3 ms: 1.01x faster
pyflate: 449 ms +- 3 ms -> 447 ms +- 3 ms: 1.00x faster
2to3: 265 ms +- 1 ms -> 264 ms +- 1 ms: 1.00x faster
regex_compile: 137 ms +- 1 ms -> 137 ms +- 1 ms: 1.00x faster
python_startup: 8.18 ms +- 0.01 ms -> 8.16 ms +- 0.01 ms: 1.00x faster
python_startup_no_site: 5.80 ms +- 0.00 ms -> 5.78 ms +- 0.00 ms: 1.00x faster

Benchmark hidden because not significant (15): deltablue, float, hexiom, json_loads, logging_silent, logging_simple, pickle_list, pickle_pure_python, pidigits, raytrace, regex_effbot, regex_v8, sqlite_synth, tornado_http, unpickle_list

Geometric mean: 1.01x faster

</details>

kumaraditya303 · 2022-01-31T10:49:42Z

To make this useful for deepfreeze the list of identifiers should be able to merged with the identifiers of deepfreeze, and on runtime the interpreter should intern them. If we are able to get global identifier objects statically allocated then using it within deepfreeze will be much easier and also the identifiers would de-duplicated.

gvanrossum · 2022-01-31T16:50:03Z

To make this useful for deepfreeze the list of identifiers should be able to merged with the identifiers of deepfreeze, and on runtime the interpreter should intern them. If we are able to get global identifier objects statically allocated then using it within deepfreeze will be much easier and also the identifiers would de-duplicated.

So are you suggesting that the two scripts (deepfreeze.py and generate_global_objects.py, both in Tools/scripts) should somehow be combined so that the list of identifiers (i.e. pycore_global_strings.h in python/cpython#30928) includes the deep-frozen strings and deepfreeze.py output can reference those? That seems ambitious and intricate, but not impossible.

kumaraditya303 · 2022-01-31T17:06:02Z

Yes, although this will be more complicated it will be worth it for the savings. For this deepfreeze could produce a list of all the identifiers used (using a CLI flag) and dump it in a file and the same can be done for the stdlib modules and the files can be merged to de-duplicate the identifiers, then the global objects header file will be generated will all these strings statically allocated and the deepfreeze could reference these strings just like it does small integers.

For interning during python init it can loop through the indentifiers array and since the indentifiers will be unique hence no need to fix any other pointers referencing it. (PyUnicode_InternInPlace modifies the pointer if there is already an interned copy but here there will be no copy already interned).

I'll prototype it once python/cpython#30928 lands as it seems it still requires some changes.

gvanrossum · 2022-01-31T18:45:22Z

@ericsnowcurrently Do you have an idea about whether and when python/cpython#30928 will land? It is currently a draft and it touches some 80+ files, which makes me wonder about how far out this is.

ericsnowcurrently · 2022-01-31T20:49:30Z

I'm planning on dropping the "draft" status later today. I'll need a review but otherwise the main blocker would be any objections anyone has to using the global objects instead of _Py_IDENTIFIER() in core code. Thus far I've gotten no objections, but I haven't been all that vocal about it. Unless I can talk myself out of it, I'm planning on a short post on python-dev later today.

gvanrossum · 2022-01-31T21:06:58Z

What about the objections from people who brought up that these APIs are used by some 3rd party projects?

ericsnowcurrently · 2022-01-31T21:26:33Z

I wasn't planning on dropping _Py_IDENTIFIER() or those APIs in python/cpython#30928.

ericsnowcurrently · 2022-01-31T21:28:47Z

Hmm, the PR summary isn't very clear about that. I'll fix that.

ericsnowcurrently · 2022-02-01T01:01:17Z

https://github.com/laurentfr/blender/blob/786763ca4c10be726bc8293fd3bb93d7d019dbc6/source/blender/python/intern/bpy_traceback.c#L54

AKA https://github.com/blender/blender/blob/786763ca4c10be726bc8293fd3bb93d7d019dbc6/source/blender/python/intern/bpy_traceback.c#L54

ericsnowcurrently · 2022-02-02T22:53:57Z

I posted to python-dev about this: https://mail.python.org/archives/list/[email protected]/thread/DNMZAMB4M6RVR76RDZMUK2WRLI6KAAYS/

ericsnowcurrently · 2022-02-14T18:35:19Z

I merged python/cpython#30928.

There are some follow-ups to take care of. See #279.

ericsnowcurrently added the investigate label Jan 24, 2022

ericsnowcurrently self-assigned this Jan 24, 2022

ericsnowcurrently mentioned this issue Jan 24, 2022

move _Py_unicode_state.ids (_Py_Identifier) into _PyRuntimeState.global_objects.singletons #198

Closed

ericsnowcurrently added this to Fancy CPython Board Jan 24, 2022

ericsnowcurrently moved this to Todo in Fancy CPython Board Jan 24, 2022

ericsnowcurrently moved this from Todo to In Progress in Fancy CPython Board Jan 24, 2022

gvanrossum mentioned this issue Jan 31, 2022

bpo-46430: Intern strings in deep-frozen modules python/cpython#30683

Merged

ericsnowcurrently changed the title ~~Get rid of _Py_IDENTIFIER()?~~ Stop using _Py_IDENTIFIER() in core CPython. Feb 1, 2022

kumaraditya303 mentioned this issue Feb 10, 2022

bpo-46712: Share global string identifiers in deepfreeze python/cpython#31261

Merged

ericsnowcurrently closed this as completed Feb 14, 2022

Repository owner moved this from In Progress to Done in Fancy CPython Board Feb 14, 2022

ericsnowcurrently added Meta: multi-core https://github.com/ericsnowcurrently/multi-core-python and removed investigate labels Feb 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop using _Py_IDENTIFIER() in core CPython. #230

Stop using _Py_IDENTIFIER() in core CPython. #230

ericsnowcurrently commented Jan 24, 2022 •

edited

Loading

gvanrossum commented Jan 24, 2022

gvanrossum commented Jan 24, 2022

ericsnowcurrently commented Jan 24, 2022

ericsnowcurrently commented Jan 26, 2022

kumaraditya303 commented Jan 31, 2022 •

edited

Loading

gvanrossum commented Jan 31, 2022

kumaraditya303 commented Jan 31, 2022 •

edited

Loading

gvanrossum commented Jan 31, 2022

ericsnowcurrently commented Jan 31, 2022

gvanrossum commented Jan 31, 2022

ericsnowcurrently commented Jan 31, 2022

ericsnowcurrently commented Jan 31, 2022

ericsnowcurrently commented Feb 1, 2022

ericsnowcurrently commented Feb 2, 2022

ericsnowcurrently commented Feb 14, 2022

Stop using _Py_IDENTIFIER() in core CPython. #230

Stop using _Py_IDENTIFIER() in core CPython. #230

Comments

ericsnowcurrently commented Jan 24, 2022 • edited Loading

gvanrossum commented Jan 24, 2022

gvanrossum commented Jan 24, 2022

ericsnowcurrently commented Jan 24, 2022

ericsnowcurrently commented Jan 26, 2022

kumaraditya303 commented Jan 31, 2022 • edited Loading

gvanrossum commented Jan 31, 2022

kumaraditya303 commented Jan 31, 2022 • edited Loading

gvanrossum commented Jan 31, 2022

ericsnowcurrently commented Jan 31, 2022

gvanrossum commented Jan 31, 2022

ericsnowcurrently commented Jan 31, 2022

ericsnowcurrently commented Jan 31, 2022

ericsnowcurrently commented Feb 1, 2022

ericsnowcurrently commented Feb 2, 2022

ericsnowcurrently commented Feb 14, 2022

ericsnowcurrently commented Jan 24, 2022 •

edited

Loading

kumaraditya303 commented Jan 31, 2022 •

edited

Loading

kumaraditya303 commented Jan 31, 2022 •

edited

Loading