Skip to content

Stop using _Py_IDENTIFIER() in core CPython. #230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ericsnowcurrently opened this issue Jan 24, 2022 · 15 comments
Closed

Stop using _Py_IDENTIFIER() in core CPython. #230

ericsnowcurrently opened this issue Jan 24, 2022 · 15 comments
Assignees
Labels
Meta: multi-core https://github.com/ericsnowcurrently/multi-core-python

Comments

@ericsnowcurrently
Copy link
Collaborator

ericsnowcurrently commented Jan 24, 2022

While looking into @198, I found that it may be better to get rid of _Py_IDENTIFIER() entirely.

@gvanrossum
Copy link
Collaborator

(@ericsnowcurrently Seemingly you're still using the old board.)

@gvanrossum
Copy link
Collaborator

I worry that if you totally get rid of _Py_Identifier and _Py_IDENTIFIER you'd break some code, even though these aren't documented. But I'm not sure how to search for these.

(One example:
https://github.com/laurentfr/blender/blob/786763ca4c10be726bc8293fd3bb93d7d019dbc6/source/blender/python/intern/bpy_traceback.c#L54
)

@ericsnowcurrently
Copy link
Collaborator Author

Hmm, that's something I'll have to look into. 🙁

@ericsnowcurrently
Copy link
Collaborator Author

There are a few things left to do, but with what I have already done it looks reasonable. So I created https://bugs.python.org/issue46541 and python/cpython#30928.

FYI, with the change I've implemented so far (dropping use of _Py_IDENTIFIER() in the core), I get about a 1% improvement.

Results:

Slower (8):

  • sqlalchemy_imperative: 18.6 ms +- 0.3 ms -> 19.1 ms +- 1.1 ms: 1.03x slower
  • pathlib: 19.6 ms +- 0.2 ms -> 20.0 ms +- 0.4 ms: 1.02x slower
  • nqueens: 84.9 ms +- 0.7 ms -> 86.7 ms +- 0.7 ms: 1.02x slower
  • meteor_contest: 103 ms +- 2 ms -> 105 ms +- 1 ms: 1.02x slower
  • logging_format: 6.42 us +- 0.08 us -> 6.54 us +- 0.12 us: 1.02x slower
  • pickle_dict: 27.7 us +- 0.1 us -> 28.1 us +- 0.1 us: 1.02x slower
  • json_dumps: 12.6 ms +- 0.2 ms -> 12.8 ms +- 0.2 ms: 1.02x slower
  • go: 151 ms +- 1 ms -> 153 ms +- 1 ms: 1.01x slower

Faster (35):

  • spectral_norm: 103 ms +- 2 ms -> 97.5 ms +- 2.1 ms: 1.06x faster
  • scimark_sparse_mat_mult: 4.89 ms +- 0.11 ms -> 4.64 ms +- 0.14 ms: 1.05x faster
  • xml_etree_process: 59.2 ms +- 0.8 ms -> 56.7 ms +- 0.5 ms: 1.04x faster
  • xml_etree_generate: 82.9 ms +- 0.5 ms -> 79.5 ms +- 0.4 ms: 1.04x faster
  • pickle: 10.4 us +- 0.1 us -> 10.0 us +- 0.1 us: 1.04x faster
  • richards: 52.0 ms +- 1.0 ms -> 50.3 ms +- 0.7 ms: 1.03x faster
  • scimark_lu: 113 ms +- 3 ms -> 110 ms +- 1 ms: 1.03x faster
  • chameleon: 7.50 ms +- 0.09 ms -> 7.29 ms +- 0.06 ms: 1.03x faster
  • crypto_pyaes: 87.6 ms +- 0.9 ms -> 85.4 ms +- 0.9 ms: 1.03x faster
  • chaos: 73.3 ms +- 0.6 ms -> 71.5 ms +- 0.8 ms: 1.02x faster
  • telco: 6.75 ms +- 0.20 ms -> 6.61 ms +- 0.12 ms: 1.02x faster
  • django_template: 35.6 ms +- 0.6 ms -> 34.9 ms +- 0.5 ms: 1.02x faster
  • scimark_monte_carlo: 72.9 ms +- 1.1 ms -> 71.7 ms +- 0.5 ms: 1.02x faster
  • dulwich_log: 65.7 ms +- 0.4 ms -> 64.7 ms +- 0.6 ms: 1.02x faster
  • unpack_sequence: 44.6 ns +- 1.1 ns -> 43.9 ns +- 0.9 ns: 1.02x faster
  • xml_etree_parse: 157 ms +- 6 ms -> 155 ms +- 2 ms: 1.02x faster
  • xml_etree_iterparse: 107 ms +- 1 ms -> 106 ms +- 1 ms: 1.02x faster
  • regex_dna: 217 ms +- 1 ms -> 214 ms +- 1 ms: 1.01x faster
  • mako: 11.6 ms +- 0.1 ms -> 11.4 ms +- 0.1 ms: 1.01x faster
  • sympy_integrate: 21.5 ms +- 0.1 ms -> 21.2 ms +- 0.1 ms: 1.01x faster
  • unpickle_pure_python: 253 us +- 4 us -> 250 us +- 1 us: 1.01x faster
  • sympy_str: 303 ms +- 5 ms -> 299 ms +- 3 ms: 1.01x faster
  • scimark_sor: 126 ms +- 1 ms -> 124 ms +- 1 ms: 1.01x faster
  • sympy_sum: 169 ms +- 2 ms -> 167 ms +- 1 ms: 1.01x faster
  • unpickle: 14.3 us +- 0.1 us -> 14.2 us +- 0.4 us: 1.01x faster
  • nbody: 97.1 ms +- 1.7 ms -> 96.1 ms +- 1.3 ms: 1.01x faster
  • sqlalchemy_declarative: 147 ms +- 3 ms -> 146 ms +- 3 ms: 1.01x faster
  • sympy_expand: 505 ms +- 7 ms -> 500 ms +- 5 ms: 1.01x faster
  • scimark_fft: 329 ms +- 3 ms -> 326 ms +- 3 ms: 1.01x faster
  • fannkuch: 396 ms +- 3 ms -> 393 ms +- 3 ms: 1.01x faster
  • pyflate: 449 ms +- 3 ms -> 447 ms +- 3 ms: 1.00x faster
  • 2to3: 265 ms +- 1 ms -> 264 ms +- 1 ms: 1.00x faster
  • regex_compile: 137 ms +- 1 ms -> 137 ms +- 1 ms: 1.00x faster
  • python_startup: 8.18 ms +- 0.01 ms -> 8.16 ms +- 0.01 ms: 1.00x faster
  • python_startup_no_site: 5.80 ms +- 0.00 ms -> 5.78 ms +- 0.00 ms: 1.00x faster

Benchmark hidden because not significant (15): deltablue, float, hexiom, json_loads, logging_silent, logging_simple, pickle_list, pickle_pure_python, pidigits, raytrace, regex_effbot, regex_v8, sqlite_synth, tornado_http, unpickle_list

Geometric mean: 1.01x faster

</details>

@kumaraditya303
Copy link

kumaraditya303 commented Jan 31, 2022

To make this useful for deepfreeze the list of identifiers should be able to merged with the identifiers of deepfreeze, and on runtime the interpreter should intern them. If we are able to get global identifier objects statically allocated then using it within deepfreeze will be much easier and also the identifiers would de-duplicated.

@gvanrossum
Copy link
Collaborator

To make this useful for deepfreeze the list of identifiers should be able to merged with the identifiers of deepfreeze, and on runtime the interpreter should intern them. If we are able to get global identifier objects statically allocated then using it within deepfreeze will be much easier and also the identifiers would de-duplicated.

So are you suggesting that the two scripts (deepfreeze.py and generate_global_objects.py, both in Tools/scripts) should somehow be combined so that the list of identifiers (i.e. pycore_global_strings.h in python/cpython#30928) includes the deep-frozen strings and deepfreeze.py output can reference those? That seems ambitious and intricate, but not impossible.

@kumaraditya303
Copy link

kumaraditya303 commented Jan 31, 2022

Yes, although this will be more complicated it will be worth it for the savings. For this deepfreeze could produce a list of all the identifiers used (using a CLI flag) and dump it in a file and the same can be done for the stdlib modules and the files can be merged to de-duplicate the identifiers, then the global objects header file will be generated will all these strings statically allocated and the deepfreeze could reference these strings just like it does small integers.

For interning during python init it can loop through the indentifiers array and since the indentifiers will be unique hence no need to fix any other pointers referencing it. (PyUnicode_InternInPlace modifies the pointer if there is already an interned copy but here there will be no copy already interned).

I'll prototype it once python/cpython#30928 lands as it seems it still requires some changes.

@gvanrossum
Copy link
Collaborator

@ericsnowcurrently Do you have an idea about whether and when python/cpython#30928 will land? It is currently a draft and it touches some 80+ files, which makes me wonder about how far out this is.

@ericsnowcurrently
Copy link
Collaborator Author

I'm planning on dropping the "draft" status later today. I'll need a review but otherwise the main blocker would be any objections anyone has to using the global objects instead of _Py_IDENTIFIER() in core code. Thus far I've gotten no objections, but I haven't been all that vocal about it. Unless I can talk myself out of it, I'm planning on a short post on python-dev later today.

@gvanrossum
Copy link
Collaborator

What about the objections from people who brought up that these APIs are used by some 3rd party projects?

@ericsnowcurrently
Copy link
Collaborator Author

I wasn't planning on dropping _Py_IDENTIFIER() or those APIs in python/cpython#30928.

@ericsnowcurrently
Copy link
Collaborator Author

Hmm, the PR summary isn't very clear about that. I'll fix that.

@ericsnowcurrently ericsnowcurrently changed the title Get rid of _Py_IDENTIFIER()? Stop using _Py_IDENTIFIER() in core CPython. Feb 1, 2022
@ericsnowcurrently
Copy link
Collaborator Author

@ericsnowcurrently
Copy link
Collaborator Author

I merged python/cpython#30928.

There are some follow-ups to take care of. See #279.

Repository owner moved this from In Progress to Done in Fancy CPython Board Feb 14, 2022
@ericsnowcurrently ericsnowcurrently added Meta: multi-core https://github.com/ericsnowcurrently/multi-core-python and removed investigate labels Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meta: multi-core https://github.com/ericsnowcurrently/multi-core-python
Projects
Development

No branches or pull requests

3 participants