100000 assignments of .__sizeof__ cause a segfault on del #87053
In the following program 1, the method `__sizeof__()` is called and its result assigned repeatedly. The program works well on Python 3.10. However, if I change `__sizeof__()` to `__sizeof__`, a segmentation fault is reported. I think something is wrong in the parser when dealing with built-in attribute assignment.

program 1:

```python
mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__sizeof__()
print(mystr)
```

Output: works well as expected.

program 2:

```python
mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__sizeof__
print(mystr)
```

Expected output: no segfault.

---
I can reproduce the issue. The stack trace is several hundred thousand (!) levels deep.

```
#0 _Py_DECREF (op=<built-in method __sizeof__ of builtin_function_or_method object at remote 0x7fffe60703b0>, lineno=514, filename=0x6570af "./Include/object.h")
...
```

---
This is a recursion problem: `mystr` ends up equivalent to `'hello123'.__sizeof__.__sizeof__. ...(100K repetition)... .__sizeof__`. The dealloc of `mystr` causes recursive calls to `tp_dealloc` along the entire chain, and that can exhaust the C stack.

---
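A minimal sketch of how that chain forms: each fetch of `__sizeof__` allocates a fresh built-in method object whose `__self__` is the object it was fetched from, so repeated fetches link the objects into a chain.

```python
# Each attribute fetch creates a new built-in method object that keeps a
# reference (__self__) to the object it was fetched from, so repeated
# fetches form a linked chain back to the original string.
obj = "hello123"
m1 = obj.__sizeof__
m2 = m1.__sizeof__

assert m1.__self__ is obj   # first link points at the string
assert m2.__self__ is m1    # second link points at the first method object
```

Deleting the last link triggers the dealloc of the previous one, and so on down the whole chain.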
Xinmeng, to verify Ronald's explanation, run this instead:

```python
mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__sizeof__
input('>')  # Hit Enter to continue.
del mystr   # Expect crash here.
input('<')  # And never get here.
```

---
Thank you. But I am not sure this is a recursion problem. Please see the following example, where I replace `__sizeof__` with `__class__`. No segmentation fault; everything goes well.

```python
mystr = "hello123"
print(dir(mystr))
for x in range(1000000):
    mystr = mystr.__class__
print(mystr)
```

```python
mystr = "hello123"
for x in range(1000000):
    mystr = mystr.__class__
input('>')  # Hit Enter to continue.
del mystr   # Expect crash here.
input('<')  # And never get here.
```

---
Jumping in here to explain why `__class__` doesn't crash when `__sizeof__` does: when `__class__` is fetched, it returns a new reference to the object's type. When `__sizeof__` is fetched, on the other hand, a new method object is allocated on the heap and returned to the caller. This object also has a `__sizeof__` that does the same (as it's implemented on `object`). So yes, you are exhausting the C runtime stack by deallocating a MILLION objects. You can see this happen by watching the memory usage of Python steadily climb.

---
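The difference is easy to observe from Python (a small sketch of the point above):

```python
s = "hello123"

# __class__ returns the (already existing) type object, so the "chain"
# collapses to a single well-known object after one step.
assert s.__class__ is str
assert s.__class__.__class__ is type

# __sizeof__ allocates a fresh bound-method object on every access, so
# each assignment in the loop keeps the previous object alive as __self__.
assert s.__sizeof__ is not s.__sizeof__
```

With `__class__`, the loop just reassigns `mystr` to `str` and then `type` over and over; no chain of fresh heap objects is ever built.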
Note that there is a way to avoid this crash using the trashcan API (see the uses of `Py_TRASHCAN_BEGIN` in various implementations). This API is generally only used for recursive data structures because it has a performance cost (based on what I've read in other issues).

---
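As an illustration of what the trashcan mechanism protects against: containers such as `list` do use it in their deallocator, so an equally deep chain of nested lists can be deleted without exhausting the C stack.

```python
# Build a chain of one million nested lists, analogous to the chain of
# method objects above, then delete it. list's deallocator uses the
# trashcan mechanism, so teardown is effectively iterative and does not
# overflow the C stack.
chain = None
for _ in range(1000000):
    chain = [chain]
del chain            # no segfault here
survived = True      # only reached if the deletion completed safely
print("deleted without crashing")
```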
Yes, there is an overhead to using the trashcan mechanism. This is why it is only used in data collections, where it is expected that the data can contain arbitrarily long chains of links. There are many ways to create arbitrarily long chains with other objects, but it does not happen in common code. For methods the cost would be especially high, because method objects are usually short-lived and the performance of creating/destroying them is critical. AFAIK the same issue (maybe not with `__sizeof__`, but with another method of the basic object class, like `__reduce__`) was already reported earlier. I propose to close this issue as "won't fix".

---
Mark, would your proposal in PEP 651 fix this case?

---
It won't solve the problem. It might make the use of the trashcan cheaper, as it would only need to be used when stack space is running low. Ultimately, the cyclic GC needs to be iterative rather than recursive. That will take a *lot* of work, though.