Skip to content

dataclasses astuple and asdict crash on recursive dataclass structures / dont support deepcopy memo #94345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zzzeek opened this issue Jun 27, 2022 · 5 comments
Assignees
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-bug An unexpected behavior, bug, or error

Comments

@zzzeek
Copy link

zzzeek commented Jun 27, 2022

I don't see this mentioned anywhere and it seems a bit unusual, we can't use asdict() or astuple() with dataclasses that have cycles to each other. This is something that is handled by most other stdlib features such as deepcopy, dataclasses stringify, etc.

Example below

from __future__ import annotations

import dataclasses

# dataclasses support recursive structures

@dataclasses.dataclass
class A:
    b: list[B] = dataclasses.field(default_factory=list)

@dataclasses.dataclass
class B:
    a: A

# we can make a recursive structure
a1 = A()
b1 = B(a1)
a1.b.append(b1)

# stringify supports recursion
print(a1)
print(b1)


# asdict and astuple don't however, with no way to even work around
# it (like pass in a set to track already seen objects)

# recursion overflow
dataclasses.asdict(a1)

# same
dataclasses.astuple(a1)

output:

A(b=[B(a=...)])
B(a=A(b=[...]))
Traceback (most recent call last):
  File "/home/classic/dev/sqlalchemy/test4.py", line 29, in <module>
    dataclasses.asdict(a1)
  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1232, in asdict
    return _asdict_inner(obj, dict_factory)
  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1239, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1267, in _asdict_inner
    return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1267, in <genexpr>
    return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1239, in _asdict_inner
    value = _asdict_inner(getattr(obj, f.name), dict_factory)
  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1239, in _asdict_inner

...

  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1236, in _asdict_inner
    if _is_dataclass_instance(obj):
  File "/opt/python-3.10.0/lib/python3.10/dataclasses.py", line 1201, in _is_dataclass_instance
    return hasattr(type(obj), _FIELDS)
RecursionError: maximum recursion depth exceeded while calling a Python object

there seems to be no workaround as these two methods are pretty simple and don't accept any arguments like "check_recursion", IIUC there is also no directive on field() that would change this either?

@zzzeek zzzeek added the type-bug An unexpected behavior, bug, or error label Jun 27, 2022
@zzzeek
Copy link
Author

zzzeek commented Jun 27, 2022

as deepcopy itself supports a memo dict, and it looks like the functions here don't support that either, it seems adding a simple "memo" argument to asdict() and astuple() that is also passed to deepcopy() would be the most reasonable approach, no backwards compat issues, etc.

@zzzeek zzzeek changed the title dataclasses astuple and asdict crash on recursive dataclass structures dataclasses astuple and asdict crash on recursive dataclass structures / dont support deepcopy memo Jun 27, 2022
@AlexWaygood AlexWaygood added the stdlib Python modules in the Lib dir label Jun 27, 2022
@ericvsmith
Copy link
Member

I think adding memo is a reasonable approach. I haven't looked at what would be involved, or if passing it to deepcopy() would be enough. Patches welcomed!

@zzzeek
Copy link
Author

zzzeek commented Jun 28, 2022

passing it to deepcopy is part of it, but also you want to place every object received, including dataclass instances, in the dict as well, storing the "asdict" form. then you retrieve that object if present rather than recursively calling internal_asdict again (paraphrasing the names here).

@zzzeek
Copy link
Author

zzzeek commented Jun 28, 2022

so, this can't work. we create the dict or tuple all at once and we don't use any kind of mutation since we have a dict factory / tuple factory, not to mention you can't create a recursive tuple structure in any case as they are immutable. even if we changed asdict() to use an interim dictionary so that we can mutate them and make a recursive dict structure, there's still no way to pass them through the given dict_factory.

as far as passing memo to deepcopy(), deepcopy makes its own memo when invoked if not given, so it handles a recursive structure given, as long as the call to asdict() / astuple() is not itself already within a __deepcopy__() chain, which would be very unusual, so that part does not seem important, given that the current contract of the functions can't support recursion without some option to change their behavior.

so at the moment options seem to be:

  1. do nothing, users get a recursion overflow
  2. implement a memo dictionary but only for the purpose of detecting recursion overflow and raising a nicer error
  3. implement a memo dictionary for the purpose of either ignoring recursive keys, or placing some kind of token in the result.

Keeping in mind that it's looking like option 1 is likely here, for option 3 I could envision something like this:

def asdict(obj, *, dict_factory=dict, recursive_token=dataclasses.RAISE):
    # ...

def astuple(obj, *, tuple_factory=tuple, recursive_token=dataclasses.RAISE):
   # ...

where the default behavior on recursion detection is to raise an error (which would most easily mean, do nothing differently from what it is right now). If recursive_token is some other object, like any user defined object, then instead of raising, we return that token, so my structure above would look like:

{"b": [{"a": TOKEN}]}

concretely I don't know how useful this really is. In a more real world sense if I were using asdict() / astuple(), I'd probably want to be able to indicate within a field() that certain fields should not be considered for serialization. That part of it I assume starts feeling out of scope for what these functions intend.

@moi90
Copy link
Contributor

moi90 commented Mar 8, 2024

In my eyes, it would be really beneficial to see where an object contains itself. The default RecursionError is not really informative.

khaeru added a commit to iiasa/message-ix-models that referenced this issue Jan 29, 2025
- Use repr(ConfigHelper) in Context.asdict() to avoid RecursionError that occurs
  when context.transport.spec.add.set["technology"] elements have parent/child
  relationships.
- Handle Enum, Quantity, self in .util.cache.
- Log a verbose message on error.
- Only use ._util.dataclasses._asdict() on Python <3.13.
- Expand tests.
khaeru added a commit to iiasa/message-ix-models that referenced this issue Jan 29, 2025
- Use repr(ConfigHelper) in Context.asdict() to avoid RecursionError that occurs
  when context.transport.spec.add.set["technology"] elements have parent/child
  relationships.
- Handle Enum, Quantity, self in .util.cache.
- Log a verbose message on error.
- Only use ._util.dataclasses._asdict() on Python <3.13.
- Expand tests.
khaeru added a commit to iiasa/message-ix-models that referenced this issue Jan 31, 2025
- Use repr(ConfigHelper) in Context.asdict() to avoid RecursionError that occurs
  when context.transport.spec.add.set["technology"] elements have parent/child
  relationships.
- Handle Enum, Quantity, self in .util.cache.
- Log a verbose message on error.
- Only use ._util.dataclasses._asdict() on Python <3.13.
- Expand tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants