♻️ avoid deepcopy of dict in validate_coerce #3946

Merged
merged 7 commits into from
Jun 7, 2023

Contributor

@jvdd jvdd commented Nov 3, 2022

This PR avoids the deepcopy in the validate_coerce method of _plotly_utils.basevalidators.BaseDataValidator.

Why I changed this code: I profiled plotly.py's graph construction time when adding large traces. Roughly 70% of the time was spent in this validate_coerce method, and roughly 33% of that was spent in the deepcopy (see below).

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2626                                               def validate_coerce(self, v, skip_invalid=False, _validate=True):
  2627         2         26.0     13.0      0.0          from plotly.basedatatypes import BaseTraceType
  2628                                           
  2629                                                   # Import Histogram2dcontour, this is the deprecated name of the
  2630                                                   # Histogram2dContour trace.
  2631         2        321.0    160.5      0.1          from plotly.graph_objs import Histogram2dcontour
  2632                                           
  2633         2          5.0      2.5      0.0          if v is None:
  2634         1          2.0      2.0      0.0              v = []
  2635                                                   else:
  2636         1          4.0      4.0      0.0              if not isinstance(v, (list, tuple)):
  2637                                                           v = [v]
  2638                                           
  2639         1          3.0      3.0      0.0              res = []
  2640         1          1.0      1.0      0.0              invalid_els = []
  2641         2          5.0      2.5      0.0              for v_el in v:
  2642                                           
  2643         1          3.0      3.0      0.0                  if isinstance(v_el, BaseTraceType):
  2644                                                               # Clone input traces
  2645         1     138436.0 138436.0     44.4                      v_el = v_el.to_plotly_json()
  2646                                           
  2647         1          5.0      5.0      0.0                  if isinstance(v_el, dict):
  2648         1      99891.0  99891.0     32.0                      v_copy = deepcopy(v_el)

I believe it is unnecessary overhead to first create a dict (with .to_plotly_json) and then make a deepcopy of that dict to possibly pop one key (even when v_el is a dict and not a BaseTraceType, this still seems like unnecessary overhead to me).
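To illustrate the point, here is a minimal sketch of the two ways to separate the "type" key from the remaining properties. It is not the code from this PR: the function names and the `default="scatter"` fallback are assumptions made up for the example.

```python
from copy import deepcopy


def split_type_deepcopy(v_el, default="scatter"):
    # Approach criticised above: copy every nested value just to pop one key.
    v_copy = deepcopy(v_el)
    trace_type = v_copy.pop("type", default)  # `default` is a hypothetical fallback
    return trace_type, v_copy


def split_type_shallow(v_el, default="scatter"):
    # Read the key without mutating the input, then build a new top-level dict
    # that shares the (possibly huge) values with the original.
    trace_type = v_el.get("type", default)
    props = {k: v for k, v in v_el.items() if k != "type"}
    return trace_type, props


data = list(range(1_000))
trace_dict = {"type": "bar", "x": data}
trace_type, props = split_type_shallow(trace_dict)
print(trace_type, "type" in trace_dict, props["x"] is data)  # bar True True
```

The shallow variant leaves the caller's dict untouched and never walks the large arrays, but the returned values are shared with the input, so downstream code must not mutate them in place.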

Would appreciate any remarks!

Perhaps it would also be interesting to try to avoid the .to_plotly_json call when v_el is a BaseTraceType, as this also takes a considerable amount of time for large traces?
-> @jonasvdd took a look at this and replaced the .to_plotly_json call (which calls deepcopy on self._props) with just self._props.
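A rough sketch of that trade-off, using a toy class rather than plotly's real BaseTraceType: `ToyTrace` and both helper functions are hypothetical, and only the `_props` attribute and the deep-copying `to_plotly_json` follow the description above.

```python
from copy import deepcopy


class ToyTrace:
    """Hypothetical stand-in for a trace object; not plotly's real class."""

    def __init__(self, **props):
        self._props = dict(props)

    def to_plotly_json(self):
        # Follows the behaviour described above: a full deep copy of the props.
        return deepcopy(self._props)


def props_via_json(trace):
    # Safe to mutate, but cost grows with the size of the trace data.
    return trace.to_plotly_json()


def props_direct(trace):
    # No copy at all: the caller receives the trace's own dict.
    return trace._props


trace = ToyTrace(x=list(range(1_000)))
print(props_via_json(trace) is trace._props)  # False
print(props_direct(trace) is trace._props)    # True
```

Reading `_props` directly is only safe if the consumer treats the dict as read-only, or copies just the top level before changing keys.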


What I profiled:

import numpy as np
import plotly.graph_objects as go  # run in IPython after `%load_ext line_profiler`
x = np.arange(5_000_000)
%lprun -f go.Figure.add_trace go.Figure().add_trace(go.Scatter(x=x, y=x))

Before & after this PR:

image


if "type" in v_copy:
    trace_type = v_copy.pop("type")
elif isinstance(v_el, Histogram2dcontour):
Contributor Author

@jvdd jvdd Nov 3, 2022

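I believe this check never hits: line 2647 already checks that v_el is a dict, and any BaseTraceType has been converted to a dict on line 2645, so v_el cannot be a Histogram2dcontour instance here.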
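A toy reproduction of that control flow, assuming the trace classes do not subclass dict. `ToyBaseTrace`, `ToyHistogram2dcontour` and `classify` are hypothetical names for illustration, not plotly code.

```python
class ToyBaseTrace:
    """Hypothetical trace base class (deliberately not a dict subclass)."""

    def __init__(self, **props):
        self._props = props

    def to_plotly_json(self):
        return dict(self._props)


class ToyHistogram2dcontour(ToyBaseTrace):
    pass


def classify(v_el):
    if isinstance(v_el, ToyBaseTrace):
        v_el = v_el.to_plotly_json()  # from here on v_el is a plain dict
    if isinstance(v_el, dict):
        if "type" in v_el:
            return "type key"
        elif isinstance(v_el, ToyHistogram2dcontour):
            return "deprecated-class branch"  # unreachable: v_el is a dict
        return "default"
    return "not a dict"


print(classify(ToyHistogram2dcontour(x=[1, 2])))  # default
```

Even when a deprecated-class instance is passed in, the middle branch is skipped, because the object has been replaced by its dict form before the check runs.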

@nicolaskruchten
Contributor

Thanks for this PR! Sorry I haven't had a chance to review it in detail yet, but the performance gains seem promising!

@jonasvdd
Contributor

jonasvdd commented Feb 8, 2023

Hi @nicolaskruchten,

No worries, we both know how busy times can get!

Cheers,
Jonas & Jeroen

@jvdd jvdd requested a review from alexcjohnson June 7, 2023 13:18
Collaborator

@alexcjohnson alexcjohnson left a comment

💃 Thanks very much @jvdd, and apologies for the delay!

@alexcjohnson alexcjohnson merged commit d1668b6 into plotly:master Jun 7, 2023