Skip to content

fix(rcm): update the forking behavior in remote config [backport 2.0] #7611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 16, 2023

Conversation

avara1986
Copy link
Member

@avara1986 avara1986 commented Nov 15, 2023

Backport 1da4fc4 from #7548 to 2.0

Context

The Remote Configuration Publisher/Subscriber System #5464 restarts all pubsub instances when the application forks (for example, gunicorn workers, uwsgi workers, etc.).

image

Dynamic Instrumentation needs to update the pubsub instance at this point because the probe mechanism should run in the child process. For that, DI needs the callback as the method of an instance of Debugger, which lives in the child process.

https://github.com/DataDog/dd-trace-py/blob/2.x/ddtrace/debugging/_debugger.py#L276

Problem description

When the application forks and restarts the subscribers, this happens before the debugger updates its callback instance.

10348 starting subscribers
10348 restarting the debugger
10348 register callback 4429382528 <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> 4406068560 shared_data 4398747792

This results in the registration of the callback in the child processes but the execution of callbacks using the instance of the parent process.

94621 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4363974784 Debugger._on_configuration id: 4360362576
94638 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id: 4364006016
94639 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id: 4364006016
94640 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id: 4364006016
  • Child processes: exec callbacks.
94621 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576
94638 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576
94639 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576
94640 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576

As a result, we're registering callback id 4364006016 but calling 4360362576 (which is the instance of the parent process).

PR Description

This PR removes the restart of publisher-subscriber instances in Remote Configuration at fork and delegates it to the products that have integration with Remote Config, namely AppSec and Dynamic Instrumentation.

Checklist

  • Change(s) are motivated and described in the PR description.
  • Testing strategy is described if automated tests are not included in the PR.
  • Risk is outlined (performance impact, potential for breakage, maintainability, etc).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Library release note guidelines are followed. If no release note is required, add label changelog/no-changelog.
  • Documentation is included (in-code, generated user docs, public corp docs).
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Title is accurate.
  • No unnecessary changes are introduced.
  • Description motivates each change.
  • Avoids breaking API changes unless absolutely necessary.
  • Testing strategy adequately addresses listed risk(s).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Release note makes sense to a user of the library.
  • Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy
  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

The Remote Configuration Publisher/Subscriber System
#5464 restarts all pubsub
instances when the application forks (for example, [gunicorn
workers](https://docs.gunicorn.org/en/stable/design.html#server-model),
uwsgi workers, etc.).

![image](https://github.com/DataDog/dd-trace-py/assets/6352942/774fe838-5374-4edc-82c6-51cc563fe66e)

Dynamic Instrumentation needs to update the pubsub instance at this
point because the probe mechanism should run in the child process. For
that, DI needs the callback as the method of an instance of Debugger,
which lives in the child process.

https://github.com/DataDog/dd-trace-py/blob/2.x/ddtrace/debugging/_debugger.py#L276

When the application forks and restarts the subscribers, this happens
before the debugger updates its callback instance.

```
10348 starting subscribers
10348 restarting the debugger
10348 register callback 4429382528 <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> 4406068560 shared_data 4398747792
```

This results in the registration of the callback in the child processes
but the execution of callbacks using the instance of the parent process.

* Parent process: register a callback
[on_configuration](https://github.com/DataDog/dd-trace-py/blob/2.x/ddtrace/debugging/_debugger.py#L654)
for DI

```
94621 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4363974784 Debugger._on_configuration id: 4360362576
```

* Child processes: register callbacks
[on_configuration](https://github.com/DataDog/dd-trace-py/blob/2.x/ddtrace/debugging/_debugger.py#L654)
for DI

```
94638 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id: 4364006016
94639 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id: 4364006016
94640 register callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.STOPPED: 'stopped'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id: 4364006016
```

* Child processes: exec callbacks.
```
94621 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576
94638 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576
94639 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576
94640 _exec_callback <bound method Debugger._on_configuration of Debugger(status=<ServiceStatus.RUNNING: 'running'>)> PubSub Isntance id: 4392569856 Debugger._on_configuration id:  4360362576
```

As a result, we're registering callback id **4364006016** but calling
**4360362576** (which is the instance of the parent process).

This PR removes the restart of publisher-subscriber instances in Remote
Configuration at fork and delegates it to the products that have
integration with Remote Config, namely AppSec and Dynamic
Instrumentation.

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

---------

Co-authored-by: Gabriele N. Tornetta <[email protected]>
(cherry picked from commit 1da4fc4)
@avara1986 avara1986 added ASM Application Security Monitoring Dynamic Instrumentation Dynamic Instrumentation/Live Debugger labels Nov 15, 2023
@avara1986 avara1986 marked this pull request as ready for review November 15, 2023 15:55
@avara1986 avara1986 requested review from a team as code owners November 15, 2023 15:55
@avara1986 avara1986 enabled auto-merge (squash) November 15, 2023 15:55
@avara1986 avara1986 merged commit 1fe678b into 2.0 Nov 16, 2023
@avara1986 avara1986 deleted the backport-7548-to-2.0 branch November 16, 2023 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ASM Application Security Monitoring Dynamic Instrumentation Dynamic Instrumentation/Live Debugger
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants