Skip to content

fix(profiling): fix potential StackCollector deadlock #4108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 17, 2022

Conversation

Kyle-Verhoog
Copy link
Member

@Kyle-Verhoog Kyle-Verhoog commented Aug 16, 2022

It is possible that while the profiling StackCollector is handling the
on_span_activate event that a gc occurs which could result in a weakref
finalize callback being called which then could activate a new span.
This sequence of events results in a deadlock as the lock the
StackCollector uses is not reentrant.

A regression test would be great but we cannot think of a tractable test
case which wouldn't be flaky.

Here's an example stacktrace of this happening:

[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/contrib/dbapi/__init__.py", line 91, in _trace_method
[18:50:46 UTC]   [pytest:unit]                     		s.set_tag(sql.ROWS, row_count)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/span.py", line 497, in __exit__
[18:50:46 UTC]   [pytest:unit]                     		self.finish()
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/span.py", line 239, in finish
[18:50:46 UTC]   [pytest:unit]                     		cb(self)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/tracer.py", line 703, in _on_span_finish
[18:50:46 UTC]   [pytest:unit]                     		active = self.current_span()
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/tracer.py", line 813, in current_span
[18:50:46 UTC]   [pytest:unit]                     		active = self.context_provider.active()
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/contrib/gevent/provider.py", line 41, in active
[18:50:46 UTC]   [pytest:unit]                     		return self._update_active(ctx)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/provider.py", line 103, in _update_active
[18:50:46 UTC]   [pytest:unit]                     		self.activate(new_active)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/contrib/gevent/provider.py", line 34, in activate
[18:50:46 UTC]   [pytest:unit]                     		super(GeventContextProvider, self).activate(context)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/provider.py", line 45, in activate
[18:50:46 UTC]   [pytest:unit]                     		self._hooks.emit(self.activate, ctx)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/_hooks.py", line 131, in emit
[18:50:46 UTC]   [pytest:unit]                     		func(*args, **kwargs)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 503, in <lambda>
[18:50:46 UTC]   [pytest:unit]                     		and _finalize_fairy(None, rec, pool, ref, echo),
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
[18:50:46 UTC]   [pytest:unit]                     		fairy._reset(pool)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
[18:50:46 UTC]   [pytest:unit]                     		pool._dialect.do_rollback(self)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 500, in do_rollback
[18:50:46 UTC]   [pytest:unit]                     		dbapi_connection.rollback()
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/contrib/dbapi/__init__.py", line 261, in rollback
[18:50:46 UTC]   [pytest:unit]                     		return self._trace_method(self.__wrapped__.rollback, span_name, {}, *args, **kwargs)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/contrib/dbapi/__init__.py", line 242, in _trace_method
[18:50:46 UTC]   [pytest:unit]                     		with pin.tracer.trace(name, service=ext_service(pin, self._self_config)) as s:
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/tracer.py", line 782, in trace
[18:50:46 UTC]   [pytest:unit]                     		activate=True,
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/tracer.py", line 685, in _start_span
[18:50:46 UTC]   [pytest:unit]                     		self.context_provider.activate(span)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/contrib/gevent/provider.py", line 34, in activate
[18:50:46 UTC]   [pytest:unit]                     		super(GeventContextProvider, self).activate(context)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/provider.py", line 45, in activate
[18:50:46 UTC]   [pytest:unit]                     		self._hooks.emit(self.activate, ctx)
[18:50:46 UTC]   [pytest:unit]                     	File "/opt/dogweb/lib/python2.7/site-packages/ddtrace/_hooks.py", line 131, in emit
[18:50:46 UTC]   [pytest:unit]                     		func(*args, **kwargs)

@Kyle-Verhoog Kyle-Verhoog changed the title fix(profiling): fix possible deadlock fix(profiling): fix potential StackCollector deadlock Aug 16, 2022
@Kyle-Verhoog Kyle-Verhoog force-pushed the kylev/fix-profiling-deadlock branch from 23004ed to aa66610 Compare August 16, 2022 21:28
@Kyle-Verhoog Kyle-Verhoog requested a review from jd August 17, 2022 14:30
@jd
Copy link
Contributor

jd commented Aug 17, 2022

Would you have an example traceback to include? That'd make a good illustration for the commit at least.
Otherwise LGTM.

@Kyle-Verhoog
Copy link
Member Author

@jd added!

@brettlangdon brettlangdon marked this pull request as ready for review August 17, 2022 16:00
@brettlangdon brettlangdon requested a review from a team as a code owner August 17, 2022 16:00
brettlangdon
brettlangdon previously approved these changes Aug 17, 2022
jd
jd previously approved these changes Aug 17, 2022
@jd
Copy link
Contributor

jd commented Aug 17, 2022

@Mergifyio backport 1.4 1.3 0.x 0.61

@mergify
Copy link
Contributor

mergify bot commented Aug 17, 2022

backport 1.4 1.3 0.x 0.61

✅ Backports have been created

@Kyle-Verhoog Kyle-Verhoog dismissed stale reviews from jd and brettlangdon via 78c35be August 17, 2022 17:18
@Kyle-Verhoog Kyle-Verhoog force-pushed the kylev/fix-profiling-deadlock branch from 6ca0b9f to 78c35be Compare August 17, 2022 17:18
It is possible that while the profiling StackCollector is handling the
on_span_activate event that a gc occurs which could result in a weakref
finalize callback being called which then could activate a new span.
This sequence of events results in a deadlock as the lock the
StackCollector uses is not reentrant.

A regression test would be great but we cannot think of a tractable test
case which wouldn't be flaky.
@Kyle-Verhoog Kyle-Verhoog force-pushed the kylev/fix-profiling-deadlock branch from 78c35be to c5708ff Compare August 17, 2022 17:18
@Kyle-Verhoog Kyle-Verhoog merged commit b0967c4 into 1.x Aug 17, 2022
@Kyle-Verhoog Kyle-Verhoog deleted the kylev/fix-profiling-deadlock branch August 17, 2022 18:08
@github-actions github-actions bot added this to the v1.5.0 milestone Aug 17, 2022
mergify bot pushed a commit that referenced this pull request Aug 17, 2022
It is possible that while the profiling StackCollector is handling the
on_span_activate event that a gc occurs which could result in a weakref
finalize callback being called which then could activate a new span.
This sequence of events results in a deadlock as the lock the
StackCollector uses is not reentrant.

A regression test would be great but we cannot think of a tractable test
case which wouldn't be flaky.

(cherry picked from commit b0967c4)
mergify bot pushed a commit that referenced this pull request Aug 17, 2022
It is possible that while the profiling StackCollector is handling the
on_span_activate event that a gc occurs which could result in a weakref
finalize callback being called which then could activate a new span.
This sequence of events results in a deadlock as the lock the
StackCollector uses is not reentrant.

A regression test would be great but we cannot think of a tractable test
case which wouldn't be flaky.

(cherry picked from commit b0967c4)
mergify bot pushed a commit that referenced this pull request Aug 17, 2022
It is possible that while the profiling StackCollector is handling the
on_span_activate event that a gc occurs which could result in a weakref
finalize callback being called which then could activate a new span.
This sequence of events results in a deadlock as the lock the
StackCollector uses is not reentrant.

A regression test would be great but we cannot think of a tractable test
case which wouldn't be flaky.

(cherry picked from commit b0967c4)
mergify bot pushed a commit that referenced this pull request Aug 17, 2022
It is possible that while the profiling StackCollector is handling the
on_span_activate event that a gc occurs which could result in a weakref
finalize callback being called which then could activate a new span.
This sequence of events results in a deadlock as the lock the
StackCollector uses is not reentrant.

A regression test would be great but we cannot think of a tractable test
case which wouldn't be flaky.

(cherry picked from commit b0967c4)
mergify bot added a commit that referenced this pull request Aug 18, 2022
#4115)

This is an automatic backport of pull request #4108 done by [Mergify](https://mergify.com).


---


<details>
<summary>Mergify commands and options</summary>

<br />

More conditions and actions can be found in the [documentation](https://docs.mergify.com/).

You can also trigger Mergify actions by commenting on this pull request:

- `@Mergifyio refresh` will re-evaluate the rules
- `@Mergifyio rebase` will rebase this PR on its base branch
- `@Mergifyio update` will merge the base branch into this PR
- `@Mergifyio backport <destination>` will backport this PR on `<destination>` branch

Additionally, on Mergify [dashboard](https://dashboard.mergify.com/) you can:

- look at your merge queues
- generate the Mergify configuration with the config editor.

Finally, you can contact us on https://mergify.com
</details>
mergify bot pushed a commit that referenced this pull request Aug 25, 2022
## Description

#4108 replaced an internal use of the original `threading.Lock` function rather than the `gevent.threading.Lock` with the original `threading.RLock` function. However, at least in Python 2, there is an indirection with the implementation of `threading._RLock` class where it the `threading._allocate_lock` called is in fact `gevent.threading._allocate_lock`. 

When the gevent patched lock is used instead, we can encounter deadlocks or even gevent exceptions like the following:

```
ddtrace.profiling.collector.stack._ThreadSpanLinks.get_active_span_from_thread_id
  File "ddtrace/profiling/_threading.pyx", line 112, in ddtrace.profiling._threading._ThreadLink.get_object
  File "/usr/local/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
  File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
  File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
  File "/usr/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fe2bda052d0 epoll pending=0 ref=0 fileno=267>)
```

The added test for the issue: https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/20715/workflows/602ec8d6-d8ba-4191-ada8-0da76adbdc6b/jobs/1405224

## Checklist
- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.

## Reviewer Checklist
- [x] Title is accurate.
- [x] Description motivates each change.
- [x] No unnecessary changes were introduced in this PR.
- [x] PR cannot be broken up into smaller PRs.
- [x] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [x] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.
mergify bot pushed a commit that referenced this pull request Aug 25, 2022
## Description

#4108 replaced an internal use of the original `threading.Lock` function rather than the `gevent.threading.Lock` with the original `threading.RLock` function. However, at least in Python 2, there is an indirection with the implementation of `threading._RLock` class where it the `threading._allocate_lock` called is in fact `gevent.threading._allocate_lock`.

When the gevent patched lock is used instead, we can encounter deadlocks or even gevent exceptions like the following:

```
ddtrace.profiling.collector.stack._ThreadSpanLinks.get_active_span_from_thread_id
  File "ddtrace/profiling/_threading.pyx", line 112, in ddtrace.profiling._threading._ThreadLink.get_object
  File "/usr/local/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
  File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
  File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
  File "/usr/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fe2bda052d0 epoll pending=0 ref=0 fileno=267>)
```

The added test for the issue: https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/20715/workflows/602ec8d6-d8ba-4191-ada8-0da76adbdc6b/jobs/1405224

## Checklist
- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.

## Reviewer Checklist
- [x] Title is accurate.
- [x] Description motivates each change.
- [x] No unnecessary changes were introduced in this PR.
- [x] PR cannot be broken up into smaller PRs.
- [x] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [x] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.

(cherry picked from commit b0d8765)
mergify bot pushed a commit that referenced this pull request Aug 25, 2022
## Description

#4108 replaced an internal use of the original `threading.Lock` function rather than the `gevent.threading.Lock` with the original `threading.RLock` function. However, at least in Python 2, there is an indirection with the implementation of `threading._RLock` class where it the `threading._allocate_lock` called is in fact `gevent.threading._allocate_lock`.

When the gevent patched lock is used instead, we can encounter deadlocks or even gevent exceptions like the following:

```
ddtrace.profiling.collector.stack._ThreadSpanLinks.get_active_span_from_thread_id
  File "ddtrace/profiling/_threading.pyx", line 112, in ddtrace.profiling._threading._ThreadLink.get_object
  File "/usr/local/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
  File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
  File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
  File "/usr/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fe2bda052d0 epoll pending=0 ref=0 fileno=267>)
```

The added test for the issue: https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/20715/workflows/602ec8d6-d8ba-4191-ada8-0da76adbdc6b/jobs/1405224

## Checklist
- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.

## Reviewer Checklist
- [x] Title is accurate.
- [x] Description motivates each change.
- [x] No unnecessary changes were introduced in this PR.
- [x] PR cannot be broken up into smaller PRs.
- [x] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [x] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.

(cherry picked from commit b0d8765)
mergify bot pushed a commit that referenced this pull request Aug 25, 2022
## Description

#4108 replaced an internal use of the original `threading.Lock` function rather than the `gevent.threading.Lock` with the original `threading.RLock` function. However, at least in Python 2, there is an indirection with the implementation of `threading._RLock` class where it the `threading._allocate_lock` called is in fact `gevent.threading._allocate_lock`.

When the gevent patched lock is used instead, we can encounter deadlocks or even gevent exceptions like the following:

```
ddtrace.profiling.collector.stack._ThreadSpanLinks.get_active_span_from_thread_id
  File "ddtrace/profiling/_threading.pyx", line 112, in ddtrace.profiling._threading._ThreadLink.get_object
  File "/usr/local/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
  File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
  File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
  File "/usr/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fe2bda052d0 epoll pending=0 ref=0 fileno=267>)
```

The added test for the issue: https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/20715/workflows/602ec8d6-d8ba-4191-ada8-0da76adbdc6b/jobs/1405224

## Checklist
- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.

## Reviewer Checklist
- [x] Title is accurate.
- [x] Description motivates each change.
- [x] No unnecessary changes were introduced in this PR.
- [x] PR cannot be broken up into smaller PRs.
- [x] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [x] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.

(cherry picked from commit b0d8765)
mergify bot pushed a commit that referenced this pull request Aug 25, 2022
## Description

#4108 replaced an internal use of the original `threading.Lock` function rather than the `gevent.threading.Lock` with the original `threading.RLock` function. However, at least in Python 2, there is an indirection with the implementation of `threading._RLock` class where it the `threading._allocate_lock` called is in fact `gevent.threading._allocate_lock`.

When the gevent patched lock is used instead, we can encounter deadlocks or even gevent exceptions like the following:

```
ddtrace.profiling.collector.stack._ThreadSpanLinks.get_active_span_from_thread_id
  File "ddtrace/profiling/_threading.pyx", line 112, in ddtrace.profiling._threading._ThreadLink.get_object
  File "/usr/local/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
  File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
  File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
  File "/usr/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fe2bda052d0 epoll pending=0 ref=0 fileno=267>)
```

The added test for the issue: https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/20715/workflows/602ec8d6-d8ba-4191-ada8-0da76adbdc6b/jobs/1405224

## Checklist
- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.

## Reviewer Checklist
- [x] Title is accurate.
- [x] Description motivates each change.
- [x] No unnecessary changes were introduced in this PR.
- [x] PR cannot be broken up into smaller PRs.
- [x] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [x] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.

(cherry picked from commit b0d8765)

# Conflicts:
#	ddtrace/internal/nogevent.py
Kyle-Verhoog added a commit that referenced this pull request Sep 1, 2022
It is possible that while the profiling StackCollector is handling the
on_span_activate event that a gc occurs which could result in a weakref
finalize callback being called which then could activate a new span.
This sequence of events results in a deadlock as the lock the
StackCollector uses is not reentrant.

A regression test would be great but we cannot think of a tractable test
case which wouldn't be flaky.

(cherry picked from commit b0967c4)

Co-authored-by: Kyle Verhoog <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Kyle-Verhoog pushed a commit that referenced this pull request Sep 6, 2022
## Description

#4108 replaced an internal use of the original `threading.Lock` function rather than the `gevent.threading.Lock` with the original `threading.RLock` function. However, at least in Python 2, there is an indirection with the implementation of `threading._RLock` class where it the `threading._allocate_lock` called is in fact `gevent.threading._allocate_lock`.

When the gevent patched lock is used instead, we can encounter deadlocks or even gevent exceptions like the following:

```
ddtrace.profiling.collector.stack._ThreadSpanLinks.get_active_span_from_thread_id
  File "ddtrace/profiling/_threading.pyx", line 112, in ddtrace.profiling._threading._ThreadLink.get_object
  File "/usr/local/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
  File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
  File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
  File "/usr/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fe2bda052d0 epoll pending=0 ref=0 fileno=267>)
```

The added test for the issue: https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/20715/workflows/602ec8d6-d8ba-4191-ada8-0da76adbdc6b/jobs/1405224

(cherry picked from commit b0d8765)

Co-authored-by: Tahir H. Butt <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Kyle-Verhoog added a commit that referenced this pull request Sep 8, 2022
## Description

#4108 replaced an internal use of the original `threading.Lock` function rather than the `gevent.threading.Lock` with the original `threading.RLock` function. However, at least in Python 2, there is an indirection with the implementation of `threading._RLock` class where it the `threading._allocate_lock` called is in fact `gevent.threading._allocate_lock`.

When the gevent patched lock is used instead, we can encounter deadlocks or even gevent exceptions like the following:

```
ddtrace.profiling.collector.stack._ThreadSpanLinks.get_active_span_from_thread_id
  File "ddtrace/profiling/_threading.pyx", line 112, in ddtrace.profiling._threading._ThreadLink.get_object
  File "/usr/local/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "src/gevent/_semaphore.py", line 198, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4541)
  File "src/gevent/_semaphore.py", line 226, in gevent._semaphore.Semaphore.acquire (src/gevent/gevent._semaphore.c:4367)
  File "src/gevent/_semaphore.py", line 166, in gevent._semaphore.Semaphore._do_wait (src/gevent/gevent._semaphore.c:3562)
  File "/usr/local/lib/python2.7/site-packages/gevent/hub.py", line 630, in switch
    return RawGreenlet.switch(self)
gevent.hub.LoopExit: ('This operation would block forever', <Hub at 0x7fe2bda052d0 epoll pending=0 ref=0 fileno=267>)
```

The added test for the issue: https://app.circleci.com/pipelines/github/DataDog/dd-trace-py/20715/workflows/602ec8d6-d8ba-4191-ada8-0da76adbdc6b/jobs/1405224

## Checklist
- [x] Title must conform to [conventional commit](https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional).
- [x] Add additional sections for `feat` and `fix` pull requests.
- [x] Ensure tests are passing for affected code.
- [x] [Library documentation](https://github.com/DataDog/dd-trace-py/tree/1.x/docs) and/or [Datadog's documentation site](https://github.com/DataDog/documentation/) is updated. Link to doc PR in description.

## Reviewer Checklist
- [x] Title is accurate.
- [x] Description motivates each change.
- [x] No unnecessary changes were introduced in this PR.
- [x] PR cannot be broken up into smaller PRs.
- [x] Avoid breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Tests provided or description of manual testing performed is included in the code or PR.
- [x] Release note has been added for fixes and features, or else `changelog/no-changelog` label added.
- [x] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.

(cherry picked from commit b0d8765)

# Conflicts:
#	ddtrace/internal/nogevent.py

Co-authored-by: Tahir H. Butt <[email protected]>
Co-authored-by: Kyle Verhoog <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants