Skip to content

Send errors metrics for 5xx response from API Gateway, Lambda Function URL, or ALB #229

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
5 changes: 5 additions & 0 deletions datadog_lambda/wrapper.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,7 @@ def __init__(self, func):
os.environ.get("DD_TRACE_MANAGED_SERVICES", "true").lower() == "true"
)
self.response = None
self.already_submitted_errors_metric_before = False

if self.extractor_env:
extractor_parts = self.extractor_env.rsplit(".", 1)
Expand Down Expand Up @@ -143,6 +144,7 @@ def __call__(self, event, context, **kwargs):
return self.response
except Exception:
submit_errors_metric(context)
self.already_submitted_errors_metric_before = True
if self.span:
self.span.set_traceback()
raise
Expand Down Expand Up @@ -190,6 +192,9 @@ def _after(self, event, context):
status_code = extract_http_status_code_tag(self.trigger_tags, self.response)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to the original conversation, but want to point out, in case of invocation fails, self.response would still hold the value from the previous good invocation, and you would end up emitting an error metric based on the previous invocation instead of the current one.

if status_code:
self.trigger_tags["http.status_code"] = status_code
if not self.already_submitted_errors_metric_before:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better way to solve this problem is to reset self.response to None in _before (or in _call before the try block). This way if the invocation fails, self.response would still be None and you won't have a status_code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, in your current implementation already_submitted_errors_metric_before never resets to False once set True. __init__ only gets called once on cold start. You need to reset it in _before (or in _call before the try block)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. self.response is already initialized to None here:

self.response = None
.

And when the exception happens, self.response will still be None. So, we actually do not need the flag (self. already_submitted_errors_metric_before ) to avoid sending errors metrics twice since the statusCode will be None.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's initialized to None, but it doesn't get reset to None between invocations. E.g., you have a good invocation, which would set self.response, then you have a failed invocation, self.response won't reset to None. extract_http_status_code_tag would extract the status code from the last good invocation instead of from the current failed invocation.

Even putting your use case aside, the state of self.response probably should be reset to None on before every invocation, keeping the invocation response value from the previous invocation in the current invocation is meaningless and can only lead to bugs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one context you missed here is that Lambda isn't truly stateless, a sandbox/container is reused for many invocations until it gets recycled by AWS. https://pfisterer.dev/posts/aws-lambda-container-reuse might be a good read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the post! Just want to be sure (since the post mentioned only about shared folder -- It's possible that a container is reused, which causes the /tmp folder to be shared between invocations.), does the singleton wrapper got reused between multiple invocations?
datadog_lambda_wrapper = _LambdaDecorator

I plan to reset self.response at the beginning of _before() anyway. Thanks for calling this out!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, a new container starts up on first invocation, your python app get loaded into memory, that's called cold start, then from there, the container will be reused for subsequent invocations, until AWS recycles it, or AWS needs to launch more containers to handle more requests (one lambda container can only handle one request/invocation at a time)

if len(status_code) == 3 and status_code.startswith("5"):
submit_errors_metric(context)
# Create a new dummy Datadog subsegment for function trigger tags so we
# can attach them to X-Ray spans when hybrid tracing is used
if self.trigger_tags:
Expand Down