Goroutine Leak #614
Comments
Hey @antoniomika, what a nice investigation, thank you so much! I don't remember why I didn't put any…
I think we can write something like that:

```go
errorCacheCh := make(chan error)
defer close(errorCacheCh)
go func(vr *http.Request, cw *CustomWriter) {
	defer func() {
		if r := recover(); r != nil {
			s.Configuration.GetLogger().Infof("recovered due to closed errorCacheCh chan, the request context has finished prematurely %s", req.URL)
		}
	}()
	prometheus.Increment(prometheus.NoCachedResponseCounter)
	errorCacheCh <- s.Upstream(cw, vr, next, requestCc, cachedKey, uri)
}(req, customWriter)
```

souin/pkg/middleware/middleware.go, Lines 941 to 945 in 842d55d
WDYT?
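For readers following along: the snippet above relies on the fact that sending on a closed channel panics in Go, and uses `recover` to turn that panic into a no-op when the request context has already finished. A minimal standalone sketch of that behavior (`sendSafely` is an illustrative name, not souin code):

```go
package main

import "fmt"

// sendSafely tries to send err on ch. If ch was already closed (e.g. because a
// deferred close ran when the request finished), the send panics and recover
// converts the panic into a no-op, which is what the proposed fix relies on.
func sendSafely(ch chan error, err error) (recovered bool) {
	defer func() {
		if r := recover(); r != nil {
			recovered = true
		}
	}()
	ch <- err
	return false
}

func main() {
	ch := make(chan error, 1)
	fmt.Println(sendSafely(ch, nil)) // false: buffered send succeeds
	close(ch)
	fmt.Println(sendSafely(ch, nil)) // true: send on closed channel panicked and was recovered
}
```

One caveat of this approach: it lets the goroutine exit, but the upstream call inside it still runs to completion.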
@darkweak that should work, but shouldn't we be canceling the upstream request? The other reason we're looking into this is that we're a bit worried about souin causing a large number of upstream requests when the cache is cleared (or some other issue occurs). Again, I'm not too familiar with the code base, so I'm not sure if that's even something we should be worried about. It's hard for us to tell what's happening because we're using the middleware, so we wouldn't see an uptick in requests to our load balancer to check; we're trying to debug by checking ancillary services instead.
We should.
What do you mean? Do you want the cache to serve the content as stale?
Kk, I was thinking the upstream request would take a context that we could cancel to stop it. I imagine that'd be a larger refactor, though. I should have more time over the weekend to look into that.

As for the second issue, it's a bit more complicated to explain, but I'll try to include as much detail as I can. We use souin with the caching middleware (https://github.com/picosh/pico/blob/21ef05522522d6a53fb21bd6d07e2e0f831a8a8e/pkg/apps/pgs/web.go#L41). When a user uploads an item to pgs, we then clear the cache for that item (https://github.com/picosh/pico/blob/21ef05522522d6a53fb21bd6d07e2e0f831a8a8e/pkg/apps/pgs/web.go#L306). This all seems to work well, but when a large number of files are uploaded at once, we see a large spike in GET requests to our object store instance. This is semi-expected, since rsync will utilize range requests for checksumming, but the jump is much larger than we'd normally expect. The other thought I had was that souin triggers an upstream request for the cached item that we've purged, but I haven't been able to validate that. We have Prometheus enabled, so if there are any metrics you can think of for us to check, I'd be happy to.

This is all to say I don't really think this is a souin issue and more so a problem with how we've implemented things. I just wanted to cover my bases, so when I saw the goroutine leak, I figured it'd be a good time to look into it.
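One common mitigation for the post-purge request spike described above is request coalescing, so that concurrent misses for the same key produce a single upstream fetch. This is the pattern implemented by golang.org/x/sync/singleflight; the sketch below is a minimal stdlib-only version for illustration, not a description of how souin behaves today:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// group coalesces concurrent lookups for the same key into a single upstream
// fetch, in the style of golang.org/x/sync/singleflight.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

type call struct {
	wg  sync.WaitGroup
	val string
}

// do runs fn for key unless a call for key is already in flight, in which case
// it waits for that call's result instead of issuing a duplicate fetch.
func (g *group) do(key string, fn func() string) string {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // another goroutine is fetching; reuse its result
		return c.val
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val = fn() // only the leader hits the upstream
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val
}

func main() {
	var g group
	var fetches int32
	fetch := func() string {
		atomic.AddInt32(&fetches, 1)
		time.Sleep(50 * time.Millisecond) // simulated slow object-store GET
		return "body"
	}
	var wg sync.WaitGroup
	var ok int32
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if g.do("/index.html", fetch) == "body" {
				atomic.AddInt32(&ok, 1)
			}
		}()
	}
	wg.Wait()
	// All 10 callers got the body; fetches is typically 1, never more than 10.
	fmt.Println(atomic.LoadInt32(&ok) == 10)
}
```

With ten concurrent requests for a freshly purged key, a coalescing layer like this would issue one origin fetch instead of ten.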
Hello!
We're using souin over at pico (repo) for our caching system. Lately, we've been noticing goroutine leaks from souin over time. It doesn't look like it's terribly bad, but we're worried about implications of this long term.
For example, here's a graph from two different instances of pgs w/ souin middleware:
It does look like this is associated with some memory leak, but it's hard to tell by how much and I'd prefer to not throw in more confounders. Here's how memory looks though (in MBs). First resident memory:
And next virtual:
I went ahead and grabbed a dump of the goroutine stacks via a SIGABRT on the process. Those logs can be found here. I've traced it back to the middleware, specifically to the error channel here:
souin/pkg/middleware/middleware.go
Lines 942 to 971 in 842d55d
My best guess is that the request context completes before the upstream request finishes. This results in the error channel never being read and the sending goroutine blocking forever. We probably want a way to cancel the upstream request when the context resolves, plus a defer to close that channel. We'd need to pay special attention to what is sent if the channel is closed, though.
I'd provide a PR but I'm not super familiar with the souin codebase and if there are any issues posed with a change like this. Let me know if there's anything I can help you with though!