Skip to content

High CPU usage for skylightd 0.7.0.beta.2 #42

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
fotinakis opened this issue Jul 28, 2015 · 9 comments
Closed

High CPU usage for skylightd 0.7.0.beta.2 #42

fotinakis opened this issue Jul 28, 2015 · 9 comments

Comments

@fotinakis
Copy link

Hello friends,

I upgraded to try out the 0.7.0.beta.2 and ended up rolling back because skylightd would run at 100% CPU, though it still seemed to be recording data. Thought I would drop a note here to tell you some specs so you might be able to debug:

  • ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux-gnu]
  • in a docker container

screen shot 2015-07-27 at 9 42 36 pm

Running strace against skylightd just showed a deluge of this:

epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 195803598}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 195909570}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 195996671}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 196104675}) = 0
epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 296288855}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 296339991}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 296376051}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 296413528}) = 0
epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 396693892}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 396813606}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 396902756}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 396975309}) = 0
epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 497161485}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 497194028}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
stat("/home/app/src/tmp/skylight.pid", {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 497402379}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 497441887}) = 0
epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 597712944}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 597846767}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 597939427}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 598014700}) = 0
epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 698317108}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 698430898}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 698531066}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 698597138}) = 0
epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 799079490}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 799195151}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 799278846}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 799344487}) = 0
epoll_wait(5, {}, 1024, 100)            = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 899601619}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 899695690}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 899787062}) = 0
clock_gettime(CLOCK_MONOTONIC, {16593, 899849655}) = 0

@carllerche
Copy link
Contributor

Thanks for the report. I am on it 👍

@carllerche
Copy link
Contributor

I've tracked it down to a bug in Rust's Condvar::wait_timeout_ms I am working w/ them to figure this out.

@fotinakis
Copy link
Author

Nice! 🎱

@carllerche
Copy link
Contributor

Waiting for rust-lang/rust#27373 to hit the nightlies and then I can push out another beta.

@carllerche
Copy link
Contributor

I just pushed 0.7.0-beta.3 which should fix this issue. Let me know if it does for you! Thanks.

@lukeasrodgers
Copy link

We just deployed 0.7.0 and got massive CPU spikes.

Sample strace from skylightd:

[pid 11857] gettimeofday({1439395772, 145526}, NULL) = 0
[pid 11857] futex(0x7f848b0590f0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11857] futex(0x7f848b02c2c4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 238093079, {1439395772, 391526000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 242223209}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 242334330}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 242446974}) = 0
[pid 11857] gettimeofday({1439395772, 146224}, NULL) = 0
[pid 11857] futex(0x7f848b0590f0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11857] futex(0x7f848b02c2c4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 238093081, {1439395772, 392224000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 242911266}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 243024690}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 243145028}) = 0
[pid 11857] gettimeofday({1439395772, 146931}, NULL) = 0
[pid 11857] futex(0x7f848b0590f0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11857] futex(0x7f848b02c2c4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 238093083, {1439395772, 392931000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 243672050}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 243790545}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 243900707}) = 0
[pid 11857] gettimeofday({1439395772, 147698}, NULL) = 0
[pid 11857] futex(0x7f848b0590f0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11857] futex(0x7f848b02c2c4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 238093085, {1439395772, 393698000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 244392414}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 244512133}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 244629721}) = 0
[pid 11857] gettimeofday({1439395772, 148415}, NULL) = 0
[pid 11857] futex(0x7f848b0590f0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11857] futex(0x7f848b02c2c4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 238093087, {1439395772, 394415000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 245113848}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 245224620}) = 0
[pid 11857] clock_gettime(CLOCK_MONOTONIC, {8334098, 245338117}) = 0

I would have tried to collect more info but we had to revert to 0.6 and redeploy because our app was becoming highly unresponsive.

EDIT

some more info:

ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux-gnu]
3.2.0-27-virtual #43-Ubuntu SMP Fri Jul 6 14:45:58 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
rails 3.2.22

@halorgium
Copy link
Contributor

@lukeasrodgers could you contact us via Intercom? Make sure you are running at least 0.7.1.
If you are, we can follow up privately where we can gather more information about your account.

@lukeasrodgers
Copy link

@halorgium I would be happy to, but not sure when I'm going to get around to trying 0.7.1 -- not high priority, and also I'm not super jazzed about the potentially deleterious effects on our users. I will try to replicate in a non-production environment when I get some time. Also: what is Intercom?

@halorgium
Copy link
Contributor

@lukeasrodgers understood, we'd also love to get to the bottom of the problem.
We use Intercom for all of our customer questions and bug reports: https://docs.skylight.io/filing-bugs/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants