Limit CI runs to 4 threads #309
.github/workflows/ci.yml
@@ -60,7 +60,7 @@ jobs:
       - name: Run api tests
         run: cargo test -p bootloader_api
       - name: Run integration tests
-        run: cargo test
+        run: cargo test -- --test-threads=4
Shouldn't it already be limited to the amount of cores by default?
It seems that by default, Rust gets the available "parallelism" from the OS: https://github.com/rust-lang/rust/blob/master/library/test/src/helpers/concurrency.rs
Linux VMs get 2 cores, so thread::available_parallelism() should return 2.
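For reference, here is a minimal sketch of how a default thread count like this could be chosen. It loosely follows my reading of the linked concurrency.rs (an explicit override wins, otherwise ask the OS, otherwise fall back to 1), but it is not a copy of it. The function name pick_test_threads and the override_value parameter are hypothetical; the parameter stands in for the RUST_TEST_THREADS environment variable so the logic can be exercised without touching the process environment. Unlike the real test harness, this sketch returns an error instead of panicking on a bad override.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Picks a test-thread count (hypothetical helper, for illustration only).
///
/// `override_value` plays the role of the RUST_TEST_THREADS variable:
/// if present it must parse as a positive integer. Without an override,
/// the count comes from `thread::available_parallelism()`, falling back
/// to 1 when the OS cannot report a value.
pub fn pick_test_threads(override_value: Option<&str>) -> Result<usize, String> {
    match override_value {
        // An explicit setting always wins over the detected value.
        Some(raw) => raw
            .parse::<NonZeroUsize>()
            .map(NonZeroUsize::get)
            .map_err(|_| format!("`{raw}` is not a positive integer")),
        // No override: use whatever parallelism the OS reports.
        None => Ok(thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1)),
    }
}
```

Calling pick_test_threads(None) on a runner and printing the result would be one way to check what the OS actually reports there; the value is an estimate of usable parallelism and need not match the advertised core count.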
It doesn't seem to be, unfortunately. Another option would be to get the list of VM tests and run them with a matrix instead. I can write the code to generate the matrix for GitHub Actions.
But looking at the logs, it appeared it was running at least 4 tests in parallel.
And on top of that, oversubscription here wouldn't be terrible as long as it's limited. The tests spend a good amount of time doing I/O; the CPU-intensive part is relatively quick.
Perhaps the issue was simply a slow action runner.
Maybe, but 20 minutes without any output should never happen, even on slow machines. So my guess is that some panic occurred in the above case.
Regarding the general slowness of the Windows tests: Is QEMU really expected to be so slow on Windows? Maybe the issue is that we're running multiple threads at the same time. Could you try to update this PR to --test-threads=1 to see whether this improves things?
One thing that we should definitely do is to increase the timeout for the job, e.g. to timeout-minutes: 60.
I think a better solution here is to generate a matrix from the list of tests, and run them all separately. This has the added benefit of quickly seeing which test failed, and makes the logs easier to read.
I'm not sure about this approach. It makes it easy to accidentally forget some tests (e.g. when a new test is added) and it spams the check list even more. Also, the number of free concurrent jobs is limited anyway, I think to 20 per organization. So running each test as a separate job will probably exceed this limit and lead to wait times, so we would not gain much.
Ah, no, I mean automating it, by building the matrix dynamically from the output of cargo test -- --list.
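To make the idea concrete, here is a rough sketch of the parsing step, not a finished generator. It assumes the listing prints one line per test in the form `<name>: test`, followed by a summary line, and it turns those names into a small JSON object that a workflow matrix could consume. The function names parse_test_list and to_matrix_json and the "test" key are hypothetical, and the JSON escaping only covers quotes and backslashes.

```rust
/// Extracts test names from the text printed by `cargo test -- --list`.
/// Assumption: each test shows up as a line of the form `<name>: test`;
/// anything else (blank lines, the summary, benchmarks) is ignored.
pub fn parse_test_list(output: &str) -> Vec<String> {
    output
        .lines()
        .filter_map(|line| line.trim_end().strip_suffix(": test"))
        .map(str::to_owned)
        .collect()
}

/// Renders the names as a JSON object such as {"test":["a","b"]}.
/// Only backslashes and double quotes are escaped, which is enough
/// for ordinary Rust test paths but not for arbitrary strings.
pub fn to_matrix_json(tests: &[String]) -> String {
    let items: Vec<String> = tests
        .iter()
        .map(|t| format!("\"{}\"", t.replace('\\', "\\\\").replace('"', "\\\"")))
        .collect();
    format!("{{\"test\":[{}]}}", items.join(","))
}
```

Because the list is regenerated on every run, a newly added test would be picked up without anyone editing the workflow by hand.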
But the rest makes sense. I'll try a single thread and an increased timeout and see how we do. :)
(we can also easily limit concurrency with the matrix too)
Regarding the general slowness of the Windows tests: Is QEMU really expected to be so slow on Windows? Maybe the issue is that we're running multiple threads at the same time. Could you try to update this PR to --test-threads=1 to see whether this improves things?
Added. Building as I write this.
One thing that we should definitely do is to increase the timeout for job, e.g. to timeout-minutes: 60.
Also added.
Thanks for the update! Limiting the test runner to a single thread worked quite well on the first try: the integration tests were done in 8 minutes on Windows: https://github.com/rust-osdev/bootloader/actions/runs/3842801795/jobs/6544467654 I restarted the job for good measure, but unfortunately it hung again on the second try: https://github.com/rust-osdev/bootloader/actions/runs/3842801795/jobs/6550043485 . So it looks like something really does go wrong sometimes, resulting in an endlessly running test. I think the best path forward is to finish #314 first to see whether we run into some panic. Once we have hopefully found the issue, we can experiment with different thread counts to improve the CI's run time.
During CI/CD runs of #307 the number of parallel tests was having an impact on overall execution time.
To improve performance, this updates the GitHub Actions workflow to limit test threads to 4.
This was previously the effective limit, as no test file had more than 4 tests in it.
Fixes #310