[SYCL][Host Task] Bad performance of consecutively submitted host tasks onto an in-order queue #18500
Comments
Initial analysis indicates that we're tracking unnecessary dependencies in llvm/sycl/source/detail/scheduler/commands.hpp (lines 170 to 175 in c568f2e).
For example, we have 4 tasks
We are currently tracking all direct and indirect blocking relations between all enqueued commands, after
It appears redundant to track
The complexity of notifications is directly proportional to the number of entries in
I'll try to draft a PR to fix this.
…n time explosion for long dependency chains

This commit addresses a performance issue observed when submitting consecutive host tasks to an in-order queue without explicit `wait()`. The execution time of each host task was found to increase significantly as the number of submissions grew: intel#18500.

The root cause was identified as the unnecessary tracking of indirect blocking dependencies in `MBlockedUsers`. Previously, all direct and indirect blocking relations between enqueued commands were tracked, causing a significant increase in notification time upon task completion. For example, in a sequence of tasks `A, B, C, D`, `A.MBlockedUsers` would redundantly include `{C, D}`, even though these tasks are already blocked by `B`.

To resolve this, the `enqueueCommand` function in the Scheduler was extended with a `RecursionDepth` parameter. This change prevents excessive growth of `Cmd->MBlockedUsers` in long dependency chains by tracking only direct blocking dependencies, thereby reducing notification time upon command completion.
#18501 fixes this. Results for different `repeat`:
Total time in ms
repeat | 10 | 100 | 1000 | 3000 | 10000 |
---|---|---|---|---|---|
wait.out | 16 | 162 | 1617 | 4853 | 16184 |
nowait.out | 11 | 106 | 1396 | 12996 | 519977 |
wait.out + #18501 | 16 | 162 | 1615 | 4847 | 16154 |
nowait.out + #18501 | 11 | 106 | 1103 | 3162 | 11164 |
Avg time in ms
repeat | 10 | 100 | 1000 | 3000 | 10000 |
---|---|---|---|---|---|
wait.out | 1.6 | 1.62 | 1.617 | 1.618 | 1.6184 |
nowait.out | 1.1 | 1.06 | 1.396 | 4.332 | 51.9977 |
wait.out + #18501 | 1.6 | 1.62 | 1.615 | 1.616 | 1.6154 |
nowait.out + #18501 | 1.1 | 1.06 | 1.103 | 1.054 | 1.1164 |
Now consecutive submission w/o waiting consistently performs better than explicit waiting for the given test case.
Describe the bug
While submitting consecutive host tasks to an in-order queue without explicit `wait()`, the execution time of each host task explodes as the number of submissions increases.
To reproduce
Reproducing code
Compile
Compile the code w/ and w/o explicit `wait` for each submission.
Run
Pass the number of consecutive submissions (`repeat`) via the first argument.
Results for different `repeat`
Total time in ms
Avg time in ms
Expected behavior
Even w/o explicit `wait()` for each submission (onto an in-order queue), the average execution time of each host task should be around 1ms. The 50x slowdown when `repeat==10000` is not expected.
Environment
Additional context
No response