
Commit dda1347

agray3 authored and teleprint-me committed
Avoid unnecessarily disabling CUDA graphs (ggml-org#7302)
As discussed in PR ggml-org#6766, CUDA graphs were being disabled in the presence of long prompts. This fixes the issue by preventing the consecutive-update counter from incrementing unnecessarily for tokens in which CUDA graphs are disabled due to batch size > 1.
1 parent 6fb91c1 commit dda1347


1 file changed (+1 -1 lines)


ggml-cuda.cu

Lines changed: 1 addition & 1 deletion
@@ -2558,7 +2558,7 @@ GGML_CALL static enum ggml_status ggml_backend_cuda_graph_compute(ggml_backend_t
     }
 
     // Disable CUDA graphs (from the next token) if the use-case is demanding too many consecutive graph updates.
-    if (cuda_graph_update_required) {
+    if (use_cuda_graph && cuda_graph_update_required) {
         cuda_ctx->cuda_graph->number_consecutive_updates++;
     } else {
         cuda_ctx->cuda_graph->number_consecutive_updates = 0;
