-
Notifications
You must be signed in to change notification settings - Fork 11.8k
SYCL: Avoid using SYCL-Graph for unsupported nodes #13587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Currently on a CUDA backend to SYCL when running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` there are two operations that throw an exception from the blocking waits during queue recording. * `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187 * `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074 We've noticed that `ggml-cuda.cu` has the [check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458) method for checking if a graph can be used, even if enabled. I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking if a graph can be used for the operations even if a user has asked for it to be enabled.
# ifndef NDEBUG | ||
GGML_LOG_DEBUG("%s: disabling SYCL graphs due to unsupported node type\n", __func__); | ||
# endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit but maybe worth having this as GGML_LOG_INFO
and printing which node type is unsupported?
@@ -3810,11 +3810,38 @@ static void ggml_backend_sycl_graph_compute_impl(ggml_backend_sycl_context * syc | |||
} | |||
} | |||
|
|||
#ifdef GGML_SYCL_GRAPH | |||
static bool check_node_graph_compatibility(ggml_cgraph * cgraph) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUDA checks if one of the buffer is split, meaning multiple devices are used, to disable the graph. I assume you don't test that, may be safe to disable it as well? Or do you expect SYCL-Graph would work well in such a case?
I'd suggest just checking if ggml_sycl_info().device_count > 1
to disable SYCL-Graph.
Currently on a CUDA backend to SYCL when running
GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0
there are two operations that throw an exception from the blocking waits during queue recording.-o CONCAT
: Use of blocking waits on a queue that's being recorded in ggml_sycl_op_concat.-o MUL_MAT_ID
: Blocking wait on a recording queue for a copy to host memory in ggml_sycl_mul_mat_id.We've noticed that
ggml-cuda.cu
has thecheck_node_graph_compatibility_and_refresh_copy_ops method for checking if a graph can be used, even if enabled. I've taken a similar approach in this PR by adding a method to
ggml-sycl.cpp
for checking if a graph can be used for the operations even if a user has asked for it to be enabled.