Skip to content

[UR][L0] Add L0 teardown notification support and simply teardown with proxy loader support #18496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: sycl
Choose a base branch
from

Conversation

nrspruit
Copy link
Contributor

  • Add support for L0 teardown notification in the static L0 loader.
  • Add loading of the L0 dynamic loader in the windows proxy loader to ensure that the L0 dynamic loader is usable during teardown.

@nrspruit
Copy link
Contributor Author

related to oneapi-src/level-zero#333

@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 21:55 — with GitHub Actions Inactive
@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 22:20 — with GitHub Actions Inactive
@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 22:20 — with GitHub Actions Inactive
@nrspruit nrspruit force-pushed the L0_teardown_notification branch from 8f1f54e to a21160d Compare May 15, 2025 22:37
@nrspruit nrspruit force-pushed the L0_teardown_notification branch from a21160d to ecefdbe Compare May 15, 2025 22:44
@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 22:45 — with GitHub Actions Inactive
@cperkinsintel
Copy link
Contributor

cperkinsintel commented May 16, 2025

@nrspruit - the test failures reported here seem legit. But they are strange.

The buffer_create.cpp test works fine for me if I build and run it manually. Everything working exactly as it should. But when run from llvm-lit, I can get this output:

# | 
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, nullptr)
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, ZeDevices.data())
# | ZE ---> zeDeviceGetProperties(ZeDevices[D], &device_properties)
# | ZE ---> zeDriverGetApiVersion(ZeDriver, &ZeApiVersion)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, nullptr)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, ZeExtensions.data())
# | ZE ---> zelLoaderTranslateHandle(ZEL_HANDLE_DRIVE
# | ...
# `---data was truncated--------

The data was truncated comes from llvm-lit. Essentially, there seemed to have been an error when zelLoaderTranslateHandle was called. I think the next argument it is supposed to log is the "ZeDriver", so there might be something going on. It didn't even output the last "R"

The test failed on Arc, I think. But I can repro the failure on pvc-01.

@nrspruit nrspruit force-pushed the L0_teardown_notification branch from ecefdbe to df6d57d Compare May 16, 2025 20:21
@nrspruit
Copy link
Contributor Author

@nrspruit - the test failures reported here seem legit. But they are strange.

The buffer_create.cpp test works fine for me if I build and run it manually. Everything working exactly as it should. But when run from llvm-lit, I can get this output:

# | 
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, nullptr)
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, ZeDevices.data())
# | ZE ---> zeDeviceGetProperties(ZeDevices[D], &device_properties)
# | ZE ---> zeDriverGetApiVersion(ZeDriver, &ZeApiVersion)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, nullptr)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, ZeExtensions.data())
# | ZE ---> zelLoaderTranslateHandle(ZEL_HANDLE_DRIVE
# | ...
# `---data was truncated--------

The data was truncated comes from llvm-lit. Essentially, there seemed to have been an error when zelLoaderTranslateHandle was called. I think the next argument it is supposed to log is the "ZeDriver", so there might be something going on. It didn't even output the last "R"

The test failed on Arc, I think. But I can repro the failure on pvc-01.

that error has existed since before this change, so it may be a real issue, but it is not part of this PR.

See here:
https://github.com/intel/llvm/actions/runs/15042177600/job/42277602962

@nrspruit nrspruit marked this pull request as ready for review May 16, 2025 20:24
@nrspruit nrspruit requested review from a team as code owners May 16, 2025 20:24
@nrspruit nrspruit requested a review from maarquitos14 May 16, 2025 20:25
@cperkinsintel
Copy link
Contributor

Test failures are known: #18463

@nrspruit nrspruit force-pushed the L0_teardown_notification branch from df6d57d to 7229f1d Compare May 16, 2025 20:47
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 20:48 — with GitHub Actions Inactive
@sarnex sarnex requested review from a team as code owners May 16, 2025 21:00
@sarnex sarnex requested a review from konradkusiak97 May 16, 2025 21:00
@sarnex sarnex requested a review from jchlanda May 16, 2025 21:00
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 21:28 — with GitHub Actions Inactive
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 21:28 — with GitHub Actions Inactive
@gmlueck
Copy link
Contributor

gmlueck commented May 16, 2025

I assume this PR has some sort of merge problem? Seems like a lot of unrelated things changed.

@nrspruit
Copy link
Contributor Author

I assume this PR has some sort of merge problem? Seems like a lot of unrelated things changed.

the rebase failed, let me fix

…h proxy loader support

- Add support for L0 teardown notification in the static L0 loader.
- Add loading of the L0 dynamic loader in the windows proxy loader
  to ensure that the L0 dynamic loader is usable during teardown.

Signed-off-by: Neil R. Spruit <[email protected]>
@nrspruit nrspruit force-pushed the L0_teardown_notification branch from 7229f1d to 0987d29 Compare May 16, 2025 22:43
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 22:44 — with GitHub Actions Inactive
@nrspruit
Copy link
Contributor Author

I assume this PR has some sort of merge problem? Seems like a lot of unrelated things changed.

the rebase failed, let me fix

fixed, for some reason the pulldown was included with I rebased.

@@ -168,6 +168,11 @@ void preloadLibraries() {
loadAdapter(UR_LIBRARY_NAME(adapter_cuda));
loadAdapter(UR_LIBRARY_NAME(adapter_hip));
loadAdapter(UR_LIBRARY_NAME(adapter_native_cpu));
// Load the Level Zero loader dynamic library to ensure it is loaded during
// the runtime. This is necessary to avoid the level zero loader from being
// unloaded prematurely. the Only trusted loader is the one that is loaded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// unloaded prematurely. the Only trusted loader is the one that is loaded
// unloaded prematurely. The only trusted loader is the one that is loaded

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess cuda reviewers were pulled in because of the rebase problem, but still, LGTM :)

Copy link
Contributor

@maarquitos14 maarquitos14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants