
The Intel Triton runtime enumerates the SYCL devices list, which may not be consistent with the Torch runtime #1916

Open
chengjunlu opened this issue Aug 19, 2024 · 2 comments

@chengjunlu
Contributor

In Driver.c, we enumerate the SYCL device list directly from the SYCL context and save it in an internal vector.

There may be an issue where IPEX uses a different indexing scheme than Triton to refer to a given SYCL device.
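
For illustration, a minimal sketch of that pattern, with hypothetical names rather than the actual Driver.c code:

```cpp
#include <sycl/sycl.hpp>
#include <cstddef>
#include <vector>

// Hypothetical sketch: devices are enumerated once from the SYCL context
// and cached in an internal vector, so every later lookup depends on the
// enumeration order.
static std::vector<sycl::device> g_devices;

void cacheDevices(const sycl::context &ctx) {
  g_devices = ctx.get_devices(); // the index order is frozen here
}

sycl::device deviceForIndex(std::size_t idx) {
  // If PyTorch/IPEX numbered its devices in a different order, idx can
  // refer to a different physical GPU than the framework intends.
  return g_devices.at(idx);
}
```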

@vlad-penkin added the "bug" (Something isn't working) and "research" labels on Aug 19, 2024
@chengjunlu
Contributor Author

chengjunlu commented Aug 23, 2024

Confirmed with the PyTorch team.

Upstream PyTorch uses the index of each device, in the order enumerated from the SYCL API, as the torch device identity for referring to the underlying SYCL device. Specifically:

  • No extra sorting of the enumeration results.
  • No extra tiling or sub-partitioning of the SYCL devices.
  • No extra filtering or reordering of iGPUs and dGPUs.

To JIT Triton kernels correctly with the PyTorch framework, Triton should enumerate the SYCL devices from the SYCL runtime following the same practice, so that each torch device identity maps to the same underlying SYCL device.
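
A minimal sketch of that practice, assuming plain SYCL API order is exactly what PyTorch uses (the function name is illustrative, not Triton or PyTorch code):

```cpp
#include <sycl/sycl.hpp>
#include <vector>

// Enumerate GPU devices in plain SYCL runtime order, with no sorting,
// tiling/sub-partitioning, or iGPU/dGPU filtering, so that index i here
// matches torch device id i under the behavior confirmed above.
std::vector<sycl::device> enumerateDevicesTorchOrder() {
  return sycl::device::get_devices(sycl::info::device_type::gpu);
}
```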

In the long term, we want to decouple this logic between PyTorch and Triton.
We propose that PyTorch supply a method that returns the SYCL device directly, without any assumption about how the SYCL devices are mapped.

@alexbaden
Contributor

In the NVIDIA backend, the active device is loaded directly from PyTorch: https://github.com/triton-lang/triton/blob/main/python/triton/backends/driver.py#L29
There is also a method for getting the current stream, akin to the sycl::queue.
If we can retrieve both of those objects from PyTorch at the appropriate time, it is not clear that we need the internal state we are storing in driver.c.
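
For the Intel backend, that direction might look roughly like the following sketch. Neither getter below exists in PyTorch today; the names and stub bodies are assumptions used only to show the intended shape:

```cpp
#include <sycl/sycl.hpp>

// Hypothetical getters standing in for the proposed PyTorch-side API.
// The stub bodies just pick the default device/queue so the sketch
// compiles; a real implementation would return the device and queue
// backing the current torch device.
sycl::device get_current_torch_sycl_device() {
  return sycl::device(sycl::default_selector_v); // assumed API, stubbed
}

sycl::queue get_current_torch_sycl_queue() {
  return sycl::queue(get_current_torch_sycl_device()); // assumed API, stubbed
}

void launch_triton_kernel(/* compiled kernel + args */) {
  // Ask the framework for the active queue at launch time, mirroring the
  // NVIDIA backend; no device vector needs to be cached in driver.c.
  sycl::queue q = get_current_torch_sycl_queue();
  // ... submit the compiled Triton kernel on q ...
  (void)q; // placeholder for the actual submission
}
```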
