Add SYCL Kernels for XPU backend #1679
base: main
fix transpose
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
revert cpu changes
Signed-off-by: jiqing-feng <[email protected]>
remove check for better performance
Signed-off-by: jiqing-feng <[email protected]>
Can we use a more accurate title for the commit? Otherwise reviewers may be confused about whether all SYCL kernels are included in the PR.
Fix xpu check
Signed-off-by: jiqing-feng <[email protected]>
fix device check
Signed-off-by: jiqing-feng <[email protected]>
fix tests
Hi @matthewdouglas. The PR is ready for review. The SYCL kernels achieve a 0-150% speed-up over Triton on 4-bit models. Could you do the first round of review? Please let me know if you have any concerns. Thanks!
This is the first PR for SYCL kernels targeting QLoRA. I have added a detailed description.
Signed-off-by: jiqing-feng <[email protected]>
fix xpu log
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Remove ipex entirely
Signed-off-by: jiqing-feng <[email protected]>
fix lint
When I tried to compile it, I had issues with the types described at https://github.khronos.org/SYCL_Reference/iface/nd_range.html and https://github.khronos.org/SYCL_Reference/iface/nd_item.html
I replaced the types as described above and tested the implementation. In my experiment, the SYCL implementation was about 2x faster than Triton for token generation. I guess this is due to the fused dequant + matmul. The Triton compiler currently has an issue with that: intel/intel-xpu-backend-for-triton#4327. However, some tests failed
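To make the "fused dequant + matmul" guess above concrete, here is a minimal pure-Python sketch of the idea. It is not the SYCL kernel from this PR: the function names, the packing layout (two 4-bit codes per byte, high nibble first), and the uniform codebook `(code - 8) / 7` are all assumptions chosen for illustration, and real NF4/FP4 quantization uses different lookup tables. The unfused path first materializes the full dequantized weight matrix and then multiplies; the fused path dequantizes each weight inside the dot-product loop, so the dense matrix is never written out.

```python
# Illustrative sketch only. Names, packing layout, and codebook are assumptions,
# not the API or the kernels from this PR.

def quantize_blockwise_4bit(weights, blocksize):
    """Quantize a flat list of floats to 4-bit codes with one absmax scale per block.

    Assumes len(weights) is even and a multiple of blocksize.
    """
    codes, absmax = [], []
    for start in range(0, len(weights), blocksize):
        block = weights[start:start + blocksize]
        scale = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
        absmax.append(scale)
        for w in block:
            codes.append(round(w / scale * 7) + 8)  # code in 1..15
    # Pack two 4-bit codes per byte, high nibble first.
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, absmax


def dequant_at(packed, absmax, blocksize, idx):
    """Dequantize the single weight at flat index idx."""
    byte = packed[idx // 2]
    code = (byte >> 4) if idx % 2 == 0 else (byte & 0x0F)
    return (code - 8) / 7 * absmax[idx // blocksize]


def gemv_unfused(packed, absmax, blocksize, rows, cols, x):
    """Two passes: materialize the dense matrix, then multiply."""
    dense = [dequant_at(packed, absmax, blocksize, i) for i in range(rows * cols)]
    out = []
    for r in range(rows):
        acc = 0.0
        for c in range(cols):
            acc += dense[r * cols + c] * x[c]
        out.append(acc)
    return out


def gemv_fused(packed, absmax, blocksize, rows, cols, x):
    """One pass: dequantize each weight right where it is consumed."""
    out = []
    for r in range(rows):
        acc = 0.0
        for c in range(cols):
            acc += dequant_at(packed, absmax, blocksize, r * cols + c) * x[c]
        out.append(acc)
    return out
```

Both functions compute the same result. On a GPU, the fused form would be expected to help mainly because single-token generation is a matrix-vector product that is typically limited by memory traffic, and skipping the intermediate dense matrix removes one full write and one full read of the weights.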
* fix logs
Signed-off-by: jiqing-feng <[email protected]>
* fix format
Signed-off-by: jiqing-feng <[email protected]>
---------
Signed-off-by: jiqing-feng <[email protected]>
This is the pull request for the SYCL Kernels targeting the XPU backend.