
[Draft][triton] Implemented 8bit optimizers on XPU #1692


Open
wants to merge 2 commits into main

Conversation

Egor-Krivov
Contributor

@Egor-Krivov Egor-Krivov commented Jul 1, 2025

Implemented 8-bit optimizers in Triton for use on XPU devices.

Currently there is no interface for kernel registration of 8-bit optimizers, so I hardcoded their usage for testing.

Tested with `BNB_TEST_DEVICE="xpu" pytest --show-capture=no -q tests/test_optim.py::test_optimizer8bit`

Benchmarked on essentially the same test, observing better performance than the torch optimizer.

This PR contains 3 implementations:

  1. Pure torch implementation that materializes intermediate tensors during quantization, hence high memory usage. Can be used for testing purposes.
  2. Combination of torch and Triton kernels for quantization + dequantization.
  3. Pure Triton implementation - the fastest, and the one currently used.
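For readers unfamiliar with how 8-bit optimizer states work, the core operation all three implementations share is blockwise quantization against a codebook: each block of the optimizer state is normalized by its absolute maximum, each value is mapped to the nearest codebook entry's index (one byte per value), and dequantization reverses the lookup. The sketch below is a hypothetical pure-Python illustration of that idea, not the PR's actual kernel code; the function names, the tiny linear codebook, and the block size are all illustrative (bitsandbytes uses a 256-entry non-uniform dynamic codebook, and the Triton kernels perform the nearest-neighbor search in parallel).

```python
# Illustrative sketch of blockwise 8-bit quantization (not the PR's kernels).
# All names and parameters here are hypothetical.

def quantize_blockwise(values, codebook, block_size=4):
    """Normalize each block by its absmax, then map every value to the
    index of the nearest codebook entry (one uint8 index per value)."""
    indices, absmaxes = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0
        absmaxes.append(absmax)
        for v in block:
            x = v / absmax  # normalized into [-1, 1]
            # Linear scan for the nearest codebook entry; real kernels
            # do this search in parallel on the device.
            idx = min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))
            indices.append(idx)
    return indices, absmaxes

def dequantize_blockwise(indices, absmaxes, codebook, block_size=4):
    """Invert the lookup: codebook value times the block's absmax."""
    return [codebook[idx] * absmaxes[pos // block_size]
            for pos, idx in enumerate(indices)]

# Tiny uniform codebook for illustration only.
codebook = [i / 127.0 for i in range(-127, 128)]
vals = [0.5, -1.5, 0.25, 2.0, -0.125, 0.75]
q, amax = quantize_blockwise(vals, codebook)
deq = dequantize_blockwise(q, amax, codebook)
```

The round-trip error per element is bounded by the block's absmax times half the codebook spacing, which is why blockwise normalization (rather than a single global absmax) keeps 8-bit states usable in practice.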

@Egor-Krivov Egor-Krivov changed the title [Draft] Implemented 8bit optimizers on XPU [Draft][triton] Implemented 8bit optimizers on XPU Jul 1, 2025
@matthewdouglas matthewdouglas added the Intel and Optimizers labels Jul 1, 2025
@matthewdouglas matthewdouglas added this to the v0.47.0 milestone Jul 1, 2025
@matthewdouglas matthewdouglas self-requested a review July 1, 2025 18:35