Description

OS: Windows
GPU Library: CUDA 12.x
Python version: 3.11
PyTorch version: 2.7.0
Model: exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-win_amd64.whl
Describe the bug
Building ExLlamaV2 on Windows 11 with an RTX 5090 (CUDA 12.8 / Torch 2.7.0+cu128) succeeds at the kernel build stage, but importing `ExLlamaV2Generator` fails because packaging is inconsistent between the WHL and the repo source. Some WHLs do not expose `ExLlamaV2Generator`, while the latest source versions split the generator code across multiple modules and require an `experimental/` folder that may be missing from partial source zips.
Reproduction steps
1. Install CUDA 12.8 + Torch 2.7.0+cu128 on Windows 11.
2. Install `exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-win_amd64.whl`.
3. Build the CUDA kernel successfully via `python setup.py build_ext --inplace`.
4. Attempt the import: `from exllamav2 import ExLlamaV2, ExLlamaV2Tokenizer, ExLlamaV2Generator`. This fails with `ImportError: cannot import name 'ExLlamaV2Generator'`.
5. Pull the latest repo version; this produces an additional import failure due to the missing `experimental/` module.
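To narrow down the failure, a small diagnostic can report which of the expected names the installed package actually exposes. This is a minimal sketch using only the standard library; the three class names are taken directly from the failing import above, and nothing here assumes exllamav2's internals:

```python
import importlib
import importlib.util

def missing_exports(package_name, names):
    """Return the subset of `names` that the package does not expose.

    If the package itself cannot be located, all names are reported as
    missing instead of raising, so the check is safe to run anywhere.
    """
    if importlib.util.find_spec(package_name) is None:
        return list(names)
    module = importlib.import_module(package_name)
    return [n for n in names if not hasattr(module, n)]

# The three names the reproduction script tries to import:
print(missing_exports(
    "exllamav2",
    ["ExLlamaV2", "ExLlamaV2Tokenizer", "ExLlamaV2Generator"],
))
```

If the output lists only `ExLlamaV2Generator`, the wheel installed correctly but simply does not export that name, which matches the packaging-layer diagnosis below.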
Expected behavior
A fully working import of `from exllamav2 import ExLlamaV2, ExLlamaV2Tokenizer, ExLlamaV2Generator` after either:
- installing the official WHL; or
- building fresh from the latest repo source, with no missing subfolders.
Logs
Initial ImportError after the WHL is installed:
`ImportError: cannot import name 'ExLlamaV2Generator' from 'exllamav2'`
After the repo pull (post-patch error):
`ModuleNotFoundError: No module named 'exllamav2.experimental'`
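The second error can be confirmed without triggering the package's import-time side effects by asking the import machinery whether the submodule is locatable at all. A sketch, standard library only:

```python
import importlib.util

def has_submodule(parent, child):
    """True if `parent.child` can be located on sys.path.

    find_spec raises ModuleNotFoundError when the parent package itself
    is absent; that case is treated as "submodule not available".
    """
    try:
        return importlib.util.find_spec(f"{parent}.{child}") is not None
    except ModuleNotFoundError:
        return False

print(has_submodule("exllamav2", "experimental"))
```

A `False` here on a source install confirms the `experimental/` folder was never unpacked, rather than being present but failing to import.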
Additional context
- Hardware: NVIDIA RTX 5090 (SM120)
- OS: Windows 11 22H2
- CUDA version: 12.8
- PyTorch: 2.7.0+cu128 (official)
- The CUDA fused kernel compiles successfully and loads without error.
- The issue exists at the Python packaging layer only: module exports are inconsistent between the WHL and the repo.
- The `experimental/` folder is not present in release zips, which breaks direct source installs.
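A possible interim workaround is to try both packaging layouts at import time. This is only a sketch: the `exllamav2.generator` module path and the `ExLlamaV2DynamicGenerator` class name are assumptions based on the current repo layout, not confirmed against this particular wheel.

```python
# Workaround sketch: probe both layouts. The submodule path and the
# ExLlamaV2DynamicGenerator name are assumptions from the current repo
# layout and may not match every wheel.
try:
    from exllamav2 import ExLlamaV2Generator as GeneratorCls  # wheel layout
except ImportError:
    try:
        from exllamav2.generator import ExLlamaV2DynamicGenerator as GeneratorCls
    except ImportError:
        GeneratorCls = None  # neither layout is present

print("generator class found:", GeneratorCls)
```

This at least makes downstream code degrade gracefully while the packaging inconsistency is unresolved.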
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.