Description

OS: Windows
GPU Library: CUDA 12.x
Python version: 3.11
PyTorch version: 2.7.0
Model: exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-win_amd64.whl
Describe the bug
Building ExLlamaV2 on Windows 11 with an RTX 5090 (CUDA 12.8 / Torch 2.7.0+cu128) succeeds at the kernel build stage, but importing `ExLlamaV2Generator` fails because packaging is inconsistent between the WHL and the repo source. Some WHLs do not expose `ExLlamaV2Generator`, while the latest source versions split the generator code across multiple modules and require an `experimental/` folder that may be missing from partial source zips.
Reproduction steps
1. Install CUDA 12.8 + Torch 2.7.0+cu128 on Windows 11.
2. Install `exllamav2-0.3.1+cu128.torch2.7.0-cp311-cp311-win_amd64.whl`.
3. Build the CUDA kernel successfully via `python setup.py build_ext --inplace`.
4. Attempt the import: `from exllamav2 import ExLlamaV2, ExLlamaV2Tokenizer, ExLlamaV2Generator`. This fails with `ImportError: cannot import name 'ExLlamaV2Generator'`.
5. Pull the latest repo version; this produces an additional import failure due to the missing `experimental/` module.
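To narrow down the failure, a small diagnostic can report which of the expected names the installed package actually exposes. This is a minimal sketch using only the standard library; the three class names are taken directly from the failing import above, and nothing here assumes exllamav2's internals:

```python
import importlib
import importlib.util

def missing_exports(package_name, names):
    """Return the subset of `names` that the package does not expose.

    If the package itself cannot be located, all names are reported as
    missing instead of raising, so the check is safe to run anywhere.
    """
    if importlib.util.find_spec(package_name) is None:
        return list(names)
    module = importlib.import_module(package_name)
    return [n for n in names if not hasattr(module, n)]

# The three names the reproduction script tries to import:
print(missing_exports(
    "exllamav2",
    ["ExLlamaV2", "ExLlamaV2Tokenizer", "ExLlamaV2Generator"],
))
```

If the output lists only `ExLlamaV2Generator`, the wheel installed correctly but simply does not export that name, which matches the packaging-layer diagnosis below.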
Expected behavior
A fully working import of `from exllamav2 import ExLlamaV2, ExLlamaV2Tokenizer, ExLlamaV2Generator` after either:
- installing the official WHL; or
- building fresh from the latest repo source, with no missing subfolders.
Logs
Initial ImportError after the WHL is installed:
`ImportError: cannot import name 'ExLlamaV2Generator' from 'exllamav2'`
After the repo pull (post-patch error):
`ModuleNotFoundError: No module named 'exllamav2.experimental'`
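The second error can be confirmed without triggering the package's import-time side effects by asking the import machinery whether the submodule is locatable at all. A sketch, standard library only:

```python
import importlib.util

def has_submodule(parent, child):
    """True if `parent.child` can be located on sys.path.

    find_spec raises ModuleNotFoundError when the parent package itself
    is absent; that case is treated as "submodule not available".
    """
    try:
        return importlib.util.find_spec(f"{parent}.{child}") is not None
    except ModuleNotFoundError:
        return False

print(has_submodule("exllamav2", "experimental"))
```

A `False` here on a source install confirms the `experimental/` folder was never unpacked, rather than being present but failing to import.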
Additional context
- Hardware: NVIDIA RTX 5090 (SM120)
- OS: Windows 11 22H2
- CUDA version: 12.8
- PyTorch: 2.7.0+cu128 (official)
- The CUDA fused kernel compiles successfully and loads without error.
- The issue exists at the Python packaging layer only: module exports are inconsistent between the WHL and the repo.
- The `experimental/` folder is not present in release zips, which breaks direct source installs.
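A possible interim workaround is to try both packaging layouts at import time. This is only a sketch: the `exllamav2.generator` module path and the `ExLlamaV2DynamicGenerator` class name are assumptions based on the current repo layout, not confirmed against this particular wheel.

```python
# Workaround sketch: probe both layouts. The submodule path and the
# ExLlamaV2DynamicGenerator name are assumptions from the current repo
# layout and may not match every wheel.
try:
    from exllamav2 import ExLlamaV2Generator as GeneratorCls  # wheel layout
except ImportError:
    try:
        from exllamav2.generator import ExLlamaV2DynamicGenerator as GeneratorCls
    except ImportError:
        GeneratorCls = None  # neither layout is present

print("generator class found:", GeneratorCls)
```

This at least makes downstream code degrade gracefully while the packaging inconsistency is unresolved.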
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.