Description
OS
Windows
GPU Library
CUDA 12.x
Python version
3.11
Pytorch version
2.7.0+cu128
Model
No response
Describe the bug
Starting with version 0.2.9, every ExLlamaV2 release makes Mistral 7B v0.2 models produce gibberish (v0.1 models are fine). It behaves as if the RoPE configuration is way off. I see the same thing when running tabby on Google Colab with 7B to 12B models such as Nemo: the output loses coherence and is just random words.
Running torch 2.6.0+cu124 alongside ExLlamaV2 0.3.1 does not help (the output becomes actual words instead of raw gibberish, but it is still incoherent).
Reproduction steps
Run tabby with ExLlamaV2 0.2.9 or later on an NVIDIA 20xx GPU or Google Colab, using a Mistral 7B v0.2 model or 12B Nemo.
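For context, a minimal tabbyAPI `config.yml` sketch for this setup follows; the key names mirror tabbyAPI's sample config, and the model folder name is an illustrative placeholder, not taken from this report:

```yaml
# Assumed tabbyAPI config keys; adjust paths/names to your install.
model:
  model_dir: models                            # folder containing exl2 quants
  model_name: Mistral-7B-Instruct-v0.2-exl2    # hypothetical exl2 quant folder
```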
Expected behavior
Coherent output, as with ExLlamaV2 0.2.8 and PyTorch 2.6.0+cu124.
Logs
No response
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.