Description
OS
Windows
GPU Library
CUDA 12.x
Python version
3.11
Pytorch version
2.7.0+cu128
Model
No response
Describe the bug
Starting with version 0.2.9, every ExLlamaV2 release makes Mistral 7B v0.2 models produce gibberish (v0.1 models are fine). It behaves as if the RoPE configuration is way off. I see the same thing when running tabby on Google Colab with 7B to 12B models such as Nemo: the output loses coherence and is just random words.
Running torch 2.6.0+cu124 alongside ExLlamaV2 0.3.1 does not help (the output becomes actual words instead of raw gibberish, but it is still incoherent).
Reproduction steps
Run tabby with ExLlamaV2 0.2.9 or later on an NVIDIA 20xx GPU or Google Colab, using a Mistral 7B v0.2 model or 12B Nemo.
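For context, a minimal tabbyAPI `config.yml` sketch for this setup follows; the key names mirror tabbyAPI's sample config, and the model folder name is an illustrative placeholder, not taken from this report:

```yaml
# Assumed tabbyAPI config keys; adjust paths/names to your install.
model:
  model_dir: models                            # folder containing exl2 quants
  model_name: Mistral-7B-Instruct-v0.2-exl2    # hypothetical exl2 quant folder
```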
Expected behavior
Coherent output, as with ExLlamaV2 0.2.8 and PyTorch 2.6.0+cu124.
Logs
No response
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.