
[BUG] ExllamaV2 version >0.2.8 broken for Mistral 7B (v0.2) models on Nvidia 2060 #795

Open
@IceFog72

Description

OS

Windows

GPU Library

CUDA 12.x

Python version

3.11

PyTorch version

2.7.0+cu128

Model

No response

Describe the bug

After 0.2.8, every subsequent ExllamaV2 release makes Mistral 7B v0.2 models produce gibberish (v0.1 is fine). It looks as if the RoPE scaling is way off. I see the same thing when using tabby on Google Colab with 7B-12B Nemo models: the output loses coherence and becomes just random words.

Running torch 2.6.0+cu124 alongside ExllamaV2 0.3.1 does not help (it starts returning real words instead of gibberish, but the output is still incoherent).

[Screenshots: gibberish output on ExllamaV2 0.2.9+]
[Screenshot, torch 2.6.0: recognizable words but still incoherent output]
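
Not from the report itself, but since the symptom points at RoPE: Mistral 7B v0.2 raised rope_theta from 10000 to 1000000 and dropped the sliding window, which is exactly the kind of change that yields gibberish if a loader falls back to v0.1 defaults. A minimal sanity-check sketch that reads the rotary parameters straight out of the quant's config.json (the model directory is a placeholder):

```python
import json
from pathlib import Path

# Placeholder path to a local exl2 quant of Mistral 7B v0.2
model_dir = Path("models/Mistral-7B-Instruct-v0.2-exl2")

with open(model_dir / "config.json") as f:
    cfg = json.load(f)

# v0.1 ships rope_theta=10000.0; v0.2 raised it to 1000000.0. If a loader
# silently applies the v0.1 default, positions decode at the wrong rotary
# frequency and generation degrades into random tokens.
print("rope_theta:             ", cfg.get("rope_theta", "missing (defaults apply)"))
print("max_position_embeddings:", cfg.get("max_position_embeddings"))
print("sliding_window:         ", cfg.get("sliding_window"))
```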

Reproduction steps

Run tabby with ExllamaV2 0.2.9+ on an Nvidia 20xx GPU or on Google Colab, using Mistral v0.2 7B models or 12B Nemo; the standalone sketch below should reproduce it without tabby.
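
A minimal sketch based on ExllamaV2's stock dynamic-generator example, to take tabby out of the equation; the model path is a placeholder, and any Mistral 7B v0.2 exl2 quant should do:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path to a Mistral 7B v0.2 exl2 quant
model_dir = "models/Mistral-7B-Instruct-v0.2-exl2"

config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache, progress=True)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Per this report: coherent on 0.2.8, random tokens on 0.2.9+ (20xx card).
output = generator.generate(prompt="The capital of France is", max_new_tokens=100)
print(output)
```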

Expected behavior

Generation should be coherent, as it is with ExllamaV2 0.2.8 and PyTorch 2.6.0+cu124.
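
When comparing the working and broken setups, a quick version dump confirms exactly which combination is loaded (standard torch/importlib calls, nothing specific to this report):

```python
from importlib.metadata import version
import torch

# Working pair per this report: exllamav2 0.2.8 + torch 2.6.0+cu124
print("exllamav2:", version("exllamav2"))
print("torch:    ", torch.__version__)
print("cuda:     ", torch.version.cuda)
if torch.cuda.is_available():
    print("gpu:      ", torch.cuda.get_device_name(0))
```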

Logs

No response

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
