Description
OS
Linux
GPU Library
CUDA 12.x
Python version
3.10
Pytorch version
2.6.0+cu124
Model
https://huggingface.co/LatentWanderer/THUDM_GLM-4-32B-0414-6.5bpw-h8-exl2
Describe the bug
When running the model with examples/chat.py, it only produces nonsense output.
Reproduction steps
Run `python ./examples/chat.py -m THUDM_GLM-4-32B-0414-6.5bpw-h8-exl2 --mode glm` and give it any prompt.
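If it helps isolate the problem, the issue should also be reproducible without chat.py and without the GLM prompt template. Below is a minimal sketch against the dev-branch exllamav2 dynamic generator API as I understand it (the model path and gpu_split mirror my setup; adjust as needed):

```python
# Minimal repro sketch; bypasses chat.py and the glm chat template entirely.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("THUDM_GLM-4-32B-0414-6.5bpw-h8-exl2")  # model directory
model = ExLlamaV2(config)
model.load(gpu_split=[20, 20])  # same split as the chat.py run above
cache = ExLlamaV2Cache(model)
tokenizer = ExLlamaV2Tokenizer(config)

# paged=False may be needed here if flash-attn is not installed
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Raw completion with no chat template, to check whether the broken output
# is independent of the glm prompt format.
print(generator.generate(prompt="The capital of France is", max_new_tokens=32))
```

A raw completion like this should rule the prompt template in or out as the cause.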
Expected behavior
The model should produce a coherent reply to the prompt.
Logs
```
-- Model: THUDM_GLM-4-32B-0414-6.5bpw-h8-exl2
-- Options: ['gpu_split: 20,20']
-- Loading tokenizer...
-- Loading model...
-- Loading model...
-- Prompt format: glm
-- System prompt:
You are a helpful AI assistant.
User: Who are you
siegeseice
siegele e ebe ce e treats
cosea ceta seatste te pe ce se coast cops te taipse t se
t t te cop cap te taits taup seat taic ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta t ta ta ta ta
ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta_print ta ta ta ta taic ta ta ta ta_tr ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta tata ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta_t ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta_tr ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta ta tac ta ta ta ta ta ta ta ta ta tacer ta tata ta ta ta ta ta ta ta ta ta
```
Additional context
I pulled the latest dev branch and ran `pip install . --upgrade`
before trying to use the model. Changing the temperature or sampler settings does not seem to change anything.
The exl3 version works as expected; both quants were created from the same source files.
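For reference, this is roughly how the sampler claim above can be verified (a sketch reusing the `generator` from the snippet earlier; the `gen_settings` parameter and `Settings` attribute names are my reading of the current API). Forcing top_k = 1 makes decoding effectively greedy, so if the output is still garbage, sampling is not the cause:

```python
from exllamav2.generator import ExLlamaV2Sampler

# Effectively greedy decoding: top_k = 1 always selects the argmax token,
# removing any influence of temperature or other sampler settings.
settings = ExLlamaV2Sampler.Settings()
settings.top_k = 1
settings.temperature = 1.0
settings.token_repetition_penalty = 1.0

print(generator.generate(
    prompt="The capital of France is",
    max_new_tokens=32,
    gen_settings=settings,
))
```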
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.