
[BUG] ExLlamaV2 repeats itself in the answer #764

@manitadayon

Description

OS

Linux

GPU Library

CUDA 12.x

Python version

3.11

Pytorch version

2.6.0

Model

No response

Describe the bug

I am using the ExLlamaV2 base generator and `generate_simple` with the following parameters:

temperature = 0
top_p = 1
stop_token = tokenizer.eos_token_id
max_new_token = 3000
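
For reference, the way these settings interact can be sketched with a toy decoding loop. This is not ExLlamaV2 code: `greedy_generate`, `toy_next_logits`, and the token IDs are hypothetical names made up for illustration. The sketch assumes that temperature 0 reduces sampling to picking the highest-scoring token, and that generation ends at the stop token or at the new-token cap, whichever comes first.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    stop_token: int,
    max_new_tokens: int,
) -> List[int]:
    """Toy greedy decoder (temperature 0): always take the argmax token."""
    ids = list(prompt_ids)
    new_tokens: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        # Argmax over the vocabulary; ties resolve to the lowest token id.
        token = max(range(len(logits)), key=lambda i: logits[i])
        if token == stop_token:
            break  # the stop token itself is not added to the output
        ids.append(token)
        new_tokens.append(token)
    return new_tokens


# Hypothetical 4-token vocabulary: 0 = end marker, 1..3 = ordinary tokens.
def toy_next_logits(ids: List[int]) -> List[float]:
    """Toy model: cycle 1 -> 2 -> 3, then emit token 0, then start over."""
    last = ids[-1]
    nxt = 0 if last == 3 else last + 1
    return [1.0 if i == nxt else 0.0 for i in range(4)]
```

In this toy, `greedy_generate(toy_next_logits, [1], stop_token=0, max_new_tokens=3000)` stops after two tokens. If the configured stop token never matches what the toy model emits (for example `stop_token=-1`), the loop runs to the cap and the same sequence comes around again. Whether something similar is happening in my setup is only a guess.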

The model is Llama 3.3 70B, GPTQ-quantized to 4 bits.
I have tried this model on vLLM and it works like a charm, with no repetition of the response or of any part of it.

This is my prompt:
Prompt = Use your reasoning and solve the following equation step by step: x* (sin(x) + x) = 0

The response is strange. It looks like this:

Step1: break down the problem to smaller pieces
Step2: since the right hand side is 0, so perhaps either x is zero or sin(x) is equal to -x.

.  .  .

Step6: Hence the answer is 0.
Step1: there are multiple ways to answer this problem, we should first break down the problem to more manageable pieces.
Step2: remember the right hand side is 0, hence either x = 0 or sin(x) + x  = 0

Step6: Therefore x only can be 0
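
The final answer itself is correct, which can be checked by hand. A short derivation, writing $g(x) = \sin x + x$ (the name $g$ is introduced here only for convenience):

```latex
\begin{aligned}
x\,(\sin x + x) = 0 \;&\Longleftrightarrow\; x = 0 \ \text{ or } \ g(x) = \sin x + x = 0,\\
g'(x) &= \cos x + 1 \ \ge\ 0, \quad \text{with equality only at the isolated points } x = (2k+1)\pi,\\
&\Longrightarrow\; g \text{ is strictly increasing, and } g(0) = 0,\\
&\Longrightarrow\; x = 0 \text{ is the only real solution.}
\end{aligned}
```

So the problem is not the content of the answer but the fact that it is generated more than once.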

That was just one example, in which the response is repeated only twice. Sometimes it is repeated 4 times, and sometimes only part of the response is repeated.
Does anyone know of a solution or a parameter that can prevent this?
I have tried tweaking:

temperature, stop_token, repetition_penalty

But no luck. The same model works with no problem in vLLM.
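
One way to quantify the symptom across many prompts, assuming the answers use `StepN:` labels as in the example above, is to count how often the step numbering goes back instead of forward. This is a minimal sketch; `step_numbers` and `count_restarts` are hypothetical helper names, not part of any library.

```python
import re
from typing import List


def step_numbers(text: str) -> List[int]:
    """Extract step numbers from labels such as 'Step1:' or 'Step 2:'."""
    return [int(m) for m in re.findall(r"Step\s*(\d+)\s*:", text)]


def count_restarts(text: str) -> int:
    """Count how often the step numbering goes back instead of forward."""
    nums = step_numbers(text)
    return sum(1 for prev, cur in zip(nums, nums[1:]) if cur <= prev)
```

A response that is generated once gives 0. A response shaped like the example above (Step1, Step2, ..., Step6, then Step1 again) gives 1, and each further repetition adds one more.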

Note: Because of this repetition, the response time is also high: a prompt that should take 20 s at most takes 3 minutes.

Reproduction steps

Described above.

Expected behavior

Generate response with no repetition.

Logs

No response

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
