Open
Labels: bug (Something isn't working)
Description
OS: Linux
GPU Library: CUDA 12.x
Python version: 3.12
PyTorch version: 2.6
Model: No response
Describe the bug
I'm trying to run sampling on my own data and it fails for most small vocabulary sizes: the C extension crashes with a double free. (A probe sketch that sweeps vocabulary sizes follows the minimal example below.)
Reproduction steps
Minimal example:
import torch
from exllamav2.ext import exllamav2_ext as ext_c, none_tensor
import random
VOCAB_SIZE = 40001 # 40000 works, 40001 doesn't
# Generate uniform random logits between -10 and 10
logits = (torch.rand(1, 1, VOCAB_SIZE, dtype=torch.float32) * 20) - 10
output_tokens = torch.empty((1, 1), dtype=torch.long)
output_probs = torch.empty((1, 1), dtype=torch.float)
output_ktokens = none_tensor
output_kprobs = none_tensor
m = ext_c.sample_basic(
    logits,          # logits
    0.8,             # temp_scale
    50,              # top_k
    0.8,             # top_p
    0,               # top_a
    0,               # min_p
    0,               # tfs
    0,               # typical
    random.random(), # random_num
    output_tokens,   # output_tokens_tensor
    output_probs,    # output_probs_tensor
    output_kprobs,   # output_kprobs_tensor
    output_ktokens,  # output_ktokens_tensor
    none_tensor,     # logit_filter
    False,           # mirostat
    [],              # mirostat_mu
    1.5,             # mirostat_tau
    0.1,             # mirostat_eta
    1,               # temp
    none_tensor,     # xtc_mask
    0,               # xtc_probability
    0,               # xtc_threshold
    0,               # min_temp
    0,               # max_temp
    0,               # temp_exponent
    0,               # smoothing_factor
    0                # skew
)
print(f"Sampling finished. Output token: {output_tokens.item()}")
Expected behavior
I expect the call to return normally and print the sampled output token.
Logs
double free or corruption (!prev)
Aborted (core dumped)
Additional context
ExLlamaV2 version: 0.2.9+cu124.torch2.6.0
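Purely speculative: 40000 is a multiple of 32 and works, while 40001 is not and crashes, so the fault may be tied to vocab sizes that don't align with some internal block size. Below is a minimal caller-side workaround sketch under that assumption (pad_logits and the choice of 32 are my own guesses, not part of the exllamav2 API): pad the vocab dimension up to the next multiple of 32 with -inf logits, which get zero probability under softmax and so can never be sampled.

import torch

def pad_logits(logits: torch.Tensor, multiple: int = 32) -> torch.Tensor:
    # ASSUMPTION: the crash involves unaligned vocab sizes; 32 is a guess.
    vocab = logits.shape[-1]
    padded = -(-vocab // multiple) * multiple  # round up to the next multiple
    if padded == vocab:
        return logits
    # -inf logits softmax to probability 0, so padding tokens are never sampled.
    pad = torch.full(
        (*logits.shape[:-1], padded - vocab),
        float("-inf"),
        dtype=logits.dtype,
    )
    return torch.cat([logits, pad], dim=-1)

Usage with the repro above: logits = pad_logits(logits) before calling sample_basic.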
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.