Open
Labels: bug (Something isn't working)
Description
OS: Linux
GPU Library: CUDA 12.x
Python version: 3.12
PyTorch version: 2.6
Model: No response
Describe the bug
I'm trying to run sampling on my own data and it fails for most small vocabulary sizes: the C extension crashes with a double free. (A probe sketch that sweeps vocabulary sizes follows the minimal example below.)
Reproduction steps
Minimal example:
import torch
from exllamav2.ext import exllamav2_ext as ext_c, none_tensor
import random
VOCAB_SIZE = 40001 # 40000 works, 40001 doesn't
# Generate uniform random logits between -10 and 10
logits = (torch.rand(1, 1, VOCAB_SIZE, dtype=torch.float32) * 20) - 10
output_tokens = torch.empty((1, 1), dtype=torch.long)
output_probs = torch.empty((1, 1), dtype=torch.float)
output_ktokens = none_tensor
output_kprobs = none_tensor
m = ext_c.sample_basic(
    logits,          # logits
    0.8,             # temp_scale
    50,              # top_k
    0.8,             # top_p
    0,               # top_a
    0,               # min_p
    0,               # tfs
    0,               # typical
    random.random(), # random_num
    output_tokens,   # output_tokens_tensor
    output_probs,    # output_probs_tensor
    output_kprobs,   # output_kprobs_tensor
    output_ktokens,  # output_ktokens_tensor
    none_tensor,     # logit_filter
    False,           # mirostat
    [],              # mirostat_mu
    1.5,             # mirostat_tau
    0.1,             # mirostat_eta
    1,               # temp
    none_tensor,     # xtc_mask
    0,               # xtc_probability
    0,               # xtc_threshold
    0,               # min_temp
    0,               # max_temp
    0,               # temp_exponent
    0,               # smoothing_factor
    0                # skew
)
print(f"Sampling finished. Output token: {output_tokens.item()}")
Expected behavior
I expect the call to return normally and print the sampled output token.
Logs
double free or corruption (!prev)
Aborted (core dumped)
Additional context
ExLlamaV2 version: 0.2.9+cu124.torch2.6.0
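Purely speculative: 40000 is a multiple of 32 and works, while 40001 is not and crashes, so the fault may be tied to vocab sizes that don't align with some internal block size. Below is a minimal caller-side workaround sketch under that assumption (pad_logits and the choice of 32 are my own guesses, not part of the exllamav2 API): pad the vocab dimension up to the next multiple of 32 with -inf logits, which get zero probability under softmax and so can never be sampled.

import torch

def pad_logits(logits: torch.Tensor, multiple: int = 32) -> torch.Tensor:
    # ASSUMPTION: the crash involves unaligned vocab sizes; 32 is a guess.
    vocab = logits.shape[-1]
    padded = -(-vocab // multiple) * multiple  # round up to the next multiple
    if padded == vocab:
        return logits
    # -inf logits softmax to probability 0, so padding tokens are never sampled.
    pad = torch.full(
        (*logits.shape[:-1], padded - vocab),
        float("-inf"),
        dtype=logits.dtype,
    )
    return torch.cat([logits, pad], dim=-1)

Usage with the repro above: logits = pad_logits(logits) before calling sample_basic.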
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.