f8afc46 regresses Gemma tokenizer loading: tab tokens in vocab.txt cause std::stof exception during init #577

@yujonglee

Description

Reproduction

  1. Use the google/gemma-3-270m-it model.

  2. Convert or load Cactus weights that include a vocab.txt file in ID<TAB>token format.

  3. Initialize the model with cactus_init(...).

  4. Initialization fails with:

    Exception during init: stof
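The exception itself is easy to reproduce in isolation: std::stof delegates to strtof, which skips leading whitespace and then requires at least one digit, so a string consisting only of a tab throws std::invalid_argument (whose what() is reported as "stof" by common standard libraries). A minimal sketch:

```cpp
#include <stdexcept>
#include <string>

// Returns true when std::stof rejects the input. A lone tab is whitespace
// only, so no conversion happens and std::invalid_argument is thrown --
// the same "stof" exception seen during cactus_init.
bool stof_throws(const std::string& s) {
    try {
        (void)std::stof(s);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```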

Expected Behavior

Models should initialize successfully even when vocab.txt contains tokens with literal tab characters.

Actual Behavior

Initialization fails because the tokenizer loader interprets a tab inside the token text as a score separator and then attempts to parse the remainder with std::stof(...).
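The failing path can be sketched as follows (a hypothetical reconstruction of the loader logic after f8afc46; names are illustrative, not Cactus's actual code):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of the post-f8afc46 parsing: after the ID has been
// split off, any further tab in the remainder is assumed to separate
// token and score. For a token made of tabs (e.g. "\t\t\t"), the supposed
// score field is itself whitespace and std::stof throws.
std::string broken_token_parse(const std::string& rest) {
    size_t tab = rest.find('\t');
    if (tab != std::string::npos) {
        // Assumes token<TAB>score; wrong when the token contains tabs.
        float score = std::stof(rest.substr(tab + 1));  // throws "stof" here
        (void)score;
        return rest.substr(0, tab);
    }
    return rest;
}
```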

Likely Root Cause

Before f8afc46, the loader treated everything after the first tab as part of the token.

After f8afc46, the loader looks for another tab inside that token text and treats it as token<TAB>score. That breaks valid entries where the token itself contains tabs.

Gemma is a concrete repro case because its vocabulary includes tokens such as "\t\t\t". In that case, the parser ends up trying to parse "\t" as a float, which throws and causes model initialization to fail.
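One defensive fix is to treat the score as optional: split on the first tab to isolate the ID, then only peel off a trailing score when the text after the last tab actually parses cleanly as a float. A sketch under that assumption (the struct and function names are hypothetical, not Cactus's actual API):

```cpp
#include <cstdlib>
#include <optional>
#include <string>

struct VocabEntry {
    int id;
    std::string token;
    std::optional<float> score;
};

// Parse one "ID<TAB>token[<TAB>score]" line. Tabs inside the token are
// preserved unless the text after the LAST tab is a complete, valid float.
VocabEntry parse_vocab_line(const std::string& line) {
    VocabEntry e;
    size_t first_tab = line.find('\t');
    e.id = std::stoi(line.substr(0, first_tab));
    std::string rest = line.substr(first_tab + 1);

    size_t last_tab = rest.rfind('\t');
    if (last_tab != std::string::npos) {
        std::string maybe_score = rest.substr(last_tab + 1);
        char* end = nullptr;
        float v = std::strtof(maybe_score.c_str(), &end);
        // Only treat it as a score if the whole field converted.
        if (!maybe_score.empty() && end != maybe_score.c_str() && *end == '\0') {
            e.score = v;
            e.token = rest.substr(0, last_tab);
            return e;
        }
    }
    e.token = rest;  // No score: everything after the first tab is the token.
    return e;
}
```

With this, a Gemma entry whose token is "\t\t\t" round-trips intact, while entries that do carry a score still pick it up.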
