fix: catch TypeError when accessing think token properties by nightguarder · Pull Request #1141 · jundot/omlx

nightguarder · 2026-05-09T14:17:27Z

Fix TypeError for non-thinking models

Catch TypeError alongside ValueError in three spots in scheduler.py. Without this, /v1/completions crashes on TranslateGemma.

Explanation

Some models (e.g. TranslateGemma) don't have _think_start_tokens initialized, so accessing the think_start_id property raises TypeError instead of ValueError.

The three existing except ValueError blocks miss this, causing engine loop crashes when serving such models via /v1/completions.

Catch (ValueError, TypeError) at all three sites:

- _detect_needs_think_prefix in scheduler step
- ThinkingBudgetProcessor think_start_id resolution
- _resolve_think_end_token_ids think_end_id resolution
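
The failure mode and the fix can be sketched roughly as follows. This is a minimal illustration, not the code in omlx/scheduler.py: `FakeTokenizer` and `resolve_think_start_id` are hypothetical names, and the body of the `think_start_id` property is an assumption about how an uninitialized `_think_start_tokens` could surface as a TypeError.

```python
class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer wrapper (not the real class)."""

    def __init__(self, think_start_tokens=None):
        # None models a tokenizer that never initialized its think tokens.
        self._think_start_tokens = think_start_tokens

    @property
    def think_start_id(self):
        # Assumed property body: len(None) raises TypeError, while a
        # multi-token marker raises ValueError.
        if len(self._think_start_tokens) != 1:
            raise ValueError("think start marker is not a single token")
        return self._think_start_tokens[0]


def resolve_think_start_id(tokenizer):
    """Return the think-start token id, or None for non-thinking models."""
    try:
        return tokenizer.think_start_id
    except (ValueError, TypeError):
        # Catching only ValueError would let the TypeError escape
        # and crash the caller.
        return None
```

With only `except ValueError`, the first case in the usage below would propagate a TypeError instead of returning None:

```python
resolve_think_start_id(FakeTokenizer())        # uninitialized: None
resolve_think_start_id(FakeTokenizer([7]))     # single token: 7
resolve_think_start_id(FakeTokenizer([1, 2]))  # multi-token: None
```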

What this does

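This enables support for TranslateGemma and other non-thinking models whose
tokenizer lacks _think_start_tokens. Previously blocked by the TypeError,
TranslateGemma-4b-it now works via /v1/completions with a client-side chat template.

Note:
TranslateGemma uses a custom chat template requiring source_lang_code/target_lang_code
fields that OMLX /v1/chat/completions does not support. The chat template must still be
applied client-side and the resulting prompt sent to /v1/completions instead.
See the original issue, #879, for the explanation.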
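
A client-side call might look roughly like the sketch below. It assumes the server accepts the usual OpenAI-style completions payload (`model`, `prompt`, `max_tokens`). The helper name `build_completion_request`, the base URL, and the model name are hypothetical. The prompt itself has to be rendered from the model's chat_template.jinja, which is not reproduced here.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=256):
    """Build a POST request for /v1/completions with a pre-rendered prompt."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder only: replace with the output of the model's chat template,
# rendered with the source and target language codes.
rendered_prompt = "<prompt rendered client-side from chat_template.jinja>"

# Assumed local address and model name; adjust to your setup.
request = build_completion_request(
    "http://localhost:8000", "translategemma-4b-it", rendered_prompt
)
# To send it: urllib.request.urlopen(request)
```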

Files modified

omlx/scheduler.py

…nking models

This enables support for TranslateGemma and other non-thinking models whose tokenizer lacks _think_start_tokens. Previously blocked by TypeError, TranslateGemma-4b-it now works correctly via /v1/completions with a client-side chat template.

Note: TranslateGemma uses a custom chat template requiring source_lang_code/target_lang_code fields that OMLX /v1/chat/completions does not support. The chat template must be applied client-side and the resulting prompt sent to /v1/completions instead. See the model's chat_template.jinja for the exact prompt format.

Catch (ValueError, TypeError) on all three sites:

- _detect_needs_think_prefix in scheduler step
- ThinkingBudgetProcessor think_start_id resolution
- _resolve_think_end_token_ids think_end_id resolution

mlx-lm#1171 changed the MiniMax M2 tool parser to return a list when a single <minimax:tool_call> block contains multiple <invoke>s. Without list/dict flattening in api/tool_calling.py, parallel tool calls were silently dropped via AttributeError swallowed by the existing except.

Flatten parser results in the native path using the same isinstance(list) pattern already used in the Gemma 4 fallback. Add regression tests for single-dict, multi-list, and multi-block-multi-invoke cases.

Also picks up BatchKVCache/BatchRotatingKVCache.extend() batch-dim fix (jundot#1141) and the tree_reduce import fix (jundot#1165) from the same bump.
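
The list/dict flattening described in this commit message can be illustrated with a small sketch. `collect_tool_calls` is a hypothetical helper, not the function in api/tool_calling.py. It assumes each parsed block is either a single dict (one invoke) or a list of dicts (several invokes).

```python
def collect_tool_calls(parsed_blocks):
    """Flatten per-block parser results into one list of tool-call dicts."""
    calls = []
    for parsed in parsed_blocks:
        if isinstance(parsed, list):
            # A block with multiple invokes yields a list of calls.
            calls.extend(parsed)
        else:
            # A block with a single invoke yields one dict.
            calls.append(parsed)
    return calls
```

Treating every result as a dict is what would raise AttributeError on a list and, if that exception is swallowed, drop the parallel calls.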

nightguarder mentioned this pull request May 9, 2026

translategemma-4b-it-4bit error #879

Closed

Merge branch 'main' into fix/think-token-typeerror-safety

ed36c99

nightguarder wants to merge 2 commits into jundot:main from nightguarder:fix/think-token-typeerror-safety
