fix: catch TypeError when accessing think token properties #1141

Open
nightguarder wants to merge 2 commits into
jundot:main from
nightguarder:fix/think-token-typeerror-safety

@nightguarder nightguarder commented May 9, 2026

Fix TypeError for non-thinking models

Catch TypeError alongside ValueError in three spots in scheduler.py. Without this, /v1/completions crashes on TranslateGemma.

Explanation

Some models (e.g. TranslateGemma) don't have _think_start_tokens initialized, so accessing the think_start_id property raises TypeError instead of ValueError.

The three existing except ValueError blocks miss this, causing engine loop crashes when serving such models via /v1/completions.

Catch (ValueError, TypeError) on all three sites:

  • _detect_needs_think_prefix in scheduler step
  • ThinkingBudgetProcessor think_start_id resolution
  • _resolve_think_end_token_ids think_end_id resolution
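
The pattern applied at these three sites can be sketched as follows. This is a minimal illustration, not the code in omlx/scheduler.py: `FakeTokenizer` and `resolve_think_start_id` are hypothetical names, and the property body is an assumption about how an uninitialized `_think_start_tokens` (modeled here as `None`) ends up raising TypeError rather than ValueError.

```python
class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer wrapper (not the real class)."""

    def __init__(self, think_start_tokens=None):
        # None models a non-thinking model whose think tokens were never set up.
        self._think_start_tokens = think_start_tokens

    @property
    def think_start_id(self):
        # len(None) raises TypeError; a multi-token marker raises ValueError.
        if len(self._think_start_tokens) != 1:
            raise ValueError("think start marker is not a single token")
        return self._think_start_tokens[0]


def resolve_think_start_id(tokenizer):
    """Return the think-start token id, or None when the model has none."""
    try:
        return tokenizer.think_start_id
    except (ValueError, TypeError):
        # Catching only ValueError would let the TypeError escape to the caller.
        return None


print(resolve_think_start_id(FakeTokenizer()))        # non-thinking model
print(resolve_think_start_id(FakeTokenizer([7])))     # single think token
print(resolve_think_start_id(FakeTokenizer([1, 2])))  # multi-token marker
```

With `except ValueError` alone, the first call would raise instead of returning `None`, which corresponds to the engine loop crash described above.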

What this does

This enables support for TranslateGemma and other non-thinking models whose
tokenizer lacks _think_start_tokens. Previously blocked by the TypeError,
TranslateGemma-4b-it now works via /v1/completions with a client-side chat template.

Note:
The TranslateGemma chat template must still be applied client-side and the
resulting prompt sent to /v1/completions instead of /v1/chat/completions. See the original issue #879 for an explanation.
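
A client-side call might look roughly like the sketch below. The server address, model name, and `max_tokens` value are assumptions chosen for illustration, and `build_completion_request` is a hypothetical helper. The prompt string is only a placeholder: the real text has to be produced by rendering the model's own chat template.

```python
import json
import urllib.request

# Assumptions: the address and model name are placeholders for illustration;
# adjust them to match your own deployment.
BASE_URL = "http://localhost:8000"
MODEL = "translategemma-4b-it"


def build_completion_request(rendered_prompt, model=MODEL, max_tokens=256):
    """Build (but do not send) a POST request for /v1/completions."""
    payload = {
        "model": model,
        "prompt": rendered_prompt,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The prompt must already be rendered with the model's chat template;
# this string is a placeholder, not the real TranslateGemma format.
request = build_completion_request("<prompt rendered client-side>")
# To send it against a running server: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the template rendering step explicit, since the server does not apply the template on this endpoint.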

Files modified

  • omlx/scheduler.py

…nking models

This enables support for TranslateGemma and other non-thinking models whose
tokenizer lacks _think_start_tokens. Previously blocked by TypeError,
TranslateGemma-4b-it now works correctly via /v1/completions with a
client-side chat template.

Note: TranslateGemma uses a custom chat template requiring
source_lang_code/target_lang_code fields that OMLX /v1/chat/completions
does not support. The chat template must be applied client-side and the
resulting prompt sent to /v1/completions instead. See the model's
chat_template.jinja for the exact prompt format.

Catch (ValueError, TypeError) on all three sites:
- _detect_needs_think_prefix in scheduler step
- ThinkingBudgetProcessor think_start_id resolution
- _resolve_think_end_token_ids think_end_id resolution
cangming009 pushed a commit to cangming009/omlx that referenced this pull request May 10, 2026
mlx-lm#1171 changed the MiniMax M2 tool parser to return a list when a
single <minimax:tool_call> block contains multiple <invoke>s. Without
list/dict flattening in api/tool_calling.py, parallel tool calls were
silently dropped via AttributeError swallowed by the existing except.

Flatten parser results in the native path using the same isinstance(list)
pattern already used in the Gemma 4 fallback. Add regression tests for
single-dict, multi-list, and multi-block-multi-invoke cases.

Also picks up BatchKVCache/BatchRotatingKVCache.extend() batch-dim fix
(jundot#1141) and the tree_reduce import fix (jundot#1165) from the same bump.