"There is no Stream(gpu, 1) in current thread." with MTP and Gemma 4 #1469

@wysie

Describe the bug
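With MTP enabled for Gemma 4, queries fail during batch generation with the error "There is no Stream(gpu, 1) in current thread.", raised from mx.async_eval in the MTP decode path.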

To Reproduce
Steps to reproduce the behavior:

  1. Use Gemma 4 (e.g. 26B)
  2. Download the MTP drafter for Gemma 4
  3. Turn on MTP.
  4. Queries will result in this error:
2026-05-28 00:02:50,189 - omlx.scheduler - ERROR - [-] - Error in batch generation step: There is no Stream(gpu, 1) in current thread.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 5889, in step
    responses.extend(self._step_vlm_mtp())
                     ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 3973, in _step_vlm_mtp
    token_val = next(state.generator)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/speculative/vlm_mtp.py", line 189, in run_vlm_mtp_decode
    for tok, _ in _mtp_rounds(
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/mlx_vlm/speculative/utils.py", line 572, in _mtp_rounds
    mx.async_eval(draft_tokens)
RuntimeError: There is no Stream(gpu, 1) in current thread.

2026-05-28 00:02:50,189 - omlx.engine_core - ERROR - [-] - Engine loop error: There is no Stream(gpu, 1) in current thread.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/engine_core.py", line 210, in _engine_loop
    output = await loop.run_in_executor(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.15_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 5889, in step
    responses.extend(self._step_vlm_mtp())
                     ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 3973, in _step_vlm_mtp
    token_val = next(state.generator)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/speculative/vlm_mtp.py", line 189, in run_vlm_mtp_decode
    for tok, _ in _mtp_rounds(
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/mlx_vlm/speculative/utils.py", line 572, in _mtp_rounds
    mx.async_eval(draft_tokens)
RuntimeError: There is no Stream(gpu, 1) in current thread.

Expected behavior
Queries complete without error when MTP is turned on.

Desktop (please complete the following information):

  • macOS Version: 26.5
  • oMLX Version: 0.3.12

Additional context
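Per the traceback, the failing call (mx.async_eval(draft_tokens) in mlx_vlm/speculative/utils.py, line 572) runs on a concurrent.futures worker thread started by loop.run_in_executor in omlx/engine_core.py. The install path shows a Homebrew HEAD build (HEAD-33c5ecf).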
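
The error text suggests the stream is looked up per thread, and the traceback shows the decode step running on an executor worker thread. The sketch below is a pure-Python analogy of that failure mode, not MLX code: `create_stream`, `use_stream`, and `worker_with_setup` are hypothetical stand-ins, and whether MLX actually keeps streams in thread-local state is an assumption here, not something the logs confirm.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-thread stream state; this is NOT MLX's API.
_streams = threading.local()


def create_stream(name):
    """Register a stream in the calling thread only."""
    if not hasattr(_streams, "registry"):
        _streams.registry = set()
    _streams.registry.add(name)


def use_stream(name):
    """Raise if the calling thread never registered the stream."""
    registry = getattr(_streams, "registry", set())
    if name not in registry:
        raise RuntimeError(f"There is no {name} in current thread.")
    return f"ran on {name}"


def worker_with_setup(name):
    """Register the stream on the worker thread before using it."""
    create_stream(name)
    return use_stream(name)


STREAM = "Stream(gpu, 1)"
create_stream(STREAM)  # registered on the main thread only

failed = False
message = ""
with ThreadPoolExecutor(max_workers=1) as pool:
    try:
        # The worker thread has its own empty thread-local state.
        pool.submit(use_stream, STREAM).result()
    except RuntimeError as exc:
        failed = True
        message = str(exc)
    # Creating the stream on the thread that uses it avoids the error.
    ok = pool.submit(worker_with_setup, STREAM).result()
```

If the real cause is similar, the fix would probably involve creating or selecting the stream on the same thread that calls mx.async_eval, but that is a guess from the traceback alone.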
