"There is no Stream(gpu, 1) in current thread." with MTP and Gemma 4 #1469

**Describe the bug**
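With a Gemma 4 model and its MTP drafter enabled, queries fail during batch generation with `RuntimeError: There is no Stream(gpu, 1) in current thread.`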

**To Reproduce**
Steps to reproduce the behavior:
1. Use Gemma 4 (e.g. 26B)
2. Download the MTP drafter for Gemma 4
3. Turn on MTP.
4. Send a query. It fails with the following error:

```
2026-05-28 00:02:50,189 - omlx.scheduler - ERROR - [-] - Error in batch generation step: There is no Stream(gpu, 1) in current thread.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 5889, in step
    responses.extend(self._step_vlm_mtp())
                     ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 3973, in _step_vlm_mtp
    token_val = next(state.generator)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/speculative/vlm_mtp.py", line 189, in run_vlm_mtp_decode
    for tok, _ in _mtp_rounds(
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/mlx_vlm/speculative/utils.py", line 572, in _mtp_rounds
    mx.async_eval(draft_tokens)
RuntimeError: There is no Stream(gpu, 1) in current thread.

2026-05-28 00:02:50,189 - omlx.engine_core - ERROR - [-] - Engine loop error: There is no Stream(gpu, 1) in current thread.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/engine_core.py", line 210, in _engine_loop
    output = await loop.run_in_executor(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.15_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 5889, in step
    responses.extend(self._step_vlm_mtp())
                     ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 3973, in _step_vlm_mtp
    token_val = next(state.generator)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/omlx/speculative/vlm_mtp.py", line 189, in run_vlm_mtp_decode
    for tok, _ in _mtp_rounds(
  File "/opt/homebrew/Cellar/omlx/HEAD-33c5ecf/libexec/lib/python3.11/site-packages/mlx_vlm/speculative/utils.py", line 572, in _mtp_rounds
    mx.async_eval(draft_tokens)
RuntimeError: There is no Stream(gpu, 1) in current thread.
```
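One possible, unconfirmed reading of the second traceback: `omlx.engine_core._engine_loop` runs `scheduler.step()` through `loop.run_in_executor(...)`, so the MTP decode generator is advanced with `next(state.generator)` on an executor worker thread. Assuming, as the error text suggests, that MLX only recognizes a stream on the thread that created it, and assuming consecutive steps can be scheduled on different worker threads (neither is confirmed by the log), the failure can be mimicked in plain Python. Everything below (`new_stream`, `use_stream`, `decode`, `run_steps`) is hypothetical and does not touch MLX or oMLX.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# HYPOTHETICAL stand-in for a per-thread stream registry. This is NOT the MLX
# API; it only mimics "a stream is known only to the thread that created it".
_local = threading.local()


def new_stream(name):
    """Register a stream on the calling thread only."""
    if not hasattr(_local, "streams"):
        _local.streams = set()
    _local.streams.add(name)
    return name


def use_stream(name):
    """Raise the same message as the traceback if this thread lacks the stream."""
    if name not in getattr(_local, "streams", set()):
        raise RuntimeError(f"There is no {name} in current thread.")


def decode():
    """Hypothetical generator loosely modeled on run_vlm_mtp_decode: the stream
    is created on whichever thread runs the first next() call."""
    stream = new_stream("Stream(gpu, 1)")
    while True:
        use_stream(stream)  # plays the role of mx.async_eval(draft_tokens)
        yield threading.get_ident()


def run_steps(executors, steps):
    """Advance one decode() generator, sending step i to executors[i % len(executors)].

    Returns the worker thread id of each successful step, plus the error
    message of the first failing step (after which the generator is finished).
    """
    gen = decode()
    results = []
    for i in range(steps):
        future = executors[i % len(executors)].submit(next, gen)
        try:
            results.append(future.result())
        except RuntimeError as exc:
            results.append(str(exc))
            break
    return results


if __name__ == "__main__":
    # Two single-thread pools: step 0 runs on one thread, step 1 on another.
    with ThreadPoolExecutor(max_workers=1) as a, ThreadPoolExecutor(max_workers=1) as b:
        print(run_steps([a, b], 3))  # [<thread id>, 'There is no Stream(gpu, 1) in current thread.']
    # One dedicated worker thread: every step sees the stream it created.
    with ThreadPoolExecutor(max_workers=1) as single:
        print(run_steps([single], 3))  # three identical thread ids
```

With two pools the second step fails with the same message as the log; with one dedicated worker every step succeeds. If this hypothesis holds, keeping all MLX work for a request on a single thread would be one way to avoid the error, though the actual fix may lie elsewhere in `mlx_vlm` or oMLX.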

**Expected behavior**
Queries complete normally with MTP enabled, without raising an error.

**Desktop (please complete the following information):**
 - macOS Version: 26.5
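 - oMLX Version: 0.3.12 (Homebrew `HEAD-33c5ecf` build, per the traceback paths)
 - Python: 3.11 (Homebrew `python@3.11` 3.11.15_1)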

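**Additional context**
Both log entries point to the same call: `mx.async_eval(draft_tokens)` in `mlx_vlm/speculative/utils.py` (`_mtp_rounds`, line 572), reached from `omlx/speculative/vlm_mtp.py` (`run_vlm_mtp_decode`, line 189). The engine-loop traceback shows that this code runs inside `loop.run_in_executor(...)`, i.e. on an executor worker thread rather than on the event-loop thread.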

