Multi-round inference leads to insufficient memory crash #314

@Edsuns

Description

Comparing Qwen3.6 35B 4-bit (MLX) with Qwen3.6 35B Q4_K_M (GGUF): after running in Claude Code for a while, the MLX version consumes significantly more memory than the GGUF version and eventually crashes due to insufficient memory.

To reproduce, run long_prompt_demo.py on a Mac mini M4 with 32 GB of memory:

lms get mlx-community/Qwen3.6-35B-A3B-4bit
python3 long_prompt_demo.py --model mlx-community/Qwen3.6-35B-A3B-4bit --max-kv-size 65536 --rounds 5 --prompt-length 100000
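The reproduction flags cap the KV cache at 65536 tokens while each prompt is 100000 tokens long. Assuming the script passes --max-kv-size through to a size-capped (rotating) cache, the KV cache alone should plateau rather than grow across rounds. A back-of-envelope estimate of that ceiling for a plain multi-head/grouped-query attention stack is sketched below; the layer count, KV-head count and head dimension are hypothetical placeholders, not values taken from the Qwen3.6 config, and 2-byte (float16/bfloat16) cache entries are assumed.

```python
# Back-of-envelope KV-cache sizing for a plain attention stack.
# The model dimensions used below are HYPOTHETICAL placeholders,
# not values read from the Qwen3.6 config.


def capped_tokens(prompt_length: int, max_kv_size: int) -> int:
    """Tokens a size-capped (rotating) cache can hold at once."""
    return min(prompt_length, max_kv_size)


def kv_cache_bytes(tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes for keys + values across all layers.

    Each layer stores one key and one value tensor of shape
    (tokens, num_kv_heads, head_dim). bytes_per_elem=2 assumes
    float16/bfloat16 cache entries; 4-bit weight quantization
    does not by itself shrink the cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * tokens * bytes_per_elem


if __name__ == "__main__":
    # Flags from the reproduction command.
    tokens = capped_tokens(prompt_length=100_000, max_kv_size=65_536)
    # Hypothetical dims: 40 layers, 2 KV heads, head_dim 128.
    size = kv_cache_bytes(tokens, num_layers=40, num_kv_heads=2, head_dim=128)
    print(f"{tokens} tokens -> {size / 2**30:.2f} GiB")
```

With these placeholder dimensions the script prints "65536 tokens -> 2.50 GiB". A figure like this is only a rough ceiling to compare against measured per-round memory; sustained growth well beyond weights plus this ceiling would suggest the extra memory is held somewhere other than the KV cache itself.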

I have tested models that exhibit this problem:

Error log from LM Studio (lms):

2026-04-18 12:08:57  [INFO]
 [unn/qwen3.6-35b-a3b-4bit] Running Anthropic messages API on conversation with 19 messages.
2026-04-18 12:08:57  [INFO]
 [unn/qwen3.6-35b-a3b-4bit] Streaming Anthropic response...
2026-04-18 12:08:58 [DEBUG]
 [cache_wrapper][INFO]: Prompt cache: using 36384/36871 tokens from cache
2026-04-18 12:08:58  [INFO]
 [unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 0.0%
2026-04-18 12:09:04  [INFO]
 [unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 97.7%
2026-04-18 12:09:04  [INFO]
 [unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 99.8%
2026-04-18 12:09:04  [INFO]
 [unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 99.8%
2026-04-18 12:09:05 [DEBUG]
 libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
2026-04-18 12:09:05 [DEBUG]
 Fatal Python error: Aborted

Thread 0x
2026-04-18 12:09:05 [DEBUG]
 0000000de2c57000 (most recent call first):
  File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_lm/generate.py", line 455 in generate_step
  File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_lm/generate.py", line 705 in <genexpr>
  File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_lm/generate.py", line 716 in stream_generate
2026-04-18 12:09:05 [DEBUG]
 File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_engine/generate.py", line 543 in _sequential_generation

Thread 0x00000001f7d6d8c0 (most recent call first):
  <no Python frame>
2026-04-18 12:09:05 [DEBUG]
 
Extension modules: yaml._yaml
2026-04-18 12:09:05 [DEBUG]
 , regex._regex, numpy._core._multiarray_umath
2026-04-18 12:09:05 [DEBUG]
 , numpy.linalg._umath_linalg, markupsafe._speedups
2026-04-18 12:09:05 [DEBUG]
 , PIL._imaging, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special
2026-04-18 12:09:05 [DEBUG]
 , numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand
2026-04-18 12:09:05 [DEBUG]
 , numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator
2026-04-18 12:09:05 [DEBUG]
 , sentencepiece._sentencepiece
2026-04-18 12:09:05 [DEBUG]
 , PIL._imagingft
2026-04-18 12:09:05 [DEBUG]
 , charset_normalizer.md, charset_normalizer.cd, requests.packages.charset_normalizer.md, requests.packages.chardet.md, requests.packages.charset_normalizer.cd, requests.packages.chardet.cd
2026-04-18 12:09:05 [DEBUG]
 , xxhash._xxhash
2026-04-18 12:09:06 [DEBUG]
  (total: 35)
2026-04-18 12:09:06 [ERROR]
 [unn/qwen3.6-35b-a3b-4bit] Anthropic streaming error: The model has crashed without additional information. (Exit code: null)
2026-04-18 12:09:06  [INFO]
 [unn/qwen3.6-35b-a3b-4bit] Finished streaming Anthropic response
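In the log above, the failing turn reuses 36384 of 36871 prompt tokens from cache, so only 487 tokens needed fresh prompt processing before the Metal out-of-memory error. When comparing several runs (or the MLX and GGUF backends), a small helper can pull those numbers out of a log. The sketch below is hypothetical: it is keyed to the exact strings that appear in this log, and the function name and output shape are illustrative, not part of LM Studio or mlx-engine.

```python
import re

# Matches the cache_wrapper line format seen in the log above.
CACHE_RE = re.compile(r"Prompt cache: using (\d+)/(\d+) tokens from cache")
# The Metal out-of-memory error code string seen in the log above.
OOM_MARKER = "kIOGPUCommandBufferCallbackErrorOutOfMemory"


def summarize(log_text: str) -> dict:
    """Collect per-turn prompt-cache usage and flag a Metal OOM."""
    turns = []
    for cached, total in CACHE_RE.findall(log_text):
        cached, total = int(cached), int(total)
        # Tokens not served from cache must go through prompt processing.
        turns.append({"cached": cached, "total": total, "uncached": total - cached})
    return {"turns": turns, "metal_oom": OOM_MARKER in log_text}


if __name__ == "__main__":
    sample = (
        "[cache_wrapper][INFO]: Prompt cache: using 36384/36871 tokens from cache\n"
        "Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)\n"
    )
    print(summarize(sample))
```

On the sample lines it prints {'turns': [{'cached': 36384, 'total': 36871, 'uncached': 487}], 'metal_oom': True}.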
