2026-04-18 12:08:57 [INFO]
[unn/qwen3.6-35b-a3b-4bit] Running Anthropic messages API on conversation with 19 messages.
2026-04-18 12:08:57 [INFO]
[unn/qwen3.6-35b-a3b-4bit] Streaming Anthropic response...
2026-04-18 12:08:58 [DEBUG]
[cache_wrapper][INFO]: Prompt cache: using 36384/36871 tokens from cache
2026-04-18 12:08:58 [INFO]
[unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 0.0%
2026-04-18 12:09:04 [INFO]
[unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 97.7%
2026-04-18 12:09:04 [INFO]
[unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 99.8%
2026-04-18 12:09:04 [INFO]
[unn/qwen3.6-35b-a3b-4bit] Prompt processing progress: 99.8%
2026-04-18 12:09:05 [DEBUG]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
2026-04-18 12:09:05 [DEBUG]
Fatal Python error: Aborted
Thread 0x
2026-04-18 12:09:05 [DEBUG]
0000000de2c57000 (most recent call first):
File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_lm/generate.py", line 455 in generate_step
File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_lm/generate.py", line 705 in <genexpr>
File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_lm/generate.py", line 716 in stream_generate
2026-04-18 12:09:05 [DEBUG]
File "/Users/xxx/.lmstudio/extensions/backends/vendor/_amphibian/app-mlx-generate-mac14-arm64@22/lib/python3.11/site-packages/mlx_engine/generate.py", line 543 in _sequential_generation
Thread 0x00000001f7d6d8c0 (most recent call first):
<no Python frame>
2026-04-18 12:09:05 [DEBUG]
Extension modules: yaml._yaml
2026-04-18 12:09:05 [DEBUG]
, regex._regex, numpy._core._multiarray_umath
2026-04-18 12:09:05 [DEBUG]
, numpy.linalg._umath_linalg, markupsafe._speedups
2026-04-18 12:09:05 [DEBUG]
, PIL._imaging, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special
2026-04-18 12:09:05 [DEBUG]
, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand
2026-04-18 12:09:05 [DEBUG]
, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator
2026-04-18 12:09:05 [DEBUG]
, sentencepiece._sentencepiece
2026-04-18 12:09:05 [DEBUG]
, PIL._imagingft
2026-04-18 12:09:05 [DEBUG]
, charset_normalizer.md, charset_normalizer.cd, requests.packages.charset_normalizer.md, requests.packages.chardet.md, requests.packages.charset_normalizer.cd, requests.packages.chardet.cd
2026-04-18 12:09:05 [DEBUG]
, xxhash._xxhash
2026-04-18 12:09:06 [DEBUG]
(total: 35)
2026-04-18 12:09:06 [ERROR]
[unn/qwen3.6-35b-a3b-4bit] Anthropic streaming error: The model has crashed without additional information. (Exit code: null)
2026-04-18 12:09:06 [INFO]
[unn/qwen3.6-35b-a3b-4bit] Finished streaming Anthropic response
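For triage, the two most telling lines in the log above are the prompt-cache line and the Metal abort. The cache line reports 36384 of 36871 prompt tokens served from cache, so only 487 tokens still needed processing when the command buffer failed. Below is a minimal sketch for extracting those facts from a saved log. The helper name `summarize_log` and the summary keys are hypothetical, and the patterns are assumed only from the lines quoted above, not from any documented LM Studio log format.

```python
import re

# Patterns mirror the log lines quoted above; they are assumptions based on
# this one log, not a documented format.
CACHE_RE = re.compile(r"Prompt cache: using (\d+)/(\d+) tokens from cache")
OOM_MARKER = "kIOGPUCommandBufferCallbackErrorOutOfMemory"


def summarize_log(text):
    """Return prompt-cache stats and whether a Metal out-of-memory abort appears."""
    summary = {"cached": None, "total": None, "uncached": None, "metal_oom": False}
    for line in text.splitlines():
        match = CACHE_RE.search(line)
        if match:
            cached, total = int(match.group(1)), int(match.group(2))
            # Tokens that still had to be processed after the cache hit.
            summary.update(cached=cached, total=total, uncached=total - cached)
        if OOM_MARKER in line:
            summary["metal_oom"] = True
    return summary


# Two lines copied from the log above, used as sample input.
sample = """\
[cache_wrapper][INFO]: Prompt cache: using 36384/36871 tokens from cache
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
"""

if __name__ == "__main__":
    print(summarize_log(sample))
```

Running the same helper over logs from both the MLX and GGUF runs would make it easier to compare how large the context was at the point of failure.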
I compared Qwen3.6 35B 4-bit (MLX) against Qwen3.6 35B Q4_K_M (GGUF). After running in Claude Code for a while, the MLX version consumes significantly more memory than the GGUF version, and it eventually crashed with an insufficient-memory error.
Running the reproduction script long_prompt_demo.py on a Mac mini M4 (32 GB):
Models I have tested that exhibit this problem:
Error log using lms: