fix(engine): keep per-engine MLX worker thread alive to fix DeepSeek V4 unload SIGSEGV (#1304 regression) #1542
I think this very same thing happens with the just-released Step 3.7 Flash too: Python always crashed on exit. I tried this but had to fix the unload, so I didn't bother sending a PR to support Step 3.7. This could be a two-in-one fix.
Thanks for the really thorough writeup, the crash frame and your root-cause read were spot on and made this easy to chase down. I reproduced it on a 512 GB machine with DeepSeek-V4-Flash-8bit exactly as you described: Tracing it the rest of the way, the trigger is upstream MLX #3280, which moved the compile cache to a Keeping the worker thread alive the way this PR does is a clean fix for the unload crash and I nearly went with it. Two things nudged me elsewhere: it pins one idle thread plus stream per model load for the process lifetime, and it only relocates the crash to process exit, where even an immortal thread gets torn down and runs the same destructor (that lines up with the "crash on exit" @beamivalice mentioned). The fix I landed clears the cache instead of dodging it. So I'm going to close this in favor of that fix, but it landed straight off the back of your debugging, thanks a lot for it. I'll also flag the dead atexit lambda upstream so the real fix can eventually live in MLX.
Thanks, that compile_clear_cache approach is much cleaner; glad the writeup helped.
Unloading DeepSeek V4 crashed omlx serve with a native SIGSEGV.
Root cause: MLX's @mx.compile cache (CompilerCache) is a C++ thread_local; the per-engine executor introduced in #1304 runs V4's module-scope @mx.compile graphs, then EngineCore.close() called executor.shutdown(wait=True), exiting the worker thread → ~CompilerCache() freed those graphs' Python objects from a thread-exit handler (no GIL) → use-after-free. V4-only (only model with module-scope compiled graphs) and sync-immune.
Fix: keep per-engine MLX worker threads alive for the process lifetime (matching the pre-#1304 global-thread behavior), so the destructor never runs mid-process.
Summary
Unloading DeepSeek-V4-Flash crashed the whole omlx serve process with a native SIGSEGV. The per-engine executor thread added in #1304 runs V4's @mx.compile graphs, and exiting that thread at unload triggers a use-after-free in MLX's thread_local compile-cache destructor. Fix: don't tear down the per-engine MLX worker thread at unload (matching the pre-#1304 single-global-thread behavior).
Root cause
MLX's @mx.compile cache (CompilerCache) is a C++ thread_local holding compiled graphs that reference Python objects. DeepSeek V4 is the only model with module-scope @mx.compile graphs, so they populate the per-engine thread's cache. EngineCore.close() called executor.shutdown(wait=True) → the worker thread exits → dyld runs ~CompilerCache() → it frees those Python objects from a thread-exit handler (no GIL, after gc already freed them) → use-after-free. V4-only and sync-immune (a thread-exit destructor, not GPU work). Pre-#1304, the shared global MLX thread never exited mid-process, so this never surfaced.
.ips crash frame (EXC_BAD_ACCESS at 0x10):
_pthread_exit → dyld::ThreadLocalVariables::finalizeList
→ mlx::core::detail::CompilerCache::~CompilerCache()
→ __deallocate_node(…CompilerCache::CacheEntry…) → tupledealloc
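The thread-lifetime half of this chain can be observed in pure Python, without MLX. The sketch below is only an illustration: it assumes the per-engine executor behaves like a standard one-worker `ThreadPoolExecutor`, and the function name and the `mlx-engine` thread prefix are hypothetical. It shows that `shutdown(wait=True)` makes the worker thread exit, which is the point at which any C++ `thread_local` destructors for that thread would run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def worker_thread_after_shutdown() -> tuple[bool, bool]:
    """Return (alive_before_shutdown, alive_after_shutdown) for a one-worker executor."""
    # Hypothetical stand-in for the per-engine executor; the prefix is illustrative.
    executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="mlx-engine")

    # Submit one task so the executor spawns its worker, and capture that thread.
    worker = executor.submit(threading.current_thread).result()
    alive_before = worker.is_alive()

    # shutdown(wait=True) joins the worker, so the thread has exited on return.
    # In the real process, this is where thread-exit handlers would fire.
    executor.shutdown(wait=True)
    alive_after = worker.is_alive()
    return alive_before, alive_after


if __name__ == "__main__":
    print(worker_thread_after_shutdown())
```

The destructor crash itself cannot be shown this way, since it depends on MLX's native `CompilerCache`; the sketch only pins down when the thread goes away.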
Fix
EngineCore.close() no longer shuts down the per-engine executor; the executor (and its stream) is held in a process-lifetime registry so the thread — and its thread_local CompilerCache — is never destructed mid-process:
# close(): was self._mlx_executor.shutdown(wait=True)
self._mlx_executor = None  # thread kept alive via module-global registry
No synchronization (sync-immune); MLX exposes no compile-cache-clear API. Cost: one idle thread per model load, reclaimed at exit.
Test plan