[Bug]: Qwen3-VL-Embedding multimodal embedding model crashes on oMLX v0.3.8 #1306

@Footree

Summary

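Qwen3-VL-Embedding (a multimodal vision-language embedding model) crashes during embedding generation on oMLX v0.3.8. The oMLX server process stays alive, but requests to the embedding endpoint end with the connection closed and an empty response, which points to a crash inside the model's embedding path rather than in the server.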

Environment

Steps to Reproduce

  1. Download Qwen3-VL-Embedding via oMLX
  2. Send an embedding request to the OpenAI-compatible /v1/embeddings endpoint
  3. The model crashes during embedding generation
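
For anyone trying to reproduce this without curl, the request in step 2 can be sketched in Python with only the standard library. This is a minimal sketch, not the exact request from the original report: `BASE_URL` and `MODEL_ID` are placeholders that must be adjusted to the local oMLX setup, and the payload assumes the usual OpenAI-style `{"model", "input"}` shape for text input.

```python
import http.client
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your oMLX host and port
MODEL_ID = "Qwen3-VL-Embedding"     # assumption: use the model id that oMLX reports


def build_embedding_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST /v1/embeddings request."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and summarize the outcome as a short string."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            payload = json.loads(resp.read())
        dim = len(payload["data"][0]["embedding"])
        return f"ok: embedding dimension {dim}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return f"http error: {exc.code}"
    except (OSError, http.client.HTTPException, ValueError, KeyError, IndexError) as exc:
        # Connection dropped, body truncated, or response was not the expected JSON.
        return f"transport or parse failure: {exc!r}"
```

Calling `print(send(build_embedding_request(BASE_URL, MODEL_ID, "hello world")))` once with the Qwen3-VL-Embedding id and once with the Qwen3-Embedding id should show whether the failure is specific to the VL model.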

Expected Behavior

Qwen3-VL-Embedding should generate embeddings (same as Qwen3-Embedding, which works perfectly).

Actual Behavior

curl: (18) transfer closed with outstanding read data remaining
  • Response is empty
  • Model crashes internally during embedding generation
  • oMLX application process stays alive (health endpoint still responds)
  • Only the embedding API endpoint returns errors

Comparison

| Model | Type | Status |
| --- | --- | --- |
| Qwen3-Embedding-8B-mxfp8 | Text-only embedding | ✅ Works perfectly |
| Qwen3-VL-Embedding | Multimodal VL embedding | ❌ Crash on embed |

Impact

Cannot use multimodal embeddings for applications that need to embed both text and images (e.g., visual memory search, image-text cross-modal retrieval). This blocks the upgrade path to VL-capable embedding models on oMLX.

Logs / Debug Info

  • Swagger UI: transfer closed error
  • curl: transfer closed with outstanding read data remaining
  • No visible error in the health endpoint; the application remains healthy
  • This suggests a model-level crash inside the embedding pipeline, not a server-level crash
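
The reasoning in the points above (health check fine, embedding request cut off) can be written down as a small triage helper. This is only an illustrative sketch: `classify_failure` is a hypothetical name, the caller is assumed to have already probed the health endpoint and attempted an embedding request, and `http.client.IncompleteRead` is used as the rough Python analog of curl's error 18.

```python
import http.client
import urllib.error
from typing import Optional


def classify_failure(health_ok: bool, embed_error: Optional[BaseException]) -> str:
    """Map the two observations from the report onto a coarse diagnosis.

    health_ok:   whether the health endpoint answered successfully.
    embed_error: the exception raised by the embedding request, or None on success.
    """
    if embed_error is None:
        return "embeddings endpoint ok"
    if not health_ok:
        return "server-level failure: health check is failing too"
    # From here on the server is healthy, so the problem is in the embedding path.
    if isinstance(embed_error, http.client.IncompleteRead):
        return "embedding-path failure: response body cut off mid-transfer"
    if isinstance(embed_error, http.client.RemoteDisconnected):
        return "embedding-path failure: connection closed before any response"
    if isinstance(embed_error, urllib.error.HTTPError):
        return f"embedding-path failure: HTTP {embed_error.code}"
    return f"embedding-path failure: {type(embed_error).__name__}"
```

With the symptoms described in this issue, the expected inputs are `health_ok=True` together with a truncated-body error, which lands in the "cut off mid-transfer" branch.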
