Dedicated public lab for enhancing qmd with an OpenAI-compatible provider path that can use MLX-backed servers for embeddings, reranking, and query expansion.
This repo is deliberately not the production qmd install. It is the clean lane for development, testing, and eventual upstream-ready patches.
- qmd upstream: https://github.com/tobi/qmd
- qmd PR #619 - OpenAI-compatible backend: tobi/qmd#619
- qmd PR #689 - OpenAI embeddings/query-expansion/rerank path: tobi/qmd#689
- vMLX: https://github.com/jjang-ai/vmlx
- oMLX: https://github.com/jundot/omlx
- embed-rerank: https://github.com/joonsoo-me/embed-rerank
Current qmd is strong, but its shipped backend is GGUF/node-llama-cpp. For Rudy's Apple Silicon trace-search lane, a warm MLX server may be better for:
- persistent model residency
- shared local inference across tools
- faster iteration over embedding/rerank model choices
- clean separation between qmd retrieval logic and model runtime
workflows/qmd-openai-mlx-provider/
Goal: create and validate a qmd provider path where:
qmd embedBatch() -> POST /v1/embeddings
qmd rerank() -> POST /v1/rerank
qmd expandQuery() -> POST /v1/chat/completions
cd /Users/rudlord/ORGANIZED/ACTIVE_PROJECTS/qmd-mlx
# Prepare Python tooling and download small MLX models
scripts/setup-dev-env.sh
scripts/download-mlx-models.sh
# Clone upstream qmd into an ignored sandbox and fetch PR references
scripts/clone-qmd-sandbox.sh
# Run repo-level verification
scripts/verify.sh
# Run a deterministic fake-provider contract test for PR #619 surfaces
scripts/test-qmd-pr619-fake-openai.sh
# Start or reuse the local vMLX embedding server on 127.0.0.1:8092
scripts/start-vmlx-embedding-server.sh
# New terminal: run the local qmd PR #619 + vMLX integration diagnostic
QMD_MLX_BASE_URL=http://127.0.0.1:8092/v1 scripts/test-qmd-pr619-vmlx.sh
# Run tiny public benchmark/eval lanes
QMD_MLX_BASE_URL=http://127.0.0.1:8092/v1 scripts/benchmark-qmd-pr619-vmlx.sh
scripts/benchmark-qmd-pr619-gguf.shSmall enough to validate the path before wasting time on giant re-embeds:
Embedding: mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ
Reranker: mlx-community/Qwen3-Reranker-0.6B-mxfp8
Quality-mode candidates after the path works:
Embedding: mlx-community/Qwen3-Embedding-4B-4bit-DWQ
Reranker: Qwen3-Reranker-4B via an MLX-compatible server, if available and validated
No private traces. No model weights. No qmd indexes. No secrets.
The committed repo contains reproducible scripts and plans only. Generated work lives in ignored folders.
Measured on 2026-05-31 against qmd PR #619 in .sandbox/qmd, the deterministic fake OpenAI-compatible provider, and vMLX on http://127.0.0.1:8092/v1:
PASS: fake OpenAI-compatible contract covers /v1/embeddings, /v1/rerank, /v1/chat/completions, /v1/models, Authorization forwarding, qmd embed/vsearch/query, and isolated index writes.
PASS: qmd update/embed/vsearch/query/rerank work through PR #619's OpenAI-compatible provider after applying the local vMLX Qwen3 reranker patch in `patches/vmlx-1.5.49-qwen3-reranker-causal.patch`.
PASS: tiny public benchmark saved in `docs/benchmarks/qmd-pr619-public-benchmark-2026-05-31.md`; MLX full pipeline averaged 3049 ms vs stock GGUF 6651 ms on the same fixture.
FIXED LOCALLY: vMLX Qwen3 reranker was misrouted through `mlx_embeddings` and returned raw negative logit margins; the local patch routes it through `mlx_lm` and sigmoid-normalizes relevance scores.
QUIRK: vMLX lists qmd-embed in /v1/models, but /v1/embeddings rejects that alias. Use the exact local embedding-model path for embeddings.
Rerank root-cause details:
docs/vmlx-qwen3-rerank-root-cause.md