Skip to content

chipoto69/qmd-mlx

Repository files navigation

qmd-mlx

Dedicated public lab for enhancing qmd with an OpenAI-compatible provider path that can use MLX-backed servers for embeddings, reranking, and query expansion.

This repo is deliberately not the production qmd install. It is the clean lane for development, testing, and eventual upstream-ready patches.

References

Why

Current qmd is strong, but its shipped backend is GGUF/node-llama-cpp. For Rudy's Apple Silicon trace-search lane, a warm MLX server may be better for:

  • persistent model residency
  • shared local inference across tools
  • faster iteration over embedding/rerank model choices
  • clean separation between qmd retrieval logic and model runtime

First workflow

workflows/qmd-openai-mlx-provider/

Goal: create and validate a qmd provider path where:

qmd embedBatch()  -> POST /v1/embeddings
qmd rerank()      -> POST /v1/rerank
qmd expandQuery() -> POST /v1/chat/completions

Quick start

cd /Users/rudlord/ORGANIZED/ACTIVE_PROJECTS/qmd-mlx

# Prepare Python tooling and download small MLX models
scripts/setup-dev-env.sh
scripts/download-mlx-models.sh

# Clone upstream qmd into an ignored sandbox and fetch PR references
scripts/clone-qmd-sandbox.sh

# Run repo-level verification
scripts/verify.sh

# Run a deterministic fake-provider contract test for PR #619 surfaces
scripts/test-qmd-pr619-fake-openai.sh

# Start or reuse the local vMLX embedding server on 127.0.0.1:8092
scripts/start-vmlx-embedding-server.sh

# New terminal: run the local qmd PR #619 + vMLX integration diagnostic
QMD_MLX_BASE_URL=http://127.0.0.1:8092/v1 scripts/test-qmd-pr619-vmlx.sh

# Run tiny public benchmark/eval lanes
QMD_MLX_BASE_URL=http://127.0.0.1:8092/v1 scripts/benchmark-qmd-pr619-vmlx.sh
scripts/benchmark-qmd-pr619-gguf.sh

Default MLX models for the first experiment

Small enough to validate the path before wasting time on giant re-embeds:

Embedding: mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ
Reranker:  mlx-community/Qwen3-Reranker-0.6B-mxfp8

Quality-mode candidates after the path works:

Embedding: mlx-community/Qwen3-Embedding-4B-4bit-DWQ
Reranker:  Qwen3-Reranker-4B via an MLX-compatible server, if available and validated

Public repo hygiene

No private traces. No model weights. No qmd indexes. No secrets.

The committed repo contains reproducible scripts and plans only. Generated work lives in ignored folders.

Current local result

Measured on 2026-05-31 against qmd PR #619 in .sandbox/qmd, the deterministic fake OpenAI-compatible provider, and vMLX on http://127.0.0.1:8092/v1:

PASS: fake OpenAI-compatible contract covers /v1/embeddings, /v1/rerank, /v1/chat/completions, /v1/models, Authorization forwarding, qmd embed/vsearch/query, and isolated index writes.
PASS: qmd update/embed/vsearch/query/rerank work through PR #619's OpenAI-compatible provider after applying the local vMLX Qwen3 reranker patch in `patches/vmlx-1.5.49-qwen3-reranker-causal.patch`.
PASS: tiny public benchmark saved in `docs/benchmarks/qmd-pr619-public-benchmark-2026-05-31.md`; MLX full pipeline averaged 3049 ms vs stock GGUF 6651 ms on the same fixture.
FIXED LOCALLY: vMLX Qwen3 reranker was misrouted through `mlx_embeddings` and returned raw negative logit margins; the local patch routes it through `mlx_lm` and sigmoid-normalizes relevance scores.
QUIRK: vMLX lists qmd-embed in /v1/models, but /v1/embeddings rejects that alias. Use the exact local embedding-model path for embeddings.

Rerank root-cause details:

docs/vmlx-qwen3-rerank-root-cause.md

About

qmd + MLX OpenAI-compatible provider lab for Apple Silicon trace search

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors