qmd-mlx

Dedicated public lab for enhancing qmd with an OpenAI-compatible provider path that can use MLX-backed servers for embeddings, reranking, and query expansion.

This repo is deliberately not the production qmd install. It is the clean lane for development, testing, and eventual upstream-ready patches.

References

qmd upstream: https://github.com/tobi/qmd
qmd PR #619 - OpenAI-compatible backend: tobi/qmd#619
qmd PR #689 - OpenAI embeddings/query-expansion/rerank path: tobi/qmd#689
vMLX: https://github.com/jjang-ai/vmlx
oMLX: https://github.com/jundot/omlx
embed-rerank: https://github.com/joonsoo-me/embed-rerank

Why

qmd is already strong, but the backend it ships with is GGUF via node-llama-cpp. For Rudy's Apple Silicon trace-search lane, a warm MLX server may be a better fit for:

- persistent model residency
- shared local inference across tools
- faster iteration over embedding/rerank model choices
- clean separation between qmd retrieval logic and model runtime

First workflow

workflows/qmd-openai-mlx-provider/

Goal: create and validate a qmd provider path where:

qmd embedBatch()  -> POST /v1/embeddings
qmd rerank()      -> POST /v1/rerank
qmd expandQuery() -> POST /v1/chat/completions

Quick start

```bash
cd /Users/rudlord/ORGANIZED/ACTIVE_PROJECTS/qmd-mlx

# Prepare Python tooling and download small MLX models
scripts/setup-dev-env.sh
scripts/download-mlx-models.sh

# Clone upstream qmd into an ignored sandbox and fetch PR references
scripts/clone-qmd-sandbox.sh

# Run repo-level verification
scripts/verify.sh

# Run a deterministic fake-provider contract test for PR #619 surfaces
scripts/test-qmd-pr619-fake-openai.sh

# Start or reuse the local vMLX embedding server on 127.0.0.1:8092
scripts/start-vmlx-embedding-server.sh

# New terminal: run the local qmd PR #619 + vMLX integration diagnostic
QMD_MLX_BASE_URL=http://127.0.0.1:8092/v1 scripts/test-qmd-pr619-vmlx.sh

# Run tiny public benchmark/eval lanes
QMD_MLX_BASE_URL=http://127.0.0.1:8092/v1 scripts/benchmark-qmd-pr619-vmlx.sh
scripts/benchmark-qmd-pr619-gguf.sh
```
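
To give a sense of what "deterministic fake provider" can mean for the contract test, the sketch below derives a repeatable pseudo-embedding from a hash of the input text and wraps it in an OpenAI-style embeddings response. This is not the code in `scripts/test-qmd-pr619-fake-openai.sh`; the function names and the vector dimension are hypothetical, and which response fields PR #619 actually reads has not been verified here.

```python
import hashlib
import struct


def fake_embedding(text: str, dim: int = 8) -> list:
    """Map text to a repeatable pseudo-embedding with components in [-1, 1)."""
    values = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Each 4-byte chunk becomes one unsigned int, scaled into [-1, 1).
        for (n,) in struct.iter_unpack(">I", digest):
            values.append(n / 2**31 - 1.0)
        counter += 1
    return values[:dim]


def fake_embeddings_response(model: str, inputs: list) -> dict:
    """Build an OpenAI-style /v1/embeddings response body for the given inputs."""
    return {
        "object": "list",
        "model": model,
        "data": [
            {"object": "embedding", "index": i, "embedding": fake_embedding(t)}
            for i, t in enumerate(inputs)
        ],
    }
```

Because the vectors depend only on the input text, two runs of a test against such a provider see identical embeddings, which keeps assertions about index writes and search ordering stable.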

Default MLX models for the first experiment

These are small enough to validate the path before spending time on large re-embeds:

Embedding: mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ
Reranker:  mlx-community/Qwen3-Reranker-0.6B-mxfp8

Quality-mode candidates after the path works:

Embedding: mlx-community/Qwen3-Embedding-4B-4bit-DWQ
Reranker:  Qwen3-Reranker-4B via an MLX-compatible server, if available and validated

Public repo hygiene

No private traces. No model weights. No qmd indexes. No secrets.

The committed repo contains reproducible scripts and plans only. Generated work lives in ignored folders.

Current local result

Measured on 2026-05-31 against qmd PR #619 in .sandbox/qmd, the deterministic fake OpenAI-compatible provider, and vMLX on http://127.0.0.1:8092/v1:

PASS: fake OpenAI-compatible contract covers /v1/embeddings, /v1/rerank, /v1/chat/completions, /v1/models, Authorization forwarding, qmd embed/vsearch/query, and isolated index writes.
PASS: qmd update/embed/vsearch/query/rerank work through PR #619's OpenAI-compatible provider after applying the local vMLX Qwen3 reranker patch in `patches/vmlx-1.5.49-qwen3-reranker-causal.patch`.
PASS: tiny public benchmark saved in `docs/benchmarks/qmd-pr619-public-benchmark-2026-05-31.md`; MLX full pipeline averaged 3049 ms vs stock GGUF 6651 ms on the same fixture.
FIXED LOCALLY: vMLX Qwen3 reranker was misrouted through `mlx_embeddings` and returned raw negative logit margins; the local patch routes it through `mlx_lm` and sigmoid-normalizes relevance scores.
QUIRK: vMLX lists qmd-embed in /v1/models, but /v1/embeddings rejects that alias. Use the exact local embedding-model path for embeddings.
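
For readers who want the benchmark averages above as a ratio, the arithmetic is worked below. It uses only the two averages quoted in this section; since they come from one tiny public fixture, the ratio is a rough indication rather than a general performance claim.

```python
mlx_ms = 3049   # MLX full pipeline average quoted above
gguf_ms = 6651  # stock GGUF average on the same fixture

speedup = gguf_ms / mlx_ms          # how many times faster the MLX run was
reduction = 1 - mlx_ms / gguf_ms    # fraction of wall-clock time saved

print(f"speedup: {speedup:.2f}x, latency reduction: {reduction:.1%}")
# prints "speedup: 2.18x, latency reduction: 54.2%"
```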

Rerank root-cause details:

docs/vmlx-qwen3-rerank-root-cause.md
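
The sigmoid normalization mentioned in the rerank fix can be sketched as follows. This is an illustration only, not the contents of the patch: it assumes the raw score is a margin between a "yes" logit and a "no" logit, and the function names are hypothetical. Under that assumption, applying a sigmoid to the margin is equivalent to a two-way softmax over the pair, so a negative margin maps to a score below 0.5 instead of a raw negative number.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, returning a value in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relevance_score(yes_logit: float, no_logit: float) -> float:
    # Assumption: the raw rerank output is the margin yes_logit - no_logit.
    return sigmoid(yes_logit - no_logit)
```

A margin of zero gives exactly 0.5, and larger margins move the score monotonically toward 1, which gives downstream consumers a bounded relevance score to work with.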
