Add support for remote OpenAI-compatible embeddings#480
Closed
Conversation
Replace the rerank() stub with a real listwise reranker using gpt-4o-mini.

- Sends top candidates with the query to gpt-4o-mini as a ranking task
- Parses comma-separated index output, handles missing/duplicate indices
- Skips the API call for ≤2 documents (not worth the latency)
- Falls back to original order on API failure
- Cost: ~$0.001 per rerank call
- Updated qmd.ts to route through the OpenAI reranker instead of skipping

The full qmd query pipeline with OpenAI is now:

1. Query expansion (gpt-4o-mini)
2. BM25 + vector search (parallel)
3. RRF fusion
4. Cross-encoder reranking (gpt-4o-mini) ← NEW
5. Position-aware blending
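A minimal sketch of what that listwise step can look like, assuming an OpenAI-compatible chat completions endpoint; the helper names and prompt wording are illustrative, not this PR's exact code:

```ts
// Sketch only: listwise reranking via gpt-4o-mini. Helper names and the
// prompt format are assumptions, not the exact implementation in this PR.
async function chatComplete(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`chat completion failed: ${res.status}`);
  const json = (await res.json()) as { choices: { message: { content: string } }[] };
  return json.choices[0].message.content;
}

async function rerank(query: string, docs: string[]): Promise<string[]> {
  // Skip the API call for <=2 documents: reordering cannot justify the latency.
  if (docs.length <= 2) return docs;

  const prompt = [
    "Rank the following documents by relevance to the query.",
    `Query: ${query}`,
    ...docs.map((d, i) => `[${i}] ${d}`),
    'Reply with the indices in best-first order, comma-separated, e.g. "2,0,1".',
  ].join("\n");

  try {
    const raw = await chatComplete(prompt);
    const seen = new Set<number>();
    const order: number[] = [];
    // Parse the comma-separated indices, dropping duplicates and junk tokens.
    for (const tok of raw.split(",")) {
      const i = Number.parseInt(tok.trim(), 10);
      if (Number.isInteger(i) && i >= 0 && i < docs.length && !seen.has(i)) {
        seen.add(i);
        order.push(i);
      }
    }
    // Append any indices the model omitted, preserving their original order.
    for (let i = 0; i < docs.length; i++) if (!seen.has(i)) order.push(i);
    return order.map((i) => docs[i]);
  } catch {
    // On API failure, fall back to the original candidate order.
    return docs;
  }
}
```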
jaylfc pushed a commit to jaylfc/qmd that referenced this pull request on Apr 4, 2026
Add `qmd serve` command that runs a lightweight HTTP server exposing embedding, reranking, and query expansion endpoints. Multiple QMD clients can share a single set of loaded models over the network instead of each loading their own into RAM.

Changes:

- New `src/serve.ts`: HTTP server wrapping LlamaCpp (embed/rerank/expand/tokenize)
- New `src/llm-remote.ts`: RemoteLLM class implementing the LLM interface via HTTP
- Updated LLM interface: added embedBatch, tokenize, intent option
- Updated store.ts: use the LLM interface instead of the concrete LlamaCpp type
- CLI: added `serve` command, `--server` flag, and QMD_SERVER env var
- README: documented remote model server usage and multi-agent setup

Addresses: tobi#489 tobi#490 tobi#502 tobi#480

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
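A rough sketch of how an HTTP-backed client like `src/llm-remote.ts` can satisfy the LLM interface; the endpoint paths, interface shape, and default port below are assumptions for illustration, not the commit's actual code:

```ts
// Sketch only: a RemoteLLM in the spirit of src/llm-remote.ts. The routes
// and interface shape are assumed, not taken from the commit.
interface LLM {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
  rerank(query: string, docs: string[]): Promise<number[]>;
}

class RemoteLLM implements LLM {
  constructor(private baseUrl: string) {}

  private async post<T>(path: string, body: unknown): Promise<T> {
    const res = await fetch(`${this.baseUrl}${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${path} failed: ${res.status}`);
    return (await res.json()) as T;
  }

  embed(text: string): Promise<number[]> {
    return this.post<number[]>("/embed", { text });
  }

  embedBatch(texts: string[]): Promise<number[][]> {
    return this.post<number[][]>("/embed-batch", { texts });
  }

  rerank(query: string, docs: string[]): Promise<number[]> {
    return this.post<number[]>("/rerank", { query, docs });
  }
}

// Several QMD clients can then share one model server. QMD_SERVER is the
// env var this commit adds; the fallback port here is just an example.
// const llm: LLM = new RemoteLLM(process.env.QMD_SERVER ?? "http://localhost:8080");
```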
This is mainly a tidy-up of #116, which fell behind main, but it also adds support for configurable remote endpoints.
There are already many Issues and PRs requesting that support for remote, OpenAI-compatible endpoints be re-added to the code base, so I apologise for creating a new one, but it's clearly quite a popular request!
PRs
A lot of these are endpoint-specific in nature, targeting Voyager, Gemini, or OpenRouter. This PR is generic, allowing the use of any embedding provider that follows OpenAI's API specification for embeddings.
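For concreteness, the contract this relies on is small: any provider exposing the OpenAI embeddings shape (`POST {baseUrl}/embeddings` with `model` and `input`, returning a `data` array of vectors) should work. A minimal sketch of such a call, not taken from this PR's diff:

```ts
// Sketch: calling any OpenAI-compatible embeddings endpoint. Only baseUrl,
// apiKey, and model differ between providers; the wire format is the same.
async function remoteEmbed(
  baseUrl: string, // e.g. "https://api.openai.com/v1" or a self-hosted server
  apiKey: string,
  model: string,   // e.g. "text-embedding-3-small"
  input: string[],
): Promise<number[][]> {
  const res = await fetch(`${baseUrl}/embeddings`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, input }),
  });
  if (!res.ok) throw new Error(`embeddings request failed: ${res.status}`);
  const json = (await res.json()) as {
    data: { index: number; embedding: number[] }[];
  };
  // Each response item carries an index; sort so vectors line up with inputs.
  return json.data.sort((a, b) => a.index - b.index).map((d) => d.embedding);
}
```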
Issues
I went through a few of them and thought that @jonesj38's version was the closest to what I wanted. I made some minor changes before it fell behind main and needed a clean merge. I created a PR to his repository, but it became quite messy and I have received no feedback on it in a couple of weeks.

Summary
Clearly there are a lot of use-cases for remote endpoints. My use-case, as mentioned in a couple of existing PRs and Issues, is that `node-llama-cpp` does not build in Docker on Apple Silicon. Even if it did, it wouldn't have support for Apple's GPU. So I need to host the models in Docker Model Runner, which is treated as an OpenAI-compatible remote endpoint.
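To illustrate that setup, reusing the `remoteEmbed` sketch from above: Docker Model Runner exposes an OpenAI-compatible API, so it drops straight into a generic client. The URL and model name below are illustrative assumptions, not values from this PR:

```ts
// Example only: Docker Model Runner as the embedding backend. The host/port
// assume the runner's TCP endpoint is enabled; the model is whatever you pulled.
const vectors = await remoteEmbed(
  "http://localhost:12434/engines/v1", // Docker Model Runner, from the host
  "not-needed",                        // local runners typically ignore the key
  "ai/mxbai-embed-large",              // example embedding model
  ["some text to embed"],
);
```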
Either way, I am using this fork, but I would much prefer it to be merged upstream so I can benefit from any future code changes, too. (It wasn't easy rebasing the fork on main!)