
Feature request: Support remote Ollama embeddings via HTTP (OLLAMA_EMBED_URL) #489

@paralizeer

Description


Problem

QMD 2.0 uses node-llama-cpp for all embedding and tokenization operations. This requires local compilation via CMake on every run, which fails on platforms without GPU/Vulkan support (e.g., ARM64 VPS, Docker containers, CI runners) and is unusable when the Ollama instance runs on a separate machine (common with Tailscale, Docker networks, or dedicated GPU boxes).

The OLLAMA_EMBED_URL env var exists but is only partially honored by qmd vsearch — expandQuery(), chunkDocumentByTokens(), generateEmbeddings(), and the vsearch CLI command all still require node-llama-cpp and trigger CMake compilation.

Use Case

Running QMD on an ARM64 Oracle Cloud VPS with Ollama on a separate machine (connected via Tailscale). The Ollama instance serves qwen3-embedding:0.6b (1024 dims, MTEB #1 multilingual). There is no local GPU and no way to compile node-llama-cpp cleanly.

This is a common setup for anyone using:

  • Remote Ollama (Docker, Tailscale, LAN)
  • ARM64 servers (Oracle, Ampere, Graviton)
  • Headless VPS without GPU drivers

Proposed Solution

When OLLAMA_EMBED_URL is set, bypass all node-llama-cpp / getDefaultLlamaCpp() calls:

1. ollamaEmbed() helper function

async function ollamaEmbed(text: string): Promise<EmbeddingResult> {
  const url = process.env.OLLAMA_EMBED_URL;
  const model = process.env.OLLAMA_EMBED_MODEL || "nomic-embed-text";
  const res = await fetch(`${url}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) {
    throw new Error(`Ollama embed failed: ${res.status} ${res.statusText}`);
  }
  const data = await res.json();
  // /api/embed returns { embeddings: number[][] }; a single input yields one row
  return { embedding: data.embeddings[0], model };
}
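For the two embedBatch call sites, a batch helper avoids one HTTP round-trip per chunk. This is a sketch only — ollamaEmbedBatch() does not exist in QMD today, and it relies on Ollama's /api/embed accepting an array input and returning one embedding per item:

```typescript
// Hypothetical batch helper (not in QMD yet). Ollama's /api/embed accepts
// either a string or an array as "input"; with an array it returns
// { embeddings: number[][] } with one row per input text.
function buildEmbedRequest(model: string, inputs: string[]): string {
  return JSON.stringify({ model, input: inputs });
}

async function ollamaEmbedBatch(texts: string[]): Promise<number[][]> {
  const url = process.env.OLLAMA_EMBED_URL;
  const model = process.env.OLLAMA_EMBED_MODEL || "nomic-embed-text";
  const res = await fetch(`${url}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildEmbedRequest(model, texts),
  });
  if (!res.ok) {
    throw new Error(`Ollama embed failed: ${res.status} ${res.statusText}`);
  }
  const data = (await res.json()) as { embeddings: number[][] };
  return data.embeddings;
}
```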

2. Patch points (6 locations)

| Function | File | What to bypass |
| --- | --- | --- |
| getEmbedding() | store.ts | getDefaultLlamaCpp() → ollamaEmbed() |
| generateEmbeddings() | store.ts | withLLMSessionForLlm → direct Ollama HTTP |
| expandQuery() | store.ts | LLM query expansion → pass-through [{type:"vec", query}] |
| chunkDocumentByTokens() | store.ts | llm.tokenize() → char-based estimation (text.length / 3) |
| embedBatch (2 sites) | store.ts | llm.embedBatch() → ollamaEmbedBatch() |
| vectorSearch CLI | cli/qmd.ts | withLLMSession() → direct call |
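The chunkDocumentByTokens() bypass can be sketched as follows. Names and signatures here are illustrative, not QMD's actual internals; the divisor 3 matches the text.length / 3 estimate above and can be tuned per model and language:

```typescript
// Char-based token estimation: replaces llm.tokenize() on the Ollama path,
// so no local tokenizer (and no node-llama-cpp compilation) is needed.
function estimateTokens(text: string, avgCharsPerToken = 3): number {
  return Math.ceil(text.length / avgCharsPerToken);
}

// Split a document into chunks of at most maxTokens estimated tokens.
function chunkByEstimatedTokens(
  text: string,
  maxTokens: number,
  avgCharsPerToken = 3,
): string[] {
  const maxChars = maxTokens * avgCharsPerToken;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

Chunk boundaries may fall mid-word with this naive slicing; splitting on whitespace near the boundary would be a natural refinement.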

3. Environment variables

export OLLAMA_EMBED_URL=http://your-ollama:11434
export OLLAMA_EMBED_MODEL=qwen3-embedding:0.6b  # optional, defaults to nomic-embed-text
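To sanity-check the remote endpoint before pointing QMD at it (hostname and model are placeholders — substitute your own; this is a connectivity check, not part of the patch):

```shell
# Should return a JSON body with an "embeddings" array of one 1024-dim vector
curl -s "$OLLAMA_EMBED_URL/api/embed" \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-embedding:0.6b","input":"hello"}'
```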

Results (tested on ARM64 Oracle VPS)

  • Before: Every vsearch/embed/query triggers CMake compilation (fails on ARM64 without Vulkan)
  • After: Zero compilation, instant results via HTTP to remote Ollama
  • qmd embed --force successfully re-indexes 7,100+ documents
  • qmd vsearch returns results in <2s vs hanging on CMake indefinitely

Notes

  • qmd search (BM25) is unaffected — works perfectly without any of this
  • expandQuery() uses an LLM for HyDE-style query expansion. On the Ollama path we skip this and pass the raw query straight to vector search. A future improvement could call Ollama's /api/generate for query expansion.
  • Char-based chunking (text.length / avgCharsPerToken) is a reasonable approximation that avoids requiring a local tokenizer
  • The OLLAMA_EMBED_MODEL env var allows users to pick any embedding model available on their Ollama instance
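The expandQuery() pass-through mentioned above reduces to a one-liner. The candidate type is an assumption inferred from the [{type:"vec", query}] shape in the patch table, not QMD's actual type:

```typescript
// Assumed shape of a vector-search candidate, per the pass-through in the
// patch table; QMD's real type may differ.
type QueryCandidate = { type: "vec"; query: string };

// Ollama-path expandQuery(): skip LLM expansion entirely and emit the raw
// query as the single vector-search candidate.
function expandQueryPassthrough(query: string): QueryCandidate[] {
  return [{ type: "vec", query }];
}
```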

Happy to submit a PR if there is interest.
