feat(ollama): detect and advertise vision capability with live E2E by roryford · Pull Request #1892 · roryford/ManifoldKit

roryford · 2026-06-16T08:33:40Z

What

Closes the #1 vision gap: the Ollama backend has always had the image wire path (OllamaImagesField), but it never advertised supportsVision and nothing tested image input. This detects, advertises, wires, and tests it end-to-end.

Changes

Detect: OllamaModelProbe reads the per-model vision flag from /api/show's capabilities: ["vision", ...] list (qwen2.5vl / moondream / llava), alongside the existing thinking detection. No template fallback — vision is advertised-only.
Advertise: OllamaBackend.isVisionModel (state-lock-guarded, mirrors _isThinkingModel) drives capabilities.supportsVision, routed through a new central BackendVisionCapability.ollamaSupportsImageInput(probedVision:) gate (matches the cloud families' pattern).
Wire: OllamaBackend now conforms to StructuredHistoryReceiver and lifts MessagePart.image payloads onto Ollama's message-level images: [base64] field. Snapshot-and-cleared like toolAwareHistory; text-only turns keep the existing plain-string path, preserving every existing wire-shape assertion.

Tests

OllamaManifestProbeTests (+91): deterministic vision detection via mocked /api/show — no live server.
BackendVisionCapabilityTests (+11): gate unit coverage.
OllamaVisionE2ETests (new): live E2E — generates a solid-color PNG in code, attaches it via setStructuredHistory, asserts grounded output from a real vision model. Cross-platform-gated (CoreGraphics/ImageIO); auto-skips when no vision model is installed.

Verification

swift build --build-tests — exit 0 (clean).
Live swift test --filter OllamaVisionE2ETests against local Ollama — orchestrator to run before marking ready.

Relates to the multimodal token-accounting gap (#1885): images now flow through a real path that the token estimator must account for.

Draft — orchestrator will run live verification + watch CI before merge. One of a 3-PR set (#1890 tool discovery + docs, #1891 cancellation E2E).

Ollama's image wire path (OllamaImagesField) has always existed, but the backend never advertised it and nothing tested it. Detect the per-model vision capability from /api/show's capabilities: ["vision", ...] list at load time, surface it via OllamaBackend.isVisionModel, and route it through a new central BackendVisionCapability.ollamaSupportsImageInput(probedVision:) gate so capabilities.supportsVision reflects reality. Wire a real multimodal request path: OllamaBackend now conforms to StructuredHistoryReceiver and lifts MessagePart.image payloads onto Ollama's message-level images: [base64] field (snapshot-and-cleared like toolAwareHistory; text-only turns still use the existing plain-string path, preserving wire-shape assertions). Tests: deterministic OllamaManifestProbeTests for vision detection (mocked /api/show, no live server) + BackendVisionCapability unit coverage, and a live OllamaVisionE2ETests that generates a solid-color PNG in code, attaches it, and asserts grounded output from a real vision model (qwen2.5vl/moondream/llava), auto-skipping when none is installed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

roryford marked this pull request as ready for review June 16, 2026 09:02

roryford merged commit e870118 into main Jun 16, 2026
11 checks passed

roryford deleted the feat/ollama-vision-capability-e2e branch June 16, 2026 09:02

github-actions Bot mentioned this pull request Jun 16, 2026

chore(main): release 0.53.0 #1893

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ollama): detect and advertise vision capability with live E2E#1892

feat(ollama): detect and advertise vision capability with live E2E#1892
roryford merged 1 commit into
mainfrom
feat/ollama-vision-capability-e2e

roryford commented Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

roryford commented Jun 16, 2026

What

Changes

Tests

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant