feat(providers): image embeddings in BedrockProvider (Titan Multimodal + Cohere v4) by cbcoutinho · Pull Request #800 · cbcoutinho/nextcloud-mcp-server

cbcoutinho · 2026-05-17T16:43:12Z

Fixes #703

Summary

Adds joint text↔image embedding capability to BedrockProvider, enabling direct image semantic search without an intermediate captioning step.
Two model families wired up via prefix dispatch: amazon.titan-embed-image-v1 and cohere.embed-v4:0. Both validated end-to-end against the live API in eu-west-1.
Provider ABC extended with supports_image_embeddings, embed_image, embed_image_batch, embed_for_image_space, get_image_dimension — all defaulting to NotImplementedError so existing providers (Ollama, OpenAI, Mistral, Anthropic, Simple) are unaffected.
embed_for_image_space() deliberately separate from embed() so the text-document space and image space cannot silently collide via a config typo.
Cohere batch chunked at 64 per call (documented cap is 96; leaves headroom).
New env vars: BEDROCK_IMAGE_EMBEDDING_MODEL, BEDROCK_IMAGE_OUTPUT_DIM.
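
A minimal sketch of how the ABC defaults described above could look. The class names `ImageCapableProvider` and `FakeImageProvider` are hypothetical and exist only for illustration. Only the method names come from this PR, and the real signatures in providers/base.py may differ (for example, they may be async).

```python
from abc import ABC


class ImageCapableProvider(ABC):
    """Hypothetical stand-in for the provider ABC, not the project's actual class."""

    @property
    def supports_image_embeddings(self) -> bool:
        # Capability flag: off unless a subclass opts in.
        return False

    def embed_image(self, image: bytes, mime_type: str) -> list[float]:
        raise NotImplementedError("image embeddings are not supported")

    def embed_image_batch(self, images: list[bytes], mime_type: str) -> list[list[float]]:
        # Sequential default; subclasses with a native batch endpoint override this.
        return [self.embed_image(image, mime_type) for image in images]

    def embed_for_image_space(self, text: str) -> list[float]:
        # Kept separate from embed() so the two vector spaces cannot be mixed up.
        raise NotImplementedError("image embeddings are not supported")

    def get_image_dimension(self) -> int:
        raise NotImplementedError("image embeddings are not supported")


class FakeImageProvider(ImageCapableProvider):
    """Toy subclass that opts in; the returned vectors are placeholders."""

    @property
    def supports_image_embeddings(self) -> bool:
        return True

    def embed_image(self, image: bytes, mime_type: str) -> list[float]:
        return [float(len(image)), 0.0]
```

Because every new method has a default, a provider that never overrides them keeps working and only raises when an image method is actually called.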

Eval results — A/B on 3-photo probe set, eu-west-1

Both models hit 5/5 rank-1, but Cohere v4 produces ~2x the rank-1 ↔ rank-2 cosine margin, which translates to much more robust threshold-based "no match" handling.

`amazon.titan-embed-image-v1` (1024-dim, ~0.5s/img)

| Query | Rank-1 | Rank-2 | Gap |
|---|---|---|---|
| ocean waves crashing on rocky coastline | Coast ✓ 0.40 | Hummingbird 0.26 | 0.14 |
| hummingbird with iridescent feathers | Hummingbird ✓ 0.47 | Nut 0.28 | 0.19 |
| close-up macro photograph of a nut | Nut ✓ 0.46 | Hummingbird 0.33 | 0.13 |
| seascape with cliffs and surf | Coast ✓ 0.41 | Hummingbird 0.22 | 0.19 |
| small fast bird hovering near a flower | Hummingbird ✓ 0.50 | Nut 0.28 | 0.22 |
| snowy mountain peak (distractor) | Coast 0.18 | Hummingbird 0.14 | n/a |

Distractor ceiling 0.18, lowest matching score 0.40 → safe threshold window 0.22 wide.

`cohere.embed-v4:0` (1536-dim, ~1.7s/img)

| Query | Rank-1 | Rank-2 | Gap |
|---|---|---|---|
| ocean waves crashing on rocky coastline | Coast ✓ 0.45 | Hummingbird 0.07 | 0.38 |
| hummingbird with iridescent feathers | Hummingbird ✓ 0.45 | Nut 0.11 | 0.34 |
| close-up macro photograph of a nut | Nut ✓ 0.38 | Hummingbird 0.11 | 0.27 |
| seascape with cliffs and surf | Coast ✓ 0.38 | Hummingbird 0.10 | 0.28 |
| small fast bird hovering near a flower | Hummingbird ✓ 0.50 | Nut 0.17 | 0.33 |
| snowy mountain peak (distractor) | Coast 0.11 | Hummingbird 0.11 | n/a |

Distractor ceiling 0.11, lowest matching score 0.38 → safe threshold window 0.27 wide.
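
The two threshold windows follow directly from the tables. The sketch below recomputes them from the rank-1 scores; the `threshold_window` helper and the idea of placing a threshold at the midpoint of the window are illustrative assumptions, not something this PR implements.

```python
# Rank-1 cosine scores of the five matching queries and the distractor's top
# score, copied from the two tables above.
SCORES = {
    "amazon.titan-embed-image-v1": {
        "matches": [0.40, 0.47, 0.46, 0.41, 0.50],
        "distractor": 0.18,
    },
    "cohere.embed-v4:0": {
        "matches": [0.45, 0.45, 0.38, 0.38, 0.50],
        "distractor": 0.11,
    },
}


def threshold_window(matches: list[float], distractor: float) -> tuple[float, float]:
    """Return (width, midpoint) of the gap between the distractor ceiling
    and the weakest true match."""
    floor = min(matches)
    width = round(floor - distractor, 2)
    midpoint = round((floor + distractor) / 2, 3)
    return width, midpoint


for model, scores in SCORES.items():
    width, midpoint = threshold_window(scores["matches"], scores["distractor"])
    print(f"{model}: window {width:.2f} wide, midpoint {midpoint:.3f}")
```

With only three photos and one distractor query, these windows are a rough indication rather than a calibrated threshold.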

Trade-offs

Cohere v4: better separation, lower distractor ceiling, and a native batch endpoint (this PR sends 64 images per call; the documented cap is 96). Slower per single image, and vectors are 1.5× larger in Qdrant (1536 vs 1024 dimensions).
Titan G1: faster per image, smaller vectors, and a configurable output dimension (256/384/1024; the smaller sizes cost little to store in Qdrant, at a marginal accuracy loss). One image per call.

Recommendation for the eventual indexer: Cohere v4 for query-quality-sensitive collections, Titan for high-volume cheap storage. Either way, the same BedrockProvider instance can host the choice — only the env var changes.

Test plan

uv run pytest tests/unit/providers/test_bedrock.py -v — 18/18 pass (9 new, 9 existing)
uv run pytest tests/unit/ -x -q — 1033/1033 pass (no regressions)
uv run ruff check / ruff format --check / ty check -- nextcloud_mcp_server — clean
Live end-to-end smoke through get_provider() against real Titan endpoint with Photos/Coast.jpg reproduces the scratch-eval cosine (0.398 vs 0.3999 — different sessions, same model).
Live A/B Titan vs Cohere v4 in eu-west-1 — both 5/5 rank-1, Cohere wider margins (see above).
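
For reference, the rank-1 and cosine figures in this test plan come down to a cosine ranking with a cut-off. The sketch below shows that computation on toy vectors; `cosine` and `rank_images` are hypothetical helpers, and the vectors are made up rather than real model output.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(
    query_vec: list[float],
    image_vecs: dict[str, list[float]],
    threshold: float,
) -> tuple[list[tuple[str, float]], Optional[str]]:
    """Rank images by cosine to the query and report a match only when the
    best score reaches the threshold."""
    ranked = sorted(
        ((name, cosine(query_vec, vec)) for name, vec in image_vecs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    match = ranked[0][0] if ranked and ranked[0][1] >= threshold else None
    return ranked, match


# Toy 2-d vectors standing in for image embeddings.
toy_images = {"Coast": [1.0, 0.0], "Nut": [0.0, 1.0]}
ranked, match = rank_images([0.9, 0.1], toy_images, threshold=0.3)
```

In a real run, `query_vec` would come from embed_for_image_space and the image vectors from embed_image, so both sides live in the same embedding space.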

Out of scope (intentional)

TwelveLabs Marengo support (async StartAsyncInvoke API, different control plane).
Consumer side: image indexer that calls embed_image against WebDAV-discovered images, and an MCP tool that queries via embed_for_image_space. Easier to land in a follow-up so this PR stays focused on the provider surface.
ADR document. Worth adding when consumer side ships — the naming choice for embed_for_image_space vs overloading embed deserves a paragraph.

🤖 Generated with Claude Code

This PR was generated with the help of AI, and reviewed by a Human

…modal + Cohere v4)

Adds joint text↔image embedding capability to BedrockProvider, enabling direct image semantic search without an intermediate captioning step.

Validated against amazon.titan-embed-image-v1 in eu-west-1: 5/5 rank-1 hits on a 3-photo probe set with ≥0.13 cosine margin to the runner-up, distractor query maxes out at 0.18 (well below a 0.30 acceptance threshold). 1024-dim output, ~0.6s per image including network.

Provider ABC (providers/base.py)
--------------------------------
* New ``supports_image_embeddings`` capability flag (default ``False``).
* New ``embed_image(bytes, mime_type) -> list[float]``.
* New ``embed_image_batch(list[bytes], mime_type) -> list[list[float]]`` with a sequential default impl; subclasses with native batch endpoints override.
* New ``embed_for_image_space(text) -> list[float]``: text query into the *image* embedding space. Deliberately a separate method from ``embed()`` so the text-document space and image space cannot silently collide via a config typo.
* New ``get_image_dimension()``.
* All new methods default to ``NotImplementedError`` so existing providers (Ollama, OpenAI, Mistral, Anthropic, Simple) are unaffected.

BedrockProvider (providers/bedrock.py)
--------------------------------------
* New ``image_embedding_model`` and ``image_output_dim`` ctor args.
* Implements all four ABC methods.
* Two model families supported via prefix dispatch (matching the existing ``_create_embedding_request`` style):
  - ``amazon.titan-embed-image-v1``: single-image per call. Validated end-to-end against the real endpoint. Surfaces Titan's ``message`` error field as RuntimeError with the model's own text.
  - ``cohere.embed-v4:0``: native multi-image batch. Chunked at 64 per invoke_model call (Cohere's documented cap is 96; we leave headroom). Mock-tested only; account-level AWS Marketplace subscription pending.
* Request body construction lives in one place (``_create_image_embedding_request``); both single-image and batch paths route through it so adding fields stays a one-line change.

Config + registry
-----------------
* ``BEDROCK_IMAGE_EMBEDDING_MODEL`` / ``BEDROCK_IMAGE_OUTPUT_DIM`` env vars surfaced via Settings, threaded through ProviderRegistry.

Tests
-----
* 9 new unit tests in ``tests/unit/providers/test_bedrock.py`` covering Titan/Cohere single embed, Cohere batch, Cohere chunking >64 images, text-into-image-space for both families, Titan error-via-message field, disabled-capability path, dimension-not-detected path, and Titan sequential batch fallback. 18/18 bedrock tests pass; full unit suite green (1033 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
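
The chunking described in the commit message can be sketched as follows. The helper names `chunked` and `embed_in_chunks`, and the `invoke` callback standing in for one invoke_model call, are hypothetical; only the chunk size of 64 comes from the commit message.

```python
from typing import Callable, Iterator, Sequence

COHERE_CHUNK_SIZE = 64  # from the commit message; Cohere's documented cap is 96


def chunked(items: Sequence[bytes], size: int) -> Iterator[Sequence[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_chunks(
    images: Sequence[bytes],
    invoke: Callable[[Sequence[bytes]], list[list[float]]],
    size: int = COHERE_CHUNK_SIZE,
) -> list[list[float]]:
    """Call `invoke` once per chunk and concatenate the results in input order."""
    results: list[list[float]] = []
    for chunk in chunked(images, size):
        results.extend(invoke(chunk))
    return results
```

With 130 images and the default size, `invoke` would be called three times, with 64, 64, and 2 images.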

claude · 2026-05-17T16:43:27Z

Claude finished @cbcoutinho's task in 2m 24s

PR Review: feat(providers): image embeddings in BedrockProvider

Todo:

Understand the request (PR review)
Read changed files: base.py, bedrock.py, config.py, registry.py, test_bedrock.py
Analyze code quality, correctness, security, and test coverage
Post full review

Overall this is a clean, well-reasoned implementation. The design decision to keep embed_for_image_space separate from embed is sound (documented in PR description), the eval numbers are convincing, and the test coverage is solid. A few issues worth addressing before merge.

🟡 Important

1. embed_image_batch Cohere path: unguarded results[0] after loop

bedrock.py:411

self._remember_image_dim(results[0])   # IndexError if API returns 0 embeddings
return results

If the Bedrock API returns a structurally valid JSON response with an empty embeddings.float list (e.g., during throttling or an undocumented edge case), this raises an IndexError instead of a meaningful error. The if not images: return [] guard covers only the input, not the response. Add a check:

if results:
    self._remember_image_dim(results[0])
return results
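
Note that the guard above returns an empty list for a non-empty input, which leaves the caller to notice the mismatch. A stricter variant, sketched here with a hypothetical helper `check_embedding_count` that is not part of the PR, would compare the response length against the number of images sent:

```python
def check_embedding_count(results: list[list[float]], expected: int) -> list[list[float]]:
    """Raise a descriptive error when the response does not contain exactly
    one embedding per submitted image; otherwise return the results unchanged."""
    if len(results) != expected:
        raise RuntimeError(
            f"image embedding response returned {len(results)} vectors "
            f"for {expected} images"
        )
    return results
```

Which behaviour fits better depends on how the eventual indexer wants to handle partial responses.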

2. _create_image_embedding_request allows empty request body (no image, no text)

bedrock.py:279

Both Titan and Cohere branches can produce a body with neither inputImage/inputText nor images/texts. This will fail at the API layer with an opaque error rather than a clear Python exception. Add a guard at the top of the function:

if not image_b64s and text is None:
    raise ValueError("Must provide at least one of image_b64s or text")
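
To show where such a guard would sit, here is a simplified, hypothetical builder. It is not the PR's `_create_image_embedding_request`: the function name, parameters, and the single-image check are assumptions. The `inputImage`/`inputText` and `images`/`texts` fields are the ones named in this review, `embeddingConfig.outputEmbeddingLength` is Titan's output-size setting, and the Cohere body is deliberately partial (other fields the API may require are omitted).

```python
from typing import Optional


def create_image_embedding_request(
    model_id: str,
    image_b64s: list[str],
    text: Optional[str] = None,
    output_dim: int = 1024,
) -> dict:
    """Hypothetical request-body builder with an up-front input guard."""
    if not image_b64s and text is None:
        raise ValueError("Must provide at least one of image_b64s or text")

    if model_id.startswith("amazon.titan-embed-image"):
        if len(image_b64s) > 1:
            raise ValueError("Titan takes a single image per call")
        body: dict = {"embeddingConfig": {"outputEmbeddingLength": output_dim}}
        if image_b64s:
            body["inputImage"] = image_b64s[0]
        if text is not None:
            body["inputText"] = text
        return body

    if model_id.startswith("cohere.embed"):
        # Partial body: only the fields named in the review. The exact
        # encoding expected for each image entry is not shown here.
        body = {}
        if image_b64s:
            body["images"] = list(image_b64s)
        if text is not None:
            body["texts"] = [text]
        return body

    raise ValueError(f"Unsupported image embedding model: {model_id}")
```

Raising before the dispatch means both model families reject an empty request in the same way.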

3. BEDROCK_IMAGE_OUTPUT_DIM not validated: invalid values surface only at API time

config.py:114

Titan Multimodal G1 only accepts 256, 384, or 1024. The codebase already uses Validator(..., is_in=[...]) for other constrained settings (e.g., LOG_FORMAT at line 222). An invalid value currently passes startup cleanly and fails only on the first embed call:

# In _dynaconf validators list (config.py ~line 209)
Validator("BEDROCK_IMAGE_OUTPUT_DIM", is_in=[256, 384, 1024]),

This gives a clear error at startup rather than a confusing ValidationException from the Bedrock API mid-operation.
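
The same fail-fast idea, written as plain Python without dynaconf so it can be run in isolation. The helper `parse_image_output_dim` is hypothetical. The allowed values are the three Titan sizes quoted above; whether the setting also applies to Cohere models is not stated in the PR, so a real validator might need to be model-aware.

```python
from typing import Optional

TITAN_IMAGE_OUTPUT_DIMS = (256, 384, 1024)  # sizes quoted in the review above


def parse_image_output_dim(raw: Optional[str]) -> Optional[int]:
    """Return the configured dimension, or None when the variable is unset.

    Raises ValueError at startup instead of letting a bad value reach the API.
    """
    if raw is None or raw == "":
        return None
    try:
        dim = int(raw)
    except ValueError:
        raise ValueError(
            f"BEDROCK_IMAGE_OUTPUT_DIM must be an integer, got {raw!r}"
        ) from None
    if dim not in TITAN_IMAGE_OUTPUT_DIMS:
        raise ValueError(
            f"BEDROCK_IMAGE_OUTPUT_DIM must be one of {TITAN_IMAGE_OUTPUT_DIMS}, got {dim}"
        )
    return dim
```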

🟢 Nit

4. get_image_dimension() for Titan is knowable at construction time

bedrock.py:429

For Titan, self.image_output_dim is always the dimension — no probe call needed. Cohere's dimension is genuinely unknown until the first embed. You could short-circuit for Titan:

def get_image_dimension(self) -> int:
    if not self.supports_image_embeddings:
        raise NotImplementedError(...)
    if self.image_embedding_model and self.image_embedding_model.startswith("amazon.titan-embed-image"):
        return self.image_output_dim
    if self._image_dimension is None:
        raise RuntimeError(...)
    return self._image_dimension

5. No test for embed_image_batch([]) short-circuit

The if not images: return [] early-exit at bedrock.py:394 is untested. Easy one-liner test to add.
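
A sketch of what such a test could look like. `StubBatchProvider` is a hypothetical stand-in that mirrors only the early exit described above; the real test would build a BedrockProvider with a mocked client instead.

```python
class StubBatchProvider:
    """Hypothetical stub reproducing only the empty-input short-circuit."""

    def __init__(self) -> None:
        self.calls = 0  # counts simulated API calls

    def _invoke(self, images: list[bytes]) -> list[list[float]]:
        self.calls += 1
        return [[0.0] for _ in images]

    def embed_image_batch(
        self, images: list[bytes], mime_type: str = "image/jpeg"
    ) -> list[list[float]]:
        if not images:
            return []
        return self._invoke(images)


def test_embed_image_batch_empty_short_circuits() -> None:
    provider = StubBatchProvider()
    assert provider.embed_image_batch([]) == []
    assert provider.calls == 0  # no API call for an empty batch
```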

6. _invoke_image_model null-guard is unreachable

bedrock.py:347

The if not self.image_embedding_model: raise NotImplementedError at the top of _invoke_image_model is unreachable — every call site has already checked supports_image_embeddings. Not harmful, but it's dead code.

Positive notes

Prefix dispatch (startswith("amazon.titan-embed-image") / startswith("cohere.embed")) is consistent with the existing text embedding pattern — good uniformity.
Titan error propagation via response.get("message") is the correct approach; Titan embeds use that field for soft errors.
embed_for_image_space as a separate method is the right call. Overloading embed() would have been a footgun for callers who misconfigure BEDROCK_IMAGE_EMBEDDING_MODEL=BEDROCK_EMBEDDING_MODEL.
Cohere chunking at 64 vs 96 cap is a sensible headroom choice.
Test coverage — the 9 new tests cover the critical paths well (Titan, Cohere batch, text-to-image-space, error message surface, disabled guard, dimension detection, chunking at cap boundary, and Titan sequential fallback).

sonarqubecloud · 2026-05-17T16:44:18Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

cbcoutinho mentioned this pull request Jul 2, 2026

feat(documents): add docling (docling-serve) document-parsing backend #1006

Closed

cbcoutinho wants to merge 1 commit into master from feat/bedrock-image-embeddings
