feat(providers): image embeddings in BedrockProvider (Titan Multimodal + Cohere v4) #800
cbcoutinho wants to merge 1 commit into
…modal + Cohere v4)
Adds joint text↔image embedding capability to BedrockProvider, enabling
direct image semantic search without an intermediate captioning step.
Validated against amazon.titan-embed-image-v1 in eu-west-1: 5/5 rank-1
hits on a 3-photo probe set with ≥0.13 cosine margin to the runner-up;
the distractor query maxes out at 0.18 (well below a 0.30 acceptance
threshold). Output is 1024-dim, at ~0.6s per image including network.
Provider ABC (providers/base.py)
--------------------------------
* New ``supports_image_embeddings`` capability flag (default ``False``).
* New ``embed_image(bytes, mime_type) -> list[float]``.
* New ``embed_image_batch(list[bytes], mime_type) -> list[list[float]]``
with a sequential default impl; subclasses with native batch endpoints
override.
* New ``embed_for_image_space(text) -> list[float]`` — text query into
the *image* embedding space. Deliberately a separate method from
``embed()`` so the text-document space and image space cannot silently
collide via a config typo.
* New ``get_image_dimension()``.
* All new methods default to ``NotImplementedError`` so existing providers
(Ollama, OpenAI, Mistral, Anthropic, Simple) are unaffected.
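
A minimal sketch of how the base-class defaults described above could look. It is an illustration only: the real ``providers/base.py`` is not shown here and may be async, and the ``TextOnlyProvider`` and ``FakeImageProvider`` classes are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical reduction of the provider base class (synchronous for brevity)."""

    # Capability flag; image-capable subclasses flip it to True.
    supports_image_embeddings: bool = False

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Embed text into the text-document space."""

    def embed_image(self, image: bytes, mime_type: str) -> list[float]:
        raise NotImplementedError(f"{type(self).__name__} does not embed images")

    def embed_image_batch(self, images: list[bytes], mime_type: str) -> list[list[float]]:
        # Sequential default; providers with a native batch endpoint override this.
        return [self.embed_image(image, mime_type) for image in images]

    def embed_for_image_space(self, text: str) -> list[float]:
        # Kept separate from embed() so the two vector spaces cannot be mixed up.
        raise NotImplementedError(f"{type(self).__name__} has no image space")

    def get_image_dimension(self) -> int:
        raise NotImplementedError(f"{type(self).__name__} has no image space")


class TextOnlyProvider(Provider):
    """Stand-in for an existing provider that only implements embed()."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


class FakeImageProvider(Provider):
    """Stand-in for an image-capable provider; the vectors are dummies."""

    supports_image_embeddings = True

    def embed(self, text: str) -> list[float]:
        return [0.0, 0.0]

    def embed_image(self, image: bytes, mime_type: str) -> list[float]:
        return [float(len(image)), 1.0]

    def embed_for_image_space(self, text: str) -> list[float]:
        return [float(len(text)), 1.0]

    def get_image_dimension(self) -> int:
        return 2
```

With this shape, an existing text-only provider needs no changes, and a caller can check ``supports_image_embeddings`` before touching the image methods.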
BedrockProvider (providers/bedrock.py)
--------------------------------------
* New ``image_embedding_model`` and ``image_output_dim`` ctor args.
* Implements all four ABC methods.
* Two model families supported via prefix dispatch (matching the existing
``_create_embedding_request`` style):
- ``amazon.titan-embed-image-v1`` — single-image per call. Validated
end-to-end against the real endpoint. Surfaces Titan's ``message``
error field as RuntimeError with the model's own text.
- ``cohere.embed-v4:0`` — native multi-image batch. Chunked at 64 per
invoke_model call (Cohere's documented cap is 96; we leave headroom).
Mock-tested only — account-level AWS Marketplace subscription
pending.
* Request body construction lives in one place
(``_create_image_embedding_request``); both single-image and batch
paths route through it so adding fields stays a one-line change.
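
The prefix dispatch and the 64-image chunking described above can be sketched as follows. This is a hedged illustration, not the code in ``providers/bedrock.py``: ``embed_images`` and its ``invoke`` callable are hypothetical, with ``invoke`` standing in for one ``invoke_model`` round trip that returns one vector per image.

```python
from typing import Callable

# Chunk size taken from the commit message: Cohere's documented cap is 96,
# and the provider leaves headroom by sending 64 images per call.
COHERE_CHUNK_SIZE = 64


def chunked(items: list[bytes], size: int) -> list[list[bytes]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_images(
    model_id: str,
    images: list[bytes],
    invoke: Callable[[list[bytes]], list[list[float]]],
) -> list[list[float]]:
    """Dispatch on the model-id prefix and call `invoke` once per group."""
    if model_id.startswith("amazon.titan-embed-image"):
        groups = chunked(images, 1)  # Titan: one image per call
    elif model_id.startswith("cohere.embed-v4"):
        groups = chunked(images, COHERE_CHUNK_SIZE)  # Cohere: native batch
    else:
        raise ValueError(f"Unsupported image embedding model: {model_id}")

    vectors: list[list[float]] = []
    for group in groups:
        vectors.extend(invoke(group))
    return vectors
```

Under these assumptions, 150 images sent to the Cohere family result in three calls (64, 64, and 22 images), while the Titan family makes one call per image.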
Config + registry
-----------------
* ``BEDROCK_IMAGE_EMBEDDING_MODEL`` / ``BEDROCK_IMAGE_OUTPUT_DIM``
env vars surfaced via Settings, threaded through ProviderRegistry.
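
As a rough illustration of the two settings, the sketch below reads them from an environment mapping. It is an assumption-laden stand-in: the project threads these values through its own Settings and ProviderRegistry, ``ImageEmbeddingSettings`` and ``load_image_settings`` are hypothetical names, and the 256/384/1024 restriction for Titan is taken from the review comment on this PR rather than verified here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Sizes the review on this PR says Titan Multimodal G1 accepts (assumption).
TITAN_IMAGE_DIMS = (256, 384, 1024)


@dataclass(frozen=True)
class ImageEmbeddingSettings:
    model: Optional[str]
    output_dim: Optional[int]


def load_image_settings(env: Mapping[str, str] = os.environ) -> ImageEmbeddingSettings:
    """Read the two env vars; an unset or empty value becomes None."""
    model = env.get("BEDROCK_IMAGE_EMBEDDING_MODEL") or None
    raw_dim = env.get("BEDROCK_IMAGE_OUTPUT_DIM")
    output_dim = int(raw_dim) if raw_dim else None

    is_titan = bool(model) and model.startswith("amazon.titan-embed-image")
    if is_titan and output_dim is not None and output_dim not in TITAN_IMAGE_DIMS:
        # Fail at startup instead of on the first embedding request.
        raise ValueError(
            f"BEDROCK_IMAGE_OUTPUT_DIM must be one of {TITAN_IMAGE_DIMS}, got {output_dim}"
        )
    return ImageEmbeddingSettings(model=model, output_dim=output_dim)
```

Leaving both variables unset yields a settings object with ``None`` fields, which corresponds to the disabled-capability path.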
Tests
-----
* 9 new unit tests in ``tests/unit/providers/test_bedrock.py`` covering
Titan/Cohere single embed, Cohere batch, Cohere chunking >64 images,
text-into-image-space for both families, Titan error-via-message
field, disabled-capability path, dimension-not-detected path, and
Titan sequential batch fallback. 18/18 bedrock tests pass; full unit
suite green (1033 tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @cbcoutinho's task in 2m 24s

PR Review: feat(providers): image embeddings in BedrockProvider

Overall this is a clean, well-reasoned implementation. The design decision to keep […]

🟡 Important

1. […]

    self._remember_image_dim(results[0])  # IndexError if API returns 0 embeddings
    return results

If the Bedrock API returns a structurally valid JSON response but with an empty […]

    if results:
        self._remember_image_dim(results[0])
    return results

2. Both Titan and Cohere branches can produce a body with neither […]

    if not image_b64s and text is None:
        raise ValueError("Must provide at least one of image_b64s or text")

3. Titan Multimodal G1 only accepts 256, 384, or 1024. The codebase already uses […]

    # In _dynaconf validators list (config.py ~line 209)
    Validator("BEDROCK_IMAGE_OUTPUT_DIM", is_in=[256, 384, 1024]),

This gives a clear error at startup rather than a confusing […]

🟢 Nit

4. For Titan, […]

    def get_image_dimension(self) -> int:
        if not self.supports_image_embeddings:
            raise NotImplementedError(...)
        if self.image_embedding_model and self.image_embedding_model.startswith("amazon.titan-embed-image"):
            return self.image_output_dim
        if self._image_dimension is None:
            raise RuntimeError(...)
        return self._image_dimension

5. No test for […] The […]

6. The […]

Positive notes
Fixes #703
Summary
* Adds joint text↔image embedding capability to ``BedrockProvider``, enabling direct image semantic search without an intermediate captioning step.
* […] ``amazon.titan-embed-image-v1`` and ``cohere.embed-v4:0``. Both validated end-to-end against the live API in eu-west-1.
* ``Provider`` ABC extended with ``supports_image_embeddings``, ``embed_image``, ``embed_image_batch``, ``embed_for_image_space``, ``get_image_dimension``, all defaulting to ``NotImplementedError`` so existing providers (Ollama, OpenAI, Mistral, Anthropic, Simple) are unaffected.
* ``embed_for_image_space()`` deliberately separate from ``embed()`` so the text-document space and image space cannot silently collide via a config typo.
* […] ``BEDROCK_IMAGE_EMBEDDING_MODEL``, ``BEDROCK_IMAGE_OUTPUT_DIM``.

Eval results: A/B on 3-photo probe set, eu-west-1
Both models hit 5/5 rank-1, but Cohere v4 produces ~2x the rank-1 ↔ rank-2 cosine margin, which translates to much more robust threshold-based "no match" handling.
``amazon.titan-embed-image-v1`` (1024-dim, ~0.5s/img)
Distractor ceiling 0.18, lowest matching score 0.40 → safe threshold window 0.22 wide.

``cohere.embed-v4:0`` (1536-dim, ~1.7s/img)
Distractor ceiling 0.11, lowest matching score 0.38 → safe threshold window 0.27 wide.
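
The threshold-window figures above are simple differences between cosine scores. A minimal sketch of the arithmetic follows; the helper names are hypothetical and the vectors used in any real evaluation are not reproduced here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


def rank1_margin(scores: list[float]) -> float:
    """Gap between the best and second-best score for one query."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    best, runner_up = sorted(scores, reverse=True)[:2]
    return best - runner_up


def threshold_window(lowest_match: float, distractor_ceiling: float) -> float:
    """Width of the range in which a 'no match' threshold can safely sit."""
    return lowest_match - distractor_ceiling


# Figures quoted in the eval summary.
titan_window = round(threshold_window(0.40, 0.18), 2)   # 0.22
cohere_window = round(threshold_window(0.38, 0.11), 2)  # 0.27
```

Any threshold placed inside the window rejects the distractor query while still accepting the weakest true match, so a wider window leaves more slack for unseen queries.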
Trade-offs
Recommendation for the eventual indexer: Cohere v4 for query-quality-sensitive collections, Titan for high-volume cheap storage. Either way, the same ``BedrockProvider`` instance can host the choice; only the env var changes.

Test plan

* ``uv run pytest tests/unit/providers/test_bedrock.py -v``: 18/18 pass (9 new, 9 existing)
* ``uv run pytest tests/unit/ -x -q``: 1033/1033 pass (no regressions)
* ``uv run ruff check`` / ``ruff format --check`` / ``ty check -- nextcloud_mcp_server``: clean
* […] ``get_provider()`` against real Titan endpoint with ``Photos/Coast.jpg`` reproduces the scratch-eval cosine (0.398 vs 0.3999; different sessions, same model).

Out of scope (intentional)

* […] ``embed_image`` against WebDAV-discovered images, and an MCP tool that queries via ``embed_for_image_space``. Easier to land in a follow-up so this PR stays focused on the provider surface.
* […] ``embed_for_image_space`` vs overloading ``embed`` deserves a paragraph.

🤖 Generated with Claude Code
This PR was generated with the help of AI, and reviewed by a Human