feat(providers): image embeddings in BedrockProvider (Titan Multimodal + Cohere v4) #800

Open
cbcoutinho wants to merge 1 commit into master from feat/bedrock-image-embeddings

cbcoutinho (Owner) commented May 17, 2026

Fixes #703

Summary

  • Adds joint text↔image embedding capability to BedrockProvider, enabling direct image semantic search without an intermediate captioning step.
  • Two model families wired up via prefix dispatch: amazon.titan-embed-image-v1 and cohere.embed-v4:0. Both validated end-to-end against the live API in eu-west-1.
  • Provider ABC extended with supports_image_embeddings, embed_image, embed_image_batch, embed_for_image_space, get_image_dimension — all defaulting to NotImplementedError so existing providers (Ollama, OpenAI, Mistral, Anthropic, Simple) are unaffected.
  • embed_for_image_space() deliberately separate from embed() so the text-document space and image space cannot silently collide via a config typo.
  • Cohere batch chunked at 64 per call (documented cap is 96; leaves headroom).
  • New env vars: BEDROCK_IMAGE_EMBEDDING_MODEL, BEDROCK_IMAGE_OUTPUT_DIM.
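
The 64-per-call chunking mentioned above can be sketched roughly as follows. The names `chunked`, `CHUNK_SIZE` and `COHERE_DOCUMENTED_CAP` are illustrative and not taken from the PR; only the numbers 64 and 96 come from the description.

```python
from collections.abc import Iterator, Sequence

COHERE_DOCUMENTED_CAP = 96  # documented cap cited in the PR description
CHUNK_SIZE = 64             # chunk size chosen in the PR, leaving headroom


def chunked(items: Sequence[bytes], size: int = CHUNK_SIZE) -> Iterator[Sequence[bytes]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 150 images are split into three calls of 64, 64 and 22 images.
images = [b"img"] * 150
batch_sizes = [len(chunk) for chunk in chunked(images)]
```

Because slices are yielded in order, concatenating the per-chunk results keeps embeddings aligned with the input list.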

Eval results — A/B on 3-photo probe set, eu-west-1

Both models hit 5/5 rank-1, but Cohere v4 produces nearly 2x the rank-1 ↔ rank-2 cosine margin (mean gap 0.32 vs 0.17), which leaves more headroom for threshold-based "no match" handling.

amazon.titan-embed-image-v1 (1024-dim, ~0.5s/img)

| query | rank-1 | top-2 | gap |
|---|---|---|---|
| ocean waves crashing on rocky coastline | Coast ✓ 0.40 | Hummingbird 0.26 | 0.14 |
| hummingbird with iridescent feathers | Hummingbird ✓ 0.47 | Nut 0.28 | 0.19 |
| close-up macro photograph of a nut | Nut ✓ 0.46 | Hummingbird 0.33 | 0.13 |
| seascape with cliffs and surf | Coast ✓ 0.41 | Hummingbird 0.22 | 0.19 |
| small fast bird hovering near a flower | Hummingbird ✓ 0.50 | Nut 0.28 | 0.22 |
| snowy mountain peak (distractor) | Coast 0.18 | Hummingbird 0.14 | |

Distractor ceiling 0.18, lowest matching score 0.40 → safe threshold window 0.22 wide.

cohere.embed-v4:0 (1536-dim, ~1.7s/img)

| query | rank-1 | top-2 | gap |
|---|---|---|---|
| ocean waves crashing on rocky coastline | Coast ✓ 0.45 | Hummingbird 0.07 | 0.38 |
| hummingbird with iridescent feathers | Hummingbird ✓ 0.45 | Nut 0.11 | 0.34 |
| close-up macro photograph of a nut | Nut ✓ 0.38 | Hummingbird 0.11 | 0.27 |
| seascape with cliffs and surf | Coast ✓ 0.38 | Hummingbird 0.10 | 0.28 |
| small fast bird hovering near a flower | Hummingbird ✓ 0.50 | Nut 0.17 | 0.33 |
| snowy mountain peak (distractor) | Coast 0.11 | Hummingbird 0.11 | |

Distractor ceiling 0.11, lowest matching score 0.38 → safe threshold window 0.27 wide.
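
The window and margin figures can be reproduced from the scores listed for each model. The helper names below are illustrative only; the numbers are copied from the two result listings.

```python
# (rank-1 cosine, rank-2 cosine) for the five matching queries of each model.
titan_matches = [(0.40, 0.26), (0.47, 0.28), (0.46, 0.33), (0.41, 0.22), (0.50, 0.28)]
cohere_matches = [(0.45, 0.07), (0.45, 0.11), (0.38, 0.11), (0.38, 0.10), (0.50, 0.17)]

# Best score reached by the distractor query for each model.
titan_distractor_top = 0.18
cohere_distractor_top = 0.11


def threshold_window(matches, distractor_top):
    """Width between the weakest true match and the strongest distractor score."""
    lowest_match = min(rank1 for rank1, _ in matches)
    return round(lowest_match - distractor_top, 2)


def mean_gap(matches):
    """Average rank-1 minus rank-2 cosine over the matching queries."""
    return round(sum(rank1 - rank2 for rank1, rank2 in matches) / len(matches), 3)


titan_window = threshold_window(titan_matches, titan_distractor_top)
cohere_window = threshold_window(cohere_matches, cohere_distractor_top)
titan_gap = mean_gap(titan_matches)
cohere_gap = mean_gap(cohere_matches)
```

On these numbers the windows are 0.22 and 0.27, and the mean gaps are 0.174 and 0.32, a ratio of about 1.8. With only six queries over three photos, these are indicative rather than statistically robust.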

Trade-offs

  • Cohere v4 — better separation, lower distractor ceiling, native batch endpoint (64+ per call); slower per single image, 1.5× larger vectors in Qdrant.
  • Titan G1 — faster per image, smaller vectors, configurable output dim (256/384/1024 — the smaller sizes are basically free in Qdrant for marginal accuracy loss); single-image per call.

Recommendation for the eventual indexer: Cohere v4 for query-quality-sensitive collections, Titan for high-volume cheap storage. Either way, the same BedrockProvider instance can host the choice — only the env var changes.
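
As a rough back-of-the-envelope check on the storage trade-off, the sketch below compares raw vector payload sizes. It assumes 4-byte float32 components and ignores index overhead, payloads and any quantization, so actual Qdrant usage will differ; `raw_vector_bytes` is an illustrative helper, not part of the PR.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32


def raw_vector_bytes(dim: int, count: int = 1) -> int:
    """Raw size in bytes of `count` dense vectors with `dim` float32 components."""
    return dim * BYTES_PER_FLOAT32 * count


cohere_per_vector = raw_vector_bytes(1536)  # 6144 bytes
titan_per_vector = raw_vector_bytes(1024)   # 4096 bytes
titan_small = raw_vector_bytes(256)         # 1024 bytes

ratio = cohere_per_vector / titan_per_vector        # 1.5
titan_per_million = raw_vector_bytes(1024, 1_000_000)  # 4,096,000,000 bytes
```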

Test plan

  • uv run pytest tests/unit/providers/test_bedrock.py -v — 18/18 pass (9 new, 9 existing)
  • uv run pytest tests/unit/ -x -q — 1033/1033 pass (no regressions)
  • uv run ruff check / ruff format --check / ty check -- nextcloud_mcp_server — clean
  • Live end-to-end smoke through get_provider() against real Titan endpoint with Photos/Coast.jpg reproduces the scratch-eval cosine (0.398 vs 0.3999 — different sessions, same model).
  • Live A/B Titan vs Cohere v4 in eu-west-1 — both 5/5 rank-1, Cohere wider margins (see above).

Out of scope (intentional)

  • TwelveLabs Marengo support (async StartAsyncInvoke API, different control plane).
  • Consumer side: image indexer that calls embed_image against WebDAV-discovered images, and an MCP tool that queries via embed_for_image_space. Easier to land in a follow-up so this PR stays focused on the provider surface.
  • ADR document. Worth adding when consumer side ships — the naming choice for embed_for_image_space vs overloading embed deserves a paragraph.

🤖 Generated with Claude Code


This PR was generated with the help of AI, and reviewed by a Human

…modal + Cohere v4)

Adds joint text↔image embedding capability to BedrockProvider, enabling
direct image semantic search without an intermediate captioning step.

Validated against amazon.titan-embed-image-v1 in eu-west-1: 5/5 rank-1
hits on a 3-photo probe set with ≥0.13 cosine margin to the runner-up,
distractor query maxes out at 0.18 (well below a 0.30 acceptance
threshold). 1024-dim output, ~0.6s per image including network.

Provider ABC (providers/base.py)
--------------------------------
* New ``supports_image_embeddings`` capability flag (default ``False``).
* New ``embed_image(bytes, mime_type) -> list[float]``.
* New ``embed_image_batch(list[bytes], mime_type) -> list[list[float]]``
  with a sequential default impl; subclasses with native batch endpoints
  override.
* New ``embed_for_image_space(text) -> list[float]`` — text query into
  the *image* embedding space. Deliberately a separate method from
  ``embed()`` so the text-document space and image space cannot silently
  collide via a config typo.
* New ``get_image_dimension()``.
* All new methods default to ``NotImplementedError`` so existing providers
  (Ollama, OpenAI, Mistral, Anthropic, Simple) are unaffected.
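
A minimal sketch of the shape described above, assuming synchronous methods for brevity (the real methods in ``providers/base.py`` may be async and may differ in signature). ``Provider`` and ``TextOnlyProvider`` here are illustrative stand-ins, not the project's classes.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Illustrative stand-in for the provider base class."""

    @property
    def supports_image_embeddings(self) -> bool:
        # Capability flag: off unless a subclass opts in.
        return False

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Embed text into the text-document space."""

    def embed_image(self, image: bytes, mime_type: str) -> list[float]:
        raise NotImplementedError(f"{type(self).__name__} does not support image embeddings")

    def embed_image_batch(self, images: list[bytes], mime_type: str) -> list[list[float]]:
        # Sequential default; providers with a native batch endpoint override this.
        return [self.embed_image(image, mime_type) for image in images]

    def embed_for_image_space(self, text: str) -> list[float]:
        # Kept separate from embed() so the two vector spaces cannot be mixed by accident.
        raise NotImplementedError(f"{type(self).__name__} does not support image embeddings")

    def get_image_dimension(self) -> int:
        raise NotImplementedError(f"{type(self).__name__} does not support image embeddings")


class TextOnlyProvider(Provider):
    """An existing text-only provider needs no changes."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]
```

With this layout, a caller can check ``supports_image_embeddings`` before routing image work to a provider, and text-only providers keep working unmodified.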

BedrockProvider (providers/bedrock.py)
--------------------------------------
* New ``image_embedding_model`` and ``image_output_dim`` ctor args.
* Implements all four ABC methods.
* Two model families supported via prefix dispatch (matching the existing
  ``_create_embedding_request`` style):
  - ``amazon.titan-embed-image-v1`` — single-image per call. Validated
    end-to-end against the real endpoint. Surfaces Titan's ``message``
    error field as RuntimeError with the model's own text.
  - ``cohere.embed-v4:0`` — native multi-image batch. Chunked at 64 per
    invoke_model call (Cohere's documented cap is 96; we leave headroom).
    Mock-tested only — account-level AWS Marketplace subscription
    pending.
* Request body construction lives in one place
  (``_create_image_embedding_request``); both single-image and batch
  paths route through it so adding fields stays a one-line change.
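
The prefix dispatch could look roughly like the sketch below. This is not the PR's ``_create_image_embedding_request``; the function name, its parameters, and the exact Cohere fields (``input_type``, data-URI encoding of images, ``embedding_types``) are assumptions made for illustration and should be checked against the Bedrock model documentation.

```python
from typing import Any, Optional

TITAN_PREFIX = "amazon.titan-embed-image"
COHERE_PREFIX = "cohere.embed"


def build_image_embedding_request(
    model_id: str,
    image_b64s: Optional[list[str]] = None,
    text: Optional[str] = None,
    mime_type: str = "image/jpeg",
    output_dim: Optional[int] = None,
) -> dict[str, Any]:
    """Build one request body for either model family (hypothetical helper)."""
    image_b64s = image_b64s or []
    if not image_b64s and text is None:
        raise ValueError("Must provide at least one of image_b64s or text")

    if model_id.startswith(TITAN_PREFIX):
        if len(image_b64s) > 1:
            raise ValueError("Titan accepts a single image per call")
        body: dict[str, Any] = {}
        if image_b64s:
            body["inputImage"] = image_b64s[0]
        if text is not None:
            body["inputText"] = text
        if output_dim is not None:
            body["embeddingConfig"] = {"outputEmbeddingLength": output_dim}
        return body

    if model_id.startswith(COHERE_PREFIX):
        body = {"embedding_types": ["float"]}
        if image_b64s:
            # Assumption: images are sent as data URIs; text is ignored in this branch.
            body["input_type"] = "image"
            body["images"] = [f"data:{mime_type};base64,{b64}" for b64 in image_b64s]
        else:
            body["input_type"] = "search_query"
            body["texts"] = [text]
        return body

    raise ValueError(f"Unsupported image embedding model: {model_id}")
```

Keeping both families behind one builder means the single-image and batch paths differ only in how many base64 strings they pass in.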

Config + registry
-----------------
* ``BEDROCK_IMAGE_EMBEDDING_MODEL`` / ``BEDROCK_IMAGE_OUTPUT_DIM``
  env vars surfaced via Settings, threaded through ProviderRegistry.

Tests
-----
* 9 new unit tests in ``tests/unit/providers/test_bedrock.py`` covering
  Titan/Cohere single embed, Cohere batch, Cohere chunking >64 images,
  text-into-image-space for both families, Titan error-via-message
  field, disabled-capability path, dimension-not-detected path, and
  Titan sequential batch fallback. 18/18 bedrock tests pass; full unit
  suite green (1033 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude Bot commented May 17, 2026

Claude finished @cbcoutinho's task in 2m 24s


PR Review: feat(providers): image embeddings in BedrockProvider

Todo:

  • Understand the request (PR review)
  • Read changed files: base.py, bedrock.py, config.py, registry.py, test_bedrock.py
  • Analyze code quality, correctness, security, and test coverage
  • Post full review

Overall this is a clean, well-reasoned implementation. The design decision to keep embed_for_image_space separate from embed is sound (documented in PR description), the eval numbers are convincing, and the test coverage is solid. A few issues worth addressing before merge.

🟡 Important

1. embed_image_batch Cohere path: unguarded results[0] after loop

bedrock.py:411

self._remember_image_dim(results[0])   # IndexError if API returns 0 embeddings
return results

If the Bedrock API returns a structurally valid JSON response but with an empty embeddings.float list (e.g., during throttling or an undocumented edge case), this will raise an IndexError instead of a meaningful error. The if not images: return [] guard only covers the input, not the response. Add a check:

if results:
    self._remember_image_dim(results[0])
return results
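
If a descriptive failure is preferred over silently returning an empty list, a stricter per-chunk check is one option. The sketch below is hypothetical: `extract_float_embeddings` is not a function in the PR, and the `embeddings.float` response shape is taken from the description above. Whether a count mismatch should raise is a design choice for the author.

```python
def extract_float_embeddings(response_body: dict, expected: int) -> list[list[float]]:
    """Return the float embeddings from one chunk response, or raise a clear error."""
    embeddings = response_body.get("embeddings")
    floats = embeddings.get("float") if isinstance(embeddings, dict) else None
    if not floats:
        raise RuntimeError("Bedrock response contained no float embeddings")
    if len(floats) != expected:
        raise RuntimeError(f"Expected {expected} embeddings, got {len(floats)}")
    return floats


# One embedding per submitted image is accepted.
ok = extract_float_embeddings({"embeddings": {"float": [[0.1, 0.2], [0.3, 0.4]]}}, expected=2)
```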

2. _create_image_embedding_request allows empty request body (no image, no text)

bedrock.py:279

Both Titan and Cohere branches can produce a body with neither inputImage/inputText nor images/texts. This will fail at the API layer with an opaque error rather than a clear Python exception. Add a guard at the top of the function:

if not image_b64s and text is None:
    raise ValueError("Must provide at least one of image_b64s or text")


3. BEDROCK_IMAGE_OUTPUT_DIM not validated: invalid values pass startup and only fail at API time

config.py:114

Titan Multimodal G1 only accepts 256, 384, or 1024. The codebase already uses Validator(..., is_in=[...]) for other constrained settings (e.g., LOG_FORMAT at line 222). An invalid value currently passes startup cleanly and fails only on the first embed call:

# In _dynaconf validators list (config.py ~line 209)
Validator("BEDROCK_IMAGE_OUTPUT_DIM", is_in=[256, 384, 1024]),

This gives a clear error at startup rather than a confusing ValidationException from the Bedrock API mid-operation.
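
One caveat worth considering: the PR reports a 1536-dim output for Cohere v4, so if the same setting is also applied to Cohere (an assumption about how the setting is used), an unconditional allow-list of Titan values would reject it. A model-aware check in plain Python might look like the sketch below; `validate_image_output_dim` is a hypothetical helper, not existing code.

```python
from typing import Optional

# Output sizes accepted by Titan Multimodal G1, as stated in the review.
TITAN_IMAGE_DIMS = frozenset({256, 384, 1024})


def validate_image_output_dim(model_id: Optional[str], output_dim: Optional[int]) -> None:
    """Fail fast at startup if a Titan image model is paired with an unsupported dimension."""
    if model_id is None or output_dim is None:
        return  # image embeddings disabled or provider default in use
    if model_id.startswith("amazon.titan-embed-image") and output_dim not in TITAN_IMAGE_DIMS:
        allowed = ", ".join(str(d) for d in sorted(TITAN_IMAGE_DIMS))
        raise ValueError(
            f"BEDROCK_IMAGE_OUTPUT_DIM={output_dim} is not valid for {model_id}; expected one of {allowed}"
        )


validate_image_output_dim("amazon.titan-embed-image-v1", 384)  # accepted
validate_image_output_dim("cohere.embed-v4:0", 1536)           # not checked here
```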


🟢 Nit

4. get_image_dimension() for Titan is knowable at construction time

bedrock.py:429

For Titan, self.image_output_dim is always the dimension — no probe call needed. Cohere's dimension is genuinely unknown until the first embed. You could short-circuit for Titan:

def get_image_dimension(self) -> int:
    if not self.supports_image_embeddings:
        raise NotImplementedError(...)
    if self.image_embedding_model and self.image_embedding_model.startswith("amazon.titan-embed-image"):
        return self.image_output_dim
    if self._image_dimension is None:
        raise RuntimeError(...)
    return self._image_dimension

5. No test for embed_image_batch([]) short-circuit

The if not images: return [] early-exit at bedrock.py:394 is untested. Easy one-liner test to add.
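
A sketch of what such a test could assert, using a stand-in class because the real BedrockProvider constructor and whether its methods are async are not shown here. `StubImageProvider` is hypothetical and mirrors only the early exit; a real test would build BedrockProvider with a mocked client.

```python
from unittest.mock import MagicMock


class StubImageProvider:
    """Hypothetical stand-in that reproduces only the empty-input short-circuit."""

    def __init__(self, client):
        self.client = client

    def embed_image_batch(self, images, mime_type="image/jpeg"):
        if not images:
            return []  # early exit: no API call for empty input
        response = self.client.invoke_model(images=images, mime_type=mime_type)
        return response["embeddings"]


def test_embed_image_batch_empty_short_circuits():
    client = MagicMock()
    provider = StubImageProvider(client)

    assert provider.embed_image_batch([]) == []
    # The important part: the model endpoint must not be touched.
    client.invoke_model.assert_not_called()
```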


6. _invoke_image_model null-guard is unreachable

bedrock.py:347

The if not self.image_embedding_model: raise NotImplementedError at the top of _invoke_image_model is unreachable — every call site has already checked supports_image_embeddings. Not harmful, but it's dead code.


Positive notes

  • Prefix dispatch (startswith("amazon.titan-embed-image") / startswith("cohere.embed")) is consistent with the existing text embedding pattern — good uniformity.
  • Titan error propagation via response.get("message") is the correct approach; Titan embeds use that field for soft errors.
  • embed_for_image_space as a separate method is the right call. Overloading embed() would have been a footgun for callers who misconfigure BEDROCK_IMAGE_EMBEDDING_MODEL=BEDROCK_EMBEDDING_MODEL.
  • Cohere chunking at 64 vs 96 cap is a sensible headroom choice.
  • Test coverage — the 9 new tests cover the critical paths well (Titan, Cohere batch, text-to-image-space, error message surface, disabled guard, dimension detection, chunking at cap boundary, and Titan sequential fallback).
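
For readers unfamiliar with the Titan soft-error behaviour noted above, the handling can be sketched as follows. `parse_titan_embedding` is an illustrative name rather than the PR's function, and the sketch assumes the response body carries `embedding` and, on failure, `message`.

```python
import json


def parse_titan_embedding(raw_body: bytes) -> list[float]:
    """Return the embedding from a Titan response body, surfacing the model's own error text."""
    payload = json.loads(raw_body)
    message = payload.get("message")
    if message:
        # Titan can report problems in-band instead of via an HTTP error.
        raise RuntimeError(f"Titan image embedding failed: {message}")
    embedding = payload.get("embedding")
    if not embedding:
        raise RuntimeError("Titan response contained no embedding")
    return embedding


vector = parse_titan_embedding(b'{"embedding": [0.1, 0.2, 0.3]}')
```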

Development

Successfully merging this pull request may close these issues.

Images / Photos