feat: Add vector_db_id to chunk metadata #3304

are-ces · 2025-09-02T11:11:25Z

What does this PR do?

When running RAG against multiple vector DBs, it can be difficult to trace where retrieved chunks come from. This PR adds the vector_db_id to each chunk’s metadata, making it easier to see which database a given chunk was retrieved from. This helps with debugging and with analyzing retrieval behavior across multiple DBs.

Relevant code:

for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        # Make sure the chunk has a metadata dict before tagging it.
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        # Record which vector DB this chunk was retrieved from.
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)
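
To try the loop outside Llama Stack, here is a minimal self-contained sketch. `Chunk` and `QueryResult` are hypothetical stand-ins for the real types (assumed only to expose `metadata`, `chunks`, and `scores`), and `tag_chunks` is an illustrative helper name, not a function from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk."""
    content: str
    metadata: Optional[Dict[str, Any]] = None


@dataclass
class QueryResult:
    """Hypothetical stand-in for one vector DB's query response."""
    chunks: List[Chunk] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)


def tag_chunks(
    vector_db_ids: List[str], results: List[QueryResult]
) -> Tuple[List[Chunk], List[float]]:
    """Flatten per-DB results and stamp each chunk with its source DB id."""
    chunks: List[Chunk] = []
    scores: List[float] = []
    for vector_db_id, result in zip(vector_db_ids, results):
        for chunk, score in zip(result.chunks, result.scores):
            # Create the metadata dict if it is missing or None.
            if getattr(chunk, "metadata", None) is None:
                chunk.metadata = {}
            # Existing metadata keys are preserved; only this key is set.
            chunk.metadata["vector_db_id"] = vector_db_id
            chunks.append(chunk)
            scores.append(score)
    return chunks, scores
```

Note that the sketch, like the loop above, mutates the chunk objects in place, so any other holder of the same chunk sees the added key.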

Test Plan

Ran Llama Stack in debug mode.
Verified that vector_db_id was added to each chunk’s metadata.
Confirmed that the metadata was printed in the console when using the RAG tool.
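
The manual check above could also be scripted. The sketch below assumes retrieved chunks are available as plain dicts with a `metadata` key, which is a simplification for illustration; `chunks_per_db` and the `"<untagged>"` label are hypothetical names, not part of Llama Stack.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def chunks_per_db(chunks: Iterable[Dict[str, Any]]) -> Counter:
    """Count retrieved chunks per originating vector DB.

    Chunks with missing or None metadata, or without a vector_db_id,
    are counted under the "<untagged>" label.
    """
    return Counter(
        (chunk.get("metadata") or {}).get("vector_db_id", "<untagged>")
        for chunk in chunks
    )
```

A non-zero `"<untagged>"` count would indicate chunks that bypassed the tagging step.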

Adding unit tests

leseb · 2025-09-03T09:31:20Z

@varshaprasad96 PTAL

varshaprasad96

/lgtm
This change looks good. Just to add: this is for vector_db. If you'd like to use vector_stores instead (which is built on top of the OpenAI APIs), we also have the filename of the file the chunk was retrieved from:

llama-stack/llama_stack/providers/utils/memory/openai_vector_store_mixin.py

Lines 449 to 450 in ef02b9e

    file_id=chunk.metadata.get("file_id", ""),
    filename=chunk.metadata.get("filename", ""),
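
The quoted lines read attribution fields with `dict.get` and an empty-string default, so chunks that lack a field do not raise. A sketch of the same pattern applied to all three keys mentioned in this thread follows; `describe_source` is a hypothetical helper, and whether a given chunk carries all three keys depends on which API produced it.

```python
from typing import Any, Dict, Optional


def describe_source(metadata: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Collect attribution fields from chunk metadata, defaulting to ""."""
    metadata = metadata or {}
    return {
        "vector_db_id": metadata.get("vector_db_id", ""),
        "file_id": metadata.get("file_id", ""),
        "filename": metadata.get("filename", ""),
    }
```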

franciscojavierarceo

lgtm, feedback from #3255 was incorporated here 👍

leseb

rubber stamp based on @franciscojavierarceo's and @varshaprasad96's reviews.

# What does this PR do?

When running RAG in a multi vector DB setting, it can be difficult to trace where retrieved chunks originate from. This PR adds the `vector_db_id` into each chunk’s metadata, making it easier to understand which database a given chunk came from. This is helpful for debugging and for analyzing retrieval behavior of multiple DBs.

Relevant code:

```python
for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)
```

## Test Plan

* Ran Llama Stack in debug mode.
* Verified that `vector_db_id` was added to each chunk’s metadata.
* Confirmed that the metadata was printed in the console when using the RAG tool.

---------

Co-authored-by: are-ces <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>

are-ces and others added 4 commits August 29, 2025 08:45

Add vector_db_id to chunk metadata

b50cb25

Adding unit tests

Merge branch 'main' into main

fc4a758

Merge branch 'main' into main

21a70bf

Fixed comment

030de4b

are-ces requested review from ashwinb, yanxi0830, hardikjshah, raghotham, ehhuang, terrytangyuan, leseb, bbrowning, reluctantfuturist, mattf and slekkala1 as code owners September 2, 2025 11:11

meta-cla bot added the CLA Signed label Sep 2, 2025

are-ces mentioned this pull request Sep 2, 2025

feat: Add vector_db_id to chunk metadata #3255

Closed

Fixed issues with metadata

8a59cb3

varshaprasad96 approved these changes Sep 8, 2025

franciscojavierarceo approved these changes Sep 10, 2025

leseb approved these changes Sep 10, 2025

leseb merged commit 1c23aeb into llamastack:main Sep 10, 2025
21 checks passed

franciscojavierarceo mentioned this pull request Sep 10, 2025

fix: Add missing files_api parameter to MemoryToolRuntimeImpl test #3394

Merged
