Skip to content

Conversation

are-ces
Copy link
Contributor

@are-ces are-ces commented Sep 2, 2025

What does this PR do?

When running RAG in a multi vector DB setting, it can be difficult to trace where retrieved chunks originate from. This PR adds the vector_db_id into each chunk’s metadata, making it easier to understand which database a given chunk came from. This is helpful for debugging and for analyzing retrieval behavior of multiple DBs.

Relevant code:

for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)

Test Plan

  • Ran Llama Stack in debug mode.
  • Verified that vector_db_id was added to each chunk’s metadata.
  • Confirmed that the metadata was printed in the console when using the RAG tool.

@leseb
Copy link
Collaborator

leseb commented Sep 3, 2025

@varshaprasad96 PTAL

Copy link
Contributor

@varshaprasad96 varshaprasad96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
This change looks good. Just to add - this is for vector_db, if you'd like to use vector_stores instead (which is based on top of open ai APIs) we also have filename from where the chunk was retrieved (

file_id=chunk.metadata.get("file_id", ""),
filename=chunk.metadata.get("filename", ""),
).

Copy link
Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, feedback from #3255 was incorporated here 👍

Copy link
Collaborator

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubber stamp based on @franciscojavierarceo's and @varshaprasad96's reviews.

@leseb leseb merged commit 1c23aeb into llamastack:main Sep 10, 2025
21 checks passed
leseb pushed a commit that referenced this pull request Sep 10, 2025
# What does this PR do?

When running RAG in a multi vector DB setting, it can be difficult to
trace where retrieved chunks originate from. This PR adds the
`vector_db_id` into each chunk’s metadata, making it easier to
understand which database a given chunk came from. This is helpful for
debugging and for analyzing retrieval behavior of multiple DBs.

Relevant code:

```python
for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)
```

## Test Plan

* Ran Llama Stack in debug mode.
* Verified that `vector_db_id` was added to each chunk’s metadata.
* Confirmed that the metadata was printed in the console when using the
RAG tool.

---------

Co-authored-by: are-ces <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025
# What does this PR do?

When running RAG in a multi vector DB setting, it can be difficult to
trace where retrieved chunks originate from. This PR adds the
`vector_db_id` into each chunk’s metadata, making it easier to
understand which database a given chunk came from. This is helpful for
debugging and for analyzing retrieval behavior of multiple DBs.

Relevant code:

```python
for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)
```

## Test Plan

* Ran Llama Stack in debug mode.
* Verified that `vector_db_id` was added to each chunk’s metadata.
* Confirmed that the metadata was printed in the console when using the
RAG tool.

---------

Co-authored-by: are-ces <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants