Feature request: citations for RAG #630

sdjd93dj · 2025-02-08T21:06:42Z

Currently, the solution can retrieve documents from the database and return LLM responses based on the retrieved context, but it does not generate citations for the specific documents used to produce each response. Note that the cited documents are usually a subset of the context, not simply the entire context fed into the model.

maryamkhidir · 2025-02-12T10:28:43Z

@sdjd93dj What do you mean by a subset of the context? Can you share a sample doc that shows this?
Also, you can view the documents used for context in the metadata of the LLM response.

sdjd93dj · 2025-02-12T16:55:28Z

If you go on Perplexity, a search result looks like this (screenshot attached). At the top, you can see all the sources fed into the context: a list of about 7 or 8 sources. These are all the sources that the LLM saw.

But each sentence generated by the LLM probably has its roots in a subset of those sources (or perhaps a single one). You can see this in the numbers in the generated text: the first sentence cites source 1, the following sentences each cite source 2, and so on. That is, the LLM saw all the sources, but it judges that the first sentence draws mostly on source 1. So even though the LLM saw all 8 sources, the sentence-level citations don't highlight all 8, because the generated text does not necessarily draw on every one of them.

LangChain has a sample implementation of citation generation here: https://python.langchain.com/docs/how_to/qa_citations/
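To make the request concrete, here is a minimal sketch of sentence-level citation extraction. It assumes the model has been prompted to append bracketed numeric markers such as `[1]` that refer to the numbered sources in its context. The `Source` class and the `cite_sentences` and `used_source_ids` functions are hypothetical names made up for illustration; they are not part of aws-genai-llm-chatbot or LangChain.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    """One retrieved document that was placed in the model's context."""
    id: int
    title: str


# Matches inline markers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def cite_sentences(answer, sources):
    """Return (sentence, cited_source_ids) pairs for a generated answer.

    Assumption: markers sit before the terminal punctuation, so a naive
    split on whitespace after '.', '!' or '?' keeps each marker with
    its own sentence.
    """
    known_ids = {s.id for s in sources}
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        ids = [int(n) for n in CITATION.findall(sentence)]
        # Ignore markers that do not match any retrieved source.
        pairs.append((sentence, [i for i in ids if i in known_ids]))
    return pairs


def used_source_ids(pairs):
    """Collect the subset of sources that were actually cited."""
    return sorted({i for _, ids in pairs for i in ids})


# Four sources were fed into the context (illustrative data only).
sources = [
    Source(1, "Doc A"),
    Source(2, "Doc B"),
    Source(3, "Doc C"),
    Source(4, "Doc D"),
]
answer = (
    "Paris is the capital of France [1]. "
    "It hosts the Louvre [2][3]. "
    "It is lovely [9]."
)

per_sentence = cite_sentences(answer, sources)
used = used_source_ids(per_sentence)
print(used)  # [1, 2, 3]
```

In this example the model saw four sources, but only three are cited, and the marker `[9]` is dropped because it does not match any retrieved source. That cited subset, rather than the full context, is what the feature would surface to the user.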

sdjd93dj · 2025-02-12T16:57:27Z

In case my screenshot didn't go through, see here: https://imgur.com/a/xGONWPC

github-project-automation bot added this to AWS GenAI Chatbot Feb 8, 2025
