Feature request: citations for RAG #630
@sdjd93dj What do you mean by a subset of the context? Can you share a sample doc of this? Also, you can view the documents used for context in the metadata response of the LLM.
If you go on perplexity, search looks like this:
At the top, you can see all the sources fed into the context: a list of about 7 or 8 sources. These are all the sources the LLM saw. But each generated sentence probably has its roots in a subset of those sources (or perhaps a single one). You can see this from the numbers in the generated text: the first sentence cites source 1, the following sentences each cite source 2, and so on. In other words, the LLM saw all the sources, but it judged that the first sentence drew mainly on source 1.
So even though the LLM saw all 8 sources, the sentence-level citations don't highlight all 8, because the generated text doesn't necessarily draw on every source.
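To make the "subset of the context" idea concrete, here is a minimal sketch (not code from this repo) of the Perplexity-style pattern: the context sources are numbered, the answer carries inline `[n]` markers, and the cited sources are only those whose numbers actually appear. The `Source` class and both function names are hypothetical, and the sketch assumes markers are placed before the sentence-ending punctuation (e.g. `... France [1].`).

```python
import re
from dataclasses import dataclass

# Hypothetical record for one retrieved source; field names are illustrative.
@dataclass
class Source:
    source_id: int
    title: str

# Matches inline citation markers such as "[1]" or "[12]".
CITATION_RE = re.compile(r"\[(\d+)\]")
# Naive sentence splitter: break on whitespace that follows ., ! or ?
# (assumes citation markers come before the sentence-ending punctuation).
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def sentence_citations(answer: str) -> list[tuple[str, list[int]]]:
    """Split the answer into sentences and collect the [n] markers in each."""
    result = []
    for sentence in SENTENCE_RE.split(answer.strip()):
        ids = [int(m) for m in CITATION_RE.findall(sentence)]
        result.append((sentence, ids))
    return result


def cited_sources(answer: str, context: list[Source]) -> list[Source]:
    """Return only the context sources that at least one sentence cites."""
    cited = {i for _, ids in sentence_citations(answer) for i in ids}
    return [s for s in context if s.source_id in cited]


if __name__ == "__main__":
    context = [Source(i, f"doc{i}") for i in range(1, 9)]  # 8 sources seen
    answer = "Alpha is true [1]. Beta follows [2]. Gamma combines both [2][5]."
    for sentence, ids in sentence_citations(answer):
        print(ids, sentence)
    print([s.source_id for s in cited_sources(answer, context)])
```

Here the model saw all 8 sources, but only sources 1, 2 and 5 end up cited, which is the distinction the feature request is about.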
LangChain has a sample implementation of citation generation here: https://python.langchain.com/docs/how_to/qa_citations/
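One approach described in that LangChain guide has the model return a structured object containing the answer plus the integer IDs of the specific sources that justify it. The sketch below is a dependency-free approximation of that idea, not LangChain's code: it numbers the retrieved chunks for the prompt and validates a JSON reply of the assumed shape `{"answer": ..., "citations": [...]}`. `CitedAnswer`, `format_docs_with_ids` and `parse_cited_answer` are hypothetical names here, and the actual LLM call is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class CitedAnswer:
    answer: str
    citations: list[int]  # IDs of the specific sources that justify the answer


def format_docs_with_ids(docs: list[str]) -> str:
    """Prefix each retrieved chunk with an ID the model can refer back to."""
    return "\n\n".join(f"Source ID: {i}\n{text}" for i, text in enumerate(docs))


def parse_cited_answer(raw: str, num_docs: int) -> CitedAnswer:
    """Parse the model's JSON reply, keeping only valid, de-duplicated IDs."""
    data = json.loads(raw)
    ids: list[int] = []
    for i in data.get("citations", []):
        # Drop anything that is not a real source ID in this context window
        # (bools are excluded because bool is a subclass of int in Python).
        if isinstance(i, int) and not isinstance(i, bool) and 0 <= i < num_docs and i not in ids:
            ids.append(i)
    return CitedAnswer(answer=str(data["answer"]), citations=ids)


if __name__ == "__main__":
    docs = ["Chunk about pricing.", "Chunk about shipping.", "Chunk about returns."]
    print(format_docs_with_ids(docs))
    # A made-up model reply; ID 7 does not exist and is discarded.
    reply = '{"answer": "Returns take 30 days.", "citations": [2, 7]}'
    print(parse_cited_answer(reply, len(docs)))
```

Validating the IDs against the context matters because a model can cite a source number that was never provided.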
In case my screenshot didn't go through, see here: https://imgur.com/a/xGONWPC
Currently, the solution can retrieve documents from the database and return LLM responses based on the retrieved context. However, it does not generate citations identifying the specific documents used to produce each response. Note that the cited documents are usually a subset of the context, not simply the entire context fed into the model.
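If the model itself isn't asked to cite, one fallback is post-hoc attribution: match each generated sentence back to the retrieved document it overlaps with most. The sketch below uses a crude lexical-overlap heuristic, which is a swapped-in technique for illustration (not something proposed in this thread or implemented in the repo); the function names and the `min_overlap` threshold of 0.5 are assumptions. Sentences with too little overlap are left uncited, so the result is naturally a subset of the context.

```python
import re
from typing import Optional


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens; a deliberately crude normalization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def attribute_sentences(
    sentences: list[str], docs: list[str], min_overlap: float = 0.5
) -> list[Optional[int]]:
    """For each sentence, return the index of the best-matching doc, or None.

    Score = fraction of the sentence's tokens that also appear in the doc.
    """
    doc_tokens = [tokenize(d) for d in docs]
    result: list[Optional[int]] = []
    for sentence in sentences:
        s_tokens = tokenize(sentence)
        best_idx: Optional[int] = None
        best_score = 0.0
        if s_tokens:
            for idx, d_tokens in enumerate(doc_tokens):
                score = len(s_tokens & d_tokens) / len(s_tokens)
                if score > best_score:
                    best_idx, best_score = idx, score
        # Only cite when the overlap is strong enough to be meaningful.
        result.append(best_idx if best_score >= min_overlap else None)
    return result


def cited_subset(attributions: list[Optional[int]]) -> list[int]:
    """Distinct doc indices that at least one sentence was attributed to."""
    return sorted({i for i in attributions if i is not None})


if __name__ == "__main__":
    docs = [
        "The warranty covers parts for two years.",
        "Shipping takes five business days.",
        "Returns are accepted within 30 days.",
    ]
    sentences = [
        "The warranty covers parts for two years.",
        "Shipping takes five business days.",
        "Our office is closed on holidays.",
    ]
    attributions = attribute_sentences(sentences, docs)
    print(attributions, cited_subset(attributions))
```

Lexical overlap misses paraphrases, so in practice an embedding similarity or a second LLM pass would likely attribute more reliably; the structure of the output (per-sentence doc index, then the cited subset) stays the same.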