How to get the AI to source their response from multiple documents? #780
Comments
In my opinion, this is the biggest issue that should be worked on. I am desperately looking for a tool that offers exactly this and have not found it yet. While privateGPT searches only the last added file, AnythingLLM at least searches more than one. But again, as you mentioned, it seems to me that it only searches 1-3 of the documents and not all of them, so I can never be sure what exactly it has considered and searched.
" I am desperately looking for a tool that offers exactly this and have not found it yet." I went the same path as you did, tried privategpt first and ended up here. If anyone found a workaround in the short term this would be helpful. If someone knows what the limitation is and why, maybe I could help out in some way. I also do not know why this is tagged as a feature request. To me it seems like a bug or the fact it's simply not finished yet... enhancement I guess works as a tag, but most people might view this as a bug. |
It's not a bug, nor really a feature. I think it's a use-case issue, or more likely a UX design shortcoming on our part: we don't surface the controls that give you the abilities being discussed here readily enough. It could also be an education thing, since it may not be evident what each control parameter does. We don't want a complex UI, and the vast majority of people will never touch these settings anyway. In general, on a workspace, AnythingLLM has to make some assumptions and constraints.
On the other side of this, we make some assumptions about the maximum size any particular piece of a query can be (system prompt + context, history, and current prompt). I won't go into how we do this here, but it is managed as best we can. And of course, if a document is referenced at all, it will be present in the citations returned with the response. Pinning basically guarantees a document will be included, but as mentioned above, modifying the # of chunks, the similarity threshold, or the embedder model used can make large improvements depending on your source documents. All of what I spoke to in this comment is live in the current build. We are working on better document chunking (#490), and that's just off the top of my head. Sorry for the wall of text; I'll be using this comment as a reference for a bunch of other things going forward and wanted to type it out fully so as not to repeat myself :)
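To make the retrieval behavior above concrete, here is a rough sketch of how a snippet count, a similarity threshold, and pinned documents might combine when assembling context for a query. This is an illustration only; the function and parameter names are hypothetical and do not reflect AnythingLLM's actual code.

```typescript
// Hypothetical sketch: how a chunk limit, a similarity threshold, and
// pinned documents could interact when building a prompt context.
// Names are made up for illustration.

interface Chunk {
  docId: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function buildContext(
  queryEmbedding: number[],
  chunks: Chunk[],
  pinnedDocIds: Set<string>,
  maxChunks: number,          // the "# of chunks" style workspace setting
  similarityThreshold: number // e.g. 0.25 -- chunks scoring below this are dropped
): string {
  // Pinned documents are always included, regardless of similarity score.
  const pinned = chunks.filter((c) => pinnedDocIds.has(c.docId));

  // Everything else must clear the similarity threshold, and only the
  // top `maxChunks` results make it into the context.
  const retrieved = chunks
    .filter((c) => !pinnedDocIds.has(c.docId))
    .map((c) => ({ chunk: c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .filter((r) => r.score >= similarityThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks)
    .map((r) => r.chunk);

  return [...pinned, ...retrieved].map((c) => c.text).join("\n\n");
}
```

With a low chunk limit or a high threshold, chunks from only one or two of the five PDFs may clear the bar, which matches the behavior described earlier; raising the chunk count, lowering the threshold, or pinning documents all widen what gets pulled into the context.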
Thank you for the ideas! It's getting better, but I'm wondering if a timeout setting needs to be configured. I increased the default and set it to only query the documents. I'm not really sure if this is a new issue, but my attempt to query my documents ends with "Could not respond to message." However, LM Studio never finished and had yet to even start returning a token; CPU and GPU were also still at or above 45%. To me it seems that the chat/query has a timeout instead of waiting for a response. There is another data point for why it seems to be a timeout rather than a completed request or other network error.
@vautieri Does LM Studio show that it did indeed get a request in the inference server logs? Or does it simply never reach the LM Studio endpoint? If it never reaches LM Studio, then the issue is the connection settings in AnythingLLM.
Yes, LM Studio was still processing the request; its processing went on for dozens of seconds after the error message was displayed back in the GUI. Once my CPU/GPU dropped to 1%, I checked LM Studio for any errors and saw none. I then sent the same request and it was instant (so it got a cached response, meaning some level of the software was still active enough to create the cached value). I'll look deeper into it and turn on more logging... I was thinking maybe the GUI/middle layer had a timeout if LM Studio didn't return within x amount of time.
Note: I thought Cloudflare being a proxied IP address was causing the issue (I proxy to keep my outward-facing IP address private), so I did turn off the proxied DNS setting as a quick test. A proxied address is a quick way for me to ensure it is an HTTPS connection external to my LAN. You probably support HTTPS natively with a certificate; I just never looked, since I'm simply testing and learning about the RAG capabilities at this point (and the person I am testing this with is not local). I'm pretty confident it's a timeout going on somewhere.
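One way to test the timeout hypothesis from outside the app is to hit the LM Studio OpenAI-compatible endpoint directly with a deliberately long client-side timeout and see whether a response eventually arrives. A minimal sketch follows; the URL, model name, prompt, and timeout value are placeholders for your own setup, not anything AnythingLLM itself does.

```typescript
// Minimal probe against an LM Studio-style OpenAI-compatible endpoint.
// URL, model name, and timeout are placeholders; adjust for your setup.
async function probeCompletion(timeoutMs: number): Promise<void> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const res = await fetch("http://localhost:1234/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "local-model",
        messages: [{ role: "user", content: "Summarize the attached context." }],
      }),
      signal: controller.signal,
    });
    console.log("Responded with status", res.status, await res.text());
  } catch (err) {
    // An abort here means the client gave up before the server finished --
    // the same symptom as a middle-layer or proxy timeout.
    console.error("Request failed or timed out:", err);
  } finally {
    clearTimeout(timer);
  }
}

probeCompletion(10 * 60 * 1000); // wait up to 10 minutes
```

If a direct call like this eventually completes while the chat UI reports "Could not respond to message" much earlier, that points at a timeout in the chat layer or in a proxy in between (a Cloudflare-proxied connection, for example, enforces its own request timeout) rather than in LM Studio itself.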
Closing this as stale, as it is mostly a conversation thread anyway :)
What would you like to see?
When asking the AI to generate a summary of a topic, it gives a limited response based on one PDF source instead of the multiple sources available in the database. Let's say I have 5 articles on topic X. Each one has some overlap but also some unique information about topic X. I instruct the AI to write a summary of the key points of the topic, or maybe draft an introduction to a research paper on the topic, and it only uses 1 of the 5 PDFs for information. I am actually sure this is user error, but I am not exactly sure how to improve. To complicate matters, sometimes it uses all 5 and sometimes just one, for the exact same prompt.