
How to get the AI to source their response from multiple documents? #780

Closed
calmtortoise opened this issue Feb 22, 2024 · 8 comments
Labels
question Further information is requested

Comments

@calmtortoise

What would you like to see?

When asking the AI to generate a summary of a topic, it gives a limited response based on one PDF source instead of the multiple sources available in the database. Let's say I have 5 articles on topic X. Each one has some overlap but also unique information about topic X. I instruct the AI to write a summary of the key points of the topic, or maybe draft an introduction to a research paper on the topic, and it only uses 1 of the 5 PDFs for information. I am actually sure this is user error, but I am not exactly sure how to improve. To complicate matters, sometimes it uses all 5 and sometimes just one, for the exact same prompt.

@calmtortoise calmtortoise added the enhancement (New feature or request) and feature request labels Feb 22, 2024
@timothycarambat timothycarambat added the question (Further information is requested) label and removed the enhancement (New feature or request) and feature request labels Feb 22, 2024
@3x3cut0r

3x3cut0r commented Feb 25, 2024

In my opinion, this is the biggest issue that should be worked on. I am desperately looking for a tool that offers exactly this and have not found it yet. While privateGPT searches only the last added file, AnythingLLM at least searches more than one. But again, as you mentioned, it seems to me that it only searches 1-3 of the documents and not all of them, so I can never be sure what exactly it has considered and searched.
I guess it has to do with the embedding model and the context window, which also differs depending on the model used. I don't know: does AnythingLLM search the chunks and do a "sentence similarity" analysis locally before sending them to the LLM? Or how is this solved technically? It is hardly possible to send all chunks to the LLM every time.
I am also very interested in some insights here.
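
To make concrete what I mean by a local "sentence similarity" analysis, here is a minimal TypeScript sketch (purely illustrative, not AnythingLLM's actual code): embed the query, score every stored chunk by cosine similarity, and send only the top-scoring chunks on to the LLM.

```ts
// Purely illustrative, not AnythingLLM's actual code: score each stored chunk
// embedding against the query embedding and keep only the closest ones.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunk embeddings against the query embedding and keep the top k.
function topKChunks(
  queryVec: number[],
  chunks: { text: string; vec: number[] }[],
  k: number
): { text: string; score: number }[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(queryVec, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```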

@vautieri

vautieri commented Mar 7, 2024

" I am desperately looking for a tool that offers exactly this and have not found it yet." I went the same path as you did, tried privategpt first and ended up here. If anyone found a workaround in the short term this would be helpful. If someone knows what the limitation is and why, maybe I could help out in some way. I also do not know why this is tagged as a feature request. To me it seems like a bug or the fact it's simply not finished yet... enhancement I guess works as a tag, but most people might view this as a bug.

@timothycarambat timothycarambat changed the title from "[FEAT]: More like a question. How to get the AI to source their response from multiple documents?" to "How to get the AI to source their response from multiple documents?" Mar 8, 2024
@timothycarambat
Member

It's not a bug, nor really a feature. I think it's a use-case issue, or more likely just UX design on our part in not surfacing the controls that give you the abilities being talked about here more readily. It could also be an education thing, since it may not be evident what each control parameter does. We don't want a complex UI, and the vast majority of people will never touch these settings anyway.

In general, for a workspace, AnythingLLM has to make some assumptions and apply some constraints.

  • We limit the number of "relevant" text chunks to a default of 4 per query. This is adjustable in the workspace settings, though, so you can go as high as you want. Going up to 8 is usually a safe bet if results aren't great out of the gate.

  • We assume a text chunk with a vector similarity score below 20% to be "irrelevant". However, depending on document length, the number of documents vectorized in the workspace, and even embedding length, this value is not definitive. In workspace settings you can modify it to be totally unrestricted! That would mean each query returns the maximum number of results (the default of 4).

  • Lastly, if all else doesn't look good, we support document pinning, which basically means that under any circumstance we will inject the entire text of the pinned document into the prompt. If the context window is large enough, this can be every document in the workspace. On top of this, we will also run the vector search to be doubly sure a good result is found. (A rough sketch of how these settings interact follows this list.)
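
For anyone trying to picture how those three controls could fit together, here is a minimal TypeScript sketch. It is not our actual implementation, and the names (`topN`, `similarityThreshold`, `pinnedDocIds`) are just placeholders for the workspace settings described above:

```ts
// Illustrative sketch only: combine pinned documents, the similarity
// threshold, and the top-N chunk limit when assembling context for a query.
interface Chunk {
  docId: string;
  text: string;
  score: number; // similarity against the query embedding, 0..1
}

interface WorkspaceSettings {
  topN: number;                // default 4; raise to 8 if results are weak
  similarityThreshold: number; // default 0.2; set to 0 for "no restriction"
  pinnedDocIds: string[];      // documents whose full text is always injected
}

function buildContext(
  candidates: Chunk[],                    // chunks returned by the vector search
  fullDocText: (docId: string) => string, // loads a pinned document's full text
  settings: WorkspaceSettings
): string[] {
  // 1. Pinned documents are injected in full, regardless of similarity.
  const pinned = settings.pinnedDocIds.map(fullDocText);

  // 2. Everything else is filtered by the similarity threshold...
  const relevant = candidates.filter((c) => c.score >= settings.similarityThreshold);

  // 3. ...then sorted by score and capped at the top-N setting.
  const topChunks = relevant
    .sort((a, b) => b.score - a.score)
    .slice(0, settings.topN)
    .map((c) => c.text);

  return [...pinned, ...topChunks];
}
```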

On the other side of this, we make some assumptions about the maximum size any particular piece of a query can be (system prompt + context, history, and current prompt). I won't go into how we do this here, but it is managed as best we can.
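
As a purely illustrative sketch of that kind of budgeting (again, not the actual code; the names and counting function are placeholders), the idea is to reserve room for the fixed pieces and fill the remaining window with context and history:

```ts
// Hypothetical sketch: fit system prompt + retrieved context + chat history
// + the current prompt into a fixed context window.
function fitToWindow(
  parts: { system: string; context: string[]; history: string[]; prompt: string },
  maxTokens: number,
  countTokens: (text: string) => number
): string[] {
  // Reserve room for the pieces that must always be present.
  let budget = maxTokens - countTokens(parts.system) - countTokens(parts.prompt);
  const kept: string[] = [parts.system];

  // Add retrieved context first, then the most recent history, stopping as
  // soon as a piece no longer fits in the remaining budget.
  for (const piece of [...parts.context, ...parts.history.slice().reverse()]) {
    const cost = countTokens(piece);
    if (cost > budget) break;
    kept.push(piece);
    budget -= cost;
  }

  kept.push(parts.prompt);
  return kept;
}
```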

And of course, if a document is referenced at all, it will be present in the citations returned with the response. Pinning basically makes a document certain to be included, but as mentioned above, modifying the number of chunks, the similarity threshold, or the embedder model used can make large improvements depending on your source documents.

All of what I spoke to in this comment is live in the current build.

We are working on better document chunking: #490
Better embedder models for some uses: #658

And that's just off the top of my head. Sorry for the wall of text; I'll be using this comment as a reference for a bunch of other things going forward and wanted to type it out fully so as not to repeat myself :)

@vautieri

vautieri commented Mar 8, 2024

Thank you for the ideas! It's getting better, but I'm wondering if a timeout setting needs to be configured. I increased the default number of chunks to query, and set it to only query the documents. I'm not really sure if this is a new issue, but since my attempt is to query my documents:

"Could not respond to message.
An error occurred while streaming response. network error"

However, LM Studio never finished and had yet to even start returning a token; CPU and GPU were also still at or above 45%. To me it seems that the chat/query has a timeout instead of waiting for a response.

Another data point for why it seems to be a timeout rather than a completed request or some other network error:
When I know LM Studio has finished and I type the exact same question, such as "give me a summary of each document", I get an immediate response (it must be cached somewhere). This means LM Studio did indeed finish and return results from the previous query, at which point things get cached.
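
Purely to illustrate what I suspect is happening (this is not AnythingLLM's actual code), a client-side timeout like the sketch below would produce exactly these symptoms: the request is aborted and the UI reports a network error while the inference server keeps generating and eventually caches its result.

```ts
// Hypothetical illustration of a client-side timeout: the fetch is aborted
// after `timeoutMs`, surfacing a network error in the UI, even though the
// inference server (e.g. LM Studio) is still busy generating a response.
async function chatWithTimeout(url: string, body: unknown, timeoutMs = 30_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}
```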

@timothycarambat
Member

@vautieri Does LM Studio show that it did indeed get a request in the inference server logs? Or does the request simply never reach the LM Studio endpoint? If it never reaches LM Studio, then the issue is the connection settings in AnythingLLM.

@vautieri

vautieri commented Mar 9, 2024

Yes, LM Studio was still processing the request; its processing went on for dozens of seconds after the error message was displayed back in the GUI. Once my CPU/GPU dropped to 1% I checked LM Studio for any error and saw none. I then sent the same request and it was instant (so it got a cached response, meaning some layer of the software was still active enough to store the cached value). I'll look deeper into it and turn on more logging... I was thinking maybe the GUI/middle layer has a timeout if LM Studio doesn't return within a certain amount of time.

@vautieri

vautieri commented Mar 18, 2024

Note: I thought Cloudflare proxying my IP address was causing the issue (I use it to keep my outward-facing IP address private), so I turned off the proxied DNS setting as a quick test. A proxied address is a quick way for me to ensure it is an HTTPS connection external to my LAN. You probably support HTTPS natively with a certificate; I just never looked, since I'm simply testing and learning about the RAG capabilities at this point (and the person I am testing this with is not local). I'm pretty confident it's a timeout going on somewhere.

@timothycarambat
Member

Closing this as stale, as it is mostly a conversation thread anyway :)
