
How to get the AI to source their response from multiple documents? #780

Closed
calmtortoise opened this issue Feb 22, 2024 · 8 comments
Labels
question Further information is requested

Comments

@calmtortoise

What would you like to see?

When asking the AI to generate a summary of a topic, it gives a limited response based on one PDF source instead of the multiple sources available in the database. Let's say I have 5 articles on topic X. Each one has some overlap but also unique information about topic X. I instruct the AI to write a summary of the key points of the topic, or maybe draft an introduction to a research paper on the topic, and it only uses 1 of the 5 PDFs for information. I am actually sure this is user error, but I am not exactly sure how to improve. To complicate matters, sometimes it uses all 5 and sometimes just one, for the exact same prompt.

@calmtortoise calmtortoise added the enhancement (New feature or request) and feature request labels Feb 22, 2024
@timothycarambat timothycarambat added the question (Further information is requested) label and removed the enhancement (New feature or request) and feature request labels Feb 22, 2024
@3x3cut0r

3x3cut0r commented Feb 25, 2024

In my opinion, this is the biggest issue that should be worked on. I am desperately looking for a tool that offers exactly this and have not found it yet. While privateGPT searches only the last added file, AnythingLLM at least searches more than one. But again, as you mentioned, it seems to me that it only searches 1-3 of the documents and not all of them, so I can never be sure what exactly it has considered and searched.
I guess it has to do with the embedding model and the context window, which also differs depending on the model used. I don't know: does AnythingLLM search the chunks and do a "sentence similarity" analysis locally before sending them to the LLM? Or how is this solved technically? It is hardly possible to send all chunks to the LLM every time.
I am also very interested in some insights here.
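
To make concrete what I mean by a local "sentence similarity" analysis, here is a minimal TypeScript sketch (purely illustrative, not AnythingLLM's actual code): embed the query, score every stored chunk by cosine similarity, and send only the top-scoring chunks on to the LLM.

```ts
// Purely illustrative, not AnythingLLM's actual code: score each stored chunk
// embedding against the query embedding and keep only the closest ones.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunk embeddings against the query embedding and keep the top k.
function topKChunks(
  queryVec: number[],
  chunks: { text: string; vec: number[] }[],
  k: number
): { text: string; score: number }[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(queryVec, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```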

@vautieri

vautieri commented Mar 7, 2024

" I am desperately looking for a tool that offers exactly this and have not found it yet." I went the same path as you did, tried privategpt first and ended up here. If anyone found a workaround in the short term this would be helpful. If someone knows what the limitation is and why, maybe I could help out in some way. I also do not know why this is tagged as a feature request. To me it seems like a bug or the fact it's simply not finished yet... enhancement I guess works as a tag, but most people might view this as a bug.

@timothycarambat timothycarambat changed the title from "[FEAT]: More like a question. How to get the AI to source their response from multiple documents?" to "How to get the AI to source their response from multiple documents?" Mar 8, 2024
@timothycarambat
Member

It's not a bug, nor really a feature. I think it's a use-case issue, or more likely just UX design on our part in not surfacing the controls that give you the abilities being talked about here more readily. It could also be an education thing, since it may not be evident what each control parameter does. We don't want a complex UI, and the vast majority of people will never touch these settings anyway.

In general, for a workspace, AnythingLLM has to make some assumptions and apply some constraints.

  • We limit the number of "relevant" text chunks to a default of 4 per query. This is adjustable in the workspace settings, though, so you can go as high as you want. Going up to 8 is usually a safe bet if results aren't great out of the gate.

  • We assume a text chunk with a vector similarity score below 20% to be "irrelevant". However, depending on document length, the number of documents vectorized in the workspace, and even embedding length, this value is not definitive. In workspace settings you can modify it to be totally unrestricted! That would mean each query returns the maximum number of results (the default of 4).

  • Lastly, if all else doesn't look good, we support document pinning, which basically means that under any circumstance we will inject the entire text of the pinned document into the prompt. If the context window is large enough, this can be every document in the workspace. On top of this, we will also run the vector search to be doubly sure a good result is found. (A rough sketch of how these settings interact follows this list.)
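
For anyone trying to picture how those three controls could fit together, here is a minimal TypeScript sketch. It is not our actual implementation, and the names (`topN`, `similarityThreshold`, `pinnedDocIds`) are just placeholders for the workspace settings described above:

```ts
// Illustrative sketch only: combine pinned documents, the similarity
// threshold, and the top-N chunk limit when assembling context for a query.
interface Chunk {
  docId: string;
  text: string;
  score: number; // similarity against the query embedding, 0..1
}

interface WorkspaceSettings {
  topN: number;                // default 4; raise to 8 if results are weak
  similarityThreshold: number; // default 0.2; set to 0 for "no restriction"
  pinnedDocIds: string[];      // documents whose full text is always injected
}

function buildContext(
  candidates: Chunk[],                    // chunks returned by the vector search
  fullDocText: (docId: string) => string, // loads a pinned document's full text
  settings: WorkspaceSettings
): string[] {
  // 1. Pinned documents are injected in full, regardless of similarity.
  const pinned = settings.pinnedDocIds.map(fullDocText);

  // 2. Everything else is filtered by the similarity threshold...
  const relevant = candidates.filter((c) => c.score >= settings.similarityThreshold);

  // 3. ...then sorted by score and capped at the top-N setting.
  const topChunks = relevant
    .sort((a, b) => b.score - a.score)
    .slice(0, settings.topN)
    .map((c) => c.text);

  return [...pinned, ...topChunks];
}
```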

On the other side of this, we make some assumptions about the maximum size any particular piece of a query can be (system prompt + context, history, and current prompt). I won't go into how we do this here, but it is managed as best we can.
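
As a purely illustrative sketch of that kind of budgeting (again, not the actual code; the names and counting function are placeholders), the idea is to reserve room for the fixed pieces and fill the remaining window with context and history:

```ts
// Hypothetical sketch: fit system prompt + retrieved context + chat history
// + the current prompt into a fixed context window.
function fitToWindow(
  parts: { system: string; context: string[]; history: string[]; prompt: string },
  maxTokens: number,
  countTokens: (text: string) => number
): string[] {
  // Reserve room for the pieces that must always be present.
  let budget = maxTokens - countTokens(parts.system) - countTokens(parts.prompt);
  const kept: string[] = [parts.system];

  // Add retrieved context first, then the most recent history, stopping as
  // soon as a piece no longer fits in the remaining budget.
  for (const piece of [...parts.context, ...parts.history.slice().reverse()]) {
    const cost = countTokens(piece);
    if (cost > budget) break;
    kept.push(piece);
    budget -= cost;
  }

  kept.push(parts.prompt);
  return kept;
}
```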

And of course, if a document is referenced at all, it will be present in the citations returned with the response. Pinning basically makes a document certain to be included, but as mentioned above, modifying the number of chunks, the similarity threshold, or the embedder model used can make large improvements depending on your source documents.

All of what I spoke to in this comment is live in the current build.

We are working on better document chunking: #490
Better embedder models for some uses: #658

And that's just off the top of my head. Sorry for the wall of text; I'll be using this comment as a reference for a bunch of other things going forward and wanted to type it out fully so as not to repeat myself :)

@vautieri

vautieri commented Mar 8, 2024

Thank you for the ideas! It's getting better, but I'm wondering if a timeout setting needs to be configured. I increased the default number of chunks to query, and set it to only query the documents. I'm not really sure if this is a new issue, but since my attempt is to query my documents:

"Could not respond to message.
An error occurred while streaming response. network error"

However, LM Studio never finished and had yet to even start returning a token; CPU and GPU were also still at or above 45%. To me it seems that the chat/query has a timeout instead of waiting for a response.

Another data point for why it seems to be a timeout rather than a completed request or some other network error:
When I know LM Studio has finished and I type the exact same question, such as "give me a summary of each document", I get an immediate response (it must be cached somewhere). This means LM Studio did indeed finish and return results from the previous query, at which point things get cached.
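
Purely to illustrate what I suspect is happening (this is not AnythingLLM's actual code), a client-side timeout like the sketch below would produce exactly these symptoms: the request is aborted and the UI reports a network error while the inference server keeps generating and eventually caches its result.

```ts
// Hypothetical illustration of a client-side timeout: the fetch is aborted
// after `timeoutMs`, surfacing a network error in the UI, even though the
// inference server (e.g. LM Studio) is still busy generating a response.
async function chatWithTimeout(url: string, body: unknown, timeoutMs = 30_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}
```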

@timothycarambat
Member

@vautieri Does LM Studio show that it did indeed get a request in the inference server logs? Or does the request simply never reach the LM Studio endpoint? If it never reaches LM Studio, then the issue is the connection settings in AnythingLLM.

@vautieri

vautieri commented Mar 9, 2024

Yes, LM Studio was still processing the request; its processing went on for dozens of seconds after the error message was displayed back in the GUI. Once my CPU/GPU dropped to 1% I checked LM Studio for any error and saw none. I then sent the same request and it was instant (so it got a cached response, meaning some layer of the software was still active enough to store the cached value). I'll look deeper into it and turn on more logging... I was thinking maybe the GUI/middle layer has a timeout if LM Studio doesn't return within a certain amount of time.

@vautieri

vautieri commented Mar 18, 2024

Note: I thought Cloudflare proxying my IP address was causing the issue (I use it to keep my outward-facing IP address private), so I turned off the proxied DNS setting as a quick test. A proxied address is a quick way for me to ensure it is an HTTPS connection external to my LAN. You probably support HTTPS natively with a certificate; I just never looked, since I'm simply testing and learning about the RAG capabilities at this point (and the person I am testing this with is not local). I'm pretty confident it's a timeout going on somewhere.

@timothycarambat
Member

Closing this as stale, as it is mostly a conversation thread anyway :)
