
docs: add reference to collection of papers on context length #6


Open
wants to merge 1 commit into base: main

hesreallyhim

Description

The explanatory text claims that:

Although context lengths are getting larger, it has been shown that language models increase performance on tasks when they are given less (but more relevant) information.

However, no citation is offered to support this claim. Since I found the claim interesting and wanted to read up on the evidence, I thought it would be helpful to link to one or more citations.

Fixes

  • Docs: I found a nicely curated list of relevant articles with empirical evidence, so I just added a line to the bottom of the page, where other notes are located, to inform interested users where they could find research supporting this claim.

@@ -328,6 +328,7 @@ function App() {
<p><b>Notes:</b> *Text splitters trim the whitespace on the end of the js, python, and markdown splitters which is why the text jumps around, *Overlap is locked at &lt;50% of chunk size *Simple analytics (privacy friendly) used to understand my hosting bill.</p>
<p>For implementations of text splitters, view LangChain
(<a href="https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/character_text_splitter" target="_blank" rel="noopener noreferrer">py</a>, <a href="https://js.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/character_text_splitter" target="_blank" rel="noopener noreferrer">js</a>) & Llama Index (<a href="https://docs.llamaindex.ai/en/stable/api/llama_index.node_parser.SentenceSplitter.html#llama_index.node_parser.SentenceSplitter" target="_blank" rel="noopener noreferrer">py</a>, <a href="https://ts.llamaindex.ai/modules/low_level/node_parser" target="_blank" rel="noopener noreferrer">js</a>)</p>
<p>For more information and empirical research regarding the impact of large contexts on the performance of LLMs, see e.g., the collection <a href="https://github.com/Xnhyacinth/Awesome-LLM-Long-Context-Modeling?tab=readme-ov-file#111-llm" target="_blank" rel="noopener noreferrer">Awesome-LLM-Long-Context-Modeling</a>.</p>
<p>MIT License, <a href="https://github.com/gkamradt/ChunkViz" target="_blank" rel="noopener noreferrer">Opened Sourced</a>, PRs Welcome</p>

Another super tiny nit:

Suggested change
- <p>MIT License, <a href="https://github.com/gkamradt/ChunkViz" target="_blank" rel="noopener noreferrer">Opened Sourced</a>, PRs Welcome</p>
+ <p>MIT License, <a href="https://github.com/gkamradt/ChunkViz" target="_blank" rel="noopener noreferrer">Open Source</a>, PRs Welcome</p>
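
For readers unfamiliar with the "Overlap is locked at <50% of chunk size" note in the diff above, here is a minimal sketch of what such a cap could look like for a plain character splitter. The function `splitWithOverlap` and its parameters are hypothetical and are not taken from ChunkViz, LangChain, or Llama Index; the clamping rule is an assumption about one reasonable way to enforce the limit.

```javascript
// Hypothetical helper, not ChunkViz code: fixed-size character chunks with an
// overlap that is clamped to stay strictly below 50% of the chunk size.
function splitWithOverlap(text, chunkSize, overlap) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  // Largest integer strictly less than chunkSize / 2 (never negative).
  const maxOverlap = Math.max(0, Math.ceil(chunkSize / 2) - 1);
  const safeOverlap = Math.min(Math.max(0, Math.floor(overlap)), maxOverlap);
  // Each new chunk starts this many characters after the previous one.
  const step = chunkSize - safeOverlap; // always >= 1, so the loop terminates
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// A requested overlap of 3 on a chunk size of 4 is clamped to 1.
console.log(splitWithOverlap("abcdefghij", 4, 3)); // [ 'abcd', 'defg', 'ghij' ]
```

Keeping the overlap below half the chunk size guarantees that every chunk advances by more than half its length, so any given character appears in at most two chunks.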
