11 Jun 15:55

lsorber

376fe0d

v1.0.0 Latest

🎉 RAGLite v1.0: DuckDB, Qwen3, parallel insertion, benchmarking, better retrieval quality

Release Highlights

🐤 support for DuckDB (#137)
🐻 support for Qwen3 (#124)
⚡️ parallel document insertion (#150)
🏁 benchmarking with raglite bench (#150)
🎯 better retrieval quality with improved multi-vector search, chunk quality, and chunk front matter (#123, #126, #132)
💎 new and improved query adapter algorithm (#146, #147, #149)
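
To make the multi-vector search highlight more concrete, here is a minimal sketch of ranking chunks by the L∞ norm (the maximum absolute value) of their multi-vector similarity to a query. It assumes each chunk stores several embedding vectors and uses cosine similarity; the function names and data layout are hypothetical and are not RAGLite's actual API or implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_chunks_linf(query, chunk_vectors):
    """Rank chunk ids by the L-infinity norm of their similarities to the query.

    chunk_vectors maps a chunk id to the list of embedding vectors stored for
    that chunk (hypothetical layout, for illustration only).
    """
    scores = {
        chunk_id: max(abs(cosine(query, vector)) for vector in vectors)
        for chunk_id, vectors in chunk_vectors.items()
    }
    # Highest score first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


query = [1.0, 0.0]
chunk_vectors = {
    "a": [[0.0, 1.0], [1.0, 0.0]],
    "b": [[1.0, 1.0], [0.0, 1.0]],
    "c": [[0.0, 1.0]],
}
ranked, scores = rank_chunks_linf(query, chunk_vectors)
print(ranked)
```

With non-negative similarities, the L∞ norm reduces to the single best-matching vector of each chunk, so one strongly matching passage is enough to surface the chunk.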

What's Changed

fix: don't convert markdown to markdown by @joachim-Heirbrant-SL in #116
fix: fix chunking of single-sentence chunks by @emilradix in #115
fix: incorporate headings and prevent windowing chunks by @emilradix in #117
fix: improve contextual chunk headings by @SimonJasansky in #118
feat: add option to use single chunk embeddings by @emilradix in #119
feat: add metadata at the document level by @emilradix in #122
feat: add support for reasoning tool use and upgrade to Qwen3 by @lsorber in #124
feat: add front matter to chunk content by @lsorber in #126
feat: introduce chunklets to improve chunking by @lsorber in #123
fix: remove mdformat by @emilradix in #128
feat: rank chunks by the L∞ norm of their multi-vector similarity by @lsorber in #132
feat: enable weighted reciprocal rank fusion by @emilradix in #136
fix: fix off-by-one error in parsing of Markdown headings by @joachim-Heirbrant-SL in #133
feat: improve config and API by @lsorber in #138
feat: replace SQLite with DuckDB by @lsorber in #137
ci: skip slow tests in CI by @lsorber in #139
fix: adapt oversampling to chunk size by @lsorber in #140
feat: make pandas an optional dependency by @lsorber in #141
fix: upgrade rerankers and recommended Cohere model by @lsorber in #142
fix: improve token assignment in late chunking by @lsorber in #144
fix: run checkpoint after DuckDB inserts by @lsorber in #145
feat: improve query adapter algorithm by @lsorber in #146
feat: add ability to control the gap in query adapter by @lsorber in #147
feat: optimally separate result sets in query adapter by @lsorber in #149
feat: parallelize inserts and add benchmarking by @lsorber in #150
docs: set Rerankers verbosity to 0 in README by @ThomasDelsart in #156
fix: fix parsing of font sizes for pdfs with no headings by @ThomasDelsart in #155
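
As an illustration of the weighted reciprocal rank fusion mentioned in #136, the sketch below fuses several ranked result lists using the standard RRF formula, score(d) = Σ wᵢ / (k + rankᵢ(d)). The function name, the weights, and the smoothing constant k = 60 are assumptions chosen for the example, not values taken from RAGLite.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of ids with weighted reciprocal rank fusion.

    rankings: list of ranked id lists, best result first.
    weights:  one non-negative weight per ranking.
    k:        smoothing constant (60 is a common convention, assumed here).
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, item in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) to the item's score.
            scores[item] += weight / (k + rank)
    return sorted(scores, key=lambda item: scores[item], reverse=True)


vector_results = ["a", "b", "c"]
keyword_results = ["c", "b", "a"]
# Trust the vector search three times as much as the keyword search.
fused = weighted_rrf([vector_results, keyword_results], weights=[0.75, 0.25])
print(fused)
```

Swapping the weights favours the keyword ranking instead, which is the kind of control that weighting adds over plain RRF.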

New Contributors

@SimonJasansky made their first contribution in #118

Full Changelog: v0.7.0...v1.0.0

Contributors

lsorber, ThomasDelsart, and 3 other contributors

17 Mar 13:22

lsorber

v0.7.0

f6495ae

What's Changed

feat: make importing faster by @lsorber in #86
fix: avoid conflicting chunk ids by @joachim-Heirbrant-SL in #93
feat: add ability to directly insert Markdown content into the database by @ThomasDelsart in #96
feat: make llama-cpp-python an optional dependency by @rchretien in #97
feat: migrate from poetry-cookiecutter to substrate by @rchretien in #98
chore: upgrade scaffolding by @lsorber in #105
fix: revert pandoc extra name by @lsorber in #106
docs: improve inline comments by @lsorber in #107
fix: lazily raise module not found for optional deps by @lsorber in #109
feat: compute optimal sentence boundaries by @lsorber in #110
fix: fix CLI entrypoint regression by @lsorber in #111
feat: replace post-processing with declarative optimization by @lsorber in #112
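
One common way to raise module-not-found errors lazily for optional dependencies, as in #109, is to defer the import until the feature is first used and to attach an install hint to the error. The sketch below is a generic pattern under that assumption; the helper name `require` and the extra name are hypothetical and do not reflect RAGLite's internals.

```python
import importlib


def require(module_name, extra):
    """Import an optional module on first use, with a helpful error if missing.

    module_name: importable module name.
    extra:       name of the optional extra that provides it (hypothetical).
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as error:
        message = (
            f"'{module_name}' is required for this feature; "
            f"install the optional '{extra}' extra to enable it."
        )
        raise ModuleNotFoundError(message) from error


# An installed module is returned as usual.
json_module = require("json", extra="demo")
print(json_module.dumps([1, 2, 3]))

# A missing module only fails at the point of use, not at package import time.
try:
    require("definitely_not_installed_pkg_xyz", extra="demo")
except ModuleNotFoundError as error:
    print(error)
```

Because the import happens inside the function, importing the package itself stays fast and succeeds even when the optional dependency is absent.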

New Contributors

@joachim-Heirbrant-SL made their first contribution in #93
@ThomasDelsart made their first contribution in #96
@rchretien made their first contribution in #97

Full Changelog: v0.6.2...v0.7.0

Contributors

lsorber, ThomasDelsart, and 2 other contributors

06 Jan 22:27

lsorber

v0.6.2

290e2c0

What's Changed

fix: remove unnecessary stop sequence by @lsorber in #84

Full Changelog: v0.6.1...v0.6.2

Contributors

lsorber

06 Jan 14:18

lsorber

v0.6.1

d1e1f39

What's Changed

fix: conditionally enable LlamaRAMCache by @lsorber in #83
fix(deps): exclude litellm versions that break get_model_info by @lsorber in #78
fix: improve (re)insertion speed by @lsorber in #80
fix: fix Markdown heading boundary probas by @lsorber in #81

Full Changelog: v0.6.0...v0.6.1

Contributors

lsorber

05 Jan 15:39

lsorber

v0.6.0

b19963d

What's Changed

chore: update _extract.py by @eltociear in #70
feat: improve sentence splitting by @lsorber in #72
feat: add streaming tool use to llama-cpp-python by @lsorber in #71
feat: upgrade from xx_sent_ud_sm to SaT by @lsorber in #74
feat: add support for Python 3.12 by @lsorber in #69
chore: cruft update by @lsorber in #76

New Contributors

@eltociear made their first contribution in #70

Full Changelog: v0.5.1...v0.6.0

Contributors

lsorber and eltociear

18 Dec 15:15

lsorber

v0.5.1

bf598dc

What's Changed

fix: improve output for empty databases by @lsorber in #68

Full Changelog: v0.5.0...v0.5.1

Contributors

lsorber

17 Dec 09:49

lsorber

v0.5.0

2e9bfaf

What's Changed

style: reduce httpx log level by @lsorber in #59
feat: let LLM choose whether to retrieve context by @lsorber in #62
fix: support pgvector v0.7.0+ by @undo76 in #63
docs: add GitHub star history to README by @MattiaMolon in #65
feat: add MCP server by @lsorber in #67

New Contributors

@MattiaMolon made their first contribution in #65

Full Changelog: v0.4.1...v0.5.0

Contributors

undo76, lsorber, and MattiaMolon

05 Dec 20:50

lsorber

v0.4.1

0c5b7b5

What's Changed

fix: support embedding with LiteLLM for Ragas by @undo76 in #56
fix: add and enable OpenAI strict mode by @undo76 in #55

Full Changelog: v0.4.0...v0.4.1

Contributors

undo76

04 Dec 16:31

lsorber

v0.4.0

abb4d1b

What's Changed

feat: improve late chunking and optimize pgvector settings by @lsorber in #51
- Add a workaround for #24 to increase the embedder's context size from 512 to a user-definable size.
- Increase the default embedder context size to 1024 tokens (more degrades bge-m3's performance).
- Upgrade llama-cpp-python to the latest version.
- More robust testing of rerankers with Kendall's rank correlation coefficient.
- Optimise pgvector's settings.
- Offer better control of oversampling in hybrid and vector search.
- Upgrade to PostgreSQL 17.
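
For readers unfamiliar with the metric used for reranker testing above, here is a minimal sketch of Kendall's rank correlation coefficient (tau-a, assuming no ties) between two rankings of the same items. How the project's tests apply it is not shown in these notes; the function name and example rankings are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau-a between two rankings of the same distinct items.

    Returns 1.0 for identical orderings and -1.0 for reversed orderings.
    Assumes at least two items and no ties.
    """
    position_b = {item: index for index, item in enumerate(ranking_b)}
    concordant = discordant = 0
    for first, second in combinations(ranking_a, 2):
        # `first` precedes `second` in ranking_a; check whether ranking_b agrees.
        if position_b[first] < position_b[second]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


ideal_order = ["a", "b", "c", "d"]
reranked_order = ["a", "c", "b", "d"]
tau = kendall_tau(ideal_order, reranked_order)
print(round(tau, 3))
```

A test could then assert that a reranker's tau against a known-good ordering is above that of a shuffled baseline, which is more robust than checking exact positions.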

Full Changelog: v0.3.0...v0.4.0

Contributors

lsorber

03 Dec 18:26

lsorber

v0.3.0

0fd1970

What's Changed

feat: support prompt caching and apply Anthropic's long-context prompt format by @undo76 in #52

Full Changelog: v0.2.1...v0.3.0

Contributors

undo76

Releases: superlinear-ai/raglite
