Description
Add a RecursiveChunker that splits text using a priority list of separators ("\n\n", "\n", ". ", " "), falling back to the next separator when a chunk is still too long.
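As a starting point, the fallback behaviour described above could be sketched as a standalone function. This is only an illustration under assumptions: the name recursive_split is hypothetical, it does not subclass TextChunker (whose interface is not shown in this issue), it ignores chunk_overlap, it drops the separator where two chunks are cut apart, and it assumes every separator is a non-empty string.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split text into pieces of at most chunk_size characters.

    Hypothetical helper, not the final RecursiveChunker API.
    Separators are tried in priority order; overlap is not handled.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size chars.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    if sep not in text:
        # This separator cannot help, so try the next one in the list.
        return recursive_split(text, rest, chunk_size)

    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        # Greedily merge neighbouring parts while they still fit.
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # The part alone is too long: recurse with the remaining separators.
            chunks.extend(recursive_split(part, rest, chunk_size))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph is longer. It has two sentences.\n\nThird."
pieces = recursive_split(sample, ["\n\n", "\n", ". ", " "], chunk_size=30)
print(pieces)
```

In this sketch the middle paragraph is too long for one chunk, so it is split again on ". " while the shorter paragraphs stay whole. A real implementation would likely also need to decide whether separators are kept in the output and how chunk_overlap is applied.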
Motivation
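The existing FixedSizeChunker splits text at arbitrary character boundaries, which can break mid-sentence or mid-paragraph and degrade retrieval quality. A recursive approach tries paragraph, line, and sentence boundaries first, so chunks follow the text's natural structure wherever they fit within the size limit.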
Acceptance criteria
- RecursiveChunker in ragframework/document/chunkers.py
- TextChunker from ragframework/base.py
- Parameters: separators: list[str], chunk_size: int, chunk_overlap: int
- chunk_size
- Exported from ragframework/document/__init__.py
- CHANGELOG.md updated under [Unreleased]
Files to touch
ragframework/document/chunkers.py: add RecursiveChunker
ragframework/document/__init__.py: export it
tests/test_document/test_chunkers.py: add tests
Resources
- LangChain's RecursiveCharacterTextSplitter for inspiration
- Existing chunkers in ragframework/document/chunkers.py as reference