A modular, extensible Python framework for building Retrieval-Augmented Generation (RAG) pipelines. Plug in your own loaders, embedders, vector stores, and generators — or use the built-in implementations to get started in minutes.
- Modular by design — every component (loader, chunker, embedder, retriever, generator) is an abstract base class you can swap out
- Works out of the box — built-in text/Markdown loaders, fixed-size chunker, in-memory cosine retriever, and placeholder implementations that need no API keys
- Extensible ecosystem — simple contracts mean integrating OpenAI, HuggingFace, ChromaDB, FAISS, or any other tool is just a subclass away
- Batteries-optional — core dependency is
numpyonly; add[pdf],[openai],[chromadb], … as you need them - Fully tested — pytest-based test suite with coverage reporting
- Contributor-friendly — clear abstractions, good first issues, and detailed contributing guide
┌─────────────────────────────────────┐
│ RAGPipeline │
└─────────────┬───────────────────────┘
│
┌────────────────────────┼─────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌──────────────┐ ┌──────────────────┐
│ DocumentLoader│──────▶│ TextChunker │──────┐ │ │
└───────────────┘ └──────────────┘ │ │ │
(TextFileLoader, (FixedSizeChunker, │ │ │
MarkdownLoader, SentenceChunker, │ │ │
PDFLoader*, …) SemanticChunker*) │ │ │
▼ │ │
┌──────────────┐ │
│ Embedder │ │
└──────┬───────┘ │
│ (RandomEmbedder, │
│ OpenAIEmbedder*, │
│ HFEmbedder*) │
▼ │
┌──────────────┐ │
│ Retriever │ │
└──────┬───────┘ │
│ (InMemoryRetriever,│
│ FAISSRetriever*, │
│ ChromaRetriever*) │
▼ │
┌──────────────┐ │
│ Generator │◀────────────┘
└──────────────┘
(EchoGenerator,
OpenAIGenerator*,
AnthropicGenerator*)
* = open contribution opportunity — see .github/GOOD_FIRST_ISSUES.md
Note: The package is not yet on PyPI. Install directly from GitHub:
# Core (numpy only)
pip install git+https://github.com/adaumsilva/RAG-framework.git
# With PDF support
pip install "ragframework[pdf] @ git+https://github.com/adaumsilva/RAG-framework.git"
# With OpenAI support
pip install "ragframework[openai] @ git+https://github.com/adaumsilva/RAG-framework.git"
# With HuggingFace embeddings
pip install "ragframework[huggingface] @ git+https://github.com/adaumsilva/RAG-framework.git"
# Everything
pip install "ragframework[all] @ git+https://github.com/adaumsilva/RAG-framework.git"Once published to PyPI, installation will simplify to pip install ragframework.
from ragframework import RAGPipeline, RAGConfig
from ragframework.document import TextFileLoader, FixedSizeChunker
from ragframework.embeddings import RandomEmbedder # swap for OpenAIEmbedder
from ragframework.retriever import InMemoryRetriever # swap for FAISSRetriever
from ragframework.generator import EchoGenerator # swap for OpenAIGenerator
pipeline = RAGPipeline(
loader=TextFileLoader(),
chunker=FixedSizeChunker(chunk_size=512, chunk_overlap=64),
embedder=RandomEmbedder(dim=384),
retriever=InMemoryRetriever(),
generator=EchoGenerator(),
config=RAGConfig(top_k=5),
)
# Ingest a document
n_chunks = pipeline.ingest("my_document.txt")
print(f"Indexed {n_chunks} chunks")
# Query
response = pipeline.query("What is this document about?")
print(response.answer)
for chunk in response.source_chunks:
print(f" Source: {chunk.metadata.get('source')} — {chunk.content[:80]}…")from ragframework.base import Embedder
class MyEmbedder(Embedder):
def embed(self, texts: list[str]) -> list[list[float]]:
# call your embedding API / model here
...That's it — plug MyEmbedder() into RAGPipeline and everything else stays the same.
Community contributions are the engine that drives this roadmap. Pick up a Good First Issue and open a PR!
| Priority | Item | Status |
|---|---|---|
| High | PDF document loader | Open |
| High | DOCX document loader | Open |
| High | OpenAI embeddings integration | Open |
| High | HuggingFace Sentence Transformers | Open |
| High | OpenAI / Anthropic generator | Open |
| Medium | FAISS vector store retriever | Open |
| Medium | ChromaDB retriever integration | Open |
| Medium | Semantic / recursive chunker | Open |
| Medium | Async pipeline support | Open |
| Low | Jupyter notebook examples | Open |
Contributions are what make open source great. Please read CONTRIBUTING.md before opening a PR.
- Fork the repo and create a branch:
git checkout -b feat/my-feature - Install dev dependencies:
pip install -e ".[dev]" - Write your code and tests
- Run the suite:
pytest tests/ -v - Open a pull request
Distributed under the MIT License. See LICENSE for more information.
Architecture inspired by RAG-Anything by HKUDS.