adaumsilva/RAG-framework

RAG Framework

Python 3.10+ | License: MIT

A modular, extensible Python framework for building Retrieval-Augmented Generation (RAG) pipelines. Plug in your own loaders, embedders, vector stores, and generators — or use the built-in implementations to get started in minutes.


Features

  • Modular by design — every component (loader, chunker, embedder, retriever, generator) is an abstract base class you can swap out
  • Works out of the box — built-in text/Markdown loaders, fixed-size chunker, in-memory cosine retriever, and placeholder implementations that need no API keys
  • Extensible ecosystem — simple contracts mean integrating OpenAI, HuggingFace, ChromaDB, FAISS, or any other tool is just a subclass away
  • Batteries-optional — core dependency is numpy only; add [pdf], [openai], [chromadb], … as you need them
  • Fully tested — pytest-based test suite with coverage reporting
  • Contributor-friendly — clear abstractions, good first issues, and a detailed contributing guide

Architecture

                     ┌─────────────────────────────────────┐
                     │            RAGPipeline               │
                     └─────────────┬───────────────────────┘
                                   │
          ┌────────────────────────┼─────────────────────────┐
          │                        │                         │
          ▼                        ▼                         ▼
  ┌───────────────┐       ┌──────────────┐         ┌──────────────────┐
  │ DocumentLoader│──────▶│  TextChunker │──────┐  │                  │
  └───────────────┘       └──────────────┘      │  │                  │
  (TextFileLoader,        (FixedSizeChunker,     │  │                  │
   MarkdownLoader,         SentenceChunker,      │  │                  │
   PDFLoader*, …)          SemanticChunker*)     │  │                  │
                                                 ▼  │                  │
                                          ┌──────────────┐             │
                                          │   Embedder   │             │
                                          └──────┬───────┘             │
                                                 │  (RandomEmbedder,   │
                                                 │   OpenAIEmbedder*,  │
                                                 │   HFEmbedder*)      │
                                                 ▼                     │
                                          ┌──────────────┐             │
                                          │  Retriever   │             │
                                          └──────┬───────┘             │
                                                 │  (InMemoryRetriever,│
                                                 │   FAISSRetriever*,  │
                                                 │   ChromaRetriever*) │
                                                 ▼                     │
                                          ┌──────────────┐             │
                                          │  Generator   │◀────────────┘
                                          └──────────────┘
                                    (EchoGenerator,
                                     OpenAIGenerator*,
                                     AnthropicGenerator*)

  * = open contribution opportunity — see .github/GOOD_FIRST_ISSUES.md
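The diagram can be read as two passes: ingestion (load, chunk, embed, index) and querying (embed the question, retrieve, generate). The dependency-free sketch below wires toy stand-ins together in that order. Every name in it (`MiniPipeline`, `BruteForceStore`, `keyword_embedder`, `echo_generator`, `pets.txt`) is hypothetical, and unlike the real pipeline, `query` here returns a plain string rather than a response object.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    content: str
    metadata: dict = field(default_factory=dict)


class MiniPipeline:
    """Hypothetical wiring of the diagram's stages; not ragframework's RAGPipeline.

    ingest: loader -> chunker -> embedder -> retriever.add
    query:  embedder -> retriever.search -> generator
    """

    def __init__(self, loader, chunker, embedder, retriever, generator, top_k=5):
        self.loader = loader
        self.chunker = chunker
        self.embedder = embedder
        self.retriever = retriever
        self.generator = generator
        self.top_k = top_k

    def ingest(self, source: str) -> int:
        text = self.loader(source)
        chunks = [Chunk(piece, {"source": source}) for piece in self.chunker(text)]
        self.retriever.add(chunks, self.embedder([c.content for c in chunks]))
        return len(chunks)

    def query(self, question: str) -> str:
        (query_vector,) = self.embedder([question])  # the question is embedded like a chunk
        hits = self.retriever.search(query_vector, self.top_k)
        return self.generator(question, hits)


class BruteForceStore:
    """Scores every stored chunk by dot product with the query vector."""

    def __init__(self):
        self.items = []

    def add(self, chunks, vectors):
        self.items.extend(zip(chunks, vectors))

    def search(self, query_vector, top_k):
        def score(item):
            return sum(a * b for a, b in zip(item[1], query_vector))

        ranked = sorted(self.items, key=score, reverse=True)  # stable: ties keep insertion order
        return [chunk for chunk, _ in ranked[:top_k]]


def keyword_embedder(texts):
    # Toy 2-d "embedding": how often each text mentions cats and dogs.
    return [[t.lower().count("cat"), t.lower().count("dog")] for t in texts]


def echo_generator(question, chunks):
    # No LLM call: return the retrieved context, in the spirit of a placeholder generator.
    return " | ".join(c.content for c in chunks)


documents = {"pets.txt": "Cats purr. Dogs bark. Cats nap."}  # stands in for files on disk
pipeline = MiniPipeline(
    loader=documents.__getitem__,
    chunker=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    embedder=keyword_embedder,
    retriever=BruteForceStore(),
    generator=echo_generator,
    top_k=1,
)
print(pipeline.ingest("pets.txt"))           # 3
print(pipeline.query("Tell me about dogs"))  # Dogs bark
```

Because each stage is only called through a narrow interface, replacing `keyword_embedder` with a real model or `BruteForceStore` with a vector database would not require touching `MiniPipeline`, which is the property the framework's abstract base classes are meant to provide.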

Installation

Note: The package is not yet on PyPI. Install directly from GitHub:

# Core (numpy only)
pip install git+https://github.com/adaumsilva/RAG-framework.git

# With PDF support
pip install "ragframework[pdf] @ git+https://github.com/adaumsilva/RAG-framework.git"

# With OpenAI support
pip install "ragframework[openai] @ git+https://github.com/adaumsilva/RAG-framework.git"

# With HuggingFace embeddings
pip install "ragframework[huggingface] @ git+https://github.com/adaumsilva/RAG-framework.git"

# Everything
pip install "ragframework[all] @ git+https://github.com/adaumsilva/RAG-framework.git"

Once published to PyPI, installation will simplify to pip install ragframework.
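If you are unsure which extras are present in an environment, the standard library can check without importing anything. A minimal sketch, assuming the module names below: the mapping from extras to modules is a guess for illustration (the README does not say which library backs `[pdf]`, so it is left out), and `has_module` and `EXTRAS` are hypothetical names. Check the project's `pyproject.toml` for the real dependency lists.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if top-level module `name` is importable here; it is located, not imported."""
    return importlib.util.find_spec(name) is not None


# Hypothetical mapping from extras to the modules they are assumed to provide.
EXTRAS = {
    "core": ["numpy"],
    "openai": ["openai"],
    "huggingface": ["sentence_transformers"],
}

for extra, modules in EXTRAS.items():
    missing = [m for m in modules if not has_module(m)]
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{extra:<12} {status}")
```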


Quick Start

from ragframework import RAGPipeline, RAGConfig
from ragframework.document import TextFileLoader, FixedSizeChunker
from ragframework.embeddings import RandomEmbedder   # swap for OpenAIEmbedder
from ragframework.retriever import InMemoryRetriever  # swap for FAISSRetriever
from ragframework.generator import EchoGenerator      # swap for OpenAIGenerator

pipeline = RAGPipeline(
    loader=TextFileLoader(),
    chunker=FixedSizeChunker(chunk_size=512, chunk_overlap=64),
    embedder=RandomEmbedder(dim=384),
    retriever=InMemoryRetriever(),
    generator=EchoGenerator(),
    config=RAGConfig(top_k=5),
)

# Ingest a document
n_chunks = pipeline.ingest("my_document.txt")
print(f"Indexed {n_chunks} chunks")

# Query
response = pipeline.query("What is this document about?")
print(response.answer)
for chunk in response.source_chunks:
    print(f"  Source: {chunk.metadata.get('source')} | {chunk.content[:80]}…")
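`FixedSizeChunker(chunk_size=512, chunk_overlap=64)` above takes a window size and an overlap. A common reading of those two parameters is a sliding window that advances by `chunk_size - chunk_overlap` each step, so text near a boundary appears in both neighbouring chunks. The function below sketches that idea under the assumption that sizes are counted in characters; `fixed_size_chunks` is a hypothetical helper, and the library's chunker may count tokens or handle the final window differently.

```python
def fixed_size_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = fixed_size_chunks("x" * 1000, chunk_size=512, chunk_overlap=64)
print([len(c) for c in chunks])  # → [512, 512, 104]
```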

Implementing your own component

from ragframework.base import Embedder

class MyEmbedder(Embedder):
    def embed(self, texts: list[str]) -> list[list[float]]:
        # call your embedding API / model here
        ...

That's it: pass MyEmbedder() as the embedder argument of RAGPipeline and every other component stays the same.
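As a slightly fuller example, the embedder below turns text into fixed-length vectors with the hashing trick, so it needs no model download or API key. It defines a local `Embedder` stand-in that mirrors only the `embed(texts) -> list[list[float]]` signature shown above; in a real project you would subclass `ragframework.base.Embedder` instead. `HashingEmbedder` and its `dim` parameter are hypothetical.

```python
import hashlib
import math
import re
from abc import ABC, abstractmethod


class Embedder(ABC):
    """Local stand-in for ragframework.base.Embedder; only embed()'s signature is from the README."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        ...


class HashingEmbedder(Embedder):
    """Hypothetical embedder: counts words into `dim` hashed buckets, then L2-normalises."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def _bucket(self, token: str) -> int:
        # blake2b is deterministic across runs, unlike the salted built-in hash() for str.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in re.findall(r"\w+", text.lower()):
                vec[self._bucket(token)] += 1.0
            norm = math.sqrt(sum(v * v for v in vec))
            # Unit-length vectors make a plain dot product equal to cosine similarity.
            vectors.append([v / norm for v in vec] if norm else vec)
        return vectors


vecs = HashingEmbedder(dim=64).embed(["Hello world", "hello WORLD"])
print(vecs[0] == vecs[1])  # True: tokens are lower-cased before hashing
```

Hashing-trick vectors only capture word overlap, not meaning, so treat this as a deterministic, offline placeholder rather than a substitute for a trained embedding model.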


Roadmap

Community contributions are the engine that drives this roadmap. Pick up a Good First Issue and open a PR!

Priority  Item                               Status
High      PDF document loader                Open
High      DOCX document loader               Open
High      OpenAI embeddings integration      Open
High      HuggingFace Sentence Transformers  Open
High      OpenAI / Anthropic generator       Open
Medium    FAISS vector store retriever       Open
Medium    ChromaDB retriever integration     Open
Medium    Semantic / recursive chunker       Open
Medium    Async pipeline support             Open
Low       Jupyter notebook examples          Open

Contributing

Contributions are what make open source great. Please read CONTRIBUTING.md before opening a PR.

  1. Fork the repo and create a branch: git checkout -b feat/my-feature
  2. Install dev dependencies: pip install -e ".[dev]"
  3. Write your code and tests
  4. Run the suite: pytest tests/ -v
  5. Open a pull request
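For step 3, one way to test a custom component is a reusable contract check that any embedder should pass. The helper and classes below are hypothetical; saved as, say, `tests/test_my_embedder.py`, pytest's default discovery (files named `test_*.py`, functions named `test_*`) would pick them up when you run `pytest tests/ -v`.

```python
import math


def check_embedder_contract(embedder, texts: list[str]) -> None:
    """Hypothetical helper: properties any embed() implementation should satisfy."""
    vectors = embedder.embed(texts)
    assert len(vectors) == len(texts), "expected one vector per input text"
    assert len({len(v) for v in vectors}) <= 1, "expected a single shared dimension"
    assert all(math.isfinite(x) for v in vectors for x in v), "found NaN or inf"


class ConstantEmbedder:
    """Trivial stand-in used only to exercise the contract check."""

    def __init__(self, dim: int = 3) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[1.0] * self.dim for _ in texts]


def test_constant_embedder_meets_contract():
    check_embedder_contract(ConstantEmbedder(dim=4), ["a", "bb", "ccc"])


def test_empty_input_gives_no_vectors():
    assert ConstantEmbedder().embed([]) == []
```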

License

Distributed under the MIT License. See LICENSE for more information.


Acknowledgements

Architecture inspired by RAG-Anything by HKUDS.
