🚀 Feature Request
Problem
The project currently has no structured pipeline for loading documents and splitting them into chunks for downstream processing.
Proposed Solution
Implement a data ingestion module that:
- Loads documents (e.g., from GitHub or local files)
- Splits them into smaller chunks using a text splitter
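To make the shape of such a module concrete, here is a minimal standard-library sketch, assuming local files only. The names `Document`, `load_documents`, and `split_fixed` are hypothetical and are not part of the project or of any library; the fixed-size splitter is only a placeholder for a real text splitter.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical container: raw text plus where it came from."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_documents(root, suffixes=(".md", ".txt")):
    """Read every file under `root` with a matching suffix into a Document."""
    docs = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            docs.append(Document(text, {"source": str(path)}))
    return docs


def split_fixed(doc, chunk_size=200):
    """Cut a document into fixed-size character windows (no overlap)."""
    return [
        # Each chunk keeps the parent's metadata plus its start offset.
        Document(doc.text[i:i + chunk_size], {**doc.metadata, "start": i})
        for i in range(0, len(doc.text), chunk_size)
    ]
```

Keeping the source path and start offset on every chunk means a retrieved chunk can later be traced back to the file it came from.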
Why is this needed?
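- Required for embedding generation, since embedding models accept only bounded-length input
- Improves retrieval performance by letting queries match focused chunks instead of whole documents
- Forms the base of the RAG pipeline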
Possible Approach
- Use LangChain document loaders
- Use RecursiveCharacterTextSplitter for chunking
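The idea behind recursive character splitting can be sketched in plain Python. This is a simplified stand-in written for illustration, not LangChain's implementation: the function name `recursive_split` is hypothetical, and chunk overlap is omitted. It tries the coarsest separator first (paragraph breaks) and falls back to finer ones (lines, words, single characters) only for pieces that are still too long.

```python
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Split `text` into pieces of at most `chunk_size` characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small neighbouring parts
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # The part alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
    for chunk in recursive_split(sample, chunk_size=12):
        print(repr(chunk))
```

Splitting on paragraph and line boundaries before words tends to keep related sentences in the same chunk, which is the main reason to prefer this approach over fixed-size windows.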
Additional Context
This will be the first step towards building a full retrieval-based QA system.