Add data loading and chunking pipeline for document processing

### 🚀 Feature Request

**Problem**
The project currently has no structured pipeline for loading documents and splitting them into chunks for downstream processing.

**Proposed Solution**
Implement a data ingestion module that:

* Loads documents (e.g., from GitHub or local files)
* Splits them into smaller chunks using a text splitter
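
As a rough illustration of the module's shape, the two steps above could be wired together as follows. This is a minimal, dependency-free sketch, not a proposed final design: the names `Chunk`, `load_local_files`, `fixed_window_split`, and `ingest` are hypothetical, and the fixed-window splitter is only a placeholder for a real text splitter.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Chunk:
    """One piece of a source document, traceable back to where it came from."""
    source: str  # path (or URL) of the originating document
    index: int   # position of this chunk within its document
    text: str


def load_local_files(root: Path, pattern: str = "*.md") -> Iterator[Tuple[str, str]]:
    """Yield (source, text) pairs for every file under `root` matching `pattern`."""
    for path in sorted(root.rglob(pattern)):
        yield str(path), path.read_text(encoding="utf-8")


def fixed_window_split(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Placeholder splitter: fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
        start += size - overlap
    return chunks


def ingest(
    documents: Iterable[Tuple[str, str]],
    split: Callable[[str], List[str]] = fixed_window_split,
) -> List[Chunk]:
    """Run every loaded document through the splitter and tag each chunk."""
    chunks: List[Chunk] = []
    for source, text in documents:
        for i, piece in enumerate(split(text)):
            chunks.append(Chunk(source=source, index=i, text=piece))
    return chunks
```

Keeping the splitter as a plain callable means the loader and the chunking strategy can be swapped independently, for example `ingest(load_local_files(Path("docs")))` for local Markdown files.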

**Why is this needed?**

* Required for embedding generation
* Improves retrieval performance
* Forms the base for the RAG pipeline

**Possible Approach**

* Use LangChain document loaders
* Use RecursiveCharacterTextSplitter for chunking

**Additional Context**
This will be the first step towards building a full retrieval-based QA system.

Add data loading and chunking pipeline for document processing #116
