Add data loading and chunking pipeline for document processing #116

@alok-kumar0421

🚀 Feature Request

Problem
The project currently has no structured pipeline for loading documents and splitting them into chunks for downstream processing.

Proposed Solution
Implement a data ingestion module that:

  • Loads documents (e.g., from GitHub or local files)
  • Splits them into smaller chunks using a text splitter
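As a rough illustration of the two steps above, here is a minimal standard-library sketch. Everything in it is an assumption made for illustration: the names `Chunk`, `load_documents`, `split_text`, and `ingest`, the `.md`/`.txt` filter, and the default sizes are hypothetical, and fixed-size character windows stand in for whichever text splitter the project ends up using.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Chunk:
    source: str  # path of the file the chunk came from
    index: int   # position of the chunk within that file
    text: str


def load_documents(root, suffixes=(".md", ".txt")):
    """Yield (path, text) for every file under root with a matching suffix."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            yield str(path), path.read_text(encoding="utf-8")


def split_text(text, chunk_size=500, overlap=50):
    """Cut text into character windows; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last window is not fully inside the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def ingest(root, chunk_size=500, overlap=50):
    """Load every document under root and return a flat list of Chunk objects."""
    chunks = []
    for source, text in load_documents(root):
        for index, piece in enumerate(split_text(text, chunk_size, overlap)):
            chunks.append(Chunk(source, index, piece))
    return chunks
```

For example, `split_text("abcdefghij", chunk_size=5, overlap=2)` returns `['abcde', 'defgh', 'ghij']`. Keeping the source path and chunk index on each chunk is one way to let later stages trace a retrieved chunk back to its file.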

Why is this needed?

  • Required for embedding generation
  • Improves retrieval performance
  • Forms the base of the RAG pipeline

Possible Approach

  • Use LangChain document loaders
  • Use RecursiveCharacterTextSplitter for chunking
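A possible shape for the chunking step with LangChain is sketched below. It assumes the `langchain-text-splitters` package is installed; the helper name `chunk_texts` and the default `chunk_size` and `chunk_overlap` values are hypothetical and would need tuning for the embedding model. The import is guarded so the module still imports when LangChain is absent.

```python
try:
    from langchain_text_splitters import RecursiveCharacterTextSplitter
except ImportError:  # LangChain is treated as optional in this sketch
    RecursiveCharacterTextSplitter = None


def chunk_texts(texts, chunk_size=500, chunk_overlap=50):
    """Split raw strings into chunks using RecursiveCharacterTextSplitter."""
    if RecursiveCharacterTextSplitter is None:
        raise RuntimeError("install langchain-text-splitters to use this sketch")
    # Sizes are measured in characters by default.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    chunks = []
    for text in texts:
        chunks.extend(splitter.split_text(text))
    return chunks
```

LangChain document loaders return `Document` objects rather than plain strings; in that case the splitter's `split_documents` method can be used in place of `split_text`, which should carry each document's metadata over to its chunks.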

Additional Context
This will be the first step towards building a full retrieval-based QA system.
