Open
Description
Feature description
Objective
Develop a PPTX file parser to extract content from PowerPoint presentations for Ragbits' RAG pipeline.
Requirements
- Input: `.pptx` files
- Extract: text, images, tables, slide metadata
- Use the `python-pptx` library
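As a starting point for discussion, here is a minimal sketch of how the requirements above could map onto `python-pptx`. It is not the Ragbits implementation: `SlideContent`, `extract_slide`, and `parse_pptx` are hypothetical names, the output shape is an assumption, and grouped shapes and charts are not handled. Shape attributes are read defensively with `getattr` so that unsupported shape types are skipped rather than raising.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SlideContent:
    """Hypothetical container for one slide (not a Ragbits type)."""

    number: int
    title: str | None = None
    notes: str = ""
    texts: list[str] = field(default_factory=list)
    tables: list[list[list[str]]] = field(default_factory=list)  # table -> rows -> cell text
    images: list[tuple[str, bytes]] = field(default_factory=list)  # (content type, raw bytes)


def _embedded_image(shape: Any) -> Any:
    """Return the shape's embedded image, or None if it has none."""
    try:
        return shape.image
    except (AttributeError, ValueError):
        # AttributeError: not a picture shape.
        # ValueError: python-pptx raises this for linked (non-embedded) pictures.
        return None


def extract_slide(slide: Any, number: int) -> SlideContent:
    """Collect text, tables, images, and basic metadata from one slide."""
    title_shape = getattr(slide.shapes, "title", None)
    content = SlideContent(
        number=number,
        title=title_shape.text if title_shape is not None else None,
    )
    for shape in slide.shapes:
        # The title placeholder is a text shape too, so its text also lands in `texts`.
        if getattr(shape, "has_text_frame", False) and shape.text_frame.text.strip():
            content.texts.append(shape.text_frame.text)
        if getattr(shape, "has_table", False):
            content.tables.append(
                [[cell.text for cell in row.cells] for row in shape.table.rows]
            )
        image = _embedded_image(shape)
        if image is not None:
            content.images.append((image.content_type, image.blob))
    if getattr(slide, "has_notes_slide", False):
        notes_frame = slide.notes_slide.notes_text_frame
        content.notes = notes_frame.text if notes_frame is not None else ""
    return content


def parse_pptx(path: str) -> list[SlideContent]:
    """Parse a .pptx file into one SlideContent per slide (1-based numbering)."""
    from pptx import Presentation  # python-pptx, imported lazily

    presentation = Presentation(path)
    return [
        extract_slide(slide, number)
        for number, slide in enumerate(presentation.slides, start=1)
    ]
```

Calling `parse_pptx("deck.pptx")` (with a placeholder file name) would return one record per slide, which a later step could convert into whatever document elements the RAG pipeline expects. That conversion is deliberately left out here because the target types are not specified in this issue.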
Motivation
PowerPoint decks hold a large share of domain knowledge that Ragbits can’t currently ingest. A dedicated PPTX parser lets us index that content, close the data gap, and boost RAG answer quality without changing authoring habits.
Additional context
No response
Metadata
Labels: feature (New feature or request)
Status: In Progress