feat: PPTX parser #687

@maxpill

Feature description

Objective

Develop a PPTX file parser to extract content from PowerPoint presentations for Ragbits' RAG pipeline.

Requirements

  • Input: .pptx files
  • Extract: Text, images, tables, slide metadata
  • Use python-pptx library
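A minimal sketch of how these requirements might map onto python-pptx, offered as a starting point rather than the intended Ragbits design. The `SlideContent` container and the `extract_slide` / `parse_pptx` helpers are hypothetical names, not part of Ragbits or python-pptx. Pictures are detected by duck-typing on the `image` attribute, which is an assumption of this sketch. Grouped shapes and charts are not handled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SlideContent:
    """Hypothetical container for the items the issue lists (not a Ragbits type)."""

    index: int  # 1-based slide number (slide metadata)
    title: str | None = None  # slide title, if the layout has one
    texts: list[str] = field(default_factory=list)
    tables: list[list[list[str]]] = field(default_factory=list)  # table -> rows -> cells
    images: list[tuple[str, bytes]] = field(default_factory=list)  # (extension, raw bytes)
    notes: str = ""  # speaker notes


def extract_slide(slide, index: int) -> SlideContent:
    """Collect text, tables, images and basic metadata from one slide."""
    content = SlideContent(index=index)

    title_shape = slide.shapes.title  # None when the slide has no title placeholder
    if title_shape is not None:
        content.title = title_shape.text_frame.text

    for shape in slide.shapes:
        # The title shape is also part of slide.shapes, so its text appears here too.
        if shape.has_text_frame and shape.text_frame.text.strip():
            content.texts.append(shape.text_frame.text)
        if shape.has_table:
            content.tables.append(
                [[cell.text for cell in row.cells] for row in shape.table.rows]
            )
        # Assumption: only picture shapes expose `.image`; the ValueError guard
        # covers pictures that carry no embedded image data.
        try:
            image = getattr(shape, "image", None)
        except ValueError:
            image = None
        if image is not None:
            content.images.append((image.ext, image.blob))

    if slide.has_notes_slide:
        notes_frame = slide.notes_slide.notes_text_frame
        if notes_frame is not None:
            content.notes = notes_frame.text
    return content


def parse_pptx(path: str) -> list[SlideContent]:
    """Parse every slide of a .pptx file into SlideContent records."""
    # Imported lazily so extract_slide can be exercised without python-pptx installed.
    from pptx import Presentation  # third-party: pip install python-pptx

    presentation = Presentation(path)
    return [extract_slide(s, i) for i, s in enumerate(presentation.slides, start=1)]
```

Calling `parse_pptx("deck.pptx")` (a placeholder path) would return one record per slide, which a downstream step could then turn into whatever document elements the pipeline expects.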

Motivation

PowerPoint decks hold a large share of domain knowledge that Ragbits can’t currently ingest. A dedicated PPTX parser lets us index that content, close the data gap, and boost RAG answer quality without changing authoring habits.

Additional context

No response

Metadata

Assignees

Labels

feature: New feature or request

Type

No type

Projects

Status

In Progress

Relationships

None yet

Development

No branches or pull requests