End-to-end multimedia-to-text AI companion that runs 100% locally 🔒.
Convert video → audio → text for analysis, retrieval, and language model integration.
- 🎥 Video to Audio conversion via FFmpeg or yt-dlp
- 📝 Audio to Text transcription with faster-whisper (GPU-accelerated)
- 🤖 LLM Integration with local models via Ollama
- 🧰 Built with uv and Bun, ultra-fast Python and JavaScript package managers
- 💻 Runs completely offline for maximum privacy and minimal cost (just your ⚡ bill 😉)
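The video-to-audio step above can be sketched as a thin wrapper around the FFmpeg CLI. This is a minimal illustration, not the project's actual conversion code; the function names and file paths are placeholders:

```python
import subprocess


def build_extract_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg argv that drops the video stream and keeps audio only.

    -vn drops video, -ac 1 downmixes to mono, and -ar 16000 resamples to
    16 kHz, the sample rate Whisper-family models expect.
    """
    return [
        "ffmpeg", "-y",       # -y: overwrite the output file without prompting
        "-i", video_path,     # input video
        "-vn",                # no video stream in the output
        "-ac", "1",           # mono audio
        "-ar", "16000",       # 16 kHz sample rate
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_extract_cmd(video_path, audio_path), check=True)
```

For downloads, yt-dlp can produce the audio file directly (`yt-dlp -x` extracts audio), after which the same transcription path applies.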
[UI] ───▶ [Media Orchestrator API] ───▶ [Transcription API]
  │
  └───▶ [Agent Orchestrator API]
- UI: allows users to convert video into text and interact with an AI agent for summarization or deeper exploration of media content
- Agent Orchestrator API: wraps the Ollama SDK and an open-source model of your choice, streaming LLM-generated responses back to the client
- Media Orchestrator API: provides endpoints to convert and manage audio and text files derived from video. It also supports polling for real-time media conversion status updates
- Transcription API: transcribes audio to text and exposes an endpoint to check live transcription progress
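The status-polling workflow described above can be sketched with a small, framework-agnostic helper. The endpoint shape and status fields below are illustrative assumptions, not the API's actual contract:

```python
import time
from typing import Callable


def poll_until_done(fetch_status: Callable[[], dict],
                    interval_s: float = 1.0,
                    max_attempts: int = 60) -> dict:
    """Call fetch_status until it reports a terminal state.

    fetch_status is any callable returning a dict like
    {"state": "running" | "done" | "failed", ...} — e.g. a function that
    GETs a (hypothetical) /media/<id>/status endpoint and parses the JSON.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("state") in ("done", "failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("conversion did not finish in time")
```

Keeping the HTTP call behind a plain callable makes the loop trivial to unit-test with a stub and independent of any particular HTTP client.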
bun run setup # Install all dependencies
bun run dev   # Start the UI and backend services
The following software needs to be installed on your local machine before running.
We highly recommend the following:
- Bun – Fast JavaScript runtime & package manager
- uv – Blazing-fast Python environment manager (written in Rust)
- Ollama – A streamlined, open-source platform for running and managing LLMs on your local machine. It simplifies downloading, setting up, and interacting with open-source models
  ℹ️ Make sure Ollama is running in the background for LLM-based workflows.
- FFmpeg – A powerful multimedia toolkit for handling audio, video, subtitles, and metadata
- yt-dlp – A feature-rich CLI tool for downloading videos and audio from thousands of websites (a modern fork of youtube-dl)
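Before running setup, it can help to confirm the CLI prerequisites above are actually on your PATH. A small helper sketch (not part of the project):

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the names of required executables not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


# Example: check the prerequisites listed above; prints [] when all are installed.
print(missing_tools(["bun", "uv", "ollama", "ffmpeg", "yt-dlp"]))
```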
To enable GPU-accelerated transcription with faster-whisper:
- NVIDIA GPU with sufficient VRAM for your chosen model
- NVIDIA GPU driver (version depends on your CUDA setup)
- CUDA Toolkit (typically version 11+)
- cuDNN (sometimes bundled with CUDA)
To ensure PyTorch is installed with CUDA support:
# ./transcription-api
uv pip install torch --index-url https://download.pytorch.org/whl/cu128 && uv sync