
End-to-end multimedia to text AI companion. Runs 💯 locally 👀. Includes a full suite of tools to convert video → audio → text for analysis, retrieval, and language model integration.


stevenxchung/DeepLearner


DeepLearner 🧠

End-to-end multimedia-to-text AI companion that runs 💯 locally 👀.
Convert video → audio → text for analysis, retrieval, and language model integration.

Python TypeScript Runs Locally

🚀 Features

  • 🎥 Video to Audio conversion via FFmpeg or yt-dlp
  • 🔉 Audio to Text transcription with faster-whisper (GPU-accelerated)
  • 🤖 LLM Integration with local models via Ollama
  • 🧰 Built with uv and Bun, ultra-fast Python and JavaScript package managers
  • 💻 Runs completely offline for maximum privacy and minimal cost (just your ⚡ bill 👀)

🏭 Architecture

[UI] ──▶ [Media Orchestrator API] ──▶ [Transcription API]
  │
  └──▶ [Agent Orchestrator API]
  • UI: allows users to convert video into text and interact with an AI agent for summarization or deeper exploration of media content
  • Agent Orchestrator API: wraps an Ollama SDK and an open-source model of choice and streams LLM-generated responses back to the client
  • Media Orchestrator API: provides endpoints to convert and manage audio and text files derived from video. It also supports polling for real-time media conversion status updates
  • Transcription API: transcribes audio to text and exposes an endpoint to check live transcription progress
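
A client could consume the status polling described for the Media Orchestrator API roughly as follows. This is a minimal sketch, not the project's actual client: the `fetch_status` callable, the `state` field, and its `done` / `failed` values are assumptions made for illustration, since the response shape is not documented here.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> dict:
    """Call fetch_status() repeatedly until it reports a terminal state.

    fetch_status is a hypothetical callable that returns the latest status
    payload, for example by issuing a GET request to a status endpoint.
    The "state" key and its values are assumed, not taken from the project.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("conversion did not finish before the timeout")
        time.sleep(interval_s)
```

Passing the fetch step in as a callable keeps the loop independent of any particular HTTP library and makes it easy to exercise with a fake status source.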

👩‍💻 Running

bun run setup # Install all dependencies
bun run dev   # Starts UI and backend services

🧱 Requirements

Install the following software on your local machine before running the project.

📦 Package Managers

We highly recommend the following:

  • Bun – Fast JavaScript runtime & package manager
  • uv – Blazing fast Python environment manager (written in Rust)

🤖 Local LLMs

  • Ollama – A streamlined, open-source platform for running and managing LLMs on your local machine. It simplifies downloading, setting up, and interacting with open-source models

ℹ️ Make sure Ollama is running in the background for LLM-based workflows.
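
One quick way to confirm that Ollama is up before starting an LLM workflow is to probe its local HTTP server. The sketch below assumes Ollama's default address, `http://localhost:11434`, whose root path answers with HTTP 200 when the server is running. `ollama_is_running` is a hypothetical helper and is not part of DeepLearner or the Ollama SDK.

```python
import urllib.request


def ollama_is_running(
    base_url: str = "http://localhost:11434",
    timeout_s: float = 2.0,
) -> bool:
    """Return True if an HTTP server answers with 200 at base_url.

    The default URL is Ollama's standard local address; change it if your
    instance listens elsewhere.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False
```

If the check returns `False`, start Ollama (for example with `ollama serve`) and try again.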

🎞️ Video to Audio Tools

  • FFmpeg – A powerful multimedia toolkit for handling audio, video, subtitles, and metadata
  • yt-dlp – A feature-rich CLI tool for downloading videos and audio from thousands of websites (a modern fork of youtube-dl)
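
As an illustration of the video to audio step, the sketch below builds an FFmpeg command that drops the video stream and writes mono audio. The 16 kHz mono WAV target is an assumption (a common input choice for Whisper-style models), not necessarily what DeepLearner uses, and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_audio_cmd(video_path, audio_path, sample_rate: int = 16000) -> list:
    """Build an FFmpeg command that extracts mono audio from a video file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # output sample rate in Hz
        str(audio_path),
    ]


def extract_audio(video_path, audio_path) -> Path:
    """Run FFmpeg (must be on PATH) and return the output path."""
    subprocess.run(build_ffmpeg_audio_cmd(video_path, audio_path), check=True)
    return Path(audio_path)
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is executed.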

🎙️ Audio to Text (Transcription)

To enable GPU-accelerated transcription with faster-whisper:

  • NVIDIA GPU with sufficient VRAM for your chosen model
  • NVIDIA GPU driver (version depends on your CUDA setup)
  • CUDA Toolkit (typically version 11+)
  • cuDNN (sometimes bundled with CUDA)
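
With those pieces in place, a GPU transcription pass with faster-whisper might look like the sketch below. It is not the Transcription API's actual code: the model size, `float16` compute type, and both helper functions are assumptions for illustration. Progress is estimated from each segment's end time relative to the audio duration reported by faster-whisper.

```python
def progress_fraction(segment_end_s: float, duration_s: float) -> float:
    """Estimate progress as processed audio time over total time, clamped to [0, 1]."""
    if duration_s <= 0:
        return 0.0
    return max(0.0, min(1.0, segment_end_s / duration_s))


def format_segment(start_s: float, end_s: float, text: str) -> str:
    """Render one transcribed segment with its timestamps."""
    return f"[{start_s:.2f}s -> {end_s:.2f}s] {text.strip()}"


def transcribe(audio_path: str, model_size: str = "small"):
    """Yield (progress, line) pairs while transcribing audio_path on the GPU."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, info = model.transcribe(audio_path)
    # segments is a generator: transcription happens as it is iterated.
    for seg in segments:
        yield (
            progress_fraction(seg.end, info.duration),
            format_segment(seg.start, seg.end, seg.text),
        )
```

Because segments are produced lazily, a caller can report progress while the transcription is still running instead of waiting for the whole file.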

To ensure PyTorch is installed with CUDA support:

# Run from ./transcription-api; cu128 selects the CUDA 12.8 wheels, so adjust the index URL to match your CUDA version
uv pip install torch --index-url https://download.pytorch.org/whl/cu128 && uv sync
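
To verify the result, a short check along these lines reports whether the installed PyTorch build can see a CUDA device. It is a sketch with a hypothetical helper name, and it only uses standard `torch.cuda` calls.

```python
def cuda_status() -> str:
    """Describe whether PyTorch is installed and can reach a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        return f"CUDA {torch.version.cuda} available on {torch.cuda.get_device_name(0)}"
    return "torch is installed, but CUDA is not available"


print(cuda_status())
```

If the last message appears on a machine with an NVIDIA GPU, the installed wheel is likely a CPU-only build or the driver is too old for the selected CUDA version.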
