DeepLearner 🧠

End-to-end multimedia-to-text AI companion — runs 💯 locally 👀.
Convert video → audio → text for analysis, retrieval, and language model integration.

🚀 Features

🎥 Video to Audio conversion via FFmpeg or yt-dlp
🔉 Audio to Text transcription with faster-whisper (GPU-accelerated)
🤖 LLM Integration with local models via Ollama
🧰 Built with uv and Bun, ultra-fast Python and JavaScript package managers
💻 Runs completely offline for maximum privacy and minimal cost (just your ⚡ bill 👀)

🏭 Architecture

[UI] ──▶ [Media Orchestrator API] ──▶ [Transcription API]
  │
  └──▶ [Agent Orchestrator API]

UI: allows users to convert video into text and interact with an AI agent for summarization or deeper exploration of media content
Agent Orchestrator API: wraps the Ollama SDK around an open-source model of your choice and streams LLM-generated responses back to the client
Media Orchestrator API: provides endpoints to convert and manage audio and text files derived from video. It also supports polling for real-time media conversion status updates
Transcription API: transcribes audio to text and exposes an endpoint to check live transcription progress
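
The polling flow described above can be sketched from the client side. This is a minimal sketch, not the project's actual client: the status payload shape (`state`, `progress`), the terminal state names, and the `fetch_status` callable are all assumptions. In practice `fetch_status` would wrap an HTTP GET against whichever status endpoint the API exposes.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real API may use different names.
TERMINAL_STATES = {"done", "failed"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 600.0,
) -> dict:
    """Call fetch_status repeatedly until it reports a terminal state.

    fetch_status stands in for an HTTP request to a status endpoint and is
    assumed to return a dict such as {"state": "running", "progress": 0.4}.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("conversion did not finish before the timeout")
        time.sleep(interval)


if __name__ == "__main__":
    # Fake status source that finishes on the third poll.
    replies = iter([
        {"state": "running", "progress": 0.3},
        {"state": "running", "progress": 0.8},
        {"state": "done", "progress": 1.0},
    ])
    print(poll_until_done(lambda: next(replies), interval=0.01))
```

Passing the status source in as a callable keeps the loop independent of any HTTP library and makes it easy to exercise with a fake, as in the usage block.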

👩‍💻 Running

bun run setup # Install all dependencies
bun run dev   # Starts UI and backend services

🧱 Requirements

Install the following software on your local machine before running DeepLearner.

📦 Package Managers

We highly recommend the following:

Bun – Fast JavaScript runtime & package manager
uv – Blazing fast Python environment manager (written in Rust)

🤖 Local LLMs

Ollama – A streamlined, open-source platform for running and managing LLMs on your local machine. It simplifies downloading, setting up, and interacting with open-source models

ℹ️ Make sure Ollama is running in the background for LLM-based workflows.
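
One way to check that Ollama is reachable before starting an LLM workflow is to send a plain HTTP request to its local server. The sketch below assumes Ollama's default address, `http://localhost:11434`; the function name `ollama_is_running` is made up for this example and is not part of the project.

```python
import urllib.request

# Ollama's default local address; adjust if your setup uses another host or port.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_is_running(base_url: str = DEFAULT_OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

The check only confirms that something answers on that port; it does not verify that a particular model has been pulled.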

🎞️ Video to Audio Tools

FFmpeg – A powerful multimedia toolkit for handling audio, video, subtitles, and metadata
yt-dlp – A feature-rich CLI tool for downloading videos and audio from thousands of websites (a modern fork of youtube-dl)
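
To illustrate how these two tools can produce audio for transcription, here is a hedged sketch that builds the command lines and runs them with `subprocess`. The helper names, the WAV output format, and the 16 kHz mono settings are assumptions chosen for the example (Whisper-family models operate on 16 kHz audio); the project's own invocation may differ.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_extract_audio_cmd(video: Path, audio: Path) -> list[str]:
    """Build an FFmpeg command that drops the video stream and writes mono 16 kHz audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(video),
        "-vn",           # discard the video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        str(audio),
    ]


def ytdlp_extract_audio_cmd(url: str, out_dir: Path) -> list[str]:
    """Build a yt-dlp command that downloads a URL and keeps only the audio as WAV."""
    return [
        "yt-dlp",
        "-x",                     # extract audio only
        "--audio-format", "wav",
        "-o", str(out_dir / "%(id)s.%(ext)s"),  # yt-dlp output template
        url,
    ]


def run(cmd: list[str]) -> None:
    """Run a command, failing early with a clear message if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it so the example has no side effects.
    print(ffmpeg_extract_audio_cmd(Path("talk.mp4"), Path("talk.wav")))
```

Building the argument list separately from running it keeps the commands easy to inspect and avoids shell quoting issues.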

🎙️ Audio to Text (Transcription)

To enable GPU-accelerated transcription with faster-whisper:

NVIDIA GPU with sufficient VRAM for your chosen model
NVIDIA GPU driver (version depends on your CUDA setup)
CUDA Toolkit (typically version 11+)
cuDNN (sometimes bundled with CUDA)
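
With those pieces in place, GPU transcription with faster-whisper might look like the sketch below. It is an illustration rather than the project's code: the function names, the `small` model size, and the way progress is derived (segment end time divided by total duration) are assumptions. `faster_whisper` is imported lazily so the progress helper can be used without the library installed.

```python
from typing import Iterator


def progress_fraction(segment_end: float, duration: float) -> float:
    """Map a segment end time to a progress value clamped to [0, 1]."""
    if duration <= 0:
        return 0.0
    return min(max(segment_end / duration, 0.0), 1.0)


def transcribe_with_progress(
    audio_path: str, model_size: str = "small"
) -> Iterator[tuple[float, str]]:
    """Yield (progress, text) pairs as faster-whisper decodes each segment."""
    from faster_whisper import WhisperModel  # lazy import: requires faster-whisper

    # device="cuda" with float16 assumes an NVIDIA GPU; on CPU,
    # device="cpu" with compute_type="int8" is a common alternative.
    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, info = model.transcribe(audio_path)
    for segment in segments:  # segments is a generator; decoding happens lazily
        yield progress_fraction(segment.end, info.duration), segment.text


if __name__ == "__main__":
    # Only the pure helper runs here; transcription needs a GPU and an audio file.
    print(progress_fraction(5.0, 10.0))
```

Because `transcribe` returns a generator, progress can be reported while decoding is still under way, which is one plausible basis for a live progress endpoint.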

To ensure PyTorch is installed with CUDA support (the cu128 index below serves builds for CUDA 12.8; pick the index that matches your CUDA setup):

# Run from ./transcription-api
uv pip install torch --index-url https://download.pytorch.org/whl/cu128 && uv sync
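
After installing, a quick way to confirm that the PyTorch build can see the GPU is a check like the following. This is a small sketch; the function name `cuda_status` is made up for the example, and it reports a missing PyTorch install rather than failing.

```python
def cuda_status() -> dict:
    """Report whether the installed PyTorch build can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "cuda_version": None}
    return {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,  # None on CPU-only builds
    }


if __name__ == "__main__":
    print(cuda_status())
```

If `cuda_available` is false while `cuda_version` is set, the wheel was built with CUDA but no usable device or driver was found at runtime.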
