An automated video transcription tool powered by the ElevenLabs Scribe API. This Node.js application extracts audio from video files and generates transcriptions with speaker diarization support.
- 🎬 Video to Audio Extraction - Automatically extracts audio from MP4 and MOV files
- ☁️ ElevenLabs Scribe API Integration - Best-in-class transcription accuracy
- 🗣️ Speaker Diarization - Automatically identifies and labels different speakers
- 📝 Audio Event Tagging - Detects and annotates non-speech events like (laughter), (footsteps)
- 🔄 Smart Processing - Skips already processed files to save time and API costs
- 💾 Efficient Storage - Uses MP3 compression for audio files
- 📊 Interactive CLI - User-friendly menus with progress tracking
- 🌍 Multi-language Support - Supports 99 languages (default: English)
- ⚡ Large File Support - Handles files up to 3GB and 10 hours duration
- 🛡️ Robust Error Handling - Continues processing even if individual files fail
- Node.js (v24 or higher)
- ElevenLabs API key (Get one here)
- FFmpeg (bundled automatically via dependencies)
- Clone the repository:
  git clone <repository-url>
  cd transcriber
- Install dependencies:
  npm install
- Configure environment variables. Create a .env file in the project root:
  cp .env.example .env
  Then edit .env and add your ElevenLabs API key:
  ELEVENLABS_API_KEY=your_api_key_here
- Place your video files in the video/ folder:
  cp /path/to/your/video.mp4 ./video/
- Run the transcriber:
  npm start
  Or specify a language:
  npm start -- --lang=ru
- Follow the interactive prompts for each file:
  - ✅ Continue - Process the current file
  - ⏭️ Skip - Skip to the next file
  - 🚪 Exit - Stop the program
- Find your transcriptions in the text/ folder:
  cat ./text/your-video.txt
transcriber/
├── index.js # Main application entry point
├── package.json # Project dependencies and scripts
├── .env # Environment variables (not in repo)
├── .env.example # Environment template
├── video/ # Place video files here (.mp4, .mov)
├── audio/ # Extracted audio files (auto-generated)
└── text/ # Transcription results (auto-generated)
You can specify the transcription language in two ways:
Pass the --lang parameter when starting the application:
# English (default)
npm start
# Russian
npm start -- --lang=ru
# Spanish
npm start -- --lang=es
# Auto-detect language
npm start -- --lang=null
# Using node directly
node index.js --lang=ru
Alternatively, edit the TRANSCRIPTION_CONFIG object in index.js (lines 28-95):
// English (default)
language_code: "en",
// Russian
language_code: "ru",
// Auto-detect language
language_code: null,
Note: Command line arguments override the configuration file setting.
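The override rule can be sketched as a small resolver. This is an illustration only, not the code in index.js: the function name `resolveLanguage` is hypothetical, and treating an empty `--lang=` value as auto-detect is an assumption.

```javascript
// Hypothetical helper: pick the language code from CLI arguments, falling
// back to the value configured in TRANSCRIPTION_CONFIG.
function resolveLanguage(argv, configLanguage = "en") {
  const flag = argv.find((arg) => arg.startsWith("--lang="));
  if (flag === undefined) {
    return configLanguage; // no CLI override, keep the configured value
  }
  const value = flag.slice("--lang=".length).trim().toLowerCase();
  // The literal string "null" requests auto-detection. Treating an empty
  // value the same way is an assumption made for this sketch.
  if (value === "" || value === "null") {
    return null;
  }
  return value;
}

// process.argv.slice(2) holds the arguments passed after "npm start --".
const languageCode = resolveLanguage(process.argv.slice(2), "en");
console.log(`language_code: ${languageCode}`);
```

Keeping the resolver a pure function of `argv` and the configured default makes the precedence easy to check without launching the full CLI.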
All transcription parameters can be configured in index.js by modifying the TRANSCRIPTION_CONFIG object (lines 28-95):
// Enable speaker identification (default)
diarize: true,
// Auto-detect number of speakers
num_speakers: null,
// Or specify exact number (1-32)
num_speakers: 2,
// Adjust diarization sensitivity
diarization_threshold: 0.22, // Higher = fewer speaker splits
// Tag events like (laughter), (applause), etc.
tag_audio_events: true,
timestamps_granularity: "word", // Per-word timestamps
timestamps_granularity: "character", // Per-character timestamps
timestamps_granularity: "none", // No timestamps
// Model selection
model_id: "scribe_v1", // Stable version
model_id: "scribe_v1_experimental", // Latest features
// Deterministic output
seed: 12345, // Same seed = same results
temperature: 0.0, // Lower = more deterministic
// Multi-channel audio
use_multi_channel: false, // Set true for multi-channel files (max 5 channels)
// Privacy mode (Enterprise only)
enable_logging: false, // Zero-retention mode
// Webhook integration
webhook: false,
webhook_id: null,
Supported Video Formats:
- .mp4 - MPEG-4 video files
- .mov - QuickTime video files
Output Formats:
- Audio: MP3 (compressed, efficient storage)
- Transcription: Plain text (.txt)
- Maximum file size: 3GB (ElevenLabs API limit)
- Maximum duration: 10 hours (ElevenLabs API limit)
- API timeout: 20 minutes per file
The application automatically validates files before processing and will skip files that exceed these limits.
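A pre-upload check against these limits might look like the sketch below. The names `checkLimits`, `MAX_FILE_SIZE_BYTES`, and `MAX_DURATION_SECONDS` are hypothetical, and reading "3GB" as 3 × 1024³ bytes is an assumption; the actual check in index.js may differ.

```javascript
// Limits as stated in this README. Interpreting 3GB as binary gigabytes
// is an assumption made for this sketch.
const MAX_FILE_SIZE_BYTES = 3 * 1024 ** 3;
const MAX_DURATION_SECONDS = 10 * 60 * 60; // 10 hours

// Hypothetical validator: returns every limit the file exceeds, so the
// caller can report all problems at once and then skip the file.
function checkLimits({ sizeBytes, durationSeconds }) {
  const problems = [];
  if (sizeBytes > MAX_FILE_SIZE_BYTES) {
    problems.push(`file size ${sizeBytes} bytes exceeds the 3GB limit`);
  }
  if (durationSeconds > MAX_DURATION_SECONDS) {
    problems.push(`duration ${durationSeconds}s exceeds the 10-hour limit`);
  }
  return { ok: problems.length === 0, problems };
}

const result = checkLimits({ sizeBytes: 250 * 1024 ** 2, durationSeconds: 5400 });
console.log(result.ok ? "within limits" : result.problems.join("; "));
```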
Video File (.mp4/.mov)
↓
[1. Audio Extraction]
↓
Audio File (.mp3)
↓
[2. ElevenLabs Scribe API]
↓
Transcription (.txt)
- File Discovery - Scans the video/ folder for supported video files
- Smart Skipping - Checks if transcription already exists (looks for matching .txt file)
- Interactive Menu - Prompts user for action (continue/skip/exit)
- Audio Extraction - Uses FFmpeg to extract audio as MP3 (skipped if .mp3 already exists)
- Metadata Validation - Checks file size and duration against API limits
- Transcription - Sends audio to ElevenLabs Scribe API with configured parameters
- Save Results - Writes transcription to the text/ folder
- Progress Tracking - Shows detailed progress and statistics
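The discovery and skipping steps can be sketched with nothing but the Node.js standard library. The function name `planWork` and the shape of the returned objects are hypothetical, and matching extensions case-insensitively is an assumption; index.js may organize this differently.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const SUPPORTED_EXTENSIONS = new Set([".mp4", ".mov"]);

// Hypothetical planner: list supported videos and record which pipeline
// steps can be skipped because their output already exists.
function planWork(videoDir, audioDir, textDir) {
  return fs
    .readdirSync(videoDir)
    // Case-insensitive extension matching is an assumption of this sketch.
    .filter((name) => SUPPORTED_EXTENSIONS.has(path.extname(name).toLowerCase()))
    .sort()
    .map((name) => {
      const base = path.basename(name, path.extname(name));
      const audioPath = path.join(audioDir, `${base}.mp3`);
      const textPath = path.join(textDir, `${base}.txt`);
      return {
        videoPath: path.join(videoDir, name),
        audioPath,
        textPath,
        audioExists: fs.existsSync(audioPath), // extraction can be skipped
        alreadyTranscribed: fs.existsSync(textPath), // whole file can be skipped
      };
    });
}
```

Calling `planWork("./video", "./audio", "./text")` from the project root returns one entry per video; entries with `alreadyTranscribed` set need no API call at all.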
The application uses an interactive CLI with the following features:
- File-by-file confirmation - Control which files to process
- Progress indicators - "Processing file 3/10"
- Detailed file info - Size, duration, and status for each file
- Error recovery - Continues with remaining files if one fails
- Summary statistics - Shows total processed and skipped files at the end
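The error-recovery behaviour amounts to catching failures per file rather than around the whole loop. A minimal sketch, assuming a hypothetical `processAll` helper and a caller-supplied `processFile` callback (neither name comes from index.js):

```javascript
// Hypothetical batch runner: one failing file is recorded and the loop
// moves on, so the remaining files are still processed.
async function processAll(files, processFile) {
  const summary = { processed: [], failed: [] };
  for (const [index, file] of files.entries()) {
    console.log(`Processing file ${index + 1}/${files.length}: ${file}`);
    try {
      await processFile(file);
      summary.processed.push(file);
    } catch (error) {
      console.error(`Failed: ${file} (${error.message})`);
      summary.failed.push({ file, reason: error.message });
    }
  }
  return summary;
}

// Example run with a stand-in callback that rejects one file.
processAll(["a.mp4", "broken.mov", "c.mp4"], async (file) => {
  if (file.startsWith("broken")) throw new Error("unreadable container");
}).then((summary) => {
  console.log(`Processed ${summary.processed.length}, failed ${summary.failed.length}`);
});
```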
# Process all videos with default language (English)
npm start
# Process with Russian language
npm start -- --lang=ru
# Process with auto-detected language
npm start -- --lang=null
# Copy multiple videos
cp /path/to/videos/*.mp4 ./video/
# Run the transcriber
npm start
# The interactive menu will appear for each file
The application automatically skips files that have already been transcribed:
# Run again - already processed files will be skipped automatically
npm start
# View a specific transcription
cat ./text/my-video.txt
# List all transcriptions
ls -lh ./text/
# Search within transcriptions
grep "keyword" ./text/*.txt
This project uses the ElevenLabs Scribe API for transcription.
Key Features:
- 99 language support
- Speaker diarization
- Audio event detection
- Word-level timestamps
- Best-in-class accuracy
Pricing:
- Starting from $0.40 per hour of audio
- Enterprise plans available with volume discounts
- Pay-as-you-go, no monthly minimums
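For a rough budget, the cost of a batch can be estimated from audio duration. The sketch below assumes the $0.40 per hour starting rate quoted above applies and that billing is proportional to duration; both are assumptions, and the actual rate depends on your plan. `estimateCostUsd` is a hypothetical name.

```javascript
// Assumed rate, taken from the "starting from" figure in this README.
const ASSUMED_RATE_USD_PER_HOUR = 0.4;

// Hypothetical estimator: duration in seconds to USD, rounded to cents.
function estimateCostUsd(durationSeconds, ratePerHour = ASSUMED_RATE_USD_PER_HOUR) {
  const hours = durationSeconds / 3600;
  return Math.round(hours * ratePerHour * 100) / 100;
}

// Durations (in seconds) of three recordings: 45 min, 1 h 30 min, 2 h.
const durations = [2700, 5400, 7200];
const totalSeconds = durations.reduce((sum, seconds) => sum + seconds, 0);
console.log(`Estimated cost: $${estimateCostUsd(totalSeconds).toFixed(2)}`);
```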
Limits and Formats:
- File size: Max 3GB
- Duration: Max 10 hours per file
- Formats: MP3, WAV, M4A, and many more
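The upload itself is a multipart POST. The sketch below is not the code in index.js: it swaps the project's axios and form-data packages for the `fetch`, `FormData`, and `Blob` globals built into recent Node.js versions. The endpoint URL, the `xi-api-key` header, and the field names reflect my understanding of the public ElevenLabs API and should be checked against the current API reference. `buildScribeForm` and `transcribe` are hypothetical names.

```javascript
const { readFile } = require("node:fs/promises");
const path = require("node:path");

// Assumed endpoint; verify against the ElevenLabs API reference.
const SCRIBE_URL = "https://api.elevenlabs.io/v1/speech-to-text";

// Build the multipart body. Null or undefined options are omitted so the
// API falls back to its own defaults (for example, language auto-detection).
function buildScribeForm(audioBlob, fileName, config) {
  const form = new FormData();
  form.append("file", audioBlob, fileName);
  for (const [key, value] of Object.entries(config)) {
    if (value === null || value === undefined) continue;
    form.append(key, String(value));
  }
  return form;
}

// Hypothetical wrapper: upload one MP3 and return the parsed JSON response.
async function transcribe(audioPath, config, apiKey) {
  // Reads the whole file into memory: fine for a sketch, but a streaming
  // upload would be a better fit for multi-gigabyte files.
  const audioBlob = new Blob([await readFile(audioPath)], { type: "audio/mpeg" });
  const response = await fetch(SCRIBE_URL, {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: buildScribeForm(audioBlob, path.basename(audioPath), config),
    signal: AbortSignal.timeout(20 * 60 * 1000), // 20-minute timeout per file
  });
  if (!response.ok) {
    throw new Error(`Scribe request failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}
```

A call such as `transcribe("./audio/my-video.mp3", { model_id: "scribe_v1", diarize: true, language_code: null }, process.env.ELEVENLABS_API_KEY)` would send the file with auto-detected language, since the null option is left out of the form.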
Problem: The application can't find your ElevenLabs API key.
Solution:
- Ensure the .env file exists in the project root
- Check that ELEVENLABS_API_KEY is set correctly
- Verify no extra spaces or quotes around the key
- Restart the application after modifying .env
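The checks above can also be run programmatically. This sketch inspects the value exactly as it appears in `process.env` after your environment has been loaded; `diagnoseApiKey` is a hypothetical helper, not part of index.js.

```javascript
// Hypothetical diagnostic: classify the most common API key mistakes.
function diagnoseApiKey(rawValue) {
  if (rawValue === undefined || rawValue === "") {
    return "missing"; // variable not set, or .env not loaded
  }
  if (rawValue !== rawValue.trim()) {
    return "whitespace"; // leading or trailing spaces around the key
  }
  if (/^["']|["']$/.test(rawValue)) {
    return "quoted"; // stray quote character at either end
  }
  return "ok";
}

console.log(`ELEVENLABS_API_KEY status: ${diagnoseApiKey(process.env.ELEVENLABS_API_KEY)}`);
```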
Problem: Video file is larger than 3GB.
Solution:
- Compress the video before processing
- Split large videos into smaller segments
- Use a video compression tool (HandBrake, FFmpeg CLI)
Problem: The video/ folder is empty or contains unsupported formats.
Solution:
- Ensure video files are in the video/ folder
- Verify files have .mp4 or .mov extensions
- Check file permissions (must be readable)
Problem: Transcription contains errors or misidentified speakers.
Solutions:
- Audio quality - Ensure clear audio without excessive background noise
- Language setting - Verify language_code matches the spoken language
- Speaker count - Set num_speakers if you know the exact number
- Diarization threshold - Adjust if speakers are being merged or split incorrectly
Problem: Audio extraction fails with FFmpeg errors.
Solution: The application bundles FFmpeg automatically. If you encounter issues:
- Delete node_modules/ and reinstall: npm install
- Check video file integrity (try playing it in a video player)
- Try converting the video to MP4 format first
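To reproduce an extraction failure outside the application, it can help to run FFmpeg directly. The sketch below uses `child_process.spawn` with an `ffmpeg` binary found on your PATH rather than the fluent-ffmpeg wrapper and bundled binary that the project depends on, so it is a diagnostic stand-in and not the application's own code. `buildExtractArgs` and `extractAudio` are hypothetical names, and the quality setting is an arbitrary choice.

```javascript
const { spawn } = require("node:child_process");

// Standard FFmpeg CLI arguments for pulling an MP3 track out of a video.
function buildExtractArgs(videoPath, audioPath) {
  return [
    "-y", // overwrite the output file without prompting
    "-i", videoPath, // input video
    "-vn", // drop the video stream
    "-codec:a", "libmp3lame", // encode audio as MP3
    "-q:a", "4", // variable bitrate quality (arbitrary choice for this sketch)
    audioPath,
  ];
}

// Hypothetical runner: resolves with the output path, or rejects with the
// tail of FFmpeg's stderr so the underlying error is visible.
function extractAudio(videoPath, audioPath, ffmpegBinary = "ffmpeg") {
  return new Promise((resolve, reject) => {
    const child = spawn(ffmpegBinary, buildExtractArgs(videoPath, audioPath), {
      stdio: ["ignore", "ignore", "pipe"],
    });
    let stderr = "";
    child.stderr.on("data", (chunk) => {
      stderr += chunk;
    });
    child.on("error", reject); // for example, binary not found
    child.on("close", (code) => {
      if (code === 0) resolve(audioPath);
      else reject(new Error(`ffmpeg exited with code ${code}: ${stderr.slice(-500)}`));
    });
  });
}
```

If `extractAudio("./video/my-video.mp4", "./audio/my-video.mp3")` fails here too, the problem most likely lies with the video file itself and not with the installed dependencies.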
Problem: API requests fail or timeout.
Solutions:
- Check your internet connection
- Verify API key is valid and has sufficient credits
- For very large files, ensure stable connection for 20+ minutes
- Try processing smaller files first to verify setup
- axios - HTTP client for API requests
- dotenv - Environment variable management
- fluent-ffmpeg - FFmpeg wrapper for audio/video processing
- @ffmpeg-installer/ffmpeg - Bundled FFmpeg binary
- @ffprobe-installer/ffprobe - Bundled FFprobe for metadata extraction
- form-data - Multipart form data for file uploads
- inquirer - Interactive CLI prompts
# Install dependencies
npm install
# Run the application
node index.js
The entire application is contained in a single file (index.js) for simplicity:
- Lines 18-95: Configuration and constants
- Lines 107-126: File discovery functions
- Lines 134-154: Audio extraction
- Lines 161-183: Metadata extraction
- Lines 191-264: ElevenLabs API integration
- Lines 274-298: Interactive menu
- Lines 307-365: Audio transcription logic
- Lines 373-422: Video processing pipeline
- Lines 427-503: Main application entry point
ElevenLabs Scribe supports 99 languages. Common language codes:
| Language | Code | Language | Code |
|---|---|---|---|
| English | en | Spanish | es |
| Russian | ru | French | fr |
| German | de | Italian | it |
| Portuguese | pt | Chinese | zh |
| Japanese | ja | Korean | ko |
| Arabic | ar | Hindi | hi |
For a complete list, see: ElevenLabs Language Support
- Use MP3 for storage - The application automatically converts to MP3, which is much smaller than WAV
- Process in batches - The interactive menu allows you to skip files if needed
- Reuse extracted audio - Already extracted MP3 files are reused if you run the script again
- Monitor API usage - Check your ElevenLabs dashboard to track costs
- Set correct language - Language detection works but is slower than specifying the language
This project is licensed under the ISC License.
For issues related to:
- This application - Open an issue in this repository
- ElevenLabs API - Contact ElevenLabs support
- FFmpeg - See FFmpeg documentation
Contributions are welcome! Please feel free to submit a Pull Request.
- Interactive CLI with file-by-file confirmation
- Full ElevenLabs Scribe API parameter support
- Speaker diarization and audio event tagging
- Smart file skipping to avoid reprocessing
- Comprehensive error handling and validation
- Support for MP4 and MOV video formats
- Automatic audio extraction to MP3
- English language default with 99 language support