Video Transcriber

An automated video transcription tool powered by the ElevenLabs Scribe API. This Node.js application extracts audio from video files and generates transcriptions with speaker diarization support.

Features

🎬 Video to Audio Extraction - Automatically extracts audio from MP4 and MOV files
☁️ ElevenLabs Scribe API Integration - Best-in-class transcription accuracy
🗣️ Speaker Diarization - Automatically identifies and labels different speakers
📝 Audio Event Tagging - Detects and annotates non-speech events like (laughter), (footsteps)
🔄 Smart Processing - Skips already processed files to save time and API costs
💾 Efficient Storage - Uses MP3 compression for audio files
📊 Interactive CLI - User-friendly menus with progress tracking
🌍 Multi-language Support - Supports 99 languages (default: English)
⚡ Large File Support - Handles files up to 3GB and 10 hours duration
🛡️ Robust Error Handling - Continues processing even if individual files fail

Prerequisites

Node.js (v24 or higher)
ElevenLabs API key (create one in your ElevenLabs account settings)
FFmpeg (bundled automatically via dependencies)

Installation

Clone the repository

```
git clone <repository-url>
cd transcriber
```

Install dependencies
```
npm install
```
Configure environment variables

Create a .env file in the project root:
```
cp .env.example .env
```
Edit .env and add your ElevenLabs API key:
```
ELEVENLABS_API_KEY=your_api_key_here
```

Quick Start

Place your video files in the video/ folder:
```
cp /path/to/your/video.mp4 ./video/
```
Run the transcriber:
```
npm start
```
Or specify a language:
```
npm start -- --lang=ru
```
Follow the interactive prompts for each file:
- ✅ Continue - Process the current file
- ⏭️ Skip - Skip to the next file
- 🚪 Exit - Stop the program
Find your transcriptions in the text/ folder:
```
cat ./text/your-video.txt
```

Project Structure

```
transcriber/
├── index.js          # Main application entry point
├── package.json      # Project dependencies and scripts
├── .env              # Environment variables (not in repo)
├── .env.example      # Environment template
├── video/            # Place video files here (.mp4, .mov)
├── audio/            # Extracted audio files (auto-generated)
└── text/             # Transcription results (auto-generated)
```

Configuration

Language Configuration

You can specify the transcription language in two ways:

1. Command Line Argument (Recommended)

Pass the --lang parameter when starting the application:

```
# English (default)
npm start

# Russian
npm start -- --lang=ru

# Spanish
npm start -- --lang=es

# Auto-detect language
npm start -- --lang=null

# Using node directly
node index.js --lang=ru
```

2. Modify Configuration File

Edit the TRANSCRIPTION_CONFIG object in index.js (lines 28-95):

```
// English (default)
language_code: "en",

// Russian
language_code: "ru",

// Auto-detect language
language_code: null,
```

Note: Command line arguments override the configuration file setting.
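
The override rule above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `resolveLanguage` and the treatment of an empty `--lang=` value are assumptions.

```javascript
// Hypothetical helper: name and behavior are assumptions for illustration.
const DEFAULT_CONFIG = { language_code: "en" };

function resolveLanguage(argv, config = DEFAULT_CONFIG) {
  // Look for an argument of the form --lang=<value>.
  const flag = argv.find((arg) => arg.startsWith("--lang="));
  if (flag === undefined) {
    return config.language_code; // no flag: use the configuration value
  }
  const value = flag.slice("--lang=".length).trim();
  if (value === "") {
    return config.language_code; // assumption: an empty value is ignored
  }
  // The literal string "null" requests automatic language detection.
  return value === "null" ? null : value;
}

// Example: node index.js --lang=ru
console.log(resolveLanguage(process.argv.slice(2)));
```

Keeping the resolution in one pure function makes the precedence (flag first, then configuration) easy to check without starting the interactive CLI.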

Other Transcription Settings

All transcription parameters can be configured in index.js by modifying the TRANSCRIPTION_CONFIG object (lines 28-95):

Speaker Diarization

```
// Enable speaker identification (default)
diarize: true,

// Auto-detect number of speakers
num_speakers: null,

// Or specify exact number (1-32)
num_speakers: 2,

// Adjust diarization sensitivity
diarization_threshold: 0.22,  // Higher = fewer speaker splits
```

Audio Event Tagging

```
// Tag events like (laughter), (applause), etc.
tag_audio_events: true,
```
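
To show how diarization and event tags might surface in a text file, here is a hedged sketch that groups consecutive entries by speaker. It assumes each entry in the Scribe response's `words` array carries `text` and `speaker_id`. The function name `formatTranscript`, the sample data, and the `speaker: text` layout are all hypothetical, since this README does not describe the actual .txt format.

```javascript
// Hypothetical formatter: the output layout is an assumption, not the project's format.
function formatTranscript(words) {
  const lines = [];
  let current = null; // the line currently being built: { speaker, text }
  for (const word of words) {
    const speaker = word.speaker_id ?? "unknown";
    // Start a new line whenever the speaker changes.
    if (current === null || current.speaker !== speaker) {
      current = { speaker, text: "" };
      lines.push(current);
    }
    current.text += word.text;
  }
  return lines.map((line) => `${line.speaker}: ${line.text.trim()}`).join("\n");
}

// Illustrative sample; the speaker ids and event text are made up.
const sample = [
  { text: "Hello", type: "word", speaker_id: "speaker_0" },
  { text: " ", type: "spacing", speaker_id: "speaker_0" },
  { text: "there", type: "word", speaker_id: "speaker_0" },
  { text: " ", type: "spacing", speaker_id: "speaker_1" },
  { text: "(laughter)", type: "audio_event", speaker_id: "speaker_1" },
  { text: " ", type: "spacing", speaker_id: "speaker_1" },
  { text: "Hi", type: "word", speaker_id: "speaker_1" },
];
console.log(formatTranscript(sample));
```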

Timestamp Granularity

```
timestamps_granularity: "word",      // Per-word timestamps
timestamps_granularity: "character", // Per-character timestamps
timestamps_granularity: "none",      // No timestamps
```

Advanced Options

```
// Model selection
model_id: "scribe_v1",              // Stable version
model_id: "scribe_v1_experimental", // Latest features

// Deterministic output
seed: 12345,          // Same seed = same results
temperature: 0.0,     // Lower = more deterministic

// Multi-channel audio
use_multi_channel: false, // Set true for multi-channel files (max 5)

// Privacy mode (Enterprise only)
enable_logging: false,    // Zero-retention mode

// Webhook integration
webhook: false,
webhook_id: null,
```
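
As a rough illustration of how such settings could become an API request, the sketch below turns a configuration object into multipart form fields. It deliberately uses Node's built-in `FormData`, `Blob`, and `fetch` rather than the `axios` and `form-data` packages this project lists, so it is not the project's implementation. The helper names are hypothetical. The endpoint and the `xi-api-key` header reflect the ElevenLabs documentation as understood at the time of writing and should be verified. Omitting `null` fields is an assumption.

```javascript
// Illustrative subset of settings; not the project's full TRANSCRIPTION_CONFIG.
const TRANSCRIPTION_CONFIG = {
  model_id: "scribe_v1",
  language_code: "en",
  diarize: true,
  num_speakers: null,
  tag_audio_events: true,
  timestamps_granularity: "word",
};

// Hypothetical helper: builds the multipart body from a config object.
function buildTranscriptionForm(audioBlob, fileName, config) {
  const form = new FormData();
  form.append("file", audioBlob, fileName);
  for (const [key, value] of Object.entries(config)) {
    // Assumption: null means "let the API decide", so the field is left out.
    if (value !== null && value !== undefined) {
      form.append(key, String(value));
    }
  }
  return form;
}

// Hypothetical sender; defined here but not called in this sketch.
async function transcribe(form, apiKey) {
  const response = await fetch("https://api.elevenlabs.io/v1/speech-to-text", {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Scribe request failed with status ${response.status}`);
  }
  return response.json();
}

const form = buildTranscriptionForm(new Blob(["fake audio"]), "clip.mp3", TRANSCRIPTION_CONFIG);
console.log([...form.keys()]);
```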

File Format Support

Supported Video Formats:

.mp4 - MPEG-4 video files
.mov - QuickTime video files

Output Formats:

Audio: MP3 (compressed, efficient storage)
Transcription: Plain text (.txt)

File Size Limits

Maximum file size: 3GB (ElevenLabs API limit)
Maximum duration: 10 hours (ElevenLabs API limit)
API timeout: 20 minutes per file

The application automatically validates files before processing and will skip files that exceed these limits.
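
A minimal sketch of such a pre-flight check, under stated assumptions: "3GB" is read here as 3 × 1024³ bytes (the project may use 3 × 10⁹), and the function name `checkLimits` and its return shape are hypothetical.

```javascript
// Assumption: binary gigabytes; the real project may define the limit differently.
const MAX_FILE_SIZE_BYTES = 3 * 1024 ** 3;
const MAX_DURATION_SECONDS = 10 * 60 * 60; // 10 hours

// Hypothetical validator: returns every limit the file exceeds.
function checkLimits({ sizeBytes, durationSeconds }) {
  const problems = [];
  if (sizeBytes > MAX_FILE_SIZE_BYTES) {
    problems.push(`file size ${sizeBytes} bytes exceeds the 3GB limit`);
  }
  if (durationSeconds > MAX_DURATION_SECONDS) {
    problems.push(`duration ${durationSeconds}s exceeds the 10 hour limit`);
  }
  return { ok: problems.length === 0, problems };
}

// A 500 MB, 90 minute file passes; an 11 hour file does not.
console.log(checkLimits({ sizeBytes: 500 * 1024 ** 2, durationSeconds: 90 * 60 }));
console.log(checkLimits({ sizeBytes: 500 * 1024 ** 2, durationSeconds: 11 * 3600 }));
```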

How It Works

Processing Pipeline

```
Video File (.mp4/.mov)
    ↓
[1. Audio Extraction]
    ↓
Audio File (.mp3)
    ↓
[2. ElevenLabs Scribe API]
    ↓
Transcription (.txt)
```

Step-by-Step Process

1. File Discovery - Scans the video/ folder for supported video files
2. Smart Skipping - Checks if transcription already exists (looks for matching .txt file)
3. Interactive Menu - Prompts user for action (continue/skip/exit)
4. Audio Extraction - Uses FFmpeg to extract audio as MP3 (skipped if .mp3 already exists)
5. Metadata Validation - Checks file size and duration against API limits
6. Transcription - Sends audio to ElevenLabs Scribe API with configured parameters
7. Save Results - Writes transcription to text/ folder
8. Progress Tracking - Shows detailed progress and statistics
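
The discovery and skipping steps above could look roughly like the following. This is a sketch, not the project's code: the function name `planWork` and the returned fields are hypothetical, and matching extensions case-insensitively is an assumption.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const VIDEO_EXTENSIONS = new Set([".mp4", ".mov"]);

// Hypothetical planner: lists supported videos and flags which steps are still needed.
function planWork(videoDir, audioDir, textDir) {
  return fs
    .readdirSync(videoDir)
    // Assumption: extensions are compared case-insensitively.
    .filter((name) => VIDEO_EXTENSIONS.has(path.extname(name).toLowerCase()))
    .sort()
    .map((name) => {
      const base = path.parse(name).name;
      const audioPath = path.join(audioDir, `${base}.mp3`);
      const textPath = path.join(textDir, `${base}.txt`);
      return {
        videoPath: path.join(videoDir, name),
        audioPath,
        textPath,
        needsExtraction: !fs.existsSync(audioPath), // reuse an existing .mp3
        needsTranscription: !fs.existsSync(textPath), // skip if the .txt exists
      };
    });
}

// Only runs when the folder layout from this README is present.
if (fs.existsSync("./video")) {
  console.table(planWork("./video", "./audio", "./text"));
}
```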

Interactive Features

The application uses an interactive CLI with the following features:

File-by-file confirmation - Control which files to process
Progress indicators - "Processing file 3/10"
Detailed file info - Size, duration, and status for each file
Error recovery - Continues with remaining files if one fails
Summary statistics - Shows total processed and skipped files at the end
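
One way to get this kind of error recovery and end-of-run summary is a loop that isolates each file in its own try/catch. The sketch below is illustrative only: `processAll` and the `processFile` callback are hypothetical stand-ins for the real per-file pipeline, and the "done"/"skipped" return values are an assumption.

```javascript
// Hypothetical batch runner: one failing file does not stop the rest.
async function processAll(files, processFile) {
  const summary = { processed: [], skipped: [], failed: [] };
  for (const [index, file] of files.entries()) {
    console.log(`Processing file ${index + 1}/${files.length}: ${file}`);
    try {
      // Assumption: processFile resolves to "done" or "skipped".
      const outcome = await processFile(file);
      (outcome === "skipped" ? summary.skipped : summary.processed).push(file);
    } catch (error) {
      // Record the failure and continue with the next file.
      summary.failed.push({ file, message: error.message });
      console.error(`Failed: ${file} (${error.message})`);
    }
  }
  return summary;
}

// Demo with a simulated failure on the second file.
processAll(["a.mp4", "b.mov", "c.mp4"], async (file) => {
  if (file === "b.mov") throw new Error("simulated extraction failure");
  return file === "c.mp4" ? "skipped" : "done";
}).then((summary) => console.log(summary));
```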

Usage Examples

Basic Usage

```
# Process all videos with default language (English)
npm start

# Process with Russian language
npm start -- --lang=ru

# Process with auto-detected language
npm start -- --lang=null
```

Batch Processing Multiple Files

```
# Copy multiple videos
cp /path/to/videos/*.mp4 ./video/

# Run the transcriber
npm start

# The interactive menu will appear for each file
```

Processing Only New Files

The application automatically skips files that have already been transcribed:

```
# Run again - already processed files will be skipped automatically
npm start
```

Viewing Transcriptions

```
# View a specific transcription
cat ./text/my-video.txt

# List all transcriptions
ls -lh ./text/

# Search within transcriptions
grep "keyword" ./text/*.txt
```

API Information

ElevenLabs Scribe

This project uses the ElevenLabs Scribe API for transcription.

Key Features:

99 language support
Speaker diarization
Audio event detection
Word-level timestamps
Best-in-class accuracy

Pricing:

Starting from $0.40 per hour of audio
Enterprise plans available with volume discounts
Pay-as-you-go, no monthly minimums
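
For budgeting, the quoted rate gives a quick back-of-the-envelope estimate. The sketch below assumes a flat $0.40 per audio hour (the "starting from" figure above). Actual billing depends on your plan, and the function name is hypothetical.

```javascript
// Assumption: flat rate taken from the "starting from" price; real plans may differ.
const RATE_USD_PER_HOUR = 0.4;

// Hypothetical estimator: takes audio durations in seconds, returns USD rounded to cents.
function estimateCostUsd(durationsSeconds, ratePerHour = RATE_USD_PER_HOUR) {
  const totalSeconds = durationsSeconds.reduce((sum, seconds) => sum + seconds, 0);
  const hours = totalSeconds / 3600;
  return Math.round(hours * ratePerHour * 100) / 100;
}

// A 30 minute file plus a 90 minute file is 2 hours of audio.
console.log(estimateCostUsd([1800, 5400])); // 0.8
```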

Limits:

File size: Max 3GB
Duration: Max 10 hours per file
Formats: MP3, WAV, M4A, and many more

See the ElevenLabs documentation for full details.

Troubleshooting

"API key not found" error

Problem: The application can't find your ElevenLabs API key.

Solution:

Ensure .env file exists in the project root
Check that ELEVENLABS_API_KEY is set correctly
Verify no extra spaces or quotes around the key
Restart the application after modifying .env
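
The checks above can also be automated at startup. This is a hedged sketch rather than the project's actual validation: `readApiKey` is a hypothetical helper, and rejecting the placeholder value from .env.example is an assumption.

```javascript
// Hypothetical startup check for the ELEVENLABS_API_KEY environment variable.
function readApiKey(env = process.env) {
  const raw = env.ELEVENLABS_API_KEY;
  if (raw === undefined) {
    throw new Error("ELEVENLABS_API_KEY is not set; check the .env file in the project root");
  }
  // Strip surrounding whitespace and one pair of matching quotes.
  const key = raw.trim().replace(/^(["'])(.*)\1$/, "$2").trim();
  // Assumption: the template placeholder counts as "not configured".
  if (key === "" || key === "your_api_key_here") {
    throw new Error("ELEVENLABS_API_KEY is empty or still set to the placeholder value");
  }
  return key;
}

try {
  console.log(`API key loaded (${readApiKey().length} characters)`);
} catch (error) {
  console.error(error.message);
}
```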

"File exceeds size limit" error

Problem: Video file is larger than 3GB.

Solution:

Compress the video before processing
Split large videos into smaller segments
Use a video compression tool (HandBrake, FFmpeg CLI)

"No video files found" warning

Problem: The video/ folder is empty or contains unsupported formats.

Solution:

Ensure video files are in video/ folder
Verify files have .mp4 or .mov extensions
Check file permissions (must be readable)

Transcription errors or poor quality

Problem: Transcription contains errors or misidentified speakers.

Solutions:

Audio quality - Ensure clear audio without excessive background noise
Language setting - Verify language_code matches the spoken language
Speaker count - Set num_speakers if you know the exact number
Diarization threshold - Adjust if speakers are being merged or split incorrectly

FFmpeg errors

Problem: Audio extraction fails with FFmpeg errors.

Solution: The application bundles FFmpeg automatically. If you encounter issues:

Delete node_modules/ and reinstall: npm install
Check video file integrity (try playing it in a video player)
Try converting the video to MP4 format first

Network or timeout errors

Problem: API requests fail or time out.

Solutions:

Check your internet connection
Verify API key is valid and has sufficient credits
For very large files, ensure stable connection for 20+ minutes
Try processing smaller files first to verify setup

Development

Project Dependencies

axios - HTTP client for API requests
dotenv - Environment variable management
fluent-ffmpeg - FFmpeg wrapper for audio/video processing
@ffmpeg-installer/ffmpeg - Bundled FFmpeg binary
@ffprobe-installer/ffprobe - Bundled FFprobe for metadata extraction
form-data - Multipart form data for file uploads
inquirer - Interactive CLI prompts

Running in Development

```
# Install dependencies
npm install

# Run the application
node index.js
```

Modifying the Code

The entire application is contained in a single file (index.js) for simplicity:

Lines 18-95: Configuration and constants
Lines 107-126: File discovery functions
Lines 134-154: Audio extraction
Lines 161-183: Metadata extraction
Lines 191-264: ElevenLabs API integration
Lines 274-298: Interactive menu
Lines 307-365: Audio transcription logic
Lines 373-422: Video processing pipeline
Lines 427-503: Main application entry point

Language Support

ElevenLabs Scribe supports 99 languages. Common language codes:

| Language | Code | Language | Code |
|---|---|---|---|
| English | `en` | Spanish | `es` |
| Russian | `ru` | French | `fr` |
| German | `de` | Italian | `it` |
| Portuguese | `pt` | Chinese | `zh` |
| Japanese | `ja` | Korean | `ko` |
| Arabic | `ar` | Hindi | `hi` |

For a complete list, see the ElevenLabs language support documentation.

Performance Tips

Use MP3 for storage - The application automatically converts to MP3, which is much smaller than WAV
Process in batches - The interactive menu allows you to skip files if needed
Reuse extracted audio - Already extracted MP3 files are reused if you run the script again
Monitor API usage - Check your ElevenLabs dashboard to track costs
Set correct language - Language detection works but is slower than specifying the language

License

This project is licensed under the ISC License.

Support

For issues related to:

This application - Open an issue in this repository
ElevenLabs API - Contact ElevenLabs support
FFmpeg - See FFmpeg documentation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Changelog

Current Version (1.0.0)

Interactive CLI with file-by-file confirmation
Full ElevenLabs Scribe API parameter support
Speaker diarization and audio event tagging
Smart file skipping to avoid reprocessing
Comprehensive error handling and validation
Support for MP4 and MOV video formats
Automatic audio extraction to MP3
English language default with 99 language support
