Audio Transcription App v2.0.0

A desktop application for audio transcription and AI-powered analysis. Transcribe audio files with OpenAI's transcription models, then chat with an AI assistant about your transcripts.

✨ Key Features

🎙️ Recording & Transcription

Multiple transcription models - Choose from gpt-4o-transcribe, whisper-1, or gpt-4o-transcribe-diarize
Speaker diarization - Automatic speaker identification with reference audio support
Direct audio recording - Record audio directly in the app with live waveform visualization
Drag-and-drop upload - Support for MP3, WAV, M4A, WEBM, MP4, OGG, FLAC formats
Large file support - Automatic chunking for files of any size, so the API's 25 MB per-upload limit does not apply
Custom prompts - Guide transcription with context-specific prompts
Summary generation - AI-powered summaries with customizable templates

🚀 Performance Optimizations (NEW in v2.0.0)

Parallel chunk processing - 60-80% faster transcription for large files
Dynamic rate limiting - Keeps API requests within 80 requests per minute (RPM)
Audio speed optimization - Optional 2-3x speed-up for 23-33% cost savings
Opus compression - Bandwidth optimization with 5-10x file size reduction
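
To illustrate the 80 RPM limit listed above, here is a minimal sketch of a sliding-window limiter. It is not the app's actual implementation: the class name `SlidingWindowLimiter` and its methods are hypothetical, and the limit here is fixed rather than adjusted dynamically.

```javascript
// Hypothetical sliding-window limiter; an illustration, not the app's real code.
class SlidingWindowLimiter {
  constructor(maxRequests = 80, windowMs = 60_000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = []; // send times of requests still inside the window
  }

  // Milliseconds to wait before a request may be sent at time `now` (0 = send now).
  delayFor(now = Date.now()) {
    // Forget requests that have already left the window.
    while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) return 0;
    // Window is full: wait until the oldest request expires.
    return this.timestamps[0] + this.windowMs - now;
  }

  // Record that a request was sent at time `now`.
  record(now = Date.now()) {
    this.timestamps.push(now);
  }

  // Wait for a free slot, then record the request.
  async acquire() {
    let delay = this.delayFor();
    while (delay > 0) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = this.delayFor();
    }
    this.record();
  }
}
```

A caller would `await limiter.acquire()` before each API request. Passing an explicit `now` to `delayFor` and `record` keeps the logic testable without real timers.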

💬 AI-Powered Analysis

Intelligent chat interface - Ask questions about your transcripts using OpenAI Agents SDK
Context-aware responses - AI fetches only relevant sections (90% token reduction)
Multi-transcript support - Compare and analyze multiple transcripts simultaneously
Advanced tools - Search, chunk retrieval, speaker extraction, transcript comparison
Chat history - Persistent conversation history per transcript

📚 Transcript Library

Organized storage - All transcripts saved automatically with metadata
Smart search - Find transcripts by name, content, or date
Filtering options - View All, Starred, or Recent transcripts
Export formats - TXT, VTT, or Markdown with one click
Secure storage - API keys stored in system keychain (macOS/Windows)

🎨 Modern Interface

Two-tab design - Separate Recording and Analysis workspaces
Dark mode - Beautiful dark theme with system-aware switching
Resizable panels - Customize your workspace layout
Apple-inspired design - Clean, minimal, and intuitive

📦 Installation

macOS

Download and install Audio Transcription-2.0.0-arm64.dmg (the arm64 build targets Apple Silicon Macs)

Windows

Download and run Audio Transcription Setup 2.0.0.exe or use the portable version

🚀 Quick Start

1. Launch the app and enter your OpenAI API key
2. Recording tab: Upload or record audio, then transcribe
3. Analysis tab: View transcripts and chat with AI about the content

📖 Usage Guide

Recording Tab

Upload or Record Audio
- Drag and drop an audio file
- Click "Choose File" to browse
- Or use "Record" to capture audio directly
Configure Transcription
- Select transcription model (gpt-4o-transcribe recommended)
- Enable speaker diarization if needed
- Add optional context prompt
- Choose summary template
Transcribe
- Click "Transcribe" and monitor progress
- Large files are automatically chunked and processed in parallel
- Transcript auto-saves to Analysis tab when complete
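
The automatic chunking in the Transcribe step can be pictured with a small sketch. It assumes fixed-length, time-based chunks; the function `planChunks` and the 360-second default are hypothetical (chosen so that 60 minutes yields 10 chunks, as in the benchmark example), and the app may split by file size instead.

```javascript
// Split a recording into fixed-length chunks, each described by a start
// offset and a duration. chunkSeconds is an assumed value, not the app's setting.
function planChunks(totalSeconds, chunkSeconds = 360) {
  if (!(totalSeconds > 0) || !(chunkSeconds > 0)) {
    throw new RangeError("totalSeconds and chunkSeconds must be positive");
  }
  const chunks = [];
  for (let start = 0; start < totalSeconds; start += chunkSeconds) {
    chunks.push({
      index: chunks.length,
      startSeconds: start,
      // The last chunk may be shorter than chunkSeconds.
      durationSeconds: Math.min(chunkSeconds, totalSeconds - start),
    });
  }
  return chunks;
}

const plan = planChunks(3600); // 60 minutes of audio
console.log(plan.length); // 10
```

Each entry could then be handed to FFmpeg (start offset plus duration) to cut the actual audio segment.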

Analysis Tab

Transcript Library (Left Panel)
- Search transcripts by name
- Filter: All / Starred / Recent
- Click to view transcript
Transcript Viewer (Middle Panel)
- Read full transcript with formatting
- Export to TXT, VTT, or Markdown
- Star important transcripts
AI Chat (Right Panel)
- Select one or more transcripts for context
- Ask questions about the content
- AI intelligently searches and references specific sections
- Chat history saved per transcript
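
For the VTT export available in the Transcript Viewer, the sketch below shows roughly what producing a WebVTT file involves. The segment shape (`start`, `end`, `text`, optional `speaker`) and both function names are assumptions for illustration, not the app's internal format.

```javascript
// Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Build a WebVTT document from segments.
// Hypothetical segment shape: { start, end, text, speaker? } with times in seconds.
function toVtt(segments) {
  const cues = segments.map(({ start, end, text, speaker }) => {
    // A WebVTT voice span labels the speaker when one is known.
    const line = speaker ? `<v ${speaker}>${text}</v>` : text;
    return `${toVttTimestamp(start)} --> ${toVttTimestamp(end)}\n${line}`;
  });
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

console.log(toVtt([
  { start: 0, end: 2.5, text: "Welcome, everyone.", speaker: "Speaker 1" },
  { start: 2.5, end: 6, text: "Thanks for joining." },
]));
```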

Example Chat Queries

"What were the main topics discussed?"
"Summarize the key decisions made"
"What did [Speaker Name] say about [topic]?"
"Find all mentions of [keyword]"
"Compare how the speakers approached [topic]"

🔧 Technical Details

Architecture

Frontend: React 19, Vite, TailwindCSS
Backend: Electron 28, Node.js
AI: OpenAI Agents SDK, gpt-4o, Whisper models
Storage: electron-store with system keychain integration
Audio: FFmpeg with fluent-ffmpeg wrapper

Transcription Models

gpt-4o-transcribe - Latest model, best quality, $0.006/minute
whisper-1 - Previous generation, $0.006/minute
gpt-4o-transcribe-diarize - Automatic speaker identification
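
As a quick way to reason about the per-minute prices above, here is a small cost estimate sketch. The helper `estimateCostUsd` is hypothetical; it only uses the prices listed in this section, so `gpt-4o-transcribe-diarize`, which has no listed price, is left out.

```javascript
// Per-minute prices in USD, as listed in this README.
const PRICE_PER_MINUTE_USD = {
  "gpt-4o-transcribe": 0.006,
  "whisper-1": 0.006,
};

// Estimate the transcription cost for a given audio duration.
function estimateCostUsd(model, durationSeconds) {
  const price = PRICE_PER_MINUTE_USD[model];
  if (price === undefined) {
    throw new Error(`No listed price for model: ${model}`);
  }
  return (durationSeconds / 60) * price;
}

// 60 minutes of audio at $0.006/minute
console.log(estimateCostUsd("gpt-4o-transcribe", 3600).toFixed(2)); // prints 0.36
```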

Performance Features

Parallel chunk processing (5 concurrent)
Dynamic rate limiting (80 RPM)
Optional audio speed optimization (1x-3x)
Optional Opus compression for uploads
Automatic format conversion (OGG, FLAC → MP3)
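
The "5 concurrent" limit on parallel chunk processing can be sketched as a small worker pool. This is an illustration under stated assumptions, not the app's code: `mapWithConcurrency` and `fakeTranscribe` are hypothetical names, and the fake worker stands in for a real API call.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order, regardless of completion order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each runner repeatedly claims the next unprocessed item until none remain.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Stand-in for a real transcription request (hypothetical).
const fakeTranscribe = (chunk) =>
  new Promise((resolve) => setTimeout(() => resolve(`text for chunk ${chunk}`), 10));

mapWithConcurrency([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 5, fakeTranscribe)
  .then((parts) => console.log(parts.length)); // 10
```

Keeping results indexed by input position means the chunk transcripts can be joined in order even when later chunks finish first.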

Security & Privacy

API keys stored in system keychain (macOS Keychain/Windows Credential Manager)
All data stored locally (no cloud sync)
Chat history encrypted with OS-level encryption

🔑 API Key Setup

1. Get your API key from OpenAI Platform
2. Click the key icon in the app header
3. Paste your API key and click "Save"
4. The key is stored securely in your system keychain

🧪 Testing

Prerequisites

Node.js 20+
OpenAI API key

Setup

# Install dependencies
npm install

# Run in development mode
npm start

# Run tests
export OPENAI_API_KEY=your-api-key-here
npm test

# Build for macOS
npm run build:mac

# Build for Windows
npm run build:win

Test Files

test-ffmpeg.js - FFmpeg infrastructure tests
test-transcription-service.js - Integration tests for optimizations

📊 Performance Benchmarks

Large File Example (60 min audio, 10 chunks)

v1.0.0 (Sequential):

Processing time: ~320 seconds
Cost: $0.36

v2.0.0 (Parallel):

Processing time: ~70 seconds (78% faster)
Cost: $0.36

v2.0.0 (Parallel + 2.5x Speed):

Processing time: ~50 seconds
Cost: $0.27 (25% savings)
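
The percentages quoted above follow directly from the listed times and costs. A short check, using a hypothetical helper `percentReduction`:

```javascript
// Percentage reduction from a baseline value to a new value.
const percentReduction = (baseline, value) => ((baseline - value) / baseline) * 100;

// Processing time: sequential (320 s) vs parallel (70 s)
console.log(percentReduction(320, 70).toFixed(0)); // prints 78

// Processing time: sequential (320 s) vs parallel + 2.5x speed (50 s)
console.log(percentReduction(320, 50).toFixed(0)); // prints 84

// Cost: $0.36 vs $0.27 with 2.5x speed
console.log(percentReduction(0.36, 0.27).toFixed(0)); // prints 25
```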

🛠️ Development

See CLAUDE.md for comprehensive development documentation including:

Project architecture
Backend services structure
Adding new features
Agent tools and guardrails
Testing strategies

📝 Credits

Created by Patrick C. Freyer and Alexander Achba

Open Source Libraries

📄 License

MIT

🔗 Links

Version 2.0.0 - Major redesign with AI chat, performance optimizations, and comprehensive analysis features
