A powerful desktop application for audio transcription and AI-powered analysis. Transcribe audio files using OpenAI's advanced models, then chat with your transcripts using intelligent AI assistance.
- Multiple transcription models - Choose from gpt-4o-transcribe, whisper-1, or gpt-4o-transcribe-diarize
- Speaker diarization - Automatic speaker identification with reference audio support
- Direct audio recording - Record audio directly in the app with live waveform visualization
- Drag-and-drop upload - Support for MP3, WAV, M4A, WEBM, MP4, OGG, FLAC formats
- Large file support - Automatic chunking for files of any size, working around the API's 25 MB per-upload limit
- Custom prompts - Guide transcription with context-specific prompts
- Summary generation - AI-powered summaries with customizable templates
- Parallel chunk processing - 60-80% faster transcription for large files
- Dynamic rate limiting - Keeps API requests within 80 requests per minute (RPM)
- Audio speed optimization - Optional 2-3x speed-up for 23-33% cost savings
- Opus compression - Bandwidth optimization with 5-10x file size reduction
- Intelligent chat interface - Ask questions about your transcripts using OpenAI Agents SDK
- Context-aware responses - AI fetches only relevant sections (90% token reduction)
- Multi-transcript support - Compare and analyze multiple transcripts simultaneously
- Advanced tools - Search, chunk retrieval, speaker extraction, transcript comparison
- Chat history - Persistent conversation history per transcript
- Organized storage - All transcripts saved automatically with metadata
- Smart search - Find transcripts by name, content, or date
- Filtering options - View All, Starred, or Recent transcripts
- Export formats - TXT, VTT, or Markdown with one click
- Secure storage - API keys stored in system keychain (macOS/Windows)
- Two-tab design - Separate Recording and Analysis workspaces
- Dark mode - Beautiful dark theme with system-aware switching
- Resizable panels - Customize your workspace layout
- Apple-inspired design - Clean, minimal, and intuitive
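Large file support depends on splitting audio so that no single upload exceeds the API's 25 MB limit. The sketch below shows one way to plan time-based chunk boundaries from a file's size and duration. It assumes a roughly constant bitrate. The function name planChunks and the 20 MB target size are illustrative assumptions, not the app's actual implementation.

```javascript
// Hypothetical sketch: plan time-based chunks so each upload stays under the API limit.
// Assumes a roughly constant bitrate, so bytes scale linearly with duration.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // 25 MB upload limit (treated as MiB here)
const TARGET_CHUNK_BYTES = 20 * 1024 * 1024; // assumed headroom below the limit

function planChunks(fileSizeBytes, durationSeconds) {
  if (!(fileSizeBytes > 0) || !(durationSeconds > 0)) {
    throw new RangeError('fileSizeBytes and durationSeconds must be positive');
  }
  // Files already under the limit are sent as a single chunk.
  const chunkCount = fileSizeBytes <= MAX_UPLOAD_BYTES
    ? 1
    : Math.ceil(fileSizeBytes / TARGET_CHUNK_BYTES);
  const chunkDuration = durationSeconds / chunkCount;

  const chunks = [];
  for (let i = 0; i < chunkCount; i++) {
    const start = i * chunkDuration;
    chunks.push({
      index: i,
      start,
      // The last chunk ends exactly at the file's end to avoid floating-point drift.
      end: i === chunkCount - 1 ? durationSeconds : start + chunkDuration,
    });
  }
  return chunks;
}
```

For a 100 MB, one-hour file this plans five 12-minute chunks. Each chunk could then be cut with FFmpeg and transcribed independently.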
macOS (Apple Silicon): Download and install Audio Transcription-2.0.0-arm64.dmg
Windows: Download and run Audio Transcription Setup 2.0.0.exe or use the portable version
- Launch the app and enter your OpenAI API key
- Recording tab: Upload or record audio, then transcribe
- Analysis tab: View transcripts and chat with AI about the content
- Upload or Record Audio
  - Drag and drop an audio file
  - Click "Choose File" to browse
  - Or use "Record" to capture audio directly
- Configure Transcription
  - Select transcription model (gpt-4o-transcribe recommended)
  - Enable speaker diarization if needed
  - Add optional context prompt
  - Choose summary template
- Transcribe
  - Click "Transcribe" and monitor progress
  - Large files are automatically chunked and processed in parallel
  - The transcript auto-saves to the Analysis tab when complete
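Parallel processing can be pictured as a small worker pool. The pool keeps a fixed number of chunk requests in flight, five at a time to match the concurrency the app lists for parallel chunk processing, and puts results back in chunk order. This is a minimal sketch. mapWithConcurrency is a hypothetical helper, not the app's actual code.

```javascript
// Hypothetical sketch: run an async worker over items with a fixed concurrency limit,
// returning results in the same order as the input.
async function mapWithConcurrency(items, limit, worker) {
  if (!(limit >= 1)) {
    throw new RangeError('limit must be at least 1');
  }
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each runner claims the next unprocessed index until none remain.
  // JavaScript runs this synchronously between awaits, so no index is claimed twice.
  async function runner() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Consider a call such as mapWithConcurrency(chunks, 5, transcribeChunk), where transcribeChunk is a hypothetical function wrapping the API request. It would return chunk transcripts in their original order, ready to be joined.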
- Transcript Library (Left Panel)
  - Search transcripts by name
  - Filter: All / Starred / Recent
  - Click to view transcript
- Transcript Viewer (Middle Panel)
  - Read full transcript with formatting
  - Export to TXT, VTT, or Markdown
  - Star important transcripts
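Unlike TXT or Markdown, VTT export needs timed segments. The sketch below is a minimal WebVTT serializer. It assumes each segment carries start and end times in seconds, its text, and an optional speaker label. The segment shape and the function names are assumptions for illustration.

```javascript
// Hypothetical sketch: serialize timed transcript segments as WebVTT.
// Assumed segment shape: { start: seconds, end: seconds, text: string, speaker?: string }

// Format seconds as HH:MM:SS.mmm, the WebVTT cue timestamp format.
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3_600_000);
  const m = Math.floor((totalMs % 3_600_000) / 60_000);
  const s = Math.floor((totalMs % 60_000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Escape characters that WebVTT cue text would otherwise parse as markup.
function escapeCueText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function toWebVTT(segments) {
  const cues = segments.map((seg) => {
    const timing = `${formatTimestamp(seg.start)} --> ${formatTimestamp(seg.end)}`;
    // A blank line would end the cue early, so collapse internal blank lines.
    const text = escapeCueText(seg.text.trim().replace(/\n\s*\n/g, '\n'));
    // Speaker labels become WebVTT voice spans, e.g. "<v Alice>Hello".
    const body = seg.speaker ? `<v ${escapeCueText(seg.speaker)}>${text}` : text;
    return `${timing}\n${body}`;
  });
  return `WEBVTT\n\n${cues.join('\n\n')}\n`;
}
```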
- AI Chat (Right Panel)
  - Select one or more transcripts for context
  - Ask questions about the content
  - The AI searches the selected transcripts and references specific sections
  - Chat history is saved per transcript
Example questions:
- "What were the main topics discussed?"
- "Summarize the key decisions made"
- "What did [Speaker Name] say about [topic]?"
- "Find all mentions of [keyword]"
- "Compare how the speakers approached [topic]"
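Context-aware responses rely on tools that return only matching sections instead of the full transcript. The sketch below is a minimal keyword search in that spirit. It assumes the transcript is split into paragraphs (or speaker turns) separated by blank lines. searchTranscript and its scoring are hypothetical and do not reproduce the app's Agents SDK tools.

```javascript
// Hypothetical sketch of a keyword-search tool over a single transcript.
// Assumes paragraphs (or speaker turns) are separated by blank lines.
function splitIntoChunks(transcript) {
  return transcript
    .split(/\n\s*\n/)
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .map((text, index) => ({ index, text }));
}

// Return up to maxResults chunks ranked by how often the query terms occur.
function searchTranscript(transcript, query, maxResults = 3) {
  // Split the query on anything that is not a letter or digit (Unicode-aware)
  // and ignore very short words such as "a" or "of".
  const terms = query
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((term) => term.length > 2);
  if (terms.length === 0) return [];

  return splitIntoChunks(transcript)
    .map((chunk) => {
      const lower = chunk.text.toLowerCase();
      // Count every (substring) occurrence of every term in this chunk.
      const score = terms.reduce((sum, term) => sum + lower.split(term).length - 1, 0);
      return { ...chunk, score };
    })
    .filter((chunk) => chunk.score > 0)
    // Highest score first; ties keep transcript order.
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, maxResults);
}
```

An agent tool could wrap a function like this. A question such as "Find all mentions of [keyword]" would then send the model a few paragraphs rather than the whole transcript.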
Tech stack:
- Frontend: React 19, Vite, TailwindCSS
- Backend: Electron 28, Node.js
- AI: OpenAI Agents SDK, gpt-4o, Whisper models
- Storage: electron-store with system keychain integration
- Audio: FFmpeg with fluent-ffmpeg wrapper
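FFmpeg is also a natural place for the optional speed-up. Its atempo filter changes playback speed without shifting pitch. The FFmpeg documentation notes that tempo values above 2.0 skip samples rather than blending them, and suggests chaining several atempo instances instead. The sketch below builds such a chain. buildAtempoChain is a hypothetical helper. The commented fluent-ffmpeg call shows how the filter string could be applied, but it is not run here because it needs the ffmpeg binary.

```javascript
// Hypothetical sketch: build an FFmpeg atempo filter chain for a given speed factor.
// Each stage is kept at <= 2.0 so FFmpeg blends samples instead of skipping them.
function buildAtempoChain(speed) {
  if (!(speed >= 0.5)) {
    throw new RangeError('speed must be at least 0.5');
  }
  const stages = [];
  let remaining = speed;
  // Peel off factors of 2.0 until the rest fits in a single stage.
  while (remaining > 2.0) {
    stages.push(2.0);
    remaining /= 2.0;
  }
  stages.push(remaining);
  return stages.map((s) => `atempo=${Number(s.toFixed(4))}`).join(',');
}

// Applying it with fluent-ffmpeg (requires the ffmpeg binary; not executed here):
// const ffmpeg = require('fluent-ffmpeg');
// ffmpeg('input.mp3')
//   .audioFilters(buildAtempoChain(2.5))
//   .on('end', () => console.log('done'))
//   .on('error', (err) => console.error(err))
//   .save('output.mp3');
```

For 2.5x this produces atempo=2,atempo=1.25, whose product is the requested 2.5.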
- gpt-4o-transcribe - Latest model, best quality, $0.006/minute
- whisper-1 - Previous generation, $0.006/minute
- gpt-4o-transcribe-diarize - Automatic speaker identification
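With per-minute pricing, transcription cost scales linearly with audio duration. The sketch below uses the two rates listed above. No rate is given for gpt-4o-transcribe-diarize, so it is left out. estimateTranscriptionCost is hypothetical and assumes linear billing with no rounding up to whole minutes. Check OpenAI's pricing page for current rates.

```javascript
// Hypothetical sketch: estimate transcription cost from audio duration.
// Rates are the per-minute prices listed above; they may change over time.
const PRICE_PER_MINUTE_USD = {
  'gpt-4o-transcribe': 0.006,
  'whisper-1': 0.006,
};

function estimateTranscriptionCost(durationSeconds, model = 'gpt-4o-transcribe') {
  const rate = PRICE_PER_MINUTE_USD[model];
  if (rate === undefined) {
    throw new Error(`No price listed for model: ${model}`);
  }
  const minutes = durationSeconds / 60;
  // Round to whole cents for display.
  return Math.round(minutes * rate * 100) / 100;
}
```

At $0.006/minute, a 60-minute recording comes to $0.36.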
- Parallel chunk processing (5 concurrent)
- Dynamic rate limiting (80 RPM)
- Optional audio speed optimization (1x-3x)
- Optional Opus compression for uploads
- Automatic format conversion (OGG, FLAC → MP3)
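An 80 RPM cap can be enforced with a sliding-window limiter. The limiter records when recent requests started and delays new ones while 80 already fall within the last 60 seconds. This is one common technique, not necessarily the app's own "dynamic" strategy. The class name and the injectable clock are assumptions for illustration.

```javascript
// Hypothetical sketch: sliding-window limiter allowing at most maxRequests request
// starts within any windowMs period (80 per 60 s mirrors the 80 RPM above).
class SlidingWindowRateLimiter {
  constructor(maxRequests = 80, windowMs = 60_000, now = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy for testing
    this.timestamps = []; // start times of recent requests, oldest first
  }

  // Milliseconds to wait before another request may start (0 if allowed now).
  delayUntilNextSlot() {
    const current = this.now();
    // Drop timestamps that have left the window.
    while (this.timestamps.length > 0 && current - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) return 0;
    return this.timestamps[0] + this.windowMs - current;
  }

  // Wait until a slot is free, then record the request start.
  // The final check and the push happen without an await in between,
  // so concurrent callers cannot claim the same slot.
  async acquire() {
    let delay = this.delayUntilNextSlot();
    while (delay > 0) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = this.delayUntilNextSlot();
    }
    this.timestamps.push(this.now());
  }
}
```

Each chunk worker would call await limiter.acquire() before sending its request. This pairs with a concurrency cap: the cap bounds requests in flight, while the limiter bounds request starts per minute.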
- API keys stored in system keychain (macOS Keychain/Windows Credential Manager)
- All data stored locally (no cloud sync)
- Chat history encrypted with OS-level encryption
- Get your API key from OpenAI Platform
- Click the key icon in the app header
- Paste your API key and click "Save"
- Key is securely stored in your system keychain
Development requirements:
- Node.js 20+
- OpenAI API key
# Install dependencies
npm install
# Run in development mode
npm start
# Run tests
export OPENAI_API_KEY=your-api-key-here
npm test
# Build for macOS
npm run build:mac
# Build for Windows
npm run build:win
Test files:
- test-ffmpeg.js - FFmpeg infrastructure tests
- test-transcription-service.js - Integration tests for optimizations
Performance comparison:
v1.0.0 (Sequential):
- Processing time: ~320 seconds
- Cost: $0.36
v2.0.0 (Parallel):
- Processing time: ~70 seconds (78% faster)
- Cost: $0.36
v2.0.0 (Parallel + 2.5x Speed):
- Processing time: ~50 seconds
- Cost: $0.27 (25% savings)
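The percentages follow directly from the raw figures. The same formula puts the parallel + 2.5x run at roughly 84% faster than v1.0.0:

$$
\text{time reduction} = 1 - \frac{t_{\text{new}}}{t_{\text{old}}}:\qquad 1 - \frac{70}{320} \approx 0.78, \qquad 1 - \frac{50}{320} \approx 0.84
$$

$$
\text{cost savings} = 1 - \frac{c_{\text{new}}}{c_{\text{old}}} = 1 - \frac{0.27}{0.36} = 0.25
$$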
See CLAUDE.md for comprehensive development documentation including:
- Project architecture
- Backend services structure
- Adding new features
- Agent tools and guardrails
- Testing strategies
Created by Patrick C. Freyer and Alexander Achba
License: MIT
Version 2.0.0 - Major redesign with AI chat, performance optimizations, and comprehensive analysis features
