feat: add MLX-based local LLM post-processing for transcription enrichment #12
Add support for passing context prompts to Whisper models (both the OpenAI API and local WhisperKit) to improve transcription accuracy for custom words and domain-specific terminology.

Changes:
- Add whisperPrompt storage to SettingsManager
- Update OpenAIClient to accept and send the prompt parameter in API requests
- Update LocalWhisperService to pass the prompt to WhisperKit DecodingOptions
- Update TranscriptionService to retrieve and pass the prompt to both providers
- Add UI in the SettingsView General tab for configuring the prompt
  - Text editor with character counter
  - Warning when the prompt exceeds ~800 characters (~224 tokens)

The prompt parameter helps Whisper models better recognize:
- Technical terminology and jargon
- Product/company names
- Custom vocabulary specific to the user's domain

References:
- WhisperKit prompt support: argmaxinc/WhisperKit#370
- OpenAI API prompt parameter: https://platform.openai.com/docs/guides/speech-to-text
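The character counter and ~800-character warning described above can be sketched as a small validator. The type name and the chars-per-token ratio (~3.6 for English text) are illustrative assumptions, not the actual SpeakEasy code:

```swift
// Hypothetical sketch of the prompt-length check behind the Settings UI.
struct WhisperPromptValidator {
    static let warningCharacterLimit = 800  // ~224 tokens

    // True when the prompt is likely to exceed Whisper's prompt budget.
    static func exceedsRecommendedLength(_ prompt: String) -> Bool {
        return prompt.count > warningCharacterLimit
    }

    // Rough token estimate for display next to the character counter,
    // assuming ~3.6 characters per token for English text.
    static func estimatedTokens(for prompt: String) -> Int {
        return Int((Double(prompt.count) / 3.6).rounded())
    }
}
```

A coarse character-based estimate is enough here, since the warning is advisory rather than a hard cutoff.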
Integrate Apple's MLX framework to provide on-device AI post-processing of transcriptions with Qwen 2.5 models. This feature enhances transcriptions by adding proper punctuation, fixing spelling errors, and improving formatting while keeping all processing completely local.

Key Features:
- Native Swift integration with the MLX framework (Apple Silicon optimized)
- Three Qwen 2.5 model sizes: 0.5B, 1.5B, and 3B (4-bit quantized)
- Automatic model download from HuggingFace mlx-community
- Optional enrichment toggle (can be disabled)
- Fallback to the original transcription if enrichment fails
- Zero-cost performance with Apple's Neural Engine

Architecture:
- LLMModels.swift: Enum defining available Qwen model variants
- LocalLLMService.swift: MLX inference service for text enrichment
- LLMModelManager.swift: Download and lifecycle management for models
- Updated SettingsManager with LLM configuration options
- Updated TranscriptionService with post-processing pipeline
- New UI in Settings → General for model selection and management

Model Options:
- Qwen 2.5 (0.5B): ~300MB download, ~1GB RAM, fastest inference
- Qwen 2.5 (1.5B): ~900MB download, ~2GB RAM, balanced (recommended)
- Qwen 2.5 (3B): ~1.8GB download, ~3GB RAM, best quality

Technical Details:
- Uses MLX Swift for native Apple Silicon optimization
- 4-bit quantization for a reduced memory footprint
- Runs on the Neural Engine + GPU for maximum performance
- No external API calls - completely offline
- Temperature-controlled generation (default 0.3)

Dependencies Added:
- ml-explore/mlx-swift: Core MLX framework
- ml-explore/mlx-swift-examples: MLXLLM and utilities

Benefits:
- No API costs or rate limits
- Complete privacy (no data leaves the device)
- Faster than cloud-based solutions
- Works offline
- Optimized for M-series chips

References:
- Apple MLX: https://github.com/ml-explore/mlx-swift
- MLX Community Models: https://huggingface.co/mlx-community
- Apple Research: https://machinelearning.apple.com/research/exploring-llms-mlx-m5
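The three model variants above suggest a catalog enum like the following. The enum name, raw values, and repository identifiers are assumptions based on the commit message, not the actual LLMModels.swift:

```swift
// Hypothetical sketch of the model catalog described in the commit message.
enum QwenModel: String, CaseIterable {
    case small  = "Qwen2.5-0.5B-Instruct-4bit"
    case medium = "Qwen2.5-1.5B-Instruct-4bit"
    case large  = "Qwen2.5-3B-Instruct-4bit"

    // Approximate download size in MB, from the PR description.
    var downloadSizeMB: Int {
        switch self {
        case .small:  return 300
        case .medium: return 900
        case .large:  return 1800
        }
    }

    // HuggingFace mlx-community repository identifier (assumed naming).
    var huggingFaceRepo: String {
        return "mlx-community/\(rawValue)"
    }
}
```

Keeping size and repo metadata on the enum lets the download manager and the Settings picker share a single source of truth.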
The mlx-swift-examples package requires a swift-transformers version that is incompatible with WhisperKit, causing dependency resolution to fail:
- WhisperKit 0.15.0 requires swift-transformers 1.1.2..<1.2.0
- mlx-swift-examples requires swift-transformers 1.0.0..<1.1.0

This fix replaces the MLX-based LLM integration with LLM.swift, which uses llama.cpp directly without the swift-transformers dependency.

Changes:
- Replace mlx-swift and mlx-swift-examples with LLM.swift in Package.swift
- Update LocalLLMService to use the LLM.swift API (llama.cpp backend)
- Update LLMModelManager to download GGUF files directly from HuggingFace
- Update LLMModels with Qwen 2.5 GGUF model URLs (Q4_K_M quantization)
- Add Sendable conformance to the LLMModel enum
- Fix promptTokens handling in LocalWhisperService
- Add missing log variable in TranscriptionService

Technical Details:
- Uses official Qwen GGUF models from HuggingFace
- Models stored in ~/Library/Application Support/SpeakEasy/LLMModels/
- Direct HTTP download with progress tracking
- llama.cpp provides optimized Apple Silicon inference
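The GGUF storage and download scheme described above might look like the sketch below. The helper names and the exact file names are illustrative assumptions; the URL pattern relies on HuggingFace serving raw files under `/resolve/main/`:

```swift
import Foundation

// Hypothetical sketch of where models live and how download URLs are built.
struct GGUFModelLocator {
    // Models are stored under ~/Library/Application Support/SpeakEasy/LLMModels/
    static func localDirectory() -> URL {
        let appSupport = FileManager.default.urls(
            for: .applicationSupportDirectory, in: .userDomainMask)[0]
        return appSupport
            .appendingPathComponent("SpeakEasy")
            .appendingPathComponent("LLMModels")
    }

    // Direct-download URL for a GGUF file hosted on HuggingFace.
    static func downloadURL(repo: String, file: String) -> URL {
        // Force-unwrap is fine in a sketch; inputs are static identifiers.
        return URL(string: "https://huggingface.co/\(repo)/resolve/main/\(file)")!
    }
}
```

A real implementation would pair this with `URLSession` download tasks to get the progress tracking the commit mentions.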
Resolve conflicts with the project rename (OpenVoicy -> SpeakEasy):
- Update Package.swift paths to Sources/SpeakEasy
- Move LLM files to the SpeakEasy folder structure
- Merge LLM settings into SettingsManager.swift
- Merge LLM UI into SettingsView.swift

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add a stricter system prompt to avoid chatty/translating responses
- Use a low temperature (0.1) for more deterministic output
- Limit max tokens to 512 to prevent runaway generation
- Add safeguards: empty output, output too long (3x input), repetition loops
- Add detailed [FLOW] logging to trace transcription processing stages
- Add [LLM] logging to trace the exact input/output of LLM enrichment
- Add detectRepetition() to catch LLM repetition loops (e.g. "I am coming to the party" x100)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
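The safeguards listed above (empty output, 3x-input length cap, repetition loops) can be sketched as follows. The function names and the repetition heuristic are assumptions, not the actual detectRepetition() implementation:

```swift
import Foundation

// Hypothetical sketch: validate LLM output before replacing the transcription.
func isAcceptableEnrichment(input: String, output: String) -> Bool {
    let trimmed = output.trimmingCharacters(in: .whitespacesAndNewlines)
    if trimmed.isEmpty { return false }                  // empty output
    if trimmed.count > input.count * 3 { return false }  // runaway generation
    if detectRepetition(in: trimmed) { return false }    // repetition loop
    return true
}

// Flags text where a short word chunk repeats many times in a row,
// e.g. "I am coming to the party" x100.
func detectRepetition(in text: String, threshold: Int = 4) -> Bool {
    let words = text.split(separator: " ")
    guard words.count >= threshold * 2 else { return false }
    // Try chunk sizes up to 8 words and count consecutive repeats.
    for size in 1...8 {
        guard words.count >= size * threshold else { continue }
        var repeats = 1
        var i = size
        while i + size <= words.count {
            if Array(words[i..<(i + size)]) == Array(words[(i - size)..<i]) {
                repeats += 1
                if repeats >= threshold { return true }
            } else {
                repeats = 1
            }
            i += size
        }
    }
    return false
}
```

On failure, the caller would fall back to the original transcription, matching the fallback behavior the PR describes.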
@claude resume this PR
Claude finished @psg2's task in 2m 20s

PR Review Complete ✓

I've reviewed this PR and the recent improvements. Here's a comprehensive analysis:
Summary

This PR successfully adds on-device LLM post-processing for transcription enrichment using Qwen 2.5 models via llama.cpp (LLM.swift). The implementation is well designed, with good error handling and fallback mechanisms.

Key Accomplishments

✅ Dependency Resolution Fixed

The original MLX-based approach had a critical dependency conflict:
- WhisperKit 0.15.0 requires swift-transformers 1.1.2..<1.2.0
- mlx-swift-examples requires swift-transformers 1.0.0..<1.1.0
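In Package.swift terms, resolving the conflict amounts to a swap like the fragment below; the exact URLs and version pins are illustrative assumptions, not the project's actual manifest:

```swift
// Package.swift (fragment) — illustrative only.
dependencies: [
    // Removed: these pinned swift-transformers to 1.0.0..<1.1.0,
    // which conflicts with WhisperKit 0.15.0 (1.1.2..<1.2.0).
    // .package(url: "https://github.com/ml-explore/mlx-swift", from: "0.18.0"),
    // .package(url: "https://github.com/ml-explore/mlx-swift-examples", branch: "main"),

    // Added: LLM.swift wraps llama.cpp directly and has no
    // swift-transformers dependency, so resolution succeeds.
    .package(url: "https://github.com/eastriverlee/LLM.swift", from: "1.5.0"),
]
```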
Solution: Switched to LLM.swift, which uses llama.cpp directly, avoiding the conflict entirely.

✅ Improved Reliability (commit
Summary
Features
Local LLM Post-Processing
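The post-processing flow summarized here (optional enrichment with fallback to the raw transcription) can be sketched as a small wrapper; the function name and shape are assumptions, with the inference step abstracted as a closure:

```swift
import Foundation

// Hypothetical sketch: enrichment failures or empty model output
// never replace the user's transcription.
func postProcess(_ transcription: String,
                 enrichmentEnabled: Bool,
                 enrich: (String) throws -> String) -> String {
    guard enrichmentEnabled else { return transcription }
    do {
        let enriched = try enrich(transcription)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        // Fall back when the model returns nothing usable.
        return enriched.isEmpty ? transcription : enriched
    } catch {
        // Any inference error falls back to the raw transcription.
        return transcription
    }
}
```

Abstracting the inference call also makes the fallback path easy to unit-test without loading a model.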
Whisper Prompt Support
Architecture
New files:
- LLMModels.swift - Model definitions for Qwen variants
- LocalLLMService.swift - MLX inference service
- LLMModelManager.swift - Download and lifecycle management

Updated files:
- SettingsManager.swift - LLM and prompt settings storage
- TranscriptionService.swift - Post-processing pipeline integration
- SettingsView.swift - UI for model selection and configuration

Dependencies Added
Test plan
🤖 Generated with Claude Code