feat(transcription): add on-device local transcription via FluidAudio#3
Open
Add on-device speech recognition and speaker diarization using FluidAudio (Parakeet TDT v3 ASR + offline diarization). Users get speaker-attributed transcripts without any API key, account, or internet connection.

Architecture:
- Dual-track independent processing: system audio (Track 0) and mic (Track 1) are transcribed and diarized separately, then merged with speaker attribution via temporal overlap matching
- Tab switcher UI between Local and Soniox providers (like track picker)
- Per-provider transcript files (transcript-local.json, transcript-soniox.json)
- Per-provider speaker names stored in TranscriptDocument.speakers field
- Auto-transcription after recording stops (when models downloaded + local default)
- Legacy transcript.json migrated to transcript-soniox.json on first load

New file:
- LocalTranscriptionService.swift: FluidAudio integration, track extraction, ASR, diarization, temporal overlap matching, speaker merge

Modified files:
- Package.swift: FluidAudio dependency
- TranscriptionService.swift: TranscriptionProvider enum, per-provider storage, speakers field, legacy migration with fallback
- MainWindowView.swift: tab switcher, per-provider state, split transcription methods, speaker name resolution from transcript then metadata
- SettingsView.swift: default provider picker, model download management
- AudioMonitor.swift: auto-transcription hook after recording stops

Existing Soniox transcription is completely untouched (zero changes to API client). Recording, playback, AEC, and export pipelines are unmodified.
Summary
Add on-device speech recognition and speaker diarization using FluidAudio (Parakeet TDT v3 ASR + offline diarization). Users get speaker-attributed transcripts without any API key, account, or internet connection. Soniox cloud transcription remains available as an alternative.
What changed
New file
- LocalTranscriptionService.swift - FluidAudio integration: dual-track extraction via AVAssetReader, ASR on both tracks, diarization on both tracks, temporal overlap matching for speaker assignment, merge with sequential integer speaker IDs

Modified files
- Package.swift - Added FluidAudio v0.12.4 dependency
- TranscriptionService.swift - TranscriptionProvider enum (.local, .soniox), per-provider sidecarURL/load/save, speakers field on TranscriptDocument, legacy transcript.json migration with fallback
- MainWindowView.swift - Segmented tab switcher between Local/Soniox, per-provider transcript and status state, split transcription methods, speaker name resolution (transcript.speakers -> metadata.speakers -> default)
- SettingsView.swift - Default provider picker, model download button with progress
- AudioMonitor.swift - Auto-transcription hook after recording stops (fires when local provider is default and models are downloaded)

Architecture
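The dual-track flow described in this PR (per-track ASR and diarization, temporal overlap matching, then a merge with sequential integer speaker IDs) might look roughly like the sketch below. Every type and function name here (`AsrSegment`, `SpeakerTurn`, `AttributedSegment`, `bestSpeaker`, `mergeTracks`) is hypothetical and is not taken from FluidAudio or from LocalTranscriptionService.swift. The choice to keep speakers from different tracks distinct and to number them in order of first appearance is also an assumption.

```swift
import Foundation

// Hypothetical types for illustration only; not FluidAudio's or the PR's actual API.
struct AsrSegment {
    let text: String
    let start: Double  // seconds
    let end: Double
}

struct SpeakerTurn {
    let speaker: String  // diarizer label within one track
    let start: Double
    let end: Double
}

struct AttributedSegment {
    let text: String
    let start: Double
    let speakerID: Int
}

/// Length in seconds of the intersection of two intervals, or 0 if they are disjoint.
func overlap(_ aStart: Double, _ aEnd: Double, _ bStart: Double, _ bEnd: Double) -> Double {
    return max(0, min(aEnd, bEnd) - max(aStart, bStart))
}

/// Label of the diarization turn that overlaps the segment the most, or nil if none does.
func bestSpeaker(for segment: AsrSegment, in turns: [SpeakerTurn]) -> String? {
    var best: (label: String, seconds: Double)? = nil
    for turn in turns {
        let seconds = overlap(segment.start, segment.end, turn.start, turn.end)
        if seconds > 0 && seconds > (best?.seconds ?? 0) {
            best = (label: turn.speaker, seconds: seconds)
        }
    }
    return best?.label
}

/// Attributes each track on its own, then merges by start time.
/// Assumption: the same diarizer label on two tracks means two different speakers.
func mergeTracks(_ tracks: [(segments: [AsrSegment], turns: [SpeakerTurn])]) -> [AttributedSegment] {
    var labelled: [(segment: AsrSegment, key: String)] = []
    for (trackIndex, track) in tracks.enumerated() {
        for segment in track.segments {
            let label = bestSpeaker(for: segment, in: track.turns) ?? "unknown"
            labelled.append((segment: segment, key: "\(trackIndex):\(label)"))
        }
    }
    labelled.sort { $0.segment.start < $1.segment.start }

    // Sequential integer IDs, assigned in order of first appearance.
    var ids: [String: Int] = [:]
    var result: [AttributedSegment] = []
    for item in labelled {
        let id = ids[item.key] ?? (ids.count + 1)
        ids[item.key] = id
        result.append(AttributedSegment(text: item.segment.text, start: item.segment.start, speakerID: id))
    }
    return result
}
```

Picking the turn with the largest overlap is one simple policy. A real implementation may need a minimum-overlap threshold or word-level timestamps to handle segments that straddle a speaker change.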
What does NOT change
Per-provider transcript storage
Each provider writes its own sidecar file:
- transcript-local.json - local on-device transcription
- transcript-soniox.json - Soniox cloud transcription
- transcript.json is migrated to transcript-soniox.json on first load

Speaker names are stored per-provider in TranscriptDocument.speakers, independent of metadata. Renaming a speaker on one tab does not affect the other.

Smoke test checklist
transcript.json- verify they appear under Soniox tab
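
For reviewers working through the migration item, the per-provider sidecar naming and the legacy transcript.json migration described under "Per-provider transcript storage" could be sketched as follows. This is a minimal illustration, not the code in TranscriptionService.swift: the function names, their signatures, and especially the fallback behaviour (keep reading the legacy file if the move fails) are assumptions, since the PR text only says "migration with fallback".

```swift
import Foundation

// Hypothetical sketch; the enum cases mirror the PR description, the rest is assumed.
enum TranscriptionProvider: String {
    case local
    case soniox
}

/// Sidecar file for a provider inside a recording's directory,
/// e.g. transcript-local.json or transcript-soniox.json.
func sidecarURL(for provider: TranscriptionProvider, in recordingDir: URL) -> URL {
    return recordingDir.appendingPathComponent("transcript-\(provider.rawValue).json")
}

/// Moves a legacy transcript.json to transcript-soniox.json when only the legacy file exists.
/// Returns the URL to read the Soniox transcript from, or nil if there is none.
func migrateLegacyTranscript(in recordingDir: URL, fileManager: FileManager = .default) -> URL? {
    let legacy = recordingDir.appendingPathComponent("transcript.json")
    let target = sidecarURL(for: .soniox, in: recordingDir)

    // Already migrated, or transcribed after this change: nothing to do.
    if fileManager.fileExists(atPath: target.path) { return target }
    guard fileManager.fileExists(atPath: legacy.path) else { return nil }

    do {
        try fileManager.moveItem(at: legacy, to: target)
        return target
    } catch {
        // Assumed fallback: leave the legacy file in place and read from it.
        return legacy
    }
}
```

Calling `migrateLegacyTranscript(in:)` before loading the Soniox tab would make an old recording's transcript appear there, which is the behaviour the checklist item above asks to verify.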