feat(transcription): add on-device local transcription via FluidAudio by tenequm · Pull Request #3 · tenequm/blackbox

tenequm · 2026-03-18T11:55:13Z

Summary

Add on-device speech recognition and speaker diarization using FluidAudio (Parakeet TDT v3 ASR + offline diarization). Users get speaker-attributed transcripts without any API key, account, or internet connection. Soniox cloud transcription remains available as an alternative.

What changed

New file

LocalTranscriptionService.swift - FluidAudio integration: dual-track extraction via AVAssetReader, ASR on both tracks, diarization on both tracks, temporal overlap matching for speaker assignment, merge with sequential integer speaker IDs
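
A minimal sketch of the temporal overlap matching mentioned above, assuming each ASR segment and each diarization turn carries start and end times in seconds. `AsrSegment`, `SpeakerTurn`, and `assignSpeaker` are hypothetical names for illustration; they are not FluidAudio types and may not match what LocalTranscriptionService.swift does.

```swift
import Foundation

// Hypothetical types for illustration; not the FluidAudio API.
struct AsrSegment {
    let start: Double   // seconds
    let end: Double
    let text: String
}

struct SpeakerTurn {
    let start: Double   // seconds
    let end: Double
    let speaker: Int    // per-track speaker ID from diarization
}

/// Length in seconds of the intersection of two intervals (0 when disjoint).
func overlap(_ aStart: Double, _ aEnd: Double, _ bStart: Double, _ bEnd: Double) -> Double {
    return max(0, min(aEnd, bEnd) - max(aStart, bStart))
}

/// Returns the speaker whose turns overlap the segment the most,
/// or nil when no turn overlaps it at all.
func assignSpeaker(to segment: AsrSegment, turns: [SpeakerTurn]) -> Int? {
    var totals: [Int: Double] = [:]
    for turn in turns {
        let shared = overlap(segment.start, segment.end, turn.start, turn.end)
        if shared > 0 { totals[turn.speaker, default: 0] += shared }
    }
    // On equal overlap, prefer the lower speaker ID so the result is deterministic.
    return totals.max(by: { a, b in
        a.value != b.value ? a.value < b.value : a.key > b.key
    })?.key
}
```

Summing overlap per speaker, rather than taking the single longest turn, keeps a segment attributed to one speaker even when diarization splits that speaker's speech into several short turns.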

Modified files

Package.swift - Added FluidAudio v0.12.4 dependency
TranscriptionService.swift - TranscriptionProvider enum (.local, .soniox), per-provider sidecarURL/load/save, speakers field on TranscriptDocument, legacy transcript.json migration with fallback
MainWindowView.swift - Segmented tab switcher between Local/Soniox, per-provider transcript and status state, split transcription methods, speaker name resolution (transcript.speakers -> metadata.speakers -> default)
SettingsView.swift - Default provider picker, model download button with progress
AudioMonitor.swift - Auto-transcription hook after recording stops (fires when local provider is default and models are downloaded)

Architecture

Recording finishes
  -> auto-transcribe (if enabled + models ready)
  -> LocalTranscriptionService:
      1. Extract Track 0 (system) + Track 1 (mic) from M4A
         (prefers audio-processed.m4a when AEC ran)
      2. AVAssetReader per track -> 16kHz mono Float32
      3. ASR each track with AsrManager (Parakeet TDT v3)
      4. Diarize each track with OfflineDiarizerManager
      5. Assign speakers to ASR segments via temporal overlap
      6. Merge: mic speakers first ("You"), then remote ("Speaker N")
      7. Save as transcript-local.json
  -> Single-track fallback if only 1 track exists
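
Step 6 can be sketched as follows, assuming each track's segments already carry a per-track speaker ID. The types and the `merge` function are hypothetical. Labelling every mic speaker "You" and numbering remote speakers from 1 are assumptions about the default names, not confirmed behaviour.

```swift
import Foundation

// Hypothetical type for illustration.
struct AttributedSegment {
    let start: Double   // seconds
    let text: String
    let speaker: Int    // per-track ID before merging, global ID after
}

struct MergedTranscript {
    let segments: [AttributedSegment]
    let speakers: [Int: String]   // global speaker ID -> default display name
}

/// Renumbers speakers sequentially (mic first, then system audio)
/// and interleaves both tracks by start time.
func merge(mic: [AttributedSegment], system: [AttributedSegment]) -> MergedTranscript {
    var nextID = 0
    var remoteCount = 0
    var names: [Int: String] = [:]

    func remap(_ segments: [AttributedSegment], isMic: Bool) -> [AttributedSegment] {
        var localToGlobal: [Int: Int] = [:]
        // Visit per-track IDs in ascending order so numbering is stable.
        for local in Set(segments.map { $0.speaker }).sorted() {
            localToGlobal[local] = nextID
            if isMic {
                names[nextID] = "You"
            } else {
                remoteCount += 1
                names[nextID] = "Speaker \(remoteCount)"
            }
            nextID += 1
        }
        return segments.map {
            AttributedSegment(start: $0.start, text: $0.text,
                              speaker: localToGlobal[$0.speaker]!)
        }
    }

    let micSegments = remap(mic, isMic: true)        // mic speakers take the first IDs
    let remoteSegments = remap(system, isMic: false) // remote speakers follow
    let all = (micSegments + remoteSegments).sorted { $0.start < $1.start }
    return MergedTranscript(segments: all, speakers: names)
}
```

With a single track, the same function can be called with an empty array for the missing side.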

What does NOT change

Soniox API client (zero modifications)
Recording pipeline (AudioRecorder, SCStream, AVAudioEngine)
AEC post-processing
Playback, waveform, track switcher
Transcript renderer (provider-agnostic, takes any TranscriptDocument)
Transcript export
Onboarding, HUD

Per-provider transcript storage

Each provider writes its own sidecar file:

transcript-local.json - local on-device transcription
transcript-soniox.json - Soniox cloud transcription
Legacy transcript.json is migrated to transcript-soniox.json on first load
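
A sketch of how the sidecar naming and legacy migration could look, using only Foundation. The file names come from the list above. `sidecarFileName`, the `sidecarURL(in:)` signature, and `migrateLegacyTranscript` are assumptions for illustration, as is the choice to move rather than copy the legacy file. The fallback path mentioned under Modified files is not shown.

```swift
import Foundation

enum TranscriptionProvider: String {
    case local, soniox

    /// "transcript-local.json" or "transcript-soniox.json".
    var sidecarFileName: String { "transcript-\(rawValue).json" }

    func sidecarURL(in recordingDirectory: URL) -> URL {
        recordingDirectory.appendingPathComponent(sidecarFileName)
    }
}

/// Moves a legacy transcript.json to the Soniox sidecar if that sidecar
/// does not exist yet. Returns true when a file was migrated.
@discardableResult
func migrateLegacyTranscript(in directory: URL,
                             fileManager: FileManager = .default) throws -> Bool {
    let legacy = directory.appendingPathComponent("transcript.json")
    let target = TranscriptionProvider.soniox.sidecarURL(in: directory)
    guard fileManager.fileExists(atPath: legacy.path),
          !fileManager.fileExists(atPath: target.path) else { return false }
    try fileManager.moveItem(at: legacy, to: target)
    return true
}
```

Checking that the target does not already exist makes the migration safe to run on every load.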

Speaker names are stored per-provider in TranscriptDocument.speakers, independent of metadata. Renaming a speaker on one tab does not affect the other.
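
The resolution order (transcript.speakers -> metadata.speakers -> default) can be sketched as a chain of optional lookups. `resolveSpeakerName` is a hypothetical helper, and both the `[Int: String]` dictionary shape and the "Speaker N" default are assumptions.

```swift
/// Resolves a display name for a speaker ID: the per-provider transcript
/// names win, then recording metadata, then a generated default.
func resolveSpeakerName(id: Int,
                        transcriptSpeakers: [Int: String]?,
                        metadataSpeakers: [Int: String]?) -> String {
    return transcriptSpeakers?[id]
        ?? metadataSpeakers?[id]
        ?? "Speaker \(id)"   // assumed default label
}
```

Because each provider's transcript carries its own dictionary, a rename on one tab only changes the first lookup for that provider.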

Smoke test checklist

Open app with existing recordings that have transcript.json - verify they appear under Soniox tab
Local tab: click Transcribe on a recording - verify model download + diarized transcript
Soniox tab: enter API key, Transcribe - verify works exactly as before
Switch between Local/Soniox tabs - verify each shows its own transcript
Rename a speaker on Local tab - verify Soniox tab speakers are unaffected
Settings: change default provider, download models
Record a call with local as default + models ready - verify auto-transcription
Single-track recording - verify transcription still works
Cancel mid-transcription - verify clean state

Add on-device speech recognition and speaker diarization using FluidAudio (Parakeet TDT v3 ASR + offline diarization). Users get speaker-attributed transcripts without any API key, account, or internet connection.

Architecture:
- Dual-track independent processing: system audio (Track 0) and mic (Track 1) are transcribed and diarized separately, then merged with speaker attribution via temporal overlap matching
- Tab switcher UI between Local and Soniox providers (like track picker)
- Per-provider transcript files (transcript-local.json, transcript-soniox.json)
- Per-provider speaker names stored in TranscriptDocument.speakers field
- Auto-transcription after recording stops (when models downloaded + local default)
- Legacy transcript.json migrated to transcript-soniox.json on first load

New file:
- LocalTranscriptionService.swift: FluidAudio integration, track extraction, ASR, diarization, temporal overlap matching, speaker merge

Modified files:
- Package.swift: FluidAudio dependency
- TranscriptionService.swift: TranscriptionProvider enum, per-provider storage, speakers field, legacy migration with fallback
- MainWindowView.swift: tab switcher, per-provider state, split transcription methods, speaker name resolution from transcript then metadata
- SettingsView.swift: default provider picker, model download management
- AudioMonitor.swift: auto-transcription hook after recording stops

Existing Soniox transcription is completely untouched (zero changes to API client). Recording, playback, AEC, and export pipelines are unmodified.

tenequm wants to merge 1 commit into main from feat/local-transcription
