v0.2.0 #3

Draft
chris-colinsky wants to merge 1 commit into main from release/v0.2.0

@chris-colinsky
Member

Summary

This PR implements the Universal Alignment Strategy, decoupling the Transcription Engine (ASR) from the Word-Level Aligner. It transitions Audio Refinery to a modular architecture where ASR backends are hot-swappable while maintaining precise millisecond-level word timestamps via torchaudio MMS-FA.
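The decoupling described above can be pictured as two narrow interfaces joined by a thin pipeline function. This is only a sketch under stated assumptions: `Segment`, `WordTiming`, `AsrBackend`, `Aligner`, `run_pipeline`, and the two stand-in classes are hypothetical names chosen for illustration, not the classes in this PR, and the stand-in aligner spreads words evenly across a segment instead of running forced alignment.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Segment:
    """Segment-level ASR output: text plus coarse start/end times in seconds."""
    text: str
    start: float
    end: float


@dataclass(frozen=True)
class WordTiming:
    """One word with start/end times in seconds."""
    word: str
    start: float
    end: float


class AsrBackend(Protocol):
    """Anything that turns audio into segments (hypothetical interface)."""
    def transcribe(self, audio_path: str) -> list[Segment]: ...


class Aligner(Protocol):
    """Anything that turns segments into word timings (hypothetical interface)."""
    def align(self, audio_path: str, segments: list[Segment]) -> list[WordTiming]: ...


def run_pipeline(audio_path: str, asr: AsrBackend, aligner: Aligner):
    """Run ASR first, then alignment; neither side knows about the other."""
    segments = asr.transcribe(audio_path)
    words = aligner.align(audio_path, segments)
    return segments, words


class FakeAsr:
    """Stand-in backend that returns a fixed segment (illustration only)."""
    def transcribe(self, audio_path: str) -> list[Segment]:
        return [Segment(text="hello world", start=0.0, end=2.0)]


class EvenSplitAligner:
    """Stand-in aligner: divides each segment evenly among its words.
    This is NOT forced alignment; it only exercises the interface."""
    def align(self, audio_path: str, segments: list[Segment]) -> list[WordTiming]:
        out: list[WordTiming] = []
        for seg in segments:
            tokens = seg.text.split()
            if not tokens:
                continue
            step = (seg.end - seg.start) / len(tokens)
            for i, tok in enumerate(tokens):
                out.append(WordTiming(tok, seg.start + i * step, seg.start + (i + 1) * step))
        return out


segments, words = run_pipeline("example.wav", FakeAsr(), EvenSplitAligner())
```

Swapping the ASR backend in this sketch means passing a different object that satisfies `AsrBackend`; the aligner and its output type stay untouched.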

To support this shift on various hardware, this PR introduces a Hybrid VRAM Lifecycle with co-resident and sequential strategies, a VRAM Budget Preflight system to prevent OOM errors, and an interactive audio-refinery plan command for validating and saving pipeline configurations.
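One way to think about the preflight and the fallback between strategies is sketched below. It is an assumption-laden illustration, not the logic in src/vram_preflight.py: `choose_strategy`, `VramBudgetError`, the `headroom_mb` parameter, and the budget figures are all hypothetical, and the rule used here (co-resident if every model fits at once, sequential if the largest single model fits, otherwise fail before loading anything) is a guess at the general shape.

```python
class VramBudgetError(RuntimeError):
    """Raised when no strategy fits in the available VRAM (hypothetical)."""


def choose_strategy(budgets_mb: dict[str, int], free_mb: int, headroom_mb: int = 512) -> str:
    """Pick a lifecycle strategy from per-model VRAM budgets.

    budgets_mb  : estimated peak VRAM per model, in MiB (made-up values below)
    free_mb     : VRAM currently free on the target device, in MiB
    headroom_mb : safety margin kept free for activations and fragmentation
    """
    if not budgets_mb:
        raise ValueError("no model budgets supplied")

    total = sum(budgets_mb.values())
    largest = max(budgets_mb.values())

    # Co-resident: every model stays loaded at the same time.
    if total + headroom_mb <= free_mb:
        return "co-resident"

    # Sequential fallback: load one model, run it, unload, then load the next.
    if largest + headroom_mb <= free_mb:
        return "sequential"

    # Fail up front instead of hitting an out-of-memory error mid-run.
    raise VramBudgetError(
        f"largest model needs {largest} MiB + {headroom_mb} MiB headroom, "
        f"but only {free_mb} MiB is free"
    )


# Illustrative budgets only; real numbers would come from measured profiles.
example_budgets = {"asr": 3000, "aligner": 1200}
roomy = choose_strategy(example_budgets, free_mb=8000)
tight = choose_strategy(example_budgets, free_mb=4000)
```

Running the check before any model loads is what lets a failure show up as a clear message instead of an OOM partway through a batch.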

Changes

  • Refactoring: Reworked src/transcriber.py into an ASR-only, backend-agnostic wrapper for faster-whisper.
  • Forced Alignment: Added src/aligner.py using torchaudio.pipelines.MMS_FA with a 30s chunking strategy.
  • Text Normalization: Added src/text_normalizer.py for deterministic, phoneme-safe text expansion (numbers/currency).
  • Speaker Merging: Created src/merger.py to decouple speaker assignment from the transcription logic.
  • VRAM Management: Introduced src/vram_preflight.py and src/model_budgets.toml for upfront resource validation.
  • Hybrid Lifecycle: Implemented co-resident (batch-optimized) and sequential (VRAM-optimized) strategies with automatic fallback.
  • Interactive Planner: Added audio-refinery plan CLI command to visualize VRAM budgets and estimate runtimes.
  • Run Profiles: Added support for saving and replaying validated configurations via --plan <profile.toml>.
  • BREAKING: Removed words field from TranscriptionResult; all word-level data now resides in the new AlignmentResult.
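For the 30s chunking mentioned under Forced Alignment, a minimal sketch of the bookkeeping is shown below. It assumes fixed-size, non-overlapping chunks at a 16 kHz sample rate; the strategy in src/aligner.py may instead split at segment boundaries or use overlap. `chunk_spans` and `to_global_seconds` are hypothetical helper names.

```python
def chunk_spans(total_samples: int, sample_rate: int = 16_000,
                chunk_seconds: float = 30.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices covering the audio in fixed-size chunks.

    The final chunk is shorter when the audio length is not a multiple
    of the chunk size.
    """
    if total_samples < 0 or sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("total_samples must be >= 0; rate and chunk size must be > 0")
    chunk = int(sample_rate * chunk_seconds)
    return [(start, min(start + chunk, total_samples))
            for start in range(0, total_samples, chunk)]


def to_global_seconds(chunk_start_sample: int, local_seconds: float,
                      sample_rate: int = 16_000) -> float:
    """Convert a timestamp measured inside a chunk into a file-level timestamp."""
    return chunk_start_sample / sample_rate + local_seconds


# 70 seconds of 16 kHz audio: two full 30 s chunks and one 10 s remainder.
spans = chunk_spans(16_000 * 70)
# A word found 1.25 s into the second chunk sits 31.25 s into the file.
word_start = to_global_seconds(spans[1][0], 1.25)
```

Whatever the chunking rule, each chunk's word times need the chunk's offset added back so that the merged result is expressed in file-level time.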

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing behavior to change)
  • Documentation update

Testing

  • Unit tests pass: make test
  • All checks pass: make all-checks
  • New tests added for new functionality
  • Integration tests verified (if applicable — requires GPU)

Checklist

  • Code follows the project style (ruff lint + format clean)
  • CHANGELOG.md updated under [Unreleased]
  • Documentation updated if CLI options or behavior changed
  • No new dependency pins added without explanation

Related Issues

Closes #

@chris-colinsky changed the title from "v0.2.0 specs" to "v0.2.0" on Apr 16, 2026
@chris-colinsky self-assigned this on Apr 16, 2026