fix(transcribe): auto-fallback to CPU + int8 when CUDA is unavailable #21
Merged
Instead of raising ValueError when the requested CUDA device is not present, automatically fall back to CPU and downgrade compute_type from float16 to int8 (float16 is unsupported on CPU). Also indicate whether CPU is forced or a fallback in the model-loading print message.
Follow-up to #19 (cherry-picked) addressing review feedback:
- __post_init__ now emits a second warning when compute_type is flipped from float16 to int8 because the device fell back to CPU. Previously the user only saw the device fallback message; the compute_type change was silent.
- TranscriptionConfig gains an internal _device_auto_fallback flag set when device is auto-flipped to cpu. _load_whisperx_asr_model reads the flag instead of re-sniffing torch at print time, so the "(forced)" vs "(fallback — no GPU)" annotation is accurate even when the user explicitly passes --device cpu on a no-GPU machine.
- Removed dead conditional `fallback = "cpu" if value == "cuda" else "cpu"`.
- tests/test_transcribe.py: rewrote the two raise-expecting tests (test_invalid_torch_device_{cuda,mps}_raises) to assert the new fallback behavior, and added three tests covering the compute_type warning, the no-spurious-warning case when compute_type is already int8, and that explicit --device cpu does not set _device_auto_fallback.
- CHANGELOG: v0.7.1 entry crediting @fadenb.
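For readers who have not opened the diff, the behavior described above could look roughly like the sketch below. It is not the merged code: the logger name, the warning wording, the `cuda_available` helper, the `_cuda_available` test seam, and the `cpu_annotation` helper are all hypothetical, and the `torch_device`/`mps` path is left out. Only the field names, the `_device_auto_fallback` flag, and the two annotation strings come from this PR.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("meet.transcribe")  # hypothetical logger name


def cuda_available() -> bool:
    """Hypothetical probe: ask torch if it is installed, else report no GPU."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


@dataclass
class TranscriptionConfig:
    device: str = "cuda"
    compute_type: str = "float16"
    # Set only when __post_init__ itself flips the device to cpu.
    _device_auto_fallback: bool = field(default=False, init=False, repr=False)

    # Unannotated, so the dataclass machinery ignores it; a seam for tests.
    _cuda_available = staticmethod(cuda_available)

    def __post_init__(self) -> None:
        if self.device == "cuda" and not self._cuda_available():
            # First warning: the device flip (wording is illustrative).
            logger.warning("CUDA requested but unavailable; falling back to CPU.")
            self.device = "cpu"
            self._device_auto_fallback = True
            if self.compute_type == "float16":
                # Second warning: the compute_type downgrade is no longer silent.
                logger.warning("float16 is unsupported on CPU; using int8.")
                self.compute_type = "int8"


def cpu_annotation(config: TranscriptionConfig) -> str:
    """Pick the load-line label from the flag instead of re-checking torch."""
    if config.device != "cpu":
        return ""
    return "(fallback — no GPU)" if config._device_auto_fallback else "(forced)"
```

Because the label is derived from the flag, an explicit `--device cpu` on a machine without a GPU yields `(forced)`, whereas re-probing torch at print time would have reported a fallback.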
Supersedes #19. Picks up @fadenb's commit (preserved with original authorship via cherry-pick) and adds the follow-ups from review.
Summary
- `TranscriptionConfig.__post_init__` no longer raises `ValueError` when `device='cuda'` (or `torch_device='cuda'`/`'mps'`) is requested but the accelerator is unavailable. It warns and automatically falls back to `cpu`, downgrading `compute_type` from `float16` → `int8` when the device flips (float16 is unsupported on CPU).
- The model-loading message now indicates whether CPU was forced (`--device cpu`) or auto-selected because no GPU was found.

Changes vs. #19
- Second warning when `compute_type` is downgraded (review ask). Previously the user only saw the device fallback message; the silent `float16` → `int8` flip now emits its own log line.
- New `_device_auto_fallback` flag on `TranscriptionConfig`. `_load_whisperx_asr_model` reads it instead of re-sniffing torch at print time, so the `(forced)` vs `(fallback — no GPU)` annotation is accurate when the user explicitly passes `--device cpu` on a no-GPU machine (previously mislabeled as "fallback").
- Tests in `TestTranscriptionConfig`:
  - `test_invalid_torch_device_cuda_raises`: rewritten to assert fallback + `int8` downgrade + flag set.
  - `test_invalid_torch_device_mps_raises`: rewritten to assert `torch_device` flips but `device`/`compute_type` are untouched.
  - `test_cuda_unavailable_logs_both_warnings`: uses `caplog` to verify both warning lines emit.
  - `test_cuda_unavailable_with_int8_does_not_log_compute_type_change`: guards against a spurious downgrade message when `compute_type=int8` already.
  - `test_explicit_cpu_is_not_marked_as_auto_fallback`: guards the `_device_auto_fallback` semantics that drive the load-line annotation.
- Removed dead conditional: `fallback = "cpu" if value == "cuda" else "cpu"` → `fallback = "cpu"`.
- CHANGELOG: `v0.7.1` entry crediting @fadenb.

Test plan
- `pytest tests/test_transcribe.py -k TranscriptionConfig -v`: 18/18 pass.
- `ruff check meet/transcribe.py meet/cli.py tests/test_transcribe.py tests/test_utils.py`: clean.
- [...] `int8`): preserved.

Closes #19.
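As a rough illustration of how the three new tests listed above might be shaped, here is a self-contained sketch using pytest's `caplog` fixture. The `resolve` function is a hypothetical stand-in for the config logic (the real tests exercise `TranscriptionConfig` directly), and the logger name and message text are assumptions; only the test names are taken from this PR.

```python
import logging

logger = logging.getLogger("fallback_sketch")  # hypothetical logger name


def resolve(device: str, compute_type: str, cuda_ok: bool) -> tuple[str, str, bool]:
    """Stand-in for the config logic: returns (device, compute_type, auto_fallback)."""
    auto_fallback = False
    if device == "cuda" and not cuda_ok:
        logger.warning("device fallback: cuda -> cpu")
        device, auto_fallback = "cpu", True
        if compute_type == "float16":
            logger.warning("compute_type fallback: float16 -> int8")
            compute_type = "int8"
    return device, compute_type, auto_fallback


def test_cuda_unavailable_logs_both_warnings(caplog):
    with caplog.at_level(logging.WARNING, logger="fallback_sketch"):
        assert resolve("cuda", "float16", cuda_ok=False) == ("cpu", "int8", True)
    # One record for the device flip, one for the compute_type downgrade.
    assert len(caplog.records) == 2


def test_cuda_unavailable_with_int8_does_not_log_compute_type_change(caplog):
    with caplog.at_level(logging.WARNING, logger="fallback_sketch"):
        assert resolve("cuda", "int8", cuda_ok=False) == ("cpu", "int8", True)
    # Only the device warning should appear when int8 was requested already.
    assert not any("compute_type" in message for message in caplog.messages)


def test_explicit_cpu_is_not_marked_as_auto_fallback(caplog):
    with caplog.at_level(logging.WARNING, logger="fallback_sketch"):
        assert resolve("cpu", "int8", cuda_ok=False) == ("cpu", "int8", False)
    # An explicit cpu request is not a fallback, so nothing is logged.
    assert caplog.records == []
```

Passing availability in as `cuda_ok` keeps the sketch independent of the host hardware; the actual test suite may patch the torch check instead.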