feat: add language parameter to Whisper transcription API #691

data-bot-coasys wants to merge 7 commits into `dev` from
Conversation
Add an optional language parameter to the transcription stream API, allowing callers to specify the spoken language instead of relying on Whisper's auto-detection (which often fails for non-English languages like German).

Changes:
- ai_service/mod.rs: Add language param, include in model cache key, pass WhisperLanguage to WhisperBuilder
- mutation_resolvers.rs: Thread language param through GraphQL mutation
- AIResolver.ts: Add language arg to schema
- AIClient.ts: Add language param to client method and GraphQL query
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough

An optional `language` parameter is added to the transcription stream API and threaded from the TypeScript client through the GraphQL resolver and AI service to the Whisper model builder.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as TypeScript Client
    participant Resolver as GraphQL Resolver
    participant Service as AI Service
    participant Whisper as Whisper Model
    Client->>Resolver: openTranscriptionStream(modelId, params, language?)
    Resolver->>Service: open_transcription_stream(modelId, params, language?)
    Service->>Service: parse language -> WhisperLanguage or error
    Service->>Service: derive lang_key (or "auto") and include in cache key
    alt language provided
        Service->>Whisper: build pipeline with language
    else
        Service->>Whisper: build pipeline with auto
    end
    Whisper-->>Service: configured pipeline
    Service-->>Resolver: transcription stream endpoint / id
    Resolver-->>Client: transcription stream ID / results
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 1135-1147: The cache key currently uses the raw user input
(lang_key) which can produce duplicate entries; instead parse the input into the
WhisperLanguage enum (using lang.parse::<WhisperLanguage>() as already done),
then use whisper_lang.to_string() as the canonical lowercase cache key when
constructing the cache lookup and storage; update references around
WhisperLanguage, lang.parse, WhisperBuilder::with_language, and the lang_key
usage so the cache uses whisper_lang.to_string() before calling
builder.build().await? to ensure a single canonical key per language.
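The canonicalization the prompt describes (parse the user input first, then use the parsed value's `Display` output as the cache key) can be sketched with a toy stand-in for kalosm's `WhisperLanguage`; the enum, its two codes, and the `cache_key` helper below are illustrative, not the real types:

```rust
use std::fmt;
use std::str::FromStr;

// Toy stand-in for kalosm's WhisperLanguage (hypothetical two-code subset).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    English,
    German,
}

impl FromStr for Lang {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "en" => Ok(Lang::English),
            "de" => Ok(Lang::German),
            other => Err(format!("Unsupported whisper language: {}", other)),
        }
    }
}

impl fmt::Display for Lang {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Lang::English => "en",
            Lang::German => "de",
        })
    }
}

// Canonical cache key: parse first, then Display the *parsed* value, so
// equivalent user inputs collapse to a single key per language.
fn cache_key(model_size: &str, language: Option<&str>) -> Result<String, String> {
    let lang_key = match language {
        Some(raw) => raw.parse::<Lang>()?.to_string(),
        None => "auto".to_string(),
    };
    Ok(format!("{}_{}", model_size, lang_key))
}

fn main() {
    assert_eq!(cache_key("Small", Some("de")).unwrap(), "Small_de");
    assert_eq!(cache_key("Small", None).unwrap(), "Small_auto");
    assert!(cache_key("Small", Some("xx")).is_err());
    println!("ok");
}
```

Unsupported codes fail at parse time, before anything is inserted into the cache, which is why parsing upfront also doubles as input validation.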
Address CodeRabbit review: use parsed WhisperLanguage.to_string() for the model cache key instead of raw user input, preventing duplicate cache entries for equivalent language codes.
HexaField
left a comment
LGTM — clean threading of the optional language parameter through the full stack. Cache key separation by language is the right call since WhisperLanguage is set at build time. Backward compatible, well-documented PR description. 👍
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rust-executor/src/ai_service/mod.rs (1)
1123-1174: ⚠️ Potential issue | 🟠 Major

Tokio mutex held for the entire duration of model download/build.

`shared_whisper_models.lock().await` is acquired at line 1124 and is held across `builder.build().await?` (line 1156), which performs model download and GPU initialization — a potentially minutes-long operation. Because all `open_transcription_stream` calls share one `tokio::sync::Mutex`, concurrent callers for any model size or language are fully serialized behind this single lock until the initial load completes.

The standard fix is a double-checked pattern: check the cache under a brief lock, release the lock, build outside it, then re-acquire and insert with `entry().or_insert_with(...)` to tolerate a concurrent build:

🔧 Proposed double-checked locking fix
```diff
-        let whisper_model = {
-            let mut shared_models = self.shared_whisper_models.lock().await;
-            // Parse language upfront and use canonical form for cache key
-            let whisper_lang = if let Some(ref lang) = language {
-                Some(
-                    lang.parse::<WhisperLanguage>()
-                        .map_err(|_| anyhow::anyhow!("Unsupported whisper language: {}", lang))?,
-                )
-            } else {
-                None
-            };
-            let lang_key = whisper_lang
-                .as_ref()
-                .map(|l| l.to_string())
-                .unwrap_or_else(|| "auto".to_string());
-            let model_key = format!("{:?}_{}", model_size, lang_key);
-
-            if !shared_models.contains_key(&model_key) {
-                log::info!(
-                    "Loading shared Whisper model {} ({:?}, language: {}) ...",
-                    model_id, model_size, lang_key
-                );
-                let mut builder = WhisperBuilder::default()
-                    .with_source(model_size)
-                    .with_device(Self::new_candle_device());
-
-                if let Some(whisper_lang) = whisper_lang {
-                    builder = builder.with_language(Some(whisper_lang));
-                }
-
-                let model = builder.build().await?;
-                log::info!(
-                    "Shared Whisper model {:?} (language: {}) loaded! ...",
-                    model_size, lang_key
-                );
-                shared_models.insert(model_key.clone(), Arc::new(model));
-            } else {
-                log::info!(
-                    "Reusing existing shared Whisper model {:?} (language: {}) for new stream",
-                    model_size, lang_key
-                );
-            }
-
-            // Clone the Arc - this is CHEAP! Just increments a reference count
-            shared_models.get(&model_key).unwrap().clone()
-        };
+        // Parse language and compute cache key *before* touching the lock.
+        let whisper_lang = if let Some(ref lang) = language {
+            Some(
+                lang.parse::<WhisperLanguage>()
+                    .map_err(|_| anyhow::anyhow!("Unsupported whisper language: {}", lang))?,
+            )
+        } else {
+            None
+        };
+        let lang_key = whisper_lang
+            .as_ref()
+            .map(|l| l.to_string())
+            .unwrap_or_else(|| "auto".to_string());
+        let model_key = format!("{:?}_{}", model_size, lang_key);
+
+        // Fast path: return cached model without building.
+        let whisper_model = {
+            let shared_models = self.shared_whisper_models.lock().await;
+            shared_models.get(&model_key).cloned()
+        };
+
+        let whisper_model = match whisper_model {
+            Some(m) => {
+                log::info!(
+                    "Reusing existing shared Whisper model {:?} (language: {}) for new stream",
+                    model_size, lang_key
+                );
+                m
+            }
+            None => {
+                // Build outside the lock – may race with another caller for the
+                // same key, but `or_insert` below ensures only one Arc is
+                // stored; the duplicate is simply dropped.
+                log::info!(
+                    "Loading shared Whisper model {} ({:?}, language: {}) ...",
+                    model_id, model_size, lang_key
+                );
+                let mut builder = WhisperBuilder::default()
+                    .with_source(model_size)
+                    .with_device(Self::new_candle_device());
+                if let Some(wl) = whisper_lang {
+                    builder = builder.with_language(Some(wl));
+                }
+                let new_model = Arc::new(builder.build().await?);
+                log::info!(
+                    "Shared Whisper model {:?} (language: {}) loaded!",
+                    model_size, lang_key
+                );
+                let mut shared_models = self.shared_whisper_models.lock().await;
+                shared_models
+                    .entry(model_key.clone())
+                    .or_insert(new_model)
+                    .clone()
+            }
+        };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust-executor/src/ai_service/mod.rs` around lines 1123 - 1174, The code holds the tokio Mutex shared_whisper_models across an await (builder.build().await?) which serializes all open_transcription_stream callers; fix by applying a double-checked locking pattern: acquire the lock briefly to compute whisper_lang, lang_key and model_key and check/shared_models.contains_key(&model_key) — if present clone and return; otherwise drop the lock, perform the long-running builder.build().await outside the mutex, then re-acquire the mutex and insert the Arc-wrapped model only if still absent (e.g., using entry/or_insert or contains_key check) and finally clone the Arc to return; ensure you reference shared_whisper_models, model_key, WhisperBuilder, and builder.build().await in your changes.
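The double-checked pattern this comment describes can be sketched in synchronous, std-only Rust. This is a toy: the real code uses `tokio::sync::Mutex` with an async `builder.build().await`, while `Model`, `get_or_build`, and the `build` closure here are stand-ins for the Whisper model, the cache lookup, and the expensive build step:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Model = String; // stand-in for the loaded Whisper model

// Double-checked pattern: check under a brief lock, release it, build the
// expensive value outside the lock, then re-lock and insert only if still
// absent. A losing racer simply drops its duplicate build.
fn get_or_build(
    cache: &Mutex<HashMap<String, Arc<Model>>>,
    key: &str,
    build: impl FnOnce() -> Model, // stands in for builder.build().await
) -> Arc<Model> {
    // Fast path: the lock is held only for this lookup.
    if let Some(m) = cache.lock().unwrap().get(key) {
        return m.clone();
    }
    // Slow path: build without holding the lock (may race with another caller).
    let fresh = Arc::new(build());
    let mut guard = cache.lock().unwrap();
    guard.entry(key.to_string()).or_insert(fresh).clone()
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    let a = get_or_build(&cache, "Small_de", || "model".to_string());
    // Second call must hit the cache; the build closure is never invoked.
    let b = get_or_build(&cache, "Small_de", || unreachable!("cached"));
    assert!(Arc::ptr_eq(&a, &b));
    println!("ok");
}
```

The key property is that no lock is held across the slow `build` call, so concurrent callers for *other* keys are never blocked behind a model download.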
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 1126-1129: The parsing of WhisperLanguage in the whisper_lang
construction (using lang.parse::<WhisperLanguage>()) is case-sensitive and fails
for uppercase codes like "DE"; normalize the input by converting the referenced
language string (language / lang) to lowercase (e.g., via to_ascii_lowercase()
or to_lowercase()) before calling parse::<WhisperLanguage>(), and keep the same
map_err path (you may still include the original lang string in the error
message if desired).
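The normalization suggested above can be sketched as follows; `parse_lang` is a hypothetical stand-in for the case-sensitive `WhisperLanguage::from_str`, with only two codes for brevity:

```rust
// A case-sensitive parser, standing in for WhisperLanguage::from_str
// (hypothetical subset: only lowercase codes parse).
fn parse_lang(s: &str) -> Result<&'static str, String> {
    match s {
        "en" => Ok("en"),
        "de" => Ok("de"),
        other => Err(format!("Unsupported whisper language: {}", other)),
    }
}

// Normalize before parsing so "DE", "De" and "de" all succeed, while
// keeping the original user input in the error message.
fn parse_lang_normalized(raw: &str) -> Result<&'static str, String> {
    parse_lang(&raw.to_ascii_lowercase())
        .map_err(|_| format!("Unsupported whisper language: {}", raw))
}

fn main() {
    assert_eq!(parse_lang_normalized("DE").unwrap(), "de");
    assert_eq!(parse_lang_normalized("De").unwrap(), "de");
    assert_eq!(parse_lang_normalized("de").unwrap(), "de");
    assert!(parse_lang("DE").is_err()); // the failure mode the comment flags
    assert!(parse_lang_normalized("XX").is_err());
    println!("ok");
}
```

`to_ascii_lowercase()` is sufficient here because ISO 639 codes are ASCII; `to_lowercase()` would also work.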
---
Outside diff comments:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 1123-1174: The code holds the tokio Mutex shared_whisper_models
across an await (builder.build().await?) which serializes all
open_transcription_stream callers; fix by applying a double-checked locking
pattern: acquire the lock briefly to compute whisper_lang, lang_key and
model_key and check/shared_models.contains_key(&model_key) — if present clone
and return; otherwise drop the lock, perform the long-running
builder.build().await outside the mutex, then re-acquire the mutex and insert
the Arc-wrapped model only if still absent (e.g., using entry/or_insert or
contains_key check) and finally clone the Arc to return; ensure you reference
shared_whisper_models, model_key, WhisperBuilder, and builder.build().await in
your changes.
Case-insensitive handling so 'DE', 'De', 'de' all work. Addresses CodeRabbit review comment.
…across await

- Language string is already lowercased before parse (handles uppercase codes like 'DE')
- Refactored mutex usage: check cache briefly, drop lock, build model outside mutex, re-acquire and insert with entry().or_insert() for safe double-checked locking
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
rust-executor/src/ai_service/mod.rs (2)
57-61: ⚠️ Potential issue | 🟡 Minor

Update stale `shared_whisper_models` field comment — key schema changed.

The doc comment still describes the key as `WhisperSource (Tiny/Small/Medium/Large)`, but after this PR the key is `"{size:?}_{lang_key}"` (e.g. `"Small_auto"`, `"Small_de"`).

📝 Proposed fix
```diff
-    /// Shared Whisper models - ONE model per size, shared across ALL streams using that size
-    /// Key = WhisperSource (Tiny/Small/Medium/Large), Value = Arc<Whisper>
-    /// Cloning Arc is cheap (just increments ref count), model weights stay in memory once
-    /// Saves 500MB-1.5GB per stream!
+    /// Shared Whisper models - ONE model per (size, language) config, shared across all streams with that config.
+    /// Key = "{WhisperSource:?}_{lang_key}" (e.g. "Small_auto", "Small_de"), Value = Arc<Whisper>
+    /// Cloning Arc is cheap (just increments ref count), model weights stay in memory once.
+    /// Saves 500MB-1.5GB per stream!
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust-executor/src/ai_service/mod.rs` around lines 57 - 61, The field comment for shared_whisper_models is stale about the key format; update the doc comment on shared_whisper_models to reflect the new key schema which is "{size:?}_{lang_key}" (examples: "Small_auto", "Small_de") instead of WhisperSource names, and note that the value remains Arc<Whisper> and models are shared across streams; keep mention of Arc/Mutex and memory savings but replace the old key description with the new "{size:?}_{lang_key}" pattern.
57-61: ⚠️ Potential issue | 🟡 Minor

Update stale field comment — cache key now includes language.

The doc comment still describes the key as `WhisperSource (Tiny/Small/Medium/Large)`, but after this PR the key is `"{:?}_{lang_key}"` (e.g., `"Small_de"` or `"Small_auto"`).

📝 Proposed fix
```diff
-    /// Shared Whisper models - ONE model per size, shared across ALL streams using that size
-    /// Key = WhisperSource (Tiny/Small/Medium/Large), Value = Arc<Whisper>
-    /// Cloning Arc is cheap (just increments ref count), model weights stay in memory once
-    /// Saves 500MB-1.5GB per stream!
+    /// Shared Whisper models - ONE model per (size, language) config, shared across all streams with that config.
+    /// Key = "{WhisperSource:?}_{lang_key}" (e.g. "Small_auto", "Small_de"), Value = Arc<Whisper>
+    /// Cloning Arc is cheap (just increments ref count), model weights stay in memory once.
+    /// Saves 500MB-1.5GB per stream!
     shared_whisper_models: Arc<Mutex<HashMap<String, Arc<Whisper>>>>,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust-executor/src/ai_service/mod.rs` around lines 57 - 61, The doc comment for shared_whisper_models is stale about the cache key; update the comment above shared_whisper_models (type Arc<Mutex<HashMap<String, Arc<Whisper>>>>) to state that the HashMap key includes the model size and language (format "{:?}_{lang_key}", e.g., "Small_de" or "Small_auto") rather than just WhisperSource, and mention that the map holds one Arc<Whisper> per size+language combination shared across streams.
🧹 Nitpick comments (2)
rust-executor/src/ai_service/mod.rs (2)
1166-1168: Simplify `with_language` call by passing `Option` directly.

The `WhisperBuilder::with_language` method accepts `Option<WhisperLanguage>` directly. Remove the `if let` unwrap/re-wrap pattern and pass `whisper_lang` as-is.

♻️ Proposed simplification

```diff
-                if let Some(whisper_lang) = whisper_lang {
-                    builder = builder.with_language(Some(whisper_lang));
-                }
+                builder = builder.with_language(whisper_lang);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust-executor/src/ai_service/mod.rs` around lines 1166 - 1168, The code currently unwraps and re-wraps whisper_lang before calling WhisperBuilder::with_language; instead, remove the if let block and pass the Option directly to with_language (i.e., call with_language(whisper_lang)) on the builder variable so the Option<WhisperLanguage> is used as intended by WhisperBuilder::with_language.
1166-1168: Optional: simplify the `with_language` call — unwrapping and re-wrapping the `Option` is redundant.

`WhisperBuilder::with_language` accepts `Option<WhisperLanguage>` directly. The current code unnecessarily unwraps and re-wraps the option:

♻️ Proposed simplification

```diff
-                if let Some(whisper_lang) = whisper_lang {
-                    builder = builder.with_language(Some(whisper_lang));
-                }
+                builder = builder.with_language(whisper_lang);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust-executor/src/ai_service/mod.rs` around lines 1166 - 1168, The code redundantly unwraps and re-wraps the Option before calling with_language; instead of the if-let block around whisper_lang, call builder.with_language(whisper_lang) directly (using the existing whisper_lang variable) so WhisperBuilder::with_language receives the Option<WhisperLanguage> as intended and the conditional can be removed.
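The point of the nitpick can be seen with a minimal builder sketch; this `Builder` is a toy, not the real kalosm `WhisperBuilder`, but its setter takes an `Option` in the same way:

```rust
// Toy builder (not the real WhisperBuilder) whose setter takes an Option,
// like WhisperBuilder::with_language.
#[derive(Default, Debug)]
struct Builder {
    language: Option<String>,
}

impl Builder {
    fn with_language(mut self, language: Option<String>) -> Self {
        self.language = language;
        self
    }
}

fn main() {
    let lang: Option<String> = Some("de".to_string());
    // Redundant form the nitpick flags:
    //     if let Some(l) = lang { builder = builder.with_language(Some(l)); }
    // Passing the Option straight through does the same thing:
    let builder = Builder::default().with_language(lang);
    assert_eq!(builder.language.as_deref(), Some("de"));

    // None flows through without any conditional as well.
    let builder = Builder::default().with_language(None);
    assert!(builder.language.is_none());
    println!("ok");
}
```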
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 57-61: The field comment for shared_whisper_models is stale about
the key format; update the doc comment on shared_whisper_models to reflect the
new key schema which is "{size:?}_{lang_key}" (examples: "Small_auto",
"Small_de") instead of WhisperSource names, and note that the value remains
Arc<Whisper> and models are shared across streams; keep mention of Arc/Mutex and
memory savings but replace the old key description with the new
"{size:?}_{lang_key}" pattern.
- Around line 57-61: The doc comment for shared_whisper_models is stale about
the cache key; update the comment above shared_whisper_models (type
Arc<Mutex<HashMap<String, Arc<Whisper>>>>) to state that the HashMap key
includes the model size and language (format "{:?}_{lang_key}", e.g., "Small_de"
or "Small_auto") rather than just WhisperSource, and mention that the map holds
one Arc<Whisper> per size+language combination shared across streams.
---
Nitpick comments:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 1166-1168: The code currently unwraps and re-wraps whisper_lang
before calling WhisperBuilder::with_language; instead, remove the if let block
and pass the Option directly to with_language (i.e., call
with_language(whisper_lang)) on the builder variable so the
Option<WhisperLanguage> is used as intended by WhisperBuilder::with_language.
- Around line 1166-1168: The code redundantly unwraps and re-wraps the Option
before calling with_language; instead of the if-let block around whisper_lang,
call builder.with_language(whisper_lang) directly (using the existing
whisper_lang variable) so WhisperBuilder::with_language receives the
Option<WhisperLanguage> as intended and the conditional can be removed.
…ify with_language call
- Update shared_whisper_models doc comment to reflect actual key format: {size:?}_{lang_key}
- Simplify with_language call by passing Option directly instead of unwrap/rewrap
Summary
Add an optional `language` parameter to the Whisper transcription stream API, allowing callers to specify the spoken language instead of relying on auto-detection.

Motivation
Whisper's automatic language detection often fails for non-English languages, particularly German. By allowing the caller to specify the language explicitly, transcription accuracy improves significantly.
Changes
- `rust-executor/src/ai_service/mod.rs`: Added `language: Option<String>` parameter to `open_transcription_stream()`. Language is included in the model cache key (since `WhisperLanguage` is set at build time) and passed to `WhisperBuilder::with_language()`.
- `rust-executor/src/graphql/mutation_resolvers.rs`: Added `language: Option<String>` parameter to the GraphQL mutation resolver.
- `core/src/ai/AIResolver.ts`: Added optional `language: String` argument to the `aiOpenTranscriptionStream` mutation schema.
- `core/src/ai/AIClient.ts`: Added `language?: string` parameter to `openTranscriptionStream()` and included it in the GraphQL query.

Supported Languages
Kalosm's Whisper supports:
`en` (English), `de` (German), `fr` (French), `es` (Spanish), `it` (Italian), `ja` (Japanese), `ko` (Korean), `zh` (Chinese), `pt` (Portuguese), `nl` (Dutch), `ru` (Russian), `pl` (Polish), `sv` (Swedish), `ar` (Arabic), `tr` (Turkish), `hi` (Hindi), and many more.

Usage
Notes
Summary by CodeRabbit
New Features
Chores