feat: add language parameter to Whisper transcription API#691

Open
data-bot-coasys wants to merge 7 commits into dev from feature/whisper-language-param

Conversation

data-bot-coasys (Contributor) commented Feb 22, 2026

Summary

Add an optional language parameter to the Whisper transcription stream API, allowing callers to specify the spoken language instead of relying on auto-detection.

Motivation

Whisper's automatic language detection often fails for non-English languages, particularly German. By allowing the caller to specify the language explicitly, transcription accuracy improves significantly.

Changes

  • rust-executor/src/ai_service/mod.rs: Added language: Option<String> parameter to open_transcription_stream(). Language is included in the model cache key (since WhisperLanguage is set at build time) and passed to WhisperBuilder::with_language().
  • rust-executor/src/graphql/mutation_resolvers.rs: Added language: Option<String> parameter to the GraphQL mutation resolver.
  • core/src/ai/AIResolver.ts: Added optional language: String argument to the aiOpenTranscriptionStream mutation schema.
  • core/src/ai/AIClient.ts: Added language?: string parameter to openTranscriptionStream() and included it in the GraphQL query.

Supported Languages

Kalosm's Whisper supports: en (English), de (German), fr (French), es (Spanish), it (Italian), ja (Japanese), ko (Korean), zh (Chinese), pt (Portuguese), nl (Dutch), ru (Russian), pl (Polish), sv (Swedish), ar (Arabic), tr (Turkish), hi (Hindi), and many more.

Usage

// Auto-detect (existing behavior)
await ai.openTranscriptionStream(modelId, callback);

// Specify German
await ai.openTranscriptionStream(modelId, callback, undefined, 'de');

Notes

  • Fully backward compatible — the parameter is optional and defaults to auto-detection
  • Models with different language settings are cached separately to avoid conflicts
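The second note can be sketched as a small cache-key helper. The key format `{size:?}_{lang_key}` with an `"auto"` fallback mirrors the PR description, but `WhisperSource` here is a local stand-in, not the real kalosm type:

```rust
// Illustrative stand-in for kalosm's WhisperSource; only the Debug
// formatting matters for the key.
#[derive(Debug)]
enum WhisperSource {
    Tiny,
    Small,
}

/// Derive the model cache key "{size:?}_{lang}", defaulting to "auto"
/// when no language is given, so models built with different language
/// settings never collide in the cache.
fn model_cache_key(size: &WhisperSource, language: Option<&str>) -> String {
    let lang_key = language.unwrap_or("auto");
    format!("{:?}_{}", size, lang_key)
}

fn main() {
    assert_eq!(model_cache_key(&WhisperSource::Small, Some("de")), "Small_de");
    assert_eq!(model_cache_key(&WhisperSource::Small, None), "Small_auto");
    assert_eq!(model_cache_key(&WhisperSource::Tiny, Some("en")), "Tiny_en");
    println!("ok");
}
```

Keying on the language as well as the size is what makes the parameter safe to add: a `"Small_de"` model and a `"Small_auto"` model coexist instead of overwriting each other.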

Summary by CodeRabbit

  • New Features

    • Transcription streams now accept an optional language selection to improve speech-to-text accuracy; auto-detection remains the default.
    • Selected language is consistently applied across transcription processing and surfaced in logs and messages to clarify language context.
  • Chores

    • Updated version control ignore rules for repository housekeeping.

Add optional language parameter to the transcription stream API,
allowing callers to specify the spoken language instead of relying
on Whisper's auto-detection (which often fails for non-English
languages like German).

Changes:
- ai_service/mod.rs: Add language param, include in model cache key,
  pass WhisperLanguage to WhisperBuilder
- mutation_resolvers.rs: Thread language param through GraphQL mutation
- AIResolver.ts: Add language arg to schema
- AIClient.ts: Add language param to client method and GraphQL query
coderabbitai bot commented Feb 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

An optional language parameter is added to the transcription stream API and propagated from the TypeScript client through the GraphQL resolver into the Rust AI service, where it is parsed, used for Whisper pipeline configuration, included in the model cache key, and logged.

Changes

  • TypeScript Client (core/src/ai/AIClient.ts): Added optional language?: string to openTranscriptionStream; the GraphQL mutation declaration and variables include $language / language.
  • GraphQL Resolver, TypeScript (core/src/ai/AIResolver.ts): Resolver aiOpenTranscriptionStream signature extended with @Arg("language", { nullable: true }) language?: string.
  • Rust AI Service (rust-executor/src/ai_service/mod.rs): open_transcription_stream gains language: Option<String>; parses it to WhisperLanguage (errors on unsupported codes), derives lang_key (defaulting to "auto"), includes the language in the model cache key, propagates it to the pipeline builder, and updates logging.
  • Rust GraphQL Binding (rust-executor/src/graphql/mutation_resolvers.rs): Mutation resolver signature updated to accept language: Option<String> and pass it to AIService::open_transcription_stream.
  • Repo config (.gitignore): Added a .worktrees/ ignore entry.

Sequence Diagram

sequenceDiagram
    participant Client as TypeScript Client
    participant Resolver as GraphQL Resolver
    participant Service as AI Service
    participant Whisper as Whisper Model

    Client->>Resolver: openTranscriptionStream(modelId, params, language?)
    Resolver->>Service: open_transcription_stream(modelId, params, language?)
    Service->>Service: parse language -> WhisperLanguage or error
    Service->>Service: derive lang_key (or "auto") and include in cache key
    alt language provided
        Service->>Whisper: build pipeline with language
    else
        Service->>Whisper: build pipeline with auto
    end
    Whisper-->>Service: configured pipeline
    Service-->>Resolver: transcription stream endpoint / id
    Resolver-->>Client: transcription stream ID / results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped from TS to Rust with glee,
A tiny language tag followed me,
Whisper tuned its ears to say,
Models cached by tongue and day,
Streams now hum in harmony 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly and concisely describes the main change, adding a language parameter to the Whisper transcription API.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


data-bot-coasys (Contributor, Author) commented:

@coderabbitai review

coderabbitai bot commented Feb 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment
Actionable comments posted: 1

Inline review comment:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 1135-1147: The cache key currently uses the raw user input
(lang_key) which can produce duplicate entries; instead parse the input into the
WhisperLanguage enum (using lang.parse::<WhisperLanguage>() as already done),
then use whisper_lang.to_string() as the canonical lowercase cache key when
constructing the cache lookup and storage; update references around
WhisperLanguage, lang.parse, WhisperBuilder::with_language, and the lang_key
usage so the cache uses whisper_lang.to_string() before calling
builder.build().await? to ensure a single canonical key per language.

Address CodeRabbit review: use parsed WhisperLanguage.to_string()
for the model cache key instead of raw user input, preventing
duplicate cache entries for equivalent language codes.
HexaField previously approved these changes Feb 22, 2026

HexaField (Contributor) left a comment:
LGTM — clean threading of the optional language parameter through the full stack. Cache key separation by language is the right call since WhisperLanguage is set at build time. Backward compatible, well-documented PR description. 👍

coderabbitai bot left a comment
Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rust-executor/src/ai_service/mod.rs (1)

1123-1174: ⚠️ Potential issue | 🟠 Major

Tokio mutex held for the entire duration of model download/build.

shared_whisper_models.lock().await is acquired at line 1124 and is held across builder.build().await? (line 1156), which performs model download and GPU initialization — a potentially minutes-long operation. Because all open_transcription_stream calls share one tokio::sync::Mutex, concurrent callers for any model size or language are fully serialized behind this single lock until the initial load completes.

The standard fix is a double-checked pattern: check the cache under a brief lock, release the lock, build outside it, then re-acquire and insert with entry().or_insert_with(...) to tolerate a concurrent build:

🔧 Proposed double-checked locking fix
-        let whisper_model = {
-            let mut shared_models = self.shared_whisper_models.lock().await;
-            // Parse language upfront and use canonical form for cache key
-            let whisper_lang = if let Some(ref lang) = language {
-                Some(
-                    lang.parse::<WhisperLanguage>()
-                        .map_err(|_| anyhow::anyhow!("Unsupported whisper language: {}", lang))?,
-                )
-            } else {
-                None
-            };
-            let lang_key = whisper_lang
-                .as_ref()
-                .map(|l| l.to_string())
-                .unwrap_or_else(|| "auto".to_string());
-            let model_key = format!("{:?}_{}", model_size, lang_key);
-
-            if !shared_models.contains_key(&model_key) {
-                log::info!(
-                    "Loading shared Whisper model {} ({:?}, language: {}) ...",
-                    model_id, model_size, lang_key
-                );
-                let mut builder = WhisperBuilder::default()
-                    .with_source(model_size)
-                    .with_device(Self::new_candle_device());
-
-                if let Some(whisper_lang) = whisper_lang {
-                    builder = builder.with_language(Some(whisper_lang));
-                }
-
-                let model = builder.build().await?;
-                log::info!(
-                    "Shared Whisper model {:?} (language: {}) loaded! ...",
-                    model_size, lang_key
-                );
-                shared_models.insert(model_key.clone(), Arc::new(model));
-            } else {
-                log::info!(
-                    "Reusing existing shared Whisper model {:?} (language: {}) for new stream",
-                    model_size, lang_key
-                );
-            }
-
-            // Clone the Arc - this is CHEAP! Just increments a reference count
-            shared_models.get(&model_key).unwrap().clone()
-        };
+        // Parse language and compute cache key *before* touching the lock.
+        let whisper_lang = if let Some(ref lang) = language {
+            Some(
+                lang.parse::<WhisperLanguage>()
+                    .map_err(|_| anyhow::anyhow!("Unsupported whisper language: {}", lang))?,
+            )
+        } else {
+            None
+        };
+        let lang_key = whisper_lang
+            .as_ref()
+            .map(|l| l.to_string())
+            .unwrap_or_else(|| "auto".to_string());
+        let model_key = format!("{:?}_{}", model_size, lang_key);
+
+        // Fast path: return cached model without building.
+        let whisper_model = {
+            let shared_models = self.shared_whisper_models.lock().await;
+            shared_models.get(&model_key).cloned()
+        };
+
+        let whisper_model = match whisper_model {
+            Some(m) => {
+                log::info!(
+                    "Reusing existing shared Whisper model {:?} (language: {}) for new stream",
+                    model_size, lang_key
+                );
+                m
+            }
+            None => {
+                // Build outside the lock – may race with another caller for the
+                // same key, but `or_insert_with` below ensures only one Arc is
+                // stored; the duplicate is simply dropped.
+                log::info!(
+                    "Loading shared Whisper model {} ({:?}, language: {}) ...",
+                    model_id, model_size, lang_key
+                );
+                let mut builder = WhisperBuilder::default()
+                    .with_source(model_size)
+                    .with_device(Self::new_candle_device());
+                if let Some(wl) = whisper_lang {
+                    builder = builder.with_language(Some(wl));
+                }
+                let new_model = Arc::new(builder.build().await?);
+                log::info!(
+                    "Shared Whisper model {:?} (language: {}) loaded!",
+                    model_size, lang_key
+                );
+                let mut shared_models = self.shared_whisper_models.lock().await;
+                shared_models
+                    .entry(model_key.clone())
+                    .or_insert(new_model)
+                    .clone()
+            }
+        };
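The pattern in the proposed fix (check under a brief lock, build outside it, re-acquire and insert with entry().or_insert) can be shown in a self-contained form using std::sync instead of tokio; build_model stands in for the expensive WhisperBuilder::build() call and is not the real API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in for the minutes-long model download/build step.
fn build_model(key: &str) -> String {
    format!("model::{key}")
}

fn get_or_build(
    cache: &Mutex<HashMap<String, Arc<String>>>,
    key: &str,
) -> Arc<String> {
    // Fast path: brief lock, clone the Arc if the model is already cached.
    if let Some(m) = cache.lock().unwrap().get(key).cloned() {
        return m;
    }
    // Slow path: build OUTSIDE the lock so concurrent callers for other
    // keys are not serialized behind this build.
    let fresh = Arc::new(build_model(key));
    // Re-acquire and insert; if another caller won the race for the same
    // key, keep theirs and drop the duplicate.
    cache
        .lock()
        .unwrap()
        .entry(key.to_string())
        .or_insert(fresh)
        .clone()
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    let a = get_or_build(&cache, "Small_de");
    let b = get_or_build(&cache, "Small_de");
    // Second call reuses the cached model rather than rebuilding.
    assert!(Arc::ptr_eq(&a, &b));
    println!("ok");
}
```

The trade-off is that two callers racing on the same key may both build the model once, but neither blocks callers for other keys, which is usually the right choice when the build takes minutes.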
Inline review comment:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 1126-1129: The parsing of WhisperLanguage in the whisper_lang
construction (using lang.parse::<WhisperLanguage>()) is case-sensitive and fails
for uppercase codes like "DE"; normalize the input by converting the referenced
language string (language / lang) to lowercase (e.g., via to_ascii_lowercase()
or to_lowercase()) before calling parse::<WhisperLanguage>(), and keep the same
map_err path (you may still include the original lang string in the error
message if desired).
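The suggested normalization can be sketched as follows; Lang is a stand-in for kalosm's WhisperLanguage, which this is not:

```rust
use std::str::FromStr;

// Stand-in language enum with a case-sensitive FromStr, mimicking the
// behavior the review comment describes.
#[derive(Debug, PartialEq)]
enum Lang {
    En,
    De,
}

impl FromStr for Lang {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "en" => Ok(Lang::En),
            "de" => Ok(Lang::De),
            other => Err(format!("Unsupported whisper language: {other}")),
        }
    }
}

/// Lowercase the input before parsing so "DE", "De" and "de" all resolve
/// to the same variant, while keeping the original error path.
fn parse_lang(input: &str) -> Result<Lang, String> {
    input.to_ascii_lowercase().parse::<Lang>()
}

fn main() {
    assert_eq!(parse_lang("DE"), Ok(Lang::De));
    assert_eq!(parse_lang("de"), Ok(Lang::De));
    assert!(parse_lang("xx").is_err());
    println!("ok");
}
```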


Case-insensitive handling so 'DE', 'De', 'de' all work.
Addresses CodeRabbit review comment.
…across await

- Language string is already lowercased before parse (handles uppercase codes like 'DE')
- Refactored mutex usage: check cache briefly, drop lock, build model outside mutex,
  re-acquire and insert with entry().or_insert() for safe double-checked locking
coderabbitai bot left a comment
⚠️ Outside diff range comments (1)
rust-executor/src/ai_service/mod.rs (1)

57-61: ⚠️ Potential issue | 🟡 Minor

Update stale field comment — cache key now includes language.

The doc comment still describes the key as WhisperSource (Tiny/Small/Medium/Large) but after this PR the key is "{:?}_{lang_key}" (e.g., "Small_de" or "Small_auto").

📝 Proposed fix
-    /// Shared Whisper models - ONE model per size, shared across ALL streams using that size
-    /// Key = WhisperSource (Tiny/Small/Medium/Large), Value = Arc<Whisper>
-    /// Cloning Arc is cheap (just increments ref count), model weights stay in memory once
-    /// Saves 500MB-1.5GB per stream!
+    /// Shared Whisper models - ONE model per (size, language) config, shared across all streams with that config.
+    /// Key = "{WhisperSource:?}_{lang_key}" (e.g. "Small_auto", "Small_de"), Value = Arc<Whisper>
+    /// Cloning Arc is cheap (just increments ref count), model weights stay in memory once.
+    /// Saves 500MB-1.5GB per stream!
    shared_whisper_models: Arc<Mutex<HashMap<String, Arc<Whisper>>>>,
🧹 Nitpick comments (1)
rust-executor/src/ai_service/mod.rs (1)

1166-1168: Simplify with_language call by passing Option directly.

The WhisperBuilder::with_language method accepts Option<WhisperLanguage> directly. Remove the if let unwrap/re-wrap pattern and pass whisper_lang as-is.

♻️ Proposed simplification
-                if let Some(whisper_lang) = whisper_lang {
-                    builder = builder.with_language(Some(whisper_lang));
-                }
+                builder = builder.with_language(whisper_lang);

…ify with_language call

- Update shared_whisper_models doc comment to reflect actual key format: {size:?}_{lang_key}
- Simplify with_language call by passing Option directly instead of unwrap/rewrap