Conversation

yujonglee
Contributor

@yujonglee yujonglee commented Aug 17, 2025

Summary by cubic

Adds session-based start/end timestamps to audio chunks and uses them to set segment start/end during transcription. This fixes inaccurate segment timing; when a chunk carries no timestamps, segments keep the timing produced by the model.

  • New Features

    • AudioChunk now includes start_timestamp_ms and end_timestamp_ms.
    • SimpleAudioChunk stores these fields; streaming fills them from the session timer.
    • Transcription sets segment.start/end from these timestamps (ms to seconds).
    • VAD test updated to ignore the new fields.
  • Migration

    • If you implement AudioChunk, add start_timestamp_ms and end_timestamp_ms. Return None if not available.
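
As a rough illustration of the migration note, the sketch below shows a downstream implementer that has no timing source and therefore returns None from both new accessors. The trait here is a trimmed stand-in (meta() is left out so the example needs no serde_json), and FileChunk is a hypothetical type, not something in this repository.

```rust
// Trimmed stand-in for the crate's trait; `meta()` is omitted so the
// sketch compiles with the standard library only.
pub trait AudioChunk: Send + 'static {
    fn samples(&self) -> &[f32];
    fn start_timestamp_ms(&self) -> Option<usize>;
    fn end_timestamp_ms(&self) -> Option<usize>;
}

// Hypothetical implementer that has no session timer available.
pub struct FileChunk {
    pub samples: Vec<f32>,
}

impl AudioChunk for FileChunk {
    fn samples(&self) -> &[f32] {
        &self.samples
    }

    // No timing information: returning None leaves segment times untouched.
    fn start_timestamp_ms(&self) -> Option<usize> {
        None
    }

    fn end_timestamp_ms(&self) -> Option<usize> {
        None
    }
}
```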


coderabbitai bot commented Aug 17, 2025

📝 Walkthrough

Walkthrough

Adds optional per-chunk start/end millisecond timestamps through the streaming and transcription pipeline: extends the AudioChunk trait and SimpleAudioChunk, threads timestamps into process_transcription, annotates produced segments when present, and updates the VAD producer to populate chunk timestamps.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Transcription streaming core: crates/whisper-local/src/stream.rs | Added start_timestamp_ms/end_timestamp_ms to AudioChunk trait and SimpleAudioChunk (fields + accessors). Updated process_transcription signature to accept timestamps: (Option<usize>, Option<usize>), propagate timestamps to produced Segments (ms→s) when present, and change transcription error handling to log and terminate the stream (return Poll::Ready(None)). Updated call sites to pass timestamps (rodio: (None,None), metadata path: chunk timestamps). |
| Producer (VAD streaming): crates/transcribe-whisper-local/src/service/streaming.rs | process_vad_stream now sets start_timestamp_ms and end_timestamp_ms on constructed SimpleAudioChunk from the input chunk. No control-flow changes. |

Sequence Diagram(s)

sequenceDiagram
  participant Producer as Producer (VAD / Rodio)
  participant Stream as Stream Pipeline
  participant Proc as process_transcription
  participant Whisper as Whisper Engine
  participant Down as Downstream Consumer

  Producer->>Stream: AudioChunk(samples, meta, start/end_ts?)
  Stream->>Proc: samples, meta, timestamps (Option, Option)
  Proc->>Whisper: transcribe(samples, meta)
  alt success
    Whisper-->>Proc: segments
    Proc->>Proc: attach start/end (if provided, ms → s)
    Proc-->>Stream: Segment stream (with timestamps when present)
    Stream-->>Down: Segments
  else error
    Whisper-->>Proc: error
    Proc-->>Stream: terminate (Ready(None))
    Stream-->>Down: end of stream
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Filter silence #1200 — Modifies SimpleAudioChunk/AudioChunk and streaming logic; likely touches the same timestamp propagation and producer construction code.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f19f856 and 1c6aa58.

📒 Files selected for processing (2)
  • crates/transcribe-whisper-local/src/service/streaming.rs (1 hunks)
  • crates/whisper-local/src/stream.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/transcribe-whisper-local/src/service/streaming.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}

⚙️ CodeRabbit Configuration File

**/*.{js,ts,tsx,rs}: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".

Files:

  • crates/whisper-local/src/stream.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build (macos, x86_64-apple-darwin, macos-latest)
  • GitHub Check: build (macos, aarch64-apple-darwin, macos-latest)
🔇 Additional comments (6)
crates/whisper-local/src/stream.rs (6)

23-25: AudioChunk: addition of start/end timestamp accessors looks good

The API matches the PR intent and keeps timestamps optional for fallback behavior.


31-33: SimpleAudioChunk: timestamp fields align with trait; Default remains valid

Option defaults to None, so the derive(Default) behavior is preserved.


44-50: Accessor implementations are correct and efficient

Returning Option by value leverages Copy semantics; no cloning overhead.


132-132: Rodio path: explicit (None, None) preserves current behavior

Good call to keep timestamps absent for this path.


172-173: Metadata path: correctly plumbs per-chunk timestamps into transcription

Reads from the chunk and forwards as intended.

Also applies to: 179-179


197-197: process_transcription: signature update is coherent with call sites

The tuple parameter keeps the function signature concise without over-structuring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
crates/whisper-local/src/stream.rs (3)

31-33: Consider using u64 for timestamps (ms) to avoid arch-dependent limits

Using usize ties max timestamp to pointer width. If sessions can exceed ~49 days (on 32-bit) or you want platform consistency, prefer u64.

Example:

-    pub start_timestamp_ms: Option<usize>,
-    pub end_timestamp_ms: Option<usize>,
+    pub start_timestamp_ms: Option<u64>,
+    pub end_timestamp_ms: Option<u64>,

This change would cascade to the trait and call sites.
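
The ~49 day figure can be checked with a few lines of arithmetic. The sketch below is only an illustration with hypothetical helper names (max_days, ms_to_seconds); it assumes timestamps are switched to u64 as suggested.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: f64 = 24.0 * 60.0 * 60.0 * 1000.0;

/// How many days a millisecond counter covers before reaching `max_ms`.
pub fn max_days(max_ms: u64) -> f64 {
    max_ms as f64 / MS_PER_DAY
}

/// Converts a millisecond timestamp to seconds, independent of pointer width.
pub fn ms_to_seconds(ms: u64) -> f64 {
    ms as f64 / 1000.0
}
```

Calling max_days(u32::MAX as u64) gives roughly 49.7, which is the limit a 32-bit usize would impose.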


192-199: Unnecessary lifetime parameter on process_transcription

The explicit 'a isn’t required since no borrows escape the function. Removing it simplifies the signature.

Apply:

-fn process_transcription<'a>(
-    whisper: &'a mut Whisper,
-    samples: &'a [f32],
-    current_segment_task: &'a mut Option<Pin<Box<dyn Stream<Item = Segment> + Send>>>,
+fn process_transcription(
+    whisper: &mut Whisper,
+    samples: &[f32],
+    current_segment_task: &mut Option<Pin<Box<dyn Stream<Item = Segment> + Send>>>,
     meta: Option<serde_json::Value>,
     timestamps: (Option<usize>, Option<usize>),
 ) -> Poll<Option<Segment>> {

211-214: Guard against invalid timestamps (end < start)

If out-of-order timestamps slip in, you’ll create negative durations downstream. Add a simple guard before setting segment times.

-                    if let (Some(start_ms), Some(end_ms)) = timestamps {
-                        segment.start = start_ms as f64 / 1000.0;
-                        segment.end = end_ms as f64 / 1000.0;
-                    }
+                    if let (Some(start_ms), Some(end_ms)) = timestamps {
+                        if end_ms >= start_ms {
+                            segment.start = start_ms as f64 / 1000.0;
+                            segment.end = end_ms as f64 / 1000.0;
+                        }
+                    }
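
A standalone version of this guard might look like the sketch below. Segment is a minimal stand-in with only start/end fields, and apply_chunk_bounds is a hypothetical helper name; the real code applies this inline in process_transcription.

```rust
// Minimal stand-in for the crate's Segment; only the timing fields are modeled.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: f64,
    pub end: f64,
}

/// Sets the segment bounds from chunk timestamps (ms to seconds), but only
/// when both are present and correctly ordered.
pub fn apply_chunk_bounds(segment: &mut Segment, timestamps: (Option<usize>, Option<usize>)) {
    if let (Some(start_ms), Some(end_ms)) = timestamps {
        // Out-of-order bounds are ignored so the existing times survive.
        if end_ms >= start_ms {
            segment.start = start_ms as f64 / 1000.0;
            segment.end = end_ms as f64 / 1000.0;
        }
    }
}
```
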
crates/transcribe-whisper-local/src/service/streaming.rs (1)

300-302: Treat zero timestamps as “unknown” to avoid overriding model times with 0.0s

If VAD emits 0 for unknown times, always wrapping with Some(...) forces segments to [0.0, 0.0]. Prefer mapping zeros to None and let downstream defaults apply.

Apply:

-                    start_timestamp_ms: Some(chunk.start_timestamp_ms),
-                    end_timestamp_ms: Some(chunk.end_timestamp_ms),
+                    start_timestamp_ms: (chunk.start_timestamp_ms != 0).then_some(chunk.start_timestamp_ms),
+                    end_timestamp_ms: (chunk.end_timestamp_ms != 0).then_some(chunk.end_timestamp_ms),
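
The bool::then_some mapping used above can be isolated as in the sketch below. It assumes the producer really uses 0 as a sentinel for "unknown"; under this rule a chunk that starts exactly at the beginning of the session would also map to None. The helper name nonzero_ms is hypothetical.

```rust
/// Maps a raw millisecond value to Some(ms), treating 0 as "unknown".
/// Assumption: the producer emits 0 only when it has no timing information.
pub fn nonzero_ms(ms: usize) -> Option<usize> {
    (ms != 0).then_some(ms)
}
```
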
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2ec859c and 59f6926.

📒 Files selected for processing (3)
  • crates/transcribe-whisper-local/src/service/streaming.rs (1 hunks)
  • crates/vad/src/lib.rs (1 hunks)
  • crates/whisper-local/src/stream.rs (6 hunks)
🧰 Additional context used
Files:

  • crates/vad/src/lib.rs
  • crates/transcribe-whisper-local/src/service/streaming.rs
  • crates/whisper-local/src/stream.rs
🔇 Additional comments (4)
crates/whisper-local/src/stream.rs (3)

44-50: LGTM on accessors

Returning Option<usize> by value is fine here; Option is Copy. No extra clones or lifetimes involved.


131-132: Explicitly passing (None, None) for Rodio path is appropriate

Acknowledged that Rodio-based streams don’t have chunk-timestamps; forwarding (None, None) keeps behavior consistent.


172-173: Timestamp propagation from metadata path: LGTM

Forwarding (chunk.start_timestamp_ms(), chunk.end_timestamp_ms()) to process_transcription is the right spot to inject timing metadata.

Also applies to: 179-180

crates/vad/src/lib.rs (1)

57-57: Pattern with .. keeps tests resilient to struct evolution

Good change. It future-proofs the test against added fields (e.g., timestamps) while still extracting samples.
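
To illustrate why the rest pattern keeps the test stable, here is a small sketch. VadChunk is a hypothetical struct shaped like the chunk described in this PR; the actual type in crates/vad may differ.

```rust
// Hypothetical chunk type; the timestamp fields stand in for the new ones.
pub struct VadChunk {
    pub samples: Vec<f32>,
    pub start_timestamp_ms: usize,
    pub end_timestamp_ms: usize,
}

/// Counts samples across chunks while ignoring every other field.
pub fn total_samples(chunks: &[VadChunk]) -> usize {
    let mut total = 0;
    // `..` skips fields not named here, so adding more fields later
    // does not require touching this pattern.
    for VadChunk { samples, .. } in chunks {
        total += samples.len();
    }
    total
}
```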

@cubic-dev-ai cubic-dev-ai bot left a comment

cubic analysis

No issues found across 3 files.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/whisper-local/src/stream.rs (2)

23-25: Trait extension looks good; consider default implementations to avoid semver break

Adding required methods forces all downstream implementers to update. If you want to soften the break, provide default impls that return None.

Apply this diff to add defaults:

 pub trait AudioChunk: Send + 'static {
     fn samples(&self) -> &[f32];
     fn meta(&self) -> Option<serde_json::Value>;
-    fn start_timestamp_ms(&self) -> Option<usize>;
-    fn end_timestamp_ms(&self) -> Option<usize>;
+    fn start_timestamp_ms(&self) -> Option<usize> { None }
+    fn end_timestamp_ms(&self) -> Option<usize> { None }
 }
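
A sketch of how the default implementations behave for existing and new implementers follows. The trait is again a trimmed stand-in without meta(), and LegacyChunk and TimedChunk are hypothetical types used only for illustration.

```rust
// Trimmed stand-in for the trait with the suggested default methods.
pub trait AudioChunk: Send + 'static {
    fn samples(&self) -> &[f32];
    fn start_timestamp_ms(&self) -> Option<usize> {
        None
    }
    fn end_timestamp_ms(&self) -> Option<usize> {
        None
    }
}

// Hypothetical pre-existing implementer: compiles unchanged thanks to defaults.
pub struct LegacyChunk(pub Vec<f32>);

impl AudioChunk for LegacyChunk {
    fn samples(&self) -> &[f32] {
        &self.0
    }
}

// Hypothetical implementer that opts in by overriding both accessors.
pub struct TimedChunk {
    pub samples: Vec<f32>,
    pub start_ms: usize,
    pub end_ms: usize,
}

impl AudioChunk for TimedChunk {
    fn samples(&self) -> &[f32] {
        &self.samples
    }
    fn start_timestamp_ms(&self) -> Option<usize> {
        Some(self.start_ms)
    }
    fn end_timestamp_ms(&self) -> Option<usize> {
        Some(self.end_ms)
    }
}
```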

31-33: Prefer u64 for millisecond timestamps

On 32-bit targets, usize may cap at ~49.7 days of ms. Using u64 avoids platform-dependent limits and makes intent explicit.

If you decide to switch, update the trait, struct fields, and call sites accordingly in the same PR to keep types consistent.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 59f6926 and f19f856.

📒 Files selected for processing (2)
  • crates/transcribe-whisper-local/src/service/streaming.rs (1 hunks)
  • crates/whisper-local/src/stream.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/transcribe-whisper-local/src/service/streaming.rs
🧰 Additional context used
Files:

  • crates/whisper-local/src/stream.rs
🔇 Additional comments (5)
crates/whisper-local/src/stream.rs (5)

44-50: AudioChunk impl for SimpleAudioChunk — LGTM

The new accessors correctly forward the optional timestamps.


131-132: Rodio path fallback — LGTM

Passing (None, None) preserves existing behavior when timestamps aren’t available.


172-173: Timestamp threading from chunk — LGTM

Forwarding start/end from the chunk via the trait accessors is correct and keeps the stream generic over AudioChunk.


179-180: Passing timestamps through to processing — LGTM

The API threading is clear and localized; good separation of concerns.


197-198: process_transcription signature change — all call sites updated

Both invocations in crates/whisper-local/src/stream.rs now include the new timestamps parameter, and no other calls to process_transcription were found in the codebase. LGTM.

Comment on lines +211 to +214
if let (Some(start_ms), Some(end_ms)) = timestamps {
segment.start = start_ms as f64 / 1000.0;
segment.end = end_ms as f64 / 1000.0;
}
💡 Verification agent

🧩 Analysis chain

Bug: Overwrites per-segment timing with chunk bounds; offset instead of replacing

This sets every segment’s start/end to the same chunk-level [start, end], discarding per-segment timing from Whisper and making all segments identical in time. Instead, add the chunk start offset to each segment’s existing times and optionally clamp to the chunk end.

Apply this diff to offset/clamp rather than overwrite:

-                    if let (Some(start_ms), Some(end_ms)) = timestamps {
-                        segment.start = start_ms as f64 / 1000.0;
-                        segment.end = end_ms as f64 / 1000.0;
-                    }
+                    // Add chunk start offset; keep model-provided intra-chunk timing.
+                    if let (Some(start_ms), _) = timestamps {
+                        let offset_s = start_ms as f64 / 1000.0;
+                        segment.start += offset_s;
+                        segment.end += offset_s;
+                    }
+                    // Optionally clamp segment end to the chunk end if provided.
+                    if let (_, Some(end_ms)) = timestamps {
+                        let end_limit_s = end_ms as f64 / 1000.0;
+                        if segment.end > end_limit_s {
+                            segment.end = end_limit_s;
+                        }
+                    }

To double-check expectations, confirm how Segment.start/end are used downstream and whether they currently assume absolute or relative times:


🏁 Script executed:

#!/bin/bash
# Locate Segment definition and usages of start/end to infer expected semantics.
rg -n -C3 -P 'struct\s+Segment\b' --type=rust
rg -n -C3 -P '\bSegment\b' --type=rust | rg -n -C1 -P '\bstart\b|\bend\b'

Length of output: 3201


Bug: Preserve per-segment timing by offsetting, not overwriting

The current code at crates/whisper-local/src/stream.rs (lines 211–214) replaces every segment’s times with the enclosing chunk’s bounds, discarding the model’s intra-chunk timestamps. Instead, you should add the chunk start offset to each segment’s existing start/end and clamp the end to the chunk boundary if provided.

• Location: crates/whisper-local/src/stream.rs:211–214

Apply this patch:

-                    if let (Some(start_ms), Some(end_ms)) = timestamps {
-                        segment.start = start_ms as f64 / 1000.0;
-                        segment.end = end_ms as f64 / 1000.0;
-                    }
+                    // Add chunk start offset; keep model-provided intra-chunk timing.
+                    if let (Some(start_ms), _) = timestamps {
+                        let offset_s = start_ms as f64 / 1000.0;
+                        segment.start += offset_s;
+                        segment.end += offset_s;
+                    }
+                    // Clamp segment end to the chunk end if provided.
+                    if let (_, Some(end_ms)) = timestamps {
+                        let end_limit_s = end_ms as f64 / 1000.0;
+                        if segment.end > end_limit_s {
+                            segment.end = end_limit_s;
+                        }
+                    }

This ensures each segment retains its relative timing from Whisper while mapping to absolute positions in the full audio.
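
The offset-and-clamp approach from the patch above can be tried in isolation with the sketch below. Segment is a minimal stand-in with only start/end, offset_segment is a hypothetical helper name, and the sketch assumes the model reports segment times relative to the start of the chunk.

```rust
// Minimal stand-in for the crate's Segment; only the timing fields are modeled.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start: f64,
    pub end: f64,
}

/// Shifts chunk-relative segment times by the chunk start and clamps the
/// end to the chunk end, when those timestamps are available.
pub fn offset_segment(segment: &mut Segment, timestamps: (Option<usize>, Option<usize>)) {
    // Add chunk start offset; keep model-provided intra-chunk timing.
    if let (Some(start_ms), _) = timestamps {
        let offset_s = start_ms as f64 / 1000.0;
        segment.start += offset_s;
        segment.end += offset_s;
    }
    // Clamp segment end to the chunk end if provided.
    if let (_, Some(end_ms)) = timestamps {
        let end_limit_s = end_ms as f64 / 1000.0;
        segment.end = segment.end.min(end_limit_s);
    }
}
```

For a chunk spanning 10000 ms to 11000 ms, a segment at [0.5, 1.5] within the chunk becomes [10.5, 11.0]: the start is shifted by 10 s and the end is clamped to the chunk boundary.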

🤖 Prompt for AI Agents
In crates/whisper-local/src/stream.rs around lines 211 to 214, the code
currently overwrites each segment’s start/end with the chunk bounds. Instead,
add the chunk start offset to the segment’s existing start/end (preserving the
model’s intra-chunk timing), convert ms to seconds consistently, and, if a
chunk end is provided, clamp the adjusted segment.end to that chunk end rather
than overwriting it. Apply the offset only when the chunk start timestamp is
present (Some), and leave segments unchanged when the chunk carries no timestamps.
