-
Notifications
You must be signed in to change notification settings - Fork 330
Add correct timestamp for segment based on session timer #1359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds optional per-chunk start/end millisecond timestamps through the streaming and transcription pipeline: extends the AudioChunk trait and SimpleAudioChunk, threads timestamps into process_transcription, annotates produced segments when present, and updates the VAD producer to populate chunk timestamps. Changes
Sequence Diagram(s)sequenceDiagram
participant Producer as Producer (VAD / Rodio)
participant Stream as Stream Pipeline
participant Proc as process_transcription
participant Whisper as Whisper Engine
participant Down as Downstream Consumer
Producer->>Stream: AudioChunk(samples, meta, start/end_ts?)
Stream->>Proc: samples, meta, timestamps (Option, Option)
Proc->>Whisper: transcribe(samples, meta)
alt success
Whisper-->>Proc: segments
Proc->>Proc: attach start/end (if provided, ms → s)
Proc-->>Stream: Segment stream (with timestamps when present)
Stream-->>Down: Segments
else error
Whisper-->>Proc: error
Proc-->>Stream: terminate (Ready(None))
Stream-->>Down: end of stream
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: .coderabbit.yaml 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.{js,ts,tsx,rs}⚙️ CodeRabbit Configuration File
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (6)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
crates/whisper-local/src/stream.rs (3)
31-33
: Consider using u64 for timestamps (ms) to avoid arch-dependent limitsUsing
usize
ties max timestamp to pointer width. If sessions can exceed ~49 days (on 32-bit) or you want platform consistency, preferu64
.Example:
- pub start_timestamp_ms: Option<usize>, - pub end_timestamp_ms: Option<usize>, + pub start_timestamp_ms: Option<u64>, + pub end_timestamp_ms: Option<u64>,This change would cascade to the trait and call sites.
192-199
: Unnecessary lifetime parameter onprocess_transcription
The explicit
'a
isn’t required since no borrows escape the function. Removing it simplifies the signature.Apply:
-fn process_transcription<'a>( - whisper: &'a mut Whisper, - samples: &'a [f32], - current_segment_task: &'a mut Option<Pin<Box<dyn Stream<Item = Segment> + Send>>>, +fn process_transcription( + whisper: &mut Whisper, + samples: &[f32], + current_segment_task: &mut Option<Pin<Box<dyn Stream<Item = Segment> + Send>>>, meta: Option<serde_json::Value>, timestamps: (Option<usize>, Option<usize>), ) -> Poll<Option<Segment>> {
211-214
: Guard against invalid timestamps (end < start)If out-of-order timestamps slip in, you’ll create negative durations downstream. Add a simple guard before setting segment times.
- if let (Some(start_ms), Some(end_ms)) = timestamps { - segment.start = start_ms as f64 / 1000.0; - segment.end = end_ms as f64 / 1000.0; - } + if let (Some(start_ms), Some(end_ms)) = timestamps { + if end_ms >= start_ms { + segment.start = start_ms as f64 / 1000.0; + segment.end = end_ms as f64 / 1000.0; + } + }crates/transcribe-whisper-local/src/service/streaming.rs (1)
300-302
: Treat zero timestamps as “unknown” to avoid overriding model times with 0.0sIf VAD emits 0 for unknown times, always wrapping with
Some(...)
forces segments to [0.0, 0.0]. Prefer mapping zeros toNone
and let downstream defaults apply.Apply:
- start_timestamp_ms: Some(chunk.start_timestamp_ms), - end_timestamp_ms: Some(chunk.end_timestamp_ms), + start_timestamp_ms: (chunk.start_timestamp_ms != 0).then_some(chunk.start_timestamp_ms), + end_timestamp_ms: (chunk.end_timestamp_ms != 0).then_some(chunk.end_timestamp_ms),
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
crates/transcribe-whisper-local/src/service/streaming.rs
(1 hunks)crates/vad/src/lib.rs
(1 hunks)crates/whisper-local/src/stream.rs
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}
: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
crates/vad/src/lib.rs
crates/transcribe-whisper-local/src/service/streaming.rs
crates/whisper-local/src/stream.rs
🔇 Additional comments (4)
crates/whisper-local/src/stream.rs (3)
44-50
: LGTM on accessorsReturning
Option<usize>
by value is fine here;Option
isCopy
. No extra clones or lifetimes involved.
131-132
: Explicitly passing (None, None) for Rodio path is appropriateAcknowledged that Rodio-based streams don’t have chunk-timestamps; forwarding
(None, None)
keeps behavior consistent.
172-173
: Timestamp propagation from metadata path: LGTMForwarding
(chunk.start_timestamp_ms(), chunk.end_timestamp_ms())
toprocess_transcription
is the right spot to inject timing metadata.Also applies to: 179-180
crates/vad/src/lib.rs (1)
57-57
: Pattern with..
keeps tests resilient to struct evolutionGood change. It future-proofs the test against added fields (e.g., timestamps) while still extracting
samples
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cubic analysis
No issues found across 3 files. Review in cubic
59f6926
to
f19f856
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/whisper-local/src/stream.rs (2)
23-25
: Trait extension looks good; consider default implementations to avoid semver breakAdding required methods forces all downstream implementers to update. If you want to soften the break, provide default impls that return None.
Apply this diff to add defaults:
pub trait AudioChunk: Send + 'static { fn samples(&self) -> &[f32]; fn meta(&self) -> Option<serde_json::Value>; - fn start_timestamp_ms(&self) -> Option<usize>; - fn end_timestamp_ms(&self) -> Option<usize>; + fn start_timestamp_ms(&self) -> Option<usize> { None } + fn end_timestamp_ms(&self) -> Option<usize> { None } }
31-33
: Prefer u64 for millisecond timestampsOn 32-bit targets, usize may cap at ~49.7 days of ms. Using u64 avoids platform-dependent limits and makes intent explicit.
If you decide to switch, update the trait, struct fields, and call sites accordingly in the same PR to keep types consistent.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
crates/transcribe-whisper-local/src/service/streaming.rs
(1 hunks)crates/whisper-local/src/stream.rs
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/transcribe-whisper-local/src/service/streaming.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{js,ts,tsx,rs}
⚙️ CodeRabbit Configuration File
**/*.{js,ts,tsx,rs}
: 1. No error handling.
2. No unused imports, variables, or functions.
3. For comments, keep it minimal. It should be about "Why", not "What".
Files:
crates/whisper-local/src/stream.rs
🔇 Additional comments (5)
crates/whisper-local/src/stream.rs (5)
44-50
: AudioChunk impl for SimpleAudioChunk — LGTMThe new accessors correctly forward the optional timestamps.
131-132
: Rodio path fallback — LGTMPassing (None, None) preserves existing behavior when timestamps aren’t available.
172-173
: Timestamp threading from chunk — LGTMForwarding start/end from the chunk via the trait accessors is correct and keeps the stream generic over AudioChunk.
179-180
: Passing timestamps through to processing — LGTMThe API threading is clear and localized; good separation of concerns.
197-198
: process_transcription signature change — all call sites updatedBoth invocations in
crates/whisper-local/src/stream.rs
now include the new timestamps parameter, and no other calls toprocess_transcription
were found in the codebase. LGTM.
if let (Some(start_ms), Some(end_ms)) = timestamps { | ||
segment.start = start_ms as f64 / 1000.0; | ||
segment.end = end_ms as f64 / 1000.0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Bug: Overwrites per-segment timing with chunk bounds; offset instead of replacing
This sets every segment’s start/end to the same chunk-level [start, end], discarding per-segment timing from Whisper and making all segments identical in time. Instead, add the chunk start offset to each segment’s existing times and optionally clamp to the chunk end.
Apply this diff to offset/clamp rather than overwrite:
- if let (Some(start_ms), Some(end_ms)) = timestamps {
- segment.start = start_ms as f64 / 1000.0;
- segment.end = end_ms as f64 / 1000.0;
- }
+ // Add chunk start offset; keep model-provided intra-chunk timing.
+ if let (Some(start_ms), _) = timestamps {
+ let offset_s = start_ms as f64 / 1000.0;
+ segment.start += offset_s;
+ segment.end += offset_s;
+ }
+ // Optionally clamp segment end to the chunk end if provided.
+ if let (_, Some(end_ms)) = timestamps {
+ let end_limit_s = end_ms as f64 / 1000.0;
+ if segment.end > end_limit_s {
+ segment.end = end_limit_s;
+ }
+ }
To double-check expectations, confirm how Segment.start/end are used downstream and whether they currently assume absolute or relative times:
🏁 Script executed:
#!/bin/bash
# Locate Segment definition and usages of start/end to infer expected semantics.
rg -n -C3 -P 'struct\s+Segment\b' --type=rust
rg -n -C3 -P '\bSegment\b' --type=rust | rg -n -C1 -P '\bstart\b|\bend\b'
Length of output: 3201
Bug: Preserve per-segment timing by offsetting, not overwriting
The current code at crates/whisper-local/src/stream.rs (lines 211–214) replaces every segment’s times with the enclosing chunk’s bounds, discarding the model’s intra-chunk timestamps. Instead, you should add the chunk start offset to each segment’s existing start/end and clamp the end to the chunk boundary if provided.
• Location: crates/whisper-local/src/stream.rs:211–214
Apply this patch:
- if let (Some(start_ms), Some(end_ms)) = timestamps {
- segment.start = start_ms as f64 / 1000.0;
- segment.end = end_ms as f64 / 1000.0;
- }
+ // Add chunk start offset; keep model-provided intra-chunk timing.
+ if let (Some(start_ms), _) = timestamps {
+ let offset_s = start_ms as f64 / 1000.0;
+ segment.start += offset_s;
+ segment.end += offset_s;
+ }
+ // Clamp segment end to the chunk end if provided.
+ if let (_, Some(end_ms)) = timestamps {
+ let end_limit_s = end_ms as f64 / 1000.0;
+ if segment.end > end_limit_s {
+ segment.end = end_limit_s;
+ }
+ }
This ensures each segment retains its relative timing from Whisper while mapping to absolute positions in the full audio.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if let (Some(start_ms), Some(end_ms)) = timestamps { | |
segment.start = start_ms as f64 / 1000.0; | |
segment.end = end_ms as f64 / 1000.0; | |
} | |
// Add chunk start offset; keep model-provided intra-chunk timing. | |
if let (Some(start_ms), _) = timestamps { | |
let offset_s = start_ms as f64 / 1000.0; | |
segment.start += offset_s; | |
segment.end += offset_s; | |
} | |
// Clamp segment end to the chunk end if provided. | |
if let (_, Some(end_ms)) = timestamps { | |
let end_limit_s = end_ms as f64 / 1000.0; | |
if segment.end > end_limit_s { | |
segment.end = end_limit_s; | |
} | |
} |
🤖 Prompt for AI Agents
In crates/whisper-local/src/stream.rs around lines 211 to 214, the code
currently overwrites each segment’s start/end with the chunk bounds; instead,
add the chunk start offset to the segment’s existing start/end (preserving the
model’s intra-chunk timing), convert ms->seconds consistently, and if an
enclosing chunk end is provided clamp the adjusted segment.end to that chunk end
rather than overwriting it; ensure you only apply the offset when the segment
has timestamps (Some) and leave segments without timestamps unchanged.
f19f856
to
1c6aa58
Compare
Summary by cubic
Adds session-based start/end timestamps to audio chunks and uses them to set segment start/end during transcription. This fixes inaccurate segment timing and falls back when timestamps aren’t provided.
New Features
Migration