perf: Optimize subtitle segmentation with concurrent chunk processing #1264
alexj11324 wants to merge 8 commits into mengxi-ream:main
Conversation
🦋 Changeset detected — Latest commit: 32c8fb5. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bd2e4bd110
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
This PR targets faster subtitle processing by introducing concurrent chunk segmentation in the AI segmentation pipeline, and improves translation responsiveness by triggering translation ticks as soon as each chunk completes.
Changes:
- Process multiple subtitle chunks in parallel (bounded by `MAX_CONCURRENT_SEGMENTS`) and cap chunk size with `MAX_FRAGMENTS_PER_CHUNK`.
- Add an `onChunkProcessed` callback so the translation coordinator can tick immediately after chunk completion.
- Export `rebalanceToTargetRange` and introduce new subtitle processing constants.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/utils/subtitles/processor/optimizer.ts | Exports rebalanceToTargetRange for reuse outside the optimizer module. |
| src/utils/constants/subtitles.ts | Adds concurrency/chunk-size constants for AI segmentation. |
| src/entrypoints/subtitles.content/universal-adapter.ts | Wires onChunkProcessed callback into the segmentation pipeline to trigger translation ticks. |
| src/entrypoints/subtitles.content/translation-coordinator.ts | Adds triggerTranslationTick() public method to allow external tick triggering. |
| src/entrypoints/subtitles.content/segmentation-pipeline.ts | Refactors the pipeline to pick multiple non-overlapping chunks and process them concurrently, then merge results. |
```ts
this.processedFragments.push(...optimized)
this.processedFragments.sort((a, b) => a.start - b.start)
const rebalanced = rebalanceToTargetRange(segmented, this.getSourceLanguage())
this.mergeFragments(rebalanced, chunk)
```
The AI-success path now applies only rebalanceToTargetRange(segmented, ...) rather than optimizeSubtitles(segmented, ...). This drops the processSubtitles pass (whitespace cleanup, max-length enforcement, sentence boundary logic), which can change segmentation quality/line lengths compared to the previous behavior and the non-AI path. If the goal is to keep behavior consistent, consider running optimizeSubtitles on the AI output (or add a dedicated post-AI normalization step that preserves prior constraints) before merging.
Suggested change:
```diff
-this.mergeFragments(rebalanced, chunk)
+const optimized = optimizeSubtitles(rebalanced, this.getSourceLanguage())
+this.mergeFragments(optimized, chunk)
```
1 issue found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/entrypoints/subtitles.content/segmentation-pipeline.ts">
<violation number="1" location="src/entrypoints/subtitles.content/segmentation-pipeline.ts:113">
P1: Race condition: `mergeFragments()` can drop results from concurrently processed chunks. When multiple chunks run via `Promise.all`, they share `processedFragments` without synchronization. If chunk B completes first and merges its output, then chunk A completes and runs this range-based filter, it will remove any fragment whose `start` falls within chunk A's `[chunkStart, chunkEnd]` range — including fragments that `rebalanceToTargetRange` may have adjusted to start at chunk A's boundary. Since all raw starts are already marked in `segmentedRawStarts` before `Promise.all` begins, the dropped fragments are never retried.
Consider either (1) processing chunks sequentially (simpler), (2) accumulating results and merging once after all concurrent chunks complete, or (3) scoping the filter to only remove fragments that originated from the current chunk's raw input rather than using time-range overlap.</violation>
</file>
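Option (2) from the violation — accumulate per-chunk results and merge once after all concurrent chunks complete — can be sketched roughly like this. The names (`Fragment`, `processChunksConcurrently`, `segmentChunk`) are simplified stand-ins for the pipeline's actual types, not the project's real API:

```typescript
interface Fragment {
  start: number
  end: number
  text: string
}

// Each chunk is segmented in isolation; nothing touches shared state
// until every chunk settles, so no concurrent range-based merge can
// filter away another chunk's output.
async function processChunksConcurrently(
  chunks: Fragment[][],
  segmentChunk: (chunk: Fragment[]) => Promise<Fragment[]>,
): Promise<Fragment[]> {
  // Run every chunk in parallel, but keep each result local to its task.
  const results = await Promise.all(chunks.map((chunk) => segmentChunk(chunk)))
  // Single merge pass after all chunks complete: flatten and sort once.
  return results.flat().sort((a, b) => a.start - b.start)
}
```

This trades a little latency on the final merge for a much simpler correctness argument than per-chunk merging under `Promise.all`.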
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
1 issue found across 3 files (changes from recent commits).
<file name="src/entrypoints/subtitles.content/ui/use-vertical-drag.ts">
<violation number="1" location="src/entrypoints/subtitles.content/ui/use-vertical-drag.ts:230">
P2: Bottom subtitle render offset is capped at 15%, but anchor/drag math still uses full controls height, causing inconsistent coordinate transforms and position jumps for large controls.</violation>
</file>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 503955b205
```ts
rawFragments: this.originalSubtitles,
getVideoElement: () => this.subtitlesScheduler?.getVideoElement() ?? null,
getSourceLanguage: () => this.subtitlesFetcher.getSourceLanguage(),
onChunkProcessed: () => this.translationCoordinator?.triggerTranslationTick(),
```
Suppress chunk-triggered ticks after subtitles are turned off
This callback can still fire after `handleToggleSubtitles(false)` calls `translationCoordinator.stop()`, because in-flight `SegmentationPipeline.processChunk()` tasks are not cancelled and the coordinator instance is still retained. In that case `triggerTranslationTick()` runs post-stop and can enter `translateNearby`, causing translation requests and scheduler state updates while subtitles are disabled. Guard manual ticks behind an active/running flag (or detach this callback when stopping) to prevent background translation work.
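A minimal sketch of the suggested guard, with simplified stand-in names (`GuardedCoordinator`, `ticks`) rather than the project's real coordinator API:

```typescript
// The coordinator records an `active` flag that stop() clears, making
// any late chunk-triggered tick a no-op instead of starting new
// translation work.
class GuardedCoordinator {
  private active = true
  public ticks = 0

  triggerTranslationTick(): void {
    // In-flight segmentation chunks may still call this after stop().
    if (!this.active) return
    this.ticks++ // stand-in for the real translation tick work
  }

  stop(): void {
    this.active = false
  }
}
```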
1 issue found across 1 file (changes from recent commits).
<file name="src/entrypoints/subtitles.content/translation-coordinator.ts">
<violation number="1" location="src/entrypoints/subtitles.content/translation-coordinator.ts:64">
P2: Stopping only blocks new ticks; in-flight async translation can still emit stale `onTranslated`/`onStateChange` after shutdown.</violation>
</file>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c0b6234210
- Process up to 3 chunks concurrently instead of sequentially (3x faster segmentation throughput)
- Remove redundant optimizeSubtitles() on AI-segmented results (AI already produces sentence-level segments)
- Cap fragments per chunk (MAX_FRAGMENTS_PER_CHUNK=50) for faster LLM inference with smaller prompts
- Add eager translation trigger via onChunkProcessed callback (eliminates ~250ms polling delay per chunk)
- Re-read video.currentTime each loop iteration for seek-aware, priority-based chunk selection

https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd
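Bounded concurrency of the kind described above can be sketched with a small worker-pool helper. This is an illustration of the technique, not the pipeline's actual code; `runBounded` and `runTask` are hypothetical names:

```typescript
const MAX_CONCURRENT_SEGMENTS = 3 // mirrors the PR's constant

// At most `limit` tasks run at once: each worker claims the next index
// synchronously (safe in single-threaded JS) before awaiting its task,
// then loops to pick up more work as soon as it finishes.
async function runBounded<T, R>(
  items: T[],
  limit: number,
  runTask: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++ // claim an index before the await point
      results[i] = await runTask(items[i])
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  )
  await Promise.all(workers)
  return results
}
```

Unlike a plain `Promise.all` over every chunk, this caps in-flight LLM requests at `MAX_CONCURRENT_SEGMENTS` while still keeping results in input order.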
Preserve the short-fragment rebalancing step (merges "Okay." + "Yes." into one subtitle line) while still skipping the full processSubtitles() that could damage AI sentence boundaries.
Prevent in-flight segmentation chunks from triggering translation requests after subtitles are disabled via stop().
The 50-fragment cap was splitting sentences across chunk boundaries, causing AI segmentation to produce incomplete fragments with "...". The 60s time window (PROCESS_LOOK_AHEAD_MS) already limits chunk size.
Force-pushed from c0b6234 to f703aa0
Replace the boolean isTranslating mutex with an activeTranslations counter, allowing up to 2 concurrent batches. When a batch completes, immediately trigger the next batch instead of waiting for the next timeupdate event.
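The counter-based scheduling described in that commit can be sketched as follows. `BatchRunner` and `MAX_ACTIVE` are illustrative names under the stated assumptions, not the coordinator's real API:

```typescript
const MAX_ACTIVE = 2 // mirrors the "up to 2 concurrent batches" limit

// A counter replaces a boolean mutex: up to MAX_ACTIVE batches run at
// once, and finishing a batch immediately tries to start the next one
// instead of waiting for an external event like timeupdate.
class BatchRunner {
  private activeTranslations = 0

  constructor(private queue: Array<() => Promise<void>>) {}

  tick(): void {
    while (this.activeTranslations < MAX_ACTIVE && this.queue.length > 0) {
      const batch = this.queue.shift()!
      this.activeTranslations++
      batch().finally(() => {
        this.activeTranslations--
        this.tick() // eagerly start the next batch on completion
      })
    }
  }
}
```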
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4255feae00
Clean >> chevron markers in cleanFragmentsForAi() so they don't pass through to AI output and rendered subtitles.
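An illustrative version of that cleanup step; the real `cleanFragmentsForAi()` may differ, and `stripChevronMarkers` is a hypothetical helper name:

```typescript
// Strip ">>" speaker markers before fragments are sent to the AI, so
// the markers never reach segmentation output or rendered subtitles.
function stripChevronMarkers(text: string): string {
  return text
    .replace(/>>\s*/g, "") // drop ">> " speaker markers
    .replace(/\s{2,}/g, " ") // collapse doubled spaces left behind
    .trim()
}
```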
Contributor trust score: 9/100 — New contributor
taiiiyang
left a comment
This PR is excellent; it enhances the performance of AI sentence segmentation.
```ts
)
this.processedFragments.push(...optimized)
this.processedFragments.sort((a, b) => a.start - b.start)
const rebalanced = rebalanceToTargetRange(segmented, this.getSourceLanguage())
```
Can I ask why the original `optimizeSubtitles` was removed?
Type of Changes
Description
This PR optimizes the subtitle segmentation pipeline to process multiple chunks concurrently instead of sequentially. The changes improve performance by:
Concurrent Chunk Processing: Modified `SegmentationPipeline` to process up to `MAX_CONCURRENT_SEGMENTS` (3) chunks in parallel using `Promise.all()`, reducing overall processing time.

Improved Chunk Management:
- `findNextChunk()` now accepts a `claimed` set to prevent overlapping chunk selection across concurrent operations
- New `findNextChunks()` method to intelligently select multiple non-overlapping chunks prioritized by playback position
- New `MAX_FRAGMENTS_PER_CHUNK` (50) limit to prevent excessively large chunks

Better Synchronization:
- New `onChunkProcessed` callback on `SegmentationPipeline` that triggers translation coordinator ticks after each chunk completes
- New `triggerTranslationTick()` method in `TranslationCoordinator` for external triggering

Code Organization:
- Extracted `mergeFragments()` helper method to reduce duplication
- Exported `rebalanceToTargetRange()` function from the optimizer for reuse
- Added `MAX_CONCURRENT_SEGMENTS` and `MAX_FRAGMENTS_PER_CHUNK` to subtitles constants

Related Issue
N/A
How Has This Been Tested?
Checklist
Additional Information
The concurrent processing approach significantly reduces subtitle processing latency, especially for longer videos with many subtitle fragments. The callback mechanism ensures the translation coordinator stays synchronized with segmentation progress.
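The claimed-set selection described in the PR can be sketched like this. Types and names (`Chunk`, index-based claiming) are simplified stand-ins for the pipeline's internals, which select chunks by playback position rather than a bare counter:

```typescript
interface Chunk {
  startIndex: number
  endIndex: number // exclusive
}

// Each call skips indices already claimed by a concurrent pick, so
// repeated calls return non-overlapping chunks.
function findNextChunk(
  total: number,
  chunkSize: number,
  claimed: Set<number>,
): Chunk | null {
  for (let i = 0; i < total; i++) {
    if (claimed.has(i)) continue
    const end = Math.min(i + chunkSize, total)
    for (let j = i; j < end; j++) claimed.add(j)
    return { startIndex: i, endIndex: end }
  }
  return null // everything is claimed
}

// Select up to `max` non-overlapping chunks by sharing one claimed set.
function findNextChunks(total: number, chunkSize: number, max: number): Chunk[] {
  const claimed = new Set<number>()
  const chunks: Chunk[] = []
  for (let k = 0; k < max; k++) {
    const chunk = findNextChunk(total, chunkSize, claimed)
    if (!chunk) break
    chunks.push(chunk)
  }
  return chunks
}
```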
Summary by cubic
Process subtitle chunks concurrently (up to 3) and prefetch translations with up to 2 parallel batches, triggering a tick after each processed chunk to cut segmentation-to-translation delay. Also fixes YouTube controls drift, stops translations after subtitles are off, and strips `>>` speaker markers.

Refactors
- Process up to `MAX_CONCURRENT_SEGMENTS=3` non-overlapping chunks near playback via `findNextChunks()`, re-reading `video.currentTime` each loop.
- Call `triggerTranslationTick()` after each chunk via `onChunkProcessed` for eager translation.
- Track in-flight batches with `activeTranslations`; immediately queue the next batch when one finishes.
- Run `rebalanceToTargetRange()` on segmented results; skip full `optimizeSubtitles()` and use the extracted `mergeFragments()`.
- Bound chunk look-ahead by `PROCESS_LOOK_AHEAD_MS=60_000`.

Bug Fixes
- Keep bottom subtitles anchored with `bottom: 0 !important` (removed transition).
- Suppress translation ticks after `stop()` so no requests fire when subtitles are off.
- Strip chevron speaker markers (`>>`) during preprocessing so they don't reach segmentation or rendering.

Written for commit 32c8fb5. Summary will update on new commits.