perf: Optimize subtitle segmentation with concurrent chunk processing by alexj11324 · Pull Request #1264 · mengxi-ream/read-frog

alexj11324 · 2026-04-02T04:48:55Z

Type of Changes

⚡ Performance improvement (perf)
♻️ Code refactoring (refactor)

Description

This PR optimizes the subtitle segmentation pipeline to process multiple chunks concurrently instead of sequentially. The changes improve performance by:

Concurrent Chunk Processing: Modified SegmentationPipeline to process up to MAX_CONCURRENT_SEGMENTS (3) chunks in parallel using Promise.all(), reducing overall processing time.
Improved Chunk Management:
- Refactored findNextChunk() to accept a claimed set to prevent overlapping chunk selection across concurrent operations
- Added findNextChunks() method to intelligently select multiple non-overlapping chunks prioritized by playback position
- Added MAX_FRAGMENTS_PER_CHUNK (50) limit to prevent excessively large chunks
Better Synchronization:
- Added onChunkProcessed callback to SegmentationPipeline that triggers translation coordinator ticks after each chunk completes
- Exposed triggerTranslationTick() method in TranslationCoordinator for external triggering
- This ensures translation happens promptly as segments become available
Code Organization:
- Extracted mergeFragments() helper method to reduce duplication
- Exported rebalanceToTargetRange() function from optimizer for reuse
- Added new constants MAX_CONCURRENT_SEGMENTS and MAX_FRAGMENTS_PER_CHUNK to subtitles constants
- Fixed minor formatting inconsistencies (whitespace normalization)

Related Issue

N/A

How Has This Been Tested?

Existing tests pass
Changes maintain backward compatibility with existing subtitle processing logic
Concurrent processing is transparent to consumers of the pipeline

Checklist

I have tested these changes locally
My code follows the code style of this project
My changes do not break existing functionality

Additional Information

The concurrent processing approach significantly reduces subtitle processing latency, especially for longer videos with many subtitle fragments. The callback mechanism ensures the translation coordinator stays synchronized with segmentation progress.

https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

Summary by cubic

Process subtitle chunks concurrently (up to 3) and prefetch translations with up to 2 parallel batches, triggering a tick after each processed chunk to cut segmentation-to-translation delay. Also fixes YouTube controls drift, stops translations after subtitles are off, and strips >> speaker markers.

Refactors
- Process up to MAX_CONCURRENT_SEGMENTS=3 non-overlapping chunks near playback via findNextChunks(), re-reading video.currentTime each loop.
- Trigger triggerTranslationTick() after each chunk via onChunkProcessed for eager translation.
- Allow up to 2 concurrent translation batches via activeTranslations; immediately queue the next batch when one finishes.
- Keep rebalanceToTargetRange() on segmented results; skip full optimizeSubtitles() and use extracted mergeFragments().
- Remove fragment cap to avoid sentence truncation; rely on PROCESS_LOOK_AHEAD_MS=60_000.
Bug Fixes
- Prevent subtitle drift on YouTube controls hover by clamping controls offset to 15%, rAF-throttling mutation updates, and forcing container bottom: 0 !important (removed transition).
- Guard translation ticks after stop() so no requests fire when subtitles are off.
- Strip SDH speaker markers (>>) during preprocessing so they don’t reach segmentation or rendering.

^{Written for commit 32c8fb5. Summary will update on new commits.}

changeset-bot · 2026-04-02T04:49:01Z

🦋 Changeset detected

Latest commit: 32c8fb5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@read-frog/extension	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd2e4bd110

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/entrypoints/subtitles.content/segmentation-pipeline.ts

Copilot

Pull request overview

This PR targets faster subtitle processing by introducing concurrent chunk segmentation in the AI segmentation pipeline, and improves translation responsiveness by triggering translation ticks as soon as each chunk completes.

Changes:

Process multiple subtitle chunks in parallel (bounded by MAX_CONCURRENT_SEGMENTS) and cap chunk size with MAX_FRAGMENTS_PER_CHUNK.
Add an onChunkProcessed callback so the translation coordinator can tick immediately after chunk completion.
Export rebalanceToTargetRange and introduce new subtitle processing constants.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/utils/subtitles/processor/optimizer.ts	Exports `rebalanceToTargetRange` for reuse outside the optimizer module.
src/utils/constants/subtitles.ts	Adds concurrency/chunk-size constants for AI segmentation.
src/entrypoints/subtitles.content/universal-adapter.ts	Wires `onChunkProcessed` callback into the segmentation pipeline to trigger translation ticks.
src/entrypoints/subtitles.content/translation-coordinator.ts	Adds `triggerTranslationTick()` public method to allow external tick triggering.
src/entrypoints/subtitles.content/segmentation-pipeline.ts	Refactors the pipeline to pick multiple non-overlapping chunks and process them concurrently, then merge results.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/entrypoints/subtitles.content/segmentation-pipeline.ts

Copilot · 2026-04-02T04:53:15Z

src/entrypoints/subtitles.content/segmentation-pipeline.ts

-        this.processedFragments.push(...optimized)
-        this.processedFragments.sort((a, b) => a.start - b.start)
+        const rebalanced = rebalanceToTargetRange(segmented, this.getSourceLanguage())
+        this.mergeFragments(rebalanced, chunk)


The AI-success path now applies only rebalanceToTargetRange(segmented, ...) rather than optimizeSubtitles(segmented, ...). This drops the processSubtitles pass (whitespace cleanup, max-length enforcement, sentence boundary logic), which can change segmentation quality/line lengths compared to the previous behavior and the non-AI path. If the goal is to keep behavior consistent, consider running optimizeSubtitles on the AI output (or add a dedicated post-AI normalization step that preserves prior constraints) before merging.

Suggested change

this.mergeFragments(rebalanced, chunk)

const optimized = optimizeSubtitles(rebalanced, this.getSourceLanguage())

this.mergeFragments(optimized, chunk)

src/entrypoints/subtitles.content/translation-coordinator.ts

cubic-dev-ai

1 issue found across 5 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/entrypoints/subtitles.content/segmentation-pipeline.ts">

<violation number="1" location="src/entrypoints/subtitles.content/segmentation-pipeline.ts:113">
P1: Race condition: `mergeFragments()` can drop results from concurrently processed chunks. When multiple chunks run via `Promise.all`, they share `processedFragments` without synchronization. If chunk B completes first and merges its output, then chunk A completes and runs this range-based filter, it will remove any fragment whose `start` falls within chunk A's `[chunkStart, chunkEnd]` range — including fragments that `rebalanceToTargetRange` may have adjusted to start at chunk A's boundary. Since all raw starts are already marked in `segmentedRawStarts` before `Promise.all` begins, the dropped fragments are never retried.

Consider either (1) processing chunks sequentially (simpler), (2) accumulating results and merging once after all concurrent chunks complete, or (3) scoping the filter to only remove fragments that originated from the current chunk's raw input rather than using time-range overlap.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

src/entrypoints/subtitles.content/segmentation-pipeline.ts

cubic-dev-ai

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/entrypoints/subtitles.content/ui/use-vertical-drag.ts">

<violation number="1" location="src/entrypoints/subtitles.content/ui/use-vertical-drag.ts:230">
P2: Bottom subtitle render offset is capped at 15%, but anchor/drag math still uses full controls height, causing inconsistent coordinate transforms and position jumps for large controls.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

src/entrypoints/subtitles.content/ui/use-vertical-drag.ts

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 503955b205

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-02T05:38:23Z

src/entrypoints/subtitles.content/universal-adapter.ts

        rawFragments: this.originalSubtitles,
        getVideoElement: () => this.subtitlesScheduler?.getVideoElement() ?? null,
        getSourceLanguage: () => this.subtitlesFetcher.getSourceLanguage(),
+        onChunkProcessed: () => this.translationCoordinator?.triggerTranslationTick(),


Suppress chunk-triggered ticks after subtitles are turned off

This callback can still fire after handleToggleSubtitles(false) calls translationCoordinator.stop(), because in-flight SegmentationPipeline.processChunk() tasks are not cancelled and the coordinator instance is still retained. In that case triggerTranslationTick() runs post-stop and can enter translateNearby, causing translation requests and scheduler state updates while subtitles are disabled. Guard manual ticks behind an active/running flag (or detach this callback when stopping) to prevent background translation work.

Useful? React with 👍 / 👎.

cubic-dev-ai

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/entrypoints/subtitles.content/translation-coordinator.ts">

<violation number="1" location="src/entrypoints/subtitles.content/translation-coordinator.ts:64">
P2: Stopping only blocks new ticks; in-flight async translation can still emit stale `onTranslated`/`onStateChange` after shutdown.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

src/entrypoints/subtitles.content/translation-coordinator.ts

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0b6234210

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/entrypoints/subtitles.content/segmentation-pipeline.ts

- Process up to 3 chunks concurrently instead of sequentially (3x faster segmentation throughput) - Remove redundant optimizeSubtitles() on AI-segmented results (AI already produces sentence-level segments) - Cap fragments per chunk (MAX_FRAGMENTS_PER_CHUNK=50) for faster LLM inference with smaller prompts - Add eager translation trigger via onChunkProcessed callback (eliminates ~250ms polling delay per chunk) - Re-read video.currentTime each loop iteration for seek-aware priority-based chunk selection https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

Preserve the short-fragment rebalancing step (merges "Okay." + "Yes." into one subtitle line) while still skipping the full processSubtitles() that could damage AI sentence boundaries. https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

…ift fix https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

Prevent in-flight segmentation chunks from triggering translation requests after subtitles are disabled via stop(). https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

The 50-fragment cap was splitting sentences across chunk boundaries, causing AI segmentation to produce incomplete fragments with "...". The 60s time window (PROCESS_LOOK_AHEAD_MS) already limits chunk size. https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

Replace boolean isTranslating mutex with activeTranslations counter, allowing up to 2 concurrent batches. When a batch completes, immediately trigger the next batch instead of waiting for the next timeupdate event. https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4255feae00

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/entrypoints/subtitles.content/segmentation-pipeline.ts

Clean >> chevron markers in cleanFragmentsForAi() so they don't pass through to AI output and rendered subtitles. https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

github-actions · 2026-04-02T22:02:16Z

Contributor trust score

9/100 — New contributor

This score estimates contributor familiarity with mengxi-ream/read-frog using public GitHub signals. It is advisory only and does not block merges automatically.

Outcome

PR: perf: Optimize subtitle segmentation with concurrent chunk processing #1264 — perf: Optimize subtitle segmentation with concurrent chunk processing
Author: @alexj11324
Trust label: contrib-trust:new
Maintainer review: required
Override label: add trust-check:skip to stop future trust automation updates

Score breakdown

Dimension	Score	Signals
Repo familiarity	0/35	commits in repo, merged PRs, reviews
Community standing	4/25	account age, followers, repo role
OSS influence	3/20	stars on owned non-fork repositories
PR track record	2/20	merge rate across resolved PRs in this repo

Signals used

Repo commits: 0 (author commits reachable from the repository default branch)
Repo PR history: merged 0, open 1, closed-unmerged 1
Repo reviews: 0
PR changed lines: 150 (+107 / -43)
Repo permission: read
Followers: 2
Account age: 17 months
Owned non-fork repos considered: max 1, total 2 (alexj11324/claude-skills (1), alexj11324/TSSPIM-Tsunami-and-Storm-Surge-Population-Impact-Model (1), alexj11324/AI-Scientist-v2 (0), alexj11324/BDI_LLM_Formal_Ver (0), alexj11324/11711-HW (0), alexj11324/DataStructure (0))

Policy

Low-score review threshold: < 30
Auto-close: score < 20 and changed lines > 1000
Policy version: v1.1

Updated automatically when the PR changes or when a maintainer reruns the workflow.

taiiiyang

This PR is excellent, it enhances the performance of AI sentence segmentation.

taiiiyang · 2026-04-04T11:26:15Z

src/entrypoints/subtitles.content/segmentation-pipeline.ts

-        )
-        this.processedFragments.push(...optimized)
-        this.processedFragments.sort((a, b) => a.start - b.start)
+        const rebalanced = rebalanceToTargetRange(segmented, this.getSourceLanguage())


Can I know why the original optimizeSubtitles was removed?

Copilot AI review requested due to automatic review settings April 2, 2026 04:48

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 2, 2026

Copilot started reviewing on behalf of alexj11324 April 2, 2026 04:49 View session

mengxi-ream requested a review from taiiiyang April 2, 2026 04:49

chatgpt-codex-connector bot reviewed Apr 2, 2026

View reviewed changes

src/entrypoints/subtitles.content/segmentation-pipeline.ts Show resolved Hide resolved

Copilot AI reviewed Apr 2, 2026

View reviewed changes

cubic-dev-ai bot reviewed Apr 2, 2026

View reviewed changes

src/entrypoints/subtitles.content/segmentation-pipeline.ts Show resolved Hide resolved

cubic-dev-ai bot reviewed Apr 2, 2026

View reviewed changes

src/entrypoints/subtitles.content/ui/use-vertical-drag.ts Show resolved Hide resolved

alexj11324 changed the title ~~Optimize subtitle segmentation with concurrent chunk processing~~ perf: Optimize subtitle segmentation with concurrent chunk processing Apr 2, 2026

github-actions bot added the perf label Apr 2, 2026

chatgpt-codex-connector bot reviewed Apr 2, 2026

View reviewed changes

cubic-dev-ai bot reviewed Apr 2, 2026

View reviewed changes

src/entrypoints/subtitles.content/translation-coordinator.ts Show resolved Hide resolved

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 2, 2026

chatgpt-codex-connector bot reviewed Apr 2, 2026

View reviewed changes

src/entrypoints/subtitles.content/segmentation-pipeline.ts Show resolved Hide resolved

alexj11324 added 6 commits April 2, 2026 06:08

fix: prevent subtitle drift on YouTube controls hover

f9fcd55

https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

chore: add changeset for AI segmentation optimization and subtitle dr…

f36bf65

…ift fix https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

fix: guard translation ticks after coordinator stop

98c9744

Prevent in-flight segmentation chunks from triggering translation requests after subtitles are disabled via stop(). https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

alexj11324 force-pushed the claude/optimize-ai-segmentation-IX9uc branch from c0b6234 to f703aa0 Compare April 2, 2026 06:10

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Apr 2, 2026

chatgpt-codex-connector bot reviewed Apr 2, 2026

View reviewed changes

src/entrypoints/subtitles.content/segmentation-pipeline.ts Show resolved Hide resolved

fix: strip SDH speaker markers before AI segmentation

32c8fb5

Clean >> chevron markers in cleanFragmentsForAi() so they don't pass through to AI output and rendered subtitles. https://claude.ai/code/session_013Ag5Nt2Ko6eNoY4fC5SZnd

github-actions bot added contrib-trust:new PR author trust score is 0-29. needs-maintainer-review Contributor trust automation recommends maintainer review. labels Apr 2, 2026

taiiiyang reviewed Apr 4, 2026

View reviewed changes

alexj11324 closed this by deleting the head repository Apr 7, 2026

alexj11324 mentioned this pull request Apr 8, 2026

perf(subtitles): add concurrent chunk processing for AI segmentation #1294

Open

5 tasks

	this.mergeFragments(rebalanced, chunk)
	const optimized = optimizeSubtitles(rebalanced, this.getSourceLanguage())
	this.mergeFragments(optimized, chunk)

Uh oh!

Conversation

alexj11324 commented Apr 2, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Type of Changes

Description

Related Issue

How Has This Been Tested?

Checklist

Additional Information

Summary by cubic

Uh oh!

changeset-bot bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

github-actions bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contributor trust score

Uh oh!

taiiiyang left a comment

Choose a reason for hiding this comment

Uh oh!

taiiiyang Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alexj11324 commented Apr 2, 2026 •

edited by cubic-dev-ai bot

Loading

changeset-bot bot commented Apr 2, 2026 •

edited

Loading

github-actions bot commented Apr 2, 2026 •

edited

Loading