feat(producer): auto-size chunkSize from maxParallelChunks when undefined by jrusso1020 · Pull Request #939 · heygen-com/hyperframes

jrusso1020 · 2026-05-18T20:30:15Z

What

PR 6.6.1 of the distributed rendering plan. Auto-size chunkSize in plan() when the caller passes undefined so the caller's maxParallelChunks is actually honored.

Why

Previously, plan() defaulted chunkSize to 240 on a ?? DEFAULT_CHUNK_SIZE line, so a 660-frame composition with maxParallelChunks=16 ended up at 3 chunks (ceil(660/240)) regardless of the caller's fan-out intent. Callers that bumped maxParallelChunks but left chunkSize unset got silently degraded parallelism.

Surfaced by the lever-1 chunk-scaling benchmark on 2026-05-17, where --chunks 3,6,8,12 produced 3 chunks on every config.

How

When config.chunkSize is undefined, the auto-sizer picks:

effectiveChunkSize = max(MIN_CHUNK_SIZE, ceil(totalFrames / maxParallelChunks))

The auto-sizing is applied inside resolveChunkPlan itself (its configChunkSize parameter now accepts number | undefined), so the integration point in plan() collapses to resolveChunkPlan(totalFrames, config.chunkSize, maxParallel).

MIN_CHUNK_SIZE = 10 (new module-level constant, re-exported from distributed.ts). Lower values hit a per-chunk fixed-overhead wall (worker boot + plan download + ffmpeg init) per the same benchmark; raising it had no effect on the failure mode.
Explicit numbers, including the previous default of 240, take precedence over the auto-sizer — no behavior change for callers that set chunkSize explicitly.
The chunkCount math itself (min(maxParallelChunks, ceil(totalFrames / chunkSize))) is unchanged; only the input to that math changes.

Test plan

Unit tests added/updated — three new cases in plan.test.ts > resolveChunkPlan:
- explicit chunkSize wins: 660 frames + chunkSize=240 + maxParallelChunks=16 → 3 chunks, effectiveChunkSize=240
- auto-size honors fan-out: 660 frames + chunkSize=undefined + maxParallelChunks=16 → 16 chunks, effectiveChunkSize=42
- MIN_CHUNK_SIZE floor: 50 frames + chunkSize=undefined + maxParallelChunks=16 → 5 chunks, effectiveChunkSize=10 (not 13 chunks of 4 frames each)
Existing resolveChunkPlan and buildChunkSlices unit tests pass unchanged.
bunx oxlint + bunx oxfmt --check clean on the three touched files.
tsc --noEmit clean (producer package typecheck).
Producer regression harness (docker:test, docker:test --mode=distributed-simulated) — to run on CI; in-process baselines are not touched by this change, and any distributed-simulated fixture that omits chunkSize will now produce a different chunkCount (the previous expectation was relying on the bug).

🤖 Generated with Claude Code

…ined Previously, plan() defaulted chunkSize to 240 on a `?? DEFAULT_CHUNK_SIZE` line, so a 660-frame composition with maxParallelChunks=16 ended up at 3 chunks (ceil(660/240)) regardless of the caller's fan-out intent. When config.chunkSize is undefined, auto-size from maxParallelChunks: effectiveChunkSize = max(MIN_CHUNK_SIZE, ceil(totalFrames / maxParallelChunks)) MIN_CHUNK_SIZE=10 keeps per-chunk fixed overhead from swamping the parallelism gain on tiny renders. Explicit numbers, including 240, take precedence over the auto-sizer — no behavior change for callers that set chunkSize explicitly. Surfaced by the lever-1 chunk-scaling benchmark on 2026-05-17. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Address self-review findings: - assertPositiveInteger now only runs on the caller-supplied path so the error message names `configChunkSize` only when the caller actually passed one. Previously, the assertion fired against `resolvedChunkSize` on both paths and would have lied about the offending input. - Drop the call-site comment that narrated the diff/history; the function docstring already covers the contract. - Drop the internal-track name and date from the MIN_CHUNK_SIZE rationale and the test block header. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

vanceingalls

Surgical fix on the chunkSize=undefined path in services/distributed/plan.ts — diff is 3 files, the math is correct, and the new unit cases cover the right branches.

Strengths

plan.ts:386-390 — keeping the assertPositiveInteger on the caller-supplied value with its original name (and skipping it on the auto-sized branch because the inputs are already validated) is the right refactor shape. Error messages keep pointing at the actual bad input.
plan.test.ts:103-126 — the three new cases cover the three branches that matter (explicit wins / auto-size honors fan-out / MIN_CHUNK_SIZE floor), each with the arithmetic inlined in the comment.
Public surface change is minimal: MIN_CHUNK_SIZE added to distributed.ts:47 barrel; signature widening of resolveChunkPlan to number | undefined is the only callsite-breaking ripple, and the lone caller in plan() at plan.ts:787-789 is updated in the same patch.

Findings

important — CI failure is pre-existing on main, not introduced by this PR. regression-shards (shard-3) is FAILURE on this PR's fd3fce99 head SHA, blocking the regression workflow. Drill-down: shard-3 fails on style-7-prod in-process ({"event":"test_suite_summary","mode":"in-process","failed":1}, 60 frames below PSNR threshold, frames 0.33s–4.51s and 14.5s–16.4s). I verified main is currently red on the same suite: the v0.6.22 release commit 27efcd0f80 (run 26052294049, 2.5h before this PR's run) fails shard-3 with the identical signature — same 60-frame count, same in-process mode, same style-7-prod suite. The last green main was 4c29b43b85 (run 26051446762). So the failing check is a pre-existing in-process visual regression on main, not caused by this diff. The PR's own diff touches only services/distributed/plan.ts + distributed.ts barrel + the corresponding test file — none of which the in-process renderer imports. Calling out so the author can either wait for the unrelated main fix or merge with an explicit "CI failure unrelated" note.

important — plan() — golden planDir integration tests at plan.test.ts:155+ lose single-chunk-path coverage. The fixture invokes plan(projectDir, {fps:30,width:320,height:240,format:"mp4"}, planDir) without setting chunkSize or maxParallelChunks. With totalFrames=30 and the new auto-sizer: resolvedChunkSize = max(10, ceil(30/16)) = 10 → chunkCount=3, effectiveChunkSize=10. Previously: chunkCount=1, effectiveChunkSize=240. The assertions are chunkCount >= 1-style so they still pass, but the single-chunk path through plan() is no longer exercised by any integration test. Suggest adding chunkSize: 240 to the determinism test's config (or a sibling test) to retain coverage of the 1-chunk path that produces a different planHash framing.

important — output mp4 size regression hazard from smaller default GOP. effectiveChunkSize flows into LockedRenderConfig.gopSize at plan.ts:496 and LockedRenderConfig.chunkSize at plan.ts:500. The auto-sized default now drops gopSize from 240 to max(10, ceil(totalFrames/16)) — typically 10–60. More IDR keyframes → larger encoded mp4 files for the same visual content, sometimes meaningfully (single-digit % to mid-double-digit %, codec-dependent). The PR body cites the lever-1 chunk-scaling benchmark for throughput, but doesn't surface whether that benchmark also measured output bytes. Worth checking on a representative composition before adopters with chunkSize=undefined see a silent file-size regression on the same content. Not a correctness blocker, but worth one number in the PR body.

nit — MIN_CHUNK_SIZE = 10 lacks a benchmark cite at the source. plan.ts:200-202 justifies the floor as "per-chunk fixed-overhead wall (worker boot + plan download + planHash recompute + ffmpeg init)". The PR body says "per the same benchmark (2026-05-17)" but the source comment doesn't. Adding the date or a benchmark-results doc link would help the next person who wants to revisit the floor.

nit — resolveChunkPlan docstring at plan.ts:355 still references the formula as ceil(totalFrames / chunkSize). With the auto-sizer, the operative variable is resolvedChunkSize. Minor clarity touch-up.

Verdict: APPROVE
Reasoning: Fix is correct, well-scoped, and tested at the right granularity. The CI failure is pre-existing on main (verified against 27efcd0f80), not introduced by this PR. Important-level findings are calibration concerns (test coverage, output-size hazard, doc citations) that don't block merge but are worth folding in.

Review by Vai

…ntegration Address PR review feedback on #939: - Pin chunkSize=240 on the golden planDir layout test so the 1-chunk path through plan() stays exercised after the auto-sizer change. Assert chunkCount === 1 explicitly (previously just >= 1). - Add an integration test that runs plan() with chunkSize=undefined and asserts the auto-sizer produces multi-chunk output end-to-end (chunkCount=3, encoder.gopSize=10, encoder.chunkSize=10) for the same 30-frame fixture. - Document the GOP/file-size trade-off on the chunkSize docstring so adopters who optimize for output bytes know to pin chunkSize. - Update the resolveChunkPlan docstring formula to reference the operative variable (resolvedChunkSize) instead of the now-ambiguous chunkSize. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

jrusso1020 · 2026-05-18T21:09:58Z

Thanks for the review @vanceingalls — pushed a8499632 addressing the actionable findings.

Addressed

Single-chunk-path coverage loss in plan() — golden planDir: Pinned chunkSize: 240 on the layout fixture so the 1-chunk path through plan() stays exercised, and tightened the assertion from chunkCount >= 1 to chunkCount === 1. Added a sibling integration test auto-sizes chunkSize end-to-end when caller omits it that runs plan() with chunkSize=undefined on the same 30-frame fixture and asserts the auto-sizer produces chunkCount=3, encoder.gopSize=10, encoder.chunkSize=10 — so both branches now have integration coverage.
GOP/file-size hazard from smaller default GOP: Good catch. I haven't re-run the chunk-scaling benchmark with output-bytes measurement, so I can't put a number on it yet. As a defensive doc change I added a paragraph on the chunkSize?: number docstring noting that effectiveChunkSize drives LockedRenderConfig.gopSize, and callers who optimize for output bytes (rather than wall-clock parallelism) should pass an explicit chunkSize matching their target GOP. Real output-bytes measurement is worth a follow-up PR if someone hits a meaningful regression in the wild.
resolveChunkPlan docstring formula stale: Rewrote it to reference resolvedChunkSize explicitly, with the auto-size + chunkCount + effectiveChunkSize lines as a three-step pseudo-code block.

Not addressed

MIN_CHUNK_SIZE benchmark cite at source: Per a saved feedback rule I'm trying to keep dates and internal-track names out of source comments (they rot fast and become "why is this here?" noise within months). The PR body carries the 2026-05-17 cite, and a future revisit can recover it from git blame → PR. Happy to add it back if you'd rather lean toward in-source provenance — let me know your preference and I'll fold it in.

CI shard-3 — agreed, pre-existing on main. I'll keep an eye on the next green main and re-baseline if the merge order requires it.

jrusso1020 and others added 2 commits May 18, 2026 20:29

vanceingalls approved these changes May 18, 2026

View reviewed changes

jrusso1020 merged commit 2fd161b into main May 18, 2026
35 of 43 checks passed

jrusso1020 deleted the feat/auto-size-chunk-size-when-undefined branch May 18, 2026 21:25

jrusso1020 mentioned this pull request May 19, 2026

perf(distributed): c=3 worst-case regression — per-worker SwiftShader probe contention at 6 workers/pod #955

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(producer): auto-size chunkSize from maxParallelChunks when undefined#939

feat(producer): auto-size chunkSize from maxParallelChunks when undefined#939
jrusso1020 merged 3 commits into
mainfrom
feat/auto-size-chunk-size-when-undefined

jrusso1020 commented May 18, 2026

Uh oh!

vanceingalls left a comment

Uh oh!

jrusso1020 commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jrusso1020 commented May 18, 2026

What

Why

How

Test plan

Uh oh!

vanceingalls left a comment

Choose a reason for hiding this comment

Uh oh!

jrusso1020 commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants