feat(producer): auto-size chunkSize from maxParallelChunks when undefined#939
Conversation
…ined Previously, plan() defaulted chunkSize to 240 on a `?? DEFAULT_CHUNK_SIZE` line, so a 660-frame composition with maxParallelChunks=16 ended up at 3 chunks (ceil(660/240)) regardless of the caller's fan-out intent. When config.chunkSize is undefined, auto-size from maxParallelChunks: effectiveChunkSize = max(MIN_CHUNK_SIZE, ceil(totalFrames / maxParallelChunks)) MIN_CHUNK_SIZE=10 keeps per-chunk fixed overhead from swamping the parallelism gain on tiny renders. Explicit numbers, including 240, take precedence over the auto-sizer — no behavior change for callers that set chunkSize explicitly. Surfaced by the lever-1 chunk-scaling benchmark on 2026-05-17. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Address self-review findings: - assertPositiveInteger now only runs on the caller-supplied path so the error message names `configChunkSize` only when the caller actually passed one. Previously, the assertion fired against `resolvedChunkSize` on both paths and would have lied about the offending input. - Drop the call-site comment that narrated the diff/history; the function docstring already covers the contract. - Drop the internal-track name and date from the MIN_CHUNK_SIZE rationale and the test block header. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
vanceingalls
left a comment
There was a problem hiding this comment.
Surgical fix on the chunkSize=undefined path in services/distributed/plan.ts — diff is 3 files, the math is correct, and the new unit cases cover the right branches.
Strengths
plan.ts:386-390— keeping theassertPositiveIntegeron the caller-supplied value with its original name (and skipping it on the auto-sized branch because the inputs are already validated) is the right refactor shape. Error messages keep pointing at the actual bad input.plan.test.ts:103-126— the three new cases cover the three branches that matter (explicit wins / auto-size honors fan-out /MIN_CHUNK_SIZEfloor), each with the arithmetic inlined in the comment.- Public surface change is minimal:
MIN_CHUNK_SIZEadded todistributed.ts:47barrel; signature widening ofresolveChunkPlantonumber | undefinedis the only callsite-breaking ripple, and the lone caller inplan()atplan.ts:787-789is updated in the same patch.
Findings
important — CI failure is pre-existing on main, not introduced by this PR. regression-shards (shard-3) is FAILURE on this PR's fd3fce99 head SHA, blocking the regression workflow. Drill-down: shard-3 fails on style-7-prod in-process ({"event":"test_suite_summary","mode":"in-process","failed":1}, 60 frames below PSNR threshold, frames 0.33s–4.51s and 14.5s–16.4s). I verified main is currently red on the same suite: the v0.6.22 release commit 27efcd0f80 (run 26052294049, 2.5h before this PR's run) fails shard-3 with the identical signature — same 60-frame count, same in-process mode, same style-7-prod suite. The last green main was 4c29b43b85 (run 26051446762). So the failing check is a pre-existing in-process visual regression on main, not caused by this diff. The PR's own diff touches only services/distributed/plan.ts + distributed.ts barrel + the corresponding test file — none of which the in-process renderer imports. Calling out so the author can either wait for the unrelated main fix or merge with an explicit "CI failure unrelated" note.
important — plan() — golden planDir integration tests at plan.test.ts:155+ lose single-chunk-path coverage. The fixture invokes plan(projectDir, {fps:30,width:320,height:240,format:"mp4"}, planDir) without setting chunkSize or maxParallelChunks. With totalFrames=30 and the new auto-sizer: resolvedChunkSize = max(10, ceil(30/16)) = 10 → chunkCount=3, effectiveChunkSize=10. Previously: chunkCount=1, effectiveChunkSize=240. The assertions are chunkCount >= 1-style so they still pass, but the single-chunk path through plan() is no longer exercised by any integration test. Suggest adding chunkSize: 240 to the determinism test's config (or a sibling test) to retain coverage of the 1-chunk path that produces a different planHash framing.
important — output mp4 size regression hazard from smaller default GOP. effectiveChunkSize flows into LockedRenderConfig.gopSize at plan.ts:496 and LockedRenderConfig.chunkSize at plan.ts:500. The auto-sized default now drops gopSize from 240 to max(10, ceil(totalFrames/16)) — typically 10–60. More IDR keyframes → larger encoded mp4 files for the same visual content, sometimes meaningfully (single-digit % to mid-double-digit %, codec-dependent). The PR body cites the lever-1 chunk-scaling benchmark for throughput, but doesn't surface whether that benchmark also measured output bytes. Worth checking on a representative composition before adopters with chunkSize=undefined see a silent file-size regression on the same content. Not a correctness blocker, but worth one number in the PR body.
nit — MIN_CHUNK_SIZE = 10 lacks a benchmark cite at the source. plan.ts:200-202 justifies the floor as "per-chunk fixed-overhead wall (worker boot + plan download + planHash recompute + ffmpeg init)". The PR body says "per the same benchmark (2026-05-17)" but the source comment doesn't. Adding the date or a benchmark-results doc link would help the next person who wants to revisit the floor.
nit — resolveChunkPlan docstring at plan.ts:355 still references the formula as ceil(totalFrames / chunkSize). With the auto-sizer, the operative variable is resolvedChunkSize. Minor clarity touch-up.
Verdict: APPROVE
Reasoning: Fix is correct, well-scoped, and tested at the right granularity. The CI failure is pre-existing on main (verified against 27efcd0f80), not introduced by this PR. Important-level findings are calibration concerns (test coverage, output-size hazard, doc citations) that don't block merge but are worth folding in.
Review by Vai
…ntegration Address PR review feedback on #939: - Pin chunkSize=240 on the golden planDir layout test so the 1-chunk path through plan() stays exercised after the auto-sizer change. Assert chunkCount === 1 explicitly (previously just >= 1). - Add an integration test that runs plan() with chunkSize=undefined and asserts the auto-sizer produces multi-chunk output end-to-end (chunkCount=3, encoder.gopSize=10, encoder.chunkSize=10) for the same 30-frame fixture. - Document the GOP/file-size trade-off on the chunkSize docstring so adopters who optimize for output bytes know to pin chunkSize. - Update the resolveChunkPlan docstring formula to reference the operative variable (resolvedChunkSize) instead of the now-ambiguous chunkSize. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
Thanks for the review @vanceingalls — pushed Addressed
Not addressed
CI shard-3 — agreed, pre-existing on main. I'll keep an eye on the next green main and re-baseline if the merge order requires it. |
What
PR 6.6.1 of the distributed rendering plan. Auto-size
chunkSizeinplan()when the caller passesundefinedso the caller'smaxParallelChunksis actually honored.Why
Previously,
plan()defaultedchunkSizeto240on a?? DEFAULT_CHUNK_SIZEline, so a 660-frame composition withmaxParallelChunks=16ended up at 3 chunks (ceil(660/240)) regardless of the caller's fan-out intent. Callers that bumpedmaxParallelChunksbut leftchunkSizeunset got silently degraded parallelism.Surfaced by the lever-1 chunk-scaling benchmark on 2026-05-17, where
--chunks 3,6,8,12produced 3 chunks on every config.How
When
config.chunkSizeisundefined, the auto-sizer picks:The auto-sizing is applied inside
resolveChunkPlanitself (itsconfigChunkSizeparameter now acceptsnumber | undefined), so the integration point inplan()collapses toresolveChunkPlan(totalFrames, config.chunkSize, maxParallel).MIN_CHUNK_SIZE = 10(new module-level constant, re-exported fromdistributed.ts). Lower values hit a per-chunk fixed-overhead wall (worker boot + plan download + ffmpeg init) per the same benchmark; raising it had no effect on the failure mode.240, take precedence over the auto-sizer — no behavior change for callers that setchunkSizeexplicitly.chunkCountmath itself (min(maxParallelChunks, ceil(totalFrames / chunkSize))) is unchanged; only the input to that math changes.Test plan
plan.test.ts > resolveChunkPlan:chunkSize=240+maxParallelChunks=16→ 3 chunks,effectiveChunkSize=240chunkSize=undefined+maxParallelChunks=16→ 16 chunks,effectiveChunkSize=42chunkSize=undefined+maxParallelChunks=16→ 5 chunks,effectiveChunkSize=10(not 13 chunks of 4 frames each)resolveChunkPlanandbuildChunkSlicesunit tests pass unchanged.bunx oxlint+bunx oxfmt --checkclean on the three touched files.tsc --noEmitclean (producer package typecheck).docker:test,docker:test --mode=distributed-simulated) — to run on CI; in-process baselines are not touched by this change, and any distributed-simulated fixture that omitschunkSizewill now produce a differentchunkCount(the previous expectation was relying on the bug).🤖 Generated with Claude Code