
feat(producer): auto-size chunkSize from maxParallelChunks when undefined #939

Merged
jrusso1020 merged 3 commits into main from feat/auto-size-chunk-size-when-undefined on May 18, 2026

Conversation

@jrusso1020
Collaborator

What

PR 6.6.1 of the distributed rendering plan. Auto-size chunkSize in plan() when the caller passes undefined so the caller's maxParallelChunks is actually honored.

Why

Previously, plan() defaulted chunkSize to 240 on a ?? DEFAULT_CHUNK_SIZE line, so a 660-frame composition with maxParallelChunks=16 ended up at 3 chunks (ceil(660/240)) regardless of the caller's fan-out intent. Callers that bumped maxParallelChunks but left chunkSize unset got silently degraded parallelism.

Surfaced by the lever-1 chunk-scaling benchmark on 2026-05-17, where --chunks 3,6,8,12 produced 3 chunks on every config.
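
A minimal sketch of the old defaulting behavior described above, assuming the benchmark ran against the 660-frame composition. `legacyChunkCount` is a hypothetical name used for illustration and is not the actual plan.ts code:

```typescript
// Hypothetical reconstruction of the pre-fix defaulting; not the real plan.ts body.
const DEFAULT_CHUNK_SIZE = 240;

function legacyChunkCount(
  totalFrames: number,
  chunkSize: number | undefined,
  maxParallelChunks: number,
): number {
  // An undefined chunkSize silently became 240, whatever fan-out was requested.
  const resolved = chunkSize ?? DEFAULT_CHUNK_SIZE;
  return Math.min(maxParallelChunks, Math.ceil(totalFrames / resolved));
}

// Every fan-out setting collapses to the same chunk count for 660 frames.
const legacyCounts = [3, 6, 8, 12, 16].map((n) => legacyChunkCount(660, undefined, n));
console.log(legacyCounts); // [3, 3, 3, 3, 3]
```

Because ceil(660 / 240) = 3 is never larger than the requested fan-out, the min() always returns 3, which matches the flat benchmark result.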

How

When config.chunkSize is undefined, the auto-sizer picks:

effectiveChunkSize = max(MIN_CHUNK_SIZE, ceil(totalFrames / maxParallelChunks))

The auto-sizing is applied inside resolveChunkPlan itself (its configChunkSize parameter now accepts number | undefined), so the integration point in plan() collapses to resolveChunkPlan(totalFrames, config.chunkSize, maxParallel).

  • MIN_CHUNK_SIZE = 10 (new module-level constant, re-exported from distributed.ts). Lower values hit a per-chunk fixed-overhead wall (worker boot + plan download + ffmpeg init) per the same benchmark; raising it had no effect on the failure mode.
  • Explicit numbers, including the previous default of 240, take precedence over the auto-sizer — no behavior change for callers that set chunkSize explicitly.
  • The chunkCount math itself (min(maxParallelChunks, ceil(totalFrames / chunkSize))) is unchanged; only the input to that math changes.
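
The rules above can be sketched roughly as follows. The parameter names mirror the PR description, but the body is an assumption rather than the merged code. In particular, how effectiveChunkSize behaves when the maxParallelChunks cap binds is a guess (the sketch widens it so the capped chunk count still covers every frame), and input validation is omitted:

```typescript
// Sketch only: the function name and ChunkPlanSketch type are hypothetical.
const MIN_CHUNK_SIZE = 10;

interface ChunkPlanSketch {
  chunkCount: number;
  effectiveChunkSize: number;
}

function resolveChunkPlanSketch(
  totalFrames: number,
  configChunkSize: number | undefined,
  maxParallelChunks: number,
): ChunkPlanSketch {
  // Step 1: an explicit value wins; otherwise size chunks from the fan-out, with a floor.
  const resolvedChunkSize =
    configChunkSize ??
    Math.max(MIN_CHUNK_SIZE, Math.ceil(totalFrames / maxParallelChunks));

  // Step 2: the chunkCount math is the same as before; only its input changed.
  const uncapped = Math.ceil(totalFrames / resolvedChunkSize);
  const chunkCount = Math.min(maxParallelChunks, uncapped);

  // Step 3 (assumption): widen the chunk size only when the cap reduced the count.
  const effectiveChunkSize =
    uncapped > maxParallelChunks
      ? Math.ceil(totalFrames / chunkCount)
      : resolvedChunkSize;

  return { chunkCount, effectiveChunkSize };
}

console.log(resolveChunkPlanSketch(660, undefined, 16)); // { chunkCount: 16, effectiveChunkSize: 42 }
console.log(resolveChunkPlanSketch(660, 240, 16)); // { chunkCount: 3, effectiveChunkSize: 240 }
console.log(resolveChunkPlanSketch(50, undefined, 16)); // { chunkCount: 5, effectiveChunkSize: 10 }
```

The three calls correspond to the auto-size, explicit-wins, and floor cases listed in the test plan.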

Test plan

  • Unit tests added/updated — three new cases in plan.test.ts > resolveChunkPlan:
    • explicit chunkSize wins: 660 frames + chunkSize=240 + maxParallelChunks=16 → 3 chunks, effectiveChunkSize=240
    • auto-size honors fan-out: 660 frames + chunkSize=undefined + maxParallelChunks=16 → 16 chunks, effectiveChunkSize=42
    • MIN_CHUNK_SIZE floor: 50 frames + chunkSize=undefined + maxParallelChunks=16 → 5 chunks, effectiveChunkSize=10 (not 13 chunks of 4 frames each)
  • Existing resolveChunkPlan and buildChunkSlices unit tests pass unchanged.
  • bunx oxlint + bunx oxfmt --check clean on the three touched files.
  • tsc --noEmit clean (producer package typecheck).
  • Producer regression harness (docker:test, docker:test --mode=distributed-simulated) — to run on CI; in-process baselines are not touched by this change, and any distributed-simulated fixture that omits chunkSize will now produce a different chunkCount (the previous expectation was relying on the bug).

🤖 Generated with Claude Code

jrusso1020 and others added 2 commits May 18, 2026 20:29
…ined

Previously, plan() defaulted chunkSize to 240 on a `?? DEFAULT_CHUNK_SIZE`
line, so a 660-frame composition with maxParallelChunks=16 ended up at 3
chunks (ceil(660/240)) regardless of the caller's fan-out intent.

When config.chunkSize is undefined, auto-size from maxParallelChunks:
  effectiveChunkSize = max(MIN_CHUNK_SIZE, ceil(totalFrames / maxParallelChunks))

MIN_CHUNK_SIZE=10 keeps per-chunk fixed overhead from swamping the
parallelism gain on tiny renders. Explicit numbers, including 240, take
precedence over the auto-sizer — no behavior change for callers that set
chunkSize explicitly.

Surfaced by the lever-1 chunk-scaling benchmark on 2026-05-17.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Address self-review findings:
- assertPositiveInteger now only runs on the caller-supplied path so the
  error message names `configChunkSize` only when the caller actually
  passed one. Previously, the assertion fired against `resolvedChunkSize`
  on both paths and would have lied about the offending input.
- Drop the call-site comment that narrated the diff/history; the
  function docstring already covers the contract.
- Drop the internal-track name and date from the MIN_CHUNK_SIZE rationale
  and the test block header.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Collaborator

@vanceingalls vanceingalls left a comment


Surgical fix on the chunkSize=undefined path in services/distributed/plan.ts — diff is 3 files, the math is correct, and the new unit cases cover the right branches.

Strengths

  • plan.ts:386-390 — keeping the assertPositiveInteger on the caller-supplied value with its original name (and skipping it on the auto-sized branch because the inputs are already validated) is the right refactor shape. Error messages keep pointing at the actual bad input.
  • plan.test.ts:103-126 — the three new cases cover the three branches that matter (explicit wins / auto-size honors fan-out / MIN_CHUNK_SIZE floor), each with the arithmetic inlined in the comment.
  • Public surface change is minimal: MIN_CHUNK_SIZE added to distributed.ts:47 barrel; signature widening of resolveChunkPlan to number | undefined is the only callsite-breaking ripple, and the lone caller in plan() at plan.ts:787-789 is updated in the same patch.
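
The validation split praised in the first bullet might look roughly like the sketch below. `assertPositiveInteger` here is a stand-in with an assumed signature and error text, and `resolveChunkSizeSketch` is a hypothetical helper; neither is copied from plan.ts:

```typescript
// Stand-in validator: the real helper's signature and message format are assumptions.
function assertPositiveInteger(value: number, name: string): void {
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got ${value}`);
  }
}

const MIN_CHUNK_SIZE = 10;

function resolveChunkSizeSketch(
  totalFrames: number,
  configChunkSize: number | undefined,
  maxParallelChunks: number,
): number {
  if (configChunkSize !== undefined) {
    // Caller-supplied path: validate under the caller's own parameter name,
    // so the error points at the input that was actually wrong.
    assertPositiveInteger(configChunkSize, "configChunkSize");
    return configChunkSize;
  }
  // Auto-sized path: totalFrames and maxParallelChunks are assumed to be
  // validated upstream, and the floor keeps the result at MIN_CHUNK_SIZE or more.
  return Math.max(MIN_CHUNK_SIZE, Math.ceil(totalFrames / maxParallelChunks));
}
```

With this shape, a bad explicit value such as `resolveChunkSizeSketch(660, 0, 16)` throws an error naming `configChunkSize`, while the undefined path can never blame a parameter the caller did not pass.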

Findings

important — CI failure is pre-existing on main, not introduced by this PR. regression-shards (shard-3) is FAILURE on this PR's fd3fce99 head SHA, blocking the regression workflow. Drill-down: shard-3 fails on style-7-prod in-process ({"event":"test_suite_summary","mode":"in-process","failed":1}, 60 frames below PSNR threshold, frames 0.33s–4.51s and 14.5s–16.4s). I verified main is currently red on the same suite: the v0.6.22 release commit 27efcd0f80 (run 26052294049, 2.5h before this PR's run) fails shard-3 with the identical signature — same 60-frame count, same in-process mode, same style-7-prod suite. The last green main was 4c29b43b85 (run 26051446762). So the failing check is a pre-existing in-process visual regression on main, not caused by this diff. The PR's own diff touches only services/distributed/plan.ts + distributed.ts barrel + the corresponding test file — none of which the in-process renderer imports. Calling out so the author can either wait for the unrelated main fix or merge with an explicit "CI failure unrelated" note.

important — plan() — golden planDir integration tests at plan.test.ts:155+ lose single-chunk-path coverage. The fixture invokes plan(projectDir, {fps:30,width:320,height:240,format:"mp4"}, planDir) without setting chunkSize or maxParallelChunks. With totalFrames=30 and the new auto-sizer: resolvedChunkSize = max(10, ceil(30/16)) = 10 → chunkCount=3, effectiveChunkSize=10. Previously: chunkCount=1, effectiveChunkSize=240. The assertions are chunkCount >= 1-style so they still pass, but the single-chunk path through plan() is no longer exercised by any integration test. Suggest adding chunkSize: 240 to the determinism test's config (or a sibling test) to retain coverage of the 1-chunk path that produces a different planHash framing.

important — output mp4 size regression hazard from smaller default GOP. effectiveChunkSize flows into LockedRenderConfig.gopSize at plan.ts:496 and LockedRenderConfig.chunkSize at plan.ts:500. The auto-sized default now drops gopSize from 240 to max(10, ceil(totalFrames/16)) — typically 10–60. More IDR keyframes → larger encoded mp4 files for the same visual content, sometimes meaningfully (single-digit % to mid-double-digit %, codec-dependent). The PR body cites the lever-1 chunk-scaling benchmark for throughput, but doesn't surface whether that benchmark also measured output bytes. Worth checking on a representative composition before adopters with chunkSize=undefined see a silent file-size regression on the same content. Not a correctness blocker, but worth one number in the PR body.
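
To make the keyframe arithmetic concrete, here is a rough sketch for the 660-frame example. It assumes the encoder emits exactly one IDR frame at the start of each GOP and no extra scene-cut keyframes. The helper names are hypothetical, and the sketch counts keyframes only; it says nothing about actual output bytes, which would need a measurement:

```typescript
const MIN_CHUNK_SIZE = 10;

// Assumption: one IDR keyframe per GOP, so the count is ceil(frames / gopSize).
function idrFrameCount(totalFrames: number, gopSize: number): number {
  return Math.ceil(totalFrames / gopSize);
}

// The auto-sized default that now feeds gopSize when chunkSize is undefined.
function autoGopSize(totalFrames: number, maxParallelChunks: number): number {
  return Math.max(MIN_CHUNK_SIZE, Math.ceil(totalFrames / maxParallelChunks));
}

const frames = 660;
const idrBefore = idrFrameCount(frames, 240); // old default GOP of 240
const idrAfter = idrFrameCount(frames, autoGopSize(frames, 16)); // auto-sized GOP of 42

console.log(idrBefore, idrAfter); // 3 16
```

Under these assumptions the same content goes from 3 to 16 keyframes, which is the mechanism behind the file-size concern.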

nit — MIN_CHUNK_SIZE = 10 lacks a benchmark cite at the source. plan.ts:200-202 justifies the floor as "per-chunk fixed-overhead wall (worker boot + plan download + planHash recompute + ffmpeg init)". The PR body says "per the same benchmark (2026-05-17)" but the source comment doesn't. Adding the date or a benchmark-results doc link would help the next person who wants to revisit the floor.

nit — resolveChunkPlan docstring at plan.ts:355 still references the formula as ceil(totalFrames / chunkSize). With the auto-sizer, the operative variable is resolvedChunkSize. Minor clarity touch-up.

Verdict: APPROVE
Reasoning: Fix is correct, well-scoped, and tested at the right granularity. The CI failure is pre-existing on main (verified against 27efcd0f80), not introduced by this PR. Important-level findings are calibration concerns (test coverage, output-size hazard, doc citations) that don't block merge but are worth folding in.

Review by Vai

…ntegration

Address PR review feedback on #939:

- Pin chunkSize=240 on the golden planDir layout test so the 1-chunk path
  through plan() stays exercised after the auto-sizer change. Assert
  chunkCount === 1 explicitly (previously just >= 1).
- Add an integration test that runs plan() with chunkSize=undefined and
  asserts the auto-sizer produces multi-chunk output end-to-end
  (chunkCount=3, encoder.gopSize=10, encoder.chunkSize=10) for the same
  30-frame fixture.
- Document the GOP/file-size trade-off on the chunkSize docstring so
  adopters who optimize for output bytes know to pin chunkSize.
- Update the resolveChunkPlan docstring formula to reference the operative
  variable (resolvedChunkSize) instead of the now-ambiguous chunkSize.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jrusso1020
Collaborator Author

Thanks for the review @vanceingalls — pushed a8499632 addressing the actionable findings.

Addressed

  • Single-chunk-path coverage loss in plan() — golden planDir: Pinned chunkSize: 240 on the layout fixture so the 1-chunk path through plan() stays exercised, and tightened the assertion from chunkCount >= 1 to chunkCount === 1. Added a sibling integration test auto-sizes chunkSize end-to-end when caller omits it that runs plan() with chunkSize=undefined on the same 30-frame fixture and asserts the auto-sizer produces chunkCount=3, encoder.gopSize=10, encoder.chunkSize=10 — so both branches now have integration coverage.
  • GOP/file-size hazard from smaller default GOP: Good catch. I haven't re-run the chunk-scaling benchmark with output-bytes measurement, so I can't put a number on it yet. As a defensive doc change I added a paragraph on the chunkSize?: number docstring noting that effectiveChunkSize drives LockedRenderConfig.gopSize, and callers who optimize for output bytes (rather than wall-clock parallelism) should pass an explicit chunkSize matching their target GOP. Real output-bytes measurement is worth a follow-up PR if someone hits a meaningful regression in the wild.
  • resolveChunkPlan docstring formula stale: Rewrote it to reference resolvedChunkSize explicitly, with the auto-size + chunkCount + effectiveChunkSize lines as a three-step pseudo-code block.

Not addressed

  • MIN_CHUNK_SIZE benchmark cite at source: Per a saved feedback rule I'm trying to keep dates and internal-track names out of source comments (they rot fast and become "why is this here?" noise within months). The PR body carries the 2026-05-17 cite, and a future revisit can recover it from git blame → PR. Happy to add it back if you'd rather lean toward in-source provenance — let me know your preference and I'll fold it in.

CI shard-3 — agreed, pre-existing on main. I'll keep an eye on the next green main and re-baseline if the merge order requires it.

@jrusso1020 jrusso1020 merged commit 2fd161b into main May 18, 2026
35 of 43 checks passed
@jrusso1020 jrusso1020 deleted the feat/auto-size-chunk-size-when-undefined branch May 18, 2026 21:25


2 participants