
test: regenerate style-1-prod and style-12-prod baselines#925

Merged: miguel-heygen merged 1 commit into main from fix/regenerate-style-baselines on May 17, 2026


@miguel-heygen (Collaborator) commented May 17, 2026

Summary

Test plan

  • CI regression shards pass (style-1-prod in shard-2, style-12-prod in shard-6)

Sub-comp visibility fix (PR #918) changed the rendered output for these two
tests, but the baselines on main were stale. Regenerated them inside
Dockerfile.test to match CI's Chrome and ffmpeg build.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
miguel-heygen merged commit 20a6ff6 into main on May 17, 2026
50 of 70 checks passed
miguel-heygen deleted the fix/regenerate-style-baselines branch May 17, 2026 23:08
pull Bot pushed a commit to bhardwajRahul/hyperframes that referenced this pull request on May 18, 2026
… frame

Two narrow fixes to keep the regression suite green and reproducible.
Stale baselines from the sub-composition refactor (PR heygen-com#918) are being
regenerated separately in PR heygen-com#925; this PR contains only the
structural fixes that PR can't make on its own.

1. **Pin `chrome-headless-shell` in `Dockerfile.test`** to
   `148.0.7778.167` instead of `@stable`. `@stable` is a moving tag:
   every Chrome stable promotion shifts pixel output enough to fail
   PSNR against the golden baselines, so the regression suite silently
   broke whenever `Dockerfile.test` was rebuilt against a newly promoted
   stable. Pinning to the version `@stable` currently resolves to (the
   one main's regenerated baselines were captured under) turns a Chrome
   bump into an explicit action, batched with a baseline regen. A
   comment on the `RUN` line spells out the bump procedure.

2. **Clamp the last PSNR checkpoint to a frame the video stream
   actually contains.** `runTestSuite` samples 100 checkpoints across
   the `min(rendered, snapshot)` container duration. Container duration
   includes audio padding past the last video frame: many-cuts has a
   5.654s container but only 5.6s of video, i.e. 168 frames (indices
   0 to 167) at 30fps. At i=99 the raw container duration mapped to
   5.59746s, which is frame index 168 (round(5.59746 × 30)), one past
   the last frame the stream contains. ffmpeg's `psnr` filter emits no
   `average:` line for a non-existent frame, so the harness crashed with
   `Unable to parse PSNR output at 5.59746s`. This failure was already
   present on plain `origin/main`, which PR heygen-com#918 admin-merged
   through on shard-2. Miguel's regen via `--update` didn't catch it
   because `--update` only writes the snapshot; it doesn't validate it.
   Subtracting one frame interval from the sampling duration ensures the
   last checkpoint lands on a real frame.
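The arithmetic in item 2 can be reproduced with a minimal TypeScript sketch. It assumes checkpoints are spaced as `duration × i / 100` (which matches the 5.59746s figure at i=99) and mapped to frames with `Math.round(t × fps)`. The helper names `checkpointTimes` and `frameIndexAt` are hypothetical and are not the actual `runTestSuite` internals.

```typescript
// Illustrative sketch only: checkpointTimes and frameIndexAt are hypothetical
// helpers, not the real runTestSuite implementation.

/** Map a timestamp to the nearest frame index at a constant frame rate. */
function frameIndexAt(timeSec: number, fps: number): number {
  return Math.round(timeSec * fps);
}

/**
 * Sample `count` checkpoint times, assuming they are spaced as
 * duration * i / count. When `clampToVideo` is true, one frame interval
 * is subtracted from the duration first, mirroring the fix described above.
 */
function checkpointTimes(
  durationSec: number,
  fps: number,
  count: number,
  clampToVideo: boolean,
): number[] {
  const span = clampToVideo ? durationSec - 1 / fps : durationSec;
  return Array.from({ length: count }, (_, i) => (span * i) / count);
}

// many-cuts numbers from the description: 5.654s container, 30fps, 168 video frames.
const fps = 30;
const videoFrames = 168; // valid frame indices are 0..167
const containerSec = 5.654;

const before = checkpointTimes(containerSec, fps, 100, false);
const after = checkpointTimes(containerSec, fps, 100, true);

// Frame index hit by the last (i = 99) checkpoint, without and with the clamp.
const lastBefore = frameIndexAt(before[before.length - 1]!, fps);
const lastAfter = frameIndexAt(after[after.length - 1]!, fps);

console.log(lastBefore, lastBefore < videoFrames); // 168 false
console.log(lastAfter, lastAfter < videoFrames); // 167 true
```

Without the clamp the last checkpoint (about 5.59746s) rounds to frame 168, which does not exist. With one frame interval subtracted it lands at about 5.56446s, which is frame 167, the final real frame.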

Verified locally inside `Dockerfile.test`:

  bun run --cwd packages/producer docker:build:test
  bun run --cwd packages/producer docker:test many-cuts   # ✅ green
  bun run --cwd packages/producer docker:test style-3-prod \
    style-5-prod sub-composition-video                    # ✅ green