test: regenerate style-1-prod and style-12-prod baselines #925
Merged
The sub-composition visibility fix (PR #918) changed the rendered output for these two tests, which left the baselines on main stale. The baselines were regenerated inside `Dockerfile.test` to match CI's Chrome + ffmpeg build.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
This was referenced May 17, 2026
pull Bot pushed a commit to bhardwajRahul/hyperframes that referenced this pull request on May 18, 2026
… frame

Two narrow fixes to keep the regression suite green and reproducible. Stale baselines from the sub-composition refactor (PR heygen-com#918) are being regenerated separately in PR heygen-com#925; this PR contains only the structural fixes that PR can't make on its own.

1. **Pin `chrome-headless-shell` in `Dockerfile.test`** to `148.0.7778.167` instead of `@stable`. `@stable` is a moving tag; every Chrome stable promotion shifts pixel output enough to fail PSNR on the golden baselines, so the regression suite silently broke whenever `Dockerfile.test` was rebuilt against a freshly promoted stable. Pinning to the version `@stable` currently resolves to (matching what main's regenerated baselines were captured under) makes a Chrome bump an explicit action that is batched with a baseline regen. A comment on the `RUN` line spells out the bump procedure.

2. **Clamp the last PSNR checkpoint to a frame the video stream actually contains.** `runTestSuite` samples 100 checkpoints across the `min(rendered, snapshot)` container duration. Container duration includes audio padding past the last video frame: many-cuts has a 5.654s container vs 5.6s of video, which at 30fps is 168 frames. At i=99 the raw container duration mapped to time 5.59746s → frame index 168 (round(5.59746 × 30)), one past the last frame the stream contains. ffmpeg's `psnr` filter emits no `average:` line for a non-existent frame, so the harness crashed with `Unable to parse PSNR output at 5.59746s`. The crash was pre-existing on plain `origin/main`, which PR heygen-com#918 admin-merged through on shard-2. Miguel's regen via `--update` didn't catch it because `--update` only writes the snapshot; it doesn't validate. Subtracting one frame interval from the sampling duration guarantees the last checkpoint always lands on a real frame.
Verified locally inside `Dockerfile.test`:

    bun run --cwd packages/producer docker:build:test
    bun run --cwd packages/producer docker:test many-cuts # ✅ green
    bun run --cwd packages/producer docker:test style-3-prod \
      style-5-prod sub-composition-video # ✅ green
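The checkpoint arithmetic in fix 2 can be reproduced with a small sketch. It is not the harness code: `checkpointTimes` and `frameIndex` are hypothetical helpers, and the sampling rule `time = duration × i / count` is an assumption inferred from the quoted numbers (5.654 × 99 / 100 = 5.59746).

```typescript
// Hypothetical helpers for illustration; names and signatures are not
// taken from runTestSuite.

// Returns `count` evenly spaced sample times. When `clamp` is true, one
// frame interval is subtracted from the duration before sampling.
function checkpointTimes(
  containerDuration: number,
  fps: number,
  count: number,
  clamp: boolean,
): number[] {
  const frameInterval = 1 / fps;
  const span = clamp
    ? Math.max(0, containerDuration - frameInterval)
    : containerDuration;
  const times: number[] = [];
  for (let i = 0; i < count; i++) {
    // Assumed sampling rule: time = span * i / count.
    times.push((span * i) / count);
  }
  return times;
}

// Maps a timestamp to the nearest frame index, as in round(t * fps).
function frameIndex(time: number, fps: number): number {
  return Math.round(time * fps);
}

// many-cuts numbers from the commit message: 5.654s container, 30fps,
// 168 video frames, so valid frame indices are 0..167.
const fps = 30;
const videoFrames = 168;

const raw = checkpointTimes(5.654, fps, 100, false);
const clamped = checkpointTimes(5.654, fps, 100, true);

console.log(frameIndex(raw[99], fps)); // 168, one past the last frame
console.log(frameIndex(clamped[99], fps)); // 167, the last real frame
console.log(clamped.every((t) => frameIndex(t, fps) < videoFrames)); // true
```

With these inputs the unclamped last checkpoint rounds to frame 168, while the clamped one lands on frame 167, which matches the behavior described in the commit message.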
Summary
- Regenerate `style-1-prod` and `style-12-prod` regression baselines inside `Dockerfile.test`

Test plan