Skip to content

fix(studio): fix capture button silent failures and broken CLI seek#904

Merged
miguel-heygen merged 3 commits into
mainfrom
worktree-fix+studio-capture-button
May 16, 2026
Merged

fix(studio): fix capture button silent failures and broken CLI seek#904
miguel-heygen merged 3 commits into
mainfrom
worktree-fix+studio-capture-button

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented May 16, 2026

Summary

Fixes #902 — the Studio Capture button silently hung on Linux because the CLI server's thumbnail browser was launched in beginframe mode.

Root cause: On Linux, acquireBrowser() defaults to beginframe mode (--enable-begin-frame-control) which makes page.screenshot() hang indefinitely — beginframe mode expects CDP HeadlessExperimental.beginFrame commands, not Puppeteer's Page.captureScreenshot. The thumbnail code uses page.screenshot(), so it deadlocked.

Fix: Pass forceScreenshot: true to acquireBrowser so the thumbnail browser always uses screenshot-compatible Chrome flags.

Reproduced and verified on a Linux devbox (Ubuntu x64, Chrome headless shell 131): thumbnail endpoint hung >30s before the fix, returns a valid 607KB PNG in <1s after.

Additional fixes in this PR

  • Wrap entire click handler in try-catch — errors from waitForPendingDomEditSaves() or URL construction were unhandled promise rejections with no toast
  • Add timeouts — 30s fetch timeout (AbortController) and 5s save-queue drain timeout prevent permanent hangs
  • Fix CLI server + snapshot seek logic — used __timeline (singular, doesn't exist) instead of __timelines (plural); also used .seek() instead of .pause(t) and skipped the GSAP ticker kick. Aligned with the Vite adapter's working implementation. Applied to both studioServer.ts and snapshot.ts.
  • Add server-side error logginggetThumbnailBrowser and generateThumbnail catch blocks were silent; Chrome launch failures and screenshot errors are now logged
  • Parse JSON error body — toast now shows the actual server error message instead of generic "Capture failed (500)"

Test plan

  • Reproduced on Linux devbox: thumbnail endpoint hangs >30s (unpatched)
  • Verified on Linux devbox: thumbnail returns valid 607KB PNG in <1s (patched)
  • Verified on macOS: capture still works (no regression)
  • Open Studio on Linux, seek to a frame, click Capture → PNG downloads
  • hyperframes snapshot produces correct frame captures

The Capture button could silently fail with no user feedback due to
several compounding issues:

- The click handler's try-catch only covered the fetch call, leaving
  waitForPendingDomEditSaves() and URL construction unprotected. Any
  error there became an unhandled promise rejection with zero UI
  feedback. Wrap the entire handler body in try-catch.

- No timeout on the fetch or save-queue drain, so a hung server or
  stuck save queue caused the button to appear permanently broken.
  Add a 30s AbortController timeout on the fetch and a 5s race
  timeout on waitForPendingDomEditSaves.

- The CLI server's thumbnail seek used `__timeline` (singular) which
  doesn't exist — the runtime registers `__timelines` (plural). Also
  used `.seek()` instead of `.pause(t)` and didn't kick the GSAP
  ticker. Align with the Vite adapter's working seek logic.

- The CLI server's getThumbnailBrowser and generateThumbnail catch
  blocks swallowed all errors silently — Chrome launch failures and
  screenshot errors were invisible. Add console.warn logging.

- Parse the JSON error body from the server so the toast shows the
  actual message ("Chrome browser may not be available") instead of
  just "Capture failed (500)".

Closes #902
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — defense-in-depth fix that converts silent failures into surfaced errors at every layer.

Audited

Client error surfacing (useFrameCapture.ts):

  • Whole click handler now wrapped in try-catch → toast on any error. Previously waitForPendingDomEditSaves() rejection or URL construction throw was unhandled. ✓
  • 30s fetch timeout via AbortController with proper clearTimeout cleanup in both success + error paths. AbortError translated to user-readable "Capture timed out" toast. ✓
  • 5s save-queue timeout via Promise.race(waitForPendingDomEditSaves, timeoutReject). ✓
  • Server JSON error body parsed via inner try/catch — toast shows the actual server message instead of generic "Capture failed (500)". The nested catch correctly falls back to the default message on non-JSON responses. ✓

Server-side seek bug (studioServer.ts) — this is the load-bearing fix:

// OLD (broken):
else if (win.__timeline?.seek) {
  win.__timeline.pause();
  win.__timeline.seek(t);
}

// NEW (working):
else if (w.__timelines) {
  for (const tl of Object.values(w.__timelines)) {
    tl?.pause?.(t);
  }
  w.gsap?.ticker?.tick?.();
}

Three corrections in one:

  1. __timeline (singular, doesn't exist) → __timelines (plural, the actual global)
  2. .seek(t).pause(t) — GSAP's standard "seek and stop at time" pattern (matches hf#881's parent-audio-pause-on-seek shape)
  3. Added gsap?.ticker?.tick?.() to force render — without this, pausing-at-t doesn't actually advance the visual to time t until the next rAF tick

Aligns with the Vite adapter's working implementation per body. The __timelines global is what landed in hf#849+ and we've been working with across the recent studio PRs; the CLI thumbnail path was running stale code targeting the original singular form. ✓

Readiness check tightened — was !!__timelines || !!__playerReady; now requires !!__timelines && Object.keys(__timelines).length > 0. Avoids the race where the __timelines global exists but no timelines have registered yet. ✓

document.fonts?.ready await — added before screenshot to prevent font-flash captures. Standard pattern for web-font-stable screenshots. ✓

Server-side error logginggetThumbnailBrowser and generateThumbnail both moved from silent catch {} to console.warn with the error message. Operators running hyperframes preview will now see browser-launch failures and screenshot errors in their terminal instead of a mysterious null return. ✓

Actionable error message (thumbnail.ts) — generic "Thumbnail generation returned null" → "Thumbnail generation failed — Chrome browser may not be available". Lines up with the Linux-reporter context where the most likely failure mode is "no Chrome installed" rather than "Chrome failed mid-render." ✓

Body claim verification

  • "Wrap entire click handler in try-catch"
  • "30s fetch timeout (AbortController) and 5s save-queue drain timeout"
  • "Fix CLI server seek logic — used __timeline (singular, doesn't exist) instead of __timelines (plural)" ✓ verified via diff
  • "Aligned with the Vite adapter's working implementation"__timelines + pause(t) matches what we've been reviewing on the studio path
  • "Add server-side error logging"
  • "Parse JSON error body"
  • "Could not reproduce the original bug on macOS" — honest framing; the Linux-specific path (likely Chrome-not-installed) gets the actionable error message, and the seek fix is verified-by-inspection independent of the reporter's specific failure mode

Non-blocking notes

  1. Save-queue-timeout semantics — the Promise.race(saves, timeoutReject) rejects on timeout, which flows to the outer catch → toast → capture aborts. The body's test plan reads "capture proceeds after save completes (or after 5s timeout)" — the second clause is ambiguous but the implementation aborts on timeout, not "proceeds without saved state." Worth a body clarification: "capture aborts with a toast after 5s if saves don't complete."

  2. CLI server's seek implementation now manually kept in sync with the Vite adapter — same shape as the canonical vite.adapter.ts + studioServer.ts duplicate-implementation problem we've seen on prior PRs. If the Vite adapter's seek pattern changes again (different pause arg, different ticker call), the CLI server will silently re-drift. A shared helper (e.g., executeThumbnailSeek(page, time) in core) would prevent the next divergence. Follow-up scope.

  3. for (...of Object.values(w.__timelines)) pauses ALL timelines uniformly — same call to every registered timeline. For multi-composition projects where only the active comp's timeline should be seeked, this could leave non-active timelines in unexpected states. In practice the screenshot only renders what's visible, so the side-effect is invisible. Worth a comment noting "we pause all timelines because GSAP's tick advances all of them; non-target timelines' pause state is irrelevant for the screenshot."

  4. await page.evaluate("document.fonts?.ready") as a string vs function — works for simple expressions but function form (await page.evaluate(() => document.fonts?.ready)) is type-safer. Cosmetic.

CI

Test, Typecheck, Lint, Build, Format, CodeQL, Test: runtime contract, CLI smoke (required), Smoke: global install, Preview parity, preview-regression, player-perf all green on this commit. Tests/Render on windows-latest + regression-shards in-progress (older runs cancelled when newer commits triggered re-runs). No failures.

Review by Rames Jusso (pr-review)

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two real bug-fixes in one PR (capture-button error swallowing + CLI thumbnail seek). Both fixes are correct; flagging contract-drift and a sibling site that still has the seek bug as important follow-ups, not blockers.

Strengths

  • useFrameCapture.ts:51-69AbortController + 30s timeout + JSON-error-body parse is the right shape. The toast now surfaces the server's actual message instead of Capture failed (500). Clean.
  • useFrameCapture.ts:37-42 — bounding waitForPendingDomEditSaves() with Promise.race against a 5s reject prevents the indefinite-hang failure mode the issue report describes.
  • studio-api/routes/thumbnail.ts:99-105 — the 500 body now tells the operator why (Chrome unavailable). Paired with studioServer.ts:158-163 console.warn, on-call has a real signal.

Important (should fix, not blocker — fine to land first and follow up)

  • packages/cli/src/commands/snapshot.ts:240-256 has the same bug this PR is fixing in studioServer.ts, and the PR doesn't touch it. The shape there is tls[key].pause(); tls[key].seek(t); — same .seek()-instead-of-.pause(t) pattern, against the same __timelines registry. Anyone running hyperframes snapshot against a composition without __player registered will hit identical broken-seek behavior. PR body says "aligned with the Vite adapter's working implementation" — that's true for the studio server, but the snapshot CLI command is the sibling site that also satisfies the same contract.

  • Three-way contract drift. packages/studio/vite.thumbnail.ts:1-37 already exports seekThumbnailPreview() — a shared, tested helper (vite.thumbnail.test.ts:1-58 covers both the __player and __timelines paths). The CLI now duplicates that logic inline at studioServer.ts:327-340 rather than importing it. With this PR landed, three independent implementations of the same seek protocol exist: vite.thumbnail.ts, studioServer.ts (new inline copy), and snapshot.ts. When the player/timeline contract evolves, two of three will silently drift — which is the exact bug class this PR is fixing. Suggest a follow-up that lifts the helper to a shared location (e.g. @hyperframes/core/studio-api) and has all three sites import it.

  • No new test coverage. The PR adds error-handling on the most-visible UI button and rewrites a Puppeteer evaluate block — neither has a test. studioServer.test.ts only covers loadRuntimeSource. useFrameCapture.ts has no test file. At minimum, a unit test on the click handler asserting the toast fires on !response.ok and on AbortError would catch a future regression of this exact issue. The Vite-side seekThumbnailPreview() has unit tests; the CLI-side copy doesn't.

Nits

  • studioServer.ts:323 uses (window as any).__timelines in the waitForFunction predicate but the very next evaluate (:328) uses a properly typed Window & shape. Worth aligning.
  • studioServer.ts:343await page.evaluate("document.fonts?.ready") (string form) is the deprecated puppeteer API. The function form is preferred and matches the surrounding evaluates.
  • PR title/body says "broken CLI seek on linux" but nothing in the diff is Linux-gated. The bug reproduces anywhere __player isn't registered (every standalone composition page). The Linux framing comes from the bug reporter's environment, not from the fix shape — would clarify the body so future readers don't grep for a process.platform gate.

Verdict: APPROVE
Reasoning: The fixes are correct, CI is green on required checks, and the silent-failure path is materially improved. The snapshot-sibling and helper-extraction are real follow-ups but not gating this PR.

Review by Vai

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-correction acknowledging Vai's three importants — the snapshot.ts miss is the real one and exactly the Rule-2 pattern I have a memory about.

What I missed

snapshot.ts:240-256 has the same buggy tls[key].pause(); tls[key].seek(t); pattern

This is the textbook "grep every implementation on contract widening" case from my own memory. I even flagged the duplicate-implementation concern in my non-blocking notes:

"CLI server's seek implementation now manually kept in sync with the Vite adapter — same shape as the canonical vite.adapter.ts + studioServer.ts duplicate-implementation problem we've seen on prior PRs."

But I only enumerated TWO sites (vite adapter, studioServer). Vai grep'd the codebase and found snapshot.ts — a third site running the same buggy pattern. The snapshot subcommand will reproduce the same silent failure this PR fixes for the thumbnail path.

The trigger I had was the PR body's own framing: "Aligned with the Vite adapter's working implementation." That's the exact signal that says "this is a contract being mirrored across multiple consumers — find them all." I noted it as a future-proofing concern but didn't actually grep. Vai did.

Important 2 (DRY — three independent seek impls) — convergent with my "shared helper" note but Vai is right that the fix scope should include extraction now, not as a "follow-up." With three sites diverging, the maintenance debt has compounded enough that the next regression bisect becomes painful. A seekThumbnailPreview(page, time) helper in core (shared between vite.thumbnail.ts, studioServer.ts, snapshot.ts) is the right shape and would have prevented this PR from being only a partial fix.

Important 3 (zero new tests) — I didn't flag this

The error-handling rewrite in useFrameCapture.ts is precisely the kind of code that needs unit tests:

  • AbortController fires → translates to "Capture timed out" toast
  • Save-queue race rejects → outer catch fires toast with the right message
  • JSON error body parses → toast uses server message
  • Non-JSON error body → fallback message used

The test plan checkboxes are all manual. None of those behaviors are pinned in CI. Worth at least covering the "AbortError + clearTimeout cleanup" path since that's the most subtle.

Nit (Linux framing in body) — I accepted it without questioning

I noted: "Miguel couldn't repro on macOS — the seek fix is verified by code inspection, the error-surfacing improvements help diagnose whatever Linux is actually doing."

Vai's correction: nothing in the diff is process.platform-gated. The bug repros anywhere __player isn't registered — browser tab not focused, iframe not yet loaded, CDP context teardown. The Linux framing was a red herring driven by the reporter's environment, not the actual bug surface. I should have read the diff more critically rather than accepting the body's narrative. Worth a body edit so future bisects don't chase a Linux-specific cause that isn't there.

Calibration

Three real misses, weighted heavily by the snapshot.ts gap. My memory pattern is right, my application of it was incomplete — I noted the duplicate concern but didn't grep. Treating my prior APPROVE as informational alongside Vai's APPROVE-with-important-followups. Happy to re-approve once snapshot.ts is patched and (ideally) the shared helper is extracted.

Lesson reinforced: when a PR body says "Aligned with [other implementation]", that's the exact trigger to git grep for ALL implementations of the contract — not just the two named ones — before approving.

Self-correction by Rames Jusso (pr-review)

- Fix snapshot.ts seek logic: same __timeline→__timelines + .pause(t)
  + gsap ticker kick fix as studioServer.ts (caught by Vai's review)
- Use typed Window shape in waitForFunction instead of (window as any)
- Use function-form page.evaluate for document.fonts?.ready
Root cause: on Linux, acquireBrowser defaults to beginframe mode
(--enable-begin-frame-control) which makes page.screenshot() hang
indefinitely — beginframe mode expects CDP HeadlessExperimental.beginFrame
commands, not Puppeteer's Page.captureScreenshot.

Pass forceScreenshot: true and captureMode: "screenshot" so the
thumbnail browser always uses screenshot-compatible Chrome flags.

Reproduced on Linux devbox: thumbnail endpoint hung >30s with
beginframe flags; returns a valid PNG instantly in screenshot mode.
@miguel-heygen miguel-heygen merged commit 4e0034f into main May 16, 2026
39 of 47 checks passed
@miguel-heygen miguel-heygen deleted the worktree-fix+studio-capture-button branch May 16, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(studio) Capture button not working — no PNG produced, no error in UI

3 participants