Skip to content

feat: CLI observability + fix studio save failures on JS-created elements#1091

Merged
miguel-heygen merged 11 commits into
mainfrom
worktree-feat+cli-observability
May 27, 2026
Merged

feat: CLI observability + fix studio save failures on JS-created elements#1091
miguel-heygen merged 11 commits into
mainfrom
worktree-feat+cli-observability

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented May 27, 2026

Summary

  • Fix studio save failures on JS-created elements: The studio visual editor now probes the source HTML at selection time via a new POST /file-mutations/probe-element/ endpoint. Elements that only exist at runtime (created by composition scripts) are marked as non-editable with a clear message instead of silently failing on save.

  • CLI error/crash telemetry: Added uncaughtException and unhandledRejection handlers that emit cli_error events, plus cli_command_result events tracking per-command success/failure and duration.

  • Enrich save_failure telemetry: Added target_id, target_selector, and target_source_file to save failure events for better debugging.

Test plan

  • probeElementInSource unit tests (9 cases including JS-created element scenario)
  • Existing patchElementInHtml and removeElementFromHtml tests still pass
  • domEditing.test.ts updated for async selection flow
  • Full build passes (core, studio, cli, engine, producer)
  • Lint + format clean on all changed files
  • Manual: open studio, select a script-generated element, confirm editing is disabled with reason message
  • Manual: trigger a CLI crash, confirm cli_error event fires

Make `resolveDomEditSelection` async and wire a `probeSourceElement` call
into the selection path so elements generated by scripts (not present in the
source HTML) are detected early and have all edit capabilities disabled with
a clear reason message ("This element is generated by a script and cannot be
edited visually.").

Part A – core probe logic:
- `domEditingLayers.ts`: `resolveDomEditSelection` is now async; calls
  `probeSourceElement` (POST /api/projects/:id/file-mutations/probe-element/:file)
  when `projectId` is supplied and the element has a stable id/selector.
  `existsInSource: false` flows into `resolveDomEditCapabilities`, which
  disables all write capabilities with the appropriate reason.
- `domEditingLayers.ts`: `refreshDomEditSelection` promoted to async.
- `files.ts`: new `probe-element` route; extracted `resolveProjectPath`,
  `resolveFileMutationContext`, `writeIfChanged`, and `parseMutationBody`
  helpers to eliminate repeated boilerplate across remove/patch/probe handlers.

Part B – caller propagation (all eight consumer sites):
- `useDomSelection.ts`: `buildDomSelectionFromTarget`,
  `resolveDomSelectionFromPreviewPoint`,
  `buildDomSelectionForTimelineElement`, `handleTimelineElementSelect`,
  `refreshDomEditSelectionFromPreview`, and
  `refreshDomEditGroupSelectionsFromPreview` all made async; `projectId`
  forwarded into `resolveDomEditSelection`.
- `useDomEditCommits.ts`, `useDomEditTextCommits.ts`: updated
  `buildDomSelectionFromTarget` parameter type; added `await` at call sites.
- `useDomEditSession.ts`: inner `syncSelectionFromDocument` made async; fire
  with `void` to satisfy the surrounding effect.
- `usePreviewInteraction.ts`: `handlePreviewCanvasMouseDown` and
  `handlePreviewCanvasPointerMove` made async (React ignores handler return
  values, so this is safe).
- `useStudioUrlState.ts`: deferred `buildDomSelectionFromTarget` call
  converted to `.then()` chain with `void` prefix so the effect stays sync.
- `LayersPanel.tsx`: `seekToLayer`, `handleSelectLayer`, and
  `handleLayerHover` made async.
- `DomEditOverlay.tsx` / `useDomEditOverlayGestures.ts`: `onCanvasPointerMove`
  return type widened to `Promise<DomEditSelection | null>`; pointer-down
  handler falls back to `hoverSelectionRef.current` (always populated by a
  prior hover) instead of awaiting the async move callback inline.

Part C – test and tooling fixes:
- `lefthook.yml`: filesize hook shell loop explicitly skips `*.test.ts/tsx`
  files as a guard against a lefthook v2.1.6 bug where `exclude` patterns are
  not applied to `{staged_files}` in shell scripts.
- `domEditing.test.ts`: all `it()` blocks calling `resolveDomEditSelection`
  made async with `await`.
- `DomEditOverlay.test.ts`: mock updated to return `Promise.resolve(selection)`
  and `hoverSelection` pre-seeded so pointer-down test works with the new
  hover-first path.
- `studioUrlState.test.ts`: `buildDomSelectionFromTarget` mocks wrapped in
  `Promise.resolve()`; seek/selection hydration test made async with
  `await act(async () => { await Promise.resolve(); })` to flush microtasks.
Register process-level uncaughtException and unhandledRejection handlers
that fire trackCliError so unhandled crashes are captured in telemetry.
Add the trackCliError function to events.ts and re-export it from the
telemetry barrel.
Comment on lines +295 to +302
const response = await fetch(
`/api/projects/${projectId}/file-mutations/probe-element/${encodeURIComponent(sourceFile)}`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ target }),
},
);
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full diff. Strong foundation on the studio-side fix — the server probe approach is the right call — but there are three blockers on the CLI observability piece.

Audited: packages/cli/src/cli.ts, packages/cli/src/telemetry/events.ts, packages/cli/src/telemetry/client.ts (end-to-end), packages/cli/src/telemetry/index.ts, packages/core/src/studio-api/routes/files.ts, packages/core/src/studio-api/helpers/sourceMutation.ts + test, packages/studio/src/components/editor/domEditingLayers.ts, packages/studio/src/hooks/useDomSelection.ts, packages/studio/src/hooks/usePreviewInteraction.ts.


Strengths

  • routes/files.tsresolveProjectPath / writeIfChanged / parseMutationBody extraction is clean. Three route handlers that were 40-line copies are now 6-line call sites. Good deduplication.
  • sourceMutation.test.ts:169–232 — nine test cases covering the JS-created-element scenario that motivated the bug. The affirmative pin (id: "arrows-svg" absent from source, id: "canvas" present) directly covers the failure mode.
  • Routing the "is editable" decision server-side via a source probe rather than client-side heuristics is the right architecture; it makes the source file the single source of truth.

Blockers

[blocker] uncaughtException handler suppresses the crash without re-exiting — cli.ts:57–71

When you register a listener on uncaughtException, Node stops its default crash behavior. The listener you added queues a telemetry event asynchronously and then returns. The process is now alive in an indeterminate state — it will eventually drain to beforeExit and emit cli_command_result with success: true for a command that threw an unhandled exception. The standard pattern is to queue telemetry synchronously (which trackEvent supports since client.ts buffers), let the exit handler's flushSync() dispatch it, and then call process.exit(1):

process.on("uncaughtException", (error) => {
  // trackEvent is synchronous — it buffers; flushSync on exit picks it up
  trackCliError({ ..., kind: "uncaught_exception" });
  process.exit(1);
});

Same fix for unhandledRejection (cli.ts:73–88). Without process.exit(1), the crash is silenced and callers (shell scripts, CI) see exit code 0.

[blocker] cli_command_result.success is always truecli.ts:183

The beforeExit handler hardcodes success: true. There is no call site in the diff that ever emits success: false. The PR description says "per-command success/failure tracking" but this field is a constant in the current implementation. A cli_command_result event with a boolean success field that's always true will produce misleading dashboards — any funnel or success-rate query built on it will show 100% success. Either:

  • Wire the actual exit code through (the process exit code is available via the exit event's code argument), or
  • Drop the success field from trackCommandResult until failure-path emission is wired up.

[blocker] Format CI failing on .fallowrc.jsonc — the block added in this PR

oxfmt --check found a formatting issue in .fallowrc.jsonc (the domEditingLayers exemption block you added). CI log: Format issues found in above 1 files. Run without --check to fix. Fix: bun run format and re-push.


Important

[important] Hover-driven probe requests — usePreviewInteraction.ts:83–98, domEditingLayers.ts:333

resolveDomEditSelection is now async and performs a POST /file-mutations/probe-element/ call. This is wired into handlePreviewCanvasPointerMove, which fires on every pointer move over the canvas. That means a network round-trip per mouse-move event. Two concrete failure modes:

  1. Request flood: fast mouse movement queues dozens of in-flight requests simultaneously.
  2. Out-of-order resolution: a slow probe for element A resolves after the user has already moved to element B, setting existsInSource from the wrong response.

The probe should be debounced (or use an AbortController to cancel the prior request), and should be limited to pointer-settle or selection-commit time rather than every pointermove event.

[important] stack_trace contains filesystem paths — events.ts:116

Node stack frames embed absolute filesystem paths (e.g. /Users/<username>/... or /home/<username>/...). These are sent to PostHog as stack_trace and will be tied to person records via distinct_id. The PR description and client.ts both say "no personal info or file paths collected" — this contradicts that promise. Strip or sanitize the path component before sending. A regex like /\/(Users|home)\/[^\/]+\//g/<user>/ would protect the majority of cases.

[important] No tests for trackCliError / trackCommandResultevents.ts:108–136

The PR adds test coverage for probeElementInSource (9 cases) but none for the two new telemetry functions. These are the primary deliverable of the "CLI observability" half of the PR. Minimal coverage: verify that trackCliError calls trackEvent("cli_error", ...) with truncated error_message and stack_trace, and that trackCommandResult maps durationMsduration_ms.


Nits

[nit] commandStart declared after the handlers that reference it — cli.ts:238

commandStart is declared at the bottom of the file but captured in closures registered earlier (lines 183, 64, 79). This works at runtime via closure semantics, but it reads as a TDZ violation on first glance. Move the declaration to the top of the file alongside command.

[nit] unhandledRejection handler wraps non-Error reasons — cli.ts:75

new Error(String(reason)) for non-Error rejection reasons loses the original value's structure and produces low-quality error_name: "Error". Consider reason?.constructor?.name ?? "UnhandledRejection" for the name field.


CodeQL SSRF (#416): The bot flagged probeSourceElement in domEditingLayers.tsprojectId and sourceFile come from selection state derived from iframe DOM content. The URL is built client-side and hits the local studio API server (same origin). The server side validates with isSafePath and \0-check, which is the real defense. A comment in probeSourceElement explaining that sourceFile is resolved against the project directory server-side (not caller-controlled) would let the finding be closed cleanly.

Verdict: REQUEST CHANGES
Reasoning: Two blockers on the uncaughtException handler (crash suppression + success: true always) mean the CLI observability code is incorrect as shipped. Format CI also needs a fix before merge. Studio-side fix is solid.

— Vai

- uncaughtException handler now calls process.exit(1) after flushing
- cli_command_result uses real exit code from process "exit" event
- drop stack_trace from cli_error (contains filesystem paths)
- skip source probe during hover — only probe on click/selection
- format .fallowrc.jsonc
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewing after author's blocker fixes. Format CI is green, probe-element debounce is addressed. Two of three original blockers are resolved; one has a new correctness bug, and I'm raising a related one that slipped through.


Resolved

cli_command_result.success hardcoded true — now uses code === 0 && !commandFailed, wired to real exit code. Fixed.

✅ Format CI failingFormat: pass in the latest run. Fixed.

stack_trace filesystem paths to PostHogtrackCliError sends only error_name and error_message; no stack field at all. Concern is moot.

✅ Every pointer-move triggering POST /file-mutations/probe-element/handlePreviewCanvasPointerMove passes skipSourceProbe: true. Hover path never hits the endpoint. Fixed.


Blockers

1. Dynamic import() inside process.on("exit") and uncaughtException is dead code — telemetry never fires

cli.ts — exit handler and uncaughtException handler

Both new telemetry calls use the same pattern:

process.on("exit", (code) => {
  try {
    import("./telemetry/index.js")   // async — schedules a microtask
      .then((mod) => { mod.trackCommandResult(...); })
      .catch(() => {});
  } catch { }
  _flushSync?.();
});

process.on("exit") handlers are synchronous only. Node.js will not drain the microtask queue between the import() call and process teardown — the .then() callback never runs. I verified this empirically: a console.log inside .then() after process.exit(0) is never printed.

The same applies to uncaughtException:

process.on("uncaughtException", (error) => {
  import("./telemetry/index.js").then((mod) => { mod.trackCliError(...); });
  _flushSync?.();
  process.exit(1);   // exit fires before import resolves
});

Both trackCliError and trackCommandResult are effectively unreachable as written.

Fix options:

  • Eager-import the telemetry module at the top of cli.ts (or at initialization time) so it's already resolved by the time the handlers fire. The lazy-load comment below (showUsage) is about avoiding allocating help data — telemetry initialization is a different concern and eager loading is fine.
  • Or: restructure the handlers so they call the already-resolved module synchronously.

The _flushSync path for PostHog batching is fine (it's synchronous). It's only the new trackCliError / trackCommandResult calls that never fire.


Nits (non-blocking)

  • commandStart = Date.now() is declared after runMain is called at the end of the file, but hoisted by const semantics. The timing is actually fine because const isn't hoisted like var, but the declaration appearing after the event handler registrations that reference it is confusing. Moving const commandStart = Date.now() to the top of the file (or at least before the handler registrations) would make the intent clearer.

  • unhandledRejection handler sets commandFailed = true but does not call process.exit(1). This is intentional per Node.js semantics (unhandled rejections don't always mean the process should exit) — just worth a comment explaining why, since the uncaughtException counterpart does exit.


— Vai

@miguel-heygen miguel-heygen merged commit f19d6fd into main May 27, 2026
45 checks passed
@miguel-heygen miguel-heygen deleted the worktree-feat+cli-observability branch May 27, 2026 05:44
Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker resolved. _trackCliError and _trackCommandResult are now module-level let variables assigned in the existing import(...).then() initializer — same eager-capture pattern as _flushSync. The exit and uncaughtException handlers call the pre-resolved references synchronously; no dynamic import() inside the handlers. Fix is correct.

— Vai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants