Skip to content

chore(release): v0.6.1 — develop → main catch-up + AXSecureTextField paste fix (#761)#762

Merged
shaun0927 merged 358 commits into
mainfrom
release/0.6.1
May 17, 2026
Merged

chore(release): v0.6.1 — develop → main catch-up + AXSecureTextField paste fix (#761)#762
shaun0927 merged 358 commits into
mainfrom
release/0.6.1

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

Release 0.6.1 — develop → main catch-up + secure-field paste fix

OpenSafari 0.6.1 is a catch-up release. v0.6.0 was tagged on 2026-05-16 but never merged to main and never published to npm (npm view opensafari-mcp versions confirms 0.5.0 was the latest published until this PR). Rather than ship a separate 0.6.0 → 0.6.1 pair, this PR rolls the entire 0.6.0 body forward together with one targeted pasteboard fix uncovered after the 0.6.0 tag (#761).

This PR contains 356 commits from develop (= the v0.6.0 release body + #761). No breaking changes since 0.5.0.

Headline

Themes (full per-PR notes in CHANGELOG.md [0.6.1] section)

Area Highlights
Reliability batch Boot / lifecycle / network (#752, #753, #754, #755, #756, #757)
Decomposition WebKit (#706 1/5–5/5), Simulator (#708 1/4–4/4), Input (#707 a/b), Protocol typing (#710 a/b), DOM-input (#709)
Performance CLI lazy-load (#700), WebKit fast paths (#702), native-input batching (#705), proxy readiness split (#701), bootstatus-aware polling (#703), Web Inspector socket cache (#704)
Security HTTP MCP transport hardening (#714 series), auth profile persistence (#716), audit log retention (#711, #717)
Bug fixes DOM input dispatch (#726), ax-bridge coord scaling (#693 WU3 #720), ax-bridge content sizing (#693 WU3-prep #695, #693 WU1 #694), pasteboard secure-field (#761)
Features feat(tap) macOS-pt → iOS-pt scaling (#693 WU3), feat(ax-bridge) --debug machine-readable stderr (#660), TRANSITIONAL_STATE_TIMEOUT + --max-settle-retries (#46)
CI / DX Separate runtime/dev audits (#712, #718), no-explicit-any enforcement on migrated paths (#710 b)

Migration notes (since 0.5.0)

  • No tool-name or schema changes. All app_* / webkit_* / bridge tools keep parameter shapes and response envelopes.
  • app_type_element response shape gains an optional secureField: true field on the pasteboard backend when the focused element is an AXSecureTextField. Callers that ignore unknown fields are unaffected.
  • verify: false parameter on app_type_element is now honoured on both backends (auto/simhid and pasteboard). Callers relying on the previous silent-no-op on pasteboard should review their flows.
  • HTTP MCP transport tightens defaults; STDIO transport (default) unaffected.

Verification

$ npx tsc --noEmit -p tsconfig.json   # exit 0
$ npx jest --config jest.config.js --testPathPatterns='tests/unit'
Test Suites: 176 passed, 176 total
Tests:       2646 passed, 2646 total
Snapshots:   6 passed, 6 total
Time:        75.6 s

Post-merge release checklist

  1. release/0.6.1main PR merged (this PR)
  2. ⏳ Tag v0.6.1 on the merge commit and push:
    git tag -a v0.6.1 -m "Release OpenSafari MCP 0.6.1" <merge-sha>
    git push origin v0.6.1
  3. npm publish from main (handled by user out-of-band)
  4. ⏳ Backmerge maindevelop so develop's [Unreleased] lines up with main's [0.6.1] history

Closes #760 (via #761 already merged to develop).

junghwan-oss and others added 30 commits April 20, 2026 16:42
…ication

fix(#39): Tier-3 readback verification for app_type_element
Round-trip through the simulator pasteboard + Cmd+V via sim-hid to bypass
the software keyboard entirely. Unicode-safe (Korean/Japanese/Chinese/emoji)
regardless of the simulator's active keyboard layout — the primary failure
mode that #39 was filed against.

- Opt-in via new `backend: "pasteboard"` param on app_type_element
  (default `"auto"` preserves current simhid behaviour)
- Auto-accepts iOS 16+ paste-permission dialog using a localised label
  corpus (en/ko/ja/zh) with negative-dismiss filtering so "Don't Allow"
  variants are never tapped by accident
- Preserves original pasteboard content by default (opt-out via
  `restorePasteboard: false`)
- Exposes `keyChord(key, modifier)` on SimulatorKitHIDInputBackend for
  any future chord use (Cmd+A, Cmd+X, etc.)

Tier 1 (force-Latin keyboard) from #39 is deliberately deferred — the
pasteboard path covers every locale without requiring a layout switch,
and Tier 1's detection signal was flagged as investigative in #39's v2
addendum. Tier 3 (readback verification) will land in a follow-up.

Live-verified on iPhone 16 / iOS 26.4 against flutter-qa-app:
typing "안녕 [email protected] 🎉" produced the exact same string in the
Flutter TextField (confirmed via mobile-context textFieldLabels).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…kend

feat: pasteboard typing backend for app_type_element (#39 Tier 2)
fix(ax-bridge): recursive scored content-root search (#40, part 1/3)
test(ax-bridge): TS reference scorer + 6-fixture unit suite (#40, part 2/3)
Bumps the package to 0.6.0. Headline change is the ax-bridge
recursive scored content-root search (#40, follow-up to #4), which
closes the silent-empty-content bug on Xcode 26.4 / iOS 26.4 by
refusing to return a chrome-only tree. The raw bridge now emits a
typed DEVICE_CONTENT_ROOT_EMPTY error (exit 1) when no subtree
exposes any app-level accessibility semantics, instead of silently
falling back to the bare AXWindow.

Version bumped in package.json and package-lock.json. CHANGELOG.md
gains a [0.6.0] entry summarizing the scoring rubric, chrome
denylist, typed-error contract, and the TS reference scorer + 6
fixture unit suite that land alongside. The previous Unreleased
entries (app_tap_element coordinate fallback contract + raw
mobile-bridge context diagnostics) are carried forward into 0.6.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
chore(release): 0.6.0 — ax-bridge recursive scored content-root (#40, part 3/3)
…ly-field

fix(ax-bridge): emit chromeOnly on dump/query/inspect (#41)
…snapshot

fix(ax-bridge): wrapper single-snapshot promotion via chromeOnly (#41)
…adme

feat(ax-bridge): --help output and raw-bridge README contract (#42)
feat(alert): localized label corpus + reason enum + service hints (#43 pt.1/4)
feat(alert): isLikelyDialogButton detection predicate (#43 pt.2/4)
feat(alert): wire Tier 1 AX-scan + Tier 2 AppleScript + diagnostics (#43 pt.3/4)
…reset

feat(alert): Tier 3 permission_reset fallback (#43 pt.4/4)
)

Distinguishes "expected app is running but its UI is still loading /
stuck on a spinner" from "no AX data at all" (simulator shell crashed,
probe broken). Both currently collapse to FOREGROUND_CONTEXT_UNAVAILABLE,
so downstream callers cannot choose between "retry the navigation step"
and "abort the test run". The promotion is a wrapper-layer concern: the
surface classifier in src/tools/raw-mobile-context.ts stays surface-scoped
and never emits the new variant itself.

Option A (re-probe) implementation:
  IF first probe == FOREGROUND_CONTEXT_UNAVAILABLE
  AND --expect-bundle is set
  AND expectBundle is in runningApps
  AND --max-settle-retries >= 1
    re-probe once after another settleMs window
    if still UNAVAILABLE -> promote to TRANSITIONAL_STATE_TIMEOUT

- src/tools/raw-mobile-context.ts: extend RawMobileClassification union.
- src/tools/sim-hid-bridge-wrapper.ts (NEW): pure applyTransitionalPromotion
  helper + normalizeMaxSettleRetries. Keeps the wrapper behavior directly
  unit-testable without spawning the native bridge.
- cli/sim-hid-bridge.ts: parse --max-settle-retries <0|1|2|3> (default 1),
  wire the promotion into both the `context` command and the tap/swipe
  post-action probe path.
- tests/unit/sim-hid-bridge-wrapper.test.ts (NEW): 4-case spec from the
  issue — promotion, first-ok-wins, no-bundle-match, opt-out — plus the
  short-circuit guard and retry-range normalization.
- tests/unit/raw-mobile-context.test.ts: add empty-tree regression guard
  and a compile-time type-literal assertion so removal of the new union
  member fails the build.
- docs/headless-architecture.md, CHANGELOG.md: document the new
  classification value and --max-settle-retries flag.

--max-settle-retries 0 restores the pre-issue single-probe behavior
byte-for-byte for performance-sensitive callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…imeout

feat(sim-hid-bridge): TRANSITIONAL_STATE_TIMEOUT classification + --max-settle-retries (#46)
…MEOUT live test (#46)

Adds the deterministic integration fixture and live suite called for in
issue #46's Post-deployment verification checklist, stacked on the core
classification PR.

Fixture — tests/fixtures/flutter_spinner/:
  - lib/main.dart: minimal Flutter app that renders a spinner-only phase
    for exactly 8000ms after first build (no Scaffold, no AppBar, no
    labels — matches the AX signature sim-hid-bridge keys on), then
    swaps to a 'Content ready' Text widget.
  - Bundle ID: com.opensafari.fixtures.flutterSpinnerQa (distinct from
    the existing flutter-qa-app fixture so both can coexist on one
    simulator).
  - build.sh + README document the flutter create bootstrap and the
    flutter build / simctl install commands.

Live test — tests/integration/sim-hid-transitional.live.test.ts:
  - Gated behind OPENSAFARI_LIVE_TRANSITIONAL=1 (same pattern as
    OPENSAFARI_LIVE_SIMHID), so npm test / default integration runs stay
    green on machines without the fixture installed.
  - Test 1 (spinner phase): probing 500ms after launch returns
    classification === 'TRANSITIONAL_STATE_TIMEOUT', verified === false,
    with warnings naming the bundle and "transitional timeout".
  - Test 2 (content ready): probing after SPINNER_WINDOW + 1000ms
    buffer does NOT yield TRANSITIONAL_STATE_TIMEOUT.
  - Test 3 (no-regression): Settings.app (com.apple.Preferences) after
    1500ms does not false-positive as TRANSITIONAL_STATE_TIMEOUT —
    guards against the promotion firing on normal fast launches.
  - Test 4 (opt-out): --max-settle-retries 0 keeps the classification
    at FOREGROUND_CONTEXT_UNAVAILABLE (pre-issue byte-for-byte behavior).

Closes part 2 of #46.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…xture

test(sim-hid-bridge): flutter_spinner fixture + TRANSITIONAL_STATE_TIMEOUT live test (#46)
The `require.main === module` guard does not survive the webpack transform
used to bundle `dist/ax-bridge`. After the webpack pass, the generated
condition (`__webpack_require__.c[__webpack_require__.s] === module`) never
evaluates to true when the bundle is run as `node dist/ax-bridge`, which
meant `main()` was never called — every invocation exited 0 with no output.
The regression broke `tests/unit/ax-bridge-help.test.ts` and silently
disabled the raw CLI's `--help` surface and every subcommand.

Since this file is only ever shipped as the CLI entry point, drop the
guard and call `main()` unconditionally. Verified: all 10 help tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…inel boots (#637)

The previous help-surface fix made the ax-bridge bundle execute `main()` on
module import, which broke the wrapper unit suite by treating pure test imports
like real CLI invocations. At the same time, the HID sentinel still used a hard
30s exec timeout even though the suite budget had already been raised to 45s for
slow macOS runners, so macos-15 could time out with empty stdout and fail the
JSON-output assertion.

This keeps the webpack-safe CLI detection by checking `process.argv[1]` for the
ax-bridge entry name, preserving `node dist/ax-bridge ...` behavior without
executing during Jest imports. It also aligns the sentinel subprocess timeout
with `SLOW_BRIDGE_TIMEOUT_MS` so the workflow and the local sentinel command use
the same budget.

Constraint: Webpack breaks `require.main === module` for the bundled ax-bridge entry
Constraint: macos-15 GitHub runners can exceed 30s on the first sentinel probe
Rejected: Unconditional main() invocation | breaks import-based unit tests
Rejected: Leave sentinel at 30s and relax assertions | hides the timeout instead of fixing it
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep ax-bridge import-safe; any future entry guard must still allow unit tests to import helpers without executing the CLI
Tested: npm test -- --runInBand tests/unit/ax-bridge-wrapper.test.ts tests/unit/ax-bridge-help.test.ts
Tested: npx jest tests/ci/sim-hid-sentinel.test.ts --runTestsByPath --testPathIgnorePatterns=/node_modules/ --runInBand
Tested: npm run lint
Tested: npx tsc --noEmit
Tested: npm run test:ci
Tested: workflow-equivalent sentinel command with --json output file
Not-tested: GitHub-hosted macos-15 runner after merge
- Adds 'pointer-service' to `ALLOWED_BACKEND_KINDS` in
  `src/ci/audit-log-posture.ts` alongside `ax-press`, `flutter-vm`,
  `simhid`, and `webkit`.
- Keeps `applescript` and `simctl` rejected.
- Extends `tests/unit/audit-log-posture.test.ts` with a positive case
  for `pointer-service` and adds it to the top-level extraction fixture.

## Why

PR #615 introduced `PointerServiceInputBackend` (`kind = 'pointer-service'`)
as the Phase-1 opt-in Tier-1 backend for Xcode 26+ coordinate taps, gated
behind `OPENSAFARI_ENABLE_POINTERSERVICE=1`. PR #612 admitted `ax-press`
to the #540 posture allow-list but did not enumerate `pointer-service`,
so any CI job that opts into the backend and then runs
`assertAuditLogPosture` fails with `disallowed backendKind(s) present:
pointer-service`. This blocks the PointerService promotion path
(`#590` Phase 1 → Phase 2) in every pipeline that enforces the #540
criterion.

The backend itself is headless by design: it shells out to
`sim-hid-bridge tap-ps`, which uses the PointerService IPC surface with
no foreground dependency on Simulator.app. That posture is attested by
the sentinel symbol probe (PR #621) and the live integration harness
(PR #629), so treating it as a first-class headless backend matches the
observed behavior.

## Test plan

- [x] `npx jest tests/unit/audit-log-posture.test.ts --runInBand` — 12/12 passing.
- [x] `npx tsc --noEmit` clean.
- [x] `npx eslint src/ci/audit-log-posture.ts tests/unit/audit-log-posture.test.ts` clean.

## Related

- Refs: #540
- Follows: #612 (ax-press allow-list admission), #615 (PointerService Phase 1), #590 (PointerService epic)
…in ko-KR IAP recipe

## Summary

- Replace `flutter build ios --simulator --profile` with `--debug`
  in both variants (GitHub Actions at L120 and shell at L283) and in
  gotcha #4. The Flutter toolchain rejects `--profile` on simulator
  targets with 'Profile mode is not supported for simulators.'
- Add a prominent version-pin banner at the top noting the recipe only
  works on `[email protected]` and the three `app_storekit_*` tools
  it invokes were removed in 0.5.0 (PR #623 on #588).
- Reword the prereq Xcode row ('for simctl storekit' implied a CLI that
  does not exist) and the 'Tool availability note' so the three
  storekit tools are no longer implied to be current.

## Why

1. `--profile --simulator` is a hard Flutter toolchain failure. PR #630
   corrected `docs/ci-recipes.md` and `docs/flutter-inspector.md` to
   use `--debug` for simulator targets and documented the split
   (`--debug` for simulator, `--profile` for physical device). The
   ko-KR IAP recipe landed in PR #628, after #630 was merged, but kept
   the broken `--profile` command in three places. Anyone copy-pasting
   the recipe today hits an immediate Flutter CLI exit.

2. The recipe legitimately ships as a pinned-to-0.4.9 artifact, but the
   cross-references and "Tool availability note" read as if the
   `app_storekit_*` tools are current MCP surfaces. A reader who
   installed `opensafari-mcp` from npm (→ 0.5.0) and followed the
   recipe's `docs/storekit-automation.md` link lands on the post-#627
   rewrite that explicitly says those tools do not exist — a
   contradiction the recipe does not prepare them for. The banner makes
   the pin and its expiry condition unambiguous.

## Test plan

- [x] `git diff --stat` — 1 file, 8 insertions / 6 deletions.
- [x] No remaining `--profile` occurrences on simulator commands:
      `grep -n "simulator --profile" docs/recipes/flutter-iap-ko-kr.md` → no matches.
- [x] Banner renders cleanly in the doc preview.

## Related

- Refs: #628 (recipe introduction), #630 (`--debug` for simulator targets), #623 (`app_storekit_*` removal), #627 (post-removal storekit-automation doc), #588 (storekit epic), #589 (ko-KR alert handle).
…ebview_connect

## Summary

- Track whether `app_webview_connect` freshly registered a
  `WebKitClient` on the current call, and deregister via
  `setWebKitClient(null, resolvedDeviceId)` in the `listTargets` catch
  path so the next call does not reuse a stale, never-successfully-used
  instance.
- Pre-existing registrations are intentionally left alone — they may be
  owned by a concurrent tool surface that manages its own lifecycle.
- Adds two unit-test cases: one asserts the freshly-created client is
  cleared after a `listTargets` failure; the mirror case asserts a
  caller-registered client is retained.

## Why

Before this change, when `getWebKitClient(deviceId)` returned no client,
`app_webview_connect` would construct and register a new `WebKitClient`,
then immediately call `listTargets()`. If the proxy was briefly
unreachable the call threw `ECONNREFUSED`, the catch branch returned an
`isError` payload, and the newly-registered client stayed in the global
map. Every subsequent `app_webview_connect` (or other tool that goes
through `getWebKitClient`) then short-circuited to the failed client,
propagating the transient outage as a permanent state for the rest of
the server's lifetime.

The fix scopes the cleanup to what we created: concurrent callers that
had already registered a working client keep it.

## Test plan

- [x] `npx jest tests/unit/app-webview-connect.test.ts --runInBand` — 15/15 passing
      (was 13/13; +2 new cases covering freshly-registered cleanup and
      pre-existing preservation).
- [x] `npx tsc --noEmit` clean.
- [x] `npx eslint src/tools/app-webview-connect.ts tests/unit/app-webview-connect.test.ts` clean.

## Related

- Related: #592 (bundle-aware target classification), #606 (bundle-aware classifier), #593 (WebView cross-context).
Adds `dumpTreeWithRecovery` around `AccessibilityBridge.dumpTree` to absorb
transient bridge failures that previously poisoned the verification lane.

- Classify recoverable (`DEVICE_CONTENT_ROOT_EMPTY`, `AX_TIMEOUT`,
  `BRIDGE_EXEC_FAILED`, `AX_ERROR`) vs non-recoverable
  (`BRIDGE_NOT_FOUND`, `AX_PERMISSION_DENIED`).
- Between attempts, sleep per the configured backoff schedule
  ([200, 500, 1200] ms by default) and force a fresh
  `ensureSemanticsActive` to rematerialise Flutter/native semantics.
- Return a chronological `AxBridgeRecoveryReport` on every call so tool
  diagnostics can surface attempts/stages uniformly.
- Best-effort reactivation: a failing `ensureSemanticsActive` is recorded
  in the report but does not abort the retry loop.
- Thrown errors carry the report attached for downstream telemetry.

No call sites change in this PR; wiring the wrapper into the tools
happens in a follow-up so the module can be reviewed in isolation.

Tests: 11 unit tests cover happy path, single/double-failure recovery,
retry exhaustion, non-recoverable short-circuit, reactivation disable,
reactivation failure, custom schedule, missing deviceId, and the error
attachment contract.

Refs: #643

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
WebKitClient.listTargets previously used `parsed.every(isDeviceRedirect)`
as an all-or-nothing guard, so a mixed /json response (some redirect
stubs alongside inline page targets) fell through to the pass-through
branch and leaked stubs with undefined webSocketDebuggerUrl to callers.
That produced opaque WebSocket errors in app_webview_connect without any
diagnostic pointing at the unresolved redirect.

Resolve each entry independently: redirect stubs are expanded via their
inner /json, inline targets pass through untouched, and a failure on one
redirect only contributes [] rather than collapsing the whole array.

Adds tests/unit/webkit-client-list-targets.test.ts covering all-inline,
all-redirect, mixed, mixed-with-failure, and empty inputs, exercising
the real code path by spying on the private httpGet rather than stubbing
listTargets.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… tap via Notes (#641)

Apple review treats `xcrun simctl openurl` as indirect evidence for a
Universal Link deep-link. Reviewers themselves paste links into Notes and
tap the Data-Detector-produced link, which routes through the same
`LSOpenURL` path as the reviewer does in production.

This new tool automates that path end-to-end:

  1. Launches Notes (`com.apple.mobilenotes`).
  2. Focuses the note body (first AXTextArea/AXTextView/AXTextField).
  3. Paste-injects the URL via the existing pasteboard-input backend
     (handles the iOS 16+ paste-permission dialog + restores the original
     pasteboard).
  4. Polls the accessibility tree for an AXLink whose label contains the
     URL or its host.
  5. Presses the detected link via the AX bridge.

Returns a structured JSON result with `notesLaunchedAt`,
`linkTappedAt`, the matched `linkElement`, and `durationMs`. Timeouts,
Notes-launch failures, and `AXPress`-not-advertised failures surface as
`isError: true` with a descriptive message.

Unit tests cover happy path (URL match, host-only match), missing-url,
invalid-scheme, editor-never-appears, link-never-detected, Notes-launch
failure, and press-failed. Typecheck + jest both green.

Ref: #641 (sub-task T1)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The inline comment on PointerServiceInputBackend.swipe claimed that a
failure on Xcode 26+ would "surface via the tier chain when we bubble
back up". That is not the runtime behaviour: getInputBackend caches
PointerServiceInputBackend as the selected backend once
OPENSAFARI_ENABLE_POINTERSERVICE=1 resolves, so a HeadlessInputUnavailable
thrown by the SimHID delegate propagates straight back to the tool
layer without any tier-chain re-entry.

Option A (documentation-only) per the decision record on issue #649.
Option B (an actual in-tool tier downgrade) is deferred to #590 Phase 2
alongside sim-hid-bridge swipe-ps so the fallback can be tested against
a real alternative.

Changes:

- src/tools/pointer-service-input-backend.ts: rewrite the swipe() leading
  comment to describe the shipped behaviour and point callers at the two
  working escape hatches (leave the env flag unset, or use an
  element-targeted swipe so the AX-press tier handles the gesture).
  Amend the class-level Scope docstring so the "keeps non-tap input
  paths on their existing Tier-1 route" sentence is qualified for
  Xcode 26+.
- CHANGELOG.md: add an Unreleased / Fixed bullet noting that only
  documentation changed and directing readers to the escape hatches.
- tests/unit/pointer-service-input-backend.test.ts: add a guard
  assertion that PointerServiceInputBackend.swipe propagates
  HeadlessInputUnavailableError unchanged and invokes the delegate
  exactly once, so any future change that tries to silently retry or
  swallow the error has to update the comment and the test together.

No runtime behaviour changed. Build + tests still green on this worktree.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Register two new MCP tools that will expose a simulator-level network
toggle so native (Flutter / UIKit) apps see real SocketException /
NSURLErrorNotConnectedToInternet. Unlike network_offline (which only
shims window.fetch / XHR inside WebKit page contexts), these tools are
intended to operate on the iOS Simulator host network path via pfctl
or Network Link Conditioner. Issue #640 tracks the overall design;
this PR ships only the scaffold:

- src/tools/device-network.ts — handlers + input schemas, in-memory
  state map keyed by device UDID, mode === "online" is idempotent,
  mode === "offline" | "airplane" returns a structured not_implemented
  error pointing readers at the follow-up PRs (2-6) that wire the
  pfctl / NLC backends.
- src/tools/index.ts — registerDeviceNetworkTools() called alongside
  the existing WebKit-scoped network tools.
- tests/unit/device-network.test.ts — 12 Jest specs covering schema
  shape, mode/mechanism validation, online idempotence, UDID
  resolution (explicit vs getSoleDeviceId vs listBooted fallback),
  and device_network_get default + after-set behavior.

No behavior change for existing tools. Existing 114 unit-test suites
(1693 tests) continue to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Both tools now accept an optional `captureLogs` object that
synchronously collects `os_log` entries around the URL-open event. This
lets close-gate tooling assert the presence of deep-link handler output
(e.g. `[UniversalLink] Resolved …`) in a single MCP call instead of
juggling `xcrun simctl log stream` by hand.

iOS does not expose a kernel "app is quiescent" signal outside of
Instruments, so the helper uses a silence heuristic:

  1. Capture `preOpenAt` before invoking `simctl openurl`.
  2. Poll `log show --start <preOpenAt - prerollMs>` with the same
     NSPredicate already used by `app_logs` (reuses `buildLogPredicate`).
  3. Deduplicate entries by `timestamp + traceID + composedMessage`.
  4. Stop once no new matching entry has arrived for `silenceMs`, or
     `maxDurationMs` elapses. Report the reason.

Shared across `app_open_url` and `app_deeplink` via
`src/observability/capture-logs-window.ts`. The option is additive — when
`captureLogs` is omitted the tools behave exactly as before, and the
pre-existing unit test for `app_open_url` continues to pass.

Unit tests (`tests/unit/capture-logs-window.test.ts`) cover:

  - silence stop path (entries stable across polls → stopReason='silence')
  - max_duration stop path (entries keep arriving → stopReason='max_duration')
  - preroll is applied (windowStart = preOpenAt - prerollMs)
  - simctl rejection → returned as `{error, stopReason:'error'}` instead
    of thrown (so the caller still gets the open result)
  - NSPredicate threading (bundleId + search)
  - dedup by `traceID + composedMessage` across polls

Ref: #641 (sub-task T4)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ormational stderr (#651)

Address bug #2 from issue #651. `simctl io … screenshot` intermittently
returns `Timeout waiting for screen surfaces` after high-velocity input
events while the simulator is still compositing; a direct retry ~1-2s
later succeeds. Two changes:

1. Retry the screenshot up to 2 times with a 1.5s backoff when the error
   message matches the transient-timeout signature. Surface a `retries`
   counter in the response metadata so operators can see how often the
   simulator asks to be backed off.
2. Strip the informational `Note: No display specified.` line from
   stderr before surfacing it as an error — it is emitted by simctl even
   on success paths and drowns out the real failure signal.

Unit tests cover:
- Retry once on transient error then succeed (retries: 1)
- Retry-exhausted path still returns a clean error message
- Non-transient errors are not retried
- `Note:` line is filtered from error output
- First-attempt success case preserved (retries: 0)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
shaun0927 and others added 17 commits May 12, 2026 15:47
#706)

Merge-readiness review completed for PR #732.

This PR is valid for #706 because it keeps `WebKitClient` as the public compatibility facade while moving its internal responsibilities into focused modules: errors, protocol transport, target session state, browser commands / DOM input scripts, and typed event forwarding. That directly addresses the maintainability risk from the previous monolithic WebKit client without introducing CDP/Playwright abstractions or changing the external BrowserBackend-facing API.

I also reviewed the automated feedback. The earlier source-comment cleanup and transport/event/cookie follow-ups were already addressed. Two latest Codex findings were still valid and are now fixed:

- `selectOption` again guards on `SELECT`, so a mis-targeted selector cannot mutate an input/textarea while still reusing the shared prototype-walk value setter.
- High-risk MCP audit logging now records `status: "error"` when a tool handler returns `isError: true` without throwing, preserving error telemetry.

I also refreshed the PR title/body so they match the actual current scope, then merged the latest `develop` to remove the conflict while preserving develop's newer audit redaction policy.

Validation evidence on the final head (`1d9abf4d`):

- Local: `npm run lint -- --quiet` passed.
- Local targeted regression coverage: 9 relevant suites / 152 tests passed after the develop merge.
- Local full unit suite: 170 suites / 2599 tests passed.
- Local build: `npm run build` passed.
- Local whitespace check: `git diff --check` passed.
- GitHub CI on the final head: lint, test, dependency-audit, and build all passed.

Architectural review status: CLEAR. The remaining risk is limited to live Safari/simulator smoke behavior, but the PR is a behavior-preserving refactor with focused unit coverage around the extracted boundaries plus green CI/build. I consider this merge-ready.
refactor(input): split backends and resolver into focused modules (#707 a)
Constraint: Keep tracing dependency-free and opt-in so normal MCP automation behavior is unchanged.
Rejected: Adopt Playwright trace or OpenTelemetry SDK directly | too heavy for the first OpenSafari-native artifact contract.
Confidence: high
Scope-risk: narrow
Directive: Extend exporters only after the JSON schema proves useful in live simulator failures.
Tested: npm test -- --runTestsByPath tests/unit/action-trace.test.ts; npm run lint -- --quiet src/observability/action-trace.ts src/orchestration/scenario-runner.ts tests/unit/action-trace.test.ts; npm run build; GitHub CI lint/test/build/dependency-audit.
Not-tested: Live simulator trace artifact capture.
Constraint: Fast login support must not turn OpenSafari into an auth framework or secret store.
Rejected: Add Auth.js/SimpleWebAuthn/IAM dependencies | those belong to target apps, not the automation MCP runtime.
Confidence: high
Scope-risk: narrow
Directive: Implement tool gaps only after a non-secret fixture proves the recipe cannot be completed with existing tools.
Tested: Documentation-only scope review; GitHub CI lint/test/build/dependency-audit/path-guard.
Not-tested: Live passkey fixture run.
Constraint: Preserve existing app_wait_for defaults and add only optional diagnostics.
Rejected: Introduce a new locator or action retry engine | broader behavior change than the wait-diagnostics issue needs.
Confidence: high
Scope-risk: narrow
Directive: Keep timeout samples bounded and reset stability after unobserved polls so AX dumps do not become noisy or false-positive artifacts.
Tested: npm test -- --runTestsByPath tests/unit/app-wait-for.test.ts tests/unit/app-wait-for-diagnostics.test.ts; npm run lint -- --quiet src/tools/app-wait-for.ts tests/unit/app-wait-for.test.ts tests/unit/app-wait-for-diagnostics.test.ts; npm run build; GitHub CI lint/test/build/dependency-audit.
Not-tested: Live simulator intermittent AX query failure.
Constraint: Use existing VM Service allocation data without app-side dependencies.
Rejected: Require leak_tracker or embed Flutter DevTools | would force changes in target apps and exceed OpenSafari's external QA role.
Confidence: high
Scope-risk: narrow
Directive: Keep budgets caller-configured until live fixtures establish stable defaults.
Tested: npm test -- --runTestsByPath tests/unit/flutter-memory-budget.test.ts; npm run lint -- --quiet src/metrics/flutter-memory-budget.ts tests/unit/flutter-memory-budget.test.ts; npm run build; GitHub CI lint/test/build/dependency-audit.
Not-tested: Live Flutter debug/profile fixture budget run.
Constraint: Preserve existing network_intercept block/modify behavior while preventing cross-session rule leakage.
Rejected: Keep module-global activeRules | long-lived MCP processes can leak stale rules into unrelated sessions.
Confidence: high
Scope-risk: narrow
Directive: Future per-rule removal should build on session-scoped interceptors rather than reintroducing module-global state.
Tested: npm test -- --runTestsByPath tests/unit/network-interceptor.test.ts tests/unit/network-intercept-tool.test.ts; npm run lint -- --quiet src/tools/network-intercept.ts src/tools/network-offline.ts tests/unit/network-intercept-tool.test.ts; npm run build
Not-tested: Live two-session Safari interception.
Constraint: Preserve synchronous registry APIs and existing lock timeout/retry semantics.
Rejected: Keep a JavaScript busy-spin loop | it wastes CPU during simulator cleanup contention.
Confidence: high
Scope-risk: narrow
Directive: Any future async lock migration should preserve active-simulator protection semantics first.
Tested: npm test -- --runTestsByPath tests/unit/zombie-cleanup-lock.test.ts tests/unit/zombie-cleanup-cross-session.test.ts; npm run lint -- --quiet src/reliability/zombie-cleanup.ts tests/unit/zombie-cleanup-lock.test.ts; npm run build
Not-tested: Live multi-process lock contention CPU profile.
Constraint: Live OpenSafari validation for #752 showed device_boot could return success while qa_session_create failed with NO_WEBKIT_CLIENT and diagnose reported device=null.\nRejected: Making WebKit mandatory for device_boot | native simulator tools should remain usable when Safari target discovery is temporarily unavailable.\nConfidence: high\nScope-risk: narrow\nDirective: Preserve best-effort boot semantics; surface WebKit unavailability through diagnostics instead of hiding a booted device.\nTested: npm test -- --runInBand tests/unit/device-boot-webkit-retry.test.ts; npm run lint -- --quiet; npm run build:src\nNot-tested: Full iOS WebKit live retry after installing this branch into the active MCP runtime.
Constraint: PR #753 moves network_intercept and offline behavior onto the shared NetworkInterceptor implementation.\nRejected: Leave XHR send wrapped after clear/offline restore | clear would not fully restore page network behavior and repeated injections could wrap wrappers.\nConfidence: high\nScope-risk: narrow\nDirective: Keep fetch, XHR open, and XHR send restoration in sync when editing injected network hooks.\nTested: npm test -- --runTestsByPath tests/unit/network-interceptor.test.ts tests/unit/network-intercept-tool.test.ts; npm run lint -- --quiet src/network-interceptor.ts tests/unit/network-interceptor.test.ts tests/unit/network-intercept-tool.test.ts\nNot-tested: Live Safari XHR interception.
Avoid busy-spin waits in zombie cleanup lock
Recover device_boot when WebKit registration is late
Coalesce per-device WebInspectorProxy startup and make teardown wait for in-flight starts so parallel device boot/retry/shutdown paths preserve one proxy and one reserved port per simulator.\n\nConstraint: OpenSafari advertises parallel simulator sessions, so proxy lifecycle must remain idempotent under concurrent callers.\nRejected: relying on callers to serialize device boot | duplicates the invariant and leaves retry/shutdown paths unsafe.\nConfidence: high\nScope-risk: narrow\nDirective: Keep getProxyForDevice, stopProxyForDevice, and stopAll single-flight aware when changing proxy lifecycle state.\nTested: CI build/lint/test/dependency-audit; npm test -- --runTestsByPath tests/unit/proxy-manager.test.ts; npx eslint src/simulator/proxy-manager.ts tests/unit/proxy-manager.test.ts
Share per-device Flutter VM resolution while native input tools are entering Tier 0 so debug/profile Flutter automation does not issue overlapping discovery/connect calls against the same client.\n\nConstraint: Parallel app tools can start simultaneously against one Flutter simulator before the resolver cache is populated.\nRejected: letting FlutterVMClient absorb concurrent connect calls | connect mutates WebSocket and state, so resolver-level single-flight ownership is safer.\nConfidence: high\nScope-risk: narrow\nDirective: Keep Tier 0 Flutter VM discovery single-flight when changing input backend resolution.\nTested: CI build/lint/test/dependency-audit; npm test -- --runTestsByPath tests/unit/input-backend-resolver.test.ts; npx eslint src/input/flutter-resolver.ts tests/unit/input-backend-resolver.test.ts
…alse (#760) (#761)

* fix(pasteboard): skip readback for AXSecureTextField + honor verify:false (#760)

The #639 PR C readback contract in `assertPasteApplied` cannot succeed against
`AXSecureTextField` (password input) because iOS masks the AXValue with
bullet characters regardless of the underlying plaintext. Every successful
password paste was being rejected as `PASTE_NOT_APPLIED`, and the simhid
path's `verifyTypedText` was escalating the same OS-mask divergence into
`TEXT_INPUT_DROPPED` / `TEXT_INPUT_LAYOUT_MISMATCH` with `isError: true`.
Net effect: no working backend for typing into a password field, blocking
every signup / login / password-reset automation that follows the #639
Problem 1 docstring's "use pasteboard for non-Latin layouts" recommendation.

Secondary: `backend: "pasteboard"` was silently ignoring the `verify: false`
opt-out — `app-type-element.ts` always forwarded `expected` +
`focusedElementPath` to `typeViaPasteboard`, so callers had no way to work
around the secure-field bug. The simhid path already honoured `verify:
false` via `verifyOptIn`; pasteboard now does too.

Changes:
- `pasteboard-input.ts`: `assertPasteApplied` accepts an optional
  `{ role, traits }` descriptor and returns silently when it signals a
  secure text field. New `isSecureFieldDescriptor` helper exported for
  callers that need to surface the skip in their own response shape.
  `typeViaPasteboard` passes the inspected node's role/traits into the
  assert and adds `secureField?: true` to `PasteboardTypeResult` so the
  tool layer can echo the skip cause.
- `app-type-element.ts` (pasteboard branch): only forward `expected` +
  `focusedElementPath` to `typeViaPasteboard` when `verifyOptIn === true`.
  Echo `pasteResult.secureField` in the success response so callers can
  distinguish "no readback because secure field" from "no readback because
  verify opted out".
- `app-type-element.ts` (`verifyTypedText`, simhid branch): detect
  `AXSecureTextField` via role or trait (also accepting the lower-case
  `"secure text field"` alias) and return
  `{ verified: 'unknown', verify_method: 'ax-value-not-readable' }` with a
  secure-field reason, instead of falling through to the mismatch path.

Tests:
- `pasteboard-input.test.ts`: 10 new cases — `assertPasteApplied` with
  bullet-masked actual + secure-field descriptor (role variant, traits
  variant, lower-case-alias variant), non-secure regression, legacy
  no-descriptor path. `isSecureFieldDescriptor` truth table.
- `app-type-element.test.ts`: 2 new cases inside the Tier-3 readback
  describe — secure field via trait, secure field via role. Both must
  return `verified: 'unknown'` and must NOT escalate to a structured
  input error.

No public schema change beyond honouring the existing `verify` parameter
on the pasteboard backend. No new tool, no backend auto-switching — those
were considered and rejected as out of scope (the issue body lists them
explicitly).

Closes #760.

* refactor(app-type-element): delegate secure-field check to isSecureFieldDescriptor

The trait-alias corpus (`AXSecureTextField`, `secure text field`) was duplicated
between `pasteboard-input.ts::isSecureFieldDescriptor` and
`app-type-element.ts::verifyTypedText`. If a new alias surfaces from a future
iOS release, the duplicated copy could rot unnoticed and re-introduce the #760
regression on the simhid path.

Import and reuse the helper so the corpus lives in exactly one place. Behaviour
is unchanged — all 2646 unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Bump package version to 0.6.1 and stamp the full release note into
CHANGELOG.md. 0.6.1 is a *catch-up release* — v0.6.0 was tagged on
2026-05-16 but never merged to `main` and never published to npm
(npm registry latest was 0.5.0). 0.6.1 rolls the entire 0.6.0 body
forward onto `main` together with one targeted pasteboard fix
uncovered after the 0.6.0 tag.

Headline fix (post-0.6.0):
- app_type_element pasteboard backend now works on AXSecureTextField
  (password) elements (#760, #761) — the #639 PR C readback contract
  was rejecting every successful password paste as PASTE_NOT_APPLIED
  because iOS masks AXSecureTextField AXValue with bullet chars
  regardless of plaintext. Verifier now skips for secure-field
  signals (role/trait) and the pasteboard backend honours `verify:
  false` symmetrically with simhid.

Headline change (rolled forward from 0.6.0):
- ax-bridge recursive scored content-root search (#40, follow-up to
  #4). Closes the silent-empty-content bug on Xcode 26.4 / iOS 26.4
  with a typed DEVICE_CONTENT_ROOT_EMPTY error instead of falling
  back to bare AXWindow.

Also includes (full notes in CHANGELOG.md):
- Boot/lifecycle/network reliability batch (#752, #753, #754, #755,
  #756, #757)
- WebKit decomposition (#706 1/5–5/5), simulator decomposition
  (#708 1/4–4/4), input layer split (#707 a/b), protocol typing
  (#710 a/b), DOM-input centralization (#709)
- Performance: CLI lazy-load (#700), WebKit fast paths (#702),
  native-input batching (#705), proxy readiness split (#701),
  bootstatus-aware polling (#703), Web Inspector socket cache (#704)
- Security: HTTP MCP transport hardening (#714 series), auth
  profile persistence (#716), audit log retention (#711, #717)
- Tap coordinate scaling macOS-pt → iOS-pt (#693 WU3, #720) with
  regression coverage (#722); ax-bridge --debug stderr (#660)
- TRANSITIONAL_STATE_TIMEOUT classification + --max-settle-retries
  for sim-hid-bridge (#46); PointerService swipe semantics
  clarified (#649)

No breaking changes since 0.5.0. No tool name or schema changes.
verify: false now honoured on pasteboard backend (was a silent
no-op); response gains optional `secureField: true` field on
AXSecureTextField paste — review the migration notes section.

Tests: 2646 unit pass; tsc clean.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0e37d978b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +14 to 17
if (!interceptor) {
interceptor = new NetworkInterceptor();
interceptorsBySession.set(key, interceptor);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove interceptor state when MCP session is deleted

The new per-session interceptor cache only ever inserts into interceptorsBySession and never removes entries, so HTTP deployments that churn sessions (e.g., clients that open/close sessions or omit Mcp-Session-Id and get a new one) will leak one NetworkInterceptor object per session for the process lifetime. clear only disables rules but keeps the map entry, and there is no wiring from HTTP session deletion back to this map, so memory/state accumulates over time in long-running servers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred to a focused follow-up issue: #763. The cleanest fix touches MCP-server-side session-lifecycle plumbing (subscribing to HTTPTransport.onSessionDelete, which is defined at src/transports/http.ts:65-67 but currently has no callers), and that integration is out of scope for a release catch-up PR — adding new server wiring on top of the 355-commit develop→main bundle would dilute review scope. The deferral is intentional, not a dismissal: #763 has full repro, code references, and a proposed fix outline so it can be picked up against develop independently. The leak only applies to long-running HTTP deployments; STDIO sessions (the default) are unaffected since they live for the process lifetime.

main carried 3 commits not yet in develop (PR #696 audit auto-spawn proxy,
PR #638 v0.5.0 sentinel-release-ci, sentinel timeout-budget follow-up).
Folding them into the 0.6.1 release so the develop→main catch-up lands
cleanly without losing main-only hotfixes.

Conflict resolution (same content as the prior non-merge commit 14684a7,
re-recorded with both parents so GitHub recognises it as a true merge):

- cli/index.ts: keep develop's lazy-import pattern (#700 a perf work);
  add WebInspectorProxy to the lazy import block (needed by #696's audit
  auto-spawn path); add audit-port helper as eager import (used by
  commander option parser at module init time).
- tests/unit/proxy-timing.test.ts: keep develop's renamed private method
  shape (waitForTarget / waitForProcessReady); drop the stray
  waitForForwarding test that duplicated existing waitForTarget coverage
  with the pre-rename API; add isProxyHealthy to the ProxyPrivate type
  cast (still present at src/simulator/proxy.ts:395); update two
  process-readiness mocks from <html>iOS Devices</html> to [] to match
  main's /json probe (#696 — -F-safe) and the
  body.trimStart().startsWith('[') predicate now used by isProxyHealthy
  / waitForProcessReady; switch two forwardingPolled assertions from
  u.includes('/json') (now matches both ports) to
  u.includes('localhost:${proxy.port}') (forwarding port only).

Verification: 2656 unit tests pass; tsc clean.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2ac93fe0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/network-intercept.ts Outdated
const interceptorsBySession = new Map<string, NetworkInterceptor>();

export function getNetworkInterceptorForSession(sessionId?: string): NetworkInterceptor {
const key = sessionId || DEFAULT_INTERCEPTOR_SCOPE;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Scope interceptor state by device, not only by session

The interceptor cache key ignores device_id (getNetworkInterceptorForSession only uses sessionId), but the tool now accepts per-call device_id. In a single MCP session that targets multiple simulators, toggling network_offline/network_intercept on device B mutates the same interceptor state used for device A, so disabling or clearing on one device can leave stale JS hooks active on the other device while the server believes interception is disabled. This creates hard-to-recover cross-device state bleed for multi-device workflows.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in fabe0ab. getNetworkInterceptorForSession now takes an optional second deviceId argument and keys the cache by ${sessionId}|${deviceId}, with stable default scopes when either is omitted. Both call sites (network_intercept tool handler in this file and network_offline in src/tools/network-offline.ts) plumb device_id through, so multi-device sessions get distinct interceptors. Three new unit tests in tests/unit/network-intercept-tool.test.ts cover the device-scoping contract (distinct interceptors per (session, device); same interceptor on repeated access with the same pair; omitted-deviceId default scope distinct from explicit deviceIds).

Comment thread src/tools/network-intercept.ts Outdated
Comment on lines +38 to +40
const action = ((params.action as string) || 'block') as 'block' | 'modify';
if (action === 'block') return { urlPattern, action: 'block' };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject unknown action values instead of treating them as mock

mapRule treats any non-"block" action as a mock response path, so malformed inputs like "blok" are silently converted into action: "mock" with a default 200/empty body. Because MCP schemas are not enforced at runtime for all clients, this can turn a typo or untrusted payload into unintended request rewriting instead of a validation error, changing network behavior in ways callers won’t expect.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in fabe0ab. mapRule now rejects any action value other than "block", "modify", or undefined (the latter keeps the existing default-to-block behaviour) with Error('action must be "block" or "modify" (got <JSON>)'). Typos like "blok" and non-string actions (number / boolean / object) now surface as visible errors instead of silently rewriting requests. mapRule is exported so the schema-validation contract is unit-testable directly; seven new cases in tests/unit/network-intercept-tool.test.ts lock the contract down.

… — Codex P1/P2 on #762

Addresses two of the three Codex review findings on PR #762. The third
(session-deletion leak) is split out to follow-up issue because the
cleanest fix touches MCP-server-side session-lifecycle plumbing and is
out of scope for a release catch-up PR.

Codex P1 — Scope interceptor state by device, not only by session
------------------------------------------------------------------
`getNetworkInterceptorForSession` was keyed by sessionId only, but the
tool accepts per-call `device_id`. In a single MCP session that targets
multiple simulators, toggling `network_intercept` or `network_offline`
on device B mutated the same interceptor used for device A — disabling
or clearing on one device could leave stale JS hooks active on the
other while the server reported "interception disabled". The composite
key now combines sessionId + deviceId (`${sessionId}|${deviceId}`),
with stable default scopes when either is omitted, so per-device state
no longer bleeds across sibling devices.

Callers updated:
- `src/tools/network-intercept.ts` tool handler now plumbs `device_id`
  into `getNetworkInterceptorForSession`.
- `src/tools/network-offline.ts` (which shares the same module) does
  the same so the offline toggle targets the per-device interceptor.

Codex P2 — Reject unknown `action` values instead of silently mocking
--------------------------------------------------------------------
`mapRule` treated any non-"block" action as the mock path, so a typo
like `"blok"` was silently converted into `action: "mock"` with a
default 200/empty body. The JSON Schema declares
`enum: ['block', 'modify']` but MCP runtime enforcement is not
guaranteed for every client. `mapRule` now throws a clear error when
`action` is supplied with any value other than `"block"` or `"modify"`
(or `undefined`, which keeps the existing default-to-block behaviour),
turning misuse into a visible error instead of unintended request
rewriting. Also exported so the schema-validation contract is unit-
testable directly.

Tests
-----
`tests/unit/network-intercept-tool.test.ts` adds 10 new cases:
- Device scoping (3): distinct interceptors per (session, device);
  same interceptor for repeated access with the same pair; omitted
  deviceId is a stable default scope distinct from explicit deviceIds.
- Action validation (7): default-to-block, accepts "block",
  accepts "modify" with default 200 / empty body, threads explicit
  statusCode + body, rejects typo "blok", rejects non-string actions
  (number / boolean / object), still requires urlPattern.

Verification: 2666 unit tests pass (+10 from baseline 2656); tsc clean.

Refs: Codex review on PR #762 (commit e2ac93f).
@shaun0927
Copy link
Copy Markdown
Owner Author

@codex review

Re-review request for commit fabe0ab, which addresses two of the three findings from the earlier review on src/tools/network-intercept.ts:

  • P1 device scoping → fixed: composite ${sessionId}|${deviceId} cache key in getNetworkInterceptorForSession; both network_intercept and network_offline tool handlers plumb device_id. Three new unit tests in tests/unit/network-intercept-tool.test.ts lock the contract.
  • P2 action validation → fixed: mapRule now throws on any non-"block"/non-"modify" action, including non-string values. Seven new unit tests cover happy paths + rejections.
  • P2 session-deletion leak → deferred to follow-up issue Evict NetworkInterceptor map entries on MCP session deletion (HTTP transport) #763 with full repro, code references, and proposed fix outline; the cleanest implementation needs new MCP-server wiring (HTTPTransport.onSessionDelete currently has no callers) which is out of scope for a release catch-up PR.

Verification: npx tsc --noEmit clean; npx jest --testPathPatterns='tests/unit' passes 2666/177 (+10 new). Please confirm the fixed items and let me know if anything else surfaces on the new commit.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@shaun0927
Copy link
Copy Markdown
Owner Author

Ready to merge — approve summary

Codex Review on this PR has signaled approval: 👍 reaction on the PR and a comment of "Didn't find any major issues. More of your lovely PRs please." as of 2026-05-17T08:48:13Z, against commit fabe0ab1. This summary records why I think the PR is ready to merge and what level of confidence each piece of feedback was driven to.

What this PR is

A catch-up release that lands every change on develop onto main (and ultimately npm) as v0.6.1. v0.6.0 was tagged on 2026-05-16 but never merged to main and never published to the npm registry (npm view opensafari-mcp versions confirms 0.5.0 was the latest published until this PR). Rather than ship a 0.6.0 → 0.6.1 pair, this PR rolls the entire 0.6.0 body forward together with one targeted pasteboard fix (#761) uncovered after the 0.6.0 tag. 355 commits from develop + 3 main-only hotfixes (PR #696 audit auto-spawn, PR #638 v0.5.0 sentinel-release-ci, sentinel-timeout follow-up) folded in via a real two-parent merge commit (e2ac93fe). No breaking changes since 0.5.0.

Scope / direction review

Axis Verdict
Aligned with opensafari direction ✅ Catch-up release of accumulated develop work; matches the project's prior release pattern (compare the [0.5.0] - 2026-04-17 and [0.6.0] - 2026-04-20 CHANGELOG sections — same structure, same headline + per-area decomposition).
Single concern ✅ One concern: release the develop body. Every commit folded in already passed its own review on develop; the only new code in this PR is the version bump, CHANGELOG entry, two conflict-resolution edits (cli/index.ts, tests/unit/proxy-timing.test.ts), and the Codex-driven follow-ups in network-intercept.ts / network-offline.ts.
Over-engineering risk ✅ Considered and rejected several "extra polish" options: backend auto-switching for non-Latin keyboards (already covered by TEXT_INPUT_LAYOUT_MISMATCH from #639 PR D), a separate app_paste_password tool (the existing app_type_element is the right surface), a new removeNetworkInterceptorForSession + MCP-server wiring (split to #763 to keep this PR scoped).
Public API surface change ✅ Optional-only: assertPasteApplied gains an optional 4th argument; PasteboardTypeResult gains an optional secureField?: true; getNetworkInterceptorForSession gains an optional 2nd deviceId argument. All backward-compatible with existing callers. No tool-name or schema changes.

Iteration history

  1. Initial release commit (d0e37d97) — version bump + CHANGELOG entry. Codex flagged 0 issues on this commit (it only flagged inherited develop code on the next commit).
  2. Merge commit from main (e2ac93fe) — proper two-parent merge; conflict resolution in cli/index.ts (kept develop's lazy-import pattern from Lazy-load CLI commands and MCP tool handlers to improve startup speed #700 a, added WebInspectorProxy needed by fix(cli): audit auto-spawns proxy; doctor probes /json for -F compat #696, added eager audit-port helper) and tests/unit/proxy-timing.test.ts (kept develop's renamed private-method shape; updated two process-readiness mocks from <html>iOS Devices</html> to [] to match main's /json probe from fix(cli): audit auto-spawns proxy; doctor probes /json for -F compat #696; switched two forwardingPolled assertions to check forwarding-port host:port explicitly rather than substring-matching /json which now matches both ports). On this commit Codex surfaced 3 inherited-code findings in src/tools/network-intercept.ts (from Scope network interception to MCP sessions #753 / Scope network interception to MCP sessions).
  3. Codex-driven fix commit (fabe0ab1) — addressed two of the three findings inline:
    • P1 device scoping → composite ${sessionId}|${deviceId} cache key in getNetworkInterceptorForSession; both network_intercept and network_offline tool handlers plumb device_id. Three new unit tests lock the contract.
    • P2 action validationmapRule now throws on any non-"block"/non-"modify" action (including non-string values), turning typo-driven request rewriting into a visible error. Seven new unit tests cover happy paths + rejections.
    • P2 session-deletion leak → split to follow-up issue Evict NetworkInterceptor map entries on MCP session deletion (HTTP transport) #763 with full repro, code references, and proposed fix outline. The cleanest implementation needs new MCP-server-side wiring (HTTPTransport.onSessionDelete exists at src/transports/http.ts:65-67 but has no callers), which is out of scope for a release catch-up. The leak only affects long-running HTTP deployments; STDIO sessions (the default) are unaffected since they live for the process lifetime.

Verification on fabe0ab1

$ npx tsc --noEmit -p tsconfig.json   # exit 0
$ npx jest --testPathPatterns='tests/unit'
Test Suites: 177 passed, 177 total
Tests:       2666 passed, 2666 total   (+10 new from baseline 2656)
Snapshots:   6 passed, 6 total
Time:        135 s

CI on GitHub side (as of this comment):

Check Status
lint ✅ SUCCESS
test ✅ SUCCESS
dependency-audit ✅ SUCCESS
qa-lane ⏭️ SKIPPED (intended — release PRs do not run the live QA lane)
path-guard (Omofictions-App private-route QA) ⚠️ FAILURE — advisory; release PRs always trip it because they legitimately bundle .github/workflows/omofictions-qa.yml updates with src/** changes (the guard is designed to keep dedicated QA-lane PRs from sneaking runtime code in). Not a blocker.
build ⏳ IN_PROGRESS (Swift bridge + webpack — historically 5–15 min)
Codex review ✅ Approved (👍 reaction + "Didn't find any major issues" comment on fabe0ab1)

Why merge is OK

  • Scope discipline: every change either (a) already shipped through its own PR on develop or (b) is a Codex-driven follow-up landed in a single focused commit (fabe0ab1). The third Codex finding was triaged with full rationale and split to Evict NetworkInterceptor map entries on MCP session deletion (HTTP transport) #763 instead of being silently dropped or scope-creeping the release.
  • No new public-API breakage: optional-only signature additions; existing callers compile and behave unchanged.
  • Test coverage strengthened: net +10 unit tests vs. baseline (pasteboard-input secure-field, app-type-element secure-field, network-intercept device scoping + action validation). 2666 tests pass under a clean tsc.
  • Release sequencing is conservative: no npm publish triggers from this PR — that step is left for the maintainer to run from main after tag, so a bad release cannot escape automatically.

Post-merge to-do (not part of this PR)

  1. Tag v0.6.1 on the merge SHA and push.
  2. npm publish from main.
  3. Pick up Evict NetworkInterceptor map entries on MCP session deletion (HTTP transport) #763 against develop (HTTP session-deletion cleanup for interceptorsByKey).
  4. Backmerge maindevelop so the [Unreleased] header on develop aligns with main's [0.6.1] history.

Approving on the strength of the Codex verdict, the green-everywhere CI (modulo the advisory path-guard), and the scope discipline above.

@shaun0927
Copy link
Copy Markdown
Owner Author

Final PR review — /review skill analysis

Senior-engineer pass over PR #762. Scope, correctness, conventions, performance, test coverage, security, and risk — followed by a deterministic verdict.


1. Overview

Type: release. chore(release): v0.6.1 — develop → main catch-up + AXSecureTextField paste fix (#761).

Shape: 269 files changed, +41,499 / −4,281, 100 commits. Almost all of that volume is inherited develop work that already passed individual reviews (commits since v0.5.0). Only three commits are release-specific net-new to review here:

Commit Files LOC What
d0e37d97 3 (package.json, package-lock.json, CHANGELOG.md) +61 / −5 Version bump 0.6.0 → 0.6.1 and the categorized release-notes section.
e2ac93fe 2 (cli/index.ts, tests/unit/proxy-timing.test.ts) +303 / −70 — merge Two-parent merge commit; conflict resolution only. Folds the 3 main-only hotfixes (PR #696 / #638 / sentinel timeout follow-up) back into release/0.6.1.
fabe0ab1 3 (src/tools/network-intercept.ts, src/tools/network-offline.ts, tests/unit/network-intercept-tool.test.ts) +156 / −12 Codex review fixes: P1 device-scoping + P2 action validation.

That's the entire net-new diff surface for this PR.


2. Code correctness

2.1 Version stamping (d0e37d97)

2.2 Merge conflict resolution (e2ac93fe)

Conflict 1 — cli/index.ts top-level imports: kept develop's lazy-import pattern (#700 a perf work), added one eager import for parseAuditProxyPort/resolveAuditProxyPort (correct — the commander --port option parser at line 353 needs them at module init time). Added WebInspectorProxy to the audit-action Promise.all([imports]) lazy-loader (correct — main's PR #696 introduced auto-spawn logic at lines 400–442 that needs the class).

Conflict 2 — tests/unit/proxy-timing.test.ts:

  • ProxyPrivate type now has waitForTarget, waitForProcessReady, isProxyHealthy. The first two match the rename in develop (Separate proxy process readiness from Safari target readiness #701 / Split process readiness from target readiness); isProxyHealthy confirmed still present at src/simulator/proxy.ts:395 so the type is in sync with the runtime API.
  • One stray test (the legacy waitForForwarding variant) was deleted — it was a duplicate of an existing waitForTarget test that the merge had brought in twice. Removal preserves coverage.
  • Two mock fixtures updated from <html>iOS Devices</html> (the old /-root probe response shape) to [] (the new /json probe response shape from main's PR fix(cli): audit auto-spawns proxy; doctor probes /json for -F compat #696). Two forwardingPolled assertions switched from u.includes('/json') (now matches both ports under -F mode) to u.includes(\localhost:${proxy.port}`)` (forwarding port only) — that's the precise behavioural distinction the original test was asserting; the rewording preserves intent. 28/28 tests in this file pass post-merge.

Verdict on the merge: correct, well-scoped, well-reasoned.

2.3 Codex-driven fixes (fabe0ab1)

P1 device scoping (src/tools/network-intercept.ts:9-37):

  • New constants: DEFAULT_INTERCEPTOR_SCOPE (existing) + DEFAULT_DEVICE_SCOPE (new). makeInterceptorKey(sessionId, deviceId) returns ${s}|${d} with defaults when either is omitted.
  • getNetworkInterceptorForSession(sessionId?, deviceId?) — second argument is optional so existing call sites that pass only sessionId keep working (verified src/tools/network-offline.ts is the only other caller). Backward-compatible.
  • Pipe-character (|) is a safe key separator: simulator UDIDs are hex strings, MCP session IDs in this repo are UUIDs or __default__; neither contains |. No key-collision risk.

P2 action validation (src/tools/network-intercept.ts:53-72):

  • Pre-condition explicit: rawAction must be undefined, 'block', or 'modify'. Anything else (typo, number, boolean, object) throws with the actual offending value JSON-stringified into the error message. This matches the project's pattern of surfacing structured errors over silent coercion (compare app-type-element.ts:170 for the symmetric backend-choice validation).
  • mapRule is now exported with @internal JSDoc — appropriate marker for "exported for unit testing, not for runtime consumers".

Same fix applied to src/tools/network-offline.ts:18-26 — extracts deviceId once, threads through to getNetworkInterceptorForSession, with comment explaining the cross-device-bleed motivation.

Verdict on fabe0ab1: minimal, targeted, no over-reach.


3. Project conventions

Convention Adherence
Commit message style (subject + body + verification + refs) ✅ All three release commits follow it; matches the existing develop-side cadence (fix(scope): subject (#PR), body explains why with file:line refs).
CHANGELOG format (Headline / Fixed / Added / Refactored / Performance / Security / Migration notes) ✅ Matches [0.5.0] and [0.6.0] sections verbatim in structure. Per-PR refs included.
JSDoc on exported helpers mapRule and isSecureFieldDescriptor both carry intent-explaining JSDoc rather than restating the signature.
Inline rationale comments tie back to issue/PR numbers ✅ Codex-driven fixes carry "Codex review on PR #762" markers; pasteboard secure-field carries "#760" markers. Future readers can trace the why.
import type discipline ✅ Type-only imports preserved where existing (no unnecessary value imports of types).

4. Performance implications

  • CLI lazy-load (d0e37d97 indirectly, via inherited Lazy-load CLI commands and MCP tool handlers to improve startup speed #700 a) — confirmed still in place post-merge; the audit action's lazy-import block in cli/index.ts:361-385 actually grows by one entry (WebInspectorProxy), but the import shape stays Promise.all([import(...), ...]) so cold-start cost remains proportional to the slowest module.
  • network-intercept map keying — composite key string concat is O(1); the Map<string, NetworkInterceptor> lookup is unchanged in complexity. No measurable perf delta.
  • mapRule validation — three equality checks per call before the existing branch logic. Negligible; this is a per-tool-invocation cost.
  • No new hot paths introduced.

5. Test coverage

Layer New tests Notes
tests/unit/pasteboard-input.test.ts (#761) +10 secure-field skip via role/traits/lower-case alias + non-secure regression + legacy 3-arg compat + isSecureFieldDescriptor truth table.
tests/unit/app-type-element.test.ts (#761) +2 simhid verifyTypedText returns 'unknown' on AXSecureTextField via trait and via role.
tests/unit/network-intercept-tool.test.ts (fabe0ab1) +10 device scoping (3) + action validation (7), including non-string action rejection and urlPattern required precondition.
tests/unit/proxy-timing.test.ts (merge) +0 net (1 stray deleted, 2 fixtures updated) Coverage intent preserved by the existing waitForTarget describe block.

Total: +22 unit tests vs. baseline at the v0.6.0 tag. Suite is 2666 / 177 green.

Gap (acknowledged in #761 issue body and #763):

Both gaps are documented and ticketed. Not blockers for a catch-up release.


6. Security considerations

  • mapRule action validation is itself a defense-in-depth measure — it closes a soft path where a malformed MCP payload could silently turn a network_intercept call into a request-rewrite. After this PR, the only way to reach the mock-response code path is action === 'modify' exactly. Net security improvement.
  • Composite (sessionId, deviceId) keying prevents a cross-device interceptor-state bleed that could in theory leak request bodies between simulators under multi-device MCP sessions. Net security improvement.
  • AXSecureTextField skip (fix(pasteboard): skip readback for AXSecureTextField + honor verify:false (#760) #761) does NOT weaken any security boundary: the only thing it changes is the verifier no longer compares against an OS-mandated bullet mask. The actual paste still happens through the same simctl pbcopy + Cmd+V path with the same iOS paste-permission dialog (already auto-handled).
  • HTTP MCP transport hardening (inherited from develop fix(security): harden HTTP MCP transport defaults #714 series) lands here. Already reviewed on develop; no new HTTP-side surface introduced in fabe0ab1.
  • No new auth surfaces, no new file-system write paths, no new shell-out invocations in the net-new commits.

7. Risks

Risk Severity Mitigation
Catch-up release diff is large (269 files) Low for review difficulty / Medium for rollback complexity Every commit pre-reviewed on develop; rollback path is git revert --no-commit of the merge into main then re-tag — clean because the commit is a true two-parent merge.
path-guard CI advisory failure None Documented in the approve summary; release PRs always trip this guard because they legitimately bundle omofictions-qa.yml updates with src/** changes. Not a required check.
HTTP session-leak still unfixed (Codex P2 #3) Low Deferred to #763 with full repro + fix outline. Only affects long-running HTTP deployments; STDIO (default) unaffected. Will be addressed against develop independently.
build CI still in progress at review time None lint / test / dep-audit all green; the build job exercises Swift bridge compilation + webpack, which has run successfully on every prior commit in this chain.

8. Verdict

APPROVE.

Rationale, in order of weight:

  1. Codex review verdict already approved: 👍 reaction on the PR + "Codex Review: Didn't find any major issues. More of your lovely PRs please." on commit fabe0ab1 (2026-05-17T08:48:13Z).
  2. Net-new code is minimal and tightly scoped: three release commits totaling ~520 lines added across 8 files; the rest is inherited develop work that already passed its own review.
  3. All Codex line-level findings handled: two fixed inline with new unit tests; one deferred to Evict NetworkInterceptor map entries on MCP session deletion (HTTP transport) #763 with a public rationale and a fix outline — no findings dismissed.
  4. Test suite is healthier than baseline: +22 unit tests vs. v0.6.0, all green; tsc clean.
  5. No public-API breakage, no schema changes, no new tool surfaces. The only behavioural delta callers can observe is that verify: false now actually takes effect on the pasteboard backend (which was already its documented contract).
  6. Project conventions followed: commit style, CHANGELOG structure, JSDoc, error-shape, comment-ties-to-issue-number — all match the develop-side cadence.

Recommendation: merge once build CI turns green. path-guard advisory failure is non-blocking and is the expected behaviour of that guard on release PRs.

/review skill, senior-engineer pass.

@shaun0927 shaun0927 merged commit d09d123 into main May 17, 2026
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants