Rescue coordinate taps with AX button press#690
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13a8d97b00
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!best || distance < best.distance) { | ||
| best = { x: cx, y: cy, elementPath: node.path, distance }; |
There was a problem hiding this comment.
Choose innermost AXButton when multiple frames contain tap
The new non-modal rescue path selects candidates only by center distance (distance < best.distance), so if a tap point is inside overlapping/nested AXButton frames the algorithm can pick an ancestor button (or whichever is visited first when distances tie) instead of the intended inner control. In this flow that means AXPress can fire on the wrong element even though the coordinate was inside the correct target; this is especially likely in composed accessibility trees where wrapper and child buttons share similar centers. Consider adding a containment tie-breaker (e.g., smaller area/deeper path) before distance.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request enhances the app_tap tool by introducing a rescue path for non-modal screens, allowing coordinates within an enabled AXButton to trigger an AX press. This change aims to improve reliability for headless Flutter and semantics buttons when OS-level coordinate injection fails. Feedback suggests refining the selection heuristic for non-modal buttons to prioritize the smallest containing element rather than the one with the closest center, ensuring more accurate targeting of nested components.
| if (!best || distance < best.distance) { | ||
| best = { x: cx, y: cy, elementPath: node.path, distance }; | ||
| } |
There was a problem hiding this comment.
The implementation uses the closest center heuristic (distance < best.distance) for both modal snapping and non-modal rescue. However, the docstring (line 481) and PR description specify that for non-modal screens, it should pick the smallest enabled AXButton.
Using distance to center can lead to incorrect results when a small button is nested within a larger actionable container (e.g., a cell that is also a button). If the tap is closer to the large container's center than the small button's center, the larger one will be selected. Consider using the frame area as the primary selection criterion when modals.length === 0 to ensure the most specific element is targeted.
OpenSafari validation against a Flutter iOS app showed SimHID/PointerService coordinate taps returning success while GestureDetector surfaces did not receive input. The accessibility tree already exposed those surfaces as AXButton nodes, so app_tap now uses AXPress when a non-raw coordinate falls inside an enabled AXButton frame. Modal snapping remains radius-based; raw=true still preserves coordinate-only dispatch.\n\nConstraint: Xcode 26/iOS 26 coordinate injection remains experimental and can be a no-op even when the bridge returns success.\nRejected: Promote PointerService as the default | live probe still returned success without delivering the Flutter tap in this validation run.\nConfidence: medium\nScope-risk: moderate\nDirective: Keep raw=true available for callers that require exact coordinate injection rather than semantic button activation.\nTested: npm test -- --runTestsByPath tests/unit/app-tap-safety.test.ts --runInBand\nTested: npm run lint -- --quiet\nNot-tested: Full MCP live tap after rebuilding/restarting the OpenSafari server in the current Codex tool host.
13a8d97 to
bf4af9f
Compare
|
Addressed Gemini feedback in commit bf4af9f: non-modal AXButton rescue now prefers the smallest containing button frame, with a nested-button unit test covering the selection heuristic. Re-ran |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf4af9f017
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (nodeArea !== bestArea) return nodeArea < bestArea; | ||
| return distance < best.distance; |
There was a problem hiding this comment.
Break ties by depth for non-modal AX snap candidates
In non-modal mode, candidate selection only compares area and then center distance; if nested AXButtons share the same frame (a common wrapper/child semantics pattern), both values tie and the strict < check preserves the first node visited, so preorder traversal can still press an ancestor instead of the intended inner control. Fresh evidence: isBetterAXSnapCandidate returns false when nodeArea === bestArea and distance === best.distance, leaving the choice order-dependent. Please add a deterministic descendant/depth tie-breaker (for example, prefer longer path) so equal-frame overlaps do not trigger the wrong AXPress.
Useful? React with 👍 / 👎.
Summary
app_tap's AX rescue path beyond modal buttons.AXButtonframe,app_tapnow invokes AXPress instead of relying only on OS-level coordinate injection.raw: truecoordinate-only opt-out.Why
While validating Omofictions-App App Review issues,
app_treecould reliably see Flutter semantics buttons, butapp_tap/SimHID coordinate taps returned success without changing the app state and often failed post-tap context probing. PointerService probing also returned success without delivering the Flutter tap. The target controls were already exposed as actionableAXButtonnodes, so a semantic AXPress fallback is the safer headless behavior.Verification
npm test -- --runTestsByPath tests/unit/app-tap-safety.test.ts --runInBandnpm run lint -- --quietNotes
I did not live-retest through the Codex MCP host after this change because that host is already running the previously built OpenSafari server. The code path is covered at the tool unit level and should take effect after build/restart.