
feat: expose OpenWork UI control plane and MCP bridge #1638

Closed

benjaminshafii wants to merge 9 commits into dev from feat/realtime-voice-control-mode

Conversation

benjaminshafii (Member) commented May 2, 2026

Summary

  • Add a provider-neutral OpenWork UI control plane so controllers can discover visible app state and execute semantic actions without DOM scraping.
  • Expose OpenWork session/composer capabilities through that control plane: list/open/rename/delete sessions, create tasks, read transcript/latest message, type/send/stop composer prompts, and scroll/focus the active session.
  • Add an OpenWork-owned local UI bridge plus the openwork-ui-mcp package so external MCP clients can use ui_status, ui_snapshot, ui_list_actions, and ui_execute_action.
  • Keep OpenAI Realtime as an optional Feature Preview driver that drives the generic control plane; the durable improvement is the OpenWork-owned action registry and MCP surface.
  • Extract the standalone HandsFree/Pilot app out of this repo so this PR now focuses on the actual OpenWork app/server/desktop improvements.

OpenWork improvements

Semantic UI control plane

  • New window.__openworkControl registry with snapshot(), listActions(), execute(), setEnabled(), and subscribe().
  • Domain-owned actions are registered by OpenWork UI/runtime code instead of by provider-specific automation.
  • Control state includes route/status narration and currently available actions, making automation safer and more inspectable.
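The registry surface described above can be sketched roughly as follows. This is a minimal illustration of the shape, assuming a simple in-memory action map; the real registration API, payload types, and snapshot fields in the PR may differ.

```typescript
// Hypothetical sketch of a window.__openworkControl-style registry.
// Action ids and the snapshot shape are assumptions, not the actual API.
type ControlAction = {
  id: string;                                   // e.g. "session.open"
  description: string;
  run: (input?: unknown) => Promise<unknown> | unknown;
};

type ControlSnapshot = { route: string; status: string; actions: string[] };

function createControlRegistry(initialRoute: string) {
  const actions = new Map<string, ControlAction>();
  const subscribers = new Set<(s: ControlSnapshot) => void>();
  let enabled = true;
  const route = initialRoute;

  const snapshot = (): ControlSnapshot => ({
    route,
    status: enabled ? "ready" : "disabled",
    actions: [...actions.keys()],
  });
  const notify = () => subscribers.forEach((fn) => fn(snapshot()));

  return {
    // Domain code (session UI, composer) registers its own actions here.
    register(action: ControlAction) {
      actions.set(action.id, action);
      notify();
    },
    snapshot,
    listActions: () =>
      [...actions.values()].map(({ id, description }) => ({ id, description })),
    async execute(id: string, input?: unknown) {
      if (!enabled) throw new Error("control plane disabled");
      const action = actions.get(id);
      if (!action) throw new Error(`unknown action: ${id}`);
      return action.run(input);
    },
    setEnabled(value: boolean) {
      enabled = value;
      notify();
    },
    subscribe(fn: (s: ControlSnapshot) => void) {
      subscribers.add(fn);
      return () => subscribers.delete(fn);       // unsubscribe handle
    },
  };
}
```

Because controllers only see action ids and descriptions, a driver can be swapped out without touching the domain code that registered the actions.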

Session and composer actions

  • Session controls include session.create_task, session.list_sessions, session.open, session.rename, session.delete, session.latest_message, and session.read_transcript.
  • Composer controls include composer.set_text, composer.send, and composer.stop.
  • Session surface also exposes scroll/focus actions used by controllers and future tests.
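To make the execution model concrete, here is an illustrative dispatch for the composer actions. The action ids match the list above, but the handler signatures, the in-memory composer state, and the `execute` helper are assumptions for the sketch:

```typescript
// Illustrative only: "composer.set_text" / "composer.send" ids come from
// the PR, everything else here is a stand-in for the real domain handlers.
type ActionHandler = (input?: unknown) => unknown;

const handlers = new Map<string, ActionHandler>();
const composerState = { text: "", sent: [] as string[] };

handlers.set("composer.set_text", (input) => {
  composerState.text = String((input as { text: string }).text);
});
handlers.set("composer.send", () => {
  composerState.sent.push(composerState.text);  // deliver the typed prompt
  composerState.text = "";                      // clear the composer
});

function execute(id: string, input?: unknown): unknown {
  const handler = handlers.get(id);
  if (!handler) throw new Error(`unknown action: ${id}`);
  return handler(input);
}

// A controller types and sends a prompt through the semantic actions:
execute("composer.set_text", { text: "I'll be there at 3" });
execute("composer.send");
```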

MCP-facing OpenWork bridge

  • OpenWork desktop starts a localhost, bearer-token-protected UI-control bridge and writes discovery metadata to Electron userData.
  • New packages/openwork-ui-mcp stdio MCP server proxies that bridge as MCP tools:
    • ui_status
    • ui_snapshot
    • ui_list_actions
    • ui_execute_action
  • New docs/mcp-ui-control-profile.md documents the intended semantic MCP profile for OpenWork UI control.
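A rough sketch of how the stdio MCP server might proxy a tool call to the localhost bridge. The tool names are from the list above; the discovery-metadata shape (`{ url, token }`) and the `/execute` route are assumptions, not the bridge's documented contract:

```typescript
// Hypothetical request construction for openwork-ui-mcp -> local bridge.
// The discovery file in Electron userData is assumed to contain url + token.
type BridgeInfo = { url: string; token: string };

type BridgeTool = "ui_status" | "ui_snapshot" | "ui_list_actions" | "ui_execute_action";

function buildBridgeRequest(
  info: BridgeInfo,
  tool: BridgeTool,
  args: Record<string, unknown> = {},
) {
  return {
    url: `${info.url}/execute`,                  // assumed route
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${info.token}`,     // bearer token from discovery metadata
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tool, args }),
  };
}
```

Keeping the bridge bound to localhost and bearer-token-protected means any local MCP client can attach, but nothing off-machine can reach the UI surface.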

Optional Realtime preview driver

  • The Feature Preview Realtime controller remains isolated under shell/control-drivers/openai-realtime/.
  • It uses the generic OpenWork control plane rather than hard-wiring provider logic into session UI.
  • Server-side Realtime session creation stays isolated under apps/server/src/remote-control/openai-realtime.ts; long-lived OpenAI API keys do not go to the browser.
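The secret-minting pattern can be sketched as request construction only. The endpoint path and payload fields follow OpenAI's documented Realtime session-creation API at the time of writing, but model name and fields here are illustrative; verify against the current API reference before relying on them:

```typescript
// Hedged sketch: the server exchanges its long-lived key for an
// ephemeral client secret; only that short-lived secret reaches the browser.
function buildMintRequest(apiKey: string) {
  return {
    url: "https://api.openai.com/v1/realtime/sessions",
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,        // long-lived key, server-side only
      "Content-Type": "application/json",
    },
    // Model/voice values are placeholders for the sketch.
    body: JSON.stringify({ model: "gpt-4o-realtime-preview", voice: "verse" }),
  };
}
```

The response's ephemeral `client_secret` is what the renderer uses for the WebRTC SDP exchange, so the env-store key never leaves `apps/server`.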

Architecture intent

This PR is not about making voice or OpenAI the foundation of OpenWork control. The durable layer is OpenWork-owned:

  1. semantic app state/action discovery,
  2. domain-owned action execution,
  3. an MCP-compatible bridge for external clients,
  4. replaceable drivers such as OpenAI Realtime, tests, demos, or HandsFree.

Screenshots

Feature Preview settings; Realtime activity pane
Composer typed through control plane; Status bar connector

Verification

Previously run on this branch:

  • pnpm --filter @openwork/app typecheck
  • pnpm --filter openwork-server typecheck
  • pnpm --filter openwork-server build:bin
  • pnpm --filter @openwork/desktop package:electron:dir
  • Packaged app smoke check via Chrome DevTools CDP:
    • control actions registered ✅
    • session.list_sessions returned 30 sessions ✅
    • Realtime connected with live mic in prior packaged verification ✅

Latest extraction/MCP sanity checks:

  • pnpm install --lockfile-only
  • node --check packages/openwork-ui-mcp/index.mjs
  • node --check apps/desktop/electron/main.mjs

…ion, and inline transcript panel

Add app-native voice control via OpenAI Realtime WebRTC so users can
drive visible UI actions hands-free through microphone input.

- Provider-neutral control surface (window.__openworkControl) with
  snapshot, listActions, execute, setEnabled, and subscribe
- OpenAI Realtime WebRTC bridge with mic input, server VAD, text output,
  and tool calling (snapshot, list_actions, execute_action, set_input,
  list_sessions, open_session)
- Server endpoint POST /remote/session mints ephemeral client secrets
  with key from env store; no secrets in browser
- Feature Preview settings tab with Realtime toggle, OpenAI key entry,
  mic selector/test, and transcript panel toggle
- Inline right-side Voice transcript pane (not overlay) showing user
  speech, assistant responses, and tool call lifecycle
- Session list/open control actions so voice can navigate by name
- Electron mic permission plumbing and macOS entitlements
- Stale mic device fallback (OverconstrainedError → system default)
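The tool-calling bridge in this commit can be pictured as a thin dispatch layer from Realtime tool names onto the generic control plane. The tool names mirror the list above; the `Control` interface and action-id mapping are assumptions for the sketch:

```typescript
// Assumption-laden sketch: Realtime data-channel tool calls dispatched
// onto the provider-neutral control surface.
type Control = {
  execute(id: string, input?: unknown): unknown;
  snapshot(): unknown;
};

function handleToolCall(
  control: Control,
  name: string,
  args: Record<string, unknown>,
) {
  switch (name) {
    case "snapshot":
      return control.snapshot();
    case "set_input":
      return control.execute("composer.set_text", args);
    case "execute_action":
      return control.execute(String(args.id), args.input);
    case "list_sessions":
      return control.execute("session.list_sessions");
    case "open_session":
      return control.execute("session.open", args);
    default:
      throw new Error(`unknown tool: ${name}`);
  }
}

// Demo with a recording fake in place of the real control plane:
const calls: string[] = [];
const fake: Control = {
  execute: (id) => (calls.push(id), id),
  snapshot: () => ({}),
};
handleToolCall(fake, "list_sessions", {});
```

Because the driver only ever emits action ids, replacing OpenAI Realtime with another driver (tests, demos, HandsFree) means swapping this dispatch table, not the UI code.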
vercel Bot (Contributor) commented May 2, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| openwork-app | Ready | Preview, Comment | May 4, 2026 6:53pm |
| openwork-den | Ready | Preview, Comment | May 4, 2026 6:53pm |
| openwork-den-worker-proxy | Ready | Preview, Comment | May 4, 2026 6:53pm |
| openwork-landing | Ready | Preview, Comment, Open in v0 | May 4, 2026 6:53pm |
| openwork-share | Ready | Preview, Comment | May 4, 2026 6:53pm |

github-actions Bot (Contributor) commented May 2, 2026

The following comment was generated by an LLM and may be inaccurate:

Add voice-accessible controls for renaming and deleting sessions, scrolling the current session to the top or bottom, and reading the latest visible message. Extend the Realtime tool surface so the model can use these actions directly while requiring explicit confirmation for deletion.
Reorganize the realtime voice-control PR so the generic OpenWork control surface lives independently from the OpenAI Realtime driver. Move session-owned control actions into the session domain, move the OpenAI browser driver and activity/status UI into a driver folder, and move backend Realtime session/tool setup out of server.ts.
…tatus bar, better mic test

Activity panel:
- Rename header from "Voice" to "Control" (generic surface, not voice-specific)
- Replace colored role bubbles with softer tints: structure before effects
- Add proper role labels ("You", "Assistant", "Tool") instead of raw role names
- Add relative timestamps on entries ("now", "12s ago", etc.)
- Add pending-dot animation for in-flight entries
- Add dismiss (X) button to hide the panel inline
- Add empty-state icon + descriptive copy
- Reduce width from 300px to 280px for tighter proportion
- Remove shell shadow on the aside (flat-first per DESIGN-LANGUAGE.md)

Status bar control:
- Replace round pill + text label with minimal icon-only button
- Show compact state text ("Listening", "Connecting…", "Error") without truncation
- Use MicOff icon for disconnect affordance
- Remove background fills; use text color only for state (flatter)

Feature Preview settings:
- Thinner mic level bar (1.5px → cleaner)
- Color-coded level: gray idle → accent low → green strong
- Show numeric percentage during test
- Remove Volume2 icon from test description
- Tighter copy for mic test prompt
…ession

The voice controller could list/open/rename sessions but couldn't read the
content of the currently active session. "What's the last message?" would
fail because the model didn't know it had access.

Changes:
- Add session.read_transcript control action (returns last N messages as
  readable text with session ID, title, and message count)
- Add read_transcript tool to OpenAI Realtime tool schema
- Add controller handler for read_transcript dispatching to the action
- Improve system instructions: tell the model it CAN see session content
  and should always call read_transcript/get_latest_message before saying
  it cannot see the session
- Better tool label for transcript reads in activity panel
…on composer

When the user says something like "tell them I'll be there at 3" or
"reply that looks good", the intent is to type and send that as a message
in the active OpenWork session — not to get a response from the voice
controller itself.

Add REPLY INTENT instructions that tell the model to:
1. read_transcript to understand the on-screen conversation
2. compose the reply from the user's spoken words
3. set_input → composer.set_text with the reply
4. execute_action → composer.send

Direct commands to the controller ("list sessions", "open settings")
still get handled directly. When ambiguous, default to treating spoken
input as a session reply — that's the most common intent when the user
is looking at a conversation.
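The four-step REPLY INTENT flow above can be sketched as a single controller helper. The `execute` signature, the `limit` parameter, and treating the spoken words directly as the reply are assumptions for the sketch:

```typescript
// Sketch of the REPLY INTENT sequence from the commit message.
async function replyInSession(
  execute: (id: string, input?: unknown) => Promise<void>,
  spoken: string,
) {
  await execute("session.read_transcript", { limit: 10 }); // 1. ground in on-screen context
  const reply = spoken;                                    // 2. compose from spoken words
  await execute("composer.set_text", { text: reply });     // 3. type into the composer
  await execute("composer.send");                          // 4. send as a session message
}
```

In the real system step 2 is done by the model (it rewrites "tell them I'll be there at 3" into a first-person reply); the helper above just fixes the ordering of control-plane calls.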
New standalone Electron menubar app at apps/pilot/ that controls macOS
via voice. Pilot is the top-level control surface; OpenWork and other
apps are connectable targets.

What's included:
- Electron main process: menubar tray, floating always-on-top panel,
  global hotkeys (⌘⇧; toggle panel, ⌘⇧L toggle listening)
- System control via AppleScript IPC:
  - list/activate/launch apps
  - frontmost app detection
  - keystroke/key-combo injection
  - clipboard read/write
  - open URL
- Preload bridge: window.__PILOT__.system.* for the UI and future
  Realtime driver
- Floating panel UI: dark vibrancy glass, transcript area, status,
  mic button, empty state with hotkey hints
- macOS entitlements: microphone + AppleScript automation
- LSUIElement: true (no dock icon, menubar-only)
- electron-builder config for packaging
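The AppleScript IPC listed above typically reduces to shelling out to `osascript` from the Electron main process (macOS only). This is a hedged sketch, not Pilot's actual code; the script string and helper name are illustrative:

```typescript
// Illustrative macOS AppleScript bridge for an Electron main process.
import { execFile } from "node:child_process";

function runAppleScript(script: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // osascript -e evaluates the script and prints its result to stdout.
    execFile("osascript", ["-e", script], (err, stdout) =>
      err ? reject(err) : resolve(stdout.trim()),
    );
  });
}

// e.g. frontmost-app detection (requires the AppleScript automation
// entitlement mentioned above):
const FRONTMOST =
  'tell application "System Events" to get name of first process whose frontmost is true';
// runAppleScript(FRONTMOST).then((name) => console.log(name));
```

Keystroke injection, app launching, and clipboard access follow the same pattern with different `System Events` scripts, which is why a single IPC channel can cover the whole list.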

Verified: panel shows, detects frontmost app via AppleScript,
counts 18 running apps. System IPC bridge functional.

Next: wire up OpenAI Realtime driver with system tools, add OpenWork
app connector protocol.
Pilot now owns the Realtime voice driver as the standalone macOS control app.

What's included:
- Main-process OpenAI Realtime session creation with local API key persistence
  so long-lived OpenAI keys never enter the renderer
- Tool schema for macOS control: snapshot, list/frontmost apps, activate/launch
  app, type text, press key combo, clipboard read/write, and open URL
- Renderer WebRTC Realtime driver with microphone capture, SDP exchange,
  data-channel tool-call handling, transcript logging, and tool results
- Panel settings UI for saving the OpenAI key locally
- Panel states for ready/connecting/listening/error and Realtime transcript/tool
  activity
- Vite config so Pilot packages the static panel correctly

Verified:
- pnpm --filter @openwork/pilot build:ui
- pnpm --filter @openwork/pilot package:dir
- Launched packaged/dev Pilot panel; AppleScript frontmost-app and list-apps
  calls still work.
benjaminshafii (Member, Author) commented:

Superseded by the split PRs: #1644 for the OpenWork UI control/MCP bridge, and draft #1645 for built-in Realtime control.
