feat(browser): interactive browser viewing UI with CDP screencast #531

Open
penso wants to merge 78 commits into main from claude/plan-browser-viewing-HOHno
penso (Collaborator) commented Mar 31, 2026

Summary

Adds a full browser viewing and interaction UI to the Settings > Browser page. Users can create browser sessions, view them live via CDP screencast, interact with mouse/keyboard/scroll, and review session history with action logs. Per-agent browser profiles provide cookie isolation.

Key features:

  • Live browser viewing via CDP screencast with mouse, keyboard, and scroll relay
  • URL bar with Google suggestions autocomplete, auto-https, and live URL sync
  • Session management: create, switch, close with instant screenshot prefetch
  • Persistent session history with per-session action log (SQLite)
  • Per-agent browser profiles for cookie isolation (UI=default, agents=session_key)
  • Shared BrowserManager between tool and UI service
  • Viewport auto-resize on screencast start for proper content rendering
  • Scrollbar overlay with click-to-scroll
  • Cookie persistence across restarts via shared profile directories

Technical highlights:

  • Screencast frames delivered via WebSocket event subscription (v4 protocol)
  • rAF-gated canvas rendering for smooth display
  • Coordinate mapping using image natural dimensions + offset_top correction
  • Wheel events batched (50ms), mousemove throttled to avoid flooding CDP
  • CDP requires both deltaX/deltaY for mouseWheel and windowsVirtualKeyCode for special keys
  • Sessions persist for 2 hours (idle + hard TTL), no stop-on-switch to avoid Chrome crashes
  • Screencast relay keeps running when there are no subscribers (avoids a race with subscribers that attach late)
  • Screenshot prefetch cache with live frame updates for instant session switching

Validation

Completed

  • cargo +nightly-2025-11-30 fmt --all -- --check
  • cargo clippy -p moltis-browser -p moltis-gateway -p moltis-tools --all-targets -- -D warnings
  • cargo test -p moltis-browser -p moltis-gateway
  • npx biome check --write (JS)
  • Integration tests: screencast_host, screencast_sandbox, click_dispatches, scroll_dispatches, screencast_metadata_valid
  • Playwright e2e test spec: browser.spec.js + smoke route

Remaining

  • ./scripts/local-validate.sh
  • Manual QA on fresh install

Manual QA

  1. Go to Settings > Browser
  2. Click "New Session" — placeholder appears with "creating" badge
  3. Enter a URL — autocomplete shows Google suggestions
  4. Verify screencast displays and updates live
  5. Click links — URL bar updates automatically
  6. Scroll with mouse wheel or scrollbar overlay
  7. Type in text fields (including backspace/special keys)
  8. Create second session, switch between them — instant cached frame
  9. Close a session — appears in History section
  10. Click closed session — action log displayed
  11. Restart moltis — cookies persist (no captcha on revisit)
  12. Agent browser sessions use separate cookie profile from UI sessions

Future Work

  • WebRTC upgrade for sub-100ms latency (reference: neko project)
  • Binary WebSocket frames instead of base64 JSON (~33% bandwidth savings)
  • Video recording/replay of browser sessions
  • Scrollbar drag support (currently click-to-scroll only)

Add a browser tab to the settings page that shows active browser sessions
and allows viewing them via CDP screencast. Users can log in to websites
manually and export cookies for agent automation.

Backend:
- Add screencast module (ScreencastRegistry) for CDP frame relay
- Add mouse/keyboard input, cookie export/import to BrowserManager
- Add browser.screencast.frame protocol event
- Add /api/browser/action and /api/browser/sessions endpoints
- Wire screencast frame broadcasting to WebSocket clients

Frontend:
- Add page-browser.js with canvas-based screencast viewer
- Session list with view, export cookies, close actions
- Mouse/keyboard input relay to browser viewport
- Navigation bar for URL entry
- Register as "Browser" tab in settings page

https://claude.ai/code/session_015M64wR6GhhnyiEFodhAuAH
codspeed-hq bot (Contributor) commented Mar 31, 2026

Merging this PR will improve performance by 21.75%

⚡ 1 improved benchmark
✅ 38 untouched benchmarks
⏩ 5 skipped benchmarks [1]

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| env_substitution | 12.2 µs | 10.1 µs | +21.75% |

Comparing claude/plan-browser-viewing-HOHno (6663f11) with main (a113473)

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

greptile-apps bot (Contributor) commented Mar 31, 2026

Greptile Summary

This PR introduces real-time browser session viewing and interactive control to the web UI. A new CDP screencast pipeline streams JPEG frames from Chromium via broadcast channels (ScreencastRegistry → adapter task → WebSocket relay), and new action types handle mouse/keyboard input and cookie import/export. The overall architecture is reasonable, but there are three issues that affect correctness of the streaming path and one dead-code inconsistency:

  • Relay task exits permanently on zero receivers (screencast.rs): The relay task breaks when tx.send returns an error (no receivers). Because the initial receiver _rx from start_screencast is immediately dropped, any frame arriving before spawn_screencast_relay subscribes causes the relay to exit permanently — subsequent subscribers get a live receiver but no frames.
  • Duplicate relay tasks per session (api.rs): spawn_screencast_relay is called unconditionally on every StartScreencast success with no check for an existing relay. Repeat calls spawn N adapter tasks and N broadcast relay tasks, delivering N copies of each frame to all WebSocket clients.
  • Dead code in BrowserManager::list_sessions (manager.rs): The screencasting enrichment loop computes contains() but discards the result via let _ = ...; the field has not been added to BrowserSessionInfo. The enrichment is handled correctly in services.rs, making this loop purely wasteful.
  • Scroll not forwarded (page-browser.js): No wheel event listener is attached to the canvas, so MouseInputType::Wheel (fully wired on the backend) is unreachable from the UI.

Confidence Score: 4/5

Safe to merge with caution — the three P1 findings affect screencast reliability but don't risk data loss or security; most core functionality works correctly

Three P1 findings exist: the relay task can permanently stop if a frame arrives before the subscriber connects, duplicate relay tasks are spawned on repeated StartScreencast calls, and dead code in list_sessions performs unnecessary async work. None cause crashes or data corruption, but the screencast feature may silently stop delivering frames in some conditions and will deliver duplicate frames on reconnect.

crates/browser/src/screencast.rs (relay lifetime), crates/web/src/api.rs (relay dedup), crates/browser/src/manager.rs (dead code in list_sessions)

Important Files Changed

Filename Overview
crates/browser/src/screencast.rs New module implementing CDP screencast relay — relay task exits permanently when no receivers are present, and has a TOCTOU race between the read-lock check and write-lock insert in start()
crates/web/src/api.rs New browser action and session handlers — spawn_screencast_relay has no dedup guard, spawning N duplicate relay tasks on N StartScreencast calls for the same session
crates/browser/src/manager.rs Extended with screencast/input/cookie actions — list_sessions contains dead code that computes screencasting status but discards the result with let _ = ...
crates/browser/src/types.rs New action variants and supporting types added cleanly with proper serde defaults
crates/browser/src/pool.rs Added BrowserSessionInfo and list_sessions() — clean implementation reading session metadata under the existing RwLock
crates/gateway/src/services.rs Implements list_sessions and subscribe_screencast — each subscribe_screencast call spawns a new adapter task with no dedup, contributing to the duplicate relay problem
crates/web/src/assets/js/page-browser.js New Preact browser viewer page — solid structure with coordinate scaling and canvas rendering, but missing wheel event relay for scroll support
crates/service-traits/src/lib.rs Extended BrowserService trait with list_sessions and subscribe_screencast with sensible no-op defaults — clean trait extension
crates/protocol/src/lib.rs Registers three new event types for screencast — straightforward additions to KNOWN_EVENTS
crates/web/src/assets/js/page-settings.js Adds the Browser nav entry and page section handler — clean integration using the existing settings SPA pattern

Sequence Diagram

sequenceDiagram
    participant UI as Browser UI (page-browser.js)
    participant API as Web API (api.rs)
    participant GW as Gateway (services.rs)
    participant REG as ScreencastRegistry (screencast.rs)
    participant CDP as Chromium CDP

    UI->>API: POST /api/browser/action {action: start_screencast}
    API->>GW: browser.request(body)
    GW->>REG: screencasts.start(session_id, page, ...)
    REG->>CDP: Page.startScreencast
    REG-->>GW: broadcast::Receiver (_rx, dropped immediately)
    GW-->>API: BrowserResponse::success
    API->>GW: subscribe_screencast(session_id)
    GW->>REG: registry.subscribe(session_id)
    REG-->>GW: raw broadcast::Receiver
    GW->>GW: spawn adapter task (ScreencastFrame → Value)
    GW-->>API: Value broadcast::Receiver
    API->>API: spawn relay task (Value → WebSocket broadcast)

    loop CDP screencast frames
        CDP->>REG: EventScreencastFrame
        REG->>REG: relay task: ack + tx.send(frame)
        REG-->>GW: adapter task receives frame
        GW-->>API: relay task receives Value
        API-->>UI: WS event browser.screencast.frame
    end

    UI->>API: POST /api/browser/action {action: stop_screencast}
    API->>GW: browser.request(body)
    GW->>REG: screencasts.stop(session_id, page)
    REG->>CDP: Page.stopScreencast
    REG->>REG: remove from registry, abort relay task

Reviews (1): Last reviewed commit: "feat(browser): add browser viewing UI wi..."

Comment on lines +237 to +239
if tx.send(frame).is_err() {
debug!(session_id = %session_id, "no screencast subscribers, stopping relay");
break;

P1 Relay task permanently exits when no subscribers, silently breaking future subscriptions

When tx.send(frame).is_err(), the relay task exits (line 239). This becomes a problem because of how the initial receiver is handled in start_screencast (manager.rs:799–802): the returned _rx is discarded immediately when that function returns. The relay task then becomes the sole holder of the tx side. If even a single CDP frame arrives between _rx being dropped and spawn_screencast_relay subscribing (api.rs:1141-1142), the relay task finds no receivers, exits, and is gone permanently.

After this point, any call to subscribe_screencast() → registry.subscribe() will successfully hand out a new receiver from handle.tx.subscribe(), but since the relay task is dead, no more frames will ever be sent into that channel. From the caller's perspective the screencast appears active but produces no frames.

A simple fix is to keep the relay running even when there are temporarily no receivers:

if tx.send(frame).is_err() {
    debug!(session_id = %session_id, "no screencast subscribers, frame dropped");
    // do NOT break — a subscriber may connect shortly
}

Alternatively, hold a permanent "keep-alive" receiver inside ActiveScreencast.
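
The race described above can be illustrated without tokio. The following is a minimal simulation, not the code in screencast.rs: `simulate_relay` and its parameters are hypothetical names, and a plain loop stands in for the broadcast channel. It assumes a single subscriber that attaches partway through the frame stream.

```rust
/// Simulates a relay loop over `frames`.
/// A single subscriber attaches just before the frame at index
/// `subscriber_joins_at`. Returns the frames that subscriber received.
fn simulate_relay(
    frames: &[u32],
    subscriber_joins_at: usize,
    exit_when_unobserved: bool,
) -> Vec<u32> {
    let mut delivered = Vec::new();
    for (i, frame) in frames.iter().enumerate() {
        let has_subscriber = i >= subscriber_joins_at;
        if has_subscriber {
            delivered.push(*frame);
        } else if exit_when_unobserved {
            // Original behaviour: a failed send ends the task for good.
            break;
        }
        // Suggested behaviour: the frame is dropped and the loop continues.
    }
    delivered
}
```

With `exit_when_unobserved` set, a subscriber that joins after the first frame receives nothing. With it cleared, the same subscriber receives every frame from the point it joined.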

Comment on lines +1140 to +1143
if is_start_screencast {
if let Some(sid) = session_id {
spawn_screencast_relay(state, &sid).await;
}

P1 Multiple relay tasks spawned per session causes duplicate frame delivery

Every successful StartScreencast request unconditionally calls spawn_screencast_relay, which in turn calls subscribe_screencast (services.rs:1398-1429). subscribe_screencast always spawns a brand-new adapter task and returns a new Value receiver. spawn_screencast_relay then always spawns an additional broadcast relay task.

There is no guard to check whether a relay is already running for the given session_id. Calling StartScreencast N times for the same session creates N adapter tasks and N relay tasks — every CDP frame gets forwarded N times to all WebSocket clients.

In the UI, startScreencast is called from the "View" button with no dedup, so a user double-clicking or page-refreshing will trigger duplicate relays.

Consider tracking active relay sessions server-side, e.g. via a DashSet<String> in AppState, and skipping spawn_screencast_relay if an entry already exists.
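
A guard of this kind could look roughly like the sketch below. It uses only the standard library (`Mutex<HashSet<String>>`) in place of the `DashSet` mentioned above, and `RelayRegistry`, `try_claim`, and `release` are hypothetical names, not part of the PR.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Tracks which sessions already have a relay task running.
#[derive(Default)]
struct RelayRegistry {
    active: Mutex<HashSet<String>>,
}

impl RelayRegistry {
    /// Returns true only for the first caller per session; that caller
    /// is the one that should spawn the relay.
    fn try_claim(&self, session_id: &str) -> bool {
        // HashSet::insert returns false when the value was already present.
        self.active.lock().unwrap().insert(session_id.to_string())
    }

    /// Called when the relay for `session_id` ends, so a later
    /// StartScreencast can spawn a fresh one.
    fn release(&self, session_id: &str) -> bool {
        self.active.lock().unwrap().remove(session_id)
    }
}
```

The caller would wrap the spawn in `if registry.try_claim(&sid) { ... }` and call `release` when the relay task exits or the screencast stops.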

Comment on lines +1112 to +1122
pub async fn list_sessions(&self) -> Vec<crate::pool::BrowserSessionInfo> {
let mut sessions = self.pool.list_sessions().await;
// Annotate which sessions have active screencasts.
let screencast_sessions = self.screencasts.active_sessions().await;
for session in &mut sessions {
// The BrowserSessionInfo doesn't have a screencast field yet,
// but clients can check the screencast endpoint separately.
let _ = screencast_sessions.contains(&session.session_id);
}
sessions
}

P1 Dead code: screencast_sessions.contains(...) result is silently discarded

The loop body computes whether a session is screencasting but throws the result away with let _ = .... The BrowserSessionInfo struct has no screencasting field, so the information never reaches callers.

The enrichment is correctly performed independently in services.rs, so the actual API response is accurate — but the code inside this loop is completely useless: it performs an async allocation (active_sessions() returning a Vec<String>) and O(n) lookup per session for no effect.

Either add screencasting: bool to BrowserSessionInfo and populate it here (removing the duplicate logic in services.rs), or remove this loop entirely:

pub async fn list_sessions(&self) -> Vec<crate::pool::BrowserSessionInfo> {
    self.pool.list_sessions().await
}

Comment on lines +105 to +149
// If already active, return a new subscriber.
{
let active = self.active.read().await;
if let Some(handle) = active.get(session_id) {
debug!(session_id, "screencast already active, adding subscriber");
return Ok(handle.tx.subscribe());
}
}

// Start CDP screencast via the builder pattern.
let params = StartScreencastParams {
format: Some(StartScreencastFormat::Jpeg),
quality: Some(i64::from(quality.min(100))),
max_width: Some(i64::from(max_width)),
max_height: Some(i64::from(max_height)),
every_nth_frame: Some(1),
};

page.execute(params)
.await
.map_err(|e| crate::error::Error::Cdp(format!("failed to start screencast: {e}")))?;

let (tx, rx) = broadcast::channel(FRAME_CHANNEL_CAPACITY);

// Spawn background task to relay CDP screencast frame events.
let tx_clone = tx.clone();
let sid = session_id.to_string();
let page_clone = page.clone();

let task = tokio::spawn(async move {
relay_screencast_frames(page_clone, tx_clone, sid).await;
});

let inner = Arc::new(ActiveScreencast {
tx: tx.clone(),
abort: task.abort_handle(),
});

self.active
.write()
.await
.insert(session_id.to_string(), inner);

debug!(session_id, "screencast started");
Ok(rx)

P2 TOCTOU race: two concurrent start() calls for the same session can both spawn relay tasks

The existence check (line 107, read lock) and the insert (line 143, write lock) are not atomic. Two concurrent callers for the same session_id can both pass the read-lock check, both send the CDP StartScreencast command, and both spawn a relay task. The second insert overwrites the first Arc<ActiveScreencast>, dropping it and aborting the first relay task, while two CDP start commands have been sent.

Fix by holding the write lock for the entire operation:

let mut active = self.active.write().await;
if let Some(handle) = active.get(session_id) {
    return Ok(handle.tx.subscribe());
}
// … execute CDP command, spawn task …
active.insert(session_id.to_string(), inner);
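
To make the effect of the single-lock approach concrete, here is a small standard-library sketch. It is an illustration under assumptions: `ScreencastTable` and `get_or_start` are hypothetical, a counter stands in for sending Page.startScreencast and spawning the relay task, and a blocking `Mutex` replaces the async `RwLock` used in the real module.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

#[derive(Default)]
struct ScreencastTable {
    /// session_id -> handle id of the running screencast.
    active: Mutex<HashMap<String, usize>>,
    /// Counts how many times the (simulated) CDP start command ran.
    cdp_starts: AtomicUsize,
}

impl ScreencastTable {
    /// Returns the handle for `session_id`, starting a screencast only
    /// when none exists. The lock is held across check and insert, so
    /// two concurrent callers cannot both take the "start" branch.
    fn get_or_start(&self, session_id: &str) -> usize {
        let mut active = self.active.lock().unwrap();
        if let Some(handle) = active.get(session_id) {
            return *handle;
        }
        // Stand-in for "execute CDP command, spawn relay task".
        let handle = self.cdp_starts.fetch_add(1, Ordering::SeqCst) + 1;
        active.insert(session_id.to_string(), handle);
        handle
    }

    fn start_count(&self) -> usize {
        self.cdp_starts.load(Ordering::SeqCst)
    }
}
```

The trade-off is that other sessions wait on the lock while one start is in flight, which is usually acceptable for an operation this infrequent.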

Comment on lines +365 to +381
function onMouse(e) {
relayMouseEvent(e, canvas);
}
canvas.addEventListener("mousedown", onMouse);
canvas.addEventListener("mouseup", onMouse);
canvas.addEventListener("mousemove", onMouse);

// Keyboard: focus the canvas to receive key events
canvas.setAttribute("tabindex", "0");
canvas.addEventListener("keydown", relayKeyEvent);
canvas.addEventListener("keyup", relayKeyEvent);

return () => {
canvas.removeEventListener("mousedown", onMouse);
canvas.removeEventListener("mouseup", onMouse);
canvas.removeEventListener("mousemove", onMouse);
canvas.removeEventListener("keydown", relayKeyEvent);

P2 Scroll/wheel events not relayed — MouseInputType::Wheel is unreachable from the UI

The canvas event listeners cover mousedown, mouseup, and mousemove, but there is no wheel listener. The backend has full support for MouseInputType::Wheel mapped to CDP MouseWheel, but the frontend never generates it. Users cannot scroll pages within the browser viewer.

Suggested change (adds a wheel listener after the mousemove listener):
canvas.addEventListener("mousedown", onMouse);
canvas.addEventListener("mouseup", onMouse);
canvas.addEventListener("mousemove", onMouse);
canvas.addEventListener("wheel", onMouse, { passive: false });

Also add a wheel case in relayMouseEvent and pass e.deltaX/e.deltaY in the action payload, mapping them to DispatchMouseEventParams on the Rust side.

codecov bot commented Mar 31, 2026

penso added 25 commits April 1, 2026 09:09
Three fixes for the browser viewing UI:

1. Strip explicit null values from LLM tool call params before
   deserialization — serde(default) only handles missing keys, not
   null values, causing "invalid type: null, expected u64" errors.

2. Share a single BrowserManager between BrowserTool and
   RealBrowserService via Arc<OnceCell>. Previously each had its own
   manager, so sessions created by agents were invisible to the UI.

3. Add "New Session" button to the browser page so users can create
   sessions directly from the UI (navigate to about:blank + auto-start
   screencast). Agents share the same cookie profile.
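
The null-stripping step in fix 1 can be sketched as follows. This is an illustration only: the `Json` enum is a tiny stand-in for a real JSON value type, `strip_nulls` is a hypothetical name, and recursing into nested objects is an assumption, since the commit message does not say how deep the stripping goes.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Num(f64),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

/// Removes object entries whose value is an explicit null, so that a
/// deserializer with default-on-missing semantics sees them as absent.
fn strip_nulls(value: Json) -> Json {
    match value {
        Json::Obj(map) => Json::Obj(
            map.into_iter()
                .filter(|(_, v)| *v != Json::Null)
                .map(|(k, v)| (k, strip_nulls(v)))
                .collect(),
        ),
        other => other,
    }
}
```

After this pass, a parameter object such as `{"url": "...", "timeout": null}` carries only the `url` key, so a defaulted numeric field no longer sees a null.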

Entire-Checkpoint: 6719842d0d2c
Use provider-btn-sm and provider-btn-secondary classes consistent
with the MCP and Skills pages.

Entire-Checkpoint: 769d0e7ad9b2
…on styles

- Allow about:blank in URL validation so "New Session" can create an
  empty browser session without navigating to an external URL.
- Use provider-btn / provider-btn-sm classes on all session card buttons
  to match the style used on MCP, Skills, and other pages.

Entire-Checkpoint: 0402a72e2760
- Don't auto-start screencast on about:blank (no frames generated)
- Show navigate bar immediately so user can enter a URL
- Auto-start screencast after navigating to a real page
- Show "Enter a URL above" hint in the canvas placeholder

Entire-Checkpoint: 17379e8abfdf
The service's request() now injects sandbox=true when the field is
absent, matching the behavior of the BrowserTool which reads from
the sandbox router. Without this, clicking "New Session" in the UI
would always launch a host browser even when sandbox is enabled.

Entire-Checkpoint: af07d8ac7130
- Auto-prepend https:// for bare domains (e.g. "lemonde.fr")
- Fall back to Google search for non-URL queries
- Show Google suggestions, session URL history, and direct URL
  matches in a dropdown with keyboard navigation
- Check res.success on navigate responses to surface errors
- Match input and Go button heights using provider-btn-sm sizing
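
The URL bar rules listed above (auto-https for bare domains, Google search fallback) can be sketched like this. The actual logic lives in the page's JavaScript; this Rust version is a hedged illustration, and the "contains a dot and no whitespace" heuristic, the search URL shape, and the names `normalize_input` and `encode_query` are all assumptions.

```rust
/// Turns URL-bar input into a navigable URL.
fn normalize_input(input: &str) -> String {
    let trimmed = input.trim();
    if trimmed.starts_with("http://")
        || trimmed.starts_with("https://")
        || trimmed == "about:blank"
    {
        return trimmed.to_string();
    }
    // Assumed heuristic: a bare domain has a dot and no whitespace.
    let looks_like_domain = !trimmed.is_empty()
        && !trimmed.contains(char::is_whitespace)
        && trimmed.contains('.');
    if looks_like_domain {
        format!("https://{trimmed}")
    } else {
        format!("https://www.google.com/search?q={}", encode_query(trimmed))
    }
}

/// Percent-encodes a query string, using '+' for spaces.
fn encode_query(q: &str) -> String {
    let mut out = String::new();
    for b in q.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}
```

Under these assumptions, "lemonde.fr" becomes "https://lemonde.fr", while a multi-word query is sent to search.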

Entire-Checkpoint: eb57efdb4ba2
browserAction() now checks res.success on every response and throws
on failure. Previously errors returned as 200 with success=false
were silently swallowed (e.g. invalid URLs, connection failures).

Entire-Checkpoint: 76e24ac2897a
The relay task exited immediately when tx.send() had no receivers,
which races with the UI subscribing shortly after. CDP sends the
first frame before spawn_screencast_relay can subscribe, causing the
relay to exit and all subsequent frames to be lost.

The relay now drops frames when no one is listening instead of
exiting. The task is still properly cleaned up via its abort_handle
when the screencast is stopped.

Entire-Checkpoint: 28b9900be3c1
- Add integration test that verifies CDP frames arrive through the
  ScreencastRegistry broadcast channel (passes in host mode).
- Add info/warn tracing at every stage of the screencast relay chain:
  CDP frame listener, service subscribe, API relay spawn.
  This will surface exactly where frames get stuck.

Entire-Checkpoint: fcdec9d1c2d8
The v4 protocol requires explicit event subscription. The
browser.screencast.frame event was missing from the subscription
list, so all frames were silently dropped by the broadcast filter.

Entire-Checkpoint: 42a252e4a873
Rust integration tests (crates/browser):
- screencast_host: verifies CDP frames arrive in host mode
- screencast_sandbox: verifies CDP frames arrive in container mode
  (auto-skips when no container runtime available)

Playwright e2e tests (crates/web/ui/e2e/specs/browser.spec.js):
- Browser page renders with heading and buttons
- Empty state message shown when no sessions
- New session button shows creating state
- Navigate bar appears after session creation
- Bare domains auto-normalized with https://
- Screencast delivers frames after navigation (canvas visible)
- Session can be closed
- Sandbox badge rendered when sandbox enabled

Smoke test: added /settings/browser route.
Entire-Checkpoint: 404370604901
…on fixes

Interaction improvements:
- Add mouse wheel/scroll relay via CDP mouseWheel events with
  delta_x/delta_y support
- Prevent default on mousedown (no text selection/drag on canvas)
- Block context menu on canvas
- Auto-focus canvas on mousedown for keyboard capture

Session list UX:
- Clicking a session card selects it and auto-starts screencast
- Active session highlighted with accent border
- Removed redundant View/Stop Viewing buttons — card click handles it
- Buttons (Export Cookies, Close) use stopPropagation to not trigger
  card selection

Entire-Checkpoint: 872e5b20d786
When switching sessions, the canvas was blank until the first
screencast frame arrived via WebSocket. Now fetches an immediate
screenshot via REST API when selecting a session, displaying it
on the canvas while the screencast stream connects.

Also tracks frame MIME type (PNG for screenshots, JPEG for
screencast frames) for correct canvas rendering.

Entire-Checkpoint: 4cf5c9dfbb6b
- Show "Fetching browser view..." while taking screenshot on session
  switch instead of a blank canvas
- Show "paused" badge on sessions that have a URL but aren't being
  actively viewed — clicking them makes them live
- Clear frame data when switching sessions to avoid showing stale
  content from the previous session

Entire-Checkpoint: b8875017d863
…ify cards

- Prefetch screenshots for all sessions in the background after
  fetching the session list — switching is instant from cache
- Clear frame data, screencast state, and stop previous screencast
  when creating a new session (no stale canvas from previous session)
- Reset URL bar to the selected session's URL when switching
- Remove Export Cookies button, keep only a small Close link
- Cache screenshots on first fetch to avoid re-fetching

Entire-Checkpoint: 3da290c6da8b
When clicking "New Session", a placeholder card with "creating" badge
and "Starting browser..." appears immediately in the session list,
selected and highlighted. Once the backend finishes, the placeholder
is replaced with the real session. On failure, the placeholder is
removed.

Entire-Checkpoint: 325a71fa44b7
The canvas is conditionally rendered — placeholders show when there's
no frame data. The useEffect with [] deps ran once on mount when the
canvas didn't exist yet, so mouse/keyboard/scroll listeners were
never attached.

Switched to a ref callback that fires whenever the canvas DOM
element appears or disappears, properly attaching and cleaning up
event listeners. Clicks, scrolling, and keyboard input now work.

Entire-Checkpoint: a2823835c7b2
Wheel events fire at 60fps and each one was creating a separate HTTP
request, overwhelming the server and causing scroll to feel broken.

- Batch wheel deltas into a single request every 50ms
- Throttle mousemove to one event per 50ms (was flooding at 60fps)
- Click (mousedown/mouseup) events remain unthrottled for accuracy
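
The batching idea can be sketched as a small accumulator. The real code is JavaScript and presumably timer-driven; this Rust version is an assumption-laden illustration in which `WheelBatcher` is a hypothetical type and the caller passes the current time in milliseconds explicitly.

```rust
/// Accumulates wheel deltas and releases them at most once per interval.
#[derive(Default)]
struct WheelBatcher {
    dx: f64,
    dy: f64,
    last_flush_ms: u64,
    pending: bool,
}

impl WheelBatcher {
    const INTERVAL_MS: u64 = 50;

    /// Adds one wheel event's deltas to the current batch.
    fn push(&mut self, dx: f64, dy: f64) {
        self.dx += dx;
        self.dy += dy;
        self.pending = true;
    }

    /// Returns the summed deltas if a batch is pending and at least
    /// INTERVAL_MS has passed since the last flush; otherwise None.
    fn flush_if_due(&mut self, now_ms: u64) -> Option<(f64, f64)> {
        let elapsed = now_ms.saturating_sub(self.last_flush_ms);
        if !self.pending || elapsed < Self::INTERVAL_MS {
            return None;
        }
        let out = (self.dx, self.dy);
        *self = WheelBatcher {
            last_flush_ms: now_ms,
            ..Default::default()
        };
        Some(out)
    }
}
```

Summing deltas preserves the total scroll distance while reducing many events to one request per interval.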

Entire-Checkpoint: 3ce2f51400c7
Major fixes:
- Session switching no longer resets activeSession to null (was
  causing blank canvas and lost state)
- Frame listener stays active across session switches (no gap where
  frames are dropped)
- fetchSessions preserves placeholder "creating" entries
- Removed noisy start/stop screencast toasts during switches
- Screenshot fetch errors trigger session list refresh (handles
  dead sessions gracefully)
- Guard against stale async results when user switches rapidly
- Canvas auto-focuses when it appears for immediate keyboard input
- Click coordinates use offsetX/offsetY for accurate mapping
  (fixes clicks landing too high due to border offset)
- Invalidate screenshot cache when navigating to new URL

Entire-Checkpoint: 5d1ab2347def
Root cause of broken scrolling: CDP rejects mouseWheel events when
deltaX or deltaY are missing. The code conditionally set them only
when non-zero, so vertical-only scrolls omitted deltaX and failed
with "deltaX and deltaY are expected for mouseWheel event".

Fix: always set both deltas for mouseWheel events.

Click accuracy fixes:
- Add offset_top from screencast metadata to y-coordinate (accounts
  for browser chrome/infobars)
- Use dynamic aspect-ratio from frame metadata instead of hardcoded
  16/10 (viewport is 16:9, mismatch caused y-axis distortion)
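
The coordinate mapping behind these click fixes can be written down as a short function. This is a sketch under assumptions: `FrameGeometry` and `map_click` are hypothetical names, and adding offset_top after scaling (in frame pixel space) is a guess at the order of operations, which the commit message does not spell out.

```rust
/// Geometry reported for the current frame.
struct FrameGeometry {
    natural_width: f64,
    natural_height: f64,
    /// Vertical offset from the screencast metadata.
    offset_top: f64,
}

/// Maps a click at (offset_x, offset_y) on a canvas displayed at
/// `displayed_w` x `displayed_h` into frame coordinates.
fn map_click(
    offset_x: f64,
    offset_y: f64,
    displayed_w: f64,
    displayed_h: f64,
    frame: &FrameGeometry,
) -> (f64, f64) {
    // Scale from displayed size to the frame's natural size.
    let x = offset_x * frame.natural_width / displayed_w;
    // Assumption: offset_top is applied after scaling.
    let y = offset_y * frame.natural_height / displayed_h + frame.offset_top;
    (x, y)
}
```

Because the scale factors come from the frame's own dimensions, the mapping stays correct when the viewport aspect ratio changes.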

New integration tests:
- click_dispatches: verifies mousePressed+mouseReleased succeed
- scroll_dispatches: verifies mouseWheel with deltas succeeds
- screencast_metadata_valid: captures actual viewport dimensions
  for coordinate mapping validation

Tests use ephemeral profiles to avoid SingletonLock conflicts.

Entire-Checkpoint: f818e6d815b2
Root cause of captcha/content appearing tiny and off-center:
viewport was 2560x1440@2x (5K physical pixels). Websites laid out
for that resolution, then screencast squished everything down.

Changed default viewport to 1440x900@1x — a standard laptop
resolution. Content renders at a normal size and fills the canvas.

Other improvements:
- rAF-gated canvas rendering: frames are drawn at display refresh
  rate instead of on every signal change, avoiding wasted draws
- Screenshot coordinate mapping uses actual image natural dimensions
  (img.naturalWidth/Height) instead of hardcoded assumptions
- Canvas aspect-ratio follows actual frame dimensions dynamically
  instead of hardcoded 16/10

Entire-Checkpoint: 992c2ba77b21
The session ID was truncated to 12 chars + "..." but the card
already has CSS truncate class which handles overflow naturally.

Entire-Checkpoint: 7ba05c18a610
The URL bar now tracks the remote browser's current URL:

- Switching sessions instantly shows that session's URL
- Clicking links in the remote browser updates the URL bar
  (polled via CDP get_url every 2 seconds)
- Navigating via the URL bar updates it after page loads
- While typing, the live URL is paused; pressing Escape or
  clicking away reverts to the live URL

This makes the browser viewer feel like a real browser tab —
the URL bar always reflects what page you're looking at.

Entire-Checkpoint: 602407a16bde
Sessions were dying from a single CDP error (e.g. Chrome briefly busy
during mouse event) and then flooding logs with cleanup warnings from
every queued event hitting the dead session.

Fixes:
- Idle timeout: 5 min → 30 min (interactive browsing needs more time)
- Hard TTL: 30 min → 2 hours (matches container TIMEOUT via
  browserless_session_timeout_ms)
- cleanup_stale_session checks has_session() before closing, so only
  the first event triggers cleanup — subsequent queued events skip
  silently instead of spamming warnings
- Added BrowserPool::has_session() for the guard check

Entire-Checkpoint: 603c4e91eae2
penso added 20 commits April 2, 2026 10:14
… lazily

Two root causes fixed:

1. action_name from Display includes params (e.g. "mouse_input(x=1,y=2)")
   but the is_input_event check used exact matches. Changed to
   starts_with() so mouse_input/keyboard_input/evaluate are correctly
   recognized as non-fatal.

2. Action hook was set at startup when manager_if_ready() returned None
   (manager lazy-inits on first use). Moved hook registration into the
   manager() init closure so it's applied as soon as the manager exists.
   Session history now works for both agent and UI sessions.
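
The first fix amounts to switching from exact comparison to prefix matching. A minimal sketch follows, assuming the three action names mentioned above; the constant and function body are illustrative rather than the code in the PR.

```rust
/// Action-name prefixes treated as non-fatal input events (assumed list,
/// taken from the names in the commit message).
const NON_FATAL_PREFIXES: [&str; 3] = ["mouse_input", "keyboard_input", "evaluate"];

/// Returns true when `action_name` starts with a known input-event name.
/// The Display form may carry parameters, e.g. "mouse_input(x=1,y=2)".
fn is_input_event(action_name: &str) -> bool {
    NON_FATAL_PREFIXES
        .iter()
        .any(|prefix| action_name.starts_with(prefix))
}
```

Prefix matching also accepts any future action whose name merely begins with one of these strings, so matching up to the opening parenthesis would be a stricter alternative.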

Entire-Checkpoint: b696fa3b9617
Each session switch called start_screencast which spawns a new
WebSocket relay task. Since we stopped calling stop_screencast on
switch, old relay tasks were never cleaned up. After several
switches, multiple relays broadcast duplicate frames, flooding
the WebSocket and freezing the UI.

Fix: check the session's screencasting field from the API before
starting a new screencast. If already running, just ensure the
frame listener is active without spawning another relay.

Entire-Checkpoint: 2201c4b39939
Two issues fixed:

1. Sessions dying from connection errors were never marked as closed
   in the database — cleanup_stale_session removed from pool but
   didn't update SQLite. Now fires the action hook with "close"
   action and "connection lost" error so closed_at gets set.

2. History section only showed sessions with closed_at set. Sessions
   that died without proper close never appeared. Now shows all past
   sessions not currently active in the pool.

Entire-Checkpoint: 7148fca2419c
Replaced the stacked layout (live sessions + history below) with a
tabbed interface:

- "Live" tab shows active browser sessions with count badge
- "History" tab shows past sessions (closed or lost) with count
- Switching to History tab refreshes the list
- Past sessions show "closed" or "lost" badge based on whether
  they were explicitly closed or died from connection loss
- Clicking a history session shows its action log in the right panel

Entire-Checkpoint: 4f9536295073
…croll after switch

Entire-Checkpoint: da2f35c55cc4
Clicking a session in the History tab now creates a new browser
session and navigates to the last URL from the dead session. The
view switches to the Live tab automatically.

"View Log" link at the bottom of each history card still shows
the action log in the right panel.

Sessions without a URL (about:blank) show the action log instead
of reviving.

Entire-Checkpoint: 7815d593f70b
URL changes are now detected via CDP Page.frameNavigated events
instead of polling get_url every 2 seconds. The screencast relay
listens for navigation events alongside screencast frames and
includes the new URL in the frame payload when it changes.

This eliminates the get_url HTTP request every 2 seconds. Scroll
info is still polled, but every 5 seconds instead of every 2.
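
A rough sketch of "include the URL in the frame payload only when it
changes", assuming the relay keeps a small piece of state between
navigation events and frames. UrlTracker and its methods are
hypothetical names, not taken from the PR.

```rust
/// Tracks the last URL sent to the UI and the one waiting to be sent.
#[derive(Default)]
pub struct UrlTracker {
    last_sent: Option<String>,
    pending: Option<String>,
}

impl UrlTracker {
    /// Call when a Page.frameNavigated event reports a new URL.
    pub fn on_frame_navigated(&mut self, url: &str) {
        // Only queue the URL if it differs from what the UI already has.
        self.pending = if self.last_sent.as_deref() == Some(url) {
            None
        } else {
            Some(url.to_owned())
        };
    }

    /// Call when building a frame payload; returns Some only on a change.
    pub fn url_for_next_frame(&mut self) -> Option<String> {
        let url = self.pending.take()?;
        self.last_sent = Some(url.clone());
        Some(url)
    }
}
```

Frames built between navigations carry no URL, so the URL bar is only
updated when the page actually moves.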

Entire-Checkpoint: 72716eae41b2
The evaluate calls every 5 seconds for scroll info are gone. Scroll
position now comes from the screencast frame metadata (scroll_offset_y)
which is already included in every frame at zero cost.

Page height (scrollHeight) is queried once via evaluate when a
navigation event occurs, not polled. This means zero background
HTTP requests while viewing a browser session.
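
For illustration, the scrollbar overlay can be positioned from those
two values alone. The sketch below assumes the page height, viewport
height, and track height are all in the same pixel units; the
function name and parameters are hypothetical.

```rust
/// Returns (thumb_top, thumb_length) in pixels within the scrollbar track.
pub fn scrollbar_thumb(
    scroll_offset_y: f64,
    page_height: f64,
    viewport_height: f64,
    track_height: f64,
) -> (f64, f64) {
    // Nothing to scroll: the thumb fills the whole track.
    if page_height <= 0.0 || page_height <= viewport_height {
        return (0.0, track_height);
    }
    // Thumb length is proportional to the visible share of the page.
    let length = track_height * (viewport_height / page_height);
    let max_scroll = page_height - viewport_height;
    // Clamp so overscroll values cannot push the thumb off the track.
    let fraction = (scroll_offset_y / max_scroll).clamp(0.0, 1.0);
    ((track_height - length) * fraction, length)
}
```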

Entire-Checkpoint: e42cba470a49
Chrome by default discards session cookies (no expiry) when it exits.
Sites like LinkedIn use session cookies for auth, so logins were lost
between moltis restarts even with persistent profile directories.

Added --restore-session-cookies flag to both host browser launches
and containerized launches (via CHROME_FLAGS env var). This tells
Chrome to save session cookies to the Cookies database on exit and
restore them on next launch.

Entire-Checkpoint: f7b599438e01
Only truly fatal actions (navigate retry failure, close, snapshot)
trigger session cleanup on connection error. Screenshot, screencast,
mouse, keyboard, evaluate, and get_url/get_title are all non-fatal.

Previously, a screenshot that timed out during a session switch
was treated as fatal and killed the session.

Entire-Checkpoint: 7fc3fb8d154f
When creating a new session with the same profile_id as an existing
running session, the pool now creates a new tab in the existing
browser instead of launching a new container. This shares cookies,
local storage, and login state between sessions in real time.

Implementation: BrowserInstance.browser is now Arc<Browser> so
the handle can be shared. get_or_create checks for an existing
sandboxed instance with the same profile_id and creates a new
BrowserInstance pointing to the same Browser (new tab).

This means: log into LinkedIn in one session, open a new session
with the same profile → already logged in.
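
The sharing scheme can be sketched as follows. Browser here is a
stand-in struct rather than the real CDP handle, Pool is a
hypothetical simplification, and the sandboxed check mentioned above
is left out for brevity.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the real browser handle.
pub struct Browser {
    pub profile_id: String,
}

pub struct BrowserInstance {
    pub session_id: String,
    pub browser: Arc<Browser>,
}

#[derive(Default)]
pub struct Pool {
    instances: HashMap<String, BrowserInstance>,
}

impl Pool {
    /// Reuse the browser of any instance with the same profile_id,
    /// otherwise create a new one (a real pool would launch Chrome here).
    pub fn get_or_create(&mut self, session_id: &str, profile_id: &str) -> Arc<Browser> {
        if let Some(inst) = self.instances.get(session_id) {
            return Arc::clone(&inst.browser);
        }
        let browser = self
            .instances
            .values()
            .find(|i| i.browser.profile_id == profile_id)
            .map(|i| Arc::clone(&i.browser))
            .unwrap_or_else(|| Arc::new(Browser { profile_id: profile_id.to_owned() }));
        self.instances.insert(
            session_id.to_owned(),
            BrowserInstance { session_id: session_id.to_owned(), browser: Arc::clone(&browser) },
        );
        browser
    }
}
```

Two sessions with the same profile_id end up holding clones of one
Arc, which is what lets them share cookies and login state.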

Entire-Checkpoint: a1d6b26e98c8
Removed all connection-error cleanup from execute_action. Sessions
were being killed by screenshot timeouts, refresh failures, and
other transient errors. Navigate already has its own retry logic.

Now sessions only die from:
- Explicit close action
- Idle/TTL timeout (pool cleanup)
- Navigate retry failure (its own logic)

All other errors are returned to the caller without destroying
the session. Chrome may recover on the next request.
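
The resulting policy is small enough to write down as a single match.
This is only a sketch; SessionEvent and destroys_session are
hypothetical names and the real code paths are spread across
execute_action, navigate, and pool cleanup.

```rust
/// Events that could, in principle, end a browser session.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionEvent {
    /// Any action error: screenshot timeout, refresh failure, and so on.
    ActionFailed,
    ExplicitClose,
    IdleOrTtlExpired,
    NavigateRetryFailed,
}

/// Only the three listed causes destroy a session; plain action
/// errors are returned to the caller and the session is kept.
pub fn destroys_session(event: SessionEvent) -> bool {
    match event {
        SessionEvent::ActionFailed => false,
        SessionEvent::ExplicitClose
        | SessionEvent::IdleOrTtlExpired
        | SessionEvent::NavigateRetryFailed => true,
    }
}
```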

Entire-Checkpoint: 71776737468e
New config option [tools.browser] sandbox = false forces browsers to
run on the host instead of in containers. Saves ~200-400MB per
browser instance. Defaults to following the global sandbox mode.

Example in moltis.toml:
  [tools.browser]
  sandbox = false  # run browsers on host, not in containers

Entire-Checkpoint: 210224bcb5ef
Unit tests covering every major bug encountered during development:

types.rs (4 tests):
- null timeout_ms fails without stripping (Bug 1)
- all optional fields null after stripping (Bug 1)
- mouseWheel defaults deltas to zero (Bug 6)
- MouseInput Display includes params (Bug 13)

manager.rs (5 tests):
- key_to_vk Backspace=8, Enter=13 (Bug 7)
- key_to_vk arrows 37/38/39/40 (Bug 7)
- key_to_vk printable returns None (Bug 7)
- key_to_vk Delete=46, Escape=27, Tab=9 (Bug 7)

pool.rs (3 tests):
- dangling symlink detected by symlink_metadata (Bug 11)
- same profile_id produces same path (Bug 19)
- different profile_ids produce different paths (Bug 19)

screencast.rs (1 test):
- frame url serialized only when Some (Bug 15)
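
For reference, the key_to_vk cases listed above correspond to a
mapping along these lines. The codes are the standard Windows virtual
key codes; the exact key-name strings (DOM-style "ArrowLeft" etc.) and
the u32 return type are assumptions, since the real signature is not
shown here.

```rust
/// Map a special key name to its Windows virtual key code.
/// Printable keys return None and are sent as text instead.
pub fn key_to_vk(key: &str) -> Option<u32> {
    match key {
        "Backspace" => Some(8),
        "Tab" => Some(9),
        "Enter" => Some(13),
        "Escape" => Some(27),
        "ArrowLeft" => Some(37),
        "ArrowUp" => Some(38),
        "ArrowRight" => Some(39),
        "ArrowDown" => Some(40),
        "Delete" => Some(46),
        _ => None,
    }
}
```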

Entire-Checkpoint: b639577544f3
6 new e2e tests covering specific bugs:

- Bug 2: sessions created via REST API appear in UI list
  (validates shared BrowserManager between tool and service)
- Bug 15: URL bar shows target URL immediately on navigation
  (validates no flicker from poll overwriting)
- Bug 16: switching sessions rapidly causes no JS errors
  (validates no relay task accumulation)
- Bug 17: closed session appears in History tab
  (validates session history persistence and UI)
- Delayed highlight: creating session shows placeholder immediately
- Bug 12: dead session shows error and recovers
  (validates no stuck "Fetching browser view..." state)

Entire-Checkpoint: 0db97ecdd06f
The [tools.browser] sandbox config only affected UI sessions (via
RealBrowserService). The BrowserTool always used the SandboxRouter
(global exec sandbox mode), ignoring the browser-specific config.

Added sandbox_override to BrowserTool that takes precedence over
the router. Wired from config.tools.browser.sandbox in server.rs.
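
The precedence rule amounts to an Option fallback. A minimal sketch,
assuming the browser-specific setting is an optional bool and the
router's decision is a plain bool; use_sandbox is a hypothetical
helper name.

```rust
/// Browser-specific override wins; otherwise follow the global mode.
pub fn use_sandbox(browser_override: Option<bool>, global_sandbox: bool) -> bool {
    browser_override.unwrap_or(global_sandbox)
}
```

With sandbox = false under [tools.browser], use_sandbox(Some(false),
true) yields false, so the browser runs on the host even when the
global sandbox mode is on.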

Also added stealth Chrome flags to reduce headless detection:
- --disable-blink-features=AutomationControlled (removes navigator.webdriver)
- --headless=new (more realistic headless mode)
- Realistic macOS Chrome user agent
- --disable-infobars, --disable-automation-extension

Entire-Checkpoint: f3232341748e
penso added 9 commits April 3, 2026 11:58
When the browser tool returns a result with a session_id, a
clickable link appears below the tool card in the chat:
"🌐 View browser session" → navigates to /settings/browser with
the session auto-selected.

The browser page reads the ?session= parameter from the URL hash
and auto-selects the matching session (with screencast).

Entire-Checkpoint: 9b682dff16a6
…f query params

The settings router strips query strings, so ?session=xxx was lost.
Now uses navigateToBrowserSession() to set a pending session ID
before calling navigate("/settings/browser"). The browser page
reads it on init and auto-selects the session.

Works for both live tool cards and history re-rendering.

Entire-Checkpoint: fa56d7b1add5
Two fixes:

1. Reviving a dead session now passes the old session_id to the
   navigate action. get_or_create registers the new browser under
   the same ID, so the session keeps its identity in the UI and
   history.

2. Host-mode browser launches now clean up stale SingletonLock,
   SingletonCookie, and SingletonSocket files before starting.
   Previously only sandbox launches did this, causing "Failed to
   create SingletonLock" errors when sandbox=false.
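
A sketch of what such a cleanup might look like on Unix, assuming the
stale entries live directly in the profile directory. The function
name is hypothetical. It uses symlink_metadata rather than exists()
because the lock is typically a symlink whose target may be gone, and
exists() follows the link and reports false for a dangling one.

```rust
use std::fs;
use std::io;
use std::path::Path;

const SINGLETON_FILES: [&str; 3] = ["SingletonLock", "SingletonCookie", "SingletonSocket"];

/// Remove leftover singleton entries; returns how many were removed.
pub fn remove_stale_singletons(profile_dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for name in SINGLETON_FILES {
        let path = profile_dir.join(name);
        // symlink_metadata does not follow links, so a dangling
        // symlink is still detected here.
        if fs::symlink_metadata(&path).is_ok() {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

A real launcher would only call this when no other Chrome process is
using the profile; that check is outside the scope of this sketch.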

Entire-Checkpoint: ab20339a2d2b