drstrangerujn/aether: MCP-native browser control for AI agents.


A · E · T · H · E · R
The invisible element that connects AI to your browser.

Quick Start · Hint Map · Safe Mode · Skill · Roadmap

Other browser agents fight websites. Aether doesn't.

Instead of faking fingerprints or babysitting a bare headless browser, Aether uses Playwright to drive a real Chromium instance with your saved sessions. One npm install, one command. No Chrome extension, no WebSocket relay, no manual setup. It runs headless on a VPS or headed on your desktop, with the same code and the same MCP tools.

Why

You've tried letting AI control a browser. You know what happens.

Headless Chrome opens. No cookies, no login, nothing. You're a stranger to every site. Cloudflare blocks you. So you bolt on stealth plugins, fake the User-Agent, randomize Canvas. It works Tuesday, breaks Thursday. The site updates, you update, nobody wins. And even when you get past the gate, the AI stares at 50,000 tokens of raw DOM and clicks the wrong thing.

Aether skips all of that. Playwright launches a real browser. Pass --storage-state to restore your sessions. The AI sees a Hint Map — a structured perception of the page in 200-800 tokens — not raw DOM.

How

┌──────────────────────────────────────────┐
│  Chromium (Playwright)                   │
│                                          │
│  ✓ Real rendering   ✓ Saved sessions     │
│  ✓ Headless / VPS   ✓ Headed / desktop   │
│                                          │
│  ┌──────────────────────────────────┐    │
│  │ Injected Script (auto-loaded)    │    │
│  │  · Hint Map: page perception     │    │
│  │  · Self-Healing: auto-recover    │    │
│  │  · Auto Dismiss: kill popups     │    │
│  └────────────┬─────────────────────┘    │
└───────────────┼──────────────────────────┘
                │ Playwright (in-process)
┌───────────────┴──────────────────────────┐
│ Aether MCP Server                        │
│  · 20 tools · Safe Mode · Path Cache     │
└───────────────┬──────────────────────────┘
                │ MCP (stdio)
┌───────────────┴──────────────────────────┐
│ Any AI: Claude · GPT · Custom            │
└──────────────────────────────────────────┘

No Docker. No cloud. No API keys. One npm install.

Quick Start

1. Install

git clone https://github.com/drstrangerujn/aether.git
cd aether/server && npm install

Playwright auto-installs Chromium on npm install.

2. Start the server

# Headless (VPS / CI)
npm start

# Headed (see the browser)
npm run start:headed

# Restore saved sessions
node src/index.js --storage-state ~/.aether/session.json
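Assuming Aether hands the `--storage-state` file straight to Playwright, that file is a standard Playwright storage-state JSON document: a `cookies` array plus per-origin `localStorage`. Checking it before you start the server can save you from a confusingly logged-out session. The sketch below uses a hypothetical helper, `summarizeStorageState`, which is not part of Aether. It counts live versus expired cookies and lists the domains still covered.

```javascript
// Hypothetical helper: summarize a Playwright storage-state object before reusing it.
// Assumed shape: { cookies: [...], origins: [{ origin, localStorage: [...] }] }.
// Playwright stores cookie `expires` as Unix time in seconds; -1 marks a session cookie.
function summarizeStorageState(state, nowSeconds = Date.now() / 1000) {
  const cookies = state.cookies ?? [];
  const origins = state.origins ?? [];
  const live = cookies.filter((c) => c.expires === -1 || c.expires > nowSeconds);
  return {
    liveCookies: live.length,
    expiredCookies: cookies.length - live.length,
    // Strip the leading dot that domain cookies may carry (".example.com").
    domains: [...new Set(live.map((c) => c.domain.replace(/^\./, "")))].sort(),
    originsWithLocalStorage: origins.filter((o) => (o.localStorage ?? []).length > 0).length,
  };
}

// Usage in Node: summarizeStorageState(JSON.parse(fs.readFileSync(path, "utf8")))
const example = {
  cookies: [
    { name: "sid", value: "abc", domain: ".example.com", path: "/", expires: -1 },
    { name: "old", value: "x", domain: "shop.example.com", path: "/", expires: 1 },
  ],
  origins: [{ origin: "https://example.com", localStorage: [{ name: "token", value: "t" }] }],
};
console.log(summarizeStorageState(example));
```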

3. Connect your AI

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "aether": {
      "command": "node",
      "args": ["/path/to/aether/server/src/index.js"]
    }
  }
}

That's it. No extension to load, no chrome://extensions, no Developer Mode.
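Under the hood, that config makes Claude Desktop spawn the server and speak MCP over its stdin and stdout. The following is a rough illustration, not Aether code. It builds the first JSON-RPC 2.0 messages a client writes, following the MCP specification's stdio transport, which sends one JSON object per line. The protocol version string and client name are example values. A real client also waits for the server's `initialize` response before sending the rest.

```javascript
// Sketch of the opening JSON-RPC 2.0 messages an MCP client writes to a server's stdin.
// The stdio transport is newline-delimited: one JSON object per line.
function frame(message) {
  // JSON.stringify without an indent argument escapes any "\n" inside strings,
  // so each framed message occupies exactly one line.
  return JSON.stringify(message) + "\n";
}

function openingMessages(clientName) {
  return [
    frame({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05",
        capabilities: {},
        clientInfo: { name: clientName, version: "0.1.0" },
      },
    }),
    // A notification (no id): sent only after the server has answered initialize.
    frame({ jsonrpc: "2.0", method: "notifications/initialized" }),
    // Ask the server which tools it exposes (get_hint_map, click, ...).
    frame({ jsonrpc: "2.0", id: 2, method: "tools/list" }),
  ];
}

const messages = openingMessages("example-client");
process.stdout.write(messages.join(""));
```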

CLI flags:

Flag	Default	Description
`--headless`	on	Run headless (no GUI)
`--headed`	off	Show the browser window
`--viewport 1920x1080`	1280x800	Browser viewport size
`--storage-state path`	—	Restore cookies/sessions from file
`--cdp ws://...`	—	Connect to an existing browser via CDP
`--locale zh-CN`	zh-CN	Browser locale
`--user-agent "..."`	—	Custom User-Agent
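Most of these flags line up with standard Playwright options. The sketch below is a hypothetical parser, not Aether's code; the name `parseFlags` and the shape of the returned object are assumptions. It turns the flags into the objects you would pass to `chromium.launch()` and `browser.newContext()`, plus the endpoint for `chromium.connectOverCDP()` when `--cdp` is given. Defaults follow the table above.

```javascript
// Hypothetical parser mapping the documented flags onto Playwright option names.
function parseFlags(argv) {
  const opts = {
    launch: { headless: true },                                          // chromium.launch(...)
    context: { viewport: { width: 1280, height: 800 }, locale: "zh-CN" }, // browser.newContext(...)
    cdpEndpoint: null,                                                   // chromium.connectOverCDP(...)
  };
  const args = [...argv];
  const valueFor = (flag) => {
    if (args.length === 0) throw new Error(`${flag} expects a value`);
    return args.shift();
  };
  while (args.length > 0) {
    const flag = args.shift();
    switch (flag) {
      case "--headless": opts.launch.headless = true; break;
      case "--headed": opts.launch.headless = false; break;
      case "--viewport": {
        const m = /^(\d+)x(\d+)$/.exec(valueFor(flag));
        if (!m) throw new Error("--viewport expects WIDTHxHEIGHT, e.g. 1920x1080");
        opts.context.viewport = { width: Number(m[1]), height: Number(m[2]) };
        break;
      }
      case "--storage-state": opts.context.storageState = valueFor(flag); break;
      case "--cdp": opts.cdpEndpoint = valueFor(flag); break;
      case "--locale": opts.context.locale = valueFor(flag); break;
      case "--user-agent": opts.context.userAgent = valueFor(flag); break;
      default: throw new Error(`unknown flag: ${flag}`);
    }
  }
  return opts;
}

console.log(parseFlags(["--headed", "--viewport", "1920x1080", "--storage-state", "session.json"]));
```

A launcher built this way would call `chromium.connectOverCDP(opts.cdpEndpoint)` when an endpoint is set and `chromium.launch(opts.launch)` otherwise. In either case it then passes `opts.context` to `browser.newContext()`.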

Hint Map

The core idea: don't show the AI raw HTML or screenshots. Show it a structured perception of the page.

{
  "summary": "47 elements across [nav, main, sidebar] 🍪 cookie banner",
  "interactables": [
    { "id": "h0", "type": "input",  "region": "main", "priority": 90, "placeholder": "Search..." },
    { "id": "h1", "type": "button", "region": "main", "priority": 85, "text": "Search" },
    { "id": "h2", "type": "link",   "region": "nav",  "priority": 60, "text": "Electronics" }
  ],
  "content": {
    "semantics": { "price": ["¥299", "¥199"], "count": ["128 reviews"] },
    "tables": [{ "headers": ["Name", "Price", "Rating"], "rowCount": 25 }]
  },
  "state": { "popup": false, "login": false, "captcha": false, "cookieBanner": true }
}
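The `content.semantics` field means the AI does not have to hunt for prices or counts itself. This README does not say how the injected script detects them. As a minimal sketch assuming plain regular expressions (the patterns and the `extractSemantics` name are illustrative), extraction over visible page text could look like this:

```javascript
// Hypothetical regex-based extractor approximating a Hint Map's "semantics" field.
// Global regexes with String.prototype.match return every match (or null).
const PATTERNS = {
  price: /[¥$€£]\s?\d+(?:[.,]\d+)*/g,
  percent: /\d+(?:\.\d+)?%/g,
  email: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  count: /\d[\d,]*\s+(?:reviews?|ratings?|results?|items?)/gi,
};

function extractSemantics(text) {
  const out = {};
  for (const [kind, re] of Object.entries(PATTERNS)) {
    const hits = text.match(re);
    if (hits) out[kind] = [...new Set(hits)]; // de-duplicate, keep first-seen order
  }
  return out;
}

const sample = "Now ¥299 (was ¥399), 25% off. 128 reviews. Questions: shop@example.com";
console.log(extractSemantics(sample));
```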

What makes this different from accessibility snapshots or DOM dumps:

Regions — every element knows if it's in the nav, main content, sidebar, or a modal. The AI can ignore the noise.
Priority scoring — elements are ranked by viewport visibility, size, actionability. Modals sort first.
Semantic extraction — prices, dates, emails, counts, percentages are pre-extracted. No regex needed by the AI.
Deduplication — overlapping elements at the same pixel position are collapsed.
One-line summary — the AI reads one sentence and knows where it is.
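Here is a hypothetical ranking pass that makes priority scoring and deduplication concrete. Several details are assumptions, not Aether's actual formula: the weights (50% visibility, 20% size, 30% actionability), the 20,000 px² size cap, the element shape `{ type, region, rect }`, and the choice to keep the highest-scored element when positions collide. Only the ordering rules come from the list above: modals first, and elements at the same position collapsed.

```javascript
// Hypothetical scorer: visible fraction, size, and actionability, scaled to 0..100.
function score(el, viewport) {
  const { x, y, width, height } = el.rect;
  const visW = Math.max(0, Math.min(x + width, viewport.width) - Math.max(x, 0));
  const visH = Math.max(0, Math.min(y + height, viewport.height) - Math.max(y, 0));
  const area = width * height;
  const visibility = area > 0 ? (visW * visH) / area : 0; // share of the element on screen
  const size = Math.min(1, area / 20000);                 // saturates for large elements
  const actionability = { input: 1, button: 0.9, link: 0.6 }[el.type] ?? 0.3;
  return Math.round(100 * (0.5 * visibility + 0.2 * size + 0.3 * actionability));
}

function rank(elements, viewport) {
  // Collapse elements sharing the same (rounded) position, keeping the best-scored one.
  const byPos = new Map();
  for (const el of elements) {
    const scored = { ...el, priority: score(el, viewport) };
    const key = `${Math.round(el.rect.x)},${Math.round(el.rect.y)}`;
    const prev = byPos.get(key);
    if (!prev || scored.priority > prev.priority) byPos.set(key, scored);
  }
  // Modal elements sort first, then descending priority.
  return [...byPos.values()].sort(
    (a, b) => (b.region === "modal") - (a.region === "modal") || b.priority - a.priority
  );
}

const viewport = { width: 1280, height: 800 };
const ranked = rank(
  [
    { type: "input", region: "main", rect: { x: 10, y: 10, width: 200, height: 30 } },
    { type: "link", region: "main", rect: { x: 10, y: 10, width: 200, height: 30 } }, // same spot
    { type: "button", region: "main", rect: { x: 220, y: 10, width: 80, height: 30 } },
    { type: "button", region: "modal", rect: { x: 500, y: 300, width: 100, height: 40 } },
    { type: "link", region: "main", rect: { x: 10, y: 2000, width: 100, height: 20 } }, // off-screen
  ],
  viewport
);
console.log(ranked.map((e) => `${e.region}/${e.type}:${e.priority}`));
```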

Token cost: ~200-800 tokens per page, vs 10,000-50,000 for raw DOM.

Safe Mode

When the AI tries to click "Delete Account" or "Confirm Payment", Aether intercepts it:

{
  "_aether_approval_required": true,
  "category": "delete",
  "description": "Action: click | Target: \"Delete Account\"",
  "options": [
    "approve       — execute this action",
    "approve_all   — always allow this category",
    "reject        — cancel this action",
    "reject_always — never allow this category"
  ]
}

The AI presents this to you. You decide. Categories: payment, delete, send, account, download, code_execution.

Set approve_all once for a category you trust, and it never asks again. Set reject_always for categories you want locked down permanently.
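The approval flow can be modeled as a small policy table keyed by category. In this sketch the keyword lists, the `classify` helper, and the `SafeMode` class are hypothetical. The four answers, the category names, the approval-request fields, and the rule that `execute_js` always needs approval come from this README.

```javascript
// Hypothetical Safe Mode gate. Keyword lists and class names are illustrative only.
const CATEGORY_KEYWORDS = {
  payment: ["pay", "checkout", "purchase"],
  delete: ["delete", "remove"],
  send: ["send", "submit", "post"],
  account: ["password", "sign out", "log out"],
  download: ["download"],
};

function classify(action, target) {
  if (action === "execute_js") return "code_execution";
  const text = target.toLowerCase();
  for (const [category, words] of Object.entries(CATEGORY_KEYWORDS)) {
    if (words.some((w) => text.includes(w))) return category;
  }
  return null; // not a sensitive action
}

class SafeMode {
  constructor() {
    this.policy = new Map(); // category -> "allow" | "deny"
  }

  // Returns either a verdict or an approval request shaped like the JSON above.
  check(action, target) {
    const category = classify(action, target);
    if (category === null) return { allowed: true };
    const rule = this.policy.get(category);
    if (rule === "allow") return { allowed: true };
    if (rule === "deny") return { allowed: false, category };
    return {
      _aether_approval_required: true,
      category,
      description: `Action: ${action} | Target: "${target}"`,
    };
  }

  // Records the user's answer; returns true if the pending action should run.
  respond(category, answer) {
    // execute_js always requires approval, so approve_all is never persisted for it.
    if (answer === "approve_all" && category !== "code_execution") this.policy.set(category, "allow");
    if (answer === "reject_always") this.policy.set(category, "deny");
    return answer === "approve" || answer === "approve_all";
  }
}

const gate = new SafeMode();
console.log(gate.check("click", "Delete Account"));
```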

Tools (20)

	Tool	What it does
👁	`get_hint_map`	Structured page perception. Always call first. Auto-dismisses popups.
👁	`screenshot`	Viewport capture
👁	`full_screenshot`	Full page capture
👁	`extract`	Pull text from elements
👁	`detect_qr`	Find & extract QR codes (login, payment)
✋	`navigate`	Open URL
✋	`click`	Click by hint ID or text. Self-heals if element moved.
✋	`type`	Type into inputs. `pressEnter` to submit
✋	`scroll`	Scroll page
✋	`wait_for`	Wait for elements / text / URL changes
✋	`auto_dismiss`	Kill cookie banners, popups, overlays
✋	`execute_js`	Run JS in page. Always requires approval
🔒	`safe_mode_respond`	Handle approval requests
🔒	`safe_mode_policy`	View/change policies
🧠	`cache_start`	Start recording a replayable workflow
🧠	`cache_stop`	Save recorded workflow
🧠	`cache_replay`	Replay a saved workflow (skip AI inference)
🧠	`cache_list`	List saved workflows
🧠	`cache_delete`	Delete a saved workflow
💾	`save_session`	Save cookies/sessions for next time
📄	`page_to_pdf`	Export page as PDF
📋	`get_tabs`	List open tabs
📋	`get_audit_log`	Full action history
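The `cache_*` tools record a workflow once and replay it without model inference. This README does not describe Aether's actual storage format, so what follows is an assumed in-memory design. It records tool calls between start and stop, skips perception-only calls, and replays the rest through a caller-supplied executor. Hint IDs like `h0` belong to one snapshot, so a production cache would need more stable targets than the raw arguments kept here. This sketch does not attempt that. Argument names such as `url`, `hintId`, and `text` are hypothetical.

```javascript
// Hypothetical in-memory Path Cache: record tool calls once, replay them without the model.
const PERCEPTION_TOOLS = new Set(["get_hint_map", "screenshot", "full_screenshot"]);

class PathCache {
  constructor() {
    this.workflows = new Map(); // name -> [{ tool, args }]
    this.recording = null;
  }

  start(name) {
    this.recording = { name, steps: [] };
  }

  record(tool, args) {
    // Perception calls only inform the model; replay needs just the actions.
    if (this.recording && !PERCEPTION_TOOLS.has(tool)) {
      this.recording.steps.push({ tool, args });
    }
  }

  stop() {
    if (!this.recording) throw new Error("cache_stop called without cache_start");
    const { name, steps } = this.recording;
    this.workflows.set(name, steps);
    this.recording = null;
    return steps.length;
  }

  replay(name, execute) {
    const steps = this.workflows.get(name);
    if (!steps) throw new Error(`no saved workflow named "${name}"`);
    // Each step goes straight to the executor; no model inference in between.
    return steps.map(({ tool, args }) => execute(tool, args));
  }

  list() {
    return [...this.workflows.keys()];
  }

  delete(name) {
    return this.workflows.delete(name);
  }
}

const cache = new PathCache();
cache.start("search-phones");
cache.record("navigate", { url: "https://shop.example.com" });
cache.record("get_hint_map", {});
cache.record("type", { hintId: "h0", text: "phone", pressEnter: true });
console.log(cache.stop(), cache.list());
```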

As a Skill

Aether ships with skill/SKILL.md — instructions that teach any AI agent how to use it. The skill covers: when to call get_hint_map, how to handle Safe Mode approvals, multi-step patterns (search → filter → extract), popup dismissal, and security rules.

# Claude Code
cp -r aether/skill ~/.claude/skills/aether
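One rule the skill encodes is that `get_hint_map` comes first. A hypothetical plan checker can enforce that rule for the search → filter → extract pattern. Hint IDs are trusted only after a fresh Hint Map, and navigation, clicks, and `type` with `pressEnter` are treated as invalidating them. The `lintPlan` name, these invalidation rules, and argument names such as `hintId` and `url` are assumptions, not taken from SKILL.md.

```javascript
// Hypothetical checker for the "always call get_hint_map first" rule.
const NEEDS_HINTS = new Set(["click", "type", "extract"]);

// Steps after which previously seen hint IDs may no longer match the page.
function invalidatesHints(step) {
  return (
    step.tool === "navigate" ||
    step.tool === "cache_replay" ||
    step.tool === "click" ||
    (step.tool === "type" && step.args.pressEnter === true)
  );
}

function lintPlan(steps) {
  const problems = [];
  let fresh = false; // no Hint Map seen yet
  steps.forEach((step, i) => {
    if (NEEDS_HINTS.has(step.tool) && !fresh) {
      problems.push(`step ${i} (${step.tool}) runs without a fresh get_hint_map`);
    }
    if (step.tool === "get_hint_map") fresh = true;
    else if (invalidatesHints(step)) fresh = false;
  });
  return problems;
}

// search → filter → extract
const plan = [
  { tool: "navigate", args: { url: "https://shop.example.com" } },
  { tool: "get_hint_map", args: {} },
  { tool: "type", args: { hintId: "h0", text: "headphones", pressEnter: true } },
  { tool: "get_hint_map", args: {} },
  { tool: "click", args: { text: "Under ¥300" } },
  { tool: "get_hint_map", args: {} },
  { tool: "extract", args: { hintId: "h4" } },
];
console.log(lintPlan(plan)); // → []
```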

Compared To

	Aether	browser-use	Stagehand	Manus Operator
No extension needed	✅ Playwright	⚠️ Python SDK	⚠️ Own SDK	❌ Manus only
VPS / headless	✅ Out of the box	⚠️ Config needed	❌ Cloud only	❌
Session persistence	✅ `--storage-state`	⚠️ Manual	❌	✅
Works with any AI	✅ MCP standard	⚠️ Python SDK	⚠️ Own SDK	❌ Manus only
Open source	✅ MIT	✅	✅	❌
Page perception	✅ Hint Map	❌ Raw DOM	⚠️ Partial	❓ Unknown
Safe Mode	✅ Built-in	❌	❌	❌
Self-Healing	✅ Auto-recover	❌	✅	❌
Path Cache	✅ Record & replay	❌	✅ Cache	❌

Project Structure

aether/
├── server/
│   └── src/
│       ├── index.js           MCP Server + Safe Mode + CLI
│       ├── browser.js         Playwright browser manager
│       ├── injected.js        Hint Map + Self-Healing + Auto Dismiss (injected into pages)
│       ├── cache.js           Path Cache engine
│       └── profiles.js        Profile manager
├── extension/                 (legacy) Chrome Extension — kept for reference
├── skill/
│   └── SKILL.md               AI agent instructions
└── scripts/
    └── test-e2e.js            E2E tests

Roadmap

License

MIT

Aether — αἰθήρ — the fifth element, invisible but everywhere.
