Waze for Web Scraping
Marrow is a shared registry that maps the structure of the web on-demand, making AI agents resilient to UI changes.
AI agents today are fragile:
- They scrape blindly using brittle CSS selectors
- They break when websites change
- They waste tokens re-analyzing the same HTML
- Every developer reinvents the wheel
Marrow decouples understanding the UI (slow, expensive) from using it (fast, cheap).
Instead of this:
await page.click(".btn-primary"); // Breaks when CSS changesDo this:
const map = await marrow.getMap("linkedin.com/jobs");
await page.click(map.elements.apply_button.strategies[0].value);Marrow doesn't crawl the web upfront. It maps pages on-demand when users request them.
- Day 1: The database is empty.
- Day 2: User A requests linkedin.com/jobs → Marrow maps it (10s, $0.004).
- Day 3: Users B, C, and D request linkedin.com/jobs → instant cache hit (0s, $0.00).
This is Waze. The first driver on a new road does the work. Everyone else benefits.
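To make the just-in-time flow concrete, here is a minimal TypeScript sketch. The `getMap` name matches the example above, but the `Registry` and `Mapper` interfaces, the field names, and the one-week staleness window are illustrative assumptions, not Marrow's actual API:

```ts
// Illustrative sketch of the JIT flow; not Marrow's real API.
type Strategy = { type: "css" | "xpath" | "aria"; value: string };

interface PageMap {
  url: string;
  mappedAt: number; // epoch ms, used for staleness checks
  elements: Record<string, { strategies: Strategy[] }>;
}

interface Registry {
  lookup(url: string): Promise<PageMap | null>;
  store(map: PageMap): Promise<void>;
}

interface Mapper {
  analyze(url: string): Promise<PageMap>; // stealth browser + Gemini
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000; // assumed staleness window

async function getMap(registry: Registry, mapper: Mapper, url: string): Promise<PageMap> {
  // Fast path: someone already mapped this page (<50ms, ~$0.00).
  const cached = await registry.lookup(url);
  if (cached && Date.now() - cached.mappedAt < WEEK_MS) return cached;

  // Slow path: the first requester pays the mapping cost (~10s, ~$0.004);
  // the shared result then serves everyone else.
  const fresh = await mapper.analyze(url);
  await registry.store(fresh);
  return fresh;
}
```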
Under the hood, Marrow has three components.

The first is a stealth browser that navigates pages without getting blocked (see the sketch after this list):
- Uses Playwright + stealth plugins
- Extracts HTML and accessibility tree
- Runs headless or visible for debugging
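A sketch of what this layer could look like, assuming `playwright-extra` with the stealth plugin; these packages are common choices for this job, not necessarily the ones Marrow ships with:

```ts
// Sketch only: assumes playwright-extra + puppeteer-extra-plugin-stealth.
import { chromium } from "playwright-extra";
import stealth from "puppeteer-extra-plugin-stealth";

chromium.use(stealth());

async function capturePage(url: string, headless = true) {
  // headless: false gives a visible browser for debugging
  const browser = await chromium.launch({ headless });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "domcontentloaded" });

  const html = await page.content();                  // raw HTML
  const axTree = await page.accessibility.snapshot(); // accessibility tree
  await browser.close();
  return { html, axTree };
}
```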
The second is the analysis step: Gemini 2.0 Flash examines the page structure and returns stable selectors (see the sketch after this list):
- Identifies containers, buttons, links, forms
- Provides multiple fallback strategies (CSS, XPath, ARIA)
- Enforces semantic naming (`job_card`, not `div_123`)
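A sketch of the analysis call, assuming the official `@google/generative-ai` SDK; the prompt wording and output schema here are illustrative, not Marrow's actual ones:

```ts
// Sketch only: prompt and JSON shape are assumptions, not Marrow's schema.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  generationConfig: { responseMimeType: "application/json" }, // force parseable output
});

async function analyzeHtml(html: string) {
  const prompt = [
    "Identify the interactive elements in this HTML.",
    'Return JSON: { "<semantic_name>": { "strategies": [{ "type": "css" | "xpath" | "aria", "value": "..." }] } }.',
    'Use semantic names like "job_card" or "apply_button", never generated ones like "div_123".',
    "Prefer stable attributes (ids, data-*, ARIA roles) over positional selectors.",
    "HTML:",
    html.slice(0, 100_000), // keep the request within a sane token budget
  ].join("\n");

  const result = await model.generateContent(prompt);
  return JSON.parse(result.response.text());
}
```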
The third is the registry: a shared database (Convex/Supabase) that stores maps (a possible schema follows this list):
- Serves cached maps in <50ms
- Auto-refreshes stale maps
- Self-heals broken selectors
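If the registry lands on Convex, the stored shape might look roughly like this; a hypothetical schema in which every field name is a guess:

```ts
// convex/schema.ts: hypothetical registry schema, not Marrow's actual one.
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  maps: defineTable({
    url: v.string(),          // normalized page URL, e.g. "linkedin.com/jobs"
    elements: v.any(),        // semantic name -> { strategies: [...] }
    mappedAt: v.number(),     // drives auto-refresh of stale maps
    failureCount: v.number(), // reported selector failures trigger self-healing re-maps
  }).index("by_url", ["url"]),
});
```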
Users provide their own Gemini API key. Marrow provides the infrastructure.
Why this works:
- Zero AI costs for Marrow (users pay Google directly)
- Network effects (shared maps benefit everyone)
- Scalable (no upfront crawling needed)
Cost per map: ~$0.004 (less than half a penny)
Cost per cached lookup: ~$0.00
Current status:
🚧 Phase 3 Complete - Cartographer + Mapper integration working
🔨 Phase 4 In Progress - Building the Registry (JIT database)
```bash
# Clone the repo
git clone https://github.com/tomiwa-a/marrow.git
cd marrow

# Install dependencies
npm install

# Set up environment
cp .env.example .env
# Add your GEMINI_API_KEY

# Test the integration
npx tsx scripts/test-integration.ts https://news.ycombinator.com
```
Problem: AI assistants hallucinate selectors when navigating documentation or web apps.
Solution: Marrow MCP Server provides reliable selectors.
```ts
// Claude Code calls Marrow via MCP:
const map = await marrow.getMap("stripe.com/docs");
await page.click(map.elements.search_input.strategies[0].value);
```
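Exposing this over MCP could look roughly like the following, assuming the official `@modelcontextprotocol/sdk`; the tool name, input shape, and `getMap` helper are illustrative:

```ts
// Hypothetical MCP wrapper; not Marrow's published server.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

declare function getMap(url: string): Promise<unknown>; // see the JIT sketch above

const server = new McpServer({ name: "marrow", version: "0.1.0" });

server.tool(
  "get_map",
  { url: z.string().describe("Page to map, e.g. stripe.com/docs") },
  async ({ url }) => ({
    content: [{ type: "text" as const, text: JSON.stringify(await getMap(url)) }],
  })
);

await server.connect(new StdioServerTransport());
```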
Problem: Automation scripts break when websites change CSS.
Solution: Tools query Marrow for current selectors.
```ts
// OpenClaw sends an Instagram DM:
const map = await marrow.getMap("instagram.com/direct");
await page.click(map.elements.new_message_button.strategies[0].value);
```
- Reduce scraper maintenance by 90% with shared, auto-healing maps.
- Save tokens by using cached maps instead of re-analyzing pages.
- Write resilient tests that survive CSS refactors (see the sketch below).
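Resilience in practice could be a small helper that walks a map entry's fallback strategies in order; a hypothetical sketch on top of Playwright, reusing the `Strategy` shape from the JIT sketch above:

```ts
// Hypothetical fallback walker; not part of Marrow's published API.
import type { Page } from "playwright";

type Strategy = { type: "css" | "xpath" | "aria"; value: string };

async function resilientClick(page: Page, strategies: Strategy[]): Promise<Strategy> {
  for (const s of strategies) {
    // XPath needs an explicit prefix; CSS selectors and Playwright's
    // role= selector strings can be passed to page.locator() as-is.
    const locator = s.type === "xpath" ? page.locator(`xpath=${s.value}`) : page.locator(s.value);
    if (await locator.count()) {
      await locator.first().click();
      return s; // knowing which strategy worked is useful telemetry for self-healing
    }
  }
  throw new Error("All strategies failed; the page likely needs re-mapping");
}
```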
Next Milestones:
- Phase 4: JIT Registry (Convex backend)
- Phase 5: Self-healing selectors
- Phase 6: Public API
What Marrow does: Maps the structure of pages (CSS selectors, element locations).
What Marrow does NOT do: Scrape content (text, images, videos).
Analogy: Marrow sells treasure maps, not the treasure itself.
Mapping functional facts (e.g., "the button has ID #submit") is significantly safer than scraping creative content.
The original V1 autonomous job agent has been archived to v1_agent/.
It is a standalone CLI tool that navigates LinkedIn.
Built by @tomiwa_amole
MIT