07 Apr 06:03

appergb

e72c743

desktop-agent-ops v1.4.1 Latest

v1.4.1 (2026-04-07)

Features

Adaptive Window-Crop Executor Entry
- local_agent.py screenshot tool now accepts app and region_label
- executor now prefers app-window crops before falling back to full-screen captures
- added explicit image_space metadata so cropped image coordinates can be decoded outside the model
- added translate_image_point tool so screenshot-driven flows can remap image-space points into screen coordinates deterministically

Fixes

Retina / HiDPI coordinate remapping
- added pixel-size-aware remapping in image_space.py instead of assuming simple offset-only translation
- updated executor context so models see crop bounds, image pixel size, and mapping rules together
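
To illustrate why an offset-only translation is not enough on Retina or HiDPI displays, here is a minimal sketch of pixel-size-aware remapping. The `ImageSpace` fields and this `translate_image_point` signature are assumptions made for illustration; the actual metadata layout in image_space.py and the real tool's parameters are not shown in these notes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageSpace:
    """Hypothetical metadata describing how a crop maps back to the screen."""
    crop_x: float       # left edge of the crop, in screen points
    crop_y: float       # top edge of the crop, in screen points
    crop_width: float   # crop width, in screen points
    crop_height: float  # crop height, in screen points
    pixel_width: int    # width of the captured image, in pixels
    pixel_height: int   # height of the captured image, in pixels


def translate_image_point(space: ImageSpace, px: float, py: float) -> Tuple[float, float]:
    """Map an image-pixel coordinate to a screen-point coordinate."""
    if space.pixel_width <= 0 or space.pixel_height <= 0:
        raise ValueError("image pixel size must be positive")
    # On a 2x display the image has twice as many pixels as the crop has
    # points, so each axis is scaled before the crop offset is added.
    scale_x = space.crop_width / space.pixel_width
    scale_y = space.crop_height / space.pixel_height
    return (space.crop_x + px * scale_x, space.crop_y + py * scale_y)


# An 800x600-point window crop captured at 2x (1600x1200 pixels).
space = ImageSpace(100, 50, 800, 600, 1600, 1200)
print(translate_image_point(space, 400, 300))  # (300.0, 200.0)
```

An offset-only translation would have returned (500, 350) for the same input, which is the kind of error the fix above describes.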

Documentation

- documented the window-crop-first rule and fallback behavior in SKILL.md, desktop-agent-ops.md, and references/workflow.md
- clarified that screenshot-driven flows should decode image coordinates outside the model and only fall back to whole-screen captures when window capture is unavailable

🤖 Generated with Claude Code

03 Apr 01:09

appergb

v1.2.1

598d57f

v1.2.1 — MCP-First + Three-Layer Smart Targeting

What's New

Tool Priority Decision Flow (MCP-First)

Priority 1: MCP Servers (chrome-devtools, fetch, etc.) — always prefer structured APIs
Priority 2: Native CLI / AppleScript — direct control without screen parsing
Priority 3: Desktop Agent Ops — screen recognition as last resort only
Decision checklist added to SKILL.md entry point
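
The priority order above can be sketched as a small selection function. This is only an illustration of the ordering; the `ToolTier` enum and `choose_tier` function are hypothetical names, and the actual checklist lives in SKILL.md.

```python
from enum import IntEnum


class ToolTier(IntEnum):
    """Hypothetical labels for the three priorities listed above."""
    MCP_SERVER = 1         # structured APIs such as chrome-devtools or fetch
    NATIVE_CLI = 2         # native CLI or AppleScript
    DESKTOP_AGENT_OPS = 3  # screen recognition, last resort


def choose_tier(has_mcp_server: bool, has_native_cli: bool) -> ToolTier:
    """Return the highest-priority tier that is available for a task."""
    if has_mcp_server:
        return ToolTier.MCP_SERVER
    if has_native_cli:
        return ToolTier.NATIVE_CLI
    return ToolTier.DESKTOP_AGENT_OPS


print(choose_tier(has_mcp_server=False, has_native_cli=True).name)  # NATIVE_CLI
```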

Three-Layer Smart Targeting

| Layer | Method | Latency | When used |
|-------|--------|---------|-----------|
| 1 | Accessibility API (AXUIElement) | ~34 ms | Native apps (Finder, Safari, Notes) |
| 2 | Vision Framework OCR | ~147 ms | Apps that hide their UI (WeChat, QQ, Electron) |
| 3 | Tesseract OCR | ~2187 ms | Linux/Windows fallback |
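
One way to picture the three layers is as an ordered provider chain that degrades to the next layer when a faster one fails or finds nothing. The sketch below uses stub providers; `resolve_target` and the stub functions are hypothetical and do not reflect the real interface of target_resolver.py.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Provider = Callable[[str], Optional[Point]]


def resolve_target(
    label: str, providers: Sequence[Tuple[str, Provider]]
) -> Optional[Tuple[str, Point]]:
    """Try each provider in order and return the first hit with its name."""
    for name, provider in providers:
        try:
            point = provider(label)
        except Exception:
            # A provider that is unavailable or crashes degrades to the next layer.
            continue
        if point is not None:
            return name, point
    return None


# Stub providers standing in for the real backends (assumptions for the demo).
def ax_stub(label: str) -> Optional[Point]:
    return None  # e.g. the app exposes no accessibility tree


def vision_stub(label: str) -> Optional[Point]:
    return (120, 340) if label == "Send" else None


def tesseract_stub(label: str) -> Optional[Point]:
    raise RuntimeError("tesseract not installed")


chain = [("ax", ax_stub), ("vision", vision_stub), ("tesseract", tesseract_stub)]
print(resolve_target("Send", chain))     # ('vision', (120, 340))
print(resolve_target("Missing", chain))  # None
```

Ordering the chain from fastest to slowest means the slow OCR path is only paid for when the cheaper layers cannot locate the target.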

New Files

ax_provider.py — macOS Accessibility API provider
vision_ocr.py — macOS Vision Framework OCR (no Tesseract needed)

Key Changes

target_resolver.py — accessibility-first provider chain with auto-degradation
ocr_text.py — multi-backend (--backend auto|vision|tesseract)
first_run_setup.py — macOS installs pyobjc; Tesseract now optional
Tesseract removed from mandatory brew install on macOS
50/50 tests passing

03 Apr 00:57

appergb

v1.2.0

3b0f323

v1.2.0 — Three-Layer Smart Targeting

Three-layer smart targeting: Accessibility API (34ms) → Vision OCR (147ms) → Tesseract (2187ms). macOS no longer requires Tesseract installation. New: ax_provider.py, vision_ocr.py. See CHANGELOG.md for details.

02 Apr 11:04

appergb

v1.1.0

5c2a301

v1.1.0 — Custom Workflows + OCR Ambiguity Fix

Release Notes — v1.1.0 (2026-04-02)

New Features

Custom Workflow System — Define reusable multi-step desktop automations in Markdown + YAML frontmatter
- workflow_loader.py: Discover and parse workflows from bundled and user directories
- workflow_runner.py: Execute workflows with parameter substitution, retry logic, and task context
- preview command for Agent safety review before execution (no hardcoded whitelist)
- 3 bundled example workflows: send-chat-message, browser-search, open-app-and-click
Secret Scanner — Pre-upload security scanning (secret_scanner.py)
- 13 regex patterns: AWS keys, GitHub tokens, API keys, private keys, connection strings, etc.
- Shannon entropy detection for unknown secret formats
- Severity levels: error (blocks upload) / warning (skippable with --force)
Workflow Sharing — Contribute workflows to community via GitHub PR (workflow_share.py)
- Automated preflight: format validation + secret scan + gh auth check
- One-command fork → branch → commit → PR creation
- PR body auto-generated with workflow metadata and scan results
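
The Shannon entropy check mentioned for the secret scanner can be sketched as follows. The length and entropy thresholds are illustrative assumptions, not values taken from secret_scanner.py, and `looks_like_secret` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long tokens whose characters look close to random.

    min_length and threshold are assumed values chosen for this example.
    """
    return len(token) >= min_length and shannon_entropy(token) > threshold


print(shannon_entropy("abcd"))                      # 2.0
print(looks_like_secret("aaaaaaaaaaaaaaaaaaaaaaaa"))  # False
```

Entropy complements the regex patterns: a pattern catches known formats, while a high-entropy string can be flagged even when its format is unknown.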

Fixes

OCR ambiguity guard: the Example 3 send-button lookup now uses --region-label primary_action to prevent a false positive when the message text itself contains "发送" ("Send")
Removed vague "OR" fallback: input field targeting no longer offers "click at bottom center" as an alternative; window_regions.py --label bottom_input is now mandatory
Reference doc trigger rules: changed from "Load as needed" to explicit MUST-read conditions for the platform, chat-app, WeChat, validation, and targeting docs
Post-type verification: added a screenshot verification step after typing in Example 3

Documentation

Added skill/references/custom-workflows.md workflow authoring guide
Updated SKILL.md with Custom Workflows section and Agent Safety Review Protocol
Updated README with workflow system documentation
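
As a rough illustration of the "Markdown + YAML frontmatter" layout with parameter substitution, the sketch below splits a workflow file into metadata and body and fills in parameters. The field names, the `$name` placeholder syntax, and `split_frontmatter` are all assumptions; the real format is defined in skill/references/custom-workflows.md, and a real loader would use a proper YAML parser rather than this flat `key: value` reader.

```python
from string import Template
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a document into flat frontmatter fields and the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


# Hypothetical workflow file; only the name comes from the bundled examples.
EXAMPLE = """---
name: send-chat-message
description: Send a message in a chat app
---
1. Focus $app
2. Type $message
"""

meta, body = split_frontmatter(EXAMPLE)
rendered = Template(body).substitute(app="WeChat", message="hello")
print(meta["name"])  # send-chat-message
print(rendered)
```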

Install

Download desktop-agent-ops-v1.1.0.zip and follow the setup instructions in SKILL.md.

SHA-256 checksum available in desktop-agent-ops-v1.1.0.sha256.

25 Mar 04:57

appergb

v1.0.3

d302d2c

v1.0.3 — Performance & Reliability

Summary

Major reliability and performance release. Fixes CJK text input, Enter-to-send, minimized window restoration, and 10+ other bugs. End-to-end WeChat message sending now works reliably and is 7.6x faster (0.59s vs 4.49s).

Highlights

Clipboard-first input on all platforms, because cliclick silently dropped CJK characters
AppleScript key code as the primary key-press path, because WeChat did not recognize cliclick kp:return
Minimized window restoration via a Dock click
7.6x faster end-to-end: focus 0.29s + type 0.17s + send 0.13s = 0.59s total
8 new example cases (Case 12–19): right-click, drag-and-drop, system settings, form fill, dropdown, toggle/slider, cross-app copy-paste, browser tabs
12 bug fixes across scroll, screenshot, pixel-color, window bounds, drag, hotkey, and more
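
The headline speedup can be checked directly from the per-step timings quoted above. This is just the arithmetic from the notes written out; the dictionary keys are illustrative labels.

```python
# Per-step timings from the v1.0.3 notes, in seconds.
steps = {"focus": 0.29, "type": 0.17, "send": 0.13}
previous_total = 4.49  # end-to-end time before this release, in seconds

total = sum(steps.values())
speedup = previous_total / total

print(f"total: {total:.2f}s, speedup: {speedup:.1f}x")  # total: 0.59s, speedup: 7.6x
```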

Install via ClawHub

npx clawhub@latest install desktop-agent-ops

See full details in release-notes-v1.0.3.md and CHANGELOG.md.

23 Mar 07:19

appergb

v1.0.0

19d32f9

v1.0.0 — Desktop Agent Ops

Desktop Agent Ops v1.0.0

Cross-platform desktop GUI automation skill for AI agents.

Highlights

One-command setup: python3 scripts/first_run_setup.py handles everything
Window-scoped OCR: Targets only the active app window, never the wrong app
Auto DPI scaling: Retina, HiDPI, all resolutions handled automatically
Multi-language OCR: Auto-detects system language (中文, 日本語, 한국어, etc.)
CJK text input: Reliable Unicode input via clipboard-paste on all platforms
17 desktop commands: screenshot, click, type, scroll, drag, hotkey, focus-app...

Platforms

macOS (Retina supported)
Windows (HiDPI supported)
Linux X11 (HiDPI supported)

Installation

Download desktop-agent-ops-skill-clean.zip and extract to your skill directory. The agent will auto-setup on first use.

Releases: appergb/desktop-agent-ops
