Releases: appergb/desktop-agent-ops
desktop-agent-ops v1.4.1
v1.4.1 (2026-04-07)
Features
- Adaptive Window-Crop Executor Entry
  - `local_agent.py` `screenshot` tool now accepts `app` and `region_label`
  - executor now prefers app-window crops before falling back to full-screen captures
  - added explicit `image_space` metadata so cropped image coordinates can be decoded outside the model
  - added `translate_image_point` tool so screenshot-driven flows can remap image-space points into screen coordinates deterministically
Fixes
- Retina / HiDPI coordinate remapping
  - added pixel-size-aware remapping in `image_space.py` instead of assuming simple offset-only translation
  - updated executor context so models see crop bounds, image pixel size, and mapping rules together
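The difference between offset-only translation and pixel-size-aware remapping can be sketched as follows. This is a minimal illustration, not the code in `image_space.py`: the `ImageSpace` fields and the function signature are assumptions made for the example, and only the name `translate_image_point` comes from the notes above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSpace:
    """Hypothetical stand-in for the image_space metadata of one crop."""
    crop_x: float       # left edge of the crop, in screen points
    crop_y: float       # top edge of the crop, in screen points
    crop_width: float   # crop width, in screen points
    crop_height: float  # crop height, in screen points
    pixel_width: int    # width of the captured image, in pixels
    pixel_height: int   # height of the captured image, in pixels


def translate_image_point(space: ImageSpace, px: float, py: float) -> tuple[float, float]:
    """Map an image pixel (px, py) to screen coordinates.

    The pixel position is scaled by the ratio of crop size to image size
    before the crop origin is added, so a 2x capture is not misread as 1x.
    """
    if space.pixel_width <= 0 or space.pixel_height <= 0:
        raise ValueError("image pixel size must be positive")
    scale_x = space.crop_width / space.pixel_width
    scale_y = space.crop_height / space.pixel_height
    return (space.crop_x + px * scale_x, space.crop_y + py * scale_y)


# A 400x300-point window at (100, 50) captured at 2x yields an 800x600 image.
retina = ImageSpace(100, 50, 400, 300, 800, 600)
print(translate_image_point(retina, 200, 100))  # (200.0, 100.0)
```

An offset-only translation would have returned (300, 150) for the same input, which lands outside the intended element on a 2x display.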
Documentation
- documented the window-crop-first rule and fallback behavior in `SKILL.md`, `desktop-agent-ops.md`, and `references/workflow.md`
- clarified that screenshot-driven flows should decode image coordinates outside the model and only fall back to whole-screen captures when window capture is unavailable
🤖 Generated with Claude Code
v1.2.1 — MCP-First + Three-Layer Smart Targeting
What's New
Tool Priority Decision Flow (MCP-First)
- Priority 1: MCP Servers (chrome-devtools, fetch, etc.) — always prefer structured APIs
- Priority 2: Native CLI / AppleScript — direct control without screen parsing
- Priority 3: Desktop Agent Ops — screen recognition as last resort only
- Decision checklist added to SKILL.md entry point
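The priority order above amounts to a short decision function. The sketch below is illustrative only; `ToolTier` and `choose_tier` are hypothetical names, and the actual checklist lives in `SKILL.md`.

```python
from enum import IntEnum


class ToolTier(IntEnum):
    """Hypothetical labels for the three priorities listed above."""
    MCP_SERVER = 1         # structured API, always preferred
    NATIVE_SCRIPT = 2      # native CLI or AppleScript
    DESKTOP_AGENT_OPS = 3  # screen recognition, last resort


def choose_tier(has_mcp_server: bool, has_native_script: bool) -> ToolTier:
    """Return the highest-priority tier that is available for the task."""
    if has_mcp_server:
        return ToolTier.MCP_SERVER
    if has_native_script:
        return ToolTier.NATIVE_SCRIPT
    return ToolTier.DESKTOP_AGENT_OPS


print(choose_tier(True, True).name)    # MCP_SERVER
print(choose_tier(False, True).name)   # NATIVE_SCRIPT
print(choose_tier(False, False).name)  # DESKTOP_AGENT_OPS
```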
Three-Layer Smart Targeting
| Layer | Method | Speed | When Used |
|---|---|---|---|
| 1 | Accessibility API (AXUIElement) | ~34ms | Native apps (Finder, Safari, Notes) |
| 2 | Vision Framework OCR | ~147ms | Apps hiding UI (WeChat, QQ, Electron) |
| 3 | Tesseract OCR | ~2187ms | Linux/Windows fallback |
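The layered lookup in the table can be pictured as an ordered provider chain that degrades to the next layer when one fails or finds nothing. The following is a sketch under assumptions: `resolve_target` and the stub providers are hypothetical and do not reflect the actual interface of `target_resolver.py`.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
Provider = Callable[[str], Optional[Point]]


def resolve_target(label: str, providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, Point]:
    """Try providers in order; return the first hit with the provider name."""
    for name, provider in providers:
        try:
            hit = provider(label)
        except Exception:
            # A provider that is unavailable or errors out counts as a miss.
            hit = None
        if hit is not None:
            return name, hit
    raise LookupError(f"no provider could locate {label!r}")


# Stub providers standing in for the three layers (hypothetical behavior).
def accessibility(label: str) -> Optional[Point]:
    raise RuntimeError("UI tree not exposed")


def vision_ocr(label: str) -> Optional[Point]:
    return (640, 480) if label == "Send" else None


def tesseract(label: str) -> Optional[Point]:
    return (0, 0)


chain = [("accessibility", accessibility), ("vision", vision_ocr), ("tesseract", tesseract)]
print(resolve_target("Send", chain))  # ('vision', (640, 480))
```

Ordering the chain from fastest to slowest means the slow OCR path is only paid for when the cheaper layers cannot answer.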
New Files
- `ax_provider.py`: macOS Accessibility API provider
- `vision_ocr.py`: macOS Vision Framework OCR (no Tesseract needed)
Key Changes
- `target_resolver.py`: accessibility-first provider chain with auto-degradation
- `ocr_text.py`: multi-backend (`--backend auto|vision|tesseract`)
- `first_run_setup.py`: macOS installs pyobjc; Tesseract now optional
- Tesseract removed from mandatory `brew install` on macOS
- 50/50 tests passing
v1.2.0 — Three-Layer Smart Targeting
Three-layer smart targeting: Accessibility API (34ms) → Vision OCR (147ms) → Tesseract (2187ms). macOS no longer requires Tesseract installation. New: ax_provider.py, vision_ocr.py. See CHANGELOG.md for details.
v1.1.0 — Custom Workflows + OCR Ambiguity Fix
Release Notes — v1.1.0 (2026-04-02)
New Features
- Custom Workflow System: Define reusable multi-step desktop automations in Markdown + YAML frontmatter
  - `workflow_loader.py`: Discover and parse workflows from bundled and user directories
  - `workflow_runner.py`: Execute workflows with parameter substitution, retry logic, and task context
  - `preview` command for Agent safety review before execution (no hardcoded whitelist)
  - 3 bundled example workflows: send-chat-message, browser-search, open-app-and-click
- Secret Scanner: Pre-upload security scanning (`secret_scanner.py`)
  - 13 regex patterns: AWS keys, GitHub tokens, API keys, private keys, connection strings, etc.
  - Shannon entropy detection for unknown secret formats
  - Severity levels: `error` (blocks upload) / `warning` (skippable with `--force`)
- Workflow Sharing: Contribute workflows to community via GitHub PR (`workflow_share.py`)
  - Automated preflight: format validation + secret scan + gh auth check
  - One-command fork → branch → commit → PR creation
  - PR body auto-generated with workflow metadata and scan results
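Shannon entropy detection, listed under the Secret Scanner above, flags strings whose characters are distributed too evenly to be ordinary text. A minimal sketch follows; the length and entropy thresholds are assumptions chosen for illustration, not the values used by `secret_scanner.py`.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    # Sum p * log2(1/p) over the distinct characters.
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long, high-entropy tokens (both thresholds are illustrative)."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold


print(shannon_entropy("aaaaaaaa"))  # 0.0
print(shannon_entropy("abcd"))      # 2.0
print(looks_like_secret("abcdefghijklmnopqrstuvwxyz012345"))  # True
print(looks_like_secret("aaaaaaaaaaaaaaaaaaaaaaaa"))          # False
```

Entropy alone produces false positives on hashes and UUIDs, which is presumably why it complements the regex patterns rather than replacing them.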
Fixes
- OCR ambiguity guard: Example 3 send-button lookup now uses `--region-label primary_action` to prevent a false positive when the message text contains "发送" ("Send")
- Removed vague "OR" fallback: Input field targeting no longer offers "click at bottom center" as an alternative; `window_regions.py --label bottom_input` is now mandatory
- Reference doc trigger rules: Changed from "Load as needed" to explicit MUST-read conditions for platform, chat-app, WeChat, validation, and targeting docs
- Added post-type screenshot verification step in Example 3
Documentation
- Added `skill/references/custom-workflows.md` workflow authoring guide
- Updated `SKILL.md` with Custom Workflows section and Agent Safety Review Protocol
- Updated README with workflow system documentation
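To make the "Markdown + YAML frontmatter" format and parameter substitution from New Features more concrete, here is a rough sketch. Everything in it is an assumption: the sample workflow text, the `$name` placeholder syntax, and the `split_frontmatter` helper are hypothetical, and the parser handles only flat `key: value` lines rather than full YAML. The authoritative format is described in `skill/references/custom-workflows.md`.

```python
from __future__ import annotations

from string import Template

# Hypothetical workflow file: frontmatter between '---' lines, then Markdown steps.
WORKFLOW = """---
name: send-chat-message
description: Send a message in a chat app
---
1. Focus $app
2. Type "$message" and press Enter
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into flat `key: value` metadata and the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


meta, body = split_frontmatter(WORKFLOW)
# substitute() raises KeyError if a parameter is missing, so gaps fail loudly.
steps = Template(body).substitute(app="WeChat", message="hello")
print(meta["name"])  # send-chat-message
print(steps)
```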
Install
Download desktop-agent-ops-v1.1.0.zip and follow the setup instructions in SKILL.md.
SHA-256 checksum available in desktop-agent-ops-v1.1.0.sha256.
v1.0.3 — Performance & Reliability
Summary
Major reliability and performance release. Fixes CJK text input, Enter-to-send, minimized window restoration, and 10+ other bugs. End-to-end WeChat message sending now works reliably and is 7.6x faster (0.59s vs 4.49s).
Highlights
- Clipboard-first input on all platforms: cliclick silently dropped CJK characters
- AppleScript key code as primary key press path: cliclick `kp:return` not recognized by WeChat
- Minimized window restoration: Dock click approach for minimized windows
- 7.6x faster end-to-end: focus 0.29s + type 0.17s + send 0.13s = 0.59s total
- 8 new example cases (Case 12–19): right-click, drag-and-drop, system settings, form fill, dropdown, toggle/slider, cross-app copy-paste, browser tabs
- 12 bug fixes across scroll, screenshot, pixel-color, window bounds, drag, hotkey, and more
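The timing figures above are internally consistent, as a quick check shows. The stage names are just labels for the three numbers quoted in the highlights.

```python
# Per-stage timings quoted above, in seconds.
stages = {"focus": 0.29, "type": 0.17, "send": 0.13}

total = round(sum(stages.values()), 2)
speedup = 4.49 / total  # previous end-to-end time divided by the new one

print(total)              # 0.59
print(round(speedup, 1))  # 7.6
```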
Install via ClawHub
`npx clawhub@latest install desktop-agent-ops`

See full details in release-notes-v1.0.3.md and CHANGELOG.md.
v1.0.0 — Desktop Agent Ops
Desktop Agent Ops v1.0.0
Cross-platform desktop GUI automation skill for AI agents.
Highlights
- One-command setup: `python3 scripts/first_run_setup.py` handles everything
- Window-scoped OCR: Targets only the active app window, never the wrong app
- Auto DPI scaling: Retina, HiDPI, all resolutions handled automatically
- Multi-language OCR: Auto-detects system language (中文, 日本語, 한국어, etc.)
- CJK text input: Reliable Unicode input via clipboard-paste on all platforms
- 17 desktop commands: screenshot, click, type, scroll, drag, hotkey, focus-app...
Platforms
- macOS (Retina supported)
- Windows (HiDPI supported)
- Linux X11 (HiDPI supported)
Installation
Download desktop-agent-ops-skill-clean.zip and extract to your skill directory. The agent will auto-setup on first use.