AI-powered screen companion for Windows. It sees your screen, hears your voice, and points at answers.
Windows companion to farzaa/clicky (macOS).
- Sees your screen — captures screenshots and sends them to Claude with vision
- Hears you — push-to-talk voice input with real-time transcription
- Speaks back — text-to-speech responses via ElevenLabs or Windows SAPI
- Points at things — animated cursor overlay that highlights UI elements Claude references
- Lives in your tray — runs quietly as a system tray app
You'll need at minimum an Anthropic API key. Optionally:
- AssemblyAI key for voice transcription
- ElevenLabs key for natural TTS
git clone https://github.com/tekram/clicky-windows.git
cd clicky-windows
npm install
npx tsc
npm run dev
Note: npm run dev does not compile TypeScript for you. You must run npx tsc before the first launch and after every source change, or run npx tsc --watch in a second terminal.
Open Settings from the tray icon and enter your API keys.
Toggle HIPAA mode in Settings to keep transcription and TTS on the device:
- Transcription: local Whisper (no audio leaves device)
- TTS: Windows SAPI (no text leaves device)
- Only the Claude API call goes external (requires BAA with Anthropic)
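The HIPAA-mode rules above can be sketched as a small provider-resolution function. This is a minimal illustration, not the project's actual code: the type names, the `hipaaMode` field, the provider identifiers, and `resolveProviders` are all hypothetical, since the real settings schema is not shown here.

```typescript
// Hypothetical names throughout: the real settings schema is not shown in this README.
type TranscriptionProvider = "assemblyai" | "openai" | "local-whisper";
type TtsProvider = "elevenlabs" | "sapi";

interface ProviderSettings {
  hipaaMode: boolean;
  transcription: TranscriptionProvider; // the user's preferred provider
  tts: TtsProvider;                     // the user's preferred provider
}

interface ResolvedProviders {
  transcription: TranscriptionProvider;
  tts: TtsProvider;
}

// With HIPAA mode on, cloud preferences are ignored and local providers are forced.
// With it off, the user's preferences pass through unchanged.
function resolveProviders(settings: ProviderSettings): ResolvedProviders {
  if (settings.hipaaMode) {
    return { transcription: "local-whisper", tts: "sapi" };
  }
  return { transcription: settings.transcription, tts: settings.tts };
}
```

Resolving providers in one place, rather than checking the flag inside each service, would make it harder for a new integration to bypass the local-only rule by accident.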
src/
├── main/ # Electron main process
│ ├── index.ts # App entry, window creation
│ ├── companion.ts # Central orchestrator (voice → screen → claude → tts → overlay)
│ ├── screenshot.ts # Screen capture via desktopCapturer
│ ├── hotkey.ts # Global push-to-talk hotkey
│ ├── audio.ts # Audio capture coordination
│ ├── tray.ts # System tray setup
│ └── settings.ts # Persistent settings (electron-store)
├── services/ # External service integrations
│ ├── claude.ts # Anthropic Claude API (vision + chat)
│ ├── transcription/ # Pluggable: AssemblyAI, OpenAI, local Whisper
│ └── tts/ # Pluggable: ElevenLabs, Windows SAPI
├── preload/ # Context bridge for renderer
└── renderer/ # UI
    ├── overlay/ # Transparent click-through window with cursor animation
    └── settings/ # Settings panel
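The orchestration flow noted for companion.ts (voice → screen → claude → tts → overlay) might look roughly like the sketch below. It assumes each stage is injected as a function. The `CompanionDeps` interface, its method names, and `handleUtterance` are hypothetical and are not taken from the actual source.

```typescript
// Hypothetical interface: each stage of the pipeline is injected so it can be swapped or stubbed.
interface CompanionDeps {
  transcribe(audio: Uint8Array): Promise<string>;                      // voice -> text
  captureScreen(): Promise<Uint8Array>;                                // screenshot bytes
  askClaude(prompt: string, screenshot: Uint8Array): Promise<string>;  // vision + chat
  speak(text: string): Promise<void>;                                  // text-to-speech
  showOverlay(reply: string): void;                                    // cursor overlay
}

// Runs one push-to-talk turn: voice -> screen -> claude -> tts -> overlay.
async function handleUtterance(audio: Uint8Array, deps: CompanionDeps): Promise<string> {
  const prompt = await deps.transcribe(audio);
  const screenshot = await deps.captureScreen();
  const reply = await deps.askClaude(prompt, screenshot);
  // Start speech first, then show the overlay while audio plays.
  const speaking = deps.speak(reply);
  deps.showOverlay(reply);
  await speaking;
  return reply;
}
```

Passing the stages in as dependencies mirrors the pluggable services/ layout and lets the flow be exercised with stubs, without Electron or network access.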
This is a Windows-native reimplementation of farzaa/clicky. The macOS version uses Swift/SwiftUI with ScreenCaptureKit — APIs that don't exist on Windows. This version uses Electron + TypeScript to provide the same experience on Windows.
Shared with the macOS version: the POINT tag protocol, conversation flow, and proxy worker architecture. Everything else (language, framework, system APIs) is different.
PRs welcome. See docs/plans/ for what's in progress.
MIT