recognize

Talk to your Mac and it listens. A fast, local speech recognition CLI for macOS, powered by whisper.cpp with CoreML and Metal acceleration on Apple Silicon.

No cloud. No API keys. No latency. Just speak.

Voice Mode for Claude Code

The flagship feature: talk to Claude Code instead of typing. Say /r in Claude Code, speak your request, and the transcript becomes Claude's input automatically.

You: /r
> Speak now... (recording)
> (you speak for a few seconds, then pause)
> (auto-stops after 5s of silence)
Claude: I'll help you with that. Let me...

One command. No manual stop needed. recognize listens, auto-stops when you're done talking, and feeds the transcript straight to Claude.

Command	Alias	What it does
`/recognize`	`/r`	Speak, auto-stop on silence, Claude responds
`/recognize c`	`/rc`	Continuous recording until you manually stop
`/recognize m`	`/r m`	Meeting mode with AI-powered summary
`/recognize p`	`/rp`	Push-to-talk (hold space to speak)
`/recognize-stop`	`/rs`	Stop a continuous/meeting session
`/recognize-history`	`/rh`	Search past transcriptions

Setup: Install the recognize plugin for Claude Code, or copy command files to ~/.claude/commands/ and scripts to ~/.recognize/. See Installation below.

What You Can Do

Real-time transcription

Speak into your mic and see text appear in real-time. Works with 99+ languages.

recognize -m base.en                              # English, good speed/accuracy
recognize -m medium --output-mode bilingual -l zh  # Chinese with English translation
recognize -m base.en --step 0 --length 30000       # VAD mode (recommended)

Meeting transcription with AI summary

Record a meeting, get structured notes with action items, speaker labels, and decisions - organized by Claude.

recognize --meeting                        # Start recording, Ctrl-C when done
recognize --meeting --name "team-standup"   # Named output file
recognize --meeting --tinydiarize -m small.en-tdrz  # With speaker tracking

When you stop recording, the transcript is sent to Claude CLI which produces a structured summary with meeting type classification, action items, key decisions, and follow-up drafts. Long meetings (>20k words) use multi-pass summarization. Falls back to raw transcript if Claude CLI isn't available.

Subtitle generation

Generate professional subtitles for video content.

recognize -m base.en --export --export-format srt   # SRT for video players
recognize -m base.en --export --export-format vtt   # WebVTT for web

Scripting and piping

Clean stdout output with no ANSI codes when piped - ideal for building speech-powered workflows.

recognize --no-export --no-timestamps 2>/dev/null | my-processor
recognize --no-export --no-timestamps > transcript.txt 2>/dev/null &

Auto-copy to clipboard

Transcription is automatically copied to your clipboard when a session ends.

recognize -m base.en --auto-copy

Installation

One-command install (downloads pre-built binary + Claude Code plugin):

curl -sSL https://raw.githubusercontent.com/anthropic-xi/recogniz.ing/main/src/cli/install.sh | sh

Or with Homebrew:

brew tap recognizing/tap && brew install recognize

Or build from source:

make install-deps && make build && make install

Requirements: macOS 12.0+. Models download automatically on first use.

Available Models

Model	Size	Speed	Languages	Best for
`tiny.en`	39 MB	Fastest	English	Quick drafts, voice commands
`base.en`	148 MB	Fast	English	Daily use, good accuracy
`small.en`	488 MB	Medium	English	Higher accuracy
`medium.en`	1.5 GB	Slower	English	High accuracy
`large-v3`	3.1 GB	Slowest	99 languages	Maximum accuracy
`large-v3-turbo`	1.5 GB	Fast	99 languages	Best accuracy/speed ratio

For multilingual or bilingual output, use models without the .en suffix (base, medium, large-v3).

make list-models       # See all available models
make list-downloaded   # See what you have installed

Build Commands

make build             # Full build (sync upstream + configure + compile)
make rebuild           # Quick rebuild (skips configure and upstream sync)
make fresh             # Clean + full build
make clean             # Remove build artifacts
make test              # Smoke test (--help check)
make install           # Install to /usr/local/bin
make install-user      # Install to ~/bin
make sync-upstream     # Pull latest whisper.cpp (auto-runs on build)

make build automatically pulls the latest whisper.cpp from upstream before compiling. Use make rebuild for fast iteration without the sync.

Configuration

Settings are layered: CLI args > env vars > project config > user config.

# Set defaults
recognize config set model base.en
recognize config set auto_copy_enabled true
recognize config set silence_timeout 5

# Or use environment variables
export WHISPER_MODEL=base.en
export WHISPER_SILENCE_TIMEOUT=5

# View current config
recognize config list

Config files live at ~/.recognize/config.json (user) and .whisper-config.json (project). See the full configuration reference below.

Multi-Language Support

Three output modes for seamless translation:

original - Transcribe in the spoken language (default)
english - Translate everything to English
bilingual - Show both original and English side by side

recognize -m medium --output-mode bilingual -l zh   # Chinese + English
recognize -m medium --output-mode english -l ja      # Japanese -> English
recognize -m medium --output-mode original -l es     # Spanish as-is

Bilingual mode runs two inference passes (~2x processing time). Use medium, large-v3, or large-v3-turbo for best translation quality.

Export Formats

Export transcriptions to TXT, Markdown, JSON, CSV, SRT, VTT, or XML.

recognize -m base.en --export --export-format md --export-file notes.md
recognize -m base.en --export --export-format json --export-include-confidence
recognize -m base.en --export --export-format srt --export-file subtitles.srt

Configure default export behavior:

recognize config set export_enabled true
recognize config set export_format json

Performance Tips

CoreML is enabled by default - best performance on Apple Silicon
--coreml-gpu-only skips ANE compilation for instant startup (~3x slower inference, but no 5-min cold start with large models)
VAD mode (--step 0) is the most efficient for real-time use
Thread count auto-tunes: 4 with CoreML (decoder-only), up to 8 without
large-v3-turbo gets near large-v3 accuracy at ~40% faster speed
tiny.en for the fastest possible processing when accuracy is less critical

Speaker Segmentation

Track who's speaking with automatic [Speaker N] labels:

recognize -m small.en-tdrz --tinydiarize
recognize -m small.en-tdrz --tinydiarize --step 0 --length 30000  # VAD mode

Currently English-only, requires models with tdrz suffix.

Command Line Reference

Basic Options

Flag	Description	Default
`-m, --model`	Model name or file path	(interactive)
`-l, --language`	Source language	`en`
`-t, --threads`	Processing threads	4
`-v, --version`	Show version
`-h, --help`	Show help

Audio Options

Flag	Description	Default
`-c, --capture`	Audio capture device ID	-1 (default)
`--step`	Audio step size in ms (0 for VAD)	3000
`--length`	Audio length in ms	10000
`--keep`	Audio kept from previous step in ms	200

Processing Options

Flag	Description	Default
`-tr, --translate`	Translate to English	off
`-vth, --vad-thold`	VAD threshold	0.6
`-fth, --freq-thold`	High-pass frequency cutoff	100.0
`-bs, --beam-size`	Beam search size	-1
`-mt, --max-tokens`	Max tokens per chunk	32
`-fa, --flash-attn`	Flash attention	on

Output Options

Flag	Description	Default
`-f, --file`	Output to file
`-om, --output-mode`	`original`, `english`, or `bilingual`	`original`
`-sa, --save-audio`	Save recorded audio to WAV	off
`--no-timestamps`	Disable timestamps	off
`-ps, --print-special`	Print special tokens	off

CoreML Options

Flag	Description	Default
`--coreml`	Enable CoreML acceleration	on
`--no-coreml`	Disable CoreML
`--coreml-gpu-only`	Skip ANE (fast startup, ~3x slower inference)	off
`-cm, --coreml-model`	Specific CoreML model path

Accuracy Options

Flag	Description	Default
`--initial-prompt`	Whisper conditioning prompt
`--suppress-regex`	Regex pattern to suppress
`--vad-model`	Silero VAD model path or `auto`
`--silence-timeout N`	Auto-stop after N seconds of silence	0 (off)

Export Options

Flag	Description	Default
`--export`	Enable export on session end	off
`--export-format`	`txt`, `md`, `json`, `csv`, `srt`, `vtt`, `xml`	`txt`
`--export-file`	Output file path	auto
`--export-no-metadata`	Exclude metadata	off
`--export-no-timestamps`	Exclude timestamps	off
`--export-include-confidence`	Include confidence scores	off

Auto-Copy Options

Flag	Description	Default
`--auto-copy`	Copy transcript to clipboard on exit	off
`--auto-copy-max-duration N`	Max session hours before skip	2
`--auto-copy-max-size N`	Max transcript bytes before skip	1MB

Meeting Options

Flag	Description	Default
`--meeting`	Enable meeting mode with AI summary	off
`--no-meeting`	Disable meeting mode
`--prompt`	Custom prompt text or file
`--name`	Output filename prefix
`--meeting-timeout N`	Claude CLI timeout in seconds	120
`--meeting-max-single-pass N`	Words before multi-pass	20000

Speaker Segmentation Options

Flag	Description	Default
`-tdrz, --tinydiarize`	Enable speaker segmentation	off

Push-to-Talk Options

Flag	Description	Default
`--ptt`	Enable push-to-talk mode	off
`--ptt-key KEY`	PTT key: `space`, `right_option`, `right_ctrl`, `fn`, `f13`	`space`
`--ptt-pre-roll N`	Pre-roll audio in ms before key press	300

Model Management

Flag	Description
`--list-models`	List available models
`--list-downloaded`	Show downloaded models with sizes
`--show-storage`	Storage usage breakdown
`--delete-model MODEL`	Delete a specific model
`--delete-all-models`	Delete all models
`--cleanup`	Remove orphaned files

History

Command	Description
`history list [--limit N]`	Recent transcriptions
`history search "<query>"`	Full-text search
`history show <id>`	Full transcript by ID
`history count`	Total entries
`history clear [--older-than Nd]`	Delete entries

Configuration Reference

Config Commands

recognize config list                    # Show all settings
recognize config set model base.en       # Set a value
recognize config get model               # Get a value
recognize config unset model             # Remove a value
recognize config reset                   # Reset to defaults

Environment Variables

All settings support WHISPER_ prefixed env vars:

WHISPER_MODEL=base.en
WHISPER_LANGUAGE=en
WHISPER_THREADS=8
WHISPER_COREML=true
WHISPER_SILENCE_TIMEOUT=5
WHISPER_AUTO_COPY=true
WHISPER_MEETING=true
WHISPER_MEETING_NAME=standup

Config File (`~/.recognize/config.json`)

{
  "default_model": "base.en",
  "threads": 8,
  "use_coreml": true,
  "language": "en",
  "vad_threshold": 0.6,
  "step_ms": 3000,
  "length_ms": 10000,
  "auto_copy_enabled": true,
  "silence_timeout": 0,
  "meeting_mode": false,
  "meeting_timeout": 120,
  "meeting_max_single_pass": 20000
}

Troubleshooting

Build fails: Ensure SDL2 (brew install sdl2) and CMake (brew install cmake) are installed. Try make fresh for a clean build.

No audio: Check microphone permissions in System Settings > Privacy & Security > Microphone. Try -c to select a different audio device.

Poor accuracy: Use a larger model (base.en -> small.en), try VAD mode (--step 0), or adjust VAD threshold (-vth).

CoreML issues: Runs fine without CoreML (auto-fallback). Force disable with --no-coreml if needed. If startup is slow with large models (ANE compilation), use --coreml-gpu-only for instant startup at the cost of ~3x slower inference.

Documentation

TUTORIAL.md - Comprehensive usage guide
make help - All Makefile targets
recognize --help - Full CLI options

License

MIT License - see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
.github/workflows		.github/workflows
docs/plans		docs/plans
homebrew		homebrew
plugin		plugin
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CMakeLists.txt		CMakeLists.txt
INSTALL.md		INSTALL.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
TUTORIAL.md		TUTORIAL.md
audio_processor.cpp		audio_processor.cpp
audio_processor.h		audio_processor.h
cli_parser.cpp		cli_parser.cpp
cli_parser.h		cli_parser.h
config_manager.cpp		config_manager.cpp
config_manager.h		config_manager.h
export_manager.cpp		export_manager.cpp
export_manager.h		export_manager.h
history_manager.cpp		history_manager.cpp
history_manager.h		history_manager.h
install.sh		install.sh
meeting_manager.cpp		meeting_manager.cpp
meeting_manager.h		meeting_manager.h
model_manager.cpp		model_manager.cpp
model_manager.h		model_manager.h
ptt_manager.cpp		ptt_manager.cpp
ptt_manager.h		ptt_manager.h
recognize.cpp		recognize.cpp
session_types.h		session_types.h
text_processing.cpp		text_processing.cpp
text_processing.h		text_processing.h
whisper_params.h		whisper_params.h

Folders and files

Latest commit

History

Repository files navigation

recognize

Voice Mode for Claude Code

What You Can Do

Real-time transcription

Meeting transcription with AI summary

Subtitle generation

Scripting and piping

Auto-copy to clipboard

Installation

Available Models

Build Commands

Configuration

Multi-Language Support

Export Formats

Performance Tips

Speaker Segmentation

Command Line Reference

Basic Options

Audio Options

Processing Options

Output Options

CoreML Options

Accuracy Options

Export Options

Auto-Copy Options

Meeting Options

Speaker Segmentation Options

Push-to-Talk Options

Model Management

History

Configuration Reference

Config Commands

Environment Variables

Config File (~/.recognize/config.json)

Troubleshooting

Documentation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Config File (`~/.recognize/config.json`)

Packages