Whisper Dictation

A lightweight macOS dictation tool that uses Whisper to transcribe your voice and type the result into any app. On Apple Silicon it uses the MLX Whisper backend by default, with OpenAI Whisper kept as a fallback. Runs silently as a menu bar app — no terminal needed.

Hold fn or middle mouse button to record
Release to transcribe and type
Menu bar icon shows status: 🎤 idle · 🔴 recording · ⏳ transcribing · 📞 external transcription running

Requirements

macOS (Apple Silicon recommended)
Python 3.10+
Microsoft Word (for PDF export, optional)

Setup

1. Clone the repo

git clone https://github.com/matthumble/whisper-dictation.git
cd whisper-dictation

2. Prepare the transcription model

The tool uses the small model by default.

On Apple Silicon, the default MLX backend will download the converted model automatically on first run.

The OpenAI Whisper fallback still uses a local model directory:

mkdir -p ~/whisper-models

To pre-download the fallback OpenAI Whisper model manually:

python3 -c "import whisper; whisper.load_model('small', download_root='<path-to-your-models-folder>')"

Update WHISPER_MODEL_DIR in dictation.py to point to your models folder:

WHISPER_MODEL_DIR = Path.home() / "whisper-models"

3. Create a virtual environment and install dependencies

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

4. Test it manually first

.venv/bin/python3 dictation.py

You should see 🎤 appear in your menu bar. If you get an accessibility warning, follow the permissions step below.

5. Grant Accessibility permission

macOS requires Accessibility access for the tool to read keyboard and mouse input.

Go to System Settings → Privacy & Security → Accessibility and add:

<path-to-your-project>/.venv/bin/python3.x

You may also be prompted for Microphone access the first time you record.

6. Install as a launchd service (auto-start on login)

Copy the example plist, update the paths inside it, then load it:

cp com.example.dictation.plist ~/Library/LaunchAgents/com.yourname.dictation.plist

Edit the plist to point to your Python binary and dictation.py path, then:

launchctl load ~/Library/LaunchAgents/com.yourname.dictation.plist

To restart after changes:

launchctl kickstart -k gui/$(id -u)/com.yourname.dictation

The Restart Dictation menu item uses the same launchctl kickstart command. By default it targets the label com.example.dictation. If you used a different label in your plist, set LAUNCH_JOB_LABEL in the plist's EnvironmentVariables so the restart action targets the right job:

<key>EnvironmentVariables</key>
<dict>
    <key>PATH</key>
    <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
    <key>LAUNCH_JOB_LABEL</key>
    <string>com.yourname.dictation</string>
</dict>

Configuration

All settings are at the top of dictation.py:

Setting	Default	Description
`TRANSCRIPTION_BACKEND`	`mlx`	Backend: `mlx` on Apple Silicon, with automatic fallback to OpenAI Whisper if MLX fails
`MLX_MODEL_REPO`	`mlx-community/whisper-small-mlx`	Hugging Face repo for the converted MLX Whisper model
`WHISPER_MODEL_DIR`	`~/whisper-models`	Path to Whisper model files
`MODEL_SIZE`	`small`	Whisper model: `tiny`, `base`, `small`, `medium`, `large`, `turbo`
`MIN_DURATION_SEC`	`0.5`	Ignore clips shorter than this (seconds)
`EXTERNAL_TRANSCRIPTION_PATTERNS`	`whisper`, `macwhisper`, `transcribe`, etc.	Process-name matches that make dictation show 📞 and ignore new recording starts

When 📞 appears, dictation is intentionally unavailable because another Whisper-style transcription process appears to be running. This avoids competing with your larger call-transcription model for local compute.

Changing the hotkey

By default the tool uses fn and the physical middle mouse button. Middle-click is captured with Quartz and swallowed before it reaches the focused app, so it should not move the text cursor while starting dictation.

For an MX Master mouse, map the wheel click to the normal middle button action. Do not map it to a keyboard shortcut unless you also change the trigger code.

The fn key is hardcoded via Quartz. To disable it entirely, remove the fn handling from _quartz_callback.

Stopping the tool

Click 🎤 in the menu bar and select Quit Dictation.

If dictation stops responding but the menu bar icon is still present, use Restart Dictation from the same menu to reload the background process without opening Terminal.

If running as a launchd service it will restart automatically (by design). To stop it permanently:

launchctl unload ~/Library/LaunchAgents/com.yourname.dictation.plist

Troubleshooting

No menu bar icon appears

Check dictation.log for errors
Verify Accessibility permission is granted

fn key not working

Make sure Accessibility permission is granted for the correct Python binary
Some keyboards remap fn — try reassigning to a different trigger
If several fn presses start and stop immediately with no captured audio, check dictation.log for repeated No audio captured, fallback input selection, or CoreAudio Invalid Property Value messages. On April 25, 2026 this cleared after launchd restarted the app and it returned to the MacBook mic. If it recurs, consider adding an automatic restart after several empty recordings in a short window.

Transcription is slow

Switch to tiny or base model for faster (less accurate) results
The first transcription after startup is slower due to model warmup
The first MLX run may also spend extra time downloading the converted model from Hugging Face

Text pastes in wrong app

Release the key slowly — there is a small delay before paste fires

Python keeps appearing in the Dock

This happens when macOS treats the host interpreter as a normal foreground app instead of a menu bar accessory
The current dictation.py overrides rumps startup to force accessory mode so only the menu bar icon stays visible

Quit from the menu bar just reopens

If your LaunchAgent plist uses <key>KeepAlive</key><true/>, launchd will restart the app after it exits
Remove KeepAlive if you want Quit to stop the app until the next manual launch or login

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.gitignore		.gitignore
README.md		README.md
com.example.dictation.plist		com.example.dictation.plist
dictation.py		dictation.py
dictation_audio.py		dictation_audio.py
dictation_config.py		dictation_config.py
dictation_hotkey.py		dictation_hotkey.py
dictation_logging.py		dictation_logging.py
dictation_media.py		dictation_media.py
dictation_menu.py		dictation_menu.py
dictation_output.py		dictation_output.py
dictation_prefs.py		dictation_prefs.py
dictation_restart.py		dictation_restart.py
dictation_state.py		dictation_state.py
dictation_transcribe.py		dictation_transcribe.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Whisper Dictation

Requirements

Setup

1. Clone the repo

2. Prepare the transcription model

3. Create a virtual environment and install dependencies

4. Test it manually first

5. Grant Accessibility permission

6. Install as a launchd service (auto-start on login)

Configuration

Changing the hotkey

Stopping the tool

Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Whisper Dictation

Requirements

Setup

1. Clone the repo

2. Prepare the transcription model

3. Create a virtual environment and install dependencies

4. Test it manually first

5. Grant Accessibility permission

6. Install as a launchd service (auto-start on login)

Configuration

Changing the hotkey

Stopping the tool

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages