k161196/audio-flow

Audio Flow

A lightweight audio transcription application for Hyprland that lets you quickly record and transcribe audio with a beautiful floating pill UI.

Features

  • 🎙️ Quick Voice Recording - Click to start/stop recording
  • 🤖 AI Transcription - Automatic transcription using Whisper.cpp
  • 📋 Auto-copy & Auto-paste - Transcribed text automatically copied to clipboard and pasted to your focused window
  • 🎯 Smart Paste Detection - Automatically detects terminals vs apps and uses correct paste shortcut
  • Beautiful UI - Animated wave visualization while recording
  • Fast - Toggle visibility with a global hotkey
  • 🪟 Floating Window - Always accessible via Hyprland special workspace

Prerequisites

Required

  • Hyprland - Wayland compositor
  • PipeWire - Audio recording (pw-record command)
  • Whisper.cpp - Speech-to-text model
    • Binary: $HOME/open-source-projects/whisper.cpp/build/bin/whisper-cli
    • Model: $HOME/open-source-projects/whisper.cpp/models/ggml-base.en.bin
  • wtype - Keyboard input simulation for auto-paste (recommended; only needed when auto_paste is enabled)
    • Arch: sudo pacman -S wtype
    • Debian/Ubuntu: sudo apt install wtype

Build Dependencies

  • Rust (latest stable)
  • GPUI v0.2.0 dependencies

Installation

1. Clone and Build

git clone https://github.com/k161196/audio-flow ~/projects/audio-flow
cd ~/projects/audio-flow
cargo build --release

2. Install Whisper.cpp

If you haven't already:

mkdir -p ~/open-source-projects
cd ~/open-source-projects/
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
make
bash ./models/download-ggml-model.sh base.en

3. Verify PipeWire

pw-record --help  # Should show usage information

Hyprland Configuration

Add the following to your Hyprland config (~/.config/hypr/hyprland.conf):

Window Rules

# Audio Flow - configure as floating, centered window
windowrulev2 = float, class:^(audio-flow)$
windowrulev2 = size 400 100, class:^(audio-flow)$
windowrulev2 = center, class:^(audio-flow)$
windowrulev2 = noborder, class:^(audio-flow)$
windowrulev2 = noshadow, class:^(audio-flow)$

Global Hotkey

Add a keybind to launch the audio flow application:

# Launch audio transcription window
bind = SUPER, R, exec, ~/projects/audio-flow/target/release/audio-flow

# Optional: Auto-start recording on launch
bind = SUPER_SHIFT, R, exec, ~/projects/audio-flow/target/release/audio-flow --start-recording-on-launch

Note: The app enforces single-instance mode. Pressing Super+R when an instance is already running will close the existing instance and start a new one. This ensures you always get a fresh recording session.

Command-Line Flags

  • --start-recording-on-launch - Skip the Idle state and start recording immediately when the app launches
    • Use this for faster workflow: launch → app starts recording → press key to stop and transcribe
    • Example: audio-flow --start-recording-on-launch
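
As a rough illustration of how a flag like this can be handled, the sketch below scans the arguments for --start-recording-on-launch. The names LaunchOptions and parse_args are hypothetical and are not taken from the project's source; the real application may parse its arguments differently.

```rust
/// Launch options derived from the command line.
/// `LaunchOptions` is a hypothetical name used only for this sketch.
#[derive(Debug, Default, PartialEq)]
pub struct LaunchOptions {
    pub start_recording_on_launch: bool,
}

/// Parse the arguments that follow the program name.
/// Unknown arguments are ignored in this sketch.
pub fn parse_args<I, S>(args: I) -> LaunchOptions
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut opts = LaunchOptions::default();
    for arg in args {
        if arg.as_ref() == "--start-recording-on-launch" {
            opts.start_recording_on_launch = true;
        }
    }
    opts
}
```

A caller would typically pass std::env::args().skip(1) and pick the initial UI state (Idle or Recording) from the result.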

Recommended Workflow

Fastest workflow (using Super+R for everything; this assumes Super+R is bound to the --start-recording-on-launch command shown above):

  1. Press Super+R → App launches and starts recording immediately
  2. Speak your message
  3. Press Super+R → App stops recording, transcribes, copies to clipboard, and closes automatically

This gives you a seamless experience where Super+R handles both launching and completing the transcription.

Usage

1. Launch the Application

Press Super+R to start the application. The window will appear centered on your screen.

2. Record and Transcribe

Standard Mode:

  1. Click or press Space - Start recording (you'll see animated wave bars)
  2. Click or press Space again - Stop recording and begin transcription
  3. Wait - "Transcribing..." state appears briefly
  4. Done! - Transcribed text appears and is auto-copied to clipboard

Auto-Start Mode (with --start-recording-on-launch flag):

  1. App starts recording immediately (animated wave bars visible)
  2. Click or press Space - Stop recording and begin transcription
  3. Wait - "Transcribing..." state appears briefly
  4. Done! - Transcribed text appears and is auto-copied to clipboard

3. Keyboard Shortcuts

  • Super+R - Stop recording and transcribe (when app is focused)
  • Space - Start/stop recording (same as clicking)
  • ESC - Close the window and quit immediately

4. Next Recording

Simply press Super+R again. The app will automatically close any existing instance and start a fresh recording session.

UI States

| State | Description | Visual |
|-------|-------------|--------|
| Idle | Ready to record | Blue microphone icon + "Tap to record" |
| Recording | Recording audio | Red dot + animated wave bars |
| Processing | Transcribing audio | Spinner + "Transcribing..." |
| Success | Text ready | Transcribed text + "✓ Copied to clipboard" |
| Error | Something went wrong | Red icon + error message |
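
The five states above form a small state machine. The sketch below shows one way to model the transitions in Rust. The Event type and the exact transition rules are assumptions made for illustration; the project's state.rs may be organized differently.

```rust
/// The five UI states listed in the table above.
#[derive(Debug, Clone, PartialEq)]
pub enum AppState {
    Idle,
    Recording,
    Processing,
    Success(String), // transcribed text
    Error(String),   // error message
}

/// Hypothetical events that drive the state machine.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Toggle,                    // click or Space
    TranscriptionDone(String), // transcription finished with this text
    Failed(String),            // recording or transcription failed
}

impl AppState {
    /// Return the next state; unrelated events leave the state unchanged.
    pub fn next(self, event: Event) -> AppState {
        match (self, event) {
            (AppState::Idle, Event::Toggle) => AppState::Recording,
            (AppState::Recording, Event::Toggle) => AppState::Processing,
            (AppState::Processing, Event::TranscriptionDone(text)) => AppState::Success(text),
            (AppState::Recording | AppState::Processing, Event::Failed(msg)) => {
                AppState::Error(msg)
            }
            (state, _) => state,
        }
    }
}
```

With the --start-recording-on-launch flag, the machine would simply start in Recording instead of Idle.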

Configuration

Audio Flow uses a configuration file located at ~/.config/audio-flow/config.toml. The config file is automatically created with default values on first run.

Configuration File

Edit ~/.config/audio-flow/config.toml to customize paths:

whisper_binary_path = "$HOME/open-source-projects/whisper.cpp/build/bin/whisper-cli"
whisper_model_path = "$HOME/open-source-projects/whisper.cpp/models/ggml-base.en.bin"
pipewire_binary_path = "/usr/bin/pw-record"
recording_output_dir = "$HOME/Music/recordings"
recording_filename = "temp.wav"
auto_paste = true

Configuration Options

| Option | Description | Default |
|--------|-------------|---------|
| whisper_binary_path | Path to whisper-cli binary | $HOME/open-source-projects/whisper.cpp/build/bin/whisper-cli |
| whisper_model_path | Path to Whisper model file | $HOME/open-source-projects/whisper.cpp/models/ggml-base.en.bin |
| pipewire_binary_path | Path to pw-record binary | /usr/bin/pw-record |
| recording_output_dir | Directory for audio recordings | $HOME/Music/recordings |
| recording_filename | Filename for temporary recordings | temp.wav |
| auto_paste | Automatically paste transcribed text to focused window | true |
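
To make the defaults concrete, here is a minimal sketch of a configuration struct carrying the values from the table, plus a helper that expands a leading $HOME. Both the Config layout and the expand_home helper are assumptions for illustration; the README does not say how config.rs represents or expands these paths.

```rust
use std::path::PathBuf;

/// Hypothetical in-memory form of config.toml, using the documented defaults.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub whisper_binary_path: String,
    pub whisper_model_path: String,
    pub pipewire_binary_path: String,
    pub recording_output_dir: String,
    pub recording_filename: String,
    pub auto_paste: bool,
}

impl Default for Config {
    fn default() -> Self {
        let whisper = "$HOME/open-source-projects/whisper.cpp";
        Config {
            whisper_binary_path: format!("{whisper}/build/bin/whisper-cli"),
            whisper_model_path: format!("{whisper}/models/ggml-base.en.bin"),
            pipewire_binary_path: "/usr/bin/pw-record".to_string(),
            recording_output_dir: "$HOME/Music/recordings".to_string(),
            recording_filename: "temp.wav".to_string(),
            auto_paste: true,
        }
    }
}

/// Replace a leading `$HOME` path component with `home`.
/// Paths that do not start with `$HOME` are returned unchanged.
pub fn expand_home(path: &str, home: &str) -> PathBuf {
    match path.strip_prefix("$HOME") {
        Some(rest) if rest.is_empty() || rest.starts_with('/') => {
            PathBuf::from(format!("{}{}", home.trim_end_matches('/'), rest))
        }
        _ => PathBuf::from(path),
    }
}
```

In practice the home argument would come from the HOME environment variable, for example via std::env::var("HOME").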

Example: Using a Different Whisper Model

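To use the multilingual base model instead of the English-only one (download it first with the whisper.cpp model script, passing base instead of base.en):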

whisper_model_path = "$HOME/open-source-projects/whisper.cpp/models/ggml-base.bin"

Example: Custom Recording Location

recording_output_dir = "/tmp/audio-flow-recordings"
recording_filename = "recording.wav"

Resetting Configuration

To reset to defaults, simply delete the config file:

rm ~/.config/audio-flow/config.toml

The app will recreate it with default values on next launch.

Troubleshooting

Configuration validation failed

If you see an error message about configuration validation:

Configuration validation failed: Whisper binary not found at: /path/to/whisper-cli

Check your config file at ~/.config/audio-flow/config.toml and verify all paths are correct:

# Check that the config file exists
ls -la ~/.config/audio-flow/config.toml

# Verify paths in config
cat ~/.config/audio-flow/config.toml

# Option 1: Fix the paths manually
nano ~/.config/audio-flow/config.toml

# Option 2: Reset to defaults
rm ~/.config/audio-flow/config.toml
audio-flow  # Will recreate with defaults
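
For readers curious what such a validation step might look like, the sketch below checks that the two Whisper paths point at existing files. The function name validate_paths and the wording of the model error are assumptions; only the "Whisper binary not found at:" message is taken from the example above.

```rust
use std::path::Path;

/// Check that the configured Whisper binary and model exist as files.
/// Returns a human-readable message for the first missing path.
pub fn validate_paths(whisper_binary: &Path, whisper_model: &Path) -> Result<(), String> {
    if !whisper_binary.is_file() {
        return Err(format!(
            "Whisper binary not found at: {}",
            whisper_binary.display()
        ));
    }
    if !whisper_model.is_file() {
        // Hypothetical wording, modeled on the binary message.
        return Err(format!(
            "Whisper model not found at: {}",
            whisper_model.display()
        ));
    }
    Ok(())
}
```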

Window doesn't appear

# Try launching manually
~/projects/audio-flow/target/release/audio-flow

# Check whether Hyprland sees the window
hyprctl clients | grep audio-flow

# Verify the binary exists
ls -la ~/projects/audio-flow/target/release/audio-flow

Recording fails

# Check PipeWire status
systemctl --user status pipewire

# Test recording manually
pw-record test.wav
# Press Ctrl+C after a few seconds
pw-play test.wav  # ships with PipeWire; aplay test.wav also works if alsa-utils is installed
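
When debugging recording, it can help to picture how an application might invoke pw-record. The sketch below builds such a command in Rust. The function name record_command and the 16 kHz mono settings are assumptions (16 kHz mono is the input format Whisper models work with); the README does not state which arguments the app actually passes.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a pw-record invocation that writes to `output`.
/// The sample rate and channel count are assumptions made for this sketch.
pub fn record_command(pw_record: &Path, output: &Path) -> Command {
    let mut cmd = Command::new(pw_record);
    cmd.args(["--rate", "16000"]) // 16 kHz sample rate
        .args(["--channels", "1"]) // mono
        .arg(output);
    cmd
}
```

Calling spawn() on the returned command starts the recording. How the process is stopped matters: ending it the way Ctrl+C does in the manual test above gives pw-record a chance to finish writing the file, whereas a hard kill may not.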

Transcription fails

# Check Whisper.cpp binary
~/open-source-projects/whisper.cpp/build/bin/whisper-cli --help

# Check model file
ls -lh ~/open-source-projects/whisper.cpp/models/ggml-base.en.bin
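
If both checks pass, the next thing to test is the invocation itself. The sketch below shows one plausible way to call whisper-cli from Rust and tidy its output. The function names are hypothetical, and the choice of flags (-m for the model, -f for the input file, -nt to omit timestamps) is an assumption about what the app passes, not something stated in this README.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a whisper-cli invocation for one WAV file.
pub fn transcribe_command(whisper_cli: &Path, model: &Path, wav: &Path) -> Command {
    let mut cmd = Command::new(whisper_cli);
    cmd.arg("-m").arg(model) // model file
        .arg("-f").arg(wav) // input audio
        .arg("-nt"); // no timestamps in the printed text
    cmd
}

/// Collapse whitespace and newlines in the raw output into single spaces.
pub fn clean_transcript(raw: &str) -> String {
    raw.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Running the same command by hand with a short recording is a quick way to tell whether a failure comes from whisper.cpp or from the app.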

Clipboard not working

Make sure you're running under Wayland (not X11):

echo $XDG_SESSION_TYPE  # Should output "wayland"

Auto-paste not working

Check if wtype is installed:

which wtype  # Should show /usr/bin/wtype or similar

Install wtype if missing:

# Arch Linux
sudo pacman -S wtype

# Debian/Ubuntu
sudo apt install wtype

Disable auto-paste if needed:

Edit ~/.config/audio-flow/config.toml:

auto_paste = false

Terminal paste not working:

Auto-paste detects terminals and uses Ctrl+Shift+V automatically for:

  • Ghostty, Kitty, Alacritty, WezTerm, Foot, Konsole
  • GNOME Terminal, xterm, Terminator, Tilix
  • And many others

If your terminal isn't detected, the app will use Ctrl+V (which may not work). Open an issue on GitHub with your terminal name.
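
The detection described above can be pictured as a lookup on the focused window's class (which Hyprland reports via hyprctl activewindow). The sketch below is an illustration only: the hint list, the substring matching, and the function names are assumptions, and the real paste.rs may match windows differently. The wtype arguments use its -M (press modifier) and -m (release modifier) options.

```rust
/// Which paste shortcut to send to the focused window.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PasteShortcut {
    CtrlV,
    CtrlShiftV,
}

/// Hypothetical substrings that mark a window class as a terminal.
const TERMINAL_CLASS_HINTS: &[&str] = &[
    "ghostty", "kitty", "alacritty", "wezterm", "foot", "konsole",
    "gnome-terminal", "xterm", "terminator", "tilix",
];

/// Pick Ctrl+Shift+V for known terminals and Ctrl+V for everything else.
/// Substring matching is simple but can produce false positives.
pub fn shortcut_for_class(window_class: &str) -> PasteShortcut {
    let class = window_class.to_lowercase();
    if TERMINAL_CLASS_HINTS.iter().any(|hint| class.contains(hint)) {
        PasteShortcut::CtrlShiftV
    } else {
        PasteShortcut::CtrlV
    }
}

/// Arguments for wtype that press the modifiers, type "v", then release them.
pub fn wtype_args(shortcut: PasteShortcut) -> Vec<&'static str> {
    match shortcut {
        PasteShortcut::CtrlV => vec!["-M", "ctrl", "v", "-m", "ctrl"],
        PasteShortcut::CtrlShiftV => {
            vec!["-M", "ctrl", "-M", "shift", "v", "-m", "shift", "-m", "ctrl"]
        }
    }
}
```

Under this model, an undetected terminal falls through to Ctrl+V, which matches the behavior described above.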

Development

Project Structure

audio-flow/
├── src/
│   ├── main.rs           # Application entry point
│   ├── app.rs            # Main app logic and window management
│   ├── state.rs          # State machine (Idle/Recording/Processing/Success/Error)
│   ├── ui/
│   │   ├── mod.rs        # UI module exports
│   │   └── pill.rs       # Pill UI component with animations
│   ├── audio/
│   │   ├── mod.rs        # Audio module exports
│   │   ├── recorder.rs   # PipeWire integration
│   │   └── transcriber.rs# Whisper.cpp integration
│   ├── clipboard.rs      # Wayland clipboard integration
│   ├── paste.rs          # Auto-paste with Hyprland window detection
│   ├── config.rs         # Configuration file management
│   └── notifications.rs  # Desktop notifications
├── CLAUDE.md             # Development learnings and best practices
├── README.md             # User documentation
└── Cargo.toml            # Dependencies

Building

cargo build                # Debug build
cargo build --release      # Release build (recommended)
cargo run                  # Run debug build

License

[Your License Here]
