A Python toolkit for formatting audio snippets with an agent in the loop. Built on WhisperX, it extracts word-level, forced-aligned, speaker-labeled CSVs from audio, then lets you search, format, and chunk them.
Important
Versions below 1.0.0 are considered unstable; APIs, CLI flags, and output formats may change without notice between releases. Pin an exact version if you need stability, and review the changelog before upgrading. Feedback, bug reports, and feature requests are very welcome; please open an issue.
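If you script around the toolkit, a small guard can flag when the installed release is still in the unstable pre-1.0 series. This is a minimal sketch, not part of speech-mine: the helper names `is_pre_1_0` and `installed_version` are hypothetical, and it assumes the distribution is installed under the PyPI name `speech-mine`.

```python
import re
from importlib import metadata


def is_pre_1_0(version: str) -> bool:
    """Return True when the major component of a version string is below 1."""
    match = re.match(r"\s*v?(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)) < 1


def installed_version(dist_name: str = "speech-mine"):
    """Return the installed version of dist_name, or None if it is not installed.

    The default name is an assumption based on the PyPI package name.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("speech-mine is not installed in this environment")
    elif is_pre_1_0(version):
        print(f"speech-mine {version} is pre-1.0: pin it and read the changelog before upgrading")
```

The check only looks at the major version number, which is enough for the "below 1.0.0" rule stated above.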
| Module | Description | Docs |
|---|---|---|
| extract | Transcribe audio with speaker diarization | → |
| format | Format CSV transcripts into readable scripts | → |
| chunk | Split audio into segments via YAML config | → |
| search | Fuzzy search transcripts by word or phrase | → |
# Install uv (skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository
git clone https://github.com/beckettfrey/speech-mine
cd speech-mine
# Install dependencies into a local virtualenv
uv sync

Important
A .env file (e.g. holding HF_TOKEN for the integration test suite) is intended for local development only. Never feed .env contents to an LLM, paste them into a chat, commit them, or expose them as MCP tool inputs. Tokens placed in conversation context can be logged, cached, or echoed back into tool calls.
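One way to follow this advice in a wrapper script is to read the token from the process environment at call time and to redact it before anything is printed or logged. The sketch below is illustrative only: `build_extract_command` and `redact` are hypothetical helpers, not part of speech-mine, and it assumes `HF_TOKEN` has already been exported in your shell. The flags are the ones used in the `extract` command in this README.

```python
import os
import shlex


def build_extract_command(audio_path, csv_path, num_speakers=2,
                          compute_type="float32", env=None):
    """Build the `speech-mine extract` argv, taking the token from the environment."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it in your shell first")
    return [
        "uv", "run", "speech-mine", "extract", audio_path, csv_path,
        "--hf-token", token,
        "--num-speakers", str(num_speakers),
        "--compute-type", compute_type,
    ]


def redact(command, secret_flags=("--hf-token",)):
    """Return a printable command line with the value after each secret flag masked."""
    redacted = list(command)
    for i, arg in enumerate(redacted[:-1]):
        if arg in secret_flags:
            redacted[i + 1] = "REDACTED"
    return shlex.join(redacted)


# Typical use (not executed here):
#   import subprocess
#   cmd = build_extract_command("interview.mp3", "output.csv")
#   print(redact(cmd))              # safe to log or paste
#   subprocess.run(cmd, check=True)
```

Only the redacted string should ever reach logs, chat windows, or tool inputs; the real argv stays inside the process that launches the command.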
# 1. (Optional) Chunk a long recording into segments
uv run speech-mine chunk recording.wav chunks.yaml chunks/
# 2. Extract a transcript
uv run speech-mine extract interview.mp3 output.csv \
--hf-token YOUR_TOKEN \
--num-speakers 2 \
--compute-type float32
# 3. Format into a readable script
uv run speech-mine format output.csv script.txt
# 4. Search it
uv run speech-mine search "topic of interest" output.csv --pretty
# 5. (Optional) Chunk the recording again around segments of interest
uv run speech-mine chunk recording.wav segments.yaml clips/

speech-mine includes an MCP server that exposes all tools to Claude Code and other MCP clients.
Install globally (no clone needed):
claude mcp add speech-mine --env HF_TOKEN=your_huggingface_token -- uvx --from speech-mine speech-mine-mcp

This pulls the latest published version from PyPI via uvx. After running it, restart Claude Code; the search_transcript, extract_audio, chunk_audio, and other tools will then be available in your session.
# Serve docs locally
uv run mkdocs serve

Or browse the docs/ folder directly.
MIT
