Voicebrief converts video and audio conversations to text and then condenses the transcript into a manageable summary report.
Note: this project was migrated from Poetry to uv with poetry2uv.
Clone the repository. Use the fast Python package manager uv to install all the dependencies of Voicebrief.
uv sync
Note: Before running uv sync, install platform-specific dependencies. See the System Dependencies section below for details.
Create a text file ".env" in the root of the project. It holds the "OPENAI_API_KEY" environment variable, which the application uses to authenticate against a valid OpenAI account when calling the API.
OPENAI_API_KEY=sk-A_seCR_et_key_GENERATED_foryou_by_OPENAI
OPENAI_MODEL=gpt-4o # Optional: defaults to gpt-4o if not specified
The key is loaded into the execution context of the application when it is run from the command line or in the debugger.
Alternatively, if the file is not present, then 'voicebrief' will look for the environment variable "OPENAI_API_KEY".
You can also specify a different OpenAI model by setting the OPENAI_MODEL environment variable (e.g., "gpt-4o", "gpt-4-turbo", etc.).
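The lookup order described above (the ".env" file first, then the process environment, with gpt-4o as the default model) can be sketched as follows. This is a minimal illustration and not Voicebrief's own code: the names parse_env_file and load_settings are hypothetical, and the rule that values missing from the file fall back to the environment is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-4o"  # default named in the text above


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" and surrounding quotes, if any.
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def load_settings(env_file: Path = Path(".env"),
                  environ: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (api_key, model). Hypothetical helper for illustration only."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_file(env_file) if env_file.is_file() else {}
    # Assumption: the file wins, the environment is the fallback.
    api_key = file_values.get("OPENAI_API_KEY") or environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY not found in .env or the environment")
    model = (file_values.get("OPENAI_MODEL")
             or environ.get("OPENAI_MODEL")
             or DEFAULT_MODEL)
    return api_key, model
```

Calling load_settings() from the project root returns the key and model that a run would use under these assumptions, which can help when debugging a missing or misnamed variable.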
Voicebrief uses the following tools to test and verify the code:
- pytest: my favourite Python testing tool
- mypy: Optional static type checking for Python
- flake8: lightweight linting tool/style enforcer (PEP8)
- black: Python code formatter
You can run each tool with:
uv run tool-name [path]
for example:
uv run mypy voicebrief
or run all tools automatically with:
uv run check-all
In this case, the 'check-all' command stops running the commands if one fails.
Note that the 'flake8' tool obtains its settings from the '.flake8' config file, not pyproject.toml.
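The fail-fast behaviour of 'check-all' could look roughly like the sketch below. It is not the project's actual script: the function name run_checks and the arguments passed to each tool are assumptions chosen for illustration.

```python
import subprocess
from typing import Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for command in commands:
        print("running:", " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:
            # Fail fast: the remaining commands are never started.
            return result.returncode
    return 0


# Tool names come from the list above; the arguments are assumptions.
CHECKS = [
    ["pytest"],
    ["mypy", "voicebrief"],
    ["flake8", "voicebrief"],
    ["black", "--check", "voicebrief"],
]
```

A wrapper script would typically end with sys.exit(run_checks(CHECKS)) so that the shell, or a CI job, sees the failing tool's exit code.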
The 'check-all' command is implemented in the dev_env/runchecks.py script. You can run it directly with:
uv run python devenv/run_checks.py
Usage of the tool:
voicebrief -h
usage: voicebrief [-h] [-v] [-m] [-o] [-V] [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG}] [-g]
                  [path] [destination]

Voicebrief - Converts video / audio conversations to text and subsequently provides a summary into a manageable report.

positional arguments:
  path               Path to the media file
  destination        Optional destination directory (default: directory of "path" parameter)

options:
  -h, --help         show this help message and exit
  -v, --video        Consider "path" to be a video and extract the audio
  -m, --markdown     Generate a full human-readable markdown transcript with highest fidelity
  -o, --optimized    Generate optimized transcript (processed and structured version)
  -V, --verbose      Enable verbose debug logging (same as --log-level DEBUG)
  --log-level LEVEL  Set log level. Env fallback: VOICEBRIEF_LOG_LEVEL.
  -g, --gui          Launch the GTK interface (requires the optional `gui` extra)
When an audio file is larger than 20 MB, it is split into several smaller files, stored in the sub-directory "chunks" of the destination path. For each audio file a transcript text is saved (stored with the prefix "transcript").
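To make the splitting rule concrete, here is a small sketch of how a chunk plan could be computed. It is only an illustration under stated assumptions: the 20 MB threshold is read as 20 MiB, the recording is assumed to have a roughly constant bitrate so that equal-length chunks have equal size, and the helper names and the chunk file naming scheme are hypothetical.

```python
import math
from pathlib import Path

SIZE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: the limit means 20 MiB


def plan_chunks(size_bytes: int, duration_s: float,
                limit_bytes: int = SIZE_LIMIT_BYTES) -> list[tuple[float, float]]:
    """Return (start, length) pairs in seconds that cover the whole recording."""
    if size_bytes <= limit_bytes:
        return [(0.0, duration_s)]
    # Smallest number of equal parts that keeps each part under the limit.
    count = math.ceil(size_bytes / limit_bytes)
    length = duration_s / count
    return [(i * length, length) for i in range(count)]


def chunk_path(destination: Path, source: Path, index: int) -> Path:
    """Hypothetical naming scheme inside the "chunks" sub-directory."""
    return destination / "chunks" / f"{source.stem}_{index:03d}{source.suffix}"
```

Each (start, length) pair could then be handed to an audio tool such as ffmpeg to cut the corresponding segment.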
Voicebrief provides flexible output options:
- Raw transcripts (always generated): Original transcriptions from OpenAI Whisper, saved with prefix "transcription_"
- Optimized transcript (-o, --optimized): AI-processed and structured version with improved organization and paragraph formatting, saved with prefix "optimized_"
- Markdown transcript (-m, --markdown): Full human-readable markdown document with highest fidelity to original content, formatted with proper headings and structure, saved as "full_md_*.md"
You can use -m and -o together to generate both versions, or neither to get only raw transcripts.
Examples:
# Generate only raw transcripts
voicebrief audio.mp3
# Generate raw transcripts + optimized version
voicebrief audio.mp3 -o
# Generate raw transcripts + markdown version
voicebrief audio.mp3 -m
# Generate all versions (raw + optimized + markdown)
voicebrief audio.mp3 -o -m
# Process video with markdown output
voicebrief video.mp4 -v -m
To use the GUI, install the optional dependencies with uv sync --extra gui (Linux GNOME and macOS are supported). The GTK window provides file pickers and toggles for all CLI parameters.
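When several recordings need processing, the command line described above can also be driven from a script. The sketch below only assembles the documented flags (-v, -o, -m) and positional arguments; the helper names build_command and run_voicebrief are hypothetical, and it assumes the voicebrief executable is on the PATH.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_command(path: Path, destination: Optional[Path] = None, *,
                  video: bool = False, optimized: bool = False,
                  markdown: bool = False) -> list[str]:
    """Assemble a voicebrief command line from the documented options."""
    command = ["voicebrief", str(path)]
    if destination is not None:
        command.append(str(destination))
    if video:
        command.append("-v")   # treat "path" as a video and extract the audio
    if optimized:
        command.append("-o")   # also write the optimized transcript
    if markdown:
        command.append("-m")   # also write the markdown transcript
    return command


def run_voicebrief(path: Path, **options) -> None:
    """Run voicebrief and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(path, **options), check=True)
```

For example, build_command(Path("video.mp4"), video=True, markdown=True) yields the same arguments as the last shell example above.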
- Set a log level via flag:
voicebrief --log-level DEBUG <path>
- Or shorthand:
voicebrief -V <path>
- Or via environment variable:
VOICEBRIEF_LOG_LEVEL=DEBUG voicebrief <path>
- Logs include steps for video->audio extraction, ffmpeg chunking, transcription calls, and summary writing.
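The three ways of setting the log level can be combined into a single resolution step, sketched below. This is an illustration rather than the tool's actual logic: the function name resolve_log_level is hypothetical, and both the precedence of -V over --log-level and the WARNING default are assumptions. Only the environment variable acting as a fallback for the flag is stated in the help text.

```python
import logging
import os
from typing import Mapping, Optional

LEVELS = ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")


def resolve_log_level(flag: Optional[str] = None, verbose: bool = False,
                      environ: Optional[Mapping[str, str]] = None,
                      default: str = "WARNING") -> int:
    """Pick a logging level from -V, --log-level, or VOICEBRIEF_LOG_LEVEL."""
    environ = os.environ if environ is None else environ
    if verbose:
        name = "DEBUG"          # -V is documented as --log-level DEBUG
    elif flag:
        name = flag             # explicit --log-level value
    else:
        name = environ.get("VOICEBRIEF_LOG_LEVEL", default)  # env fallback
    name = name.upper()
    if name not in LEVELS:
        raise ValueError(f"unknown log level: {name}")
    return getattr(logging, name)
```

The result can be passed straight to logging.basicConfig(level=...).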
Create and activate the Python virtual environment with
uv venv
source .venv/bin/activate
The summary should have certain guarantees related to the key points and perhaps some metadata: key participants, tone of conversation, etc.
'Voicebrief' uses the moviepy library to work with video files, for example to extract the audio track. moviepy itself relies on FFmpeg, a multimedia framework that handles a wide range of video and audio formats. FFmpeg performs the encoding and decoding of media, so 'voicebrief' depends on it to process video and audio data.
Before using 'voicebrief', ensure that FFmpeg is installed and accessible from your system's command line interface (CLI). Here's how you can verify the installation of FFmpeg on different operating systems:
Windows:
- Open the Command Prompt.
- Type ffmpeg -version and press Enter.
- If FFmpeg is installed, you will see the version information.
- If you get an error saying 'ffmpeg' is not recognized, it is not installed.
Linux:
- Open a Terminal window.
- Type ffmpeg -version and press Enter.
- If FFmpeg is installed, you will see the version information.
- If it's not installed, you might see a command suggestion for installation or an error message.
macOS:
- Open the Terminal app.
- Type ffmpeg -version and press Enter.
- If FFmpeg is installed, the version information will be displayed.
- If it's not installed, you'll receive an error message indicating it's not found.
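The same check can be done from Python on any of these platforms. The sketch below uses only the standard library; the helper name tool_version is hypothetical and is not part of Voicebrief.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if unavailable."""
    location = shutil.which(executable)  # searches the PATH, like the shell does
    if location is None:
        return None
    result = subprocess.run([location, "-version"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

A return value of None means FFmpeg was not found on the PATH or did not run, in which case the installation steps for your operating system apply.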
If FFmpeg is not installed, follow the instructions below for your operating system:
Windows:
- Download the FFmpeg build from https://ffmpeg.org/download.html#build-windows.
- Extract the downloaded ZIP file.
- Add the bin folder within the extracted folder to your system's Environment Variables in the Path section.
- Verify the installation by following the verification steps above.
Linux (Debian/Ubuntu):
- Update your package list: sudo apt-get update.
- Install FFmpeg by running: sudo apt-get install ffmpeg.
- Verify the installation using the steps provided in the verification section.
macOS:
- Install Homebrew, if it's not already installed, with: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)".
- Install required system dependencies:
brew install pkg-config cairo ffmpeg
  - pkg-config and cairo are required for building pycairo (GTK dependency)
  - ffmpeg is required by moviepy/pydub
- Verify the FFmpeg installation using the steps provided in the verification section.
Note: The standard-library module audioop was removed in Python 3.13. Voicebrief declares the audioop-lts backport as a dependency for Python 3.13+, so a normal uv sync installs it; no extra steps are needed.
Ensure that FFmpeg is correctly installed and configured before proceeding with the usage of 'voicebrief'.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
Copyright © 2024 Iwan van der Kleijn
Licensed under the MIT License
