Voicebrief

Converts video and audio conversations to text, then condenses the transcript into a manageable summary report.

Note: this project was migrated from Poetry to uv with poetry2uv.

Installation

Clone the repository. Use the fast Python package manager uv to install all the dependencies of Voicebrief.

uv sync

Note: Before running uv sync, install platform-specific dependencies. See the System Dependencies section below for details.

Configuration for usage with OpenAI

Create a text file named ".env" in the root of the project. It holds the "OPENAI_API_KEY" environment variable, which the application uses to authenticate with a valid OpenAI account when calling the API.

OPENAI_API_KEY=sk-A_seCR_et_key_GENERATED_foryou_by_OPENAI
OPENAI_MODEL=gpt-4o  # Optional: defaults to gpt-4o if not specified

The key is loaded into the execution context of the application when run from the command line or run in the debugger.

Alternatively, if the file is not present, then 'voicebrief' will look for the environment variable "OPENAI_API_KEY".

You can also specify a different OpenAI model by setting the OPENAI_MODEL environment variable (e.g., "gpt-4o", "gpt-4-turbo", etc.).
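
The lookup described above can be sketched in plain Python. This is a minimal illustration, not Voicebrief's actual loader: the helper names read_env_file and load_openai_settings are hypothetical, and the precedence shown (values from ".env" first, then the process environment, then the "gpt-4o" default) is an assumption based on the description in this section.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines; return {} if the file does not exist."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        # Drop inline comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_openai_settings(env_file=".env", environ=None):
    """Hypothetical helper: .env values first, then the process environment."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(env_file)
    api_key = file_values.get("OPENAI_API_KEY") or environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set in .env or the environment")
    # Assumed default, taken from the comment in the sample .env above.
    model = file_values.get("OPENAI_MODEL") or environ.get("OPENAI_MODEL") or "gpt-4o"
    return api_key, model
```

Calling load_openai_settings() from the project root returns the key and model as a tuple, or raises if no key can be found in either place.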

Tests, checks etc

Voicebrief uses the following tools to test and verify the code:

pytest: my favourite Python testing tool
mypy: Optional static type checking for Python
flake8: lightweight linting tool/style enforcer (PEP8)
black: Python code formatter

You can run each tool with:

uv run tool-name [path]

for example:

uv run mypy voicebrief

or run all tools automatically with:

uv run check-all

In this case, the 'check-all' command stops running the commands if one fails.

Note that the 'flake8' tool obtains its settings from the '.flake8' config file, not pyproject.toml.

The 'check-all' command is implemented in the devenv/run_checks.py script. You can run it directly with:

uv run python devenv/run_checks.py
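
The stop-on-first-failure behaviour can be sketched as follows. This is not the contents of the project's script; the function names, the tool order, and the arguments passed to each tool are assumptions for illustration.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command in order; stop at the first non-zero exit code."""
    for command in commands:
        print("$ " + " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:
            # Later commands are skipped once one check fails.
            return result.returncode
    return 0


# Assumed tool order and arguments; the project's script may differ.
DEFAULT_CHECKS = [
    ["pytest"],
    ["mypy", "voicebrief"],
    ["flake8", "voicebrief"],
    ["black", "--check", "voicebrief"],
]


def main():
    """Exit with the code of the first failing check, or 0 if all pass."""
    sys.exit(run_checks(DEFAULT_CHECKS))
```

Returning the failing tool's exit code lets a shell or CI job see the failure without parsing any output.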

Usage

Usage of the tool:

voicebrief -h
usage: voicebrief [-h] [-v] [-m] [-o] [-V] [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG}] [-g]
                  [path] [destination]

Voicebrief - Converts video / audio conversations to text and subsequently provides a summary into a manageable report.

positional arguments:
  path         Path to the media file
  destination  Optional destination directory (default: directory of "path" parameter)

options:
  -h, --help            show this help message and exit
  -v, --video           Consider "path" to be a video and extract the audio
  -m, --markdown        Generate a full human-readable markdown transcript with highest fidelity
  -o, --optimized       Generate optimized transcript (processed and structured version)
  -V, --verbose         Enable verbose debug logging (same as --log-level DEBUG)
  --log-level LEVEL     Set log level. Env fallback: VOICEBRIEF_LOG_LEVEL.
  -g, --gui             Launch the GTK interface (requires the optional `gui` extra)

When an audio file is larger than 20 MB, it is split into several files, stored in the sub-directory "chunks" of the destination path. A transcript text file is saved for each audio file (stored with the prefix "transcript").
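
As a rough sketch of the splitting rule, the function below decides whether a file needs chunking and where the chunks would go. Several things here are assumptions: the limit is read as 20 * 1024 * 1024 bytes, the chunk count is estimated from the byte size alone (the real tool splits with ffmpeg and may work by duration), and the chunk file names are hypothetical.

```python
import math
from pathlib import Path

# Assumption: the 20 MB limit is read as 20 * 1024 * 1024 bytes.
MAX_CHUNK_BYTES = 20 * 1024 * 1024


def plan_chunks(audio_path, size_bytes, destination, limit=MAX_CHUNK_BYTES):
    """Return the list of files to transcribe (hypothetical naming scheme)."""
    audio_path = Path(audio_path)
    if size_bytes <= limit:
        # Small enough: the original file is transcribed as-is.
        return [audio_path]
    count = math.ceil(size_bytes / limit)
    chunk_dir = Path(destination) / "chunks"
    return [
        chunk_dir / f"{audio_path.stem}_{index:03d}{audio_path.suffix}"
        for index in range(count)
    ]
```

For a 45 MiB file this plans three chunk paths under the "chunks" sub-directory of the destination; a 5 MiB file is returned unchanged.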

Output Options

Voicebrief provides flexible output options:

Raw transcripts (always generated): Original transcriptions from OpenAI Whisper, saved with prefix "transcription_"
Optimized transcript (-o, --optimized): AI-processed and structured version with improved organization and paragraph formatting, saved with prefix "optimized_"
Markdown transcript (-m, --markdown): Full human-readable markdown document with highest fidelity to original content, formatted with proper headings and structure, saved as "full_md_*.md"

You can use -m and -o together to generate both versions, or neither to get only raw transcripts.

Examples:

# Generate only raw transcripts
voicebrief audio.mp3

# Generate raw transcripts + optimized version
voicebrief audio.mp3 -o

# Generate raw transcripts + markdown version
voicebrief audio.mp3 -m

# Generate all versions (raw + optimized + markdown)
voicebrief audio.mp3 -o -m

# Process video with markdown output
voicebrief video.mp4 -v -m
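
The relationship between the flags and the generated files can be sketched with argparse. The parser below covers only the options relevant to output selection, and planned_outputs is a hypothetical helper; the prefixes are the ones listed under Output Options.

```python
import argparse


def build_parser():
    """Parser covering only the output-related options from the help text."""
    parser = argparse.ArgumentParser(prog="voicebrief")
    parser.add_argument("path", nargs="?", help="Path to the media file")
    parser.add_argument("destination", nargs="?", help="Optional destination directory")
    parser.add_argument("-v", "--video", action="store_true")
    parser.add_argument("-m", "--markdown", action="store_true")
    parser.add_argument("-o", "--optimized", action="store_true")
    return parser


def planned_outputs(args):
    """Hypothetical helper: list the file prefixes a run would produce."""
    outputs = ["transcription_"]  # raw transcripts are always generated
    if args.optimized:
        outputs.append("optimized_")
    if args.markdown:
        outputs.append("full_md_")
    return outputs
```

Parsing ["audio.mp3", "-o", "-m"] and passing the result to planned_outputs yields all three prefixes, while ["audio.mp3"] alone yields only the raw transcript prefix.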

To use the GUI, install the optional dependencies with uv sync --extra gui (Linux GNOME and macOS are supported). The GTK window provides file pickers and toggles for all CLI parameters.

Logging

Set a log level via flag:
- voicebrief --log-level DEBUG <path>
- or shorthand: voicebrief -V <path>
Or via environment variable:
- VOICEBRIEF_LOG_LEVEL=DEBUG voicebrief <path>
Logs include steps for video->audio extraction, ffmpeg chunking, transcription calls, and summary writing.
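
One way to resolve the effective level from these three sources is sketched below. The function name is hypothetical, and two details are assumptions: that -V wins over --log-level when both are given, and that WARNING is the level used when nothing is set.

```python
import logging
import os

LEVELS = ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")


def resolve_log_level(cli_level=None, verbose=False, environ=None, default="WARNING"):
    """Pick a level: -V, then --log-level, then VOICEBRIEF_LOG_LEVEL, then default."""
    environ = os.environ if environ is None else environ
    if verbose:
        name = "DEBUG"  # -V is shorthand for --log-level DEBUG
    elif cli_level:
        name = cli_level
    else:
        name = environ.get("VOICEBRIEF_LOG_LEVEL", default)
    name = name.upper()
    if name not in LEVELS:
        raise ValueError(f"Unknown log level: {name}")
    return getattr(logging, name)
```

The returned value is a standard logging constant, so it can be passed directly to logging.basicConfig(level=...).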

Development

Create and activate the Python virtual environment with:

uv venv
source .venv/bin/activate

LEFT TO(BE)DO(NE)

Prompt engineering (COULD HAVE)

The summary should offer certain guarantees about the key points, and perhaps include some metadata: key participants, tone of conversation, etc.

System Dependencies

Why FFmpeg?

'Voicebrief' utilizes the moviepy library for extensive video editing operations. moviepy itself relies on FFmpeg, a powerful multimedia framework capable of handling a vast array of video and audio formats. This dependency is crucial as FFmpeg performs the encoding and decoding of media, allowing 'voicebrief' to manipulate video and audio data effectively.

Verifying FFmpeg Installation

Before using 'voicebrief', ensure that FFmpeg is installed and accessible from your system's command line interface (CLI). Here's how you can verify the installation of FFmpeg on different operating systems:

Windows

Open the Command Prompt.
Type ffmpeg -version and press Enter.
If FFmpeg is installed, you will see the version information.
If you get an error saying 'ffmpeg' is not recognized, it is not installed.

Linux

Open a Terminal window.
Type ffmpeg -version and press Enter.
If FFmpeg is installed, you will see the version information.
If it's not installed, you might see a command suggestion for installation or an error message.

macOS

Open the Terminal app.
Type ffmpeg -version and press Enter.
If FFmpeg is installed, the version information will be displayed.
If it's not installed, you'll receive an error message indicating it's not found.
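
The same check can be done from Python on any of the three platforms. This is a small sketch using only the standard library; the function name ffmpeg_version is hypothetical and not part of Voicebrief.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if unavailable."""
    location = shutil.which(executable)
    if location is None:
        # Not on PATH: same situation as the "not recognized" / "not found" errors.
        return None
    result = subprocess.run(
        [location, "-version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]
```

A return value of None means FFmpeg still needs to be installed or added to the PATH; otherwise the string holds the version banner.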

Installing FFmpeg

If FFmpeg is not installed, follow the instructions below for your operating system:

Windows

Download the FFmpeg build from https://ffmpeg.org/download.html#build-windows.
Extract the downloaded ZIP file.
Add the bin folder within the extracted folder to your system's Environment Variables in the Path section.
Verify the installation by following the verification steps above.

Linux (Ubuntu/Debian)

Update your package list: sudo apt-get update.
Install FFmpeg by running: sudo apt-get install ffmpeg.
Verify the installation using the steps provided in the verification section.

macOS

Install Homebrew, if it's not already installed, with: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)".
Install required system dependencies:
```
brew install pkg-config cairo ffmpeg
```
- pkg-config and cairo are required for building pycairo (GTK dependency)
- ffmpeg is required by moviepy/pydub
Verify the FFmpeg installation using the steps provided in the verification section.

Note: On Python 3.13+, the stdlib audioop was removed. Voicebrief declares the audioop-lts backport automatically for 3.13, so a normal uv sync installs it; no extra steps needed.

Ensure that FFmpeg is correctly installed and configured before proceeding with the usage of 'voicebrief'.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Copyright and license

Licensed under the MIT License.
