SoundADE

Acoustic Descriptor Extraction tool for processing sound on High Performance Computing clusters.

SoundADE recursively identifies audio files within a directory structure and reads them into the processing pipeline. The pipeline then computes a set of acoustic descriptors and extracts BirdNET species detection probabilities. If site-level information such as latitude, longitude and timezone is provided in a separate sites file, it is used to extract solar and weather data.
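The recursive discovery step can be pictured with a short Python sketch. This is not SoundADE's own code: the function name find_audio_files and the set of accepted extensions are assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of audio extensions (hypothetical); SoundADE's own filter may differ.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def find_audio_files(root: str | Path) -> list[Path]:
    """Recursively collect audio files below root, in a deterministic order."""
    root = Path(root)
    # rglob("*") walks every subdirectory; the suffix check is case-insensitive.
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Sorting the result keeps the processing order reproducible between runs, which makes it easier to compare outputs or resume a partially completed job.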

Installation

Python

Install the uv package manager. You can then install the locked dependencies, run the test suite, and run the different parts of the pipeline, either independently or as a whole, using the CLI:

uv sync --locked
uv run pytest -v
uv run main.py --help

Docker

Install the docker-ce package. You can then build and run the full pipeline using:

docker compose up --build

Singularity

Some HPC administrators do not give users the sudo privileges required to run Docker. Instead, the project can be built with Singularity, which does not require these privileges and is usually installed on HPC systems:

singularity build --fakeroot app.sif app.def

On a machine where you do have sudo privileges, fakeroot isn't required and you can run:

singularity build --ignore-fakeroot-command -F app.sif app.def

If you are using a SLURM-based HPC system, scripts to schedule these builds can be found in the slurm directory.

As with Docker, you must rebuild the container every time you make changes to the source code.

If you need to inspect your container once it is built, run:

singularity exec --env-file .env --bind $DATA_PATH:/data app.sif /bin/sh

Usage

Run the whole pipeline, specifying the relevant option for your setup (Docker, Singularity, or a local Python environment):

./run.sh

Configuration

Create your own custom dataset config file (see the examples in ./config). This is where you should specify FFT and BirdNET parameters. The pipeline sets default FFT and BirdNET parameters if none are specified. However, you must tell the pipeline how to extract information such as the timestamp and site-level information from the file paths, in the form of a regular expression. You can also specify additional site-level information; see below for details.
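As an illustration of the kind of regular expression the config needs, the sketch below uses Python's re module with named groups to pull a site name, three site levels and a timestamp out of a path laid out as <country>/<site>/<recorder>/<YYYYMMDD_HHMM>.wav. The pattern, the group names and the timestamp format are assumptions; the exact syntax SoundADE expects is shown in the files under ./config.

```python
import re
from datetime import datetime

# Hypothetical pattern for <country>/<site>/<recorder>/<YYYYMMDD_HHMM>.wav.
# The group names are illustrative; the real pattern belongs in your dataset config.
PATH_PATTERN = re.compile(
    r"(?P<site_name>/(?P<country>[^/]+)/(?P<site>[^/]+)/(?P<recorder>[^/]+))"
    r"/(?P<timestamp>\d{8}_\d{4})\.wav$",
    re.IGNORECASE,
)


def parse_path(path: str) -> dict:
    """Extract site-level fields and a timestamp from an audio file path."""
    match = PATH_PATTERN.search(path)
    if match is None:
        raise ValueError(f"path does not match pattern: {path}")
    fields = match.groupdict()
    # Convert the captured timestamp string into a datetime object.
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y%m%d_%H%M")
    return fields
```

Because the pattern is anchored at the end with `$` and searched rather than matched, any leading directories (such as the data root) are skipped and only the last three directory levels form the site name.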

Environment Variables

The .env file (see .env.template) contains runtime settings for the pipeline:

DATA_PATH: the path to the audio data you want to process. N.B. your site_locations.parquet file must also be in this location.
SAVE_PATH: the path where you want the results saved. Your locations.parquet file must be in this location. For more details on locations.parquet, see below.
CORES: the number of cores (local) or jobs (HPC) deployed to process your data
MEM_PER_CPU: the amount of RAM, as an integer number of gigabytes, deployed per core or job
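To check a .env file before launching a long job, a small stdlib-only parser like the one below can help. It is a hypothetical helper, not the mechanism the pipeline itself uses to read .env; it assumes plain KEY=VALUE lines and that memory is requested per core or job, so the total request is CORES * MEM_PER_CPU (for example, CORES=8 and MEM_PER_CPU=4 request 32 GB in total).

```python
from __future__ import annotations

from pathlib import Path

REQUIRED_KEYS = ("DATA_PATH", "SAVE_PATH", "CORES", "MEM_PER_CPU")


def load_env(path: str | Path = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file and type-check the settings."""
    settings = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines without '='
        # Remove surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError(f"missing settings in {path}: {missing}")
    # CORES and MEM_PER_CPU are documented as integers.
    settings["CORES"] = int(settings["CORES"])
    settings["MEM_PER_CPU"] = int(settings["MEM_PER_CPU"])
    return settings


def total_memory_gb(settings: dict) -> int:
    """Total RAM requested, assuming each core/job receives MEM_PER_CPU gigabytes."""
    return settings["CORES"] * settings["MEM_PER_CPU"]
```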

Location Information

A regular expression is required to find audio files and extract their location information. See ./config for examples.

A file specifying site-level information is also required for the pipeline to run, with the following columns:

Column      Type
site_id     string or integer
site_name   string
latitude    float32
longitude   float32
timezone    string
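The column types above can be expressed as a Python dataclass that validates each row as it is built. This is a sketch under assumptions: the class name Site and the range checks on latitude and longitude are illustrative rather than part of SoundADE, and Python floats stand in for the float32 columns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One row of the sites file, mirroring the documented columns (hypothetical class)."""

    site_id: str | int
    site_name: str
    latitude: float  # float32 in the sites file
    longitude: float  # float32 in the sites file
    timezone: str

    def __post_init__(self) -> None:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(self.site_id, bool) or not isinstance(self.site_id, (str, int)):
            raise TypeError("site_id must be a string or an integer")
        if not isinstance(self.site_name, str) or not self.site_name:
            raise ValueError("site_name must be a non-empty string")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")
        if not isinstance(self.timezone, str) or not self.timezone:
            raise ValueError("timezone must be a non-empty string")
```

Validating rows up front means a malformed sites file fails immediately, rather than partway through a long HPC job.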

You need to ensure the site_name field matches the regular expression defined in the data class.

For example, consider the following folder structure:

└── <site_level_1>
    └── <site_level_2>
        ├── <site_level_3>
        │   ├── <timestamp>.wav
        │   ├── ...
        │   └── <timestamp>.wav
        └── <site_level_3>
            ├── <timestamp>.wav
            ├── ...
            └── <timestamp>.wav

For this layout, the site_name field should take the form /<site_level_1>/<site_level_2>/<site_level_3>.

└── EC
    └── TE
        ├── 9
        │   ├── 20150619_0630.wav
        │   ├── ...
        │   └── 20150621_0317.wav
        └── 10
            ├── 20150619_0630.wav
            ├── ...
            └── 20150621_0317.wav

In this case, <site_level_1> is the country (EC = Ecuador), <site_level_2> is a site identifier (TE), and <site_level_3> is a recorder ID number. Therefore, the site_name column must contain the records '/EC/TE/9' and '/EC/TE/10'. The number of site levels is arbitrary: you can define as many as you like in the regular expression used to discover audio files.
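Because site_name is built from the leading directories of each path, you can check in advance that every recorder directory has a matching row in the sites file. The sketch below swaps the regular expression for a plain split on path components, with the number of site levels given by a hypothetical depth parameter; the function names and the dictionary-based sites table are assumptions, not SoundADE's API.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def site_name_from_path(relative_path: str, depth: int) -> str:
    """Join the first `depth` directories of a path into '/level_1/.../level_depth'."""
    # Drop any leading slash and the file name, keeping only directory levels.
    directories = PurePosixPath(relative_path.lstrip("/")).parts[:-1]
    if len(directories) < depth:
        raise ValueError(f"{relative_path!r} has fewer than {depth} site levels")
    return "/" + "/".join(directories[:depth])


def attach_site_info(
    relative_paths: list[str], sites: dict[str, dict], depth: int
) -> tuple[dict[str, dict], list[str]]:
    """Map each audio file to its site record; collect files with no matching site_name."""
    matched: dict[str, dict] = {}
    unmatched: list[str] = []
    for path in relative_paths:
        name = site_name_from_path(path, depth)
        if name in sites:
            matched[path] = sites[name]
        else:
            unmatched.append(path)
    return matched, unmatched
```

With depth=3, the path EC/TE/9/20150619_0630.wav resolves to '/EC/TE/9'. Any files returned in the unmatched list point to a missing or misspelled site_name entry in the sites file.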

Tests

Run the test suite:

uv run pytest -v

Development

If you would like to contribute to the codebase, please contact the package authors listed in pyproject.toml.
