Template for creating a good python ML project

r-pad/python_ml_project_template

This branch is 25 commits ahead of beneisner/python_pkg_template:main.

python_ml_project_template

This is a template for a Python Machine Learning project with the following features:

In addition, it contains all the good features from the original version of this repository (and is a proper Python package):

  • Installable via pip install. Anyone can point directly to this GitHub repository and install your project, either as a regular dependency or as an editable one.
  • Uses the PEP 518, officially recommended pyproject.toml structure for defining project structure and dependencies (instead of requirements.txt).
  • Static documentation website support, using mkdocs-material. The structure can be found in docs/.
  • black support by default; black is an opinionated code formatting tool.
  • pytest support, which automatically runs tests found in the tests/ directory.
  • mypy support, for optional type checking of type hints.
  • pre-commit support, which runs various formatting modifiers on commit to clean up your dirty, dirty code automatically.
  • GitHub Actions support, which runs the following:
    • On a Pull Request: install dependencies, run style checks, run Python tests.
    • After merge: same as a Pull Request, but also deploy the docs site to the project's GitHub Pages URL.
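To make the pytest and mypy bullets concrete, here is a minimal sketch of the kind of code these tools act on. The `accuracy` function, the test, and the file locations mentioned in the comments are hypothetical and are not part of the template; they only illustrate a type-annotated function (which mypy can check) next to a `test_*` function (which pytest collects).

```python
from typing import Sequence


# Hypothetical module contents, e.g. a file under src/<your_package>/.
def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the fraction of predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        return 0.0
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical test, e.g. a file under tests/.
# pytest collects functions whose names start with "test_".
def test_accuracy() -> None:
    assert accuracy([1, 0, 1, 1], [1, 0, 0, 1]) == 0.75
    assert accuracy([], []) == 0.0
```

With the package installed, running `pytest` from the repository root should pick up tests like this one, and `mypy` can check the annotations.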

All that needs doing is replacing all occurrences of python_ml_project_template and python-ml-project-template with the name of your package (including the folder src/python_ml_project_template). The rest should work out of the box!
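If you would rather script the rename than do it by hand, the following is a rough sketch of one way to do it. The `rename_project` helper is hypothetical (it is not shipped with the template), and it assumes your new package name is given in snake_case, with the dashed variant derived from it. Run it on a clean checkout so you can review the result with `git diff`.

```python
from pathlib import Path
from typing import List

OLD_SNAKE = "python_ml_project_template"
OLD_KEBAB = "python-ml-project-template"


def rename_project(root: Path, new_snake: str) -> List[Path]:
    """Replace both spellings of the template name under `root`.

    Returns the files whose contents were changed.
    """
    new_kebab = new_snake.replace("_", "-")

    # Rename the package folder first so the paths returned below stay valid.
    package_dir = root / "src" / OLD_SNAKE
    if package_dir.is_dir():
        package_dir.rename(root / "src" / new_snake)

    changed: List[Path] = []
    for path in sorted(root.rglob("*")):
        # Skip directories and version-control internals.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not a UTF-8 text file; leave it untouched
        updated = text.replace(OLD_SNAKE, new_snake).replace(OLD_KEBAB, new_kebab)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `rename_project(Path("."), "my_project")` would rewrite file contents to use `my_project` and `my-project`, and move the package folder to src/my_project.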

Installation

First, we'll need to install platform-specific dependencies for PyTorch. See the PyTorch installation instructions for more details. For example, to use CUDA 12.4 with PyTorch 2:

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu124

Then, we can install the package itself:

pip install -e ".[develop,notebook]"

Then we install pre-commit hooks:

pre-commit install
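Once the steps above are done, a quick sanity check of the PyTorch install can save some debugging later. The sketch below is only an illustration: `describe_torch_install` is a hypothetical helper, not part of the template. It reports the installed version, the CUDA version the wheel was built for, and whether a CUDA device is visible.

```python
import importlib.util


def describe_torch_install() -> str:
    """Report whether torch is importable and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    # Imported lazily so the check above can run without torch present.
    import torch

    built_for = torch.version.cuda  # None for CPU-only builds
    return (
        f"torch {torch.__version__}, built for CUDA {built_for}, "
        f"CUDA available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_install())
```

If the report shows a CPU-only build or no available CUDA device, the `--index-url` used in the first install step is the first thing to double-check.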

Docker

To build the Docker image, run:

docker build -t <my_dockerhub_username>/python-ml-project-template .

To run the training script locally, run:

WANDB_API_KEY=<API_KEY>
# Mount the data and logs directories so the container can read data and write logs.
# Optional: also mount the current directory to run / test new code.
docker run \
    -v "$(pwd)/data:/opt/baeisner/data" \
    -v "$(pwd)/logs:/opt/baeisner/logs" \
    --gpus all \
    -e WANDB_API_KEY="$WANDB_API_KEY" \
    -e WANDB_DOCKER_IMAGE=python-ml-project-template \
    <my_dockerhub_username>/python-ml-project-template python scripts/train.py \
        dataset.data_dir=/opt/baeisner/data \
        log_dir=/opt/baeisner/logs
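
The trailing `dataset.data_dir=...` and `log_dir=...` arguments are dotted `key=value` overrides passed to the training script. How `scripts/train.py` actually parses them is not shown in this README; the sketch below only illustrates the general idea of mapping dotted keys onto a nested config. The `apply_overrides` helper and the example paths are hypothetical.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply dotted `key=value` strings to a nested dict, in place."""
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        # "dataset.data_dir" -> parents ["dataset"], leaf "data_dir"
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Hypothetical defaults and overrides, for illustration only.
config = {"dataset": {"data_dir": "./data"}, "log_dir": "./logs"}
apply_overrides(config, ["dataset.data_dir=/mnt/data", "log_dir=/mnt/logs"])
```

The point to take away is that the paths given in the overrides are resolved inside the container, so they need to refer to locations that the `-v` mounts make available.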

To push the image to Docker Hub:

docker push <my_dockerhub_username>/python-ml-project-template:latest

Using the CI

To set up pushing to Docker Hub, add the following secrets to the GitHub repository:

  • DOCKERHUB_USERNAME: Your Docker Hub username
  • DOCKERHUB_TOKEN: Your Docker Hub token

You'll also need to Ctrl-F and replace instances of beisner and baeisner with the appropriate usernames.

Running on Clusters
