
UV Data Science Project Template

Banner image by David T. [Source: Astral]

Tutorial Project for 1) Developing Data Science Projects in a Dev Container, and 2) Machine Learning Applications in Production

This guide provides instructions on how to develop and productionize machine learning applications in a robust and efficient way.
It demonstrates how to achieve this with a modern tool stack, including UV, Docker, Ruff, FastAPI, and more (see the Overview Tools section). The focus of this project is to introduce these tools, not to show how to properly set up a machine learning application for production; therefore, only a simple machine learning pipeline based on PyTorch Lightning and FastAPI is used.

See the related Project Documentation for additional information.

Overview Tools

The project includes the following components, for more details see Documentation - Guides:

| Tool | Description |
| --- | --- |
| UV | A fast and efficient package manager for Python, written in Rust. It replaces tools like pip and virtualenv. |
| Ruff | An extremely fast Python linter, formatter, and code assistant, written in Rust. |
| PyRight | A static type checker for Python, helping to catch type-related errors early in the development process. |
| PyTest | A powerful and flexible testing framework for Python, simplifying writing and running tests. |
| Coverage | A tool for measuring code coverage of Python programs, helping to ensure that all parts of the code are tested. |
| Pre-Commit | A framework for managing and maintaining multi-language pre-commit hooks to ensure code quality. |
| CI-GitHub | Continuous Integration setup using GitHub Actions to automate testing, linting, and deployment. |
| MkDocs | A static site generator geared towards building project documentation, with sources authored in Markdown. |
| VSCode-DevContainer | A development environment setup using Docker and VS Code, providing a consistent and isolated workspace. |
| Docker-Production | Docker setup for creating a lean, efficient, and secure production environment for applications. |

Using uv to Manage the Project

UV is a tool that simplifies the management of Python projects and virtual environments. It handles dependency installation, virtual environment creation, and other project configurations. In this project, UV is used to manage dependencies and the virtual environment inside the Docker container, ensuring a consistent and reproducible setup.
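
For day-to-day work, a handful of uv commands cover most tasks. A quick reference (see the uv documentation for the full CLI):

# Create or update the virtual environment from pyproject.toml and uv.lock
uv sync
# Add a new dependency (updates pyproject.toml and the lockfile)
uv add <package>
# Run a command or script inside the project's virtual environment
uv run main.py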

See Guides - UV for a comprehensive guide.

pyproject.toml

The pyproject.toml file includes the following sections:

  1. Project metadata (name, version, description, etc.).
  2. Dependencies required for the project.
  3. Dependency groups for development and documentation.
  4. Configuration for tools and packaging.
# filepath: pyproject.toml
[project]
name = "uv-datascience-project-template"
version = "0.1.0"
description = "Tutorial Project for 1) Data Science in a Dev Container, and 2) for a Machine Learning Application in Production; using Docker, UV, and FastAPI"
readme = "README.md"
license = {text = "MIT"}
authors = [
    {name = "David Tiefenthaler"}
]
urls = {repository = "https://github.com/tiefenthaler/uv-datascience-project-template"}
keywords = [
    "data science project",
    "docker",
    "python",
    "template",
    "uv"
]
requires-python = ">=3.12.0, <3.13.0"
dependencies = [
    "fastapi[standard]>=0.115.6",
    "lightning>=2.4.0",
    "pydantic>=2.10.4",
    "torch>=2.4.1",
    "torchvision>=0.20.1",
    "uvicorn>=0.34.0"
]

# DEV SETTING
[dependency-groups]
dev = [
    "ipykernel>=6.29.5",
    "jupyterlab>=4.3.1",
    "pyright>=1.1.393",
    "pytest-cov>=6.0.0",
    "pytest>=8.1.1",
    "ruff>=0.9.4",
    "toml-sort>=0.24.2",
    "uv>=0.5.26",
    "pre-commit>=4.1.0",
]
docs = [
    "mkdocs>=1.6.1",
    "mkdocs-include-markdown-plugin>=7.1.3",
    "mkdocs-jupyter>=0.25.1",
    "mkdocs-material>=9.6.3",
    "mkdocstrings[python]>=0.15.0",
]

# DEV SETTING
[tool.uv]
default-groups = ["dev"]

# DEV SETTING
# ruff.toml file is used.
[tool.ruff]

# DEV SETTING
# pytest.ini file is used.
[tool.pytest]

# DEV SETTING
# .coveragerc file is used. A pytest fixture in .conftest.py is used to create coverage file/report directory.
[tool.coverage]

# DEV SETTING
# pyrightconfig.json file is used. NOTE: Ensure to set the python version correctly.
[tool.pyright]

# DEV SETTING
# NOTE: to sort, run: "uv run toml-sort pyproject.toml"
[tool.tomlsort]
in_place = true
no_sort_tables = true
sort_inline_arrays = true
spaces_before_inline_comment = 4
spaces_indent_inline_array = 4

[tool.hatch.build.targets.wheel]
packages = ["src/uv_datascience_project_template"]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Custom Code in src Folder

See the Source Code API Reference for comprehensive documentation.

The src folder contains the custom code for the machine learning project. The main components include:

lit_auto_encoder

This file defines the LitAutoEncoder class, a LightningModule that implements an autoencoder using PyTorch Lightning. The LitAutoEncoder class includes (a minimal sketch follows the list):

  1. An __init__ method to initialize the encoder and decoder.
  2. A training_step method to define the training loop.
  3. A configure_optimizers method to set up the optimizer.
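
A minimal sketch of such a module, modeled on the standard PyTorch Lightning autoencoder example (layer sizes and the learning rate are illustrative assumptions; the actual class in src may differ):

import lightning as L
import torch.nn.functional as F
from torch import nn, optim

class LitAutoEncoder(L.LightningModule):
    def __init__(self, encoder: nn.Module, decoder: nn.Module) -> None:
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def training_step(self, batch, batch_idx):
        x, _ = batch  # labels are not needed for reconstruction
        x = x.view(x.size(0), -1)  # flatten 28x28 MNIST images
        x_hat = self.decoder(self.encoder(x))
        return F.mse_loss(x_hat, x)  # reconstruction loss

    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=1e-3)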

train_autoencoder

This file defines the training function train_litautoencoder, which initializes and trains the autoencoder on the MNIST dataset using PyTorch Lightning.
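
A sketch of what this function can look like, reusing the LitAutoEncoder from above (the MNIST data root, batch size, and training budget are illustrative assumptions):

import lightning as L
from torch import nn
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

def train_litautoencoder() -> tuple[nn.Module, nn.Module]:
    # Illustrative encoder/decoder; the real architecture lives in the src package.
    encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
    decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))
    autoencoder = LitAutoEncoder(encoder, decoder)

    # Download MNIST and run a short, bounded training loop.
    dataset = MNIST(root="./data", download=True, transform=ToTensor())
    train_loader = DataLoader(dataset, batch_size=64)
    trainer = L.Trainer(limit_train_batches=100, max_epochs=1)
    trainer.fit(model=autoencoder, train_dataloaders=train_loader)
    return encoder, decoder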

FastAPI Application

The FastAPI application is defined in the app_fastapi_autoencoder.py file. It includes the following endpoints:

  1. GET /: Root endpoint that provides a welcome message and instructions.
  2. POST /train: Endpoint to train the autoencoder model.
  3. POST /embed: Endpoint to embed fake images using the trained autoencoder.

app_fastapi_autoencoder

This file defines the FastAPI application and the endpoints. It includes:

  1. Importing necessary libraries and modules.
  2. Defining global variables for the encoder, decoder, and model training status.
  3. A NumberFakeImages class for input validation.
  4. A train_litautoencoder function to initialize and train the autoencoder.
  5. A read_root function to handle the root endpoint.
  6. A train_model function to handle the model training endpoint.
  7. An embed function to handle the embedding endpoint.
  8. The application entry point to run the FastAPI application.
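
Put together, a condensed sketch of the application (the response payloads, validation bounds, and import path are illustrative assumptions, not the repository's exact code):

import torch
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

# Assumed import path based on the project layout (hypothetical):
from uv_datascience_project_template.train_autoencoder import train_litautoencoder

app = FastAPI(title="LitAutoEncoder")
encoder = decoder = None  # populated once training has completed
model_trained = False

class NumberFakeImages(BaseModel):
    n_fake_images: int = Field(gt=0, le=64)  # bounds are illustrative

@app.get("/")
def read_root() -> dict[str, str]:
    return {"message": "Welcome! POST /train first, then POST /embed."}

@app.post("/train")
def train_model() -> dict[str, str]:
    global encoder, decoder, model_trained
    encoder, decoder = train_litautoencoder()
    model_trained = True
    return {"message": "Model trained successfully."}

@app.post("/embed")
def embed(item: NumberFakeImages) -> dict[str, list]:
    if not model_trained:
        raise HTTPException(status_code=400, detail="Train the model first via /train.")
    fake_images = torch.rand(item.n_fake_images, 28 * 28)  # random 'fake' MNIST-shaped inputs
    with torch.no_grad():
        embeddings = encoder(fake_images)
    return {"embeddings": embeddings.tolist()}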

main

This file defines the uvicorn server entry point that runs the FastAPI autoencoder application. It includes:

  1. Importing necessary libraries and modules, including the source code of the project.
  2. The application entry point to run the FastAPI application.
# filepath: main.py
import uvicorn

# NOTE: the import path below is assumed from the project layout; adjust if needed.
from uv_datascience_project_template.app_fastapi_autoencoder import app

# Application entry point
if __name__ == "__main__":
    # Run the FastAPI application on all interfaces, port 8000
    uvicorn.run(app=app, host="0.0.0.0", port=8000)

Production Setup for the Machine Learning FastAPI App Hosted in a Docker Container

See Docker Production Setup for a comprehensive guide.

Dockerfile

The Dockerfile is used to build the Docker image for the project. It includes the following steps:

  1. Define build-time arguments for the base container images and workspace name.
  2. Use a Python image with uv pre-installed.
  3. Set the working directory.
  4. Enable bytecode compilation for faster startup.
  5. Copy and install dependencies without installing the project.
  6. Copy the application source code and install it.
  7. Add executables and source to environment paths.
  8. Set the default command to run the FastAPI application.
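
A condensed sketch of these steps, following the pattern from uv's Docker guidance (the base image tag, paths, and final command are illustrative):

# Use a Python image with uv pre-installed.
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim

# Set the working directory.
WORKDIR /app

# Enable bytecode compilation for faster startup.
ENV UV_COMPILE_BYTECODE=1

# Install dependencies first (cached layer), without installing the project itself.
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-project --no-dev

# Copy the application source code and install the project.
COPY . .
RUN uv sync --frozen --no-dev

# Add the virtual environment's executables to the PATH.
ENV PATH="/app/.venv/bin:$PATH"

# Set the default command to run the FastAPI application.
CMD ["python", "main.py"]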

Multi Stage Dockerfile

To build a multi-stage image that yields an optimized final container without uv, use multistage.Dockerfile (sketched below).
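
A sketch of the multi-stage idea: build the environment with uv in a builder stage, then copy only the finished virtual environment into a plain Python base image, so uv itself never ships in the final image (stage names and tags are illustrative):

# Builder stage: install dependencies and the project with uv.
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
WORKDIR /app
ENV UV_COMPILE_BYTECODE=1
COPY . .
RUN uv sync --frozen --no-dev

# Final stage: a slim Python image without uv.
FROM python:3.12-slim-bookworm
WORKDIR /app
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "main.py"]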

Docker Compose

The docker-compose.yml file is used to define and run multi-container Docker applications. It includes the following configurations:

  1. Build the image from the Dockerfile.
  2. Define the image name.
  3. Host the FastAPI application on port 8000.
  4. Mount the current directory to the app directory in the container.
  5. Set environment variables.
  6. Define the default command to start the FastAPI application.
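
A sketch of what such a service definition can look like (service name, image name, and environment variables are illustrative assumptions; the repository's file defines two services, app and app-optimized-docker):

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    image: uv-datascience-project-template
    ports:
      - "8000:8000"    # host the FastAPI application on port 8000
    volumes:
      - .:/app         # mount the current directory into the container
    environment:
      - PYTHONUNBUFFERED=1
    command: ["python", "main.py"]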

Build the Docker image and run a container

Build and run all services, or a specific one, when multiple services ("app" and "app-optimized-docker") are defined in docker-compose.yml. Note that in the given example both services use the same port, so only one service should run at a time.

docker-compose up --build

Or build and run a single service, either "app" or "app-optimized-docker":

docker-compose up --build app
docker-compose up --build app-optimized-docker

Test the endpoints with curl

  • Welcome root endpoint

    curl -X GET http://0.0.0.0:8000/
  • Get the API docs describing the request options of the FastAPI app:

    curl -X GET http://0.0.0.0:8000/docs
  • Test the machine learning endpoints by training the model first, then requesting embeddings for n fake images:

    curl -X POST http://0.0.0.0:8000/train
    curl -X POST http://0.0.0.0:8000/embed -H "Content-Type: application/json" -d '{"n_fake_images": 4}'

Development in Dev Container

See VSCode Dev-Container (Docker) Setup for Data Science Projects using UV for a comprehensive guide.

  • Run the server: uv run /workspace/main.py
  • Test the standard endpoints with curl:
    • Get the API docs describing the request options of the FastAPI app

      curl -X GET http://localhost:8000/docs
    • Welcome root request of the FastAPI app, providing an app description

      curl -X GET http://localhost:8000/
  • Test the machine learning endpoints with curl:

    curl -X POST http://localhost:8000/train
    curl -X POST http://localhost:8000/embed -H "Content-Type: application/json" -d '{"n_fake_images": 1}'

Conclusion

This repository provides a comprehensive overview of setting up and running the machine learning FastAPI project using Docker and uv. Follow the instructions to build and run the application in both development and production environments. The project demonstrates how to develop and productionize machine learning applications using modern tools and best practices.

Additionally, be sure to review the provided guides and documentation for detailed instructions on the various setups and configurations needed for optimal project performance.