Merge pull request #1 from Infectious-Disease-Modeling-Hubs/bsweger/move-code-from-infrastructure-repo

Bsweger/move code from infrastructure repo
bsweger authored Apr 30, 2024
2 parents f267f26 + 997c7f4 commit 14233c5
Showing 12 changed files with 1,034 additions and 2 deletions.
38 changes: 38 additions & 0 deletions .github/workflows/run-code-checks.yaml
@@ -0,0 +1,38 @@
name: run-code-checks

on:
  push:
  pull_request:
    # to avoid duplicate workflow runs, don't run when PR is synchronized
    # (which is the default) because it will run on the push
    types: [opened, reopened, ready_for_review]
  workflow_dispatch:

jobs:
  run-checks:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout 🛎️
        uses: actions/checkout@v4

      - name: Set up PDM 🗻
        # using the PDM action because actions/setup-python doesn't support
        # caching for PDM out of the box
        # https://github.com/pdm-project/setup-pdm?tab=readme-ov-file#why-do-i-need-this-action
        uses: pdm-project/setup-pdm@v4
        with:
          cache: true

      - name: Install dependencies 📦
        run: pdm install

      - name: lint 🧹
        run: pdm run ruff check

      - name: type check ✔️
        run: pdm run mypy . --ignore-missing-imports

      - name: run tests 🧪
        run: pdm run pytest


1 change: 1 addition & 0 deletions .gitignore
@@ -108,6 +108,7 @@ ipython_config.py
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
.pdm-python

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
28 changes: 28 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,28 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.1.11
    hooks:
      # Run the linter.
      - id: ruff
        args: [ --fix ]
      # Run the formatter.
      - id: ruff-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: check-yaml
        args: [--allow-multiple-documents]
      - id: detect-aws-credentials
        args: [--allow-missing-credentials]
      - id: detect-private-key
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.4
    hooks:
      - id: codespell
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: 'v1.8.0'  # Use the sha / tag you want to point at
    hooks:
      - id: mypy
        additional_dependencies: [types-all]
146 changes: 144 additions & 2 deletions README.md
@@ -1,2 +1,144 @@
# hubverse-transform
Data transform functions for hubverse model-output files
# Hubverse Transform
A package to perform data transformations on hubverse model-output files.

The package contains a `ModelOutputHandler` class that reads, transforms, and writes a single Hubverse-compliant model-output file.

Currently, its primary purpose is to run as an AWS Lambda function that transforms model-output files uploaded to a hub's S3 bucket.
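This commit doesn't include the Lambda wiring itself. As a minimal sketch, assuming the function is triggered by standard S3 event notifications, a handler would first extract the bucket name and object key from each record in the event payload. `get_s3_objects` and `lambda_handler` are hypothetical names, and the handler below only logs what it would transform rather than calling `ModelOutputHandler`, whose S3 usage isn't documented here.

```python
from urllib.parse import unquote_plus


def get_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as '+'),
        # so decode the key before using it as a path.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event: dict, context: object) -> dict:
    # Hypothetical handler: a real one would hand each object to
    # ModelOutputHandler instead of just printing it.
    objects = get_s3_objects(event)
    for bucket, key in objects:
        print(f"would transform s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

For example, a record whose key is `model-output/team+a/file%281%29.csv` yields the decoded key `model-output/team a/file(1).csv`.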

## Usage

To install this package:

```bash
pip install git+https://github.com/Infectious-Disease-Modeling-Hubs/hubverse-transform.git
```

Sample usage:

```python
from hubverse_transform.model_output import ModelOutputHandler

# to use with a local model-output file

mo = ModelOutputHandler(
    '~/code/hubverse-cloud/model-output/UMass-flusion/2023-10-14-UMass-flusion.csv',
    '/.'
)

# read the original model-output file into an Arrow table
original_file = mo.read_file()

# add new columns to the original model_output data
transformed_data = mo.add_columns(original_file)

# write transformed data to parquet
# TODO: fix this up for local filesystem (it's currently designed for S3 writes)
# mo.write(transformed_data)
```

Sample output of the original and transformed data:
```
In [31]: original_file.take([0,1])
Out[31]:
pyarrow.Table
reference_date: date32[day]
location: string
horizon: int64
target: string
target_end_date: date32[day]
output_type: string
output_type_id: double
value: double
----
reference_date: [[2023-10-14,2023-10-14]]
location: [["01","01"]]
horizon: [[0,0]]
target: [["wk inc flu hosp","wk inc flu hosp"]]
target_end_date: [[2023-10-14,2023-10-14]]
output_type: [["quantile","quantile"]]
output_type_id: [[0.01,0.025]]
value: [[0,1.5810684371620558]]
In [36]: transformed_data.take([0,1])
Out[36]:
pyarrow.Table
reference_date: date32[day]
location: string
horizon: int64
target: string
target_end_date: date32[day]
output_type: string
output_type_id: double
value: double
round_id: string
team: string
model: string
----
reference_date: [[2023-10-14,2023-10-14]]
location: [["01","01"]]
horizon: [[0,0]]
target: [["wk inc flu hosp","wk inc flu hosp"]]
target_end_date: [[2023-10-14,2023-10-14]]
output_type: [["quantile","quantile"]]
output_type_id: [[0.01,0.025]]
value: [[0,1.5810684371620558]]
round_id: [["2023-10-14","2023-10-14"]]
team: [["UMass","UMass"]]
...
```
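The transformed table gains `round_id`, `team`, and `model` columns whose values correspond to the file name `2023-10-14-UMass-flusion.csv`. A minimal sketch of how those values could be derived, assuming file names follow `<round_id>-<team>-<model>.<ext>` with an ISO-date round ID and no extra hyphens in the team or model abbreviations; `parse_model_output_filename` is a hypothetical helper, not necessarily how `ModelOutputHandler` does it.

```python
import re
from pathlib import PurePosixPath

# Assumed naming convention: <round_id>-<team>-<model>.<ext>, where round_id
# is a YYYY-MM-DD date and team/model contain no hyphens.
FILENAME_PATTERN = re.compile(
    r"^(?P<round_id>\d{4}-\d{2}-\d{2})-(?P<team>[^-]+)-(?P<model>[^-]+)$"
)


def parse_model_output_filename(path: str) -> dict[str, str]:
    """Derive round_id, team, and model from a model-output file name."""
    stem = PurePosixPath(path).stem  # drops the directories and the extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected model-output file name: {path!r}")
    return match.groupdict()
```

Each value would then be repeated once per row and appended as a `string` column (for example with `pyarrow.Table.append_column`), matching the extra columns in the transformed schema above.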

## Dev setup

If you'd like to contribute, this section has the setup instructions.

**Prerequisites**

The setup instructions below use [PDM](https://pdm-project.org/) to install Python, manage a Python virtual environment, and manage dependencies. However, PDM is strictly required only for managing dependencies (the lockfile is in PDM format), so you can use other tools to install Python and manage virtual environments.

To install PDM: https://pdm-project.org/en/latest/#installation

**Setup**

Follow the directions below to set this project up on your local machine.

1. Clone this repository and change into the project directory.
2. Make sure you have a version of Python installed that meets the `requires-python` constraint in [pyproject.toml](pyproject.toml).

   **Note:** if you don't have Python installed, PDM can install it for you: `pdm python install 3.12.2`
3. Install the project dependencies (this will also create a virtual environment):

   ```bash
   pdm install
   ```
4. Verify that everything is working by running the test suite:

   ```bash
   pdm run pytest
   ```
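Besides the test suite, the `run-code-checks` workflow added in this commit also runs ruff and mypy. A minimal Python sketch for running the same three commands locally, in the workflow's order and stopping at the first failure; the `CHECKS` list and `run_checks` helper are hypothetical, not part of the project.

```python
import subprocess
from collections.abc import Callable, Sequence

# The same commands the run-code-checks workflow runs, in the same order.
CHECKS: list[list[str]] = [
    ["pdm", "run", "ruff", "check"],
    ["pdm", "run", "mypy", ".", "--ignore-missing-imports"],
    ["pdm", "run", "pytest"],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run each check in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        print("running:", " ".join(cmd))
        result = run(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0
```

Call `run_checks()` from the project directory; the `run` parameter exists so the command runner can be swapped out, for example in tests.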

To sync project dependencies after pulling upstream code changes:

```bash
pdm sync
```

## Adding Dependencies

This project uses PDM to manage dependencies and add them to a cross-platform lockfile.

To add a new dependency:

```bash
pdm add [package-name]
```

To add a new dev dependency:

```bash
pdm add --dev [package-name]
```

The `pdm add` command will install the package, add it to [`pyproject.toml`](pyproject.toml), and update [`pdm.lock`](pdm.lock).

Refer to [PDM's documentation](https://pdm-project.org/latest/usage/dependency/) for complete information about adding dependencies.