Add a utility script that re-triggers hubverse-transform-model-output lambda

First iteration of a way to re-trigger the lambda function that fires
when new data is added to the raw/model-output/ folder of a hub's
S3 bucket.
bsweger committed Aug 20, 2024
1 parent a52e249 commit 9f268e5
Showing 5 changed files with 85 additions and 3 deletions.
14 changes: 13 additions & 1 deletion README.md
@@ -175,4 +175,16 @@ To package the hubverse_transform code for deployment to the `hubverse-transform
3. From the root of this project, run the deploy script:
```bash
source deploy_lambda.sh
```

### Re-processing model-output files that have already been transformed

If you need to re-run the hubverse-transform function on model-output files that have already been uploaded to S3,
you can use the `lambda_retrigger_model_output_add.py` script in this repo's `faas/` folder.
This manual action should be done with care but can be handy if data needs to be re-processed (in the event of a
hubverse-transform bug fix, for example). The script works by updating the S3 metadata for every file in the
`raw/model-output` folder of the hub's S3 bucket. The metadata update then triggers the lambda function that runs
when new incoming model-output files are detected.

**Note:** You will need write access to the hub's S3 bucket to use this script.
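
The metadata-update approach described above can be sketched without touching AWS. This is a minimal illustration, not the script itself: `build_touch_copy_args` is a hypothetical helper, and the bucket and key names are made up. It builds the arguments for copying an S3 object onto itself with replaced metadata, which is the operation the script relies on. Whether the copy actually fires the lambda depends on the bucket's event notification configuration, which this commit does not show.

```python
from datetime import datetime, timezone


def build_touch_copy_args(bucket: str, key: str, metadata: dict, stamp: str) -> dict:
    """Return keyword arguments for an in-place S3 copy that rewrites metadata.

    Hypothetical helper for illustration; the script in this commit builds
    the same arguments inline.
    """
    new_metadata = dict(metadata)  # copy so the caller's dict is left untouched
    # Same metadata key the script in this commit uses.
    new_metadata["x-amz-meta-manual-update"] = stamp
    return {
        "CopySource": {"Bucket": bucket, "Key": key},
        "Metadata": new_metadata,
        # REPLACE tells S3 to store the supplied metadata instead of
        # carrying over the source object's metadata unchanged.
        "MetadataDirective": "REPLACE",
    }


# Made-up bucket and key, for illustration only.
stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")
args = build_touch_copy_args(
    "example-hub",
    "raw/model-output/example-team/example-file.csv",
    {"owner": "example-team"},
    stamp,
)
print(args)
```

With a real `boto3` S3 object, these arguments would be passed as `s3_object.copy_from(**args)`. Because the directive is `REPLACE`, any existing metadata has to be included in the new dictionary or it is dropped, which is why the helper starts from a copy of the current metadata.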
65 changes: 65 additions & 0 deletions faas/lambda_retrigger_model_output_add.py
@@ -0,0 +1,65 @@
"""
Update metadata for all files in the raw/model-output/ directory of a specified AWS S3 bucket.
This script can be used to force a re-trigger of the lambda that transforms hubverse model-output files.
"""
import argparse
from datetime import datetime, timezone

import boto3
from botocore import exceptions as boto_exceptions


def main():
    parser = argparse.ArgumentParser(
        description="Re-trigger lambda that transforms hubverse model-output files for AWS S3 storage"
    )

    parser.add_argument(
        "s3_bucket",
        metavar="Hubverse S3 bucket",
        type=str,
        help="""
            A Hubverse S3 bucket name. Metadata of the files in the raw/model-output/
            directory of this bucket will be updated to trigger the transform lambda.
            """,
    )

    args = parser.parse_args()
    s3_bucket = args.s3_bucket
    print(f"Updating metadata for all files in {s3_bucket}/raw/model-output/\n")

    update_date = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")
    updated_file_count = 0

    try:
        s3 = boto3.client("s3")
        paginator = s3.get_paginator("list_objects_v2")
        pages = paginator.paginate(Bucket=s3_bucket, Prefix="raw/model-output/")

        for page in pages:
            for obj in page.get("Contents", []):
                key = obj["Key"]
                print(f"Processing {key}")

                s3_resource = boto3.resource("s3")
                s3_object = s3_resource.Object(s3_bucket, key)
                s3_object.metadata.update({"x-amz-meta-manual-update": update_date})
                s3_object.copy_from(
                    CopySource={"Bucket": s3_bucket, "Key": key},
                    Metadata=s3_object.metadata,
                    MetadataDirective="REPLACE",
                )
                updated_file_count += 1

    except boto_exceptions.NoCredentialsError:
        print("No AWS credentials found. Please configure your AWS credentials.")
    except boto_exceptions.ClientError as e:
        print("Boto client error - ", e)
    except Exception as e:
        print("Error - ", e)

    print(f"Updated metadata for {updated_file_count} files in {s3_bucket}/raw/model-output/")


if __name__ == "__main__":
    main()
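
Since the script rewrites every object under the prefix, it may be useful to preview which keys would be touched before running it. The following is a rough sketch under stated assumptions, not part of this commit: `iter_keys` is a hypothetical helper, and `fake_pages` is hand-written data standing in for what `paginator.paginate(...)` yields. It mirrors the nested loop in `main()`, including the `page.get("Contents", [])` guard for pages that list no objects.

```python
from typing import Iterable, Iterator


def iter_keys(pages: Iterable[dict]) -> Iterator[str]:
    """Yield every object key from a sequence of list_objects_v2-style pages.

    Hypothetical helper for illustration; main() in this commit does the
    same traversal inline.
    """
    for page in pages:
        # A page with no matching objects has no "Contents" entry,
        # so fall back to an empty list instead of raising KeyError.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Hand-written stand-in for paginator output; the keys are made up.
fake_pages = [
    {"Contents": [{"Key": "raw/model-output/a.csv"}, {"Key": "raw/model-output/b.parquet"}]},
    {},  # a page without "Contents"
    {"Contents": [{"Key": "raw/model-output/c.csv"}]},
]

keys = list(iter_keys(fake_pages))
print(f"{len(keys)} files would be updated:")
for key in keys:
    print(f"  {key}")
```

Passing the real paginator pages to a helper like this, and printing the keys instead of calling `copy_from`, would give a dry run that needs only read access to the bucket.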
1 change: 1 addition & 0 deletions pyproject.toml
@@ -11,6 +11,7 @@ classifiers = [
]

dependencies = [
"boto3",
"pyarrow>=16.0.0",
"cloudpathlib[s3]",
]
4 changes: 3 additions & 1 deletion requirements/requirements-dev.txt
@@ -1,7 +1,9 @@
# This file was autogenerated by uv via the following command:
# uv pip compile pyproject.toml --extra dev -o requirements/requirements-dev.txt
boto3==1.34.109
-    # via cloudpathlib
+    # via
+    #   hubverse-transform (pyproject.toml)
+    #   cloudpathlib
botocore==1.34.109
# via
# boto3
4 changes: 3 additions & 1 deletion requirements/requirements.txt
@@ -1,7 +1,9 @@
# This file was autogenerated by uv via the following command:
# uv pip compile pyproject.toml -o requirements/requirements.txt
boto3==1.34.109
-    # via cloudpathlib
+    # via
+    #   hubverse-transform (pyproject.toml)
+    #   cloudpathlib
botocore==1.34.109
# via
# boto3