Add lambda support #7

Merged
merged 3 commits into from
May 6, 2024
@bsweger bsweger commented May 2, 2024

Resolves #4

Although the hubverse_transform package in this repo can be installed and run anywhere, our most immediate need is to run it as an AWS Lambda function (Lambda is AWS's function-as-a-service offering).

To get this deployed to the hubverse-transform-model-output Lambda that already exists in the Hubverse's AWS account*, this PR adds two things:

  1. A handler function that is invoked whenever an AWS event triggers the Lambda (for example, when a hub's bucket receives a new model-output file)
  2. A deployment script that packages the code into the .zip structure required by Lambda
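
As a rough illustration of the packaging step, here is a minimal Python sketch (the PR itself uses a bash script, which is not reproduced here). For Python runtimes, Lambda expects the handler module and its dependencies at the root of the .zip archive, so the sketch stores every file under a path relative to the build directory. The function name `build_lambda_zip` and the file name `lambda_function.py` are hypothetical and do not come from this PR.

```python
import zipfile
from pathlib import Path


def build_lambda_zip(build_dir: Path, zip_path: Path) -> list[str]:
    """Zip every file under build_dir, storing paths relative to build_dir.

    Hypothetical helper: not part of the hubverse_transform package.
    """
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(build_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in build_dir.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            # Relative names put the handler and packages at the archive root.
            arcname = path.relative_to(build_dir).as_posix()
            zf.write(path, arcname)
            archived.append(arcname)
    return archived
```

Called on a build directory that holds `lambda_function.py` and a `hubverse_transform/` package, this would produce an archive with both at the top level, which is the layout Lambda unpacks and imports from.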


bsweger added 2 commits May 2, 2024 15:09
hubverse_transform doesn't use boto3: it was there for earlier testing
…mbda function

This changeset is the first step for getting this code deployable by
the hubverse-transform-model-output lambda function that exists in
the Hubverse's AWS account. This version of the code supports manually
updating the code in the lambda function by running a bash script.

The next version will do the deployment via GitHub Actions, as changes
are merged into the main branch.
logger.setLevel("INFO")


def lambda_handler(event, context):

The purpose of this function is to receive the AWS S3 events that are emitted whenever a new/updated model-output file lands in a hub's S3 bucket. The function parses the event to get the name of the hub's S3 bucket and the name ("key") of the model-output file.

The bucket name + key are then used to create a ModelOutputHandler object and transform the data.

For reference, this is an example of such an event:

{
    "Records": [
        {
            "eventVersion": "2.1",
            "eventSource": "aws:s3",
            "awsRegion": "us-east-1",
            "eventTime": "2024-05-02T19:06:28.151Z",
            "eventName": "ObjectCreated:Put",
            "userIdentity": {
                "principalId": "howdy"
            },
            "requestParameters": {
                "sourceIPAddress": "redacted"
            },
            "responseElements": {
                "x-amz-request-id": "S7DY1N8KZP1F8J35",
                "x-amz-id-2": "LeRQgYUNlXYiMdY+ibFpvF0XUSjcu5tgyUyhufmsSavl+oPrpKF2L5/J1MYfe+F0wEKHUGnC+BxxOrQKhTiXPKmuAxVFA48R"
            },
            "s3": {
                "s3SchemaVersion": "1.0",
                "configurationId": "howdy",
                "bucket": {
                    "name": "hubverse-cloud",
                    "ownerIdentity": {
                        "principalId": "howdy"
                    },
                    "arn": "arn:aws:s3:::hubverse-cloud"
                },
                "object": {
                    "key": "raw/model-output/UMass-flusion/2023-10-14-UMass-flusion.csv",
                    "size": 357048,
                    "eTag": "af50008c99f29b39310b57be5f474d28",
                    "versionId": "nPvSV7RK1UNFXMGHNY9pDiNVbb5zBTJI",
                    "sequencer": "006633E433E70A377B"
                }
            }
        }
    ]
}
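
To make the parsing step concrete, here is a minimal sketch of pulling the bucket name and key out of an event shaped like the one above. The helper name `extract_bucket_and_key` is hypothetical and is not the code in this PR. S3 URL-encodes object keys in event notifications, so the sketch decodes them before use.

```python
from urllib.parse import unquote_plus


def extract_bucket_and_key(event: dict) -> list[tuple[str, str]]:
    """Return a (bucket, key) pair for every record in an S3 event notification.

    Hypothetical helper: not part of the hubverse_transform package.
    """
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys arrive URL-encoded (a space is sent as "+", "=" as "%3D").
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Trimmed copy of the example event above: only the fields the helper reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "hubverse-cloud"},
                "object": {
                    "key": "raw/model-output/UMass-flusion/2023-10-14-UMass-flusion.csv"
                },
            },
        }
    ]
}
```

Each returned pair could then be passed to `ModelOutputHandler.from_s3(bucket, key)`, as the handler in this PR does.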


logger.info("Transforming file: {}/{}".format(bucket, key))
try:
    mo = ModelOutputHandler.from_s3(bucket, key)

This is where the handler actually invokes our code that performs the data transformations.

This script looks worse than it is. Ultimately, these steps will be performed in the context of a GitHub action (instead of being run by a human).

mkdir -p $build_dir/hubverse_transform

# output project requirements
pdm export --without dev --format requirements > $build_dir/requirements.txt

Will delete this once pdm is removed.

This will ensure the lambda handler doesn't break when we add
s3:ObjectRemoved triggers to hubs' S3 buckets.
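
One way such a guard could look is sketched below. This is an assumption about the approach, not the code from this commit: the function names are hypothetical, and the only check is the record's `eventName`, which S3 sets to values such as `ObjectCreated:Put` or `ObjectRemoved:Delete`.

```python
def is_object_created(record: dict) -> bool:
    """True when an S3 event record describes a new or updated object."""
    return record.get("eventName", "").startswith("ObjectCreated:")


def records_to_transform(event: dict) -> list[dict]:
    """Keep only the records the transform should act on.

    Hypothetical helper: removal events (and anything else) are skipped
    instead of being passed on to the transform code.
    """
    return [r for r in event.get("Records", []) if is_object_created(r)]
```

With a filter like this in front of the transform, an `ObjectRemoved` event yields an empty list and the handler has nothing to process, rather than trying to read a file that no longer exists.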
@matthewcornell

Approved via joint review session.

@bsweger bsweger merged commit 6ea45cf into main May 6, 2024
1 check passed
@bsweger bsweger deleted the bsweger/add-lambda-support branch May 30, 2024 19:12
Successfully merging this pull request may close these issues.

Run hubverse-transform's transform-model-output function in AWS lambda