[SageMaker] [GraphBolt] Add support for launching GraphBolt jobs on SageMaker #1083

Merged on Nov 11, 2024 (13 commits)

@thvasilo thvasilo commented Nov 1, 2024

Issue #, if available:

Description of changes:

  • Add a new SageMaker job that converts DistPart data to GraphBolt. This is currently our only option because SageMaker offers no way to use S3 directly as a writable, shared file system; see Add support for direct S3 access on SageMaker tasks #1081 for details.
  • sagemaker/launch_graphbolt_convert.py launches the SageMaker job, which downloads the entire partitioned graph to one instance and then runs the GraphBolt (GB) conversion one partition at a time. Because DGL writes the new fused CSC graph representation to the same directory as the input data, we can't use one of SageMaker's FastFile modes to stream the data, as those create read-only filesystems.
  • [Optional] Also include an example of how one could use a SageMaker Pipeline to run the GSPartition and GBConvert jobs in sequence. This can be removed (because SageMaker Pipelines are persistent once created).
  • Add a unit test mechanism for the SageMaker scripts, starting with tests of our parsing logic. To make the scripts available to the runner's Python runtime, we add the graphstorm/sagemaker/launch directory to the runner's PYTHONPATH.
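The download-then-convert flow described in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `convert_partitions`, `fake_convert`, and `demo` are hypothetical names, the `metadata.json` layout with a `num_parts` key and `part{i}` directories is an assumption about the partition format, and the output file name is a placeholder. The real job would call DGL's conversion instead of `fake_convert`.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def convert_partitions(
    readonly_input: Path,
    work_dir: Path,
    convert_one: Callable[[Path], Path],
    metadata_name: str = "metadata.json",  # assumed metadata file name
) -> List[Path]:
    """Copy a partitioned graph to a writable directory, then convert it
    one partition at a time. Returns the paths of the written outputs."""
    # The converter writes next to its input, so work on a writable copy
    # instead of the (possibly read-only) input location.
    local_copy = work_dir / "graph"
    shutil.copytree(readonly_input, local_copy)

    with open(local_copy / metadata_name, encoding="utf-8") as f:
        metadata = json.load(f)

    outputs = []
    # Sequential loop: only one partition is being converted at any time.
    for part_id in range(metadata["num_parts"]):
        part_dir = local_copy / f"part{part_id}"  # assumed directory layout
        outputs.append(convert_one(part_dir))
    return outputs


def fake_convert(part_dir: Path) -> Path:
    """Hypothetical stand-in for the real GraphBolt conversion."""
    out_path = part_dir / "fused_csc_sampling_graph.pt"  # placeholder name
    out_path.write_bytes(b"placeholder")
    return out_path


def demo() -> List[str]:
    """Build a tiny fake two-partition graph and run the sketch on it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input"
        for i in range(2):
            (src / f"part{i}").mkdir(parents=True)
            (src / f"part{i}" / "graph.dgl").write_bytes(b"")
        (src / "metadata.json").write_text(json.dumps({"num_parts": 2}))
        work = Path(tmp) / "work"
        work.mkdir()
        outputs = convert_partitions(src, work, fake_convert)
        return [p.relative_to(work).as_posix() for p in outputs]
```

Passing the converter in as a callable keeps the sketch runnable without DGL installed and makes the loop easy to unit test.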

EDIT: One note about the PR: the changes to the partition launch that use a SageMaker Pipeline are for demonstration purposes. I think I'll remove them altogether and just have separate partition/gbconvert jobs. But we might want an example of how to programmatically build an SM pipeline, e.g. from gsprocessing to training (as SM jobs).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@thvasilo thvasilo self-assigned this Nov 1, 2024
@thvasilo thvasilo changed the title [SageMaker] [GraphBolt] Add support for launching GraphBolt jobs on S… [SageMaker] [GraphBolt] Add support for launching GraphBolt jobs on SageMaker Nov 1, 2024
@thvasilo thvasilo added this to the 0.4 release milestone Nov 4, 2024
@thvasilo thvasilo marked this pull request as ready for review November 4, 2024 23:00
@thvasilo thvasilo added the ready able to trigger the CI label Nov 5, 2024
@classicsong classicsong left a comment

How do you launch the convert2graphbolt task?

@jalencato jalencato left a comment

Overall, I think we need to run a separate SageMaker regression test before we actually merge the code, to make sure there are no other backward compatibility problems.

@thvasilo thvasilo merged commit 19fac3b into awslabs:main Nov 11, 2024
8 checks passed
Labels
0.4 ready able to trigger the CI sagemaker
3 participants