[SageMaker] [GraphBolt] Add support for launching GraphBolt jobs on SageMaker #1083

thvasilo · 2024-11-01T00:57:22Z

Issue #, if available:

Description of changes:

Add a new SageMaker job to convert DistPart data to GraphBolt. This is currently our only option, because there is no way to use S3 directly as a writable, shared file system in SageMaker; see "Add support for direct S3 access on SageMaker tasks" #1081 for details.
The sagemaker/launch_graphbolt_convert.py script launches a SageMaker job that downloads the entire partitioned graph to a single instance and then runs the GraphBolt (GB) conversion one partition at a time. Because DGL writes the new fused CSC graph representation to the same directory as the input data, we cannot use one of SageMaker's FastFile modes to stream the data, since those modes create read-only file systems.
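The sequential, per-partition conversion described above can be sketched as follows. This is a minimal sketch, not the code in the launch or entry-point scripts: it assumes a DGL partition config JSON with a `num_parts` key and `part-{i}` entries whose `part_graph` path is relative to the config file, and `convert_partition` is a hypothetical callback standing in for the actual GraphBolt conversion call. The writability check illustrates why a read-only FastFile mount does not work here.

```python
import json
import os
from pathlib import Path
from typing import Callable, List


def convert_partitions_sequentially(
    part_config_path: str,
    convert_partition: Callable[[Path], Path],
) -> List[Path]:
    """Run a per-partition conversion over a locally downloaded DistPart output.

    Partitions are processed one at a time. ``convert_partition`` is a
    hypothetical callback standing in for the real GraphBolt conversion; it
    receives a partition directory and returns the path of the file it wrote.
    """
    config_path = Path(part_config_path)
    with open(config_path, "r", encoding="utf-8") as f:
        part_meta = json.load(f)

    outputs = []
    for part_id in range(part_meta["num_parts"]):
        # Assumption: each "part-<i>" entry holds a "part_graph" path that is
        # relative to the directory containing the partition config.
        graph_path = config_path.parent / part_meta[f"part-{part_id}"]["part_graph"]
        part_dir = graph_path.parent
        # The converted graph is written next to the input data, so the
        # partition directory must be writable (unlike a read-only FastFile mount).
        if not os.access(part_dir, os.W_OK):
            raise PermissionError(f"Partition directory is not writable: {part_dir}")
        outputs.append(convert_partition(part_dir))
    return outputs
```

Processing one partition at a time keeps peak memory bounded by the largest single partition rather than the whole graph, at the cost of running the conversions serially on one instance.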
[Optional] We also include an example of how to use a SageMaker Pipeline to run the GSPartition and GBConvert jobs in sequence. This part can be removed, because SageMaker Pipelines persist once created.
Added a unit-test mechanism for the SageMaker scripts, starting with tests for our argument-parsing logic. To make the scripts importable from the runner's Python runtime, we add the graphstorm/sagemaker/launch directory to the runner's PYTHONPATH.
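A parsing test of the kind described above might look like the following. This is a hypothetical sketch: `parse_launch_args` and its `--role`/`--instance-count` flags are illustrative stand-ins, not the actual functions in sagemaker/launch/common_parser.py. It assumes the launcher forwards unknown arguments to the GraphStorm task and splits a single quoted argument string into separate tokens.

```python
import argparse
import shlex
from typing import List, Tuple


def parse_launch_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Hypothetical launcher parser: known SageMaker args are parsed, and any
    unknown args are returned so they can be forwarded to the task."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--role", required=True)
    parser.add_argument("--instance-count", type=int, default=1)
    args, unknown = parser.parse_known_args(argv)
    # A single quoted string such as "--num-epochs 10 --lr 0.01" arrives as one
    # element; split it into separate tokens before forwarding.
    if len(unknown) == 1:
        unknown = shlex.split(unknown[0])
    return args, unknown


def test_unknown_args_are_forwarded():
    args, unknown = parse_launch_args(
        ["--role", "my-role", "--instance-count", "2", "--num-epochs", "10"]
    )
    assert args.role == "my-role"
    assert args.instance_count == 2
    assert unknown == ["--num-epochs", "10"]


def test_single_quoted_arg_string_is_split():
    _, unknown = parse_launch_args(["--role", "r", "--num-epochs 10 --lr 0.01"])
    assert unknown == ["--num-epochs", "10", "--lr", "0.01"]
```

With the launch directory on PYTHONPATH, a real test would import the parser from the launch module instead of defining it inline, and pytest would collect the `test_*` functions automatically.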

EDIT: One note about this PR: the changes to the partition launch that use a SageMaker Pipeline are for demonstration purposes. I think I'll remove them altogether and keep separate partition and GB-convert jobs. However, we might want an example of how to build a SageMaker Pipeline programmatically, e.g. from GSProcessing to training (as SageMaker jobs).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


docs/source/advanced/using-graphbolt.rst

python/graphstorm/gpartition/convert_to_graphbolt.py

sagemaker/launch/common_parser.py

sagemaker/launch/launch_train.py

sagemaker/launch/launch_partition.py

tests/sagemaker-tests/test_sagemaker_args.py

.github/workflow_scripts/pytest_check.sh

docker/sagemaker/Dockerfile.sm

docs/source/advanced/using-graphbolt.rst

sagemaker/run/gconstruct_entry.py

tests/sagemaker-tests/test_sagemaker_args.py

sagemaker/launch/launch_partition.py

Co-authored-by: xiang song(charlie.song)

classicsong

How do you launch the convert2graphbolt task?

docs/source/advanced/using-graphbolt.rst

sagemaker/launch/common_parser.py

tests/sagemaker-tests/test_sagemaker_args.py

Co-authored-by: xiang song(charlie.song)

jalencato

Overall, I think we need to run a separate SageMaker regression test before merging, to make sure there are no other backward-compatibility problems.

python/graphstorm/sagemaker/sagemaker_partition.py

sagemaker/launch/launch_graphbolt_convert.py

Co-authored-by: xiang song(charlie.song)

docs/source/advanced/using-graphbolt.rst

sagemaker/run/gb_convert_entry.py

thvasilo added 0.4 sagemaker labels Nov 1, 2024

thvasilo self-assigned this Nov 1, 2024

thvasilo changed the title ~~[SageMaker] [GraphBolt] Add support for launching GraphBolt jobs on S…~~ [SageMaker] [GraphBolt] Add support for launching GraphBolt jobs on SageMaker Nov 1, 2024

[SageMaker] [GraphBolt] Add support for launching GraphBolt jobs on SageMaker

de02686

thvasilo force-pushed the gsf-sm-graphbolt branch from 91cb60a to de02686 November 4, 2024 22:43

thvasilo added this to the 0.4 release milestone Nov 4, 2024

thvasilo marked this pull request as ready for review November 4, 2024 23:00

thvasilo requested review from classicsong and jalencato November 4, 2024 23:00

jalencato reviewed Nov 4, 2024

Add sagemaker unit tests

979e3b0

thvasilo added the ready label (able to trigger the CI) Nov 5, 2024

thvasilo commented Nov 5, 2024

tests/sagemaker-tests/test_sagemaker_args.py

classicsong reviewed Nov 5, 2024

thvasilo and others added 3 commits November 5, 2024 14:00

Apply suggestions from code review

6d10766

Co-authored-by: xiang song(charlie.song)

Apply review comments

3a92a8e

Fix SM entry point bash script

e5c5e95

thvasilo force-pushed the gsf-sm-graphbolt branch from cc85064 to 03bb561 November 5, 2024 23:49

Handle single string unknown args

99e3b53

thvasilo force-pushed the gsf-sm-graphbolt branch from 03bb561 to 99e3b53 November 5, 2024 23:53

classicsong reviewed Nov 6, 2024

docs/source/advanced/using-graphbolt.rst Outdated

sagemaker/launch/common_parser.py Outdated

sagemaker/launch/common_parser.py Outdated

tests/sagemaker-tests/test_sagemaker_args.py

thvasilo and others added 3 commits November 6, 2024 16:26

Add missing launch script

a0d7e2b

Apply suggestions from code review

66729f4

Co-authored-by: xiang song(charlie.song)

Handle single-arg quoted gsf arg string

8fca6b0

jalencato reviewed Nov 6, 2024

python/graphstorm/sagemaker/sagemaker_partition.py

classicsong reviewed Nov 7, 2024

thvasilo and others added 2 commits November 7, 2024 17:01

Add missing entry point

463bccb

Apply suggestions from code review

56b9ebd

Co-authored-by: xiang song(charlie.song)

jalencato reviewed Nov 7, 2024

docs/source/advanced/using-graphbolt.rst

Auto-format by black, replace print with logging, fix docs

d1f96bb

thvasilo force-pushed the gsf-sm-graphbolt branch from 1b72e36 to d1f96bb November 7, 2024 18:35

classicsong reviewed Nov 7, 2024

sagemaker/run/gb_convert_entry.py

sagemaker/run/gb_convert_entry.py

Merge branch 'main' into gsf-sm-graphbolt

e47617f

classicsong approved these changes Nov 11, 2024

thvasilo merged commit 19fac3b into awslabs:main Nov 11, 2024
8 checks passed
