[RELEASE] cugraph-gnn v25.02 #139

Open. Wants to merge 37 commits into main.

raydouglass
Member

❄️ Code freeze for branch-25.02 and v25.02 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-25.02 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-25.02 into main for the release

raydouglass and others added 30 commits November 18, 2024 09:46
Forward-merge branch-24.12 into branch-25.02
Forward-merge branch-24.12 into branch-25.02
Forward-merge branch-24.12 into branch-25.02
Forward-merge branch-24.12 into branch-25.02
Adds a workflow that triggers a second workflow which sends a
notification to a designated Slack channel on every PR labelled with
breaking, whenever any of the following events are triggered on the PR:

- closed
- reopened
- labeled
- unlabeled
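
The filtering rule described above (notify only for the four listed event actions, and only when the PR carries the `breaking` label) could be sketched as follows. This is an illustration, not the workflow added by the commit: the function name `should_notify` and its argument layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: should a PR event trigger the Slack notification?
# Arguments: the event action, followed by the PR's labels (one per argument).
should_notify() {
    local action="$1"
    shift
    # Only these four event actions are of interest.
    case "${action}" in
        closed|reopened|labeled|unlabeled) ;;
        *) return 1 ;;
    esac
    # The PR must carry the "breaking" label.
    local label
    for label in "$@"; do
        if [[ "${label}" == "breaking" ]]; then
            return 0
        fi
    done
    return 1
}

should_notify labeled "breaking" "improvement" && echo "notify"
should_notify opened "breaking" || echo "ignore"
```

In a real GitHub Actions workflow this condition would more likely live in the workflow's trigger and `if:` settings than in a script; the shell form is used here only to make the rule explicit.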

Depends on rapidsai/shared-workflows#257
By default, CI runs on draft PRs. This leads to many CI runs that may be
unnecessary.

With this PR's change to `.github/copy-pr-bot.yaml`, an `/ok to test`
comment from a trusted user is required to trigger CI on draft PRs.
Non-draft PRs will run CI by default, assuming that all commits are
signed by trusted users. Otherwise an `/ok to test` is required (as
before) -- see the `copy-pr-bot` docs at
https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/ for more
information.
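
The resulting policy can be summarized as a small decision function. This is only a sketch of the rule as described in this commit message, not how `copy-pr-bot` is implemented; the function name `needs_ok_to_test` and its two boolean-style arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical summary of the policy: does a PR need an "/ok to test" comment
# from a trusted user before CI runs?
# Arguments: is_draft ("true"/"false"), all_commits_trusted ("true"/"false").
needs_ok_to_test() {
    local is_draft="$1" all_commits_trusted="$2"
    # Draft PRs always need an explicit comment.
    if [[ "${is_draft}" == "true" ]]; then
        return 0
    fi
    # Non-draft PRs need one only if some commit is not signed by a trusted user.
    if [[ "${all_commits_trusted}" != "true" ]]; then
        return 0
    fi
    return 1
}

if needs_ok_to_test "true" "true"; then
    echo "draft PR: waiting for /ok to test"
fi
if ! needs_ok_to_test "false" "true"; then
    echo "non-draft PR with trusted commits: CI runs by default"
fi
```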

Part of rapidsai/build-planning#123.
Forward-merge branch-24.12 into branch-25.02
All of this repo's `conda-python-tests` jobs have conditions in them like "skip on ARM":

https://github.com/rapidsai/cugraph-gnn/blob/2dd300122dfd6fdea70c9d20c276a3c5946b7613/ci/test_python.sh#L100

https://github.com/rapidsai/cugraph-gnn/blob/2dd300122dfd6fdea70c9d20c276a3c5946b7613/ci/test_python.sh#L141

https://github.com/rapidsai/cugraph-gnn/blob/2dd300122dfd6fdea70c9d20c276a3c5946b7613/ci/test_python.sh#L183

As a result, the arm64 `conda-python-tests` jobs are currently just wasting CI resources: each spends 5-10 minutes occupying a GPU runner just to download some datasets and then exit ([example build link](https://github.com/rapidsai/cugraph-gnn/actions/runs/11858773988/job/33056063652?pr=69)).

This proposes never even starting those jobs, to make CI here less expensive.
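
For context, an in-script "skip on ARM" guard generally looks something like the sketch below. It is an illustration of the pattern, not the contents of `ci/test_python.sh`: the function name `should_run_tests` is hypothetical, and the architecture is accepted as an optional argument so the check can be exercised on any machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a test suite should run on a CPU architecture.
# Returns 0 (run) for anything except aarch64/arm64, and 1 (skip) for those.
should_run_tests() {
    local arch="${1:-$(uname -m)}"
    case "${arch}" in
        aarch64|arm64) return 1 ;;
        *) return 0 ;;
    esac
}

if should_run_tests; then
    echo "running tests on $(uname -m)"
else
    echo "skipping tests on $(uname -m)"
fi
```

A guard like this only takes effect after the job has already claimed a runner, which is why not starting the arm64 jobs at all is the cheaper option.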

## Notes for Reviewers

### But why are we skipping arm at all?

Lack of pytorch packages. See #61 (comment)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #70
Follow-up to these PRs:

* rapidsai/devcontainers#417
* #68

Proposes adding devcontainers and a devcontainers CI job to the repo.

## Notes for Reviewers

### Benefits of these changes

* faster and easier local development
* reduced risk of changes here breaking the RAPIDS unified devcontainers maintained in https://github.com/rapidsai/devcontainers

Similar to rapidsai/nx-cugraph#25

### How I made these changes

Copied the `.devcontainer/` directory from https://github.com/rapidsai/cugraph, then just changed `cugraph` references to `cugraph-gnn`.

### How I tested this

Tested the `update-version.sh` changes like this:

```shell
./ci/release/update-version.sh '25.04.00'
git grep -E '25\.[0-9]+'
```

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #79
Forward-merge branch-24.12 into branch-25.02
Update version references in breaking-change trigger workflow

Authors:
  - Jake Awe (https://github.com/AyodeAwe)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #93
Fixes #94

Uploads API docs for `libwholegraph`, to be used by rapidsai/cugraph-docs#46

Also removes `sphinx` dependencies... this repo only needs to produce Doxygen docs for `libwholegraph`, all the other Sphinx stuff will be done in https://github.com/rapidsai/cugraph-docs.

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Don Acosta (https://github.com/acostadon)

Approvers:
  - Don Acosta (https://github.com/acostadon)
  - Bradley Dice (https://github.com/bdice)

URL: #96
The branch build triggered by merging #96 failed immediately.

> The workflow is not valid. .github/workflows/build.yaml (Line: 47, Col: 12): Job 'docs-build' depends on unknown job 'conda-cpp-build'.

([build link](https://github.com/rapidsai/cugraph-gnn/actions/runs/12379736454))

This fixes that.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Don Acosta (https://github.com/acostadon)
  - Bradley Dice (https://github.com/bdice)

URL: #97
Proposes miscellaneous small changes:

* removes unused dependency groups in `dependencies.yaml`
* removes commented-out CMake code
* fixes lingering references to things from the `cugraph` repo

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/linhu-nv
  - Ray Douglass (https://github.com/raydouglass)

URL: #98
…102)

Proposes some miscellaneous packaging cleanup:

* sets minimum CMake version to 3.26.4 everywhere, to match the rest of RAPIDS
* removes commented-out CMake code
* removes unnecessary variables throughout CMake code
  - *including consolidating version references to use `RAPIDS_VERSION` from https://github.com/rapidsai/cugraph-gnn/blob/af22a1271251dc6b02d91cd593ac32b504356b8d/rapids_config.cmake#L20*
* updates some `pre-commit` hooks to their latest versions

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/linhu-nv
  - Bradley Dice (https://github.com/bdice)

URL: #102
`wholegraph`'s CMake has some configuration to run `flake8`, `clang-tidy`, and `clang-format` via CMake.

This proposes removing it.

## Notes for Reviewers

### Risks to doing this?

I don't think any.

`flake8` and `clang-format` configs are unnecessary, as those are already run via `pre-commit` here:

https://github.com/rapidsai/cugraph-gnn/blob/71675d868589ff9f904197f729985de1555cb914/.pre-commit-config.yaml#L22

https://github.com/rapidsai/cugraph-gnn/blob/71675d868589ff9f904197f729985de1555cb914/.pre-commit-config.yaml#L37

The `clang-tidy` support must not actually be used today... it refers to a script that doesn't exist in this repo:

https://github.com/rapidsai/cugraph-gnn/blob/71675d868589ff9f904197f729985de1555cb914/cpp/cmake/CodeChecker.cmake#L42-L43

This code had been in the `wholegraph` repo for a while... it was added in June 2023 (rapidsai/wholegraph#24) and then never modified again.

### Benefits of doing this?

Similar to #102, I'm putting up PRs like this because I'm planning to attempt to add `libwholegraph` wheels, and want to simplify the `wholegraph` / `pylibwholegraph` CMake as much as possible before doing that, to reduce the implementation and reviewing effort.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/linhu-nv

URL: #103
The nightly tests have been failing despite all runs succeeding because the workflow's logic for filtering out notebook runs is invalid. Examples: https://github.com/rapidsai/cugraph-gnn/actions/runs/12649473784, https://github.com/rapidsai/cugraph-gnn/actions/runs/12609514484. Hopefully this change is sufficient to get the nightly suite passing.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #105
The pull-request input is simply wrong, while the other inputs are necessary to pull the correct artifacts.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #106
Contributes to rapidsai/build-planning#127

This PR cannot be merged unless nightly CI has passed within the past 7 days, so if it remains unmerged that will itself be an indication that nightly CI needs fixing.
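
The "passed within the past 7 days" rule amounts to a simple age comparison. The sketch below shows that arithmetic only; it is not the actual check used in CI, and the function name `nightly_is_recent` and its epoch-seconds arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the last passing nightly run is recent enough.
# Arguments: last-success time and current time (Unix epoch seconds),
# plus an optional maximum age in days (default 7).
nightly_is_recent() {
    local last_success="$1" now="$2" max_age_days="${3:-7}"
    local max_age_seconds=$(( max_age_days * 24 * 60 * 60 ))
    # Arithmetic test: exit status 0 when the comparison is true.
    (( now - last_success <= max_age_seconds ))
}

now="$(date +%s)"
three_days_ago=$(( now - 3 * 86400 ))
if nightly_is_recent "${three_days_ago}" "${now}"; then
    echo "nightly CI passed recently; merge allowed"
else
    echo "nightly CI is stale; merge blocked"
fi
```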

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #100
Removes the build directory from `cugraph-pyg` which should not have been committed.

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Tingyu Wang (https://github.com/tingyu66)

URL: #107
Allows sampling of heterogeneous graphs.

Removes unbuffered sampling from the PyG examples and completely disables it in DGL.  A future PR will completely drop PyG support for unbuffered sampling, and a future `cugraph` PR will drop support for unbuffered sampling in the distributed sampler.

Merge after rapidsai/cugraph#4795

Closes rapidsai/cugraph#4402

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Tingyu Wang (https://github.com/tingyu66)
  - James Lamb (https://github.com/jameslamb)

URL: #82
Proposes some simplifications for `wholegraph` CMake:

* removes code adding timing information via `RULE_LAUNCH_COMPILE` and `RULE_LAUNCH_LINK`
  - *these are internal to `ctest`, per https://cmake.org/cmake/help/latest/prop_dir/RULE_LAUNCH_LINK.html*
* removes `find_package(Python)` and related code in `pylibwholegraph`
  - *this is already handled by `rapids_cython_init()`: https://github.com/rapidsai/cugraph-gnn/blob/87455cfedcc6721f24c783ba555af14a9a180624/python/pylibwholegraph/CMakeLists.txt#L119-L120*

## Notes for Reviewers

### Benefits of doing this?

Similar to #102 and #103, I'm putting up PRs like this because I'm planning to attempt to add libwholegraph wheels, and want to simplify the wholegraph / pylibwholegraph CMake as much as possible before doing that, to reduce the implementation and reviewing effort.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/linhu-nv

URL: #109
conda-forge is using GCC 13 for CUDA 12 builds. This PR updates CUDA 12 conda builds to use GCC 13, for alignment.

These PRs should be merged in a specific order, see rapidsai/build-planning#129 for details.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #108
#111)

Part of rapidsai/build-planning#136, which tracks some building/packaging simplifications and conventions we'd like to standardize across RAPIDS.

This proposes the following:

* using `cmake-format` to autoformat CMake code
* using `cmake-lint` to enforce style preferences for CMake code
* removing unnecessary use of `-DDETECT_CONDA_ENV` for wheel builds
* explicitly passing package type to GitHub Actions / `gha-tools` things handling wheels

## Notes for Reviewers

The `cmake-format` / `cmake-lint` approach was copied directly from RAFT:

* https://github.com/rapidsai/raft/blob/596d4b7338e62a92652503cd76feaeaa187ad740/.pre-commit-config.yaml#L52
* https://github.com/rapidsai/raft/blob/596d4b7338e62a92652503cd76feaeaa187ad740/cpp/cmake/config.json
* https://github.com/rapidsai/raft/blob/596d4b7338e62a92652503cd76feaeaa187ad740/cpp/scripts/run-cmake-format.sh

Other RAPIDS projects ([like cuDF](https://github.com/rapidsai/cudf/blob/1f0f51f96b79edd820e81343ca521c684b1f4918/.pre-commit-config.yaml#L97)) do this the same way.

All formatting-only changes to CMake in this PR were made automatically by `cmake-format`.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)
  - https://github.com/linhu-nv

URL: #111
Contributes to rapidsai/build-planning#138

Updates to using UCX 1.18 in pip devcontainers here.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/jakirkham

URL: #112
Adds support for PyG 2.6 in cuGraph-PyG.  The primary change is updating the examples so they fully specify all tensors, since partial specification is no longer allowed.

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Tingyu Wang (https://github.com/tingyu66)

URL: #114
bdice and others added 7 commits January 29, 2025 06:26
This PR uses CUDA 12.8.0 to build and test.

xref: rapidsai/build-planning#139

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #115
This PR points the shared workflow branches back to the default 25.02
branches.

xref: rapidsai/build-planning#139
Adds a heterogeneous link prediction example for cuGraph-PyG that uses the Taobao dataset.  Loosely based on the Taobao example from the PyG repository.

Adds ability to specify fanout as a dictionary to better align with PyG API.

Fixes a bug where the number of negative samples was calculated incorrectly, causing additional unwanted negative samples to be generated.

Updates the negative sampling call to match the new behavior added in rapidsai/cugraph#4885

Merge after rapidsai/cugraph#4898

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Tingyu Wang (https://github.com/tingyu66)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #104
Adds an example for MNMG PyTorch/NCCL renumbering.

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Tingyu Wang (https://github.com/tingyu66)

URL: #101
Now that all features supported by the Dask API are available in the new API, we are deprecating the Dask API.  It will be removed in release 25.06.

Merge after #104 
Closes #86

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Tingyu Wang (https://github.com/tingyu66)

URL: #118
Quick fix to the `create_node_classification` function: use the `data_and_label` dict as the parameter instead of `pickle_data_path`.

Authors:
  - https://github.com/linhu-nv

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: #128
Uses a retry wrapper for `pip` commands to try to alleviate CI failures due to hash mismatches that result from network hiccups.

xref rapidsai/build-planning#148

This will retry failures that show up in CI like:

```
   Collecting nvidia-cublas-cu12 (from libraft-cu12==25.2.*,>=0.0.0a0)
    Downloading https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl (604.9 MB)
       ━━━━━━━━━━━━━━━━━━━━━                 350.2/604.9 MB 229.2 MB/s eta 0:00:02
  ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
      nvidia-cublas-cu12 from https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl#sha256=93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3 (from libraft-cu12==25.2.*,>=0.0.0a0):
          Expected sha256 93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3
               Got        849c88d155cb4b4a3fdfebff9270fb367c58370b4243a2bdbcb1b9e7e940b7be
```
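
A generic retry wrapper of the kind described can be sketched as below. This is a minimal illustration under stated assumptions, not the wrapper this PR adopts: the `retry` function, its argument order, and the `flaky` stand-in command are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper: run a command up to MAX_ATTEMPTS times,
# sleeping between failures.
# Usage: retry MAX_ATTEMPTS SLEEP_SECONDS command [args...]
retry() {
    local max_attempts="$1" sleep_seconds="$2"
    shift 2
    local attempt=1
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "command failed after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${sleep_seconds}s" >&2
        attempt=$(( attempt + 1 ))
        sleep "${sleep_seconds}"
    done
    return 0
}

# Stand-in for a flaky download: fails twice, then succeeds.
counter_file="$(mktemp)"
echo 0 > "${counter_file}"
flaky() {
    local n
    n=$(( $(cat "${counter_file}") + 1 ))
    echo "${n}" > "${counter_file}"
    (( n >= 3 ))
}

retry 5 0 flaky && echo "succeeded after $(cat "${counter_file}") attempts"
rm -f "${counter_file}"
```

With a wrapper like this, a `pip` invocation would be passed as the command, for example `retry 3 10 python -m pip install <package>`. Retrying every failure is a blunt instrument, since it also repeats failures that are not transient, so a bounded attempt count matters.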

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Mike Sarahan (https://github.com/msarahan)
  - Bradley Dice (https://github.com/bdice)

URL: #133
@raydouglass raydouglass requested review from a team as code owners February 7, 2025 19:29
@raydouglass raydouglass requested review from KyleFromNVIDIA and removed request for a team February 7, 2025 19:29
Contributor

@linhu-nv linhu-nv left a comment

seems good to me. thx

10 participants