Add instructions for creating non-master branch build trigger (pytorch#4969)

* Minor docs refactoring

* Add instructions on how to create experimental trigger

* Replace commit_id with commit_sha
mateuszlewko authored May 4, 2023
1 parent 38c8002 commit 66c382e
Showing 2 changed files with 155 additions and 71 deletions.
43 changes: 29 additions & 14 deletions infra/Terraform.md
@@ -10,20 +10,35 @@ If not, install Terraform from the [official source](https://developer.hashicorp
## First time initialization

1. Run `gcloud auth application-default login` on your local workstation.
2. Go to the directory of the desired Terraform setup, for example
2. Go to the directory of the desired Terraform setup, for example
[`tpu-pytorch-releases`](./tpu-pytorch-releases).
3. Run `terraform init`.
4. (Optional) Consider installing
[Terraform extension for VS Code](https://marketplace.visualstudio.com/items?itemName=HashiCorp.terraform).
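
The initialization steps above can be sketched as a small shell helper. This is an illustrative sketch only, not part of the repository: the `run` helper, the `DRY_RUN` variable and the `first_time_init` function are names introduced here, and the sketch uses Terraform's global `-chdir` option instead of changing into the setup directory.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Authenticate once, then initialize the chosen Terraform setup directory.
first_time_init() {
  setup_dir="$1"
  run gcloud auth application-default login
  run terraform -chdir="$setup_dir" init
}
```

Calling `DRY_RUN=1 first_time_init ./tpu-pytorch-releases` prints the two commands without running them, which is a convenient way to review them first.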

## Enforce entire Terraform setup
## Preview Terraform changes

1. Run `terraform apply`. Preview the planned changes.
2. Confirm planned changes by typing "yes" and pressing enter.
3. Wait for Terraform to finish provisioning resources.
1. See [First time initialization](#first-time-initialization).
2. Run `terraform plan`

## Enforce only selected resource.
Terraform will print proposed changes.

## Enforce entire Terraform setup manually

Both GCP projects contain `terraform-provision-trigger` that will automatically
enforce new Terraform setup on every push to the `master` branch.
Follow the steps below if for any reason you need to apply Terraform setup manually.

1. See [First time initialization](#first-time-initialization).
2. Run `terraform apply`. Preview the planned changes.
If you added an instance of a new module, you may need to run `terraform init` (it's safe to run `init` multiple times).
3. Confirm planned changes by typing "yes" and pressing enter.
4. Wait for Terraform to finish provisioning resources.

## Enforce only selected resources

1. Run `terraform apply` to preview planned changes.
2. Note the Terraform resource ID of the resource that you want to provision.
2. Take a note of the Terraform's resource ID that you want to provision.

**Example**

@@ -35,19 +50,19 @@ If not, install Terraform from the [official source](https://developer.hashicorp
# (because google_cloudbuild_trigger.trigger is not in configuration)
- resource "google_cloudbuild_trigger" "trigger" {
- create_time = "2023-04-13T10:52:58.971939642Z" -> null
...
```

`module.bazel_builds.module.cloud_build.google_cloudbuild_trigger.trigger` is
`module.bazel_builds.module.cloud_build.google_cloudbuild_trigger.trigger` is
a resource ID.

3. Run `terraform apply -target=$RESOURCE_ID`.
Verify that only the desired resource will be modified.
3. Run `terraform apply -target=$RESOURCE_ID`.
Verify that only the desired resource will be modified.
The flag can be used multiple times, also with `terraform destroy` command.
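
Because `-target` can be repeated, a command for several resources can be assembled from a list of resource IDs. The helper below is a hypothetical sketch (the function name `target_apply_command` is introduced here for illustration); it only prints the command so that it can be reviewed before being run.

```shell
# Hypothetical helper: turn resource IDs into repeated -target flags and
# print the resulting command line. Nothing is executed.
target_apply_command() {
  cmd="terraform apply"
  for resource_id in "$@"; do
    cmd="$cmd -target=$resource_id"
  done
  echo "$cmd"
}
```

For example, `target_apply_command module.bazel_builds.module.cloud_build.google_cloudbuild_trigger.trigger` prints a `terraform apply` command with a single `-target` flag for the resource ID from the example above. Resource IDs that contain quotes or brackets would need additional shell quoting, which this sketch does not handle.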

## Check if Terraform setup is fully provisioned

1. Running `terraform plan` should return empty plan if local configuration was
fully provisioned.
Terraform won't show diff in any resources that were not created by Terraform.
1. Running `terraform plan` should return an empty plan if the local configuration was
fully provisioned.
Terraform won't show diff in any resources that were not created by Terraform.
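
The same check can be scripted with the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 for an empty plan, 2 when changes are pending, and 1 on error. The sketch below is an assumption-labeled illustration: `plan_status` and `check_provisioned` are hypothetical names, not helpers from this repository.

```shell
# Hypothetical helper: translate the exit code of
# `terraform plan -detailed-exitcode` into a readable status.
plan_status() {
  case "$1" in
    0) echo "fully provisioned: plan is empty" ;;
    2) echo "not fully provisioned: plan contains changes" ;;
    *) echo "terraform plan failed" ;;
  esac
}

# Run the plan in the current setup directory and report the status.
check_provisioned() {
  status=0
  terraform plan -detailed-exitcode >/dev/null || status=$?
  plan_status "$status"
}
```

Run `check_provisioned` from an initialized setup directory. The caveat above still applies: resources that were not created by Terraform never appear in the plan, so they do not affect the exit code.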
183 changes: 126 additions & 57 deletions infra/tpu-pytorch-releases/README.md
@@ -1,6 +1,6 @@
# Terraform setup for the `pytorch-xla-releases` GCP project

This setup configures all resources for building public artifacts: docker images
This setup configures all resources for building public artifacts: docker images
and python wheels.

## Cloud Build Triggers
@@ -10,51 +10,53 @@ This section explains how to add, modify and run Cloud Build triggers to:
* modify existing releases or nightly builds,
* remove old releases.

The list of Cloud Build triggers is specified in the
[artifacts.auto.tfvars](./artifacts.auto.tfvars) file, in two variables
The list of Cloud Build triggers is specified in the
[artifacts.auto.tfvars](./artifacts.auto.tfvars) file, in two variables
`versioned_builds` and `nightly_builds`.

These variables are consumed in the [artifacts_builds.tf](./artifacts_builds.tf) file.

Each build is associated with a separate build trigger.
Build trigger builds both docker image and Python wheels.
Each artifact is associated with a separate build trigger.
A build trigger builds both docker image and Python wheels.

* Docker images are pushed to the configured docker registry:
* Docker images are pushed to the configured docker registry:
[us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla](http://us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla)
* Python wheels are uploaded to `gs://pytorch-xla-releases/wheels/`
* Python wheels are uploaded to `gs://pytorch-xla-releases/wheels/`
([GCP link](https://pantheon.corp.google.com/storage/browser/pytorch-xla-releases/wheels)).

### Versioned releases

Versioned release builds are triggered on push to a specific `git_tag`.

Versioned release entries in the `versioned_builds` variable in
Versioned release entries in the `versioned_builds` variable in
[artifacts.auto.tfvars](./artifacts.auto.tfvars)
consist of the following fields.
* `git_tag` (string) - Git tag at which to checkout both PyTorch and PyTorch/XLA
* `git_tag` (string) - Git tag at which to checkout both PyTorch and PyTorch/XLA
sources when building image and wheels.
* `package_version` (string) - Version of the built wheels. Passed to the
* `package_version` (string) - Version of the built wheels. Passed to the
build steps.
* `accelerator` ("tpu"|"cuda") - Supported accelerator. Impacts build
process and installed dependencies.
* `python_version` (optional, string, default = "3.8") - Python version used for
the docker images base and build process.
* `accelerator` ("tpu"|"cuda") - Supported accelerator. Affects build
process and installed dependencies, see [apt.yaml](../ansible/config/apt.yaml) and
[pip.yaml](../ansible/config/pip.yaml).
* `python_version` (optional, string, default = "3.8") - Python version used for
the docker image base and build process.
* `cuda_version` (optional, string, default = "11.8") - CUDA version to install.
Used only if `accelerator` is set to "cuda"
* `arch` (optional, "amd64"|"aarch64", default = "amd64") - Architecture
influences installed dependencies and build process.
* `arch` (optional, "amd64"|"aarch64", default = "amd64") - Architecture
affects installed dependencies and build process, see [apt.yaml](../ansible/config/apt.yaml) and
[pip.yaml](../ansible/config/pip.yaml).

To modify default values see `variable "versioned_builds"` in
[artifacts_builds.tf](./artifacts_builds.tf). Modifying default values will modify
To modify default values see `variable "versioned_builds"` in
[artifacts_builds.tf](./artifacts_builds.tf). Modifying default values will modify
unset properties of existing triggers.

#### Add a new versioned release

1. Add an entry with specific git tag, accelerator, package and python versions
to the `versioned_builds` variable in the
[artifacts.auto.tfvars](./artifacts.auto.tfvars) file.
See all variables in the section above.
1. Add an entry with specific `git_tag`, `accelerator`, `package_version` and
`python_version` to the `versioned_builds` variable in the
[artifacts.auto.tfvars](./artifacts.auto.tfvars) file.
See all variable definitions in the section above.

**Example**

```hcl
@@ -70,76 +72,143 @@ See all variables in the section above.
# ...
]
```
2. Ensure that Terraform repo is initialized, see
[Terraform.md](../Terraform.md).
3. Run `terraform apply` and review the planned changes.
4. Type "yes" to confirm the changes. Wait for Terraform to enforce all
changes.
5. (Optional) See section "Manually trigger a Cloud Build" to manually trigger
the created build.
2. See [Preview Terraform changes](https://github.com/pytorch/xla/blob/master/infra/Terraform.md#preview-terraform-changes)
to preview proposed Terraform changes without affecting any infrastructure.
3. Commit proposed changes.
4. After a successful merge, [`terraform-provision-trigger`](https://pantheon.corp.google.com/cloud-build/builds;region=us-central1?project=tpu-pytorch-releases&pageState=(%22builds%22:(%22f%22:%22%255B%257B_22k_22_3A_22Trigger%2520Name_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22terraform-provision-trigger_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22triggerName_22%257D%255D%22)))
will run and enforce all the proposed infrastructure changes.
5. See section [Manually trigger a Cloud Build](#manually-trigger-a-cloud-build)
to manually trigger the created build and produce all the artifacts.
### Nightly releases
Nightly releases are configured to build from the `master` branch once per day
at midnight (America/Los_Angeles time zone).
Nightly releases are configured to build from the `master` branch once per day
at midnight (`America/Los_Angeles` time zone).
Nightly builds in the `nightly_builds` variable in
Nightly builds in the `nightly_builds` variable in
[artifacts.auto.tfvars](./artifacts.auto.tfvars)
consist of the following fields.
* `accelerator` ("tpu"|"cuda") - Supported accelerator. Impacts build
process and installed dependencies.
* `python_version` (optional, string, default = "3.8") - Python version used for
* `python_version` (optional, string, default = "3.8") - Python version used for
the docker images base and build process.
* `cuda_version` (optional, string, default = "11.8") - CUDA version to install.
Used only if `accelerator` is set to "cuda"
* `arch` (optional, "amd64"|"aarch64", default = "amd64") - Architecture
* `arch` (optional, "amd64"|"aarch64", default = "amd64") - Architecture
influences installed dependencies and build process.
Additionally, **`package_version` of all nightly builds** is configured through
Additionally, **`package_version` of all nightly builds** is configured through
a separate `nightly_package_version` variable.
To modify default values see `variable "nightly_builds"` in
[artifacts_builds.tf](./artifacts_builds.tf). Modifying default values will modify
To modify default values see `variable "nightly_builds"` in
[artifacts_builds.tf](./artifacts_builds.tf). Modifying default values will modify
unset properties of existing triggers.
#### Add a new nightly release
#### Modify or add a new nightly release
1. Modify or add an entry with specific `accelerator`, `python_version` and (optionally)
`cuda_version` to the `nightly_builds` variable in the
[artifacts.auto.tfvars](./artifacts.auto.tfvars) file.
See all variables in the section above.
1. Add an entry with specific accelerator, python and (optionally) cuda version
to the `nightly_builds` variable in the
[artifacts.auto.tfvars](./artifacts.auto.tfvars) file.
See all variables in the section above.

**Example**
```hcl
nightly_builds = [
{
accelerator = "cuda"
cuda_version = "11.8" # optional
python_version = "3.8" # optional
python_version = "3.8" # optional
arch = "amd64" # optional
},
# ...
]
```
2. Ensure that Terraform repo is initialized, see
[Terraform.md](../Terraform.md).
3. Run `terraform apply` and review the planned changes.
4. Type "yes" to confirm the changes. Wait for Terraform to enforce all
changes.
5. (Optional) See section "Manually trigger a Cloud Build" to manually trigger
the created build. Nightly build will be triggered automatically at midnight.
2. See [Preview Terraform changes](https://github.com/pytorch/xla/blob/master/infra/Terraform.md#preview-terraform-changes)
to preview proposed Terraform changes without affecting any infrastructure.
3. Commit proposed changes.
4. After a successful merge, [`terraform-provision-trigger`](https://pantheon.corp.google.com/cloud-build/builds;region=us-central1?project=tpu-pytorch-releases&pageState=(%22builds%22:(%22f%22:%22%255B%257B_22k_22_3A_22Trigger%2520Name_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22terraform-provision-trigger_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22triggerName_22%257D%255D%22)))
will run and enforce all the proposed infrastructure changes.
5. See section [Manually trigger a Cloud Build](#manually-trigger-a-cloud-build)
to manually trigger the created build and produce all the artifacts.
### Manually trigger a Cloud Build
1. Go to [Cloud Build > Triggers](https://pantheon.corp.google.com/cloud-build/triggers;region=us-central1?project=tpu-pytorch-releases) page in GCP.
2. Click "RUN" on the desired trigger.
**Note:** "Branch" input in the "Run trigger" window is irrelevant, since
Ansible setup and repository sources will be fetched at revisions specified
2. Click "RUN" on the desired trigger.
**Note:** "Branch" input in the "Run trigger" window is irrelevant, since
Ansible setup and repository sources will be fetched at revisions specified
in [artifacts_builds.tf](./artifacts_builds.tf).
3. Click "Run Trigger"
4. Go to [History] (https://pantheon.corp.google.com/cloud-build/builds;region=us-central1?project=tpu-pytorch-releases)
4. Go to [History](https://pantheon.corp.google.com/cloud-build/builds;region=us-central1?project=tpu-pytorch-releases)
to see status of the triggered builds.
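
As a command-line alternative to the console steps above, a trigger can also be started with `gcloud builds triggers run`. The sketch below only prints the command for review. It rests on several assumptions: the region and project are taken from the console links above, the trigger name is a placeholder that you must supply, the function name `trigger_run_command` is hypothetical, and `--branch=master` is passed only because the command expects a revision flag (the note above explains why the branch does not affect the sources that get built).

```shell
# Hypothetical helper: print a gcloud command that runs the given trigger.
# Region and project are assumptions taken from the console links above.
trigger_run_command() {
  trigger_name="$1"
  echo "gcloud builds triggers run $trigger_name --branch=master --region=us-central1 --project=tpu-pytorch-releases"
}
```

Running the printed command requires the same GCP permissions as clicking "RUN" in the console.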
### Create experimental trigger for non-master branch
1. Add a new instance of the `xla_docker_build` module to [artifacts_builds.tf](./artifacts_builds.tf)
(or to any other existing or new file in that directory; Terraform automatically reads all top-level
files from the setup directory).
**Example**
```hcl
module "my_branch" {
source = "../terraform_modules/xla_docker_build"
ansible_vars = merge(each.value, {
pytorch_git_rev = "main"
# Fetch XLA sources from "my-branch".
# You can also use any git revision (e.g. tag), or
# "$COMMIT_SHA" to fetch the sources at the same commit that
# the Build was triggered.
xla_git_rev = "my-branch"
})
# Fetch Ansible configuration from "my-branch".
ansible_branch = "my-branch"
# Build will be triggered on every push to "my-branch".
trigger_on_push = {
branch = "my-branch"
}
# Trigger name in GCP.
trigger_name = "trigger-for-my-branch"
# Remove `image_name` and `image_tags` if you don't want to
# upload any docker images
image_name = "my-experimental-image"
image_tags = ["$COMMIT_SHA", "latest"]
description = "Experimental trigger for my-branch"
# Remove `wheels_dest` and `wheels_srcs` if you don't want to
# upload any Python wheels.
wheels_dest = "${module.releases_storage_bucket.url}/wheels/experimental/my-branch-name"
wheels_srcs = ["/dist/*.whl"]
# Passed directly to ../ansible/Dockerfile.
build_args = {
python_version = "3.8" # Default, can be removed.
}
worker_pool_id = module.worker_pool.id
# Remove or change to a different docker registry.
docker_repo_url = module.docker_registry.url
}
```
2. Create the trigger in GCP using either of the options below.
a) Commit and merge the changed Terraform setup
to master to have it applied automatically, or
b) manually apply only the newly created
resource, see
[Enforce only selected resources](https://github.com/pytorch/xla/blob/master/infra/Terraform.md#enforce-only-selected-resources) (this requires appropriate permissions in GCP).
