Arm runner #1262

Open
wants to merge 2 commits into
base: main
165 changes: 149 additions & 16 deletions .github/workflows/ci.yaml
@@ -19,6 +19,7 @@ on:
branches:
- "pull-request/[0-9]+"
- main
- ARM_Runner
Contributor

This should not be needed. Our GitHub Actions workflows run on PRs, so we can test the changes you are making via this PR.

Contributor

@JunAr7112 let's remove this.

- release-*

jobs:
@@ -114,9 +115,9 @@ jobs:
- run: make docker-build

### Image builds ###
build-gpu-operator:
build-gpu-operator-arm:
needs: [go-check, go-test, go-build]
runs-on: ubuntu-latest
runs-on: ubuntu-24.04-arm
strategy:
matrix:
dist: [ubi9]
@@ -133,6 +134,7 @@ jobs:
echo "LABEL_IMAGE_SOURCE=https://github.com/${REPO_FULL_NAME}" >> $GITHUB_ENV

GENERATE_ARTIFACTS="false"
cdesiniotis marked this conversation as resolved.
#NOT_USING_ARM="false"
Contributor

Please remove this.

if [[ "${{ github.actor }}" == "dependabot[bot]" ]]; then
GENERATE_ARTIFACTS="false"
elif [[ "${{ github.event_name }}" == "pull_request" && "${{ github.event.pull_request.head.repo.full_name }}" == "${{ github.repository }}" ]]; then
@@ -141,11 +143,7 @@ jobs:
GENERATE_ARTIFACTS="true"
fi
echo "PUSH_ON_BUILD=${GENERATE_ARTIFACTS}" >> $GITHUB_ENV
echo "BUILD_MULTI_ARCH_IMAGES=${GENERATE_ARTIFACTS}" >> $GITHUB_ENV
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:master
echo "DOCKER_BUILD_PLATFORM_OPTIONS=--platform=linux/arm64" >> $GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
@@ -157,11 +155,57 @@ jobs:
- name: Build image
env:
IMAGE_NAME: ghcr.io/${LOWERCASE_REPO_OWNER}/gpu-operator
VERSION: ${COMMIT_SHORT_SHA}
VERSION: ${COMMIT_SHORT_SHA}-arm
run: |
echo "${VERSION}"
make build-${{ matrix.dist }}
Contributor

Question: do we need to invoke the push target after this to ensure the tag gets pushed to the ghcr.io registry?

Author

I think the push target gets invoked in the Makefile via its inclusion in native-only.mk.
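To make the question in this thread concrete, here is a minimal shell sketch of how the build options visible in this diff would combine. It assumes the `build-%` target ends up running `docker buildx build` with `DOCKER_BUILD_OPTIONS` and `DOCKER_BUILD_PLATFORM_OPTIONS` appended; the recipe itself is not part of this diff, and `compose_build_cmd` is a hypothetical helper that only prints a command line. Under that assumption, `push=true` in `--output=type=image,...` makes buildx push the image as part of the build, so no separate push target would be needed for the tag to reach the registry.

```shell
# Hypothetical helper: prints the buildx command line that the variables in
# this PR would expand to. It only echoes text and never runs docker.
compose_build_cmd() {
  image="$1"
  version="$2"
  # Defaults here are assumptions for the sketch, not values read from the Makefile.
  push="${PUSH_ON_BUILD:-false}"
  attest="${ATTACH_ATTESTATIONS:-false}"
  default_platform="--platform=linux/amd64"
  platform="${DOCKER_BUILD_PLATFORM_OPTIONS:-$default_platform}"
  # Mirrors DOCKER_BUILD_OPTIONS as written in native-only.mk in this PR.
  output="--output=type=image,push=${push} --provenance=${attest} --sbom=${attest}"
  echo "docker buildx build ${output} ${platform} --tag ${image}:${version} ."
}
```

For the ARM job, the call would look like `PUSH_ON_BUILD=true DOCKER_BUILD_PLATFORM_OPTIONS=--platform=linux/arm64 compose_build_cmd ghcr.io/example/gpu-operator abc12345-arm`, where the image name and tag are made-up values.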

build-gpu-operator-validator-arm:
needs: [go-check, go-test, go-build]
runs-on: ubuntu-24.04-arm
strategy:
matrix:
dist: [ubi9]
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Calculate build vars
id: vars
run: |
echo "COMMIT_SHORT_SHA=${GITHUB_SHA:0:8}" >> $GITHUB_ENV
echo "LOWERCASE_REPO_OWNER=$(echo "${GITHUB_REPOSITORY_OWNER}" | awk '{print tolower($0)}')" >> $GITHUB_ENV
REPO_FULL_NAME="${{ github.event.pull_request.head.repo.full_name }}"
echo "${REPO_FULL_NAME}"
echo "LABEL_IMAGE_SOURCE=https://github.com/${REPO_FULL_NAME}" >> $GITHUB_ENV

GENERATE_ARTIFACTS="false"
if [[ "${{ github.actor }}" == "dependabot[bot]" ]]; then
GENERATE_ARTIFACTS="false"
elif [[ "${{ github.event_name }}" == "pull_request" && "${{ github.event.pull_request.head.repo.full_name }}" == "${{ github.repository }}" ]]; then
GENERATE_ARTIFACTS="true"
elif [[ "${{ github.event_name }}" == "push" ]]; then
GENERATE_ARTIFACTS="true"
fi
echo "PUSH_ON_BUILD=${GENERATE_ARTIFACTS}" >> $GITHUB_ENV
echo "DOCKER_BUILD_PLATFORM_OPTIONS=--platform=linux/arm64" >> $GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build image
env:
IMAGE_NAME: ghcr.io/${LOWERCASE_REPO_OWNER}/gpu-operator/gpu-operator-validator
VERSION: ${COMMIT_SHORT_SHA}-arm
SUBCOMPONENT: validator
run: |
echo "${VERSION}"
make build-${{ matrix.dist }}
build-gpu-operator-validator:

### Image builds ###
build-gpu-operator-amd:
needs: [go-check, go-test, go-build]
runs-on: ubuntu-latest
strategy:
@@ -188,11 +232,51 @@ jobs:
GENERATE_ARTIFACTS="true"
fi
echo "PUSH_ON_BUILD=${GENERATE_ARTIFACTS}" >> $GITHUB_ENV
echo "BUILD_MULTI_ARCH_IMAGES=${GENERATE_ARTIFACTS}" >> $GITHUB_ENV
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
echo "DOCKER_BUILD_PLATFORM_OPTIONS=--platform=linux/amd64" >> $GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
image: tonistiigi/binfmt:master
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build image
env:
IMAGE_NAME: ghcr.io/${LOWERCASE_REPO_OWNER}/gpu-operator
VERSION: ${COMMIT_SHORT_SHA}-amd
run: |
echo "${VERSION}"
make build-${{ matrix.dist }}

build-gpu-operator-validator-amd:
needs: [go-check, go-test, go-build]
runs-on: ubuntu-latest
strategy:
matrix:
dist: [ubi9]
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Calculate build vars
id: vars
run: |
echo "COMMIT_SHORT_SHA=${GITHUB_SHA:0:8}" >> $GITHUB_ENV
echo "LOWERCASE_REPO_OWNER=$(echo "${GITHUB_REPOSITORY_OWNER}" | awk '{print tolower($0)}')" >> $GITHUB_ENV
REPO_FULL_NAME="${{ github.event.pull_request.head.repo.full_name }}"
echo "${REPO_FULL_NAME}"
echo "LABEL_IMAGE_SOURCE=https://github.com/${REPO_FULL_NAME}" >> $GITHUB_ENV

GENERATE_ARTIFACTS="false"
if [[ "${{ github.actor }}" == "dependabot[bot]" ]]; then
GENERATE_ARTIFACTS="false"
elif [[ "${{ github.event_name }}" == "pull_request" && "${{ github.event.pull_request.head.repo.full_name }}" == "${{ github.repository }}" ]]; then
GENERATE_ARTIFACTS="true"
elif [[ "${{ github.event_name }}" == "push" ]]; then
GENERATE_ARTIFACTS="true"
fi
echo "PUSH_ON_BUILD=${GENERATE_ARTIFACTS}" >> $GITHUB_ENV
echo "DOCKER_BUILD_PLATFORM_OPTIONS=--platform=linux/amd64" >> $GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
@@ -204,15 +288,64 @@ jobs:
- name: Build image
env:
IMAGE_NAME: ghcr.io/${LOWERCASE_REPO_OWNER}/gpu-operator/gpu-operator-validator
VERSION: ${COMMIT_SHORT_SHA}
VERSION: ${COMMIT_SHORT_SHA}-amd
SUBCOMPONENT: validator
run: |
echo "${VERSION}"
make build-${{ matrix.dist }}


### MULTI-ARCH-IMAGES test ###
Contributor

nit: update the comment or remove it.

build-multi-arch-image:
needs: [build-gpu-operator-arm, build-gpu-operator-validator-arm, build-gpu-operator-amd, build-gpu-operator-validator-amd]
runs-on: ubuntu-latest
strategy:
matrix:
dist: [ubi9]
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Calculate build vars
id: vars
run: |
echo "COMMIT_SHORT_SHA=${GITHUB_SHA:0:8}" >> $GITHUB_ENV
echo "LOWERCASE_REPO_OWNER=$(echo "${GITHUB_REPOSITORY_OWNER}" | awk '{print tolower($0)}')" >> $GITHUB_ENV
REPO_FULL_NAME="${{ github.event.pull_request.head.repo.full_name }}"
echo "${REPO_FULL_NAME}"
echo "LABEL_IMAGE_SOURCE=https://github.com/${REPO_FULL_NAME}" >> $GITHUB_ENV
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
Contributor

We don't need Docker Buildx for this job.

- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build Manifest
env:
LOWERCASE_REPO_OWNER: ${{ env.LOWERCASE_REPO_OWNER }}
COMMIT_SHORT_SHA: ${{ env.COMMIT_SHORT_SHA }}
Contributor

Are these lines needed? It looks like you are already adding these variables to the environment in a prior step.

Author

I will remove these lines.
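The point raised in this thread follows from how `$GITHUB_ENV` works: `NAME=value` lines appended to that file in one step become environment variables for the later steps of the same job. The sketch below imitates that locally with a temporary file. `load_github_env` is a hypothetical stand-in for what the runner does between steps, and the sample owner and SHA are made-up values.

```shell
# Hypothetical stand-in for the runner: export every NAME=value line
# found in the file passed as the first argument.
load_github_env() {
  while IFS='=' read -r name value; do
    if [ -n "$name" ]; then
      export "$name=$value"
    fi
  done < "$1"
}

# "Step 1": write variables the way the "Calculate build vars" step does.
GITHUB_ENV="$(mktemp)"
GITHUB_SHA="0123456789abcdef"          # made-up commit SHA
GITHUB_REPOSITORY_OWNER="Example-Org"  # made-up repository owner
echo "COMMIT_SHORT_SHA=$(printf '%s' "$GITHUB_SHA" | cut -c1-8)" >> "$GITHUB_ENV"
echo "LOWERCASE_REPO_OWNER=$(echo "$GITHUB_REPOSITORY_OWNER" | awk '{print tolower($0)}')" >> "$GITHUB_ENV"

# Between steps, the file is loaded; a later step just reads the variables.
load_github_env "$GITHUB_ENV"
echo "ghcr.io/${LOWERCASE_REPO_OWNER}/gpu-operator:${COMMIT_SHORT_SHA}-arm"
```

Because the variables are already in the environment by the time the "Build Manifest" step runs, re-declaring them under that step's `env:` adds nothing.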

IMAGE_ID_ARM: ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/gpu-operator:${{ env.COMMIT_SHORT_SHA }}-arm
IMAGE_ID_AMD: ghcr.io/${{ env.LOWERCASE_REPO_OWNER}}/gpu-operator:${{ env.COMMIT_SHORT_SHA }}-amd
IMAGE_ID_ARM_VAL: ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/gpu-operator/gpu-operator-validator:${{ env.COMMIT_SHORT_SHA }}-arm
IMAGE_ID_AMD_VAL: ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/gpu-operator/gpu-operator-validator:${{ env.COMMIT_SHORT_SHA }}-amd
MANIFEST: ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/gpu-operator:${{ env.COMMIT_SHORT_SHA }}
MANIFEST_VAL: ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/gpu-operator/gpu-operator-validator:${{ env.COMMIT_SHORT_SHA }}
run: |
docker manifest create \
${MANIFEST} \
${IMAGE_ID_AMD} \
${IMAGE_ID_ARM}
docker manifest push ${MANIFEST}
docker manifest create \
${MANIFEST_VAL} \
${IMAGE_ID_AMD_VAL} \
${IMAGE_ID_ARM_VAL}
docker manifest push ${MANIFEST_VAL}

### e2e tests ###
e2e-tests-containerd:
needs: [build-gpu-operator, build-gpu-operator-validator]
needs: [build-multi-arch-image]
runs-on: linux-amd64-cpu4
steps:
- uses: actions/checkout@v4
@@ -264,7 +397,7 @@ jobs:
retention-days: 15

e2e-tests-nvidiadriver:
needs: [build-gpu-operator, build-gpu-operator-validator]
needs: [build-multi-arch-image]
runs-on: linux-amd64-cpu4
steps:
- uses: actions/checkout@v4
4 changes: 3 additions & 1 deletion Makefile
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

BUILD_MULTI_ARCH_IMAGES ?= no
BUILD_MULTI_ARCH_IMAGES ?= false
DOCKER ?= docker
GO_CMD ?= go
PROJECT_DIR := $(shell dirname $(abspath $(lastword $(MAKEFILE_LIST))))
Expand Down Expand Up @@ -256,6 +256,8 @@ coverage: unit-test
cat $(COVERAGE_FILE) | grep -v "_mock.go" > $(COVERAGE_FILE).no-mocks
go tool cover -func=$(COVERAGE_FILE).no-mocks



Contributor

nit: unneeded newlines.

##### Public rules #####
DISTRIBUTIONS := ubi9
DEFAULT_PUSH_TARGET := ubi9
@@ -229,7 +229,7 @@ spec:
- name: vfio-manager-image
image: nvcr.io/nvidia/cuda@sha256:b24e555dce8b3e4a3b50152cc10ec6739691839a04f893de1428045c032db940
- name: sandbox-device-plugin-image
image: nvcr.io/nvidia/kubevirt-gpu-device-plugin@sha256:4ffa1cd2a6497eb647a89ed259dcfb007554737b9d80f69bc173a2c3cd72a1da
image: nvcr.io/nvidia/kubevirt-gpu-device-plugin@sha256:cb32ded7a0057efbc1cf0a468cf9c775c334e18e48c3a101360b1c59637388ae
- name: vgpu-device-manager-image
image: nvcr.io/nvidia/cloud-native/vgpu-device-manager@sha256:7edd7a0413dcb39b6e3bcefaf06812f3293c8e480ca10783e821a561ed686200
- name: gdrcopy-image
Expand Down Expand Up @@ -893,7 +893,7 @@ spec:
- name: "VFIO_MANAGER_IMAGE"
value: "nvcr.io/nvidia/cuda@sha256:b24e555dce8b3e4a3b50152cc10ec6739691839a04f893de1428045c032db940"
- name: "SANDBOX_DEVICE_PLUGIN_IMAGE"
value: "nvcr.io/nvidia/kubevirt-gpu-device-plugin@sha256:4ffa1cd2a6497eb647a89ed259dcfb007554737b9d80f69bc173a2c3cd72a1da"
value: "nvcr.io/nvidia/kubevirt-gpu-device-plugin@sha256:cb32ded7a0057efbc1cf0a468cf9c775c334e18e48c3a101360b1c59637388ae"
- name: "VGPU_DEVICE_MANAGER_IMAGE"
value: "nvcr.io/nvidia/cloud-native/vgpu-device-manager@sha256:7edd7a0413dcb39b6e3bcefaf06812f3293c8e480ca10783e821a561ed686200"
- name: "GDRCOPY_IMAGE"
2 changes: 1 addition & 1 deletion deployments/gpu-operator/values.yaml
@@ -536,7 +536,7 @@ sandboxDevicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: kubevirt-gpu-device-plugin
version: v1.2.10
version: v1.3.0
imagePullPolicy: IfNotPresent
imagePullSecrets: []
args: []
2 changes: 1 addition & 1 deletion multi-arch.mk
@@ -15,7 +15,7 @@
PUSH_ON_BUILD ?= false
ATTACH_ATTESTATIONS ?= false
DOCKER_BUILD_OPTIONS = --output=type=image,push=$(PUSH_ON_BUILD) --provenance=$(ATTACH_ATTESTATIONS) --sbom=$(ATTACH_ATTESTATIONS)
DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64,linux/arm64
DOCKER_BUILD_PLATFORM_OPTIONS ?= --platform=linux/amd64,linux/arm64

REGCTL ?= regctl
$(PUSH_TARGETS): push-%:
3 changes: 1 addition & 2 deletions native-only.mk
@@ -12,8 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64

DOCKER_BUILD_OPTIONS = --output=type=image,push=$(PUSH_ON_BUILD) --provenance=$(ATTACH_ATTESTATIONS) --sbom=$(ATTACH_ATTESTATIONS)
Contributor

What is the rationale behind this change?

Author

I commented out DOCKER_BUILD_PLATFORM_OPTIONS because we already specify it in the workflow in ci.yaml. I added DOCKER_BUILD_OPTIONS to store the image so that we can build the manifest in the next job.
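For context on why the assignment operator matters here: in GNU Make, a plain `=` assignment in a makefile overrides a variable of the same name coming from the environment (unless `make -e` is used), while `?=` assigns only when the variable is not yet defined. The value exported through `$GITHUB_ENV` in ci.yaml therefore only takes effect if the makefile uses `?=` or leaves the variable out. The sketch below is a shell analogue of that behavior, not Make itself; the function names are hypothetical.

```shell
# Analogue of `DOCKER_BUILD_PLATFORM_OPTIONS ?= --platform=linux/amd64`:
# "${VAR=default}" assigns only when VAR is unset, so a value coming
# from the environment is kept.
platform_conditional() {
  : "${DOCKER_BUILD_PLATFORM_OPTIONS=--platform=linux/amd64}"
  echo "$DOCKER_BUILD_PLATFORM_OPTIONS"
}

# Analogue of `DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64`:
# the assignment always wins, so the environment value is discarded.
platform_unconditional() {
  DOCKER_BUILD_PLATFORM_OPTIONS="--platform=linux/amd64"
  echo "$DOCKER_BUILD_PLATFORM_OPTIONS"
}
```

With `DOCKER_BUILD_PLATFORM_OPTIONS=--platform=linux/arm64` set beforehand, the first function prints the arm64 value and the second prints the amd64 one, which is the difference the ARM job depends on.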

$(PUSH_TARGETS): OUT_IMAGE ?= $(IMAGE_NAME):$(IMAGE_TAG)
$(PUSH_TARGETS): push-%:
$(DOCKER) tag "$(IMAGE_NAME):$(VERSION)-$(DEFAULT_PUSH_TARGET)" "$(OUT_IMAGE)"
5 changes: 3 additions & 2 deletions validator/native-only.mk
@@ -12,11 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64
DOCKER_BUILD_PLATFORM_OPTIONS ?= --platform=linux/amd64
DOCKER_BUILD_OPTIONS = --output=type=image,push=$(PUSH_ON_BUILD) --provenance=$(ATTACH_ATTESTATIONS) --sbom=$(ATTACH_ATTESTATIONS)

$(PUSH_TARGETS): push-%:
$(DOCKER) push "$(IMAGE_NAME):$(IMAGE_TAG)"

push-short:
$(DOCKER) tag "$(IMAGE_NAME):$(VERSION)-$(DEFAULT_PUSH_TARGET)" "$(IMAGE_NAME):$(VERSION)"
$(DOCKER) push "$(IMAGE_NAME):$(VERSION)"
$(DOCKER) push "$(IMAGE_NAME):$(VERSION)"