Automating initialization of on-demand self-hosted CNCF CIL runner #39
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
name: Configurable Benchmark Test on Self-hosted Runner | ||
on: | ||
workflow_dispatch: | ||
inputs: | ||
profile_name: | ||
description: "performance profile to use" | ||
required: false | ||
profile_filename: | ||
description: "test configuration file" | ||
required: false | ||
service_mesh: | ||
type: choice | ||
required: false | ||
description: "service mesh being tested" | ||
options: | ||
- istio | ||
- linkerd | ||
load_generator: | ||
type: choice | ||
required: false | ||
description: "load generator to run tests with" | ||
options: | ||
- fortio | ||
- wrk2 | ||
- nighthawk | ||
|
||
jobs: | ||
start-runner: | ||
name: Start self-hosted CNCF CIL runner | ||
runs-on: ubuntu-latest | ||
if: ${{ github.event_name == 'workflow_dispatch' }} | ||
outputs: | ||
hostname: ${{ steps.start-cil-runner.outputs.hostname }} | ||
label: ${{ steps.start-cil-runner.outputs.label }} | ||
device_id: ${{ steps.start-cil-runner.outputs.device_id }} | ||
steps: | ||
- name: Checkout Code | ||
uses: actions/checkout@v2 | ||
|
||
- name: Configure CNCF CIL credentials | ||
run: | | ||
chmod +x .github/workflows/scripts/self-hosted-credentails.sh | ||
.github/workflows/scripts/self-hosted-credentails.sh ${{ secrets.CNCF_CIL_TOKEN }} | ||
shell: bash | ||
|
||
- name: Create registration token for CNCF CIL runner | ||
id: getRegToken | ||
run: | | ||
reg_token=$(curl -s -X POST -H "Accept: application/vnd.github.v3+json" \ | ||
-H 'Authorization: token ${{ secrets.PAT }}' \ | ||
https://api.github.com/repos/${{github.repository}}/actions/runners/registration-token | jq -r .token) | ||
echo REG_TOKEN=$reg_token >> $GITHUB_ENV | ||
echo REPOSITORY=${{github.repository}} >> $GITHUB_ENV | ||
shell: bash | ||
|
||
- name: Start CNCF CIL runner | ||
id: start-cil-runner | ||
run: | | ||
chmod +x .github/workflows/scripts/start-cil-runner.sh | ||
.github/workflows/scripts/start-cil-runner.sh ${{ secrets.cncf_cil_token }} ${{ github.event.inputs.service_mesh }}-${{ github.event.inputs.load_generator }} | ||
shell: bash | ||
|
||
run-benchmarks: | ||
name: Run the configurable benchmarks on the runner | ||
needs: | ||
- start-runner # required to start the main job when the runner is ready | ||
runs-on: ${{ needs.start-runner.outputs.label }} # run the job on the newly created runner | ||
steps: | ||
- name: Install dependencies | ||
run: | | ||
echo "Current user: $(whoami)" | ||
echo "Installing kubectl..." | ||
curl -LO https://dl.k8s.io/release/v1.23.2/bin/linux/amd64/kubectl | ||
sudo install -o smp -g smp -m 0755 kubectl /usr/local/bin/kubectl | ||
echo "Installing docker..." | ||
sudo apt update -y | ||
sudo apt install -y jq unzip apt-transport-https ca-certificates software-properties-common | ||
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - | ||
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable" | ||
sudo apt-cache policy docker-ce | ||
sudo apt install -y docker-ce | ||
sudo systemctl status docker | ||
|
||
- name: Setup Kubernetes | ||
uses: manusa/[email protected] | ||
with: | ||
minikube version: 'v1.23.2' | ||
kubernetes version: 'v1.23.2' | ||
driver: docker | ||
|
||
- name: Checkout Code | ||
uses: actions/checkout@v2 | ||
|
||
- name: Install Service Mesh and Deploy Application | ||
run: | | ||
chmod +x .github/workflows/scripts/${{ github.event.inputs.service_mesh }}_deploy.sh | ||
.github/workflows/scripts/${{ github.event.inputs.service_mesh }}_deploy.sh | ||
shell: bash | ||
|
||
- name: Run Benchmark Tests | ||
uses: layer5io/meshery-smp-action@self-hosted | ||
with: | ||
provider_token: ${{ secrets.MESHERY_TOKEN }} | ||
platform: docker | ||
profile_name: ${{ github.event.inputs.profile_name }} | ||
profile_filename: ${{ github.event.inputs.profile_filename }} | ||
endpoint_url: ${{env.ENDPOINT_URL}} | ||
service_mesh: ${{env.SERVICE_MESH}} | ||
load_generator: ${{ github.event.inputs.load_generator }} | ||
test_name: '${{ github.event.inputs.service_mesh }}-${{ github.event.inputs.load_generator }}-${{ github.event.inputs.profile_filename }}${{ github.event.inputs.profile_name }}' | ||
|
||
stop-runner: | ||
name: Stop self-hosted runner | ||
needs: | ||
- start-runner # required to get output from the start-runner job | ||
- run-benchmarks # required to wait until the main job is done | ||
runs-on: ubuntu-latest | ||
if: ${{ always() }} # required to stop the runner even if an error occurred in the previous jobs | ||
steps: | ||
- name: Checkout Code | ||
uses: actions/checkout@v2 | ||
|
||
- name: Stop CNCF CIL runner | ||
run: | | ||
chmod +x .github/workflows/scripts/stop-cil-runner.sh | ||
.github/workflows/scripts/stop-cil-runner.sh ${{ secrets.cncf_cil_token }} ${{ needs.start-runner.outputs.device_id }} ${{ needs.start-runner.outputs.hostname }} | ||
shell: bash | ||
|
||
- name: Remove CNCF CIL runner from GitHub repository | ||
run: | | ||
runner_id=$(curl -s -H 'Authorization: token ${{ secrets.PAT }}' -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/${{github.repository}}/actions/runners | jq '.runners[] | select(.name == "${{ needs.start-runner.outputs.hostname }}") | {id}' | jq .id) | ||
curl -X DELETE -H 'Authorization: token ${{ secrets.PAT }}' -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/${{github.repository}}/actions/runners/$runner_id | ||
shell: bash |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
#!/usr/bin/env bash | ||
|
||
# This script verifies the token used to manage the self-hosted runner | ||
# The token must be set as a GitHub repository secret named "CNCF_CIL_TOKEN" | ||
|
||
token=$1 | ||
|
||
# https://metal.equinix.com/developers/api/authentication/#authentication | ||
result=$(curl -I -s -w %{http_code} -o /dev/null -H "X-Auth-Token: $token" https://api.equinix.com/metal/v1) | ||
if [[ $result != "200" ]]; then | ||
echo "ERROR: Failed to authenticate the CNCF CIL token" | ||
exit 1 | ||
fi | ||
echo "Successfully authenticated the CNCF CIL token!" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
#!/usr/bin/env bash | ||
|
||
# This script is used to start a CNCF CIL runner | ||
|
||
token=$1 | ||
hostname=$2 | ||
|
||
# Use the nanoseconds field of the current time as a pseudo-random suffix for the runner's hostname | ||
label=$(date +%N) | ||
|
||
hostname="$hostname-$label" | ||
echo "Creating CNCF CIL machine: $hostname..." | ||
|
||
# Use user_data_scripts to register the CNCF CIL runner as a self-hosted runner | ||
user_data_scripts="#cloud-config\nusers:\n - default\n - name: smp\n groups: sudo, docker\n sudo: ALL=(ALL) NOPASSWD:ALL\n lock_passwd: true\nruncmd:\n - [runuser, -l, smp, -c, \'mkdir actions-runner && cd actions-runner\']\n - [runuser, -l, smp, -c, \'curl -o actions-runner-linux-x64-2.287.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.287.1/actions-runner-linux-x64-2.287.1.tar.gz\']\n - [runuser, -l, smp, -c, \'tar xzf ./actions-runner-linux-x64-2.287.1.tar.gz\']\n - [runuser, -l, smp, -c, \'export RUNNER_ALLOW_RUNASROOT=1\']\n - [runuser, -l, smp, -c, \'./config.sh --url https://github.com/$REPOSITORY --token $REG_TOKEN --labels $hostname >> github-action-registeration.log\']\n - [runuser, -l, smp, -c, \'./run.sh >> github-action-registeration.log\']" | ||
|
||
# TODO: the options "operating_system", "facility", "plan" are hardcoded now, we should make them configurable | ||
# https://metal.equinix.com/developers/api/devices/#devices-createdevice | ||
device_id=$(curl -X POST -H "X-Auth-Token: $token" -s -H "Content-Type: application/json" \ | ||
-d '{"operating_system": "ubuntu_20_04", "facility": "da11", "plan": "c3.small.x86", "hostname": "'"${hostname}"'", "userdata": "'"${user_data_scripts}"'"}' \ | ||
https://api.equinix.com/metal/v1/projects/96a9d336-541b-42f7-9827-d845010da550/devices | jq -r .id) | ||
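# jq -r prints "null" when the response has no .id field, so check for that as well | ||
if [[ -z $device_id || $device_id == "null" ]]; then | ||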
echo "ERROR: Failed to create CNCF CIL machine: $hostname..." | ||
exit 1 | ||
fi | ||
|
||
# Wait up to 10 minutes for the machine to become active | ||
Review thread on the 10-minute wait:
- Any reason to wait 10 minutes? Is a standard waiting time for an instance to come up documented somewhere?
- There is no standard waiting time, in my opinion. My reasoning for the 10 minutes is that if the machine takes longer than that to start running, something must be wrong with it.
- @gyohuangxin this new flow makes a lot of sense. But let's capture this in another issue. There might be slight experimentation required here.
- I understand this issue and will create an issue ticket for it.
- Thanks, please go ahead.
||
echo "Waiting for $hostname to run..." | ||
n=0 | ||
while [[ $n -le 10 ]] | ||
do | ||
if [[ $n -eq 10 ]]; then | ||
echo "Waiting too long for $hostname to start, exiting..." | ||
exit 1 | ||
fi | ||
sleep 1m | ||
state=$(curl -s -H "X-Auth-Token: $token" https://api.equinix.com/metal/v1/devices/$device_id | jq -r .state) | ||
if [[ $state == "active" ]]; then | ||
echo "$hostname successfully created!" | ||
break | ||
fi | ||
echo "Still waiting..." | ||
let n++ | ||
done | ||
|
||
# Set the outputs | ||
echo "::set-output name=hostname::$hostname" | ||
echo "::set-output name=label::$hostname" | ||
echo "::set-output name=device_id::$device_id" |
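The review thread on the wait loop leaves the choice of timeout to a follow-up issue. As a minimal sketch of one possible direction, assuming the attempt count and the poll interval become parameters, the loop could be factored into a helper. `wait_for_state` and its arguments are hypothetical names and not part of this PR, and unlike the loop above the sketch checks the state before sleeping rather than after.

```shell
#!/usr/bin/env bash

# wait_for_state (hypothetical helper): poll a command until it prints the
# wanted state or the attempt budget is used up.
#   $1   wanted state, e.g. "active"
#   $2   maximum number of attempts
#   $3   seconds to sleep between attempts
#   $4.. command that prints the current state on stdout
wait_for_state() {
  local wanted=$1 max_attempts=$2 interval=$3
  shift 3
  local attempt state
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    state=$("$@")
    if [[ $state == "$wanted" ]]; then
      return 0
    fi
    echo "Still waiting (attempt $attempt/$max_attempts, state: ${state:-unknown})..." >&2
    # No need to sleep once the last attempt has failed
    if (( attempt < max_attempts )); then
      sleep "$interval"
    fi
  done
  return 1
}

# Possible use in the start script (get_state is a hypothetical wrapper
# around the device query already used above):
#   get_state() {
#     curl -s -H "X-Auth-Token: $token" \
#       "https://api.equinix.com/metal/v1/devices/$device_id" | jq -r .state
#   }
#   wait_for_state active 10 60 get_state || exit 1
```

Keeping the state query as a separate command makes the timeout logic testable without calling the Equinix Metal API.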
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
#!/usr/bin/env bash | ||
|
||
# This script is used to stop (delete) a CNCF CIL runner | ||
|
||
token=$1 | ||
device_id=$2 | ||
hostname=$3 | ||
|
||
echo "Removing CNCF CIL machine: $hostname..." | ||
|
||
# https://metal.equinix.com/developers/api/devices/#devices-deletedevice | ||
remove_cil_result=$(curl -X DELETE -I -s -w %{http_code} -o /dev/null -H "X-Auth-Token: $token" https://api.equinix.com/metal/v1/devices/$device_id) | ||
|
||
if [[ $remove_cil_result != "204" ]]; then | ||
echo "ERROR: Failed to remove CNCF CIL machine: $hostname." | ||
exit 1 | ||
fi |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,7 +35,7 @@ main() { | |
kubectl config view --minify --flatten > ~/minified_config | ||
mv ~/minified_config ~/.kube/config | ||
|
||
-curl -L https://git.io/meshery | PLATFORM=$PLATFORM bash - | ||
+curl -L https://git.io/meshery | sudo PLATFORM=$PLATFORM bash - & | ||
Review thread on running the installer in the background:
- Any reason for running this command in the background?
- When I tested on the self-hosted runner, the command stayed pending indefinitely unless it ran in the background. I don't know why this doesn't happen on the GitHub-hosted runner.
- Hmm, okay. It's a minor thing; let's just keep a note of this behaviour and come back to it later.
- Yes, it's a good reminder.
||
|
||
sleep 60 | ||
} | ||
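The thread above does not establish why the foreground run hangs on the self-hosted runner, so the following does not try to fix that. It is only a sketch of how the background workaround could be made easier to inspect later: the command's output goes to a log file and its PID is kept. `run_detached`, `report_detached`, and `DETACHED_PID` are hypothetical names, not part of this PR.

```shell
#!/usr/bin/env bash

# run_detached (hypothetical helper): start a command in the background with
# stdin closed and all output captured in a log file. The PID of the
# background process is stored in the global DETACHED_PID.
run_detached() {
  local log_file=$1
  shift
  "$@" < /dev/null > "$log_file" 2>&1 &
  DETACHED_PID=$!
}

# report_detached (hypothetical helper): say whether the process is still
# running and print the last lines of its log.
report_detached() {
  local pid=$1 log_file=$2
  if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running"
  else
    echo "process $pid has exited"
  fi
  tail -n 20 "$log_file"
}

# Possible use in place of the bare "&" (the pipeline is wrapped in bash -c
# so that it can be passed as a single command; PLATFORM must be exported):
#   run_detached /tmp/meshery-install.log \
#     bash -c 'curl -L https://git.io/meshery | sudo PLATFORM=$PLATFORM bash -'
#   sleep 60
#   report_detached "$DETACHED_PID" /tmp/meshery-install.log
```

With the log in place, a later investigation of the hang has the installer's output to work from instead of only the fixed `sleep 60`.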
|
Review thread on the pending TODO:
- Is this TODO still pending? Would it also fall within the scope of this PR, or should we make a new issue and document it there until someone picks it up?
- It should go in another PR instead of this one. This PR is the initial automation of the self-hosted runner, so I won't include too many changes. As for this TODO, I think there is still something to discuss; it's a good idea to create a new issue and see if anyone can pick it up.
- Sure! Let's create another issue on this.
- @gyohuangxin I'll let you take care of making an issue on this one, as you understand the details better.
- @hershd23 I created issue #41 to track it.
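The TODO tracked in issue #41 concerns the `operating_system`, `facility`, and `plan` values that the start script hardcodes in its request body. One way this could look, as a sketch only: read each value from an environment variable, fall back to the current hardcoded value, and let `jq` build the JSON. The `CIL_*` variable names and `build_device_payload` are assumptions made for illustration and do not come from the PR or the issue.

```shell
#!/usr/bin/env bash

# build_device_payload (hypothetical helper): print the JSON body for the
# device-creation request. The CIL_* environment variables are assumed
# names; the defaults are the values currently hardcoded in the script.
#   $1 hostname of the new machine
#   $2 cloud-init user data, containing real newlines (not literal "\n")
build_device_payload() {
  local hostname=$1 userdata=$2
  jq -n \
    --arg os "${CIL_OPERATING_SYSTEM:-ubuntu_20_04}" \
    --arg facility "${CIL_FACILITY:-da11}" \
    --arg plan "${CIL_PLAN:-c3.small.x86}" \
    --arg hostname "$hostname" \
    --arg userdata "$userdata" \
    '{operating_system: $os, facility: $facility, plan: $plan,
      hostname: $hostname, userdata: $userdata}'
}

# Possible use (devices_url stands for the project's devices endpoint):
#   curl -s -X POST -H "X-Auth-Token: $token" \
#     -H "Content-Type: application/json" \
#     -d "$(build_device_payload "$hostname" "$userdata")" "$devices_url"
```

Because `jq --arg` escapes its values, the user data would have to be passed with real newlines, for example read from a file, rather than as the pre-escaped `\n` string the script builds today. The workflow could then expose the three variables as optional `workflow_dispatch` inputs.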