Automating initialization of on-demand self-hosted CNCF CIL runner #39

gyohuangxin · 2022-02-25T09:40:58Z

This commit automates the following steps:

1. Create a CNCF CIL machine and register it as a self-hosted runner
2. Run SMP benchmarks on the self-hosted runner
3. Stop and remove the CNCF CIL machine and the self-hosted runner

Description

This PR fixes #38

Notes for Reviewers

Signed commits

Yes, I signed my commits.

hershd23 · 2022-02-25T17:12:26Z

Great work so far, @gyohuangxin. You've made the workflow quite straightforward and easy to understand. If you need any help resolving the OS permission issues you mentioned in #38, do let me know.

Also mentioning @MarioArriaga92 who had a question in the recent build and release about provisioning and releasing instances during the workflow. This should help Mario!

gyohuangxin · 2022-02-28T15:12:32Z

@hershd23 Thanks, what you did before helped me a lot in this implementation!
The issue blocking me now is how to run minikube start as the user I created. I tried to create a user and switch to it: https://github.com/gyohuangxin/meshery-smp-action/blob/self-hosted/.github/workflows/configurable-benchmark-test-self-hosted.yaml#L83-#L86
However, the minikube output told me I was still the root user, which is not allowed to run minikube start: https://github.com/gyohuangxin/meshery-smp-action/runs/5357892309?check_suite_focus=true#step:3:76
It seems hard to switch to another user across GitHub Actions steps.

hershd23 · 2022-03-01T06:49:22Z

I tried looking for a solution to that myself but wasn't able to find anything. Is it possible to create a different user, say smp, at machine startup, so that our GitHub Actions sessions run as that user instead of root?

gyohuangxin · 2022-03-01T07:29:16Z

@hershd23 Yes, thank you for your help, that makes sense. I'm looking into creating the user via the user data script, so that we can use the smp user from startup.

gyohuangxin · 2022-03-03T03:22:28Z

@hershd23 I used the cloud-config configuration to create the user smp and it works: https://github.com/gyohuangxin/meshery-smp-action/blob/self-hosted/.github/workflows/scripts/start-cil-runner.sh#L15
The cloud-config (user data) is very helpful for configuring everything from machine startup, and here are the related docs:

You can find the latest result here: https://github.com/gyohuangxin/meshery-smp-action/runs/5401509632?check_suite_focus=true
Machine creation, registration, and deletion have been implemented, but there is still an issue with running mesheryctl perf: it seems the pods cannot be accessed when using the kubernetes platform. Do you have any comments on this? @hershd23 @navendu-pottekkat

Configuring Meshery to access Minikube...
Error getting context: Post "http://localhost:9081/api/system/kubernetes/contexts": dial tcp [::1]:9081: connect: connection refused
Configuration file: load-test.yaml
Endpoint URL: http://192.168.49.2:31126/productpage
Service Mesh: ISTIO
Test Name: istio-fortio-load-test.yamltest
Load Generator: fortio
Running test with test configuration file load-test.yaml
Error: failed to make a request.Get "http://localhost:9081/api/user/performance/profiles?page_size=25&page=0&search=test": dial tcp [::1]:9081: connect: connection refused.
See https://docs.meshery.io/reference/mesheryctl/perf/apply for usage details

pottekkat · 2022-03-03T03:40:24Z

@gyohuangxin It seems like we are deploying Meshery inside the minikube cluster and there is some networking issue. I would suggest we deploy Meshery in Docker and connect it to the Kubernetes cluster. This is how we run tests on the GitHub runners now.

~~I'm not sure how it deployed Meshery on Kubernetes right now. I will go over the action code and get back.~~

We can set the platform on the workflow to Docker and it should work.

Does this help?

pottekkat · 2022-03-03T03:41:27Z

Also, not deploying Meshery on the cluster has the added benefit of producing more accurate results, since a running Meshery no longer interferes with the performance of the cluster.

gyohuangxin · 2022-03-03T05:42:48Z

@navendu-pottekkat Thanks, it helps. I'll try to deploy Meshery in Docker.

gyohuangxin · 2022-03-08T10:16:44Z

@navendu-pottekkat I tried to deploy Meshery in Docker, but there was a nil pointer panic in the Meshery container: https://github.com/gyohuangxin/meshery-smp-action/runs/5462624684?check_suite_focus=true#step:6:1240
Do you have any comments on this?

pottekkat · 2022-03-08T11:34:55Z

@piyushsingariya I think we did fix this bug. A new release of mesheryctl should fix this, right?

piyushsingariya · 2022-03-08T12:51:36Z

@navendu-pottekkat This isn't a mesheryctl issue; it comes from the server. @gyohuangxin, can you try running the same performance test with a local build?

gyohuangxin · 2022-03-09T10:39:59Z

@navendu-pottekkat @piyushsingariya When the platform is docker and mesheryctl perf is used, it seems that the Meshery container cannot access the endpoint of the service mesh application.
https://github.com/gyohuangxin/meshery-smp-action/runs/5462624684?check_suite_focus=true#step:6:1174
Is there any way to make the endpoint accessible from the Meshery container?

pottekkat · 2022-03-10T08:46:42Z

Meshery has no dedicated method for that. It could be a networking issue, but we were able to use the same configuration on GitHub runners to run benchmark tests successfully and access the application endpoint. It could be environment specific.

gyohuangxin · 2022-03-10T10:36:45Z

@navendu-pottekkat Yes, I looked at the code and found that the panic is caused by a failure to get the minikube context:
meshery-meshery-1 | time="2022-03-10T07:48:08Z" level=warning msg="failed to generate in cluster context: "
meshery-meshery-1 | time="2022-03-10T07:48:08Z" level=warning msg="failed to find kubernetes context"
I also tried the local build manually and it works, so it could be the OS permission issue again.

Regarding the GitHub runners, I found there was another panic there: https://github.com/layer5io/meshery-smp-action/runs/5493977248?check_suite_focus=true#step:6:1573. It may be another issue we need to fix.

The commit automates below steps:
1. Create a CNCF CIL machine and register it as a self-hosted runner
2. Run SMP benchmarks on the self-hosted runner
3. Stop and remove the CNCF CIL machine and self-hosted runner
Signed-off-by: Huang Xin <[email protected]>

gyohuangxin · 2022-03-11T07:46:37Z

Hi there. I've been blocked on this for too long, and the next important job (running scheduled benchmarking tests on the CNCF cluster) shouldn't be held up. Can we start reviewing this PR and create another issue to track the Meshery problem?
@hershd23 @navendu-pottekkat @leecalcote

pottekkat · 2022-03-11T08:20:27Z

Yes @gyohuangxin, we can review the workflow and merge it. Could you open a new issue to track the remaining problems?

pottekkat · 2022-03-11T08:21:06Z

@hershd23 Could you also review this PR?

hershd23 · 2022-03-11T08:45:07Z

Yes will do

gyohuangxin · 2022-03-11T08:56:16Z

@navendu-pottekkat @hershd23 Thanks, I opened the issue #40.

hershd23

@gyohuangxin I have reviewed the PR. Great work. I have not suggested any changes but have left a few comments wanting to know a couple of things.

hershd23 · 2022-03-11T18:10:01Z

meshery.sh

@@ -35,7 +35,7 @@ main() {
 	kubectl config view --minify --flatten > ~/minified_config
 	mv ~/minified_config ~/.kube/config

-  curl -L https://git.io/meshery | PLATFORM=$PLATFORM bash -
+  	curl -L https://git.io/meshery | sudo PLATFORM=$PLATFORM bash - &


Any reason for running this command in the background?

When I tested on the self-hosted runner, the step stayed pending forever unless this command ran in the background. I don't know why this doesn't happen on the GitHub runner.

Hmm okay, it's a minor thing let's just keep a note of this behaviour and come back to it later.

Yes, it's a good reminder.

hershd23 · 2022-03-11T18:13:10Z

.github/workflows/scripts/start-cil-runner.sh

+# Use user_data_scripts to register the CNCF CIL runner as a self-hosted runner
+user_data_scripts="#cloud-config\nusers:\n    - default\n    - name: smp\n      groups: sudo, docker\n      sudo: ALL=(ALL) NOPASSWD:ALL\n      lock_passwd: true\nruncmd:\n    - [runuser, -l, smp, -c, \'mkdir actions-runner && cd actions-runner\']\n    - [runuser, -l, smp, -c, \'curl -o actions-runner-linux-x64-2.287.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.287.1/actions-runner-linux-x64-2.287.1.tar.gz\']\n    - [runuser, -l, smp, -c, \'tar xzf ./actions-runner-linux-x64-2.287.1.tar.gz\']\n    - [runuser, -l, smp, -c, \'export RUNNER_ALLOW_RUNASROOT=1\']\n    - [runuser, -l, smp, -c, \'./config.sh --url https://github.com/$REPOSITORY --token $REG_TOKEN --labels $hostname >> github-action-registeration.log\']\n    - [runuser, -l, smp, -c, \'./run.sh >> github-action-registeration.log\']"
+
+# TODO: the options "operating_system", "facility", "plan" are hardcoded now, we should make them configurable


Is this TODO still pending? Would it fall within the scope of this PR, or should we just create a new issue and document it there until someone picks it up?

It should go in another PR instead of this one. This PR is the initial automation of the self-hosted runner, so I won't include too many changes. Regarding this TODO, I think there is still something to discuss, so it's a good idea to create a new issue and see if anyone can pick it up.

Sure! Let's create another issue on this

@gyohuangxin I'll let you take care of making an issue on this one as you would understand the details better.

@hershd23 I created the issue #41 to track it.
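
For readability, the escaped one-line user_data_scripts string in the diff above could also be written as a heredoc. The following is a minimal sketch, not the code merged in this PR: it reproduces only the users section from the diff, and the function name build_user_data is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: emits the "users" part of the cloud-config shown in the diff.
# build_user_data is a hypothetical helper name, not part of start-cil-runner.sh.
build_user_data() {
  # The quoted 'EOF' delimiter keeps the text literal (no variable expansion).
  cat <<'EOF'
#cloud-config
users:
    - default
    - name: smp
      groups: sudo, docker
      sudo: ALL=(ALL) NOPASSWD:ALL
      lock_passwd: true
EOF
}

# Capture the document in a variable, as the original script does, and print it.
user_data="$(build_user_data)"
printf '%s\n' "$user_data"
```

If the runcmd entries were added in the same style, the heredoc delimiter would have to be unquoted so that $REPOSITORY, $REG_TOKEN, and $hostname are expanded before the user data is sent.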

hershd23 · 2022-03-11T18:14:30Z

.github/workflows/scripts/start-cil-runner.sh

+    exit 1
+fi
+
+# Wait 10 minutes until the machine is running 


Any reason to wait 10 minutes? Is a standard waiting time for an instance to come up documented somewhere?

As far as I know, there is no standard waiting time. My reasoning for the 10-minute wait is that if the machine takes longer than that to start running, something must be wrong with it.
However, on second thought, it would be better to make more use of the "state" field instead of just waiting 10 minutes:

If "state" == "provisioning", sleep 10s...

If "state" == "active", echo "Machine successfully created!" and continue.

If "state" == "failed", echo "Failed to create machine" and exit.
What do you think about this?
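
The state-based flow proposed here could be sketched roughly as follows. This is only an illustration under stated assumptions: get_machine_state and the FAKE_STATE variable are hypothetical stand-ins for the real API query, any state other than "active" or "failed" is treated as "keep waiting", and the default of 60 attempts at 10 seconds each preserves the original 10-minute upper bound.

```shell
#!/usr/bin/env bash
# Sketch of state-based waiting. get_machine_state is a hypothetical helper:
# replace its body with the real API query that returns the "state" field.
get_machine_state() {
  # Assumption: FAKE_STATE exists only so the sketch runs without a real machine.
  echo "${FAKE_STATE:-active}"
}

# Usage: wait_for_machine [max_attempts] [interval_seconds]
# Defaults to 60 attempts x 10s, matching the original 10-minute limit.
wait_for_machine() {
  local max_attempts="${1:-60}"
  local interval="${2:-10}"
  local attempt=0
  local state
  while [ "$attempt" -lt "$max_attempts" ]; do
    state="$(get_machine_state)"
    case "$state" in
      active)
        echo "Machine successfully created!"
        return 0
        ;;
      failed)
        echo "Failed to create machine" >&2
        return 1
        ;;
      *)
        # "provisioning" (or, by assumption, any other state): keep waiting.
        sleep "$interval"
        ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for the machine to become active" >&2
  return 1
}
```

In the start script, a call such as `wait_for_machine || exit 1` would stop the workflow on failure or timeout and continue as soon as the machine is active.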

@gyohuangxin this new flow makes a lot of sense. But let's capture this in another issue. There might be slight experimentation required here.

I understand this one and will create an issue for it.

Thanks, please go ahead.

hershd23

LGTM. Let's just create tickets for the things we have discussed in the comments, and we will be good to go.

hershd23 · 2022-03-13T17:20:25Z

@navendu-pottekkat @leecalcote I don't seem to have the permissions to merge these changes. Do review them and merge them once you're done with your review

hershd23 · 2022-03-15T14:37:15Z

@leecalcote @navendu-pottekkat please do review and merge. Looks like I do not have merging permissions here

piyushsingariya

LGTM!

leecalcote · 2022-03-15T15:07:23Z

This is really neat.

leecalcote · 2022-03-15T15:08:56Z

New release available: v0.2.0 - https://github.com/marketplace/actions/performance-testing-with-meshery

gyohuangxin marked this pull request as ready for review March 11, 2022 07:46

hershd23 reviewed Mar 11, 2022

hershd23 approved these changes Mar 13, 2022

gyohuangxin mentioned this pull request Mar 14, 2022

Make the self-hosted runner configurable as the workflow's options #41

Closed

gyohuangxin requested review from pottekkat and leecalcote March 15, 2022 14:25

piyushsingariya approved these changes Mar 15, 2022

leecalcote approved these changes Mar 15, 2022

leecalcote merged commit 20c0b6f into layer5io:self-hosted Mar 15, 2022
