-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automating initialization of on-demand self-hosted CNCF CIL runner #39
Conversation
Great work till now @gyohuangxin. You've made the workflow quite straightforward and easy to understand. If you need any help in resolving those OS permission issues you mentioned in #38, do let me know Also mentioning @MarioArriaga92 who had a question in the recent build and release about provisioning and releasing instances during the workflow. This should help Mario! |
@hershd23 Thanks, what you did before helped me a lot in this implementation! |
I tried looking for the solution to that myself wasn't able to find anything. Is it possible to make a different user say |
@hershd23 Yes, thank you for your help, it makes sense. I'm investigating to create the user via userdata script, so that we can use the |
@hershd23 I used the cloud-config configuration to create user
You can find the latest result from here: https://github.com/gyohuangxin/meshery-smp-action/runs/5401509632?check_suite_focus=true
|
@gyohuangxin It seems like we are deploying Meshery inside the minikube cluster and there is some networking issue. I would suggest we deploy Meshery in Docker and connect it to the Kubernetes cluster. This is how we run tests on the GitHub runners now.
We can set the platform on the workflow to Docker and it should work. Does this help? |
Also, not deploying Meshery on the cluster have the added benefit of producing more accurate results as running Meshery is not interfering with the performance of the cluster to some extent. |
@navendu-pottekkat Thanks, it helps. I'll try to deploy Meshery in Docker. |
@navendu-pottekkat I tried to deploy Meshery in docker, but there was a nil pointer panic in meshery container. https://github.com/gyohuangxin/meshery-smp-action/runs/5462624684?check_suite_focus=true#step:6:1240 |
@piyushsingariya I think we did fix this bug. A new release of mesheryctl should fix this right? |
@navendu-pottekkat This isn't an mesheryctl issue, it's from the server. @gyohuangxin can you try running same performance test with local build?? |
@navendu-pottekkat @piyushsingariya When the platform is docker and using mesheryctl perf, it seems that the meshery container cannot access the endpoint of service mesh application. |
There is no Meshery method. It could be some issue with the networking but we were able to use the same configuration on GitHub runners to successfully run benchmark tests and access the application endpoint. It could be environment specific. |
@navendu-pottekkat Yes, I looked at the code and found the panic caused by failing to get minikube context And regarding the Github runner, I found there were another panic with GitHub runners, https://github.com/layer5io/meshery-smp-action/runs/5493977248?check_suite_focus=true#step:6:1573, it may be another issue we need to fix. |
The commit automates below steps: 1. Create a CNCF CIL machine and register it as a self-hosted runner 2. Run SMP benchmarks on the self-hosted runner 3. Stop and remove the CNCF CIL machine and self-hosted runner Signed-off-by: Huang Xin <[email protected]>
Hi, there. I'm blocked by it too much time and the next important job (running scheduled benchmarking test on CNCF cluster) shouldn't be blocked. Can we start reviewing this PR and create another issue to track the meshery problem? |
Yes @gyohuangxin We can review the workflow and merge it. Could you open a new issue to track the others? |
@hershd23 Could you also review this PR? |
Yes will do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gyohuangxin I have reviewed the PR. Great work. I have not suggested any changes but have left a few comments wanting to know a couple of things.
@@ -35,7 +35,7 @@ main() { | |||
kubectl config view --minify --flatten > ~/minified_config | |||
mv ~/minified_config ~/.kube/config | |||
|
|||
curl -L https://git.io/meshery | PLATFORM=$PLATFORM bash - | |||
curl -L https://git.io/meshery | sudo PLATFORM=$PLATFORM bash - & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for running this command in the background?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tested on the self-hosted runner, it will be pending always if not running it in the background. But I don't know why this doesn't happen on github runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm okay, it's a minor thing let's just keep a note of this behaviour and come back to it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a good reminder.
# Use user_data_scripts to register the CNCF CIL runner as a self-hosted runner | ||
user_data_scripts="#cloud-config\nusers:\n - default\n - name: smp\n groups: sudo, docker\n sudo: ALL=(ALL) NOPASSWD:ALL\n lock_passwd: true\nruncmd:\n - [runuser, -l, smp, -c, \'mkdir actions-runner && cd actions-runner\']\n - [runuser, -l, smp, -c, \'curl -o actions-runner-linux-x64-2.287.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.287.1/actions-runner-linux-x64-2.287.1.tar.gz\']\n - [runuser, -l, smp, -c, \'tar xzf ./actions-runner-linux-x64-2.287.1.tar.gz\']\n - [runuser, -l, smp, -c, \'export RUNNER_ALLOW_RUNASROOT=1\']\n - [runuser, -l, smp, -c, \'./config.sh --url https://github.com/$REPOSITORY --token $REG_TOKEN --labels $hostname >> github-action-registeration.log\']\n - [runuser, -l, smp, -c, \'./run.sh >> github-action-registeration.log\']" | ||
|
||
# TODO: the options "operating_system", "facility", "plan" are hardcoded now, we should make them configurable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO still pending, would this also come in the scope of this PR or should we just make a new issue and document it there, till someone else picks it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be in another PR instead of this one. This PR are the initial automation of self-hosted runner, so I won't include too many changes. And regarding this TODO, I think there is still something to discuss, it's a good idea to create an new issue and see if anyone can picks it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Let's create another issue on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gyohuangxin I'll let you take care of making an issue on this one as you would understand the details better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit 1 | ||
fi | ||
|
||
# Wait 10 minutes until the machine is running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to wait 10 minutes? Is standard waiting time for an instance to come up mentioned somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no standard waiting time in my opinion. My thought about waiting 10 minutes is that if it takes too long to wait for machine to be running, there must be something wrong with it.
However, on my second thought, it would be better if we could take more advantage of "state" field instead of just waiting 10 minutes:
- If "state" == "provisioning", sleep 10s...
- If "state" == "active", echo "Machine successfully created!" and continue.
- If "state" == "failed", echo "Failed to create machine" and exit.
How do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gyohuangxin this new flow makes a lot of sense. But let's capture this in another issue. There might be slight experimentation required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this issue and will be creating an issue ticket for this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, please go ahead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's just create tickets for the things we have discussed in the comments and we will be good to go
@navendu-pottekkat @leecalcote I don't seem to have the permissions to merge these changes. Do review them and merge them once you're done with your review |
@leecalcote @navendu-pottekkat please do review and merge. Looks like I do not have merging permissions here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This is really neat. |
New release available: v0.2.0 - https://github.com/marketplace/actions/performance-testing-with-meshery |
The commit automates below steps:
Description
This PR fixes #38
Notes for Reviewers
Signed commits