Mark test servers for auto-deletion using "end_at" parameter #75

Closed
leecalcote opened this issue May 11, 2023 — with Slack · 8 comments · Fixed by #76
Assignees
Labels
area/ci Continuous integration | Build and release area/devops area/tests Testing / quality assurance help wanted Extra attention is needed

Comments


leecalcote commented May 11, 2023

Current Behavior

The scheduled tests, which run multiple times a day, have faced a few challenges. One notable challenge is the cleanup phase once a test completes. Currently, bare metal servers used for testing are frequently orphaned rather than decommissioned at the end of each test. This leaves an inordinate number of bare metal servers unnecessarily unavailable for use by other projects.

@vielmetti has been most helpful in identifying ways to mitigate this.

Desired Behavior

All resources provisioned for a scheduled test are subsequently decommissioned at the end of that same test.

Implementation

Recently, @vielmetti pointed this out:

You can create servers that will auto-delete themselves at a time certain, perfect for test runs. See https://deploy.equinix.com/developers/docs/metal/deploy/spot-market/#spot-market-request-creation. You want the “end_at” parameter on the API endpoint for device creation

Slack Message

Acceptance Tests

  1. Test servers are decommissioned at end of scheduled testing period.

Contributor Guide

@leecalcote leecalcote added area/ci Continuous integration | Build and release area/devops area/tests Testing / quality assurance labels May 11, 2023 — with Slack
@leecalcote leecalcote added the help wanted Extra attention is needed label May 11, 2023
@vielmetti

looks like it's here that needs to be changed, and @gyohuangxin had the last edit. (actually looks like a pretty simple fix). We'll need to compute a timestamp.

.github/workflows/scripts/start-cil-runner.sh

What I don't know is, how long are the tests expected to run, at worst case? Can the machines be clobbered in 24, 6, 2 hours?

@gyohuangxin

@vielmetti Currently, it runs in 30 minutes at worst. But considering that we may add more test cases in the future, I think 2 hours is an appropriate deadline.

@vielmetti

@gyohuangxin The Equinix system bills by the hour, so my recommendation is to set the expiry at, say, 1 hour 50 minutes from the time it's created. That will catch anything relatively fast but not risk running to 2:01 and incurring the extra charge.

The example end_at time is in ISO format, e.g.

"end_at": "2020-09-24T05:00:00Z",

which can be generated with something like date -v +110M -Iminutes.
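As a rough sketch of the timestamp computation discussed above (not the code that was merged in the fix): the `-v` flag belongs to BSD/macOS `date`, while GNU `date` on most Linux runners uses `-d` instead. The helper below tries the GNU form first and falls back to the BSD form, and it forces UTC so the result matches the `Z`-suffixed example. The function name `expiry_timestamp` and the variable `END_AT` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Print an ISO 8601 UTC timestamp N minutes from now.
# Assumption: either GNU date (-d) or BSD/macOS date (-v) is available.
expiry_timestamp() {
  minutes="$1"
  # GNU date form; if it fails, fall back to the BSD/macOS form.
  date -u -d "+${minutes} minutes" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null \
    || date -u -v "+${minutes}M" +"%Y-%m-%dT%H:%M:%SZ"
}

# 110 minutes = 1 hour 50 minutes, which stays under the 2-hour billing boundary.
END_AT="$(expiry_timestamp 110)"
echo "end_at=${END_AT}"
```

The 110-minute value follows the recommendation above; it could be made a script parameter if test runs grow longer.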

@gyohuangxin

@gyohuangxin gyohuangxin self-assigned this May 15, 2023
@gyohuangxin

@vielmetti I confirmed that the Equinix system uses the UTC/GMT time zone, so please review the PR, thanks.
image

@vielmetti

Thanks @gyohuangxin - I am checking on the semantics of "termination_time", to see if this is expected to work for ordinary instances or only for "spot instances". Will follow up soonest when I can confirm.

@vielmetti

Confirming two things: one, that our team is working on updated docs for the termination_time option, and two, that based on my understanding this should work as planned.

@vielmetti

To complete this -

We have updated the Equinix Metal "termination_time" docs at https://deploy.equinix.com/developers/api/metal/#tag/Devices/operation/createDevice to better reflect the use case (ephemeral instances) and to document the time zone question described above.
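For readers who want to see where "termination_time" would sit, here is a minimal sketch of a device-creation request body. It is an assumption-laden illustration, not the payload used by this project's runner script: the helper name `build_device_payload` is hypothetical, the hostname, plan, metro, and operating system values are placeholders, and the linked createDevice docs remain the authority on required fields. The sketch only builds and prints the JSON; it does not call the API.

```shell
#!/bin/sh
# Build a JSON body for a device-creation request that includes termination_time.
# Arguments: hostname, plan, metro, operating system, termination time (ISO 8601, UTC).
# Note: values are inserted verbatim, so they must not contain quotes or backslashes.
build_device_payload() {
  printf '{"hostname":"%s","plan":"%s","metro":"%s","operating_system":"%s","termination_time":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# All values below are placeholders for illustration only.
PAYLOAD="$(build_device_payload "ci-runner-example" "c3.small.x86" "da" "ubuntu_22_04" "2020-09-24T05:00:00Z")"
echo "$PAYLOAD"
```

In practice the fixed timestamp would be replaced by one computed at provisioning time, roughly 1 hour 50 minutes ahead, as recommended earlier in the thread.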

Since this change was deployed last month we have not had any of the previous issues described! That's all good news.


3 participants