Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the cleanup support #37

Merged
merged 1 commit into from
May 17, 2024
Merged

Add the cleanup support #37

merged 1 commit into from
May 17, 2024

Conversation

yuanchen8911
Copy link
Collaborator

@yuanchen8911 yuanchen8911 commented May 15, 2024

This PR adds the cleanup option to delete all jobs and pods in a test.

  • --cleanup
  • -- cleanup.timeout

The initial verification shows it works.

  • Without --cleanup, the job remained.
./bin/knavigator --tasks resources/tests/k8s/test-job.yml; kubectl get job -n k8s-test
I0515 15:33:07.864033   41806 k8s_config.go:42] "Using external kubeconfig"
I0515 15:33:07.869065   41806 main.go:84] "Starting test" name="test-k8s-job"
I0515 15:33:07.869085   41806 engine.go:103] "Creating task" name="SubmitObj" id="job"
I0515 15:33:07.869617   41806 engine.go:197] "Starting task" id="SubmitObj/job"
I0515 15:33:07.892601   41806 engine.go:203] "Task completed" id="SubmitObj/job" duration="22.973334ms"
I0515 15:33:07.892615   41806 engine.go:209] "Reset Engine"

NAME   STATUS    COMPLETIONS   DURATION   AGE
job1   Running   0/2           0s         0s
  • With the option --cleanup, the job was delete.
./bin/knavigator --tasks resources/tests/k8s/test-job.yml --cleanup; kubectl get job -n k8s-test
I0515 15:34:09.350208   41865 k8s_config.go:42] "Using external kubeconfig"
I0515 15:34:09.355431   41865 main.go:84] "Starting test" name="test-k8s-job"
I0515 15:34:09.355452   41865 engine.go:103] "Creating task" name="SubmitObj" id="job"
I0515 15:34:09.356077   41865 engine.go:197] "Starting task" id="SubmitObj/job"
I0515 15:34:09.382863   41865 engine.go:203] "Task completed" id="SubmitObj/job" duration="26.774458ms"
I0515 15:34:09.382878   41865 engine.go:209] "Reset Engine"
I0515 15:34:09.382883   41865 engine.go:215] "Cleaning up objects"
I0515 15:34:09.393272   41865 engine.go:238] "Deleted all objects"

No resources found in k8s-test namespace. 

Verified it works for volcano jobs too.

@yuanchen8911 yuanchen8911 added wip Work in progress do not merge Do not merge labels May 15, 2024
@yuanchen8911 yuanchen8911 requested a review from dmitsh May 15, 2024 04:03
@yuanchen8911 yuanchen8911 added the help wanted Extra attention is needed label May 15, 2024
@yuanchen8911 yuanchen8911 force-pushed the cleanup branch 2 times, most recently from 8a4440a to 8f421ef Compare May 15, 2024 22:40
@yuanchen8911 yuanchen8911 removed help wanted Extra attention is needed do not merge Do not merge labels May 15, 2024
@yuanchen8911 yuanchen8911 changed the title Add the cleanup support (WIP) Add the cleanup support May 15, 2024
for _, name := range objInfo.Names {
err := eng.dynamicClient.Resource(objInfo.GVR).Namespace(ns).Delete(ctx, name, deletions)
if err != nil {
return err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it return an error if the object has been already deleted by the test or expired naturally?

Copy link
Collaborator Author

@yuanchen8911 yuanchen8911 May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the following the correct way to add a DeleteObj task? The SubmitObj task's id is job.

- id: delete
  type: DeleteObj
  params:
    refTaskId: job
  refTaskId: job

After adding the above task to k8s/test-job.yaml, i ran the test (without --cleanup) and got the following error.

E0516 09:56:44.038701   53311 engine.go:200] "Task failed" err="the server could not find the requested resource" id="DeleteObj/delete"
I0516 09:56:44.038712   53311 engine.go:209] "Reset Engine"
E0516 09:56:44.038718   53311 main.go:96] the server could not find the requested resource

Copy link
Collaborator Author

@yuanchen8911 yuanchen8911 May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I was wondering why we need to define refTaskId twice in DeleteObj.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't need refTaskId twice.

Copy link
Collaborator Author

@yuanchen8911 yuanchen8911 May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the second refTaskId and still got the same error. Do we have a test with a DeleteObj task other than the unit test? Can you test the DeleteObj task by yourself?

 ./bin/knavigator --tasks resources/tests/k8s/test-job.yml
I0516 10:15:41.155408   53935 k8s_config.go:42] "Using external kubeconfig"
I0516 10:15:41.158578   53935 main.go:84] "Starting test" name="test-k8s-job"
I0516 10:15:41.158590   53935 engine.go:103] "Creating task" name="SubmitObj" id="job"
I0516 10:15:41.160156   53935 engine.go:197] "Starting task" id="SubmitObj/job"
I0516 10:15:41.182176   53935 engine.go:203] "Task completed" id="SubmitObj/job" duration="22.011666ms"
I0516 10:15:41.182188   53935 engine.go:103] "Creating task" name="DeleteObj" id="delete"
I0516 10:15:41.182224   53935 engine.go:197] "Starting task" id="DeleteObj/delete"
E0516 10:15:41.184724   53935 engine.go:200] "Task failed" err="the server could not find the requested resource" id="DeleteObj/delete"
I0516 10:15:41.184735   53935 engine.go:209] "Reset Engine"
E0516 10:15:41.184741   53935 main.go:96] the server could not find the requested resource

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, we never tested it. Let me check

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bug indeed. Fixed in #41

Copy link
Collaborator Author

@yuanchen8911 yuanchen8911 May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested --cleanup with and without the DeleteObj task. It workded well. For the case with DeleteObj, there's no error. The deleted object should have been removed from eng.objInfoMap?

./bin/knavigator --tasks resources/tests/k8s/test-job.yml  --cleanup
I0517 10:24:41.489049   93711 k8s_config.go:42] "Using external kubeconfig"
I0517 10:24:41.494671   93711 main.go:84] "Starting test" name="test-k8s-job"
I0517 10:24:41.494690   93711 engine.go:111] "Creating task" name="RegisterObj" id="register"
I0517 10:24:41.495255   93711 engine.go:244] "Starting task" id="RegisterObj/register"
I0517 10:24:41.495266   93711 engine.go:250] "Task completed" id="RegisterObj/register" duration="1.708µs"
I0517 10:24:41.495271   93711 engine.go:111] "Creating task" name="SubmitObj" id="job"
I0517 10:24:41.495349   93711 engine.go:244] "Starting task" id="SubmitObj/job"
I0517 10:24:41.512305   93711 engine.go:250] "Task completed" id="SubmitObj/job" duration="16.949084ms"
I0517 10:24:41.512316   93711 engine.go:111] "Creating task" name="CheckPod" id="status"
I0517 10:24:41.512363   93711 engine.go:244] "Starting task" id="CheckPod/status"
I0517 10:24:41.512369   93711 check_pod_task.go:158] "Create pod informer" #pod=2 timeout="5s"
I0517 10:24:41.517275   93711 check_pod_task.go:256] "Accounted for all pods"
I0517 10:24:41.517300   93711 engine.go:250] "Task completed" id="CheckPod/status" duration="4.932042ms"
I0517 10:24:41.517317   93711 engine.go:111] "Creating task" name="DeleteObj" id="delete"
I0517 10:24:41.517385   93711 engine.go:244] "Starting task" id="DeleteObj/delete"
W0517 10:24:41.529210   93711 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
I0517 10:24:41.529230   93711 engine.go:250] "Task completed" id="DeleteObj/delete" duration="11.837917ms"
I0517 10:24:41.529235   93711 engine.go:256] "Reset Engine"
I0517 10:24:41.529237   93711 engine.go:262] "Cleaning up objects"
I0517 10:24:41.533397   93711 engine.go:286] "Deleted all objects"

@yuanchen8911 yuanchen8911 requested a review from dmitsh May 16, 2024 17:00
@yuanchen8911 yuanchen8911 removed the wip Work in progress label May 16, 2024
Signed-off-by: Yuan Chen <[email protected]>

Update engine

Signed-off-by: Yuan Chen <[email protected]>

Resolve conflicts

Signed-off-by: Yuan Chen <[email protected]>

Update main.go

Implement DeleteAllObjects

Signed-off-by: Yuan Chen <[email protected]>

Fix a golint error

Signed-off-by: Yuan Chen <[email protected]>

Remove a comment

Signed-off-by: Yuan Chen <[email protected]>
@yuanchen8911 yuanchen8911 merged commit 0a49d8b into NVIDIA:main May 17, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants