K8S: Tutor should use selectors to validate already running jobs #1206

Open
jfavellar90 opened this issue Mar 10, 2025 · 0 comments

Bug description
When Tutor is asked to run a K8S job (through the K8sTaskRunner runner), it first checks that other jobs have completed. The problem is that it checks the state of ALL jobs in the namespace, not only those related to Tutor. If a job unrelated to Tutor is running in the namespace, the Tutor K8S job is delayed until that job completes. This is not ideal in a namespace where additional jobs run independently of Tutor.

A possible solution is to use job selectors so that Tutor only checks that Tutor-related jobs have completed.
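To illustrate the idea, here is a minimal Python sketch of selector-based filtering. It is not Tutor's actual implementation: the function names and the label app.kubernetes.io/managed-by=tutor are assumptions made for the example, and the labels Tutor really applies to its jobs may differ. It relies only on the standard kubectl get jobs --selector option and on the status.active field of a Job.

```python
import json
import subprocess

# ASSUMPTION: this label is illustrative only; check which labels Tutor
# actually sets on the jobs it creates before relying on it.
TUTOR_JOB_SELECTOR = "app.kubernetes.io/managed-by=tutor"


def parse_selector(selector: str) -> dict:
    """Parse an equality-based selector such as 'a=b,c=d' into a dict."""
    pairs = (item.split("=", 1) for item in selector.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


def active_job_names(job_list: dict, selector: str) -> list:
    """Return names of jobs that match the selector and still have active pods.

    `job_list` is the parsed output of `kubectl get jobs --output json`.
    """
    wanted = parse_selector(selector)
    names = []
    for job in job_list.get("items", []):
        labels = job["metadata"].get("labels") or {}
        if any(labels.get(key) != value for key, value in wanted.items()):
            continue  # Not one of ours: ignore it, whatever its state.
        # `status.active` is omitted by Kubernetes when no pod is running.
        if (job.get("status", {}).get("active") or 0) > 0:
            names.append(job["metadata"]["name"])
    return names


def fetch_jobs(namespace: str, selector: str) -> dict:
    """Hypothetical helper: let the API server do the filtering via kubectl."""
    output = subprocess.check_output(
        ["kubectl", "get", "jobs", "--namespace", namespace,
         "--selector", selector, "--output", "json"]
    )
    return json.loads(output)
```

With this approach, a job such as infinite-job from the reproduction steps carries no Tutor label, so it would not be counted as a running job that has to finish first.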

How to reproduce

I used Minikube to test this behavior (K8S version 1.31, with the same kubectl version).

  • Create a new namespace in the cluster.
  • Install Tutor, create a new Tutor environment and run tutor config save.
  • Change the kubectl context to point to the namespace you just created in the cluster.
  • Create an infinite-running job in the namespace. This can be something like:
apiVersion: batch/v1
kind: Job
metadata:
  name: infinite-job
spec:
  template:
    spec:
      restartPolicy: Never  # Can also be "OnFailure"
      containers:
        - name: infinite-loop
          image: busybox
          command: ["sh", "-c", "while true; do echo 'Running...'; sleep 60; done"]
  backoffLimit: 0
  • Initialize Tutor in K8S by running tutor k8s launch -I.
  • Check that service initialization does not start because of the previously created job.

  • Delete the infinite-running job. Once this is done, the K8S initialization runs with no issues.
DawoudSheraz moved this from Pending Triage to Backlog in Tutor project management on Mar 18, 2025