Jobs and CronJobs

Job Example

apiVersion: batch/v1
kind: Job
metadata:
  name: sum-job
spec:
  parallelism: 10
  completions: 2
  backoffLimit: 5
  template:
    metadata:
      name: sum-job-pod
    spec:
      containers:
        - name: sum-job-pod-container
          image: busybox
          command: ["/bin/sh", "-c"]
          args: ["expr 3 + 2"]
      restartPolicy: OnFailure

A pod definition file named throw-dice-pod.yaml is given. The image throw-dice randomly returns a value between 1 and 6. A 6 is considered a success; all other values are failures.

Try deploying the POD and view the POD logs for the generated number.

# throw-dice-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: throw-dice-pod
spec:
  containers:
  - image: kodekloud/throw-dice
    name: throw-dice
  restartPolicy: Never

Running the commands: the Pod logs show that the run failed, because the generated value (2) was not 6, the only value counted as a success. Oddly, the describe output also shows a warning reported by the kubelet: “MountVolume.SetUp failed for volume "kube-api-access-z6ghh" : object "default"/"kube-root-ca.crt" not registered”.

So, was the Pod's failure caused by the generated value (2) or by a certificate problem? The describe output points to the generated value: the container started, ran, and terminated with Exit Code 1, and the FailedMount warning first appears only after the Started event, so it is unlikely to be what made the Pod fail.

root@controlplane ~ ➜  k apply -f throw-dice-pod.yaml 
	pod/throw-dice-pod created

root@controlplane ~ ➜  k get pods
	NAME             READY   STATUS   RESTARTS   AGE
	throw-dice-pod   0/1     Error    0          15s
	
root@controlplane ~ ➜  k logs throw-dice-pod 
	2

root@controlplane ~ ➜  k describe pods throw-dice-pod 
	Name:         throw-dice-pod
	Namespace:    default
	Status:       Failed
	Containers:
	  throw-dice:
	    Container ID:   docker://12bf124268456067ee8035f2fe95ab...
	    Image:          kodekloud/throw-dice
	    Image ID:       docker-pullable://kodekloud/throw-dice@...
	    State:          Terminated
	      Reason:       Error
	      Exit Code:    1
	      Started:      Wed, 13 Jul 2022 09:11:28 +0000
	      Finished:     Wed, 13 Jul 2022 09:11:28 +0000
	    Ready:          False
	    Restart Count:  0
	    Environment:    <none>
	    Mounts:
	      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z6ghh (ro)
	Conditions:
	  Type              Status
	  Initialized       True 
	  Ready             False 
	  ContainersReady   False 
	  PodScheduled      True 
	Volumes:
	  kube-api-access-z6ghh:
	    Type:                    Projected (a volume that contains injected data from multiple sources)
	    TokenExpirationSeconds:  3607
	    ConfigMapName:           kube-root-ca.crt
	    ConfigMapOptional:       <nil>
	    DownwardAPI:             true
	Events:
	  Type     Reason       Age                From               Message
	  ----     ------       ----               ----               -------
	  Normal   Scheduled    53s                default-scheduler  Successfully assigned default/throw-dice-pod to controlplane
	  Normal   Pulling      51s                kubelet            Pulling image "kodekloud/throw-dice"
	  Normal   Pulled       50s                kubelet            Successfully pulled image "kodekloud/throw-dice" in 1.552159624s
	  Normal   Created      49s                kubelet            Created container throw-dice
	  Normal   Started      47s                kubelet            Started container throw-dice
	  Warning  FailedMount  42s (x3 over 44s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-z6ghh" : object "default"/"kube-root-ca.crt" not registered

Create a Job using this POD definition file or the imperative command, and look at how many attempts it takes to get a '6'.

Use the specification given below.

apiVersion: batch/v1
kind: Job
metadata:
  name: throw-dice-job
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 6  # default
  template:
    spec:
      containers:
        - name: throw-dice-job-pod
          image: kodekloud/throw-dice
      restartPolicy: Never

root@controlplane ~ ➜  k get pods,jobs
NAME                       READY   STATUS      RESTARTS   AGE
pod/throw-dice-job-9q6g2   0/1     Error       0          26s
pod/throw-dice-job-h7r8z   0/1     Error       0          30s
pod/throw-dice-job-j6ndz   0/1     Error       0          22s
pod/throw-dice-job-knznw   0/1     Completed   0          19s
pod/throw-dice-job-vz8zp   0/1     Error       0          34s

NAME                       COMPLETIONS   DURATION   AGE
job.batch/throw-dice-job   1/1           19s        34s

Note: because restartPolicy is Never, the Job controller creates a new Pod for each failed attempt, until one succeeds or the backoffLimit is reached.
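
For contrast, a sketch of the same Job with restartPolicy: OnFailure, the policy used in the sum-job example at the top. This variant was not run in these notes, and the name throw-dice-job-onfailure is hypothetical. With OnFailure the kubelet restarts the failed container inside the same Pod, so retries would be expected to show up in the RESTARTS column rather than as additional Pods.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: throw-dice-job-onfailure  # hypothetical name, not from the lab
spec:
  backoffLimit: 6  # default
  template:
    spec:
      containers:
        - name: throw-dice-job-pod
          image: kodekloud/throw-dice
      # OnFailure: the kubelet restarts the container in place after a non-zero exit
      restartPolicy: OnFailure
```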

Now set backoffLimit to 2. Two Pods were created and both ended in Error. At that point the backoffLimit had been reached, and the Job made one last attempt with a third and final Pod, which also failed and left the Job at 0/1 completions. In other words, the observed behavior is the initial Pod plus backoffLimit retries.
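
The manifest for this run is not included in these notes. A minimal sketch, assuming the only change from the previous spec is the backoffLimit value:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: throw-dice-job
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 2  # lowered from the default of 6 (assumed to be the only change)
  template:
    spec:
      containers:
        - name: throw-dice-job-pod
          image: kodekloud/throw-dice
      restartPolicy: Never
```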

kubectl get pods,jobs
NAME                          READY   STATUS   RESTARTS   AGE
pod/throw-dice-job--1-gm5wj   0/1     Error    0          31s
pod/throw-dice-job--1-k4v7c   0/1     Error    0          37s
pod/throw-dice-job--1-qvh2k   0/1     Error    0          21s

NAME                       COMPLETIONS   DURATION   AGE
job.batch/throw-dice-job   0/1           37s        37s

Update the job definition to run as many times as required to get 3 successful 6's.

Delete the existing job and create a new one with the given spec. Monitor and wait for the job to succeed.

apiVersion: batch/v1
kind: Job
metadata:
  name: throw-dice-job
spec:
  parallelism: 1
  completions: 3
  backoffLimit: 6  # default
  template:
    spec:
      containers:
        - name: throw-dice-job-pod
          image: kodekloud/throw-dice
      restartPolicy: Never

That took a while. Let us try to speed it up by running up to 3 Pods in parallel.

Update the job definition to run 3 Pods in parallel.

apiVersion: batch/v1
kind: Job
metadata:
  name: throw-dice-job
spec:
  parallelism: 3
  completions: 3
  backoffLimit: 6  # default
  template:
    spec:
      containers:
        - name: throw-dice-job-pod
          image: kodekloud/throw-dice
      restartPolicy: Never

On the first attempt, the Job reached the backoffLimit with only 2 of the 3 completions:

root@controlplane ~ ➜  k get jobs,pods
NAME                       COMPLETIONS   DURATION   AGE
job.batch/throw-dice-job   2/3           38s        38s

NAME                       READY   STATUS      RESTARTS   AGE
pod/throw-dice-job-2hr7b   0/1     Error       0          9s
pod/throw-dice-job-4vk7l   0/1     Error       0          14s
pod/throw-dice-job-5dd24   0/1     Error       0          19s
pod/throw-dice-job-h6v84   0/1     Error       0          25s
pod/throw-dice-job-kd8w8   0/1     Error       0          38s
pod/throw-dice-job-lf29l   0/1     Error       0          38s
pod/throw-dice-job-mbs9d   0/1     Completed   0          38s
pod/throw-dice-job-nz8dz   0/1     Completed   0          16s
pod/throw-dice-job-zc5vf   0/1     Error       0          22s

root@controlplane ~ ➜  k describe jobs throw-dice-job 
Name:             throw-dice-job
Namespace:        default
Selector:         controller-uid=a6d33e91-8e81-450b-8d11-0ad5a2d08537
Labels:           controller-uid=a6d33e91-8e81-450b-8d11-0ad5a2d08537
                  job-name=throw-dice-job
Annotations:      batch.kubernetes.io/job-tracking: 
Parallelism:      3
Completions:      3
Completion Mode:  NonIndexed
Start Time:       Wed, 13 Jul 2022 09:55:11 +0000
Pods Statuses:    0 Active / 2 Succeeded / 7 Failed
Pod Template:
  Labels:  controller-uid=a6d33e91-8e81-450b-8d11-0ad5a2d08537
           job-name=throw-dice-job
  Containers:
   throw-dice-job-pod:
    Image:        kodekloud/throw-dice
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason                Age   From            Message
  ----     ------                ----  ----            -------
  Normal   SuccessfulCreate      45s   job-controller  Created pod: throw-dice-job-kd8w8
  Normal   SuccessfulCreate      45s   job-controller  Created pod: throw-dice-job-mbs9d
  Normal   SuccessfulCreate      45s   job-controller  Created pod: throw-dice-job-lf29l
  Normal   SuccessfulCreate      32s   job-controller  Created pod: throw-dice-job-h6v84
  Normal   SuccessfulCreate      29s   job-controller  Created pod: throw-dice-job-zc5vf
  Normal   SuccessfulCreate      26s   job-controller  Created pod: throw-dice-job-5dd24
  Normal   SuccessfulCreate      23s   job-controller  Created pod: throw-dice-job-nz8dz
  Normal   SuccessfulCreate      21s   job-controller  Created pod: throw-dice-job-4vk7l
  Normal   SuccessfulCreate      16s   job-controller  Created pod: throw-dice-job-2hr7b
  Warning  BackoffLimitExceeded  12s   job-controller  Job has reached the specified backoff limit
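
One way to give the Job more room is to raise backoffLimit. The sketch below uses 25, an arbitrary value chosen for illustration and not taken from these notes. Assuming each value from 1 to 6 is equally likely (the notes only say the result is random), three successes need about 18 Pods on average, 15 of them failures, so the default limit of 6 is often exhausted first. A higher limit makes that outcome less likely but does not rule it out.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: throw-dice-job
spec:
  parallelism: 3
  completions: 3
  backoffLimit: 25  # illustrative value (assumption); the default of 6 was exhausted above
  template:
    spec:
      containers:
        - name: throw-dice-job-pod
          image: kodekloud/throw-dice
      restartPolicy: Never
```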

Let us now schedule that job to run at 21:30 hours every day.

Create a CronJob for this.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: throw-dice-cron-job
spec:
  schedule: "30 21 * * *"
  jobTemplate:
    spec:
      parallelism: 1  # default
      completions: 1  # default
      backoffLimit: 6  # default
      template:
        spec:
          containers:
            - image: kodekloud/throw-dice
              name: throw-dice-cron-job-container
          restartPolicy: Never
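
As a reading aid, here are the five fields of the schedule above, broken out in comments. This is only the spec.schedule fragment of the CronJob, not a complete manifest, and the quoting is a stylistic choice rather than something the lab prescribes.

```yaml
spec:
  # field 1: minute (0-59)        = 30
  # field 2: hour (0-23)          = 21
  # field 3: day of month (1-31)  = *  (any)
  # field 4: month (1-12)         = *  (any)
  # field 5: day of week (0-6)    = *  (any)
  schedule: "30 21 * * *"  # every day at 21:30
```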