Platform Guide

Your CronJobs fail silently.
kubectl won't tell you a job didn't run.

CronDoctor detects missed schedules, reads pod error output, and diagnoses root causes — OOMKilled, ImagePullBackOff, CrashLoopBackOff — with a suggested fix. One YAML manifest to set up.

Start free — no credit card

This is what you get when a CronJob fails

CronDoctor

k8s/data-pipeline · CRITICAL · AI diagnosed

Pod terminated with OOMKilled after 4 restarts in 1h. Memory limit 512Mi exceeded.

The container’s memory limit (512Mi) is too low for the current workload. Memory usage has been trending upward since the dataset grew. The pod gets OOMKilled, Kubernetes restarts it, and it gets killed again.

Suggested fix: kubectl set resources deployment/data-pipeline -c data-pipeline --limits=memory=1Gi

91% confidence

Kubernetes tells you the pod restarted. CronDoctor tells you why — and what to change.

What Kubernetes monitoring misses about CronJobs

Missed schedules

The CronJob didn’t fire at all. No pod was created, so pod-level alerts have nothing to fire on: Prometheus rules built on pod metrics only see pods that exist.
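As a rough illustration of what a missed-schedule check involves, the sketch below compares the last schedule time against the expected interval. The function name `schedule_missed` and the grace-period argument are hypothetical and not part of CronDoctor. The commented wiring assumes kubectl access to the cluster and GNU `date`.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a CronJob has missed its schedule.
# Arguments (all integers, in seconds):
#   $1  last schedule time as a Unix epoch
#   $2  current time as a Unix epoch
#   $3  expected interval between runs
#   $4  grace period before a late run counts as missed
schedule_missed() {
  last=$1 now=$2 interval=$3 grace=$4
  # Missed when more than one interval plus the grace period has elapsed.
  [ $((now - last)) -gt $((interval + grace)) ]
}

# Example wiring (assumes kubectl access and GNU date for the -d flag):
#   ts=$(kubectl get cronjob my-job -o jsonpath='{.status.lastScheduleTime}')
#   last=$(date -d "$ts" +%s)
#   schedule_missed "$last" "$(date +%s)" 86400 300 && echo "my-job missed its schedule"
```

A check like this has to run somewhere other than the CronJob itself, because a job that never fires cannot report its own absence.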

OOMKilled loops

The pod runs, gets OOMKilled, restarts, gets OOMKilled again. Kubernetes keeps trying. You find out Monday morning.

ImagePullBackOff

Someone pushed a bad tag. The CronJob creates a pod but it can never start. The schedule keeps firing into a wall.

Silent duration drift

Your job used to take 30 seconds. Now it takes 5 minutes. It still “succeeds” so nobody notices until it starts timing out.
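A minimal sketch of a drift check, assuming you already record each run's duration and a baseline to compare against. The function name `duration_drifted` and the multiplier of 4 are illustrative choices, not CronDoctor's actual rule.

```shell
#!/bin/sh
# Hypothetical helper: flag a run whose duration reaches a multiple of its baseline.
#   $1  duration of the latest run, in seconds
#   $2  baseline duration, in seconds
#   $3  multiplier at or above which the run counts as drifting
duration_drifted() {
  [ "$1" -ge $(( $2 * $3 )) ]
}

# The case described above: a job that took 30 seconds now takes 5 minutes.
if duration_drifted 300 30 4; then
  echo "drift: 300s is at least 4x the 30s baseline"
fi
```

The job still exits 0 in this scenario, so the only usable signal is the gap between the start and end timestamps.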

One manifest. Full CronJob observability.

Add the signal pattern to your CronJob spec. CronDoctor gets start/end/fail signals with error output — no agent, no DaemonSet, no SDK.

# Kubernetes CronJob YAML

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-job
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: my-job
              image: your-image:latest
              command:
                - /bin/sh
                - -c
                - |
                  curl -sf --max-time 10 https://crondoctor.com/api/v1/ping/YOUR_ID/start || true
                  /path/to/your-job.sh 2>/tmp/stderr.log
                  EXIT=$?
                  if [ $EXIT -eq 0 ]; then
                    curl -sf --max-time 10 -X POST https://crondoctor.com/api/v1/ping/YOUR_ID/end \
                      -H "Content-Type: application/json" \
                      -d '{"exit_code":0}' || true
                  else
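                    # Escape backslashes and double quotes, and flatten newlines and tabs,
                    # so the captured stderr stays valid inside the JSON string.
                    ERR=$(tail -20 /tmp/stderr.log | sed 's/\\/\\\\/g; s/"/\\"/g' | tr '\n\r\t' '   ')
                    curl -sf --max-time 10 -X POST https://crondoctor.com/api/v1/ping/YOUR_ID/fail \
                      -H "Content-Type: application/json" \
                      -d "{\"exit_code\":$EXIT,\"stderr\":\"$ERR\"}" || true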
                  fi
                  # Propagate the job's exit code so the Kubernetes Job status reflects the failure.
                  exit "$EXIT"

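Set restartPolicy: Never so the kubelet does not restart a failed container in place. The Job controller can still create replacement pods up to backoffLimit (6 by default), so set backoffLimit: 0 in the Job spec if you want a single attempt, and let CronDoctor alert you instead. See all integration examples →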

Real Kubernetes failures CronDoctor diagnoses

CRITICAL · OOMKilled

payment-service OOMKilled 4 times in 1h. Memory limit 512Mi too low for current workload.

CRITICAL · ImagePullBackOff

auth-service image tag v2.14.0 not found in registry. Deployment references wrong tag.

WARNING · Silent duration drift

data-pipeline completed in 12m — 4x your P95 baseline of 3m. Something is degrading.

First CronJob free. $2/month for each additional. See pricing →

Always returns 200

The ping endpoint never breaks your CronJob

No agent required

No DaemonSet, no Helm chart, no cluster-level access

Add one YAML manifest. See your first Kubernetes diagnosis in minutes.