Kubernetes CronJob monitoring
Your CronJobs fail silently.
kubectl won't tell you a job didn't run.
CronDoctor detects missed schedules, reads pod error output, and diagnoses root causes — OOMKilled, ImagePullBackOff, CrashLoopBackOff — with a suggested fix. One YAML manifest to set up.
This is what you get when a CronJob fails
Pod terminated with OOMKilled after 4 restarts in 1h. Memory limit 512Mi exceeded.
kubectl set resources deployment/data-pipeline -c data-pipeline --limits=memory=1Gi

Kubernetes tells you the pod restarted. CronDoctor tells you why — and what to change.
What Kubernetes monitoring misses about CronJobs
Missed schedules
A CronJob didn’t fire at all. No pod was created, so there’s nothing to alert on. Prometheus only sees pods that exist.
OOMKilled loops
The pod runs, gets OOMKilled, restarts, gets OOMKilled again. Kubernetes keeps trying. You find out Monday morning.
ImagePullBackOff
Someone pushed a bad tag. The CronJob creates a pod but it can never start. The schedule keeps firing into a wall.
Silent duration drift
Your job used to take 30 seconds. Now it takes 5 minutes. It still “succeeds” so nobody notices until it starts timing out.
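The first failure mode can also be spotted by hand: a CronJob's `.status.lastScheduleTime` goes stale when a schedule is missed. A minimal sketch, assuming GNU date (in a cluster, the timestamp would come from `kubectl get cronjob my-job -o jsonpath='{.status.lastScheduleTime}'`):

```shell
# Sketch: flag a CronJob whose last scheduled run is older than its interval.
# The timestamp argument is what kubectl's jsonpath query above returns.
missed_schedule() {
  last="$1"        # ISO-8601 timestamp of the last scheduled run
  interval_s="$2"  # expected seconds between runs (e.g. 86400 for "0 3 * * *")
  now=$(date -u +%s)
  last_s=$(date -u -d "$last" +%s)   # GNU date; BSD/macOS date needs -j -f
  if [ $(( now - last_s )) -gt "$interval_s" ]; then
    echo "MISSED"
  else
    echo "OK"
  fi
}
```

Polling this yourself is exactly the gap CronDoctor closes: if no start ping arrives by the expected time, the missed run is flagged without any pod ever existing.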
One manifest. Full CronJob observability.
Add the signal pattern to your CronJob spec. CronDoctor gets start/end/fail signals with error output — no agent, no DaemonSet, no SDK.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-job
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: my-job
              image: your-image:latest
              command:
                - /bin/sh
                - -c
                - |
                  curl -sf --max-time 10 https://crondoctor.com/api/v1/ping/YOUR_ID/start || true
                  /path/to/your-job.sh 2>/tmp/stderr.log
                  EXIT=$?
                  if [ $EXIT -eq 0 ]; then
                    curl -sf --max-time 10 -X POST https://crondoctor.com/api/v1/ping/YOUR_ID/end \
                      -H "Content-Type: application/json" \
                      -d '{"exit_code":0}' || true
                  else
                    curl -sf --max-time 10 -X POST https://crondoctor.com/api/v1/ping/YOUR_ID/fail \
                      -H "Content-Type: application/json" \
                      -d "{\"exit_code\":$EXIT,\"stderr\":\"$(tail -20 /tmp/stderr.log)\"}" || true
                  fi

Set restartPolicy: Never to prevent Kubernetes from restarting failed jobs — let CronDoctor alert you instead.
See all integration examples →
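If you run several CronJobs, the inline command above can be factored into a small wrapper script baked into your image, so each job only names its own command. A sketch of the same signal pattern (the /start, /end, and /fail paths mirror the manifest above; PING_URL stands in for your CronDoctor ping URL):

```shell
#!/bin/sh
# Sketch of the signal pattern as a reusable wrapper.
# Substitute your real CronDoctor ping URL for PING_URL.
PING_URL="${PING_URL:-https://crondoctor.com/api/v1/ping/YOUR_ID}"

ping_signal() {
  # -f, --max-time, and || true keep a monitoring hiccup from failing the job
  curl -sf --max-time 10 "$@" >/dev/null 2>&1 || true
}

run_monitored() {
  ping_signal "$PING_URL/start"
  "$@" 2>/tmp/stderr.log
  job_rc=$?
  if [ "$job_rc" -eq 0 ]; then
    ping_signal -X POST "$PING_URL/end" \
      -H "Content-Type: application/json" -d '{"exit_code":0}'
  else
    ping_signal -X POST "$PING_URL/fail" \
      -H "Content-Type: application/json" \
      -d "{\"exit_code\":$job_rc,\"stderr\":\"$(tail -20 /tmp/stderr.log)\"}"
  fi
  return "$job_rc"   # preserve the job's exit code for Kubernetes
}
```

Usage inside the manifest then collapses to a single line, e.g. `run_monitored /path/to/your-job.sh`.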
Real Kubernetes failures CronDoctor diagnoses
CoreDNS pods in CrashLoopBackOff — all DNS resolution failing in the cluster.
payment-service OOMKilled 4 times in 1h. Memory limit 512Mi too low for current workload.
auth-service image tag v2.14.0 not found in registry. Deployment references wrong tag.
2/3 api-gateway pods failing liveness probes. Not receiving traffic.
DNS returning stale IP for db-primary.internal. Record points to decommissioned host.
How it works
Add a curl signal
Add the signal pattern to your CronJob YAML. No DaemonSet, no agent, no Helm chart.
We learn what's normal
Adaptive baselines build automatically from your CronJob's history. Duration, frequency, exit patterns.
Get diagnosed alerts
Root cause, suggested kubectl command, and severity — not just “pod restarted.”
Works alongside Prometheus, Grafana, Datadog — CronDoctor covers the gap they miss for batch jobs.
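To make the "silent duration drift" baseline concrete: at its simplest, a new run is compared against the job's own history. This is an illustration only, not CronDoctor's actual model; it assumes a file durations.log holding one past duration (in seconds) per line:

```shell
# Illustration: flag a run whose duration far exceeds the historical mean.
# durations.log is assumed to hold one past duration-in-seconds per line.
duration_anomaly() {
  awk -v new="$1" '
    { sum += $1; n++ }
    END {
      mean = sum / n
      # flag if the new run took more than 3x the historical mean
      if (new + 0 > 3 * mean) print "ANOMALY"; else print "OK"
    }' durations.log
}
```

A fixed multiplier like 3x is the crude version; an adaptive baseline tightens or loosens the threshold as more history accumulates.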
Simple pricing
5 jobs · 7-day history
20 jobs · 30-day history
100 jobs · 90-day history
AI diagnosis included on every plan.
The ping endpoint never breaks your CronJob
No DaemonSet, no Helm chart, no cluster-level access
Your CronJob fails at 3:01 AM, you know by 3:02.
Every alert includes a diagnosis with kubectl commands to fix it.