Platform Guide
Your CronJobs fail silently.
kubectl won't tell you a job didn't run.
CronDoctor detects missed schedules, reads pod error output, and diagnoses root causes — OOMKilled, ImagePullBackOff, CrashLoopBackOff — with a suggested fix. One YAML manifest to set up.
This is what you get when a CronJob fails
Pod terminated with OOMKilled after 4 restarts in 1h. Memory limit 512Mi exceeded.
kubectl set resources deployment/data-pipeline -c data-pipeline --limits=memory=1Gi
Kubernetes tells you the pod restarted. CronDoctor tells you why, and what to change.
What Kubernetes monitoring misses about CronJobs
Missed schedules
A CronJob didn’t fire at all. No pod was created, so there’s nothing to alert on. Pod-level Prometheus alerts only see pods that exist.
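To make the gap concrete, here is a minimal sketch of a cluster-side check for a missed schedule. It is not how CronDoctor works (CronDoctor relies on the pings described further down); the helper name is_schedule_missed, the 24-hour interval, and the 5-minute grace period are illustrative assumptions. The commented usage reads the CronJob's status.lastScheduleTime field and assumes GNU date.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the last scheduled run is older
# than one schedule interval plus a grace period.
# Arguments: now_epoch last_epoch interval_seconds grace_seconds
is_schedule_missed() {
  now=$1; last=$2; interval=$3; grace=$4
  age=$((now - last))
  [ "$age" -gt $((interval + grace)) ]
}

# Usage sketch against a live cluster (assumes kubectl access and GNU date):
#   last=$(kubectl get cronjob my-job -o jsonpath='{.status.lastScheduleTime}')
#   if is_schedule_missed "$(date +%s)" "$(date -d "$last" +%s)" 86400 300; then
#     echo "my-job missed its schedule"
#   fi
```

A check like this has to run somewhere outside the CronJob itself, which is the point of the paragraph above: a job that never started cannot report on itself.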
OOMKilled loops
The pod runs, gets OOMKilled, restarts, gets OOMKilled again. Kubernetes keeps trying. You find out Monday morning.
ImagePullBackOff
Someone pushed a bad tag. The CronJob creates a pod but it can never start. The schedule keeps firing into a wall.
Silent duration drift
Your job used to take 30 seconds. Now it takes 5 minutes. It still “succeeds” so nobody notices until it starts timing out.
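Duration drift comes down to comparing the latest run time against a baseline. The sketch below shows one simple way to express that comparison; the helper name is_drifting and the 4x threshold are assumptions for illustration, not CronDoctor's actual rule.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a run took at least `factor`
# times the baseline duration.
# Arguments: duration_seconds baseline_seconds factor (all integers)
is_drifting() {
  [ "$1" -ge $(( $2 * $3 )) ]
}

# Example with the numbers from the text: 30s baseline, 5m (300s) latest run.
if is_drifting 300 30 4; then
  echo "duration drift: 300s against a 30s baseline"
fi

# On a live cluster, a finished Job's duration can be derived from its
# .status.startTime and .status.completionTime fields.
```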
One manifest. Full CronJob observability.
Add the signal pattern to your CronJob spec. CronDoctor gets start/end/fail signals with error output — no agent, no DaemonSet, no SDK.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-job
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: my-job
              image: your-image:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Signal the start; "|| true" keeps a failed ping from failing the job.
                  curl -sf --max-time 10 https://crondoctor.com/api/v1/ping/YOUR_ID/start || true
                  # Run the real job and capture stderr for the failure report.
                  /path/to/your-job.sh 2>/tmp/stderr.log
                  EXIT=$?
                  if [ $EXIT -eq 0 ]; then
                    curl -sf --max-time 10 -X POST https://crondoctor.com/api/v1/ping/YOUR_ID/end \
                      -H "Content-Type: application/json" \
                      -d '{"exit_code":0}' || true
                  else
                    # Report the exit code and the last 20 lines of stderr.
                    curl -sf --max-time 10 -X POST https://crondoctor.com/api/v1/ping/YOUR_ID/fail \
                      -H "Content-Type: application/json" \
                      -d "{\"exit_code\":$EXIT,\"stderr\":\"$(tail -20 /tmp/stderr.log)\"}" || true
                  fi

Set restartPolicy: Never so Kubernetes does not restart the failed container in place; let CronDoctor alert you instead. See all integration examples →
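The fail branch above interpolates raw stderr into a JSON string, so a double quote, backslash, or newline in the log could produce a payload that is not valid JSON. Whether the CronDoctor endpoint tolerates that is not stated here. The sketch below shows one way to escape the text using only sed and tr; the helper names json_escape and build_fail_payload are hypothetical, and control characters other than newline, carriage return, and tab are not handled.

```shell
#!/bin/sh
# Hypothetical helper: make stdin safe to embed inside a JSON string.
# Doubles backslashes, escapes double quotes, and flattens newlines,
# carriage returns, and tabs to spaces.
json_escape() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' | tr '\n\r\t' '   '
}

# Hypothetical helper: build the failure payload from an exit code and a log file.
# Arguments: exit_code stderr_file
build_fail_payload() {
  printf '{"exit_code":%d,"stderr":"%s"}' "$1" "$(tail -20 "$2" | json_escape)"
}

# Usage sketch inside the job wrapper:
#   curl -sf --max-time 10 -X POST https://crondoctor.com/api/v1/ping/YOUR_ID/fail \
#     -H "Content-Type: application/json" \
#     -d "$(build_fail_payload "$EXIT" /tmp/stderr.log)" || true
```

If the image ships jq, letting it build the payload would be more robust than hand-rolled escaping; the version above avoids that dependency.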
Real Kubernetes failures CronDoctor diagnoses
payment-service OOMKilled 4 times in 1h. Memory limit 512Mi too low for current workload.
auth-service image tag v2.14.0 not found in registry. Deployment references wrong tag.
data-pipeline completed in 12m — 4x your P95 baseline of 3m. Something is degrading.
First CronJob free. $2/month for each additional. See pricing →
The ping endpoint never breaks your CronJob
No DaemonSet, no Helm chart, no cluster-level access