Cron Job Monitoring Best Practices in 2026
Most cron job failures go undetected for hours or days. A missed backup, a stalled ETL pipeline, a broken deployment script — by the time someone notices, the damage is done. Here are seven practices that prevent that.
1. Always ping after success, never before
The most common mistake: adding the monitoring ping at the start of the job.
Wrong:

```bash
#!/bin/bash
curl -s https://api.getcronsafe.com/ping/abc123
pg_dump mydb > /backups/mydb.sql  # this could fail
```

Right:
```bash
#!/bin/bash
pg_dump mydb > /backups/mydb.sql
if [ $? -eq 0 ]; then
  curl -s https://api.getcronsafe.com/ping/abc123
fi
```

The ping should confirm that work was completed, not that it was attempted. If your job crashes halfway through, the missing ping triggers an alert.
2. Use exit codes religiously
Every script should exit with a meaningful code. Zero means success. Anything else means failure.
```bash
#!/bin/bash
set -euo pipefail  # exit on any error
pg_dump mydb > /backups/mydb.sql
aws s3 cp /backups/mydb.sql s3://my-bucket/backups/
# only reached if both commands succeed
curl -s https://api.getcronsafe.com/ping/abc123
```

`set -euo pipefail` is critical. Without it, bash continues executing after errors. Your backup could fail but the ping still fires because the script kept running.
In Python:
```python
import sys
import requests

def main():
    try:
        run_etl()
        requests.get("https://api.getcronsafe.com/ping/abc123", timeout=5)
    except Exception as e:
        print(f"Job failed: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```

3. Set grace periods for variable-duration jobs
A backup that usually takes 2 minutes might take 20 minutes after a large import. Without a grace period, you'll get a false alert every time the job runs long.
Set your grace period to 2-3x the maximum expected duration. CronSafe lets you configure this per monitor:
- Job takes 1-5 minutes → set grace period to 15 minutes
- Job takes 10-30 minutes → set grace period to 1 hour
- Job runs hourly → set grace period to 30 minutes past the expected time
If the ping doesn't arrive within the expected schedule plus the grace period, you get alerted.
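That alerting rule is simple enough to state precisely. Here is a minimal sketch of the check (the function name and values are illustrative, not CronSafe's API):

```python
from datetime import datetime, timedelta

def is_late(last_ping: datetime, now: datetime,
            expected_interval: timedelta, grace: timedelta) -> bool:
    """Return True if the next ping is overdue (schedule plus grace exceeded)."""
    deadline = last_ping + expected_interval + grace
    return now > deadline

# Example: hourly job with a 30-minute grace period
last = datetime(2026, 1, 1, 12, 0)
interval = timedelta(hours=1)
grace = timedelta(minutes=30)

print(is_late(last, datetime(2026, 1, 1, 13, 15), interval, grace))  # False: within grace
print(is_late(last, datetime(2026, 1, 1, 13, 45), interval, grace))  # True: 15 min past grace
```

The deadline here is 13:30, so a 13:15 check passes quietly and a 13:45 check fires an alert.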
4. Detect overlapping runs
Overlapping cron jobs are a silent killer. If your hourly data sync takes 90 minutes, you'll have two instances running simultaneously. This causes:
- Database locks and deadlocks
- Duplicate data processing
- Memory exhaustion
- Corrupted output files
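You can also prevent overlaps at the source by refusing to start while a previous instance is still running. A minimal sketch using a non-blocking file lock (the lock path and function name are illustrative):

```python
import fcntl
import sys

def acquire_lock(path: str):
    """Try to take an exclusive, non-blocking lock on the given file.
    Return the open handle if acquired, or None if another instance holds it."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return handle
    except BlockingIOError:
        handle.close()
        return None

lock = acquire_lock("/tmp/sync_orders.lock")  # illustrative lock path
if lock is None:
    print("Previous run still in progress; skipping.", file=sys.stderr)
    sys.exit(1)
# ... do the sync work; the lock is released when the process exits
```

Because the kernel releases the lock when the process exits, even a crashed run can't wedge the next one.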
CronSafe has built-in overlap detection. When you send a start ping and an end ping, it tracks whether the previous run finished before the next one began:
```bash
#!/bin/bash
# Signal job start
curl -s https://api.getcronsafe.com/ping/abc123/start
# Do the work
python3 /opt/etl/sync_orders.py
# Signal job end
curl -s https://api.getcronsafe.com/ping/abc123
```

If a new /start arrives before the previous run's completion ping, CronSafe flags the overlap and alerts you.
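Conceptually, the detection reduces to tracking whether a run is in flight for each monitor. A minimal sketch of the idea (not CronSafe's actual implementation):

```python
class OverlapDetector:
    """Track start/end pings per monitor and flag overlapping runs."""

    def __init__(self):
        self.running = {}  # monitor id -> True while a run is in progress

    def start(self, monitor: str) -> bool:
        """Record a start ping; return True if an overlap was detected."""
        overlap = self.running.get(monitor, False)
        self.running[monitor] = True
        return overlap

    def end(self, monitor: str):
        """Record a completion ping."""
        self.running[monitor] = False

detector = OverlapDetector()
detector.start("abc123")         # first run starts
detector.end("abc123")           # first run finishes
detector.start("abc123")         # second run starts cleanly
print(detector.start("abc123"))  # third start before an end -> True, overlap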
5. Send job output with your pings
When a job fails, the first question is always "what happened?" If you send the output with your ping, you can see the answer immediately in your alert — no need to SSH into the server and dig through logs.
```bash
#!/bin/bash
OUTPUT=$(pg_dump mydb 2>&1)
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
  curl -s -X POST https://api.getcronsafe.com/ping/abc123 \
    -d "$OUTPUT"
else
  curl -s -X POST https://api.getcronsafe.com/ping/abc123/fail \
    -d "$OUTPUT"
fi
```

CronSafe includes this output in your Slack/Discord/email alerts, so you can diagnose without leaving your notification channel.
6. Set up escalating reminders
A single alert is easy to miss. Your Slack channel has 200 unread messages. Your phone was on silent. The email went to spam.
Configure escalating reminders that re-alert at increasing intervals:
- First alert: immediate (within 1 minute of missed ping)
- Second alert: 1 hour later
- Third alert: 6 hours later
- Fourth alert: 24 hours later
CronSafe sends these automatically on the Pro plan. This ensures that a missed backup on Friday night doesn't go unnoticed until Monday morning.
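The schedule above compounds the intervals, each reminder measured from the previous one. A minimal sketch that computes the absolute alert times (the offsets mirror the list above; the function name is illustrative):

```python
from datetime import datetime, timedelta

# Each offset is relative to the previous alert (first alert is near-immediate).
REMINDER_OFFSETS = [timedelta(minutes=1), timedelta(hours=1),
                    timedelta(hours=6), timedelta(hours=24)]

def reminder_times(missed_at: datetime) -> list[datetime]:
    """Compute the absolute times at which each escalating alert fires."""
    times = []
    t = missed_at
    for offset in REMINDER_OFFSETS:
        t = t + offset
        times.append(t)
    return times

# A backup missed late Friday night still re-alerts through the weekend.
missed = datetime(2026, 1, 2, 23, 0)
for t in reminder_times(missed):
    print(t.isoformat())
```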
7. Monitor the critical path, not everything
Not all cron jobs are equal. Focus your monitoring on jobs where failure has real consequences:
Always monitor:
- Database backups
- Payment processing jobs
- Data sync between systems
- Certificate renewal scripts
- Security scan jobs
Nice to monitor:
- Log rotation
- Cache warming
- Report generation
- Cleanup scripts
Start with the critical jobs. CronSafe's free tier gives you 20 monitors — that's enough for most teams' critical paths. Expand to Pro when you need more.
Common mistakes to avoid
Mistake: Redirecting all output to /dev/null
```bash
# Don't do this — you lose all debugging info
*/5 * * * * /opt/scripts/sync.sh > /dev/null 2>&1
```

Instead, log to a file and send output with your ping:

```bash
*/5 * * * * /opt/scripts/sync.sh >> /var/log/sync.log 2>&1
```

Mistake: Not monitoring the monitoring
If your server goes down, it can't send pings. That's fine — the absence of pings is what triggers the alert. But make sure your alert channels themselves are working. Send a test alert to each channel when you set it up.
Mistake: Setting the schedule too tight
If your job runs "every 5 minutes" but occasionally takes 6 minutes, you'll get constant false alerts. Always add a buffer. CronSafe's grace period feature exists specifically for this.
Putting it all together
A well-monitored cron job looks like this:
```bash
#!/bin/bash
set -euo pipefail
curl -s https://api.getcronsafe.com/ping/abc123/start
OUTPUT=$(/opt/scripts/process_orders.sh 2>&1)
curl -s -X POST https://api.getcronsafe.com/ping/abc123 -d "$OUTPUT"
```

Five lines. Start ping, work, end ping with output. That's all it takes to go from "we didn't know it was broken" to "we knew within 60 seconds."
Start monitoring your cron jobs for free
20 monitors, email alerts, GitHub badges. No credit card required.
Get started free →