Cron Job Monitoring Best Practices in 2026

Cron job failures often go undetected for hours or days. A missed backup, a stalled ETL pipeline, a broken deployment script: by the time someone notices, the damage is done. Here are seven practices that prevent that.

1. Always ping after success, never before

The most common mistake: adding the monitoring ping at the start of the job.

Wrong:

bash
#!/bin/bash
curl -s https://api.getcronsafe.com/ping/abc123
pg_dump mydb > /backups/mydb.sql  # this could fail

Right:

bash
#!/bin/bash
# Test the command directly; the ping runs only if the dump exits 0
if pg_dump mydb > /backups/mydb.sql; then
  curl -s https://api.getcronsafe.com/ping/abc123
fi

The ping should confirm that work was completed, not that it was attempted. If your job crashes halfway through, the missing ping triggers an alert.
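The same pattern can be wrapped in a small reusable helper. This is a minimal sketch, not part of any CronSafe tooling: `run_and_ping` is an illustrative name, `PING_URL` reuses the example monitor URL from above, and the curl flags (fail on HTTP errors, a 10-second timeout, three retries) are assumptions about what you want from the ping.

```bash
#!/bin/bash
# PING_URL reuses the example monitor URL; override it per job.
PING_URL="${PING_URL:-https://api.getcronsafe.com/ping/abc123}"

# Run the given command and ping only if it exits 0.
# run_and_ping is a hypothetical helper name used for illustration.
run_and_ping() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    # -f: treat HTTP errors as failures, -sS: quiet but still show errors,
    # -m 10: give up after 10 seconds, --retry 3: retry transient failures.
    curl -fsS -m 10 --retry 3 "$PING_URL" > /dev/null || echo "ping failed" >&2
  fi
  # Pass the job's own exit code through to cron.
  return "$status"
}
```

A job script would then call it as `run_and_ping /opt/scripts/sync.sh`. The helper returns the job's exit code, so a failed ping never masks a successful job and a failed job never produces a ping.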

2. Use exit codes religiously

Every script should exit with a meaningful code. Zero means success. Anything else means failure.

bash
#!/bin/bash
set -euo pipefail  # exit on any error

pg_dump mydb > /backups/mydb.sql
aws s3 cp /backups/mydb.sql s3://my-bucket/backups/

# only reached if both commands succeed
curl -s https://api.getcronsafe.com/ping/abc123

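set -euo pipefail is critical: -e exits on the first failing command, -u treats unset variables as errors, and -o pipefail makes a pipeline fail if any stage in it fails. Without it, bash continues executing after errors, so your backup could fail and the ping would still fire because the script kept running.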
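The pipefail part is the least obvious, so here is a small sketch that runs the same failing pipeline with and without it. `false` stands in for a failing `pg_dump` and `cat` for a downstream stage such as `gzip`; both are stand-ins chosen for illustration, and `pipeline_status` is a hypothetical helper name.

```bash
#!/bin/bash
# Print the exit status of a pipeline whose first stage fails.
# $1 is "on" or "off" and controls whether pipefail is enabled.
# `false` stands in for a failing pg_dump, `cat` for a stage like gzip.
pipeline_status() {
  (
    if [ "$1" = "on" ]; then set -o pipefail; else set +o pipefail; fi
    status=0
    false | cat || status=$?
    echo "$status"
  )
}

echo "without pipefail: $(pipeline_status off)"  # prints 0: the failure is hidden
echo "with pipefail:    $(pipeline_status on)"   # prints 1: the failure is visible
```

Without pipefail, the pipeline reports only the status of its last stage, so a script that pipes a failed dump through a compressor would still reach its ping.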

In Python:

python
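import sys
import requests

PING_URL = "https://api.getcronsafe.com/ping/abc123"

def main():
    try:
        run_etl()  # your job's entry point, defined elsewhere
    except Exception as e:
        print(f"Job failed: {e}", file=sys.stderr)
        sys.exit(1)  # non-zero exit, and no ping is sent

    try:
        # Reached only if run_etl() succeeded
        requests.get(PING_URL, timeout=5).raise_for_status()
    except requests.RequestException as e:
        # The job itself succeeded; only the ping failed
        print(f"Ping failed: {e}", file=sys.stderr)

if __name__ == "__main__":
    main()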

3. Set grace periods for variable-duration jobs

A backup that usually takes 2 minutes might take 20 minutes after a large import. Without a grace period, you'll get a false alert every time the job runs long.

Set your grace period to 2-3x the maximum expected duration. CronSafe lets you configure this per monitor:

  • Job takes 1-5 minutes → set grace period to 15 minutes
  • Job takes 10-30 minutes → set grace period to 1 hour
  • Job runs hourly → set grace period to 30 minutes past the expected time

An alert fires only if the ping has not arrived by the scheduled time plus the grace period.
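The arithmetic behind that rule of thumb can be sketched in a few lines. The helper names `grace_minutes` and `alert_time` are hypothetical, and the default multiplier of 3 is simply the upper end of the 2-3x guideline above; nothing here calls CronSafe itself.

```bash
#!/bin/bash
# Grace period = multiplier x longest expected run, in minutes.
# Usage: grace_minutes MAX_DURATION_MIN [MULTIPLIER]  (multiplier defaults to 3)
grace_minutes() {
  local max_duration=$1 multiplier=${2:-3}
  echo $(( max_duration * multiplier ))
}

# Clock time at which a missing ping becomes an alert.
# Usage: alert_time HH:MM GRACE_MIN
alert_time() {
  local hh=${1%%:*} mm=${1##*:} grace=$2
  # 10# forces base 10 so values like "08" are not parsed as octal.
  local total=$(( (10#$hh * 60 + 10#$mm + grace) % 1440 ))
  printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

grace_minutes 5          # 5-minute job, 3x multiplier: prints 15
grace_minutes 30 2       # 30-minute job, 2x multiplier: prints 60
alert_time 02:00 15      # job expected at 02:00, 15-minute grace: prints 02:15
```

The first two calls reproduce the first two rows of the list above: a job that takes up to 5 minutes gets a 15-minute grace period, and one that takes up to 30 minutes gets an hour.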

4. Detect overlapping runs

Overlapping cron jobs are a silent killer. If your hourly data sync takes 90 minutes, you'll have two instances running simultaneously. This causes:

  • Database locks and deadlocks
  • Duplicate data processing
  • Memory exhaustion
  • Corrupted output files

CronSafe has built-in overlap detection. When you send a start ping and an end ping, it tracks whether the previous run finished before the next one began:

bash
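#!/bin/bash
set -euo pipefail

# Signal job start (a failed ping should not block the job itself)
curl -s https://api.getcronsafe.com/ping/abc123/start || true

# Do the work; with set -e, a failure here exits before the end ping
python3 /opt/etl/sync_orders.py

# Signal job end
curl -s https://api.getcronsafe.com/ping/abc123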

If a new /start arrives before the previous run's completion ping, CronSafe flags the overlap and alerts you.
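Detection tells you that an overlap happened; a lock on the server can stop the second instance from starting at all. The sketch below uses an atomic `mkdir` as the lock. `run_exclusive` and `LOCK_DIR` are illustrative names rather than CronSafe features, and the exit code 75 is just a conventional "try again later" value chosen for this example.

```bash
#!/bin/bash
# Lock location; any path on a local filesystem works.
LOCK_DIR="${LOCK_DIR:-/tmp/sync_orders.lock}"

# Run the given command only if no other instance holds the lock.
# mkdir is atomic: exactly one caller can create the directory.
run_exclusive() {
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "previous run still active, skipping" >&2
    return 75
  fi
  local status=0
  "$@" || status=$?
  # Release the lock whether the command succeeded or failed.
  rmdir "$LOCK_DIR"
  return "$status"
}
```

In the script above, the work line would become `run_exclusive python3 /opt/etl/sync_orders.py`. One caveat: if the process is killed outright, the lock directory is left behind and has to be removed by hand, which is why many Linux setups use `flock` from util-linux instead.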

5. Send job output with your pings

When a job fails, the first question is always "what happened?" If you send the output with your ping, you can see the answer immediately in your alert — no need to SSH into the server and dig through logs.

bash
#!/bin/bash
# Capture stderr only; the dump itself (stdout) goes to the backup file
OUTPUT=$(pg_dump mydb 2>&1 > /backups/mydb.sql)
EXIT_CODE=$?

if [ "$EXIT_CODE" -eq 0 ]; then
  curl -s -X POST https://api.getcronsafe.com/ping/abc123 \
    -d "$OUTPUT"
else
  curl -s -X POST https://api.getcronsafe.com/ping/abc123/fail \
    -d "$OUTPUT"
fi

CronSafe includes this output in your Slack/Discord/email alerts, so you can diagnose without leaving your notification channel.
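Job output can run to thousands of lines, and the useful part of a failure is usually at the end. The sketch below trims captured output before it is sent. The 50-line cap is an assumption made for illustration, since no payload limit is stated here, and `tail_output` is a hypothetical helper name.

```bash
#!/bin/bash
# Assumed, illustrative cap; check your monitoring service's real limit.
MAX_LINES="${MAX_LINES:-50}"

# Print only the last MAX_LINES lines of the text passed as $1.
tail_output() {
  printf '%s\n' "$1" | tail -n "$MAX_LINES"
}
```

The ping would then send `"$(tail_output "$OUTPUT")"` in place of `"$OUTPUT"`, while the full output stays in the server's log file.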

6. Set up escalating reminders

A single alert is easy to miss. Your Slack channel has 200 unread messages. Your phone was on silent. The email went to spam.

Configure escalating reminders that re-alert at increasing intervals:

  • First alert: immediate (within 1 minute of missed ping)
  • Second alert: 1 hour later
  • Third alert: 6 hours later
  • Fourth alert: 24 hours later

CronSafe sends these automatically on the Pro plan. This ensures that a missed backup on Friday night doesn't go unnoticed until Monday morning.

7. Monitor the critical path, not everything

Not all cron jobs are equal. Focus your monitoring on jobs where failure has real consequences:

Always monitor:

  • Database backups
  • Payment processing jobs
  • Data sync between systems
  • Certificate renewal scripts
  • Security scan jobs

Nice to monitor:

  • Log rotation
  • Cache warming
  • Report generation
  • Cleanup scripts

Start with the critical jobs. CronSafe's free tier gives you 20 monitors — that's enough for most teams' critical paths. Expand to Pro when you need more.

Common mistakes to avoid

Mistake: Redirecting all output to /dev/null

bash
# Don't do this — you lose all debugging info
*/5 * * * * /opt/scripts/sync.sh > /dev/null 2>&1

Instead, append the output to a log file, and send it with your ping as described in practice 5:

bash
*/5 * * * * /opt/scripts/sync.sh >> /var/log/sync.log 2>&1

Mistake: Not monitoring the monitoring

If your server goes down, it can't send pings. That's fine — the absence of pings is what triggers the alert. But make sure your alert channels themselves are working. Send a test alert to each channel when you set it up.

Mistake: Setting the schedule too tight

If your job runs "every 5 minutes" but occasionally takes 6 minutes, you'll get constant false alerts. Always add a buffer. CronSafe's grace period feature exists specifically for this.

Putting it all together

A well-monitored cron job looks like this:

bash
#!/bin/bash
set -euo pipefail
URL=https://api.getcronsafe.com/ping/abc123

curl -s "$URL/start" || true  # a failed ping must not block the job
if OUTPUT=$(/opt/scripts/process_orders.sh 2>&1); then
  curl -s -X POST "$URL" -d "$OUTPUT"
else
  curl -s -X POST "$URL/fail" -d "$OUTPUT"
  exit 1
fi

Ten lines. Start ping, work, then a success or failure ping with the output attached. That's all it takes to go from "we didn't know it was broken" to "we knew within 60 seconds."

Start monitoring your cron jobs for free

20 monitors, email alerts, GitHub badges. No credit card required.
