
What Is Heartbeat Monitoring and Why Every Developer Needs It

If your cron job fails at 3 AM, how long until you notice? For most teams, the answer is "when a customer complains." Heartbeat monitoring fixes that.

The problem with traditional monitoring

Traditional monitoring checks if a service is up by pinging it from the outside. That works for web servers. It doesn't work for cron jobs, background workers, or scheduled tasks — because there's nothing to ping. These jobs run on a schedule, do their work, and exit. If they stop running, nothing changes from the outside. The server is still up. The process just... didn't start.

This is why cron failures are called silent failures. There's no error, no crash, no log entry. The job simply didn't run.

What is heartbeat monitoring?

Heartbeat monitoring flips the model. Instead of an external service checking your job, your job checks in with an external service. If it doesn't check in on time, you get alerted.

The pattern is also called a dead man's switch — borrowed from trains, where the driver must hold a lever. If the driver releases it (or becomes incapacitated), the train stops automatically.

Here's how it works:

1. You create a monitor with an expected schedule (e.g., "every 5 minutes")
2. Your job sends a ping (HTTP request) after completing successfully
3. If the ping doesn't arrive on time, the monitoring service alerts you
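The three steps above reduce to a deadline check on the monitoring side. Here is a minimal sketch of that logic. It is not CronSafe's actual implementation: the `Monitor` class, its field names, and the `grace` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Monitor:
    """Hypothetical monitor record; field names are illustrative."""
    interval: timedelta   # expected time between pings, e.g. 5 minutes
    grace: timedelta      # extra slack allowed before alerting
    last_ping: datetime   # when the most recent ping arrived

    def record_ping(self, now: datetime) -> None:
        # Step 2: the job checked in, so the deadline moves forward.
        self.last_ping = now

    def is_overdue(self, now: datetime) -> bool:
        # Step 3: alert if no ping arrived within interval + grace.
        return now > self.last_ping + self.interval + self.grace


start = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
monitor = Monitor(
    interval=timedelta(minutes=5),
    grace=timedelta(minutes=1),
    last_ping=start,
)

print(monitor.is_overdue(start + timedelta(minutes=4)))  # False: still inside the window
print(monitor.is_overdue(start + timedelta(minutes=7)))  # True: the ping is late
```

The key design point is that the service never contacts the job. It only compares the current time against the last check-in.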

Adding heartbeat monitoring to any job

The implementation is a single HTTP request. Here's how it looks in a few common languages:

Bash (crontab)

bash
#!/bin/bash
# backup.sh — runs every hour via cron
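# Test pg_dump's exit status directly and ping only if the backup succeeded
if pg_dump mydb > "/backups/mydb_$(date +%Y%m%d_%H%M).sql"; then
  # -f: treat HTTP errors as failures; -sS: quiet but still print errors; -m 10: give up after 10 seconds
  curl -fsS -m 10 --retry 3 -o /dev/null https://api.getcronsafe.com/ping/abc123
fi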

Your crontab entry:

0 * * * * /home/deploy/backup.sh >> /var/log/backup.log 2>&1

Python

python
import requests

def run_etl_job():
    # your job logic here
    process_data()
    upload_results()

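    # ping after success; a failed ping should not crash a job that already finished
    try:
        requests.get("https://api.getcronsafe.com/ping/abc123", timeout=5)
    except requests.RequestException as exc:
        print(f"heartbeat ping failed: {exc}")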

if __name__ == "__main__":
    run_etl_job()

Node.js

javascript
const https = require('https');

async function main() {
  await syncUsers();
  await cleanupExpiredSessions();

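  // ping CronSafe; handle 'error' so a failed ping cannot crash the process
  https
    .get('https://api.getcronsafe.com/ping/abc123', (res) => res.resume())
    .on('error', (err) => console.error('ping failed:', err.message));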
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

curl (inline in crontab)

If your job is simple enough, you can add the ping directly in crontab:

*/5 * * * * /usr/bin/my-job && curl -s https://api.getcronsafe.com/ping/abc123

The && ensures the ping only fires if the job exits with code 0.
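For jobs written in Python, the counterpart of `&&` is to ping only when the job returns without raising. The sketch below shows one way to do that using only the standard library. The `run_with_heartbeat` and `send_ping` names are hypothetical, and the URL is the placeholder ID from the examples above.

```python
import urllib.request
from typing import Callable

# Placeholder ping URL reused from the examples above.
PING_URL = "https://api.getcronsafe.com/ping/abc123"


def send_ping(url: str = PING_URL) -> None:
    """Send the heartbeat; network errors are logged, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            pass
    except OSError as exc:  # URLError and socket timeouts are subclasses of OSError
        print(f"heartbeat ping failed: {exc}")


def run_with_heartbeat(
    job: Callable[[], None],
    ping: Callable[[], None] = send_ping,
) -> bool:
    """Run job and ping only if it returns without raising."""
    try:
        job()
    except Exception as exc:
        print(f"job failed, skipping ping: {exc}")
        return False
    ping()
    return True
```

Passing `ping` as a parameter keeps the wrapper testable without network access, for example `run_with_heartbeat(my_job, ping=lambda: None)`.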

Why heartbeat monitoring catches what other tools miss

HTTP uptime checks have a fundamental blind spot: they can only verify that something is responding. They can't verify that something happened.

Consider these scenarios that HTTP uptime checks miss and heartbeat monitoring can catch:

  • Cron daemon stopped: the server is up, but cron isn't running
  • Job removed from crontab: someone deleted it during a deploy
  • Job runs but fails silently: it exits 0 without completing its work (caught only if the job verifies its output before pinging)
  • Job takes too long: backups that used to take 10 minutes now take 3 hours, so the ping arrives after the deadline
  • Job overlaps itself: the previous run hasn't finished when the next one starts

CronSafe detects all of these. You set the expected schedule, and if reality diverges from expectation, you get alerted via email, Slack, Discord, Telegram, or webhook.
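A job that exits 0 without doing its work still sends its ping unless the job checks its own output first. Here is a minimal sketch of such a check, assuming a backup job that writes a single file. The `backup_looks_valid` and `should_ping` helpers and the `min_bytes` threshold are hypothetical, and a size check is only one possible sanity test.

```python
import os


def backup_looks_valid(path: str, min_bytes: int = 1) -> bool:
    """Return True only if the file exists and has at least min_bytes bytes.

    Hypothetical helper: a real job might also check row counts or checksums.
    """
    try:
        return os.path.getsize(path) >= min_bytes
    except OSError:
        # A missing or unreadable file counts as a failed run.
        return False


def should_ping(exit_code: int, output_path: str) -> bool:
    # Ping only when the job exited cleanly AND produced plausible output.
    return exit_code == 0 and backup_looks_valid(output_path)
```

With this gate in place, an empty or missing dump file withholds the ping, so the monitor treats the run as missed.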

What makes a good heartbeat monitoring setup

1. Ping after success, not before. Put your curl at the end of the job, after the critical work completes. If you ping at the start, you'll never know if the job actually finished.

2. Use exit codes. The && operator in bash ensures the ping only fires on success. In Python, wrap your logic in try/except; in Node, use try/catch. Either way, ping only in the success path.

3. Set appropriate grace periods. If your job usually takes 30 seconds but occasionally takes 2 minutes, set a grace period of 3 minutes. This prevents false alerts during slow runs.

4. Monitor the monitor. CronSafe sends escalating reminders at 1 hour, 6 hours, and 24 hours if a job stays down. This ensures alerts don't get lost in a noisy Slack channel.
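The grace-period arithmetic in point 3 can be written down. The sketch below derives a grace period from observed run durations by padding the slowest run by 50%. The `suggest_grace_seconds` name and the 1.5 multiplier are assumptions chosen to reproduce the 3-minute figure, not a rule from CronSafe.

```python
def suggest_grace_seconds(durations_s: list[float], headroom: float = 1.5) -> int:
    """Pad the slowest observed run by a headroom factor (hypothetical heuristic)."""
    if not durations_s:
        raise ValueError("need at least one observed duration")
    return round(max(durations_s) * headroom)


# Mostly 30-second runs with an occasional 2-minute outlier.
observed = [30, 28, 31, 120, 29]
print(suggest_grace_seconds(observed))  # 180 seconds, i.e. 3 minutes
```

Basing the value on the slowest run rather than the average keeps a rare slow run from triggering an alert, at the cost of a slightly later alert when the job really is down.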

Getting started

CronSafe is free for up to 20 monitors. Create your first monitor, add the curl to your job, and you'll get alerted within minutes if something breaks.

Start monitoring your cron jobs for free

20 monitors, email alerts, GitHub badges. No credit card required.
