Heartbeat Monitoring
PgQueuer's automatic heartbeat mechanism ensures active jobs are continuously monitored for liveness.
How It Works
While a job is in the picked state, the QueueManager periodically updates a heartbeat
timestamp on the job record. This signals that the job is still actively being processed.
- Periodic updates: The heartbeat timestamp is refreshed at a configurable interval.
- Stall detection: External monitoring can compare heartbeat against NOW() to identify stalled or hung jobs.
- Resource management: Prevents unresponsive jobs from holding locks indefinitely, enabling external supervisors to detect and handle stuck workers.
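To make the periodic-update idea concrete, here is a minimal, generic sketch of a heartbeat running alongside a unit of work with asyncio. It is not PgQueuer's internal implementation: `FakeJobRecord`, `heartbeat_loop`, and `run_with_heartbeat` are hypothetical names, and the in-memory record stands in for the job row that a real worker would update in PostgreSQL.

```python
import asyncio
import contextlib
from datetime import datetime, timezone
from typing import Any, Awaitable


class FakeJobRecord:
    """Hypothetical in-memory stand-in for a job row (not a PgQueuer class)."""

    def __init__(self) -> None:
        self.heartbeat = datetime.now(timezone.utc)
        self.updates = 0

    def touch(self) -> None:
        # A real worker would issue an UPDATE against the job table here.
        self.heartbeat = datetime.now(timezone.utc)
        self.updates += 1


async def heartbeat_loop(record: FakeJobRecord, interval: float) -> None:
    """Refresh the heartbeat every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        record.touch()


async def run_with_heartbeat(
    record: FakeJobRecord, work: Awaitable[Any], interval: float
) -> Any:
    """Await `work` while a background task keeps the heartbeat fresh."""
    beat = asyncio.create_task(heartbeat_loop(record, interval))
    try:
        return await work
    finally:
        # Stop the heartbeat as soon as the work finishes or fails.
        beat.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await beat
```

If the process running the work crashes or hangs, the heartbeat task stops with it, which is what lets an outside observer treat an old timestamp as a sign of trouble.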
Stall Detection Pattern
You can query for stalled jobs directly in PostgreSQL:
-- Jobs that haven't updated their heartbeat in the last 5 minutes
SELECT id, entrypoint, status, heartbeat
FROM pgqueuer
WHERE status = 'picked'
AND heartbeat < NOW() - INTERVAL '5 minutes';
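If a supervisor already holds job rows in memory, the same condition can be expressed in Python. The sketch below mirrors the WHERE clause of the query above; `is_stalled` is a hypothetical helper, not part of PgQueuer, and it assumes timezone-aware timestamps.

```python
from datetime import datetime, timedelta


def is_stalled(
    status: str,
    heartbeat: datetime,
    now: datetime,
    threshold: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True for a picked job whose heartbeat is older than `threshold`.

    Mirrors: status = 'picked' AND heartbeat < NOW() - INTERVAL '5 minutes'
    """
    return status == "picked" and heartbeat < now - threshold
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.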
Retry Timer
The retry_timer parameter on the @entrypoint() decorator sets an interval after which
jobs with a stale heartbeat are eligible to be re-picked by any available worker. This
enables automatic recovery from crashed or stalled workers:
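from datetime import timedelta

from pgqueuer.models import Job

# `pgq` is an existing PgQueuer instance; `do_work` is your application code.
@pgq.entrypoint("my_task", retry_timer=timedelta(minutes=5))
async def my_task(job: Job) -> None:
    await do_work(job.payload)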
With retry_timer set, a job that stops updating its heartbeat for the specified duration
will be retried by the next available worker.
Note
The default retry_timer is 0 (disabled). Set it per entrypoint to match your
expected maximum job runtime plus a safety margin to avoid prematurely re-queuing
legitimately long-running jobs.
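One way to apply this advice is to derive the value from a measured worst-case runtime. The helper below is only an illustration: `suggested_retry_timer` and its `margin` parameter are hypothetical, and the 50% default margin is an arbitrary assumption rather than a PgQueuer recommendation.

```python
from datetime import timedelta


def suggested_retry_timer(max_runtime: timedelta, margin: float = 0.5) -> timedelta:
    """Return max_runtime plus a proportional safety margin.

    `margin` is a fraction of max_runtime (0.5 adds 50%).
    """
    if max_runtime <= timedelta(0):
        raise ValueError("max_runtime must be positive")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return max_runtime * (1 + margin)
```

For example, `suggested_retry_timer(timedelta(minutes=4), margin=0.25)` yields a five-minute timer, which could then be passed as `retry_timer` to the entrypoint decorator.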