Durable Work in PostgresPart 1
The claim loop
Minimal Postgres claim loop: enqueue, SKIP LOCKED claim, complete, and handler retry with an interactive simulation.
A customer pays for order:9182. Your API updates the order row and must send a receipt. That second step cannot live only in a setTimeout or a fire-and-forget HTTP call. If the process dies, the work vanishes.
The durable version is to write a row that means “send receipt for order:9182.” A background worker claims it, runs your handler, and marks it done. Postgres holds the backlog, so crashes and deploys become recoverable events instead of silent data loss.
Why a table beats a thread pool
This is a durable work queue. It starts like a standard background job queue: persist a unit of work, let competing consumers claim it, deliver at least once, and make handlers safe to run twice. The same core loop also supports outbox relays, projections, webhook processing, and saga steps.
Here, Postgres is the queue and coordination layer. The enqueue happens in the same ACID commit as your business write. Claims use row locks instead of a broker dispatcher. The backlog is inspectable with SQL when work appears stalled.
| Postgres implementation | Established name | Typical source |
|---|---|---|
Queue table (inbox, outbox) | Message Channel | Enterprise Integration Patterns |
Many workers, SKIP LOCKED claim | Competing Consumers | Enterprise Integration Patterns |
partition_key plus hash ring | Partitioned consumption | EIP plus stream systems |
Lease plus claimed_by | Exclusive work ownership | Distributed systems |
idempotency_key, safe retries | Idempotent Receiver | Enterprise Integration Patterns |
dead_letter | Dead Letter Channel | Enterprise Integration Patterns |
| Outbox in the same transaction as domain write | Transactional Outbox | Microservices patterns |
What production asks of any queue
Before choosing a queue, write down the failures you must survive: a deploy loses jobs, a customer gets two emails, or a row sits in processing for an hour. The implementation needs an answer for each case.
- Accepted work must run: if the API returned success, the email, webhook, or ledger entry cannot disappear because a worker restarted mid-deploy.
- Related work stays together: everything for
order:9182should land on one worker so you can process that stream in order when ordering matters. - Scaling must not reshuffle everything: adding a worker should move roughly
1/Nof open rows, not every in-flight item. - Retries must not duplicate side effects: workers crash, leases expire, and webhooks retry. Handlers must tolerate running twice.
- Stuck and poison rows are handled: a crashed worker cannot hold rows in
processingforever.
Minimal claim loop
Start with one table, one claim query, and one worker process in a loop. For order:9182, the row means send receipt after checkout.
CREATE TABLE inbox (
id UUID PRIMARY KEY DEFAULT uuidv7(),
partition_key TEXT NOT NULL,
partition_bucket INT NOT NULL
CHECK (partition_bucket >= 0 AND partition_bucket < 1024),
payload JSONB NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
idempotency_key TEXT,
claimed_by TEXT,
lease_expires_at TIMESTAMPTZ,
lease_generation BIGINT NOT NULL DEFAULT 0,
attempts INT NOT NULL DEFAULT 0,
max_attempts INT NOT NULL DEFAULT 5,
available_at TIMESTAMPTZ NOT NULL DEFAULT now(),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX idx_inbox_idempotency
ON inbox (idempotency_key) WHERE idempotency_key IS NOT NULL;
Enqueue: producer writes pending
Your API inserts a row in the same transaction as the business write when you can. For order:9182, the payload tells the worker what to run later.
INSERT INTO inbox (partition_key, partition_bucket, payload, idempotency_key)
VALUES (
'order:9182',
417,
'{"type":"send_receipt","order_id":9182}'::jsonb,
'receipt-9182-v1'
);
Claim: worker takes ownership
Claiming sets status = 'processing', stamps claimed_by, and sets lease_expires_at for crash recovery. Each claim bumps lease_generation so only the current owner can call complete.
WITH picked AS (
SELECT id FROM inbox
WHERE status = 'pending' AND available_at <= now()
ORDER BY created_at, id
LIMIT 25
FOR UPDATE SKIP LOCKED
)
UPDATE inbox i
SET status = 'processing',
claimed_by = $1,
lease_expires_at = now() + interval '90 seconds',
lease_generation = lease_generation + 1,
attempts = attempts + 1
FROM picked WHERE i.id = picked.id
RETURNING i.*;
Complete or retry on handler outcome
UPDATE inbox
SET status = 'completed'
WHERE id = $1
AND claimed_by = $2
AND lease_generation = $3
AND lease_expires_at > now()
AND status = 'processing';
UPDATE inbox
SET
status = 'pending',
claimed_by = NULL,
lease_expires_at = NULL,
available_at = now() + (LEAST(POWER(2, attempts)::int, 3600) * interval '1 second')
WHERE id = $1
AND claimed_by = $2
AND lease_generation = $3
AND status = 'processing';
The worker id plus lease_generation are a fence token. If rowCount is zero, this worker lost ownership and must not assume the side effect is complete.
Wire the loop in your language
db is a small inbox repository: each method runs one of the SQL snippets above against a Postgres connection pool.
const WORKER_ID = `${hostname}-${process.pid}`;
async function run() {
while (true) {
const batch = await db.claim(WORKER_ID);
if (batch.length === 0) {
await sleep(500);
continue;
}
for (const row of batch) {
try {
await handle(row);
await db.complete(row.id, WORKER_ID, row.lease_generation);
} catch (err) {
await db.failWithRetry(row.id, WORKER_ID, row.lease_generation, err);
}
}
}
}
Where this breaks down
- Hot polling: a 50ms idle loop on an empty queue will waste CPU and connection time. Back off or use
LISTEN. - Process death without lease cleanup: rows stuck in
processingneed leases and a sweeper that returns expired claims topending. - Exactly-once claims: this is at-least-once with guards. Do not promise finance-grade exactly-once without idempotent downstream stores.
Source
Use the article for explanation, then use these files when you want the complete SQL and TypeScript in one place.