ADR-0005: Jobs-table dispatch — streaming poll + batch claim
Context
Handlers run in the developer's process, not ours; the server must get ready jobs from
the jobs table into N worker processes, with no job ever running on two workers because of
a dispatch race. Postgres is the only infrastructure (ADR-0002), so "delivery" has to be
built from the database plus gRPC.
The previous per-job optimistic-claim design was benchmarked and failed: with 20 workers racing for the same jobs, claim conflicts reached 17 per job, throughput anti-scaled with worker count, and O(backlog) polling collapsed outright at a 1M-job backlog. Whatever replaces it has to make contention structurally impossible and keep a dispatch tick's cost independent of backlog depth.
The remaining tension is latency vs. infrastructure: a broker pushes instantly but is a second system to run; polling Postgres gives bounded-but-nonzero latency for zero extra infrastructure. For a job queue (not an RPC bus), sub-second is fine.
Decision
- Per-worker server-streaming poll. A worker calls
StreamJobs(topics, worker_id)and holds the stream open; the handler adaptsstream.Sendinto anassignmentSinkand hands control to adispatcher(internal/transport/grpc/dispatcher.go) that ticks every 500ms (dispatchInterval). Each tick: check the pause gate (gate.Paused()— early return, ADR-0010), then claim → send. The connected stream is worker presence — no registration table; disconnect and the loop dies with the stream's context. - Batch claim with
FOR UPDATE SKIP LOCKED— contention-free by construction.ClaimBatch(internal/repos/postgres/jobs.go) is a single UPDATE over a subselect: pick the topdispatchBatch(100,internal/service/job.go) dispatchable rows in priority-then-FIFO order via the partialidx_jobs_claimindex,FOR UPDATE SKIP LOCKED, and mark themRUNNINGwithlocked_by = worker,lease_until = now() + ttl,attempts + 1.SKIP LOCKEDmeans concurrent workers get disjoint batches — zero conflicts, zero retries, no arbitration code above SQL. The partial index makes the tick O(batch) regardless of backlog depth. - Claim before send. The row is
RUNNINGand leased before the assignment goes on the wire. This is what makes "no job on two workers" a database property rather than a protocol property: by the time anything is sent, exactly one worker owns the row. - The lease covers claimed-but-unsent. If
stream.Sendfails after the claim, the job isRUNNINGwith a lease and no worker — no special cleanup path exists or is needed. The watchdog (ADR-0007) reaps it when the lease lapses, routing it through the normal retry path (ADR-0006). - Error policy per class (
dispatchReady):- claim error → keep the loop (
return nil): a transient DB blip should not kill a healthy worker's stream; retry next tick. - sink error → kill the loop (
return err): the stream itself is broken, so the loop has nothing left to do; the claimed-but-unsent jobs are covered by decision 4. Only the failure that actually invalidates the stream ends the stream.
- claim error → keep the loop (
- Heartbeat and ReportResult stay separate unary RPCs
(
internal/transport/grpc/consumer.go), each with its own deadline, retry, and failure domain — the stream remains a one-way assignment channel. Assignments carryjob_id+attempt(assignmentFrom), the worker-side dedup key for the at-least-once contract.
Alternatives considered
Per-job optimistic claims (the previous design)
Each worker listed pending jobs and raced a compare-and-set claim per job. Measured, not
argued: up to 17 claim conflicts per job at 20 workers, throughput that fell as workers
were added, and per-tick work proportional to the whole backlog (collapse at 1M jobs).
SKIP LOCKED batches deliver the same safety with zero conflicts because workers never see
each other's rows. Rejected on the numbers.
Push dispatch via a broker (NATS) now
Instant dispatch, no poll. Deferred, not rejected: a second piece of infrastructure and a second set of delivery semantics (acks, redelivery, consumer groups) to reconcile with the jobs table's claim logic — real cost, bought to fix a latency floor of one 500ms tick that no current use case can feel. Revisit when the floor is the measured problem; the jobs table stays authoritative either way, so the broker slots in as a delivery optimization.
Unary long-poll (GetNextJob)
N workers hammering a unary RPC means per-request overhead, awkward connection economics, and no server-side per-worker loop to hang policy on — no natural place for the pause gate or a future push fan-out point. The stream gives each worker one long-lived, flow-controlled channel and gives the server a loop that owns that worker's dispatch.
One bidirectional stream multiplexing assignments + heartbeats + results
Fewer RPCs, but it couples lifecycles that must fail independently: one stalled direction stalls the whole stream, a stream death loses in-flight results and liveness in one blow, and every message type shares one deadline and one retry story. Separate unary RPCs keep each concern independently retryable.
Consequences
- Zero contention by construction. Disjoint batches replace conflict-and-retry; smoke-tested at 2000 jobs / 8 workers: 1223 jobs/s end-to-end, 0 rollbacks, ~1.6 DB transactions per job.
- Backlog-proof ticks. The partial-index claim is O(batch); a million-job backlog costs the same per tick as a hundred-job one.
- Failure is legible. Dead stream = worker gone; claimed-but-unsent and claimed-but-worker-died collapse into one recovery mechanism, the lease.
- Latency floor. A submitted job waits up to one 500ms tick — the accepted cost of no broker; tunable, but polling never reaches push latency.
- DB load scales with connected workers, not backlog: one claim query per worker per tick, even when the queue is empty.