ADR-0009: Priority dispatch

Context

Dispatch order was pure FIFO: the claim in internal/repos/postgres/jobs.go (ClaimBatch) picked the oldest dispatchable rows. FIFO is the right default, but it has no answer for "this job matters more" — a password-reset email should not sit behind ten thousand newsletter sends on the same topic. Splitting topics per urgency pushes queue topology into every producer and worker; what's wanted is: same topic, same workers, urgent jobs first.

Constraints:

The claim path is hot. Every connected worker's stream polls it every 500ms (dispatchInterval, internal/transport/grpc/dispatcher.go). Whatever implements priority must not turn the claim into a backlog-proportional scan.
Backward compatible. Existing producers submit without a priority and must behave exactly as before.
No second queue. One jobs table, one claim statement, one code path — the property the design leans on (disjoint FOR UPDATE SKIP LOCKED batches per worker) must survive intact.

Decision

Priority is a column, and ordering lives inside the claim.

Schema: priority int NOT NULL DEFAULT 0 on jobs (db/migrations/0001_CreateJobs.go). Higher dispatches first; negative values de-prioritize. The default makes the feature invisible to anyone not using it — every pre-existing submit is priority 0, and among equal priorities order is FIFO, so unprioritized behavior is exactly the old behavior.
Ordering inside the claim: the ClaimBatch subselect orders ORDER BY priority DESC, submitted_at ASC before LIMIT @limit FOR UPDATE SKIP LOCKED (internal/repos/postgres/jobs.go). Ties within a priority stay FIFO by submitted_at. Because the ordering sits inside the single UPDATE that locks and leases the rows, priority holds under concurrency for free: workers still receive disjoint batches, just cut from the top of the priority order instead of the front of the arrival order.
Index-backed, O(batch): idx_jobs_claim is a partial index on (topic, priority DESC, submitted_at ASC) WHERE status IN ('PENDING','RETRYING') (db/migrations/0001_CreateJobs.go). The index key order is the claim's sort order over exactly the claim's candidate rows, so the top-k rows come straight off the index and the scan stops — a claim costs O(batch size) regardless of backlog depth. Priority added no sort to the hot path; it changed the index definition.
SDK surface: a variadic option, not a signature change. pulse.Enqueue(ctx, p, "send-email", args, pulse.WithPriority(10)) — WithPriority is a SubmitOption that sets the field on the wire request (job.go at the repo root). Call sites that don't care don't change.
Semantics: strict priority. A priority-10 job always beats a priority-0 job. There is no fairness mechanism: a sustained stream of high-priority work starves lower priorities indefinitely. Accepted and documented (see Consequences); aging is deferred until starvation is observed rather than imagined.
Retries keep their priority. A failed job keeps its priority column value; Fail only moves next_run_at forward by the attempts² backoff. A retried high-priority job therefore re-enters the queue at its original priority, gated by the claim's next_run_at <= now() predicate — backoff delays it, but once due it still outranks lower-priority work. Priority and backoff compose without touching each other.

Alternatives considered

Per-priority topics (`email.high`, `email.low`)

Encode priority in the topic name. Rejected: priority becomes topology. Every producer must pick the right topic variant, every worker must subscribe to all of them, and there is no ordering across variants — a worker draining email.high and email.low has to reimplement precedence in the consumer, badly. One column keeps ordering where it is enforced: in the claim.

Mutable re-prioritization (bump a queued job)

Expose an admin API that updates a pending job's priority. Deferred: the storage model already supports it — the next claim simply reads the new value, no queue restructure needed. What's missing is API and audit surface (who bumped what, why), which ships when someone needs it rather than speculatively.

Weighted-fair queuing / priority aging

Give lower priorities a guaranteed share, or grow effective priority with wait time. Deferred: both trade away the property that makes strict priority explainable — "higher always wins" is one sentence; a weight schedule or aging curve is a knob nobody can predict. Aging also breaks the index trick: effective priority becomes a function of now(), so the claim can no longer be a top-k read of a precomputed order. Starvation is accepted until it is a measured problem, and that acceptance is written down here.

Consequences

Positive

Urgent work overtakes bulk work on the same topic with no new tables, loops, or processes — the whole feature is a column, an ORDER BY, an index, and an SDK option.
Hot-path cost unchanged: still one statement, still top-k off one partial index, still O(batch).
Fully backward compatible; priority 0 is the old FIFO exactly.
Scheduler-spawned and retried jobs compose cleanly: they are ordinary rows, ordered by the same claim.

Negative / costs accepted

Starvation is real. Under sustained high-priority load, priority-0 jobs wait unboundedly. Strict priority was chosen for explainability; today's mitigation is operational (don't mark everything urgent), not mechanical.
Priority is fixed at submit time — no re-prioritization API yet.
Priority is a bare int with no named tiers; producers must agree on conventions.
A multi-topic worker drains the best priority across all its subscribed topics (the claim filters topic = ANY(@topics) then orders globally) — intended, but worth knowing when reasoning about a single topic's latency.