pulse
Design decisions (ADRs)

ADR-0002: Overall architecture — one server, one table, an SDK

Context

pulse needs a shape before it needs features: where does state live, where does user code run, how many deployable things are there, and what infrastructure may it assume? The guiding constraints:

  • A developer should integrate with one URL and one imported package — no broker to install, no sidecar, no config sprawl.
  • The server must never execute user code — handlers are the developer's, in the developer's process, with their own dependencies and secrets.
  • Every architectural dollar should buy a demonstrable distributed-systems property (correct concurrency, crash recovery, honest semantics) — not deployment topology.

Decision

Four shape decisions, each with a one-line reason:

  1. One deployable server (pulsed) + a client SDK — not microservices. All server-side concerns (dispatch, watchdog, scheduler, pause control) are goroutines in one process (cmd/pulsed/main.go). The interesting distribution problem in a job system is server ↔ many workers, not server ↔ server; splitting the server would add network boundaries with no property gained.
  2. Postgres is the only infrastructure, and one jobs table is the core of it. A job's state, retry policy, and worker lease are columns on a single row (db/migrations/0001_CreateJobs.go); schedules and the pause switch are two more small tables. Coordination uses exactly two Postgres primitives: FOR UPDATE SKIP LOCKED for contention-free batch claims, and guarded UPDATEs — every state transition is one UPDATE ... WHERE status = ..., and a guard that matches no row surfaces as domain.ErrInvalidTransition (internal/repos/postgres/jobs.go). No broker, no Redis, no coordination service until a measured limit demands one (ADR-0005 defers push/NATS on exactly these grounds).
  3. gRPC + a typed SDK is the only boundary. One proto surface (proto/pulse/v1/pulse.proto): SubmitJob/GetJob for producers, StreamJobs/ReportResult/Heartbeat for workers. The SDK is the root package: pulse.New opens one connection that serves both producer and consumer roles, and Register/Enqueue keep protobuf out of the developer's code. The server only ever ships data across the wire; handlers run in the developer's process, never in pulsed. Error mapping lives in interceptors (internal/transport/grpc/errors.go, ServerOptions()), so server-side RPC handlers just return err, and production and the bufconn test harness share identical wire behaviour.
  4. Ports-and-adapters layering inside the server: transport → service → repos (ports) → domain, with Postgres implementing the ports (internal/repos/postgres/). The domain holds types and invariant errors; services own orchestration and the background loops; transport translates only. Every port has a gomock-generated mock (internal/service/mocks/) for unit tests; integration tests hit real Postgres behind TEST_DB_URL.

Alternatives considered

Broker-centric architecture (NATS at the core, from day one)

A broker gives push dispatch and fan-out — but it becomes a second source of truth beside the jobs table, owns delivery semantics we'd have to reconcile with the claim logic, and adds an operational dependency for every adopter. Deferred, not rejected: the jobs table stays authoritative, so a broker can later become a delivery optimization (ADR-0005's revisit path) without re-architecting.

Split services (API / dispatcher / scheduler as separate deployables)

Buys independent scaling and blast-radius isolation; costs inter-service contracts, shared-DB coupling or an internal bus, and N deploy targets. Nothing in pulse's problem needs it — the loops already scale out correctly as replicas of the whole server: claims are disjoint by SKIP LOCKED, schedule fires are deduplicated by deterministic job ids plus a CAS advance, and the pause switch converges through dispatch_control. N instances of pulsed are safe by construction.

Library-only (embed pulse in the app, no server)

Simplest possible adoption (like asynq-as-a-library), but then every app process runs watchdog/scheduler loops, config drifts per-app, and there is no single admin surface (pause, schedules, dead-letter ops). The server is what makes pulse operable.

REST/JSON instead of gRPC

Broader reach, but at the cost of hand-written clients, no streaming for dispatch, and no generated types. The SDK is the product surface, so codegen and streaming won (see ADR-0005 for details).

Consequences

  • One binary + one database = one-command local run and trivial CI; every distributed property is demonstrable with docker-compose and two terminals.
  • N pulsed replicas are safe by construction — every arbitration (claims, schedule fires, pause) happens in Postgres, none in process memory. Scale-out was designed in, not bolted on.
  • Postgres is the ceiling: poll-based dispatch latency and per-worker query load are the costs (a claim tick is O(batch) via the partial idx_jobs_claim index, so backlog depth doesn't matter, but worker count sets the query rate). The smoke run (2000 jobs across 8 workers at 1223 jobs/s end-to-end, 0 rollbacks, ~1.6 DB transactions per job) suggests the ceiling is far off; the answer when it arrives is the deferred broker.
  • The layering keeps the arbitration auditable: everything concurrency-critical is a handful of SQL statements in internal/repos/postgres/jobs.go, testable against a real database, mockable everywhere else.
  • Handlers-in-your-process means pulse never needs sandboxing, dependency injection into the server, or a plugin story — the sharpest scope cut in the design.
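For a sense of scale, the smoke-run figures quoted above imply a run of under two seconds and a database load of roughly two thousand transactions per second. The arithmetic below uses only the numbers from that bullet; the helper names are illustrative.

```go
package main

import "fmt"

// elapsedSeconds is how long a run takes at a given end-to-end throughput.
func elapsedSeconds(jobs, jobsPerSec float64) float64 { return jobs / jobsPerSec }

// txPerSecond is the database transaction rate implied by that throughput.
func txPerSecond(jobsPerSec, txPerJob float64) float64 { return jobsPerSec * txPerJob }

func main() {
	fmt.Printf("elapsed: %.2f s\n", elapsedSeconds(2000, 1223))          // elapsed: 1.64 s
	fmt.Printf("total transactions: %.0f\n", 2000*1.6)                   // total transactions: 3200
	fmt.Printf("transaction rate: %.0f tx/s\n", txPerSecond(1223, 1.6)) // transaction rate: 1957 tx/s
}
```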
