Permanent jobs pin write capacity to primary, blocking horizontal scaling #83

@nicolasburtey

Problem

The job crate's architecture makes it difficult to scale horizontally because permanent (daemon-style) jobs hold tracker slots and write capacity on the primary database indefinitely.

How permanent jobs work today

Several consumers use infinite-loop jobs whose run() never returns until shutdown is requested:

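// Daemon-style job: run() only returns when shutdown is requested.
async fn run(&self, mut current_job: CurrentJob) -> Result<JobCompletion, ...> {
    loop {
        // One unit of work per cycle.
        self.do_work().await;
        tokio::select! {
            // On shutdown, hand the job back to the queue.
            _ = current_job.shutdown_requested() => return Ok(JobCompletion::RescheduleNow),
            // Otherwise sleep in place, still holding the tracker slot.
            _ = tokio::time::sleep(std::time::Duration::from_secs(60)) => {}
        }
    }
}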

Examples from lana-bank:

| Job | Sleep interval | Holds slot for |
|---|---|---|
| fetch-provider-price | 300s | Forever |
| self-custody-balance-sync | 60s | Forever |
| eod-process-manager | N/A (awaits child jobs) | Hours |
| sync-reports | 10s | Until Dagster run completes |

Why this blocks scaling

1. Permanent jobs consume tracker slots

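JobTracker::next_batch_size() returns None when running_jobs >= min_jobs (default 30). Each permanent job increments running_jobs via dispatch_job() and only decrements it when its JobDispatcher is dropped, which never happens while the job is looping. Every permanent job therefore removes one slot from the capacity available to transient work for as long as the process runs: with the default min_jobs of 30, five permanent jobs mean the poller stops claiming once 25 transient jobs are running.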
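
The accounting above can be modelled in a few lines. This is a simplified sketch, not the crate's actual code: TrackerModel and job_finished() are hypothetical names, and the batch size of max_jobs - running_jobs is an assumption (the issue only states when None is returned).

```rust
/// Simplified, hypothetical model of the tracker accounting described above.
/// The real JobTracker may differ; only the `running_jobs >= min_jobs`
/// condition is taken from the issue text.
struct TrackerModel {
    min_jobs: usize,
    max_jobs: usize,
    running_jobs: usize,
}

impl TrackerModel {
    fn new(min_jobs: usize, max_jobs: usize) -> Self {
        Self { min_jobs, max_jobs, running_jobs: 0 }
    }

    /// How many jobs the poller may claim next, or None once
    /// `running_jobs` has reached `min_jobs`.
    fn next_batch_size(&self) -> Option<usize> {
        if self.running_jobs >= self.min_jobs {
            None
        } else {
            // Assumption: the poller fills up to max_jobs.
            Some(self.max_jobs.saturating_sub(self.running_jobs))
        }
    }

    /// Called when a job starts; a permanent job never reaches `job_finished`.
    fn dispatch_job(&mut self) {
        self.running_jobs += 1;
    }

    /// Stands in for the decrement that happens when the dispatcher is dropped.
    fn job_finished(&mut self) {
        self.running_jobs = self.running_jobs.saturating_sub(1);
    }
}
```

In this model, once 30 jobs are running (permanent or not), next_batch_size() returns None until one of them finishes, and a looping job never does.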

2. All job infrastructure is write-bound to the primary

The job lifecycle is a write-side state machine on job_executions:

  • Polling: FOR UPDATE + UPDATE SET state = 'running'
  • Keep-alive: UPDATE SET alive_at = $1 WHERE poller_instance_id = $2 every job_lost_interval / 4 (75s default)
  • Lost-job detection: UPDATE SET state = 'pending' WHERE alive_at < threshold
  • Completion/reschedule: DELETE FROM job_executions or UPDATE SET state = 'pending'

None of these can be offloaded to a read replica. Permanent jobs keep all of these write paths active for their entire lifetime.
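
To put a rough number on the keep-alive path alone, the arithmetic can be sketched as below. The function names are hypothetical, and the 300s job_lost_interval is inferred from the 75s figure above rather than read from the crate.

```rust
use std::time::Duration;

/// Keep-alive period as described above: job_lost_interval / 4.
fn keep_alive_interval(job_lost_interval: Duration) -> Duration {
    job_lost_interval / 4
}

/// Keep-alive UPDATE statements issued per hour by one poller instance.
fn keep_alive_updates_per_hour(job_lost_interval: Duration) -> u64 {
    let interval = keep_alive_interval(job_lost_interval);
    (3600.0 / interval.as_secs_f64()).floor() as u64
}

/// Row writes per hour: each UPDATE touches every running job of the poller.
fn alive_at_row_writes_per_hour(job_lost_interval: Duration, running_jobs: u64) -> u64 {
    keep_alive_updates_per_hour(job_lost_interval) * running_jobs
}
```

Under these assumptions, a poller with four sleeping permanent jobs still produces 48 UPDATE statements and 192 alive_at row writes per hour on the primary.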

3. No pool isolation

CurrentJob holds a clone of the same PgPool used by the poller, keep-alive handler, lost-job detector, and (in lana-bank's case) HTTP request handlers. There's no way to give jobs a separate pool or direct read-only work to a replica.

4. Pollers block each other (fixed in #82)

The poll query uses FOR UPDATE without SKIP LOCKED, so concurrent pollers serialize on row locks instead of claiming disjoint job sets. PR #82 fixes this specific issue, but the deeper problems remain.

Impact

  • Adding process replicas helps distribute transient jobs but doesn't help with permanent ones — each replica still needs its own permanent job instances writing keep-alives to the primary
  • The primary database becomes the bottleneck: every running job (permanent or transient) generates periodic writes even when idle
  • Pool exhaustion risk: default pool size is 20 and max concurrent jobs is 50, so running jobs can outnumber available connections, and permanent jobs hold their slots indefinitely

Proposed direction

Separate tracker accounting for daemon jobs

Permanent jobs should not count against min_jobs/max_jobs. A daemon: bool flag on the job type (or a separate DaemonTracker) would let the poller continue claiming transient work regardless of how many daemons are running.
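
One possible shape for this, as a minimal sketch: JobKind and DaemonAwareTracker are hypothetical names, not existing types in the crate, and the batch size formula is the same assumption as elsewhere (fill up to max_jobs).

```rust
/// Hypothetical job classification; a `daemon: bool` flag would work equally well.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobKind {
    Transient,
    Daemon,
}

/// Hypothetical tracker that counts daemons separately so they never
/// consume the min_jobs/max_jobs budget.
struct DaemonAwareTracker {
    min_jobs: usize,
    max_jobs: usize,
    running_transient: usize,
    running_daemons: usize,
}

impl DaemonAwareTracker {
    fn new(min_jobs: usize, max_jobs: usize) -> Self {
        Self { min_jobs, max_jobs, running_transient: 0, running_daemons: 0 }
    }

    fn dispatch_job(&mut self, kind: JobKind) {
        match kind {
            JobKind::Transient => self.running_transient += 1,
            JobKind::Daemon => self.running_daemons += 1,
        }
    }

    fn job_finished(&mut self, kind: JobKind) {
        match kind {
            JobKind::Transient => {
                self.running_transient = self.running_transient.saturating_sub(1)
            }
            JobKind::Daemon => self.running_daemons = self.running_daemons.saturating_sub(1),
        }
    }

    /// Only transient jobs are compared against min_jobs.
    fn next_batch_size(&self) -> Option<usize> {
        if self.running_transient >= self.min_jobs {
            None
        } else {
            Some(self.max_jobs.saturating_sub(self.running_transient))
        }
    }
}
```

With this split, 30 running daemons leave next_batch_size() at the full transient budget. A real implementation would probably still want a separate cap on daemons so they cannot grow without bound.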

Dedicated pool support

Allow JobSvcConfig to accept a second pool (or pool factory) so that:

  • Job infrastructure (polling, keep-alive, state transitions) uses the primary pool
  • Job business logic (CurrentJob::pool() / begin_op()) can optionally use a replica or separate pool
  • Callers can isolate job DB load from HTTP request handling
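
A minimal sketch of what the pool split could look like. JobPools and its methods are hypothetical and not part of JobSvcConfig today; the type is generic over the pool so the example stays self-contained, and in practice P would be something like sqlx's PgPool.

```rust
/// Hypothetical pair of pools: one for job infrastructure, one (optional)
/// for job business logic.
#[derive(Clone)]
struct JobPools<P> {
    infra: P,
    work: Option<P>,
}

impl<P> JobPools<P> {
    /// Current behaviour: everything shares a single pool.
    fn single(pool: P) -> Self {
        Self { infra: pool, work: None }
    }

    /// Proposed behaviour: business logic gets its own pool (or a replica).
    fn with_work_pool(infra: P, work: P) -> Self {
        Self { infra, work: Some(work) }
    }

    /// Pool for polling, keep-alive and state transitions (always the primary).
    fn infra_pool(&self) -> &P {
        &self.infra
    }

    /// Pool handed to job code; falls back to the infra pool when unset.
    fn work_pool(&self) -> &P {
        self.work.as_ref().unwrap_or(&self.infra)
    }
}
```

Falling back to the infra pool keeps existing callers working unchanged, so the second pool would be opt-in.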

Reduce write pressure from idle jobs

The keep-alive handler updates alive_at for ALL running jobs owned by a poller instance in a single UPDATE ... WHERE poller_instance_id = $2. For permanent jobs that are just sleeping between cycles, these writes are pure overhead. Options:

  • Batch keep-alive writes less frequently for daemon-type jobs
  • Let permanent jobs opt out of keep-alive (they manage their own liveness)
  • Use advisory locks instead of row-level alive_at updates for long-lived jobs
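
For the advisory-lock option, Postgres functions such as pg_try_advisory_lock take a bigint key, so each long-lived job would need a stable 64-bit key. The sketch below derives one from a job id with FNV-1a; the choice of hash and the advisory_lock_key name are assumptions for illustration, not something the crate does today.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// 64-bit FNV-1a hash. Implemented by hand so the result is stable across
/// processes and Rust versions, which std's default hasher does not promise.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical helper: maps a job id to a key usable as the bigint
/// argument of an advisory lock call.
fn advisory_lock_key(job_id: &str) -> i64 {
    // Reinterpret the bits as signed; Postgres bigint is a signed 64-bit value.
    fnv1a_64(job_id.as_bytes()) as i64
}
```

Two distinct job ids can in principle hash to the same key, so a real design would need to decide whether such a collision is acceptable or use a key that is unique by construction.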

Related: #82 (SKIP LOCKED fix for poller contention)
