Problem
The job crate's architecture makes it difficult to scale horizontally because permanent (daemon-style) jobs hold tracker slots and write capacity on the primary database indefinitely.
How permanent jobs work today
Several consumers use infinite-loop jobs that never return from run():
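    async fn run(&self, mut current_job: CurrentJob) -> Result<JobCompletion, ...> {
        // The loop never exits on its own, so the tracker slot stays occupied.
        loop {
            self.do_work().await;
            select! {
                // Shutdown is the only path that returns from run().
                _ = current_job.shutdown_requested() => return Ok(RescheduleNow),
                // Otherwise sleep in place and start the next cycle.
                _ = tokio::time::sleep(Duration::from_secs(60)) => {}
            }
        }
    }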
Examples from lana-bank:
| Job | Sleep interval | Holds slot for |
| --- | --- | --- |
| fetch-provider-price | 300s | Forever |
| self-custody-balance-sync | 60s | Forever |
| eod-process-manager | N/A (awaits child jobs) | Hours |
| sync-reports | 10s | Until Dagster run completes |
Why this blocks scaling
1. Permanent jobs consume tracker slots
JobTracker::next_batch_size() returns None when running_jobs >= min_jobs (default 30). Each permanent job increments running_jobs via dispatch_job(), and the counter is decremented only on Drop of JobDispatcher, which never happens while the job is looping. Every permanent job therefore removes one slot from the capacity available to transient work for as long as the process runs.
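The effect can be reproduced with a simplified model of the tracker. This is a sketch, not the crate's code: the type and method names follow the description above, while `job_completed()`, the `Some(max_jobs - running)` batch size, and the constructor are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified stand-in for the crate's tracker (bodies are assumptions).
pub struct JobTracker {
    min_jobs: usize,
    max_jobs: usize,
    running_jobs: AtomicUsize,
}

impl JobTracker {
    pub fn new(min_jobs: usize, max_jobs: usize) -> Self {
        assert!(min_jobs <= max_jobs, "min_jobs must not exceed max_jobs");
        Self { min_jobs, max_jobs, running_jobs: AtomicUsize::new(0) }
    }

    /// Counts a job as running when it is handed to a dispatcher.
    pub fn dispatch_job(&self) {
        self.running_jobs.fetch_add(1, Ordering::SeqCst);
    }

    /// Hypothetical hook for the Drop of a dispatcher.
    /// A job that loops forever never reaches this point.
    pub fn job_completed(&self) {
        self.running_jobs.fetch_sub(1, Ordering::SeqCst);
    }

    /// None once running_jobs >= min_jobs; otherwise the room left
    /// under max_jobs (the exact batch size is an assumption).
    pub fn next_batch_size(&self) -> Option<usize> {
        let running = self.running_jobs.load(Ordering::SeqCst);
        if running >= self.min_jobs {
            None
        } else {
            Some(self.max_jobs - running)
        }
    }
}
```

In this model, with the defaults quoted in this issue (min_jobs 30, max_jobs 50) and four permanent jobs dispatched, polling stops once 26 transient jobs are running rather than 30.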
2. All job infrastructure is write-bound to the primary
The job lifecycle is a write-side state machine on job_executions:
- Polling: FOR UPDATE + UPDATE SET state = 'running'
- Keep-alive: UPDATE SET alive_at = $1 WHERE poller_instance_id = $2 every job_lost_interval / 4 (75s default)
- Lost-job detection: UPDATE SET state = 'pending' WHERE alive_at < threshold
- Completion/reschedule: DELETE FROM job_executions or UPDATE SET state = 'pending'
None of these can be offloaded to a read replica. Permanent jobs keep all of these write paths active for their entire lifetime.
3. No pool isolation
CurrentJob holds a clone of the same PgPool used by the poller, keep-alive handler, lost-job detector, and (in lana-bank's case) HTTP request handlers. There's no way to give jobs a separate pool or direct read-only work to a replica.
4. Pollers block each other (fixed in #82)
The poll query uses FOR UPDATE without SKIP LOCKED, so concurrent pollers serialize on row locks instead of claiming disjoint job sets. PR #82 fixes this specific issue, but the deeper problems remain.
Impact
- Adding process replicas helps distribute transient jobs but doesn't help with permanent ones: each replica still needs its own permanent job instances, and each of those writes keep-alives to the primary
- The primary database becomes the bottleneck: every running job (permanent or transient) generates periodic writes even when idle
- Pool exhaustion risk: the default pool size is 20, the maximum number of concurrent jobs is 50, and permanent jobs hold their slots indefinitely
Proposed direction
Separate tracker accounting for daemon jobs
Permanent jobs should not count against min_jobs/max_jobs. A daemon: bool flag on the job type (or a separate DaemonTracker) would let the poller continue claiming transient work regardless of how many daemons are running.
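One possible shape for this, sketched under assumptions: `JobKind`, `SplitTracker`, and every method below are hypothetical names that do not exist in the crate today. The idea is only that daemons are counted separately and the batch-size check looks at transient jobs alone.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical classification of a job type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JobKind {
    Transient,
    Daemon,
}

/// Hypothetical tracker that keeps daemons out of the min/max accounting.
pub struct SplitTracker {
    min_jobs: usize,
    max_jobs: usize,
    running_transient: AtomicUsize,
    running_daemons: AtomicUsize,
}

impl SplitTracker {
    pub fn new(min_jobs: usize, max_jobs: usize) -> Self {
        assert!(min_jobs <= max_jobs, "min_jobs must not exceed max_jobs");
        Self {
            min_jobs,
            max_jobs,
            running_transient: AtomicUsize::new(0),
            running_daemons: AtomicUsize::new(0),
        }
    }

    fn counter(&self, kind: JobKind) -> &AtomicUsize {
        match kind {
            JobKind::Transient => &self.running_transient,
            JobKind::Daemon => &self.running_daemons,
        }
    }

    pub fn dispatch_job(&self, kind: JobKind) {
        self.counter(kind).fetch_add(1, Ordering::SeqCst);
    }

    pub fn job_completed(&self, kind: JobKind) {
        self.counter(kind).fetch_sub(1, Ordering::SeqCst);
    }

    /// Daemons stay visible for metrics but never throttle polling.
    pub fn running_daemons(&self) -> usize {
        self.running_daemons.load(Ordering::SeqCst)
    }

    /// Only transient jobs are compared against min_jobs/max_jobs.
    pub fn next_batch_size(&self) -> Option<usize> {
        let running = self.running_transient.load(Ordering::SeqCst);
        if running >= self.min_jobs {
            None
        } else {
            Some(self.max_jobs - running)
        }
    }
}
```

A separate counter keeps the change local to the tracker, but it also means daemons are no longer bounded by max_jobs, so a real implementation would probably want its own cap on daemon count.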
Dedicated pool support
Allow JobSvcConfig to accept a second pool (or pool factory) so that:
- Job infrastructure (polling, keep-alive, state transitions) uses the primary pool
- Job business logic (CurrentJob::pool() / begin_op()) can optionally use a replica or separate pool
- Callers can isolate job DB load from HTTP request handling
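The split above could be expressed as a small pool holder with a fallback. This is a sketch only: `JobPools` and its methods are hypothetical, and the type is generic over `P` so the example does not depend on sqlx; in the crate `P` would presumably be `PgPool`.

```rust
/// Hypothetical pool selection for the job service.
/// `infra` serves polling, keep-alive and state transitions;
/// `work` (if set) serves job business logic.
#[derive(Clone, Debug)]
pub struct JobPools<P> {
    infra: P,
    work: Option<P>,
}

impl<P> JobPools<P> {
    /// Current behaviour: one pool for everything.
    pub fn single(pool: P) -> Self {
        Self { infra: pool, work: None }
    }

    /// Opt in to a separate pool (or replica) for job business logic.
    pub fn with_work_pool(mut self, pool: P) -> Self {
        self.work = Some(pool);
        self
    }

    /// Always the primary: the job lifecycle is write-only.
    pub fn infra_pool(&self) -> &P {
        &self.infra
    }

    /// Falls back to the infrastructure pool when no work pool is set,
    /// so existing callers keep working unchanged.
    pub fn work_pool(&self) -> &P {
        self.work.as_ref().unwrap_or(&self.infra)
    }
}
```

Under this design CurrentJob::pool() would return `work_pool()` while the poller, keep-alive handler, and lost-job detector keep using `infra_pool()`. Making the second pool optional keeps the change backward compatible.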
Reduce write pressure from idle jobs
The keep-alive handler updates alive_at for ALL running jobs owned by a poller instance in a single UPDATE ... WHERE poller_instance_id = $2. For permanent jobs that are just sleeping between cycles, these writes are pure overhead. Options:
- Batch keep-alive writes less frequently for daemon-type jobs
- Let permanent jobs opt out of keep-alive (they manage their own liveness)
- Use advisory locks instead of row-level alive_at updates for long-lived jobs
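As an illustration of the first option, a per-kind liveness policy could stretch both the keep-alive interval and the lost-job threshold for daemons. Everything here is hypothetical (`LivenessPolicy`, `JobKind`, `daemon_factor`); the only values taken from this issue are the job_lost_interval / 4 ratio and the 75s default it implies.

```rust
use std::time::Duration;

/// Hypothetical classification of a job type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JobKind {
    Transient,
    Daemon,
}

/// Hypothetical policy: daemons get a longer lost-job threshold,
/// and therefore a proportionally longer keep-alive interval.
pub struct LivenessPolicy {
    pub job_lost_interval: Duration,
    /// How many times longer a daemon may stay silent (assumed knob).
    pub daemon_factor: u32,
}

impl LivenessPolicy {
    /// Age of alive_at after which a job is considered lost.
    pub fn lost_after(&self, kind: JobKind) -> Duration {
        match kind {
            JobKind::Transient => self.job_lost_interval,
            JobKind::Daemon => self.job_lost_interval * self.daemon_factor,
        }
    }

    /// Keeps the existing ratio: four keep-alives per lost-job window.
    pub fn keep_alive_every(&self, kind: JobKind) -> Duration {
        self.lost_after(kind) / 4
    }
}
```

With a job_lost_interval of 300s and a daemon_factor of 8, transient jobs keep the 75s keep-alive while daemons write every 600s. The cost is slower recovery: a crashed daemon would take up to 2400s to be detected as lost, and the keep-alive UPDATE would need to filter by job kind rather than only by poller_instance_id.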
Related: #82 (SKIP LOCKED fix for poller contention)