feat(job-queue): resilient background job retry & monitoring (#130) #380

Open
macan88 wants to merge 3 commits into rohitdash08:main from macan88:felix/fix-130

macan88 commented Mar 12, 2026

Problem

Async API calls (dashboard, analytics, expenses) had no retry logic or observability, meaning transient network errors or 5xx responses caused immediate user-facing failures with no recovery path.

Solution

Added JobQueue and JobMonitor classes in app/src/lib/jobQueue.ts that wrap any async function with configurable exponential backoff retry (default: 3 attempts, 200 ms base delay) and lifecycle hooks (onSuccess, onRetry, onFailure, onDead). The retry policy explicitly distinguishes transient (5xx / network) errors from permanent (4xx) errors; permanent errors short-circuit immediately so that no retries are wasted on them. Jobs that exhaust all retries are placed in a dead-letter state for operator inspection without losing error context.
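
To make the retry flow concrete, here is a minimal sketch of the behaviour described above. It is not the code from app/src/lib/jobQueue.ts: the function names (runWithRetry, isRetryable, backoffDelay), the option shape, the injectable sleep, the deadLetters array, and the assumption that errors carry a numeric `status` field are all hypothetical. It also assumes "3 attempts" means three tries in total and that the delay doubles on each retry (200 ms, then 400 ms).

```typescript
// Hypothetical sketch: names and signatures are assumptions, not the PR's API.
interface RetryOptions<T> {
  maxAttempts?: number; // total tries, assumed default 3
  baseDelayMs?: number; // assumed default 200 ms
  sleep?: (ms: number) => Promise<void>; // injectable so callers can skip real waits
  onSuccess?: (result: T, attempt: number) => void;
  onRetry?: (error: unknown, attempt: number, delayMs: number) => void;
  onFailure?: (error: unknown) => void;
  onDead?: (error: unknown, attempts: number) => void;
}

// Assumed error shape: an optional numeric HTTP `status`.
// No status is treated as a network error (transient); only 5xx is retried.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  if (typeof status !== "number") return true;
  return status >= 500;
}

// Exponential backoff: base, 2x base, 4x base, ... for attempt 1, 2, 3, ...
function backoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

// Stand-in for the dead-letter state: keeps the last error for inspection.
const deadLetters: { error: unknown; attempts: number }[] = [];

async function runWithRetry<T>(
  job: () => Promise<T>,
  opts: RetryOptions<T> = {},
): Promise<T> {
  const { maxAttempts = 3, baseDelayMs = 200 } = opts;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      const result = await job();
      opts.onSuccess?.(result, attempt);
      return result;
    } catch (error) {
      if (!isRetryable(error)) {
        // Permanent error: fail immediately without retrying.
        opts.onFailure?.(error);
        throw error;
      }
      if (attempt >= maxAttempts) {
        // Retries exhausted: record the error, then surface it.
        deadLetters.push({ error, attempts: attempt });
        opts.onDead?.(error, attempt);
        throw error;
      }
      const delayMs = backoffDelay(attempt, baseDelayMs);
      opts.onRetry?.(error, attempt, delayMs);
      await sleep(delayMs);
    }
  }
}
```

A caller would wrap an existing request, for example `runWithRetry(() => loadDashboard(), { onDead: (e) => console.error(e) })`, where `loadDashboard` is a placeholder for any async function. Treating every 4xx as permanent follows the description above; a real policy might want to special-case statuses such as 429.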

Testing

  • Added app/src/__tests__/jobQueue.test.ts with three focused cases:
    1. Transient failure retried up to maxAttempts, resolves on success
    2. Permanent 4xx error skips retries entirely, fires onFailure once
    3. All retries exhausted → job moves to dead-letter queue, onDead fired
  • defaultRetryPolicy.isRetryable unit-tested for 4xx vs 5xx/network patterns
  • Tests use jest.useFakeTimers() so backoff delays do not slow CI
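
The three cases listed above can be illustrated without a test runner. The sketch below is not the contents of jobQueue.test.ts: the PR uses jest.useFakeTimers(), whereas this version swaps in an injected sleep that resolves immediately, so it runs as a plain script. The helper names (attemptJob, failingJob, instantSleep, runScenarios), the Outcome shape, and the `status` field on errors are all hypothetical stand-ins for the real queue.

```typescript
// Hypothetical stand-in for the queue under test; the real JobQueue API may differ.
type Outcome = {
  status: "succeeded" | "failed" | "dead";
  attempts: number;
  delays: number[]; // backoff delays that were requested, in ms
};

// Assumed classification: no numeric status (network error) or 5xx is transient.
const isTransient = (e: unknown): boolean => {
  const status = (e as { status?: number } | null)?.status;
  return typeof status !== "number" || status >= 500;
};

async function attemptJob(
  job: () => Promise<unknown>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<Outcome> {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await job();
      return { status: "succeeded", attempts: attempt, delays };
    } catch (e) {
      // Permanent error: stop at once, no retry.
      if (!isTransient(e)) return { status: "failed", attempts: attempt, delays };
      if (attempt === maxAttempts) break;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      delays.push(delay);
      await sleep(delay);
    }
  }
  // Every attempt failed with a transient error.
  return { status: "dead", attempts: maxAttempts, delays };
}

// Fake sleep: resolves immediately, so backoff costs no wall-clock time.
const instantSleep = async (_ms: number): Promise<void> => {};

// Builds a job that rejects with each queued error in turn, then resolves.
function failingJob(errors: unknown[]): () => Promise<string> {
  let calls = 0;
  return async () => {
    if (calls < errors.length) throw errors[calls++];
    return "ok";
  };
}

async function runScenarios(): Promise<Outcome[]> {
  const net = new Error("network down");
  return [
    // 1. Two transient failures, then success on the third attempt.
    await attemptJob(failingJob([net, { status: 503 }]), instantSleep),
    // 2. Permanent 4xx error: no retries at all.
    await attemptJob(failingJob([{ status: 404 }]), instantSleep),
    // 3. Transient failure on every attempt: ends in the dead state.
    await attemptJob(failingJob([net, net, net]), instantSleep),
  ];
}
```

Injecting the sleep function is one way to keep backoff out of the test's wall-clock time; fake timers achieve the same goal without changing the code under test, which is presumably why the PR chose them.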

Documentation

  • Added docs/job-queue.md covering API, retry policy customisation, dead-letter inspection, and memory management guidance.

Closes #130

Development

Successfully merging this pull request may close these issues.

Resilient background job retry & monitoring
