
feat(backend): implement automatic worker scaling via queue backlog m… #363

Merged
EDOHWARES merged 2 commits into EDOHWARES:main from CNduka001:feature/issue-260-worker-scaling on May 1, 2026

Conversation

@CNduka001
Contributor

Issue #260: Automatic Worker Scaling via Queue Backlog Metrics

Description

This pull request implements dynamic scaling of backend workers based on Redis queue backlog metrics. BullMQ workers are started and stopped as transaction load changes, which improves throughput and resource utilization and addresses the queue-backlog requirements in #260 without constant container recreation (minimizing cold starts).
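As a rough illustration of backlog-driven scaling (a sketch, not the code in src/worker/scaler.js), the snippet sizes an in-process worker pool from the waiting-job count. `maxWorkers` and `jobsPerWorker` stand in for MAX_WORKER_REPLICAS and JOBS_PER_WORKER_THRESHOLD; the ceil-and-clamp formula, the `minWorkers` floor, and the injected `createWorker` factory are assumptions.

```javascript
// Hypothetical sketch of backlog-driven, in-process worker scaling.
// maxWorkers / jobsPerWorker stand in for MAX_WORKER_REPLICAS and
// JOBS_PER_WORKER_THRESHOLD; the formula, minWorkers, and createWorker
// are assumptions, not the PR's scaler.js.

// One worker per `jobsPerWorker` waiting jobs, kept within
// [minWorkers, maxWorkers]; the ceiling always wins.
function desiredWorkerCount(waitingJobs, { jobsPerWorker, maxWorkers, minWorkers = 1 }) {
  const wanted = Math.ceil(waitingJobs / jobsPerWorker);
  return Math.min(maxWorkers, Math.max(minWorkers, wanted));
}

class WorkerScaler {
  constructor({ createWorker, jobsPerWorker, maxWorkers, minWorkers = 1 }) {
    // createWorker is a hypothetical factory, e.g. () => new Worker(name, processor, opts)
    this.createWorker = createWorker;
    this.limits = { jobsPerWorker, maxWorkers, minWorkers };
    this.workers = [];
  }

  // Resize the pool for the current backlog. Surplus workers leave the pool
  // immediately; the returned promise settles once their close() calls
  // (assumed to be async, like BullMQ's Worker#close) have finished.
  scale(waitingJobs) {
    const target = desiredWorkerCount(waitingJobs, this.limits);
    while (this.workers.length < target) {
      this.workers.push(this.createWorker());
    }
    const surplus = this.workers.splice(target);
    return Promise.all(surplus.map((worker) => worker.close()));
  }
}
```

Injecting the worker factory keeps the sizing logic testable without Redis, which suits boundary checks such as "never exceed the configured ceiling".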

Key Changes

  • Custom Worker Auto-Scaler: Created src/worker/scaler.js, which starts and shuts down in-process BullMQ Worker instances based on the total number of waiting jobs across all configured networks.
  • K8s HPA Support: Added a new endpoint, GET /api/queue/metrics (via queue.routes.js and queue.controller.js), that reports the key metrics totalWaiting and totalActive. This allows integration with the Kubernetes HPA through a custom metrics adapter when cluster-level pod scaling is preferred.
  • Redis Connection Fixes: Fixed connection locking issues in worker/queue.js and worker/processor.js by explicitly passing lazyConnect: true (ioredis defers connecting until the first command) and maxRetriesPerRequest: null (commands wait for the connection to come back instead of failing; BullMQ requires this for worker connections). As a result, queue evaluation waits for Redis to reconnect instead of failing when the server goes offline or restarts.
  • Improved Module Resiliency: Hardened the boot flow in processor.js by wrapping the loading of optional legacy services that may be missing (email.service, discord.service) in try/catch blocks, so the worker application no longer crashes or fails to bootstrap when these plugins aren't present.
  • Scaling Documentation: Added /docs/queue_scaling.md, covering internal behavior, threshold configuration (MAX_WORKER_REPLICAS, JOBS_PER_WORKER_THRESHOLD), and step-by-step instructions for wiring the metrics into a standard Kubernetes HPA definition.
  • Automated Tests: Created __tests__/scaler.test.js, which uses node:test to verify the scale-up and scale-down boundaries, including that the number of workers never exceeds the configured ceiling.

Acceptance Criteria Met

  • Integrate custom scaling logic that dynamically manages concurrency.
  • Expose metrics formatted for K8s HPA integration.
  • Scale worker replicas based on Redis queue length.
  • Minimize cold-start time (by scaling workers inside the running Node process).
  • Unit and integration tests added and passing.
  • Documentation updated in the /docs folder.
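For the metrics criterion, here is a hedged sketch of how totalWaiting and totalActive could be computed for GET /api/queue/metrics by summing per-network counts into one payload. It assumes `queues` maps each configured network to a BullMQ `Queue` and uses `Queue#getJobCounts("waiting", "active")`. The function names are hypothetical, and the controller wiring is omitted.

```javascript
// Sketch: sum per-network job counts into the payload served by
// GET /api/queue/metrics. Each entry has the shape resolved by BullMQ's
// Queue#getJobCounts("waiting", "active"), e.g. { waiting: 3, active: 1 }.
function aggregateQueueMetrics(countsByNetwork) {
  let totalWaiting = 0;
  let totalActive = 0;
  for (const counts of Object.values(countsByNetwork)) {
    totalWaiting += counts.waiting ?? 0;
    totalActive += counts.active ?? 0;
  }
  return { totalWaiting, totalActive };
}

// Collect counts from every queue in parallel, then aggregate.
// Assumes `queues` is { [networkName]: Queue } (hypothetical structure).
async function collectQueueMetrics(queues) {
  const entries = await Promise.all(
    Object.entries(queues).map(async ([network, queue]) => [
      network,
      await queue.getJobCounts("waiting", "active"),
    ]),
  );
  return aggregateQueueMetrics(Object.fromEntries(entries));
}
```

Keeping the summation separate from the Redis calls lets it be tested without a running Redis instance.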

Closes #260

@drips-wave

drips-wave Bot commented Apr 27, 2026

@CNduka001 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀


@EDOHWARES
Owner

Nice implementation, lgtm!

EDOHWARES merged commit 800cf2a into EDOHWARES:main on May 1, 2026
1 check passed

Linked issue: Backend: Automatic Worker Scaling via Queue Backlog Metrics