Rpc health failover system by 1evi7eo · Pull Request #207 · bnb-chain/example-hub

1evi7eo · 2026-01-24T15:50:31Z

Description

This PR adds an RPC Health Failover System for BNB Smart Chain (BSC), a BNB Chain Cookbook demo that monitors multiple RPC endpoints, checks their health periodically, and automatically fails over to the best available endpoint when the active one becomes slow or unavailable. Keeping several RPC providers in rotation lets a blockchain application stay available when any single provider degrades.

Key Features:

Multi-Endpoint Monitoring: Simultaneously monitors multiple BSC RPC endpoints configured via the comma-separated BSC_RPC_URLS environment variable
Periodic Health Checks: Automatically tests each endpoint every 5 seconds (configurable via HEALTH_CHECK_INTERVAL_MS) by calling the eth_blockNumber RPC method
Status Classification: Categorizes endpoints into three status levels:
- Healthy: Latency < 1000ms, responding correctly
- Degraded: Latency 1000–3000ms, still functional but slow
- Unhealthy: Latency > 3000ms or errors, not recommended
Automatic Failover: Selects the best available endpoint by status priority (healthy > degraded > unhealthy) and then by latency, switching automatically when the current endpoint fails or degrades
Real-time Failover: If the active endpoint fails during an RPC call, the system immediately tries alternative endpoints in priority order
Manual Override: Allows manual selection of any endpoint via /api/set-active API endpoint
Status Tracking: Monitors latency, errors, consecutive failures, last check timestamp, and last block number for each endpoint
RESTful API: Express.js backend with endpoints for health status, manual endpoint selection, and test RPC calls
Real-time Dashboard: Modern dark-mode UI showing status of all endpoints with color-coded health indicators
Configurable Timeouts: RPC call timeout configurable via RPC_TIMEOUT_MS environment variable (default: 3000ms)
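The status thresholds and the BSC_RPC_URLS format listed above can be sketched as follows. This is a minimal illustration rather than the code in this PR: the names classifyEndpoint, parseRpcUrls, HEALTHY_MAX_MS, and DEGRADED_MAX_MS are hypothetical, and treating exactly 3000ms as degraded is an assumption drawn from the "1000–3000ms" range.

```typescript
type EndpointStatus = "healthy" | "degraded" | "unhealthy";

// Thresholds as stated in the feature list (names are hypothetical).
const HEALTHY_MAX_MS = 1000;
const DEGRADED_MAX_MS = 3000;

// Map one health-check outcome to a status level.
// Any error, or a missing latency measurement, counts as unhealthy.
function classifyEndpoint(latencyMs: number | null, hadError: boolean): EndpointStatus {
  if (hadError || latencyMs === null) return "unhealthy";
  if (latencyMs < HEALTHY_MAX_MS) return "healthy";
  if (latencyMs <= DEGRADED_MAX_MS) return "degraded";
  return "unhealthy";
}

// Split a comma-separated URL list, dropping blanks and surrounding whitespace.
function parseRpcUrls(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);
}
```

A caller would typically pass process.env.BSC_RPC_URLS to parseRpcUrls at startup and feed each check result through classifyEndpoint.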

How Failover Works:

Health checks run periodically (default: every 5 seconds) on all configured endpoints
Each endpoint is tested by calling eth_blockNumber and measuring response latency
Endpoints are ranked by status priority (healthy > degraded > unhealthy) and then by latency
The best endpoint is automatically selected as active
If the active endpoint fails during an RPC call, the system immediately fails over to the next best endpoint
Failed endpoints are marked unhealthy and tracked with consecutive failure counts
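The ranking and failover steps above might look roughly like the sketch below. It assumes a simple in-memory record per endpoint; EndpointHealth, rankEndpoints, and callWithFailover are hypothetical names, not identifiers confirmed by this PR.

```typescript
type EndpointStatus = "healthy" | "degraded" | "unhealthy";

// Hypothetical per-endpoint record; field names are assumptions.
interface EndpointHealth {
  url: string;
  status: EndpointStatus;
  latencyMs: number | null;
  consecutiveFailures: number;
}

const STATUS_PRIORITY: Record<EndpointStatus, number> = {
  healthy: 0,
  degraded: 1,
  unhealthy: 2,
};

// Order by status priority first, then by latency (unknown latency sorts last).
function rankEndpoints(endpoints: EndpointHealth[]): EndpointHealth[] {
  return [...endpoints].sort((a, b) => {
    const byStatus = STATUS_PRIORITY[a.status] - STATUS_PRIORITY[b.status];
    if (byStatus !== 0) return byStatus;
    const aLatency = a.latencyMs ?? Number.MAX_SAFE_INTEGER;
    const bLatency = b.latencyMs ?? Number.MAX_SAFE_INTEGER;
    return aLatency - bLatency;
  });
}

// Try endpoints in ranked order; mark each failing endpoint unhealthy
// and move on to the next one until a call succeeds.
async function callWithFailover<T>(
  endpoints: EndpointHealth[],
  call: (url: string) => Promise<T>,
): Promise<{ url: string; result: T }> {
  let lastError: unknown = new Error("no endpoints configured");
  for (const endpoint of rankEndpoints(endpoints)) {
    try {
      const result = await call(endpoint.url);
      endpoint.consecutiveFailures = 0;
      return { url: endpoint.url, result };
    } catch (err) {
      endpoint.status = "unhealthy";
      endpoint.consecutiveFailures += 1;
      lastError = err;
    }
  }
  throw lastError;
}
```

Because rankEndpoints copies only the array, the status and failure-count updates made during failover land on the original records, so the next periodic health check starts from the updated state.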

Use Cases:

High-availability dApps that require reliable RPC access
Production applications needing automatic failover capabilities
Monitoring RPC provider performance and reliability
Educational tool for understanding RPC redundancy patterns
Building resilient blockchain infrastructure

Tech Stack:

TypeScript for type safety and maintainability
Express.js for HTTP server and RESTful API endpoints
Direct JSON-RPC calls using native fetch API with AbortController for timeouts
Plain HTML/CSS/JS for frontend with modern dark theme UI
Vitest for comprehensive unit testing
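As an illustration of the fetch-plus-AbortController approach named above, a single health check could be written roughly as below. This is a sketch under assumptions: checkEndpoint and HealthCheckResult are hypothetical names, the 3000ms default mirrors the stated RPC_TIMEOUT_MS default, and it relies on the global fetch available in Node 18 or later.

```typescript
// Hypothetical result shape for one health check.
interface HealthCheckResult {
  ok: boolean;
  latencyMs: number;
  blockNumber: number | null;
  error: string | null;
}

// Call eth_blockNumber on one endpoint and measure how long it takes.
// The request is aborted if it exceeds timeoutMs.
async function checkEndpoint(url: string, timeoutMs = 3000): Promise<HealthCheckResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    const response = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_blockNumber", params: [] }),
      signal: controller.signal,
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    const payload = (await response.json()) as {
      result?: string;
      error?: { message?: string };
    };
    if (payload.error || typeof payload.result !== "string") {
      throw new Error(payload.error?.message ?? "malformed JSON-RPC response");
    }
    return {
      ok: true,
      latencyMs: Date.now() - startedAt,
      // eth_blockNumber returns a hex quantity such as "0x1b4".
      blockNumber: Number.parseInt(payload.result, 16),
      error: null,
    };
  } catch (err) {
    // Timeouts (aborts), network failures, and RPC errors all end up here.
    return {
      ok: false,
      latencyMs: Date.now() - startedAt,
      blockNumber: null,
      error: err instanceof Error ? err.message : String(err),
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Returning a result object instead of throwing keeps the periodic checker simple: every endpoint yields one record per cycle, whether it succeeded or not.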

This implementation provides a complete failover demo that keeps RPC access available when individual endpoints experience issues, a pattern that applications depending on continuous RPC access can build on.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

Reproduction Steps:

Clone the repository and run ./clone-and-run.sh (or manually: npm install, cp .env.example .env, npm run build, npm test, npm start)
Open http://localhost:3000 in a browser
View the dashboard showing health status of all configured RPC endpoints
Observe automatic health checks running every 5 seconds
Click "Test Active Endpoint" to verify the current active endpoint is working
Manually select a different endpoint using the dropdown and verify it becomes active
Simulate an endpoint failure by stopping one RPC service and observe automatic failover
Verify the system automatically selects the best available endpoint based on health and latency

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

vivixu-cmd · 2026-01-27T05:22:26Z

Congratulations! You have received a Cookbook reward. Please reply with your BSC wallet address. Thanks

1evi7eo · 2026-01-27T06:07:38Z

Congratulations! You have received a Cookbook reward. Please reply with your BSC wallet address. Thanks

Thank you for the opportunity to contribute!
0x23b23556c3CAA3C582EeE23Fc0D972352FB2a62c

1evi7eo added 2 commits January 24, 2026 13:21

Initial Commit

2879b71

completed rpc health system

aad8354

1evi7eo wants to merge 2 commits into bnb-chain:main from 1evi7eo:rpc-health-failover-system
