From d142b689bef353f47ffb0e1faf6af02f912a4602 Mon Sep 17 00:00:00 2001 From: MoonBoi9001 Date: Thu, 6 Nov 2025 12:59:20 -0700 Subject: [PATCH] docs: add end-to-end oracle flow diagram Added mermaid diagram showing complete system architecture from scheduler trigger through data processing to blockchain submission. Includes visual representation of circuit breaker, cache logic, BigQuery pipeline, eligibility criteria, RPC failover, and monitoring systems. --- docs/technical-design.md | 190 +++++++++++++++++++++++++++++++++++---- 1 file changed, 175 insertions(+), 15 deletions(-) diff --git a/docs/technical-design.md b/docs/technical-design.md index 6439f1a..835f955 100644 --- a/docs/technical-design.md +++ b/docs/technical-design.md @@ -1,6 +1,168 @@ # Technical Design & Architecture -This document outlines key architectural decisions and data flows within the Rewards Eligibility Oracle. +This document's purpose is to visually represent the Rewards Eligibility Oracle codebase, as a more approachable alternative to reading through the codebase directly. + +## End-to-End Oracle Flow + +The Rewards Eligibility Oracle operates as a daily scheduled service that evaluates indexer performance and updates on-chain rewards eligibility via function calls to the RewardsEligibilityOracle contract. The diagram below illustrates the complete execution flow from scheduler trigger through data processing to blockchain submission and error handling. + +The Oracle is designed to be resilient to transient network issues and RPC provider failures. It uses a multi-layered approach involving internal retries, provider rotation, and a circuit breaker to prevent costly infinite restart loops that needlessly burn through BigQuery requests. 
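The circuit breaker check can be sketched roughly as follows. This is a minimal illustration and not the oracle's actual code: the function name `circuit_is_open` and the constants are hypothetical, and the threshold (more than 3 failures in the last 60 minutes) is taken from the decision node in the diagram below.

```python
from datetime import datetime, timedelta, timezone

# Assumed values, mirroring the diagram's "more than 3 failures
# in the last 60 minutes" decision node.
FAILURE_THRESHOLD = 3
WINDOW = timedelta(minutes=60)


def circuit_is_open(failure_times, now, threshold=FAILURE_THRESHOLD, window=WINDOW):
    """Return True when more than `threshold` failures fall inside the trailing window."""
    cutoff = now - window
    recent = [t for t in failure_times if t >= cutoff]
    return len(recent) > threshold


if __name__ == "__main__":
    now = datetime(2025, 11, 6, 12, 0, tzinfo=timezone.utc)
    # Hypothetical failure log: four failures inside the window, one outside it.
    failures = [now - timedelta(minutes=m) for m in (5, 15, 30, 45, 90)]
    if circuit_is_open(failures, now):
        print("circuit open: notify the team and exit with code 0 (no restart)")
    else:
        print("circuit closed: continue with the run")
```

In this sketch an open circuit maps to the `sys.exit(0)` path in the diagram (the container stops and waits for manual intervention), while an ordinary failure would exit with code 1 so that the container restarts and retries.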
+ +```mermaid +--- +title: Rewards Eligibility Oracle - End-to-End Flow +--- +%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#fee2e2', 'primaryTextColor':'#7f1d1d', 'primaryBorderColor':'#ef4444', 'lineColor':'#6b7280'}}}%% + +graph TB + %% Docker Container - Contains all oracle logic + subgraph DOCKER["Docker Container"] + Scheduler["Python Scheduler"] + Oracle["Rewards Eligibility Oracle"] + + subgraph CIRCUIT_BREAKER["Circuit Breaker Logic"] + CB["Circuit Breaker"] + CBCheck{"Has there been more
than 3 failures in the
last 60 minutes?"} + end + + Scheduler -.->|"Phase 1: Schedule daily run"| Oracle + + %% Data Pipeline + subgraph PIPELINE["Data Pipeline"] + CacheCheck{"Do we have recent cached
BigQuery results available?
(< 30 min old)"} + + subgraph BIGQUERY["BigQuery Analysis"] + FetchData["Fetch Indexer Performance Data
over last 28 days
(from BigQuery)"] + SQLQuery["- Daily query metrics
- Days online calculation
- Subgraph coverage"] + end + + subgraph PROCESSING["Eligibility Processing"] + ApplyCriteria["Apply Criteria e.g.
5+ days online
Latency < 5000ms
Blocks behind < 50000
1+ subgraph served"] + FilterData["Filter Eligible
vs Ineligible"] + GenArtifacts["Generate CSV Artifacts:
- eligible_indexers.csv
- ineligible_indexers.csv
- full_metrics.csv"] + end + end + + %% Blockchain Layer + subgraph BLOCKCHAIN["Blockchain Submission"] + Batch["Read the eligible
indexers from CSV.
Batch indexer addresses
into groups of 125 indexers."] + + subgraph RPC["RPC Failover System"] + TryRPC["Try to establish connection
with RPC provider"] + RPCError["Rotate to next RPC provider"] + end + + BuildTx["Build Transaction:
- Estimate gas
- Get nonce
- Sign with key"] + SubmitTx["Submit Batch to Contract
call function:
renewIndexerEligibility()"] + WaitReceipt["Wait for Receipt
30s timeout"] + MoreBatches{"More
Batches?"} + end + + %% Monitoring + subgraph MONITOR["Monitoring & Notifications"] + SlackSuccess["Slack Success:
- Eligible count
- Execution time
- Transaction links"] + SlackFailCircuitBreaker["Stop container sys.exit(0)
Container will not restart
Manual Intervention needed
Send notification to team
slack channel for debugging"] + SlackFailRPC["Stop container sys.exit(1)
Container will restart
Send notification to slack"] + SlackRotate["Send slack notification"] + end + end + + %% External Systems - Define after Docker subgraph + RPCProviders["Pool of 4 RPC providers
(External Infrastructure)"] + BQ["Google BigQuery
Indexer Performance Data"] + + subgraph FailureLogStorage["Data Storage
(mounted volume)"] + CBLog["Failure log"] + end + + subgraph HistoricalDataStorage["Data Storage
(mounted volume)"] + HistoricalData["Historical archive of
eligible and ineligible
indexers by date
YYYY-MM-DD"] + end + + END_NO_RESTART["FAILURE
Container Stopped
No Restart
Manual Intervention Required"] + END_WITH_RESTART["FAILURE
Container Stopped
Restart Container
Will retry entire loop again"] + SUCCESS["SUCCESS
Wait for next
scheduled trigger"] + + %% Main Flow - Start with Docker container to anchor it left + Oracle -->|"Phase 1.1: Check if oracle
should run"| CB + CB -->|"Phase 1.2: Read log"| CBLog + CBLog -->|"Phase 1.3: Return log"| CB + CB -->|"Phase 1.4: Provides failure
timestamps (if they exist)"| CBCheck + CBCheck -->|"Phase 2:
(Regular Path)
No"| CacheCheck + CacheCheck -->|"Phase 2.1: Check for
recent cached data"| HistoricalData + HistoricalData -->|"Phase 2.2: Return recent eligible indexers
from eligible_indexers.csv
(if they exist)"| CacheCheck + CBCheck -.->|"Phase 2:
(Alternative Path)
Yes"| SlackFailCircuitBreaker + SlackFailCircuitBreaker -.-> END_NO_RESTART + + CacheCheck -->|"Phase 3:
(Alternative Path)
Yes"| Batch + CacheCheck -->|"Phase 3:
(Regular Path)
No"| FetchData + + FetchData -->|"Phase 3.1: Query data
from BigQuery"| BQ + BQ -->|"Phase 3.2: Returns metrics"| SQLQuery + SQLQuery -->|"Phase 3.3: Process results"| ApplyCriteria + ApplyCriteria --> FilterData + FilterData -->|"Phase 3.4: Generate CSVs"| GenArtifacts + GenArtifacts -->|"Phase 3.5: Save data"| HistoricalData + GenArtifacts --> Batch + + Batch -->|"Phase 4.1: For each batch"| TryRPC + TryRPC -->|"Phase 4.2: Connect"| RPCProviders + RPCProviders -->|"Phase 4.3:
(Regular Path)
RPC connection established"| BuildTx + RPCProviders -.->|"Phase 4.3:
(Alternative Path)
RPC connection failed
Multiple connection attempts
Not possible to connect"| RPCError + RPCError -.->|"Notify"| SlackRotate + RPCError -->|"All exhausted"| SlackFailRPC + SlackFailRPC --> END_WITH_RESTART + RPCError -->|"Connection successful"| BuildTx + + BuildTx --> SubmitTx + SubmitTx --> WaitReceipt + + WaitReceipt -->|"Phase 4.4: Batch confirmed"| MoreBatches + + MoreBatches -->|"Yes
Back to phase 4 loop
Process next batch"| Batch + MoreBatches -->|"Phase 5: No
All complete"| SlackSuccess + SlackSuccess --> SUCCESS + + %% Styling + classDef schedulerStyle fill:#fee2e2,stroke:#ef4444,stroke-width:3px,color:#7f1d1d + classDef oracleStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e + classDef dataStyle fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d + classDef processingStyle fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81 + classDef blockchainStyle fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d + classDef monitorStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e + classDef infraStyle fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151 + classDef contractStyle fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a + classDef decisionStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e + classDef endStyle fill:#7f1d1d,stroke:#991b1b,stroke-width:3px,color:#fee2e2 + classDef endStyleOrange fill:#ea580c,stroke:#c2410c,stroke-width:3px,color:#ffedd5 + classDef successStyle fill:#14532d,stroke:#166534,stroke-width:3px,color:#f0fdf4 + + class Scheduler schedulerStyle + class Oracle,CB oracleStyle + class FetchData,SQLQuery,BQ dataStyle + class ApplyCriteria,FilterData,GenArtifacts processingStyle + class Batch,TryRPC,BuildTx,SubmitTx,WaitReceipt,Rotate,RPCError blockchainStyle + class SlackSuccess,SlackFailCircuitBreaker,SlackFailRPC,SlackRotate monitorStyle + class RPCProviders,HistoricalData,CBLog infraStyle + class Contract contractStyle + class CacheCheck,MoreBatches,CBCheck decisionStyle + class END_NO_RESTART endStyle + class END_WITH_RESTART endStyleOrange + class SUCCESS successStyle + + style DOCKER fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a + style CIRCUIT_BREAKER fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e + style PIPELINE fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d + style BIGQUERY fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d + style PROCESSING 
fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81 + style BLOCKCHAIN fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d + style RPC fill:#fecaca,stroke:#ef4444,stroke-width:2px,color:#7f1d1d + style MONITOR fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e + style FailureLogStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151 + style HistoricalDataStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151 +``` + +--- ## RPC Provider Failover and Circuit Breaker Logic @@ -21,24 +183,22 @@ sequenceDiagram # Describe failure loop inside the blockchain_client module activate blockchain_client - alt RPC Loop (for each provider) + loop For each provider in pool - # Attempt RPC call - blockchain_client->>blockchain_client: _execute_rpc_call() with provider A - note right of blockchain_client: Fails after 5 retries + # Attempt RPC call + blockchain_client->>blockchain_client: _execute_rpc_call() with next provider + note right of blockchain_client: Fails after 3 attempts - # Log failure + # Log failure and rotate blockchain_client-->>blockchain_client: raises ConnectionError - note right of blockchain_client: Catches error, logs rotation + note right of blockchain_client: Catches error, rotates to next provider - # Retry RPC call - blockchain_client->>blockchain_client: _execute_rpc_call() with provider B - note right of blockchain_client: Fails after 5 retries + # Send rotation notification + blockchain_client->>slack_notifier: send_info_notification() + note right of slack_notifier: RPC provider rotation alert - # Log final failure - blockchain_client-->>blockchain_client: raises ConnectionError - note right of blockchain_client: All providers tried and failed end + note right of blockchain_client: All providers exhausted # Raise error back to main_oracle oracle and exit blockchain_client module blockchain_client-->>main_oracle: raises Final ConnectionError @@ -51,6 +211,6 @@ sequenceDiagram main_oracle->>slack_notifier: 
send_failure_notification() # Document restart process - note right of main_oracle: sys.exit(1) - note right of main_oracle: Docker will restart. CircuitBreaker can halt via sys.exit(0) + note right of main_oracle: sys.exit(1) triggers Docker restart + note right of main_oracle: Circuit breaker uses sys.exit(0) to prevent restart ```
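
The rotation loop in the sequence diagram above could look something like the following sketch. It is an illustration under stated assumptions, not the `blockchain_client` implementation: `call_with_failover`, `rpc_call`, and `on_rotate` are hypothetical names, the three attempts per provider come from the diagram's note, and any backoff between attempts is omitted.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    providers: Sequence[str],
    rpc_call: Callable[[str], T],
    attempts_per_provider: int = 3,
    on_rotate: Callable[[str], None] = lambda url: None,
) -> T:
    """Try each provider in order; raise ConnectionError once all are exhausted."""
    for url in providers:
        for _ in range(attempts_per_provider):
            try:
                return rpc_call(url)
            except ConnectionError:
                continue  # retry the same provider until its attempts run out
        # This provider used up its attempts: report it, then move to the next one.
        on_rotate(url)
    raise ConnectionError("All RPC providers exhausted")
```

A caller in the position of `main_oracle` would catch the final `ConnectionError`, send the failure notification, and exit with code 1 so that Docker restarts the container. Passing a notifier as `on_rotate` keeps the rotation alert separate from the retry logic.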