190 changes: 175 additions & 15 deletions docs/technical-design.md
@@ -1,6 +1,168 @@
# Technical Design & Architecture

This document outlines key architectural decisions and data flows within the Rewards Eligibility Oracle.
This document visually represents the Rewards Eligibility Oracle codebase as a more approachable alternative to reading the code directly.

## End-to-End Oracle Flow

The Rewards Eligibility Oracle operates as a daily scheduled service that evaluates indexer performance and updates on-chain rewards eligibility via function calls to the RewardsEligibilityOracle contract. The diagram below illustrates the complete execution flow from scheduler trigger through data processing to blockchain submission and error handling.

The Oracle is designed to be resilient to transient network issues and RPC provider failures. It uses a multi-layered approach involving internal retries, provider rotation, and a circuit breaker to prevent costly infinite restart loops that needlessly burn through BigQuery requests.
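
The circuit breaker decision can be illustrated with a minimal sketch. It assumes the thresholds shown in the flow diagram in this section (more than 3 failures in the last 60 minutes) and that the failure log yields timezone-aware timestamps. The names `should_halt`, `MAX_FAILURES`, and `WINDOW` are hypothetical and not taken from the codebase.

```python
from datetime import datetime, timedelta, timezone

# Assumed values, read off the flow diagram: more than 3 failures in 60 minutes.
MAX_FAILURES = 3
WINDOW = timedelta(minutes=60)


def should_halt(failure_times, now=None, max_failures=MAX_FAILURES, window=WINDOW):
    """Return True when more than `max_failures` failures fall inside `window`.

    `failure_times` is an iterable of timezone-aware datetimes, for example the
    timestamps read from the failure log on the mounted volume.
    """
    now = now or datetime.now(timezone.utc)
    recent = [t for t in failure_times if now - t <= window]
    return len(recent) > max_failures
```

A caller could check `should_halt(...)` before starting a run and, when it returns `True`, notify the team and exit with code 0 so that the container is not restarted.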

```mermaid
---
title: Rewards Eligibility Oracle - End-to-End Flow
---
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#fee2e2', 'primaryTextColor':'#7f1d1d', 'primaryBorderColor':'#ef4444', 'lineColor':'#6b7280'}}}%%

graph TB
%% Docker Container - Contains all oracle logic
subgraph DOCKER["Docker Container"]
Scheduler["Python Scheduler"]
Oracle["Rewards Eligibility Oracle"]

subgraph CIRCUIT_BREAKER["Circuit Breaker Logic"]
CB["Circuit Breaker"]
CBCheck{"Has there been more<br/>than 3 failures in the <br/>last 60 minutes?"}
end

Scheduler -.->|"Phase 1: Schedule daily run"| Oracle

%% Data Pipeline
subgraph PIPELINE["Data Pipeline"]
CacheCheck{"Do we have recent cached<br/>BigQuery results available?<br/>(< 30 min old)"}

subgraph BIGQUERY["BigQuery Analysis"]
FetchData["Fetch Indexer Performance Data<br/>over last 28 days<br/>(from BigQuery)"]
SQLQuery["- Daily query metrics<br/>- Days online calculation<br/>- Subgraph coverage"]
end

subgraph PROCESSING["Eligibility Processing"]
ApplyCriteria["Apply Criteria e.g.<br/>5+ days online<br/>Latency < 5000ms<br/>Blocks behind < 50000<br/>1+ subgraph served"]
FilterData["Filter Eligible<br/>vs Ineligible"]
GenArtifacts["Generate CSV Artifacts:<br/>- eligible_indexers.csv<br/>- ineligible_indexers.csv<br/>- full_metrics.csv"]
end
end

%% Blockchain Layer
subgraph BLOCKCHAIN["Blockchain Submission"]
Batch["Consume series of Eligible<br/>Indexers from CSV.<br/>Batch indexer addresses<br/>into groups of 125 indexers."]

subgraph RPC["RPC Failover System"]
TryRPC["Try to establish connection<br/>with RPC provider"]
RPCError["Rotate to next RPC provider"]
end

BuildTx["Build Transaction:<br/>- Estimate gas<br/>- Get nonce<br/>- Sign with key"]
SubmitTx["Submit Batch to Contract<br/>call function:<br/>renewIndexerEligibility()"]
WaitReceipt["Wait for Receipt<br/>30s timeout"]
MoreBatches{"More<br/>Batches?"}
end

%% Monitoring
subgraph MONITOR["Monitoring & Notifications"]
SlackSuccess["Slack Success:<br/>- Eligible count<br/>- Execution time<br/>- Transaction links"]
SlackFailCircuitBreaker["Stop container sys.exit(0)<br/>Container will not restart<br/>Manual Intervention needed<br/>Send notification to team<br/>slack channel for debugging"]
SlackFailRPC["Stop container sys.exit(1)<br/>Container will restart<br/>Send notification to slack"]
SlackRotate["Send slack notification"]
end
end

%% External Systems - Define after Docker subgraph
RPCProviders["Pool of 4 RPC providers<br/>(External Infrastructure)"]
BQ["Google BigQuery<br/>Indexer Performance Data"]

subgraph FailureLogStorage["Data Storage<br/>(mounted volume)"]
CBLog["Failure log"]
end

subgraph HistoricalDataStorage["Data Storage<br/>(mounted volume)"]
HistoricalData["Historical archive of<br/>eligible and ineligible<br/>indexers by date<br/>YYYY-MM-DD"]
end

END_NO_RESTART["FAILURE<br/>Container Stopped<br/>No Restart<br/>Manual Intervention Required"]
END_WITH_RESTART["FAILURE<br/>Container Stopped<br/>Restart Container<br/>Will retry entire loop again"]
SUCCESS["SUCCESS<br/>Wait for next<br/>scheduled trigger"]

%% Main Flow - Start with Docker container to anchor it left
Oracle -->|"Phase 1.1: Check if oracle<br/>should run"| CB
CB -->|"Phase 1.2: Read log"| CBLog
CBLog -->|"Phase 1.3: Return log"| CB
CB -->|"Phase 1.4: Provides failure<br/>timestamps (if they exist)"| CBCheck
CBCheck -->|"Phase 2:<br/>(Regular Path)<br/>No"| CacheCheck
CacheCheck -->|"Phase 2.1: Check for<br/>recent cached data"| HistoricalData
HistoricalData -->|"Phase 2.2: Return recent eligible indexers<br/>from eligible_indexers.csv<br/>(if they exist)"| CacheCheck
CBCheck -.->|"Phase 2:<br/>(Alternative Path)<br/>Yes"| SlackFailCircuitBreaker
SlackFailCircuitBreaker -.-> END_NO_RESTART

CacheCheck -->|"Phase 3:<br/>(Alternative Path)<br/>Yes"| Batch
CacheCheck -->|"Phase 3:<br/>(Regular Path)<br/>No"| FetchData

FetchData -->|"Phase 3.1: Query data<br/>from BigQuery"| BQ
BQ -->|"Phase 3.2: Returns metrics"| SQLQuery
SQLQuery -->|"Phase 3.3: Process results"| ApplyCriteria
ApplyCriteria --> FilterData
FilterData -->|"Phase 3.4: Generate CSVs"| GenArtifacts
GenArtifacts -->|"Phase 3.5: Save data"| HistoricalData
GenArtifacts --> Batch

Batch -->|"Phase 4.1: For each batch"| TryRPC
TryRPC -->|"Phase 4.2: Connect"| RPCProviders
RPCProviders -->|"Phase 4.3:<br/>(Regular Path)<br/>RPC connection established"| BuildTx
RPCProviders -.->|"Phase 4.3:<br/>(Alternative Path)<br/>RPC connection failed<br/>Multiple connection attempts<br/>Not possible to connect"| RPCError
RPCError -.->|"Notify"| SlackRotate
RPCError -->|"All exhausted"| SlackFailRPC
SlackFailRPC --> END_WITH_RESTART
RPCError -->|"Connection successful"| BuildTx

BuildTx --> SubmitTx
SubmitTx --> WaitReceipt

WaitReceipt -->|"Phase 4.4: Batch confirmed"| MoreBatches

MoreBatches -->|"Yes<br/>Back to phase 4 loop<br/>Process next batch"| Batch
MoreBatches -->|"Phase 5: No<br/>All complete"| SlackSuccess
SlackSuccess --> SUCCESS

%% Styling
classDef schedulerStyle fill:#fee2e2,stroke:#ef4444,stroke-width:3px,color:#7f1d1d
classDef oracleStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
classDef dataStyle fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
classDef processingStyle fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
classDef blockchainStyle fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
classDef monitorStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
classDef infraStyle fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
classDef contractStyle fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a
classDef decisionStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
classDef endStyle fill:#7f1d1d,stroke:#991b1b,stroke-width:3px,color:#fee2e2
classDef endStyleOrange fill:#ea580c,stroke:#c2410c,stroke-width:3px,color:#ffedd5
classDef successStyle fill:#14532d,stroke:#166534,stroke-width:3px,color:#f0fdf4

class Scheduler schedulerStyle
class Oracle,CB oracleStyle
class FetchData,SQLQuery,BQ dataStyle
class ApplyCriteria,FilterData,GenArtifacts processingStyle
class Batch,TryRPC,BuildTx,SubmitTx,WaitReceipt,RPCError blockchainStyle
class SlackSuccess,SlackFailCircuitBreaker,SlackFailRPC,SlackRotate monitorStyle
class RPCProviders,HistoricalData,CBLog infraStyle
class Contract contractStyle
class CacheCheck,MoreBatches,CBCheck decisionStyle
class END_NO_RESTART endStyle
class END_WITH_RESTART endStyleOrange
class SUCCESS successStyle

style DOCKER fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a
style CIRCUIT_BREAKER fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
style PIPELINE fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
style BIGQUERY fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d
style PROCESSING fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
style BLOCKCHAIN fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
style RPC fill:#fecaca,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
style MONITOR fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
style FailureLogStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
style HistoricalDataStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
```
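
The eligibility filter and batching steps in the diagram above can be sketched as follows. This is a simplified reading, not the oracle's implementation: it assumes each threshold is applied to an already-aggregated record per indexer, and it uses the example criteria and the batch size of 125 from the diagram. The `IndexerMetrics` fields and the function names are hypothetical.

```python
from dataclasses import dataclass

BATCH_SIZE = 125  # batch size shown in the diagram


@dataclass
class IndexerMetrics:
    # Hypothetical record shape; the real column names are not given here.
    address: str
    days_online: int
    avg_latency_ms: float
    avg_blocks_behind: int
    subgraphs_served: int


def is_eligible(m: IndexerMetrics) -> bool:
    """Apply the example thresholds from the diagram to one aggregated record."""
    return (
        m.days_online >= 5
        and m.avg_latency_ms < 5000
        and m.avg_blocks_behind < 50000
        and m.subgraphs_served >= 1
    )


def batch_addresses(addresses, size=BATCH_SIZE):
    """Split addresses into consecutive batches of at most `size` items."""
    return [addresses[i:i + size] for i in range(0, len(addresses), size)]
```

With 300 eligible addresses, `batch_addresses` yields three batches, each of which would be submitted in its own `renewIndexerEligibility()` transaction.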

---

## RPC Provider Failover and Circuit Breaker Logic

@@ -21,24 +183,22 @@ sequenceDiagram

# Describe failure loop inside the blockchain_client module
activate blockchain_client
alt RPC Loop (for each provider)
loop For each provider in pool

# Attempt RPC call
blockchain_client->>blockchain_client: _execute_rpc_call() with provider A
note right of blockchain_client: Fails after 5 retries
# Attempt RPC call
blockchain_client->>blockchain_client: _execute_rpc_call() with next provider
note right of blockchain_client: Fails after 3 attempts

# Log failure
# Log failure and rotate
blockchain_client-->>blockchain_client: raises ConnectionError
note right of blockchain_client: Catches error, logs rotation
note right of blockchain_client: Catches error, rotates to next provider

# Retry RPC call
blockchain_client->>blockchain_client: _execute_rpc_call() with provider B
note right of blockchain_client: Fails after 5 retries
# Send rotation notification
blockchain_client->>slack_notifier: send_info_notification()
note right of slack_notifier: RPC provider rotation alert

# Log final failure
blockchain_client-->>blockchain_client: raises ConnectionError
note right of blockchain_client: All providers tried and failed
end
note right of blockchain_client: All providers exhausted

# Raise error back to main_oracle and exit blockchain_client module
blockchain_client-->>main_oracle: raises Final ConnectionError
@@ -51,6 +211,6 @@ sequenceDiagram
main_oracle->>slack_notifier: send_failure_notification()

# Document restart process
note right of main_oracle: sys.exit(1)
note right of main_oracle: Docker will restart. CircuitBreaker can halt via sys.exit(0)
note right of main_oracle: sys.exit(1) triggers Docker restart
note right of main_oracle: Circuit breaker uses sys.exit(0) to prevent restart
```
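
The rotation loop in the sequence diagram above can be sketched in Python. This is an illustration under stated assumptions, not the `blockchain_client` code: it assumes 3 attempts per provider (as in the updated diagram), treats any `ConnectionError` as retryable, and omits backoff between attempts. `call_with_failover`, `rpc_call`, and `notify` are hypothetical names.

```python
MAX_ATTEMPTS = 3  # attempts per provider, as shown in the sequence diagram


def call_with_failover(providers, rpc_call, notify=print, max_attempts=MAX_ATTEMPTS):
    """Try `rpc_call(provider)` on each provider in turn; raise when all fail."""
    for index, provider in enumerate(providers):
        for _ in range(max_attempts):
            try:
                return rpc_call(provider)
            except ConnectionError:
                continue  # retry the same provider until attempts run out
        # Only announce a rotation when there is another provider to rotate to.
        if index < len(providers) - 1:
            notify(f"Rotating away from RPC provider {provider}")
    raise ConnectionError("All RPC providers exhausted")
```

A caller in the position of `main_oracle` would catch the final `ConnectionError`, send the failure notification, and call `sys.exit(1)` so that Docker restarts the container.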