Feat/monitoring dashboard #454
Merged
Xhristin3 merged 2 commits into rinafcode:main on Apr 28, 2026
- Add SustainabilityMetrics type with KPIs: invocations, storage writes,
  events emitted, rewards distributed, content minted, active users,
  and efficiency score
- Add SUSTAINABILITY_METRICS storage key
- Add SustainabilityMetricsUpdatedEvent
- Add SustainabilityManager with record, query, and health score logic
- Expose 4 public contract entry points in lib.rs
- Include unit tests for core metric tracking and health scoring
- Add 8 sustainability Prometheus gauges to MetricsService
  (invocations, storage writes, events emitted, rewards distributed,
  content minted, active users, efficiency score, health score)
- Add updateSustainabilityMetrics() to push gauges on each query
- Add getSustainabilitySnapshot() to DashboardService: computes
  real-time KPIs (efficiency, health, dispute rate, reward claim rate)
  and pushes them to Prometheus
- Add GET /analytics/sustainability endpoint in ReportingController
- Add teachlink-sustainability alert group to prometheus/alerts.yml
  with 5 rules: low efficiency, critical efficiency, low health score,
  high error rate, no new transactions
- Create teachlink-monitoring-dashboard.json (691 lines, 20 panels):
  - Row 1: Real-Time Platform Health (6 stat panels)
  - Row 2: Historical Trends (4 time-series panels)
  - Row 3: Alert Management (firing alerts, active count, critical count)
  - Row 4: Platform Insights (cache ratio, latency percentiles,
    dependency health, indexer progress, HTTP status breakdown)
@Mrchinedum Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits. You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀
fixed #380
feat: implement comprehensive monitoring dashboard
No comprehensive monitoring dashboard existed. The platform had
Prometheus + Grafana infrastructure but no sustainability-aware
panels, no platform KPI gauges, and no alert rules for contract-level
health. This change closes that gap end-to-end.
Changes
indexer/src/performance/metrics.service.ts
teachlink_contract_sustainability_invocations_total
teachlink_contract_sustainability_storage_writes_total
teachlink_contract_sustainability_events_emitted_total
teachlink_contract_sustainability_rewards_distributed_total
teachlink_contract_sustainability_content_minted_total
teachlink_contract_sustainability_active_users_total
teachlink_contract_sustainability_efficiency_score
teachlink_contract_sustainability_health_score
updateSustainabilityMetrics() sets all eight gauges above in a single
call, keeping Prometheus state in sync with each query
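A minimal sketch of how the eight gauges could be updated together. The gauge names come from the list above. Everything else is an assumption made for illustration: the `SustainabilitySnapshot` field names, the `SustainabilityGauges` class, and the in-memory `InMemoryGauge`, which stands in for a real Prometheus gauge (prom-client's `Gauge` also exposes `set(value)`). The actual MetricsService may be structured differently.

```typescript
// Stand-in for a Prometheus gauge; only set() is needed for this sketch.
interface GaugeLike {
  set(value: number): void;
}

class InMemoryGauge implements GaugeLike {
  value = 0;
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
  set(value: number): void {
    this.value = value;
  }
}

// Hypothetical snapshot shape: one field per gauge.
interface SustainabilitySnapshot {
  invocations: number;
  storageWrites: number;
  eventsEmitted: number;
  rewardsDistributed: number;
  contentMinted: number;
  activeUsers: number;
  efficiencyScore: number;
  healthScore: number;
}

type SnapshotKey = keyof SustainabilitySnapshot;

const PREFIX = 'teachlink_contract_sustainability_';

// Gauge names as listed in the PR description.
const GAUGE_NAMES: Record<SnapshotKey, string> = {
  invocations: PREFIX + 'invocations_total',
  storageWrites: PREFIX + 'storage_writes_total',
  eventsEmitted: PREFIX + 'events_emitted_total',
  rewardsDistributed: PREFIX + 'rewards_distributed_total',
  contentMinted: PREFIX + 'content_minted_total',
  activeUsers: PREFIX + 'active_users_total',
  efficiencyScore: PREFIX + 'efficiency_score',
  healthScore: PREFIX + 'health_score',
};

class SustainabilityGauges {
  private readonly gauges = new Map<SnapshotKey, InMemoryGauge>();

  constructor() {
    for (const key of Object.keys(GAUGE_NAMES) as SnapshotKey[]) {
      this.gauges.set(key, new InMemoryGauge(GAUGE_NAMES[key]));
    }
  }

  // Push every field of the snapshot to its gauge in one call.
  updateSustainabilityMetrics(snapshot: SustainabilitySnapshot): void {
    for (const [key, gauge] of this.gauges) {
      gauge.set(snapshot[key]);
    }
  }

  read(key: SnapshotKey): number {
    return this.gauges.get(key)!.value;
  }
}
```

Driving the update from a single snapshot object means the gauges cannot drift apart: each call either refreshes all eight values or none.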
indexer/src/reporting/dashboard.service.ts
getSustainabilitySnapshot() computes these KPIs from the
existing getCurrentAnalytics() result:
efficiencyScore — bridge success rate as basis-point proxy
healthScore — weighted composite (50% efficiency,
25% low dispute rate, 25% reward activity)
escrowDisputeRateBps — disputes / total escrows * 10 000
rewardClaimRate — claimed / total rewards * 10 000
Calls metricsService.updateSustainabilityMetrics() on every
invocation so Prometheus gauges stay current without a separate
scrape job
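The KPI arithmetic described above can be sketched as follows. The weights (50% / 25% / 25%) and the basis-point formulas are taken from the description. The input field names, the way "low dispute rate" and "reward activity" are mapped onto the 0 to 10 000 scale, the rounding, and the handling of zero denominators are all assumptions made for illustration, not the PR's actual implementation.

```typescript
// Hypothetical subset of the analytics result needed for the KPIs.
interface AnalyticsCounts {
  bridgeSuccesses: number;
  bridgeTotal: number;
  escrowDisputes: number;
  escrowTotal: number;
  rewardsClaimed: number;
  rewardsTotal: number;
}

interface SustainabilityKpis {
  efficiencyScore: number;      // 0..10000 bps
  healthScore: number;          // 0..100
  escrowDisputeRateBps: number; // 0..10000 bps
  rewardClaimRate: number;      // 0..10000 bps
}

// Ratio in basis points; a zero denominator yields 0 (assumption).
function toBps(numerator: number, denominator: number): number {
  if (denominator === 0) return 0;
  return Math.round((numerator * 10000) / denominator);
}

function computeKpis(a: AnalyticsCounts): SustainabilityKpis {
  const efficiencyScore = toBps(a.bridgeSuccesses, a.bridgeTotal);
  const escrowDisputeRateBps = toBps(a.escrowDisputes, a.escrowTotal);
  const rewardClaimRate = toBps(a.rewardsClaimed, a.rewardsTotal);

  // Weighted composite on the bps scale, then scaled down to 0..100.
  // "Low dispute rate" is modelled as (10000 - dispute bps) and
  // "reward activity" as the claim rate; both mappings are assumptions.
  const compositeBps =
    0.5 * efficiencyScore +
    0.25 * (10000 - escrowDisputeRateBps) +
    0.25 * rewardClaimRate;
  const healthScore = Math.round(compositeBps / 100);

  return { efficiencyScore, healthScore, escrowDisputeRateBps, rewardClaimRate };
}
```

For example, 90 of 100 successful bridge operations, 5 disputes in 100 escrows, and 60 of 100 rewards claimed give an efficiency of 9000 bps, a dispute rate of 500 bps, a claim rate of 6000 bps, and a health score of 84 under these assumptions.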
indexer/src/reporting/reporting.controller.ts
GET /analytics/sustainability calls getSustainabilitySnapshot()
and returns the KPI object
indexer/observability/prometheus/alerts.yml
Rule                              Condition               For  Severity
TeachLinkLowEfficiencyScore       efficiency < 7000 bps   15m  warning
TeachLinkCriticalEfficiencyScore  efficiency < 5000 bps   5m   critical
TeachLinkLowHealthScore           health < 60/100         15m  warning
TeachLinkHighEscrowDisputeRate    API 5xx rate > 10%      10m  warning
TeachLinkNoNewTransactions        invocations flat 30m    30m  warning
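To make the score thresholds concrete, here is a small sketch that mirrors the efficiency and health conditions from the rules above in application code. It is illustrative only: the function names and the `Severity` type are hypothetical, the hold durations (15m, 5m) that Prometheus applies before firing are not modelled, and the real rules live in alerts.yml rather than in TypeScript.

```typescript
type Severity = 'ok' | 'warning' | 'critical';

// Efficiency is expressed in basis points (0..10000).
// Below 5000 both rule conditions hold; this sketch reports the
// more severe one.
function classifyEfficiency(efficiencyBps: number): Severity {
  if (efficiencyBps < 5000) return 'critical';
  if (efficiencyBps < 7000) return 'warning';
  return 'ok';
}

// Health is a 0..100 score; only a warning threshold is defined.
function classifyHealth(healthScore: number): Severity {
  return healthScore < 60 ? 'warning' : 'ok';
}
```

A helper like this could be useful in unit tests or in an API response that wants to show the same status bands as the alert rules, as long as the thresholds are kept in sync with alerts.yml.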
indexer/observability/grafana/dashboards/teachlink-monitoring-dashboard.json (new)
691-line Grafana dashboard (uid: teachlink-monitoring-dashboard)
Auto-refresh every 30s, default window now-6h
4 collapsible row sections, 20 panels total:
Row 1: Real-Time Platform Health (6 stat panels)
  Sustainability Health Score (0-100, color thresholds 60/80)
  Contract Efficiency Score (0-10000 bps, thresholds 7000/9000)
  Indexer Availability (up + probe_success, 0-2)
  Ledger Lag (seconds, thresholds 300/900)
  HTTP Error Rate (%, thresholds 5%/10%)
  API Avg Latency (seconds, thresholds 0.5/1.0)
Row 2: Historical Trends (4 time-series panels)
  Sustainability Scores Over Time (health score, efficiency %)
  Contract Resource Usage Over Time (invocations, writes, events)
  HTTP Throughput (req/s by route + status)
  Platform Growth Over Time (content minted, active users,
    rewards distributed in XLM)
Row 3: Alert Management (3 panels)
  Firing Alerts time-series (ALERTS{alertstate="firing"})
  Active Alert Count stat (color thresholds 1/3)
  Critical Alerts stat (color thresholds 1/2)
Row 4: Platform Insights (5 panels)
  Dashboard Cache Hit Ratio (percentunit time-series)
  API Latency Percentiles (avg, p95, p99 time-series)
  Dependency Health (database, horizon, indexer_state)
  Indexer Progress & Errors (events processed, errors, ledger)
  HTTP Status Code Breakdown (2xx / 4xx / 5xx by route)
Acceptance criteria met
✓ Real-time metrics — 6 live stat panels + Prometheus gauges updated
on every /analytics/sustainability call
✓ Historical trends — 4 time-series panels covering scores, resource
usage, throughput, and platform growth
✓ Alert management — 5 new Prometheus alert rules + 3 alert panels
in the dashboard
✓ Platform insights — cache efficiency, latency percentiles,
dependency health, indexer progress, HTTP
status breakdown