Feat/monitoring dashboard #454

Merged
Xhristin3 merged 2 commits into rinafcode:main from Mrchinedum:feat/monitoring-dashboard
Apr 28, 2026

@Mrchinedum
Contributor

fixed #380

feat: implement comprehensive monitoring dashboard

No comprehensive monitoring dashboard existed. The platform had
Prometheus + Grafana infrastructure but no sustainability-aware
panels, no platform KPI gauges, and no alert rules for contract-level
health. This change closes that gap end-to-end.

Changes

indexer/src/performance/metrics.service.ts

  • Declare 8 new Prometheus Gauge fields for sustainability KPIs
  • Register all 8 gauges in the constructor with descriptive help text:
    teachlink_contract_sustainability_invocations_total
    teachlink_contract_sustainability_storage_writes_total
    teachlink_contract_sustainability_events_emitted_total
    teachlink_contract_sustainability_rewards_distributed_total
    teachlink_contract_sustainability_content_minted_total
    teachlink_contract_sustainability_active_users_total
    teachlink_contract_sustainability_efficiency_score
    teachlink_contract_sustainability_health_score
  • Add updateSustainabilityMetrics(m) method to set all 8 gauges
    in a single call, keeping Prometheus state in sync with each query
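
The register-then-update pattern described above could look roughly like the sketch below. It is not the code from metrics.service.ts: `GaugeLike` and `InMemoryGauge` are stand-ins for a Prometheus client gauge (only a `set(value)` method is assumed), the field names of `SustainabilityMetrics` and the help text are hypothetical, and only the eight metric names are taken from this PR.

```typescript
// Stand-in for the one gauge method this sketch needs (assumption, not the real client).
interface GaugeLike {
  set(value: number): void;
}

class InMemoryGauge implements GaugeLike {
  value = 0;
  constructor(readonly name: string, readonly help: string) {}
  set(value: number): void {
    this.value = value;
  }
}

// Hypothetical field names; the PR only lists the exported metric names.
interface SustainabilityMetrics {
  invocations: number;
  storageWrites: number;
  eventsEmitted: number;
  rewardsDistributed: number;
  contentMinted: number;
  activeUsers: number;
  efficiencyScore: number;
  healthScore: number;
}

type MetricKey = keyof SustainabilityMetrics;

const PREFIX = "teachlink_contract_sustainability_";

// Metric names as listed in the PR description.
const GAUGE_NAMES: Record<MetricKey, string> = {
  invocations: PREFIX + "invocations_total",
  storageWrites: PREFIX + "storage_writes_total",
  eventsEmitted: PREFIX + "events_emitted_total",
  rewardsDistributed: PREFIX + "rewards_distributed_total",
  contentMinted: PREFIX + "content_minted_total",
  activeUsers: PREFIX + "active_users_total",
  efficiencyScore: PREFIX + "efficiency_score",
  healthScore: PREFIX + "health_score",
};

const METRIC_KEYS = Object.keys(GAUGE_NAMES) as MetricKey[];

class MetricsServiceSketch {
  readonly gauges = {} as Record<MetricKey, InMemoryGauge>;

  constructor() {
    // Register one gauge per KPI, each with a help string (help text is illustrative).
    for (const key of METRIC_KEYS) {
      this.gauges[key] = new InMemoryGauge(GAUGE_NAMES[key], "Sustainability KPI: " + key);
    }
  }

  // Set all eight gauges from one metrics object in a single call.
  updateSustainabilityMetrics(m: SustainabilityMetrics): void {
    for (const key of METRIC_KEYS) {
      this.gauges[key].set(m[key]);
    }
  }
}
```

Driving both registration and updates from one name table means a ninth KPI would only need a new entry in the interface and in `GAUGE_NAMES`.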

indexer/src/reporting/dashboard.service.ts

  • Add getSustainabilitySnapshot(): computes real-time KPIs from the
    existing getCurrentAnalytics() result:
    efficiencyScore — bridge success rate as basis-point proxy
    healthScore — weighted composite (50% efficiency,
    25% low dispute rate, 25% reward activity)
    escrowDisputeRateBps — disputes / total escrows * 10 000
    rewardClaimRate — claimed / total rewards * 10 000
    Calls metricsService.updateSustainabilityMetrics() on every
    invocation so Prometheus gauges stay current without a separate
    scrape job
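
As a worked illustration of the KPI arithmetic above, here is a minimal sketch. The input shape `AnalyticsCounts`, the zero-denominator behaviour, and the way each component is normalised to 0-100 before weighting are assumptions; the PR only states the 50/25/25 weights and the basis-point formulas.

```typescript
// Hypothetical input; the real getCurrentAnalytics() result may be shaped differently.
interface AnalyticsCounts {
  bridgeSuccesses: number;
  bridgeAttempts: number;
  escrowDisputes: number;
  totalEscrows: number;
  rewardsClaimed: number;
  totalRewards: number;
}

interface SustainabilitySnapshot {
  efficiencyScore: number; // basis points, 0-10000
  healthScore: number; // 0-100
  escrowDisputeRateBps: number; // basis points
  rewardClaimRate: number; // basis points
}

const BPS = 10000;

// part / total expressed in basis points; returns 0 when total is 0 (assumption).
function toBps(part: number, total: number): number {
  if (total <= 0) return 0;
  return Math.round((part / total) * BPS);
}

function computeSnapshot(c: AnalyticsCounts): SustainabilitySnapshot {
  const efficiencyScore = toBps(c.bridgeSuccesses, c.bridgeAttempts);
  const escrowDisputeRateBps = toBps(c.escrowDisputes, c.totalEscrows);
  const rewardClaimRate = toBps(c.rewardsClaimed, c.totalRewards);

  // Normalise each component to a 0-100 scale before weighting (assumption).
  const efficiencyPct = efficiencyScore / 100;
  const lowDisputePct = 100 - escrowDisputeRateBps / 100;
  const rewardActivityPct = rewardClaimRate / 100;

  // Weights from the PR: 50% efficiency, 25% low dispute rate, 25% reward activity.
  const healthScore = 0.5 * efficiencyPct + 0.25 * lowDisputePct + 0.25 * rewardActivityPct;

  return { efficiencyScore, healthScore, escrowDisputeRateBps, rewardClaimRate };
}
```

For example, 9 of 10 bridge operations succeeding, 1 dispute in 20 escrows, and 50 of 100 rewards claimed give 9000 bps efficiency, a 500 bps dispute rate, a 5000 bps claim rate, and a health score of 0.5 * 90 + 0.25 * 95 + 0.25 * 50 = 81.25.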

indexer/src/reporting/reporting.controller.ts

  • Add GET /analytics/sustainability endpoint that delegates to
    getSustainabilitySnapshot() and returns the KPI object

indexer/observability/prometheus/alerts.yml

  • Add teachlink-sustainability alert group with 5 rules:
    TeachLinkLowEfficiencyScore efficiency < 7000 bps, 15m, warning
    TeachLinkCriticalEfficiencyScore efficiency < 5000 bps, 5m, critical
    TeachLinkLowHealthScore health < 60/100, 15m, warning
    TeachLinkHighEscrowDisputeRate API 5xx rate > 10%, 10m, warning
    TeachLinkNoNewTransactions invocations flat 30m, 30m, warning
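
To make the score thresholds concrete, the sketch below encodes the three score-based rules as data and reports which ones a given pair of scores would breach. It is only an illustration: the real rules live in alerts.yml and are evaluated by Prometheus, the `ScoreRule` shape and `breachedRules` helper are hypothetical, and the `for` durations are recorded but not enforced here.

```typescript
interface ScoreRule {
  alert: string;
  metric: "efficiencyScore" | "healthScore";
  below: number; // the rule is breached when the metric is under this value
  forDuration: string; // how long the condition must hold before firing (not simulated)
  severity: "warning" | "critical";
}

// Thresholds, durations, and severities as listed in the PR description.
const SCORE_RULES: ScoreRule[] = [
  { alert: "TeachLinkLowEfficiencyScore", metric: "efficiencyScore", below: 7000, forDuration: "15m", severity: "warning" },
  { alert: "TeachLinkCriticalEfficiencyScore", metric: "efficiencyScore", below: 5000, forDuration: "5m", severity: "critical" },
  { alert: "TeachLinkLowHealthScore", metric: "healthScore", below: 60, forDuration: "15m", severity: "warning" },
];

// Names of the rules whose threshold condition currently holds.
function breachedRules(scores: { efficiencyScore: number; healthScore: number }): string[] {
  return SCORE_RULES.filter((r) => scores[r.metric] < r.below).map((r) => r.alert);
}
```

An efficiency score of 6000 bps with a health score of 70 breaches only the warning-level efficiency rule; dropping efficiency below 5000 bps breaches the critical rule as well.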

indexer/observability/grafana/dashboards/teachlink-monitoring-dashboard.json (new)

  • 691-line Grafana dashboard (uid: teachlink-monitoring-dashboard)

  • Auto-refresh every 30s, default window now-6h

  • 4 collapsible row sections, 20 panels total:

    Row 1 — Real-Time Platform Health (6 stat panels)
    Sustainability Health Score (0-100, color thresholds 60/80)
    Contract Efficiency Score (0-10000 bps, thresholds 7000/9000)
    Indexer Availability (up + probe_success, 0-2)
    Ledger Lag (seconds, thresholds 300/900)
    HTTP Error Rate (%, thresholds 5%/10%)
    API Avg Latency (seconds, thresholds 0.5/1.0)

    Row 2 — Historical Trends (4 time-series panels)
    Sustainability Scores Over Time (health score, efficiency %)
    Contract Resource Usage Over Time (invocations, writes, events)
    HTTP Throughput (req/s by route + status)
    Platform Growth Over Time (content minted, active users,
    rewards distributed in XLM)

    Row 3 — Alert Management (3 panels)
    Firing Alerts time-series (ALERTS{alertstate="firing"})
    Active Alert Count stat (color thresholds 1/3)
    Critical Alerts stat (color thresholds 1/2)

    Row 4 — Platform Insights (5 panels)
    Dashboard Cache Hit Ratio (percentunit time-series)
    API Latency Percentiles (avg, p95, p99 time-series)
    Dependency Health (database, horizon, indexer_state)
    Indexer Progress & Errors (events processed, errors, ledger)
    HTTP Status Code Breakdown (2xx / 4xx / 5xx by route)

Acceptance criteria met

✓ Real-time metrics — 6 live stat panels + Prometheus gauges updated
on every /analytics/sustainability call
✓ Historical trends — 4 time-series panels covering scores, resource
usage, throughput, and platform growth
✓ Alert management — 5 new Prometheus alert rules + 3 alert panels
in the dashboard
✓ Platform insights — cache efficiency, latency percentiles,
dependency health, indexer progress, HTTP
status breakdown

- Add SustainabilityMetrics type with KPIs: invocations, storage
  writes, events emitted, rewards distributed, content minted,
  active users, and efficiency score
- Add SUSTAINABILITY_METRICS storage key
- Add SustainabilityMetricsUpdatedEvent
- Add SustainabilityManager with record, query, and health score logic
- Expose 4 public contract entry points in lib.rs
- Include unit tests for core metric tracking and health scoring
- Add 8 sustainability Prometheus gauges to MetricsService
  (invocations, storage writes, events emitted, rewards distributed,
  content minted, active users, efficiency score, health score)
- Add updateSustainabilityMetrics() to push gauges on each query
- Add getSustainabilitySnapshot() to DashboardService: computes
  real-time KPIs (efficiency, health, dispute rate, reward claim rate)
  and pushes them to Prometheus
- Add GET /analytics/sustainability endpoint in ReportingController
- Add teachlink-sustainability alert group to prometheus/alerts.yml
  with 5 rules: low efficiency, critical efficiency, low health score,
  high error rate, no new transactions
- Create teachlink-monitoring-dashboard.json (691 lines, 20 panels):
  - Row 1: Real-Time Platform Health (6 stat panels)
  - Row 2: Historical Trends (4 time-series panels)
  - Row 3: Alert Management (firing alerts, active count, critical count)
  - Row 4: Platform Insights (cache ratio, latency percentiles,
    dependency health, indexer progress, HTTP status breakdown)
@drips-wave

drips-wave Bot commented Apr 26, 2026

@Mrchinedum Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Xhristin3 merged commit 49b0459 into rinafcode:main on Apr 28, 2026
2 of 4 checks passed
Development

Successfully merging this pull request may close these issues.

Implement comprehensive monitoring dashboard

2 participants