fix: report max recovery exhaustion events #1645

Merged
MuncleUscles merged 1 commit into main from fix/max-recovery-event-alerts
May 28, 2026


@MuncleUscles MuncleUscles commented May 28, 2026

Summary:

  • include recent max-recovery-exhausted transactions in Studio consensus health
  • forward those transactions as structured instance health events to Studio Pulse
  • add DB and unit coverage for event extraction/forwarding

Verification:

  • PYTHONPATH=. ./.venv/bin/python -m py_compile backend/protocol_rpc/health.py backend/services/usage_metrics_service.py
  • PYTHONPATH=. ./.venv/bin/pytest tests/unit/test_rpc_health_genvm_tracking.py -k max_recovery
  • PYTHONPATH=. ./.venv/bin/pytest tests/unit/test_usage_metrics_service.py
  • PYTHONPATH=. ./.venv/bin/pytest tests/db-sqlalchemy/test_health_orphan_detection.py -k max_recovery (blocked locally: POSTGRES_URL is not configured in this remote session; CI db-integration-test covers this path)

Summary by CodeRabbit

  • New Features

    • Health monitoring now reports detailed exhausted-recovery transactions (hash, recipient, recovery count, timestamp).
    • System health metrics include these transaction-level recovery exhaustion events.
  • Tests

    • Added and updated tests to verify the presence and contents of detailed recovery-exhaustion data in health reports and metrics.



coderabbitai Bot commented May 28, 2026

Walkthrough

The PR extends the consensus health system to surface detailed "max recovery exhausted" transaction information. A new configurable event limit controls a refactored SQL query that returns both exhaustion counts and limited, recent exhausted transaction records. This detail flows through cached health responses and enriches system health metrics with structured event objects containing transaction hash, contract address, recovery count, and exhausted timestamp.
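The "count plus limited recent list" shape described in the walkthrough can be sketched as a single CTE-based query. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for the project's actual database layer, and the `transactions` table, its `to_address` column, the `:max_recovery` threshold, and the helper names `make_demo_db` and `fetch_exhausted` are all hypothetical. Only the output keys (`hash`, `contract_address`, `recovery_count`, `exhausted_at`) are taken from the review comments on this PR.

```python
import sqlite3

# Hypothetical query: one CTE selects exhausted rows, a second keeps the most
# recent ones, and the final SELECT pairs the total count with that short list.
QUERY = """
WITH exhausted AS (
    SELECT hash, to_address, recovery_count, exhausted_at
    FROM transactions
    WHERE recovery_count >= :max_recovery
),
recent AS (
    SELECT * FROM exhausted
    ORDER BY exhausted_at DESC
    LIMIT :event_limit
)
SELECT c.total, r.hash, r.to_address, r.recovery_count, r.exhausted_at
FROM (SELECT COUNT(*) AS total FROM exhausted) AS c
LEFT JOIN recent AS r ON 1 = 1
ORDER BY r.exhausted_at DESC
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with illustrative rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "hash TEXT, to_address TEXT, recovery_count INTEGER, exhausted_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?)",
        [
            ("0xaa", "0xc1", 3, 100),
            ("0xbb", "0xc2", 3, 200),
            ("0xcc", "0xc3", 1, 300),  # below the threshold, so not exhausted
            ("0xdd", "0xc1", 4, 400),
        ],
    )
    return conn


def fetch_exhausted(conn: sqlite3.Connection, max_recovery: int, event_limit: int):
    """Return (total_count, recent_events) from one round trip."""
    rows = conn.execute(
        QUERY, {"max_recovery": max_recovery, "event_limit": event_limit}
    ).fetchall()
    # The LEFT JOIN guarantees at least one row, so the count is always present.
    total = rows[0][0]
    events = [
        {
            "hash": tx_hash,
            "contract_address": address,
            "recovery_count": count,
            "exhausted_at": ts,
        }
        for _, tx_hash, address, count, ts in rows
        if tx_hash is not None  # the placeholder row when the list is empty
    ]
    return total, events
```

Joining the count against the limited list keeps the total accurate even when the limit is zero or smaller than the number of exhausted transactions, which is the reason a count-only aggregation and a detail list can coexist in one statement.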

Changes

Max Recovery Exhaustion Detail Surface

| Layer / File(s) | Summary |
| --- | --- |
| Health query configuration and exhausted transaction SQL: backend/protocol_rpc/health.py | New HEALTH_MAX_RECOVERY_EXHAUSTED_EVENT_LIMIT environment parameter drives a CTE-based query that replaces count-only aggregation with both total counts and a limited, ordered list of exhausted transaction records (hash, address, recovery count, epoch timestamp). |
| Consensus health response and cached payload structure: backend/protocol_rpc/health.py | Consensus health return object and cached /health background payload now include max_recovery_exhausted_transactions field to expose the queried exhausted transaction detail list. |
| System metrics enrichment with exhausted events: backend/services/usage_metrics_service.py | send_system_health_metrics maps exhausted transaction details from consensus health into instanceHealthEvents array with type, hash, contract, recovery count, and occurred-at timestamp fields. |
| Exhausted transaction and metrics tests: tests/db-sqlalchemy/test_health_orphan_detection.py, tests/unit/test_rpc_health_genvm_tracking.py, tests/unit/test_usage_metrics_service.py | DB and unit tests assert max_recovery_exhausted_transactions contains the expected transaction fields and that usage metrics include mapped instance health events. |
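As a rough illustration of how an environment parameter such as HEALTH_MAX_RECOVERY_EXHAUSTED_EVENT_LIMIT might be read defensively before it reaches a LIMIT clause, consider the sketch below. The default of 20, the clamping behaviour, and the helper name `read_event_limit` are assumptions made for the example; this page does not show how health.py actually parses the value.

```python
import os
from typing import Mapping, Optional

ENV_NAME = "HEALTH_MAX_RECOVERY_EXHAUSTED_EVENT_LIMIT"

# Hypothetical default: the real default is not stated in this PR page.
DEFAULT_EVENT_LIMIT = 20


def read_event_limit(env: Optional[Mapping[str, str]] = None) -> int:
    """Parse the event limit, falling back to the default on bad input."""
    source = os.environ if env is None else env
    raw = source.get(ENV_NAME, "")
    try:
        value = int(raw)
    except ValueError:
        # Missing or non-numeric values should not break the health check.
        return DEFAULT_EVENT_LIMIT
    # A negative limit makes no sense for a LIMIT clause, so clamp to zero.
    return max(value, 0)
```

Passing the mapping explicitly, as in `read_event_limit({ENV_NAME: "5"})`, keeps the helper easy to exercise in unit tests without touching the process environment.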

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • genlayerlabs/genlayer-studio#1640: Both PRs modify consensus health SQL/aggregation to compute and surface "max recovery exhaustion" (one adds count + notice, this one extends it to return limited transaction details).
  • genlayerlabs/genlayer-studio#1621: Both PRs modify backend/protocol_rpc/health.py's consensus health logic and update test_health_orphan_detection.py assertions tied to consensus health output.
  • genlayerlabs/genlayer-studio#1636: Both PRs extend the consensus-health computation and cached /health payload by adding new metrics fields to the consensus dict.

Suggested labels

run-tests

Poem

🐰 A query hops, revealing tired chains—
Each exhausted transaction now explained!
With limits and details, the health shines bright,
Metrics enriched with recovery's plight.
Configuration bounds the weary sight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description covers what was changed and why, includes verification steps, and testing details. However, it does not follow the required template structure with sections like 'What', 'Why', 'Testing done', 'Decisions made', 'Checks', and 'User facing release notes'. | Consider restructuring the description to match the repository template with explicit sections for What, Why, Testing done, Decisions made, and release notes for better consistency. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: reporting max recovery exhaustion events as a feature fix, which directly aligns with the core objective of the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@MuncleUscles MuncleUscles force-pushed the fix/max-recovery-event-alerts branch from 0434dbd to 28d80ac on May 28, 2026 10:09

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
backend/services/usage_metrics_service.py (1)

106-119: ⚡ Quick win

Please pin this new payload mapping with a unit test.

This block is a cross-layer contract. A focused test around instanceHealthEvents would catch future field drift between backend/protocol_rpc/health.py and this service before Pulse silently starts receiving partial events.

As per coding guidelines, tests/**/*.py: Use pytest with fixtures from tests/common/ for backend testing.
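A pinning test of the kind requested here could look roughly like the following. The field names and the "max_recovery_cycles_exhausted" type value come from this review comment; the `to_instance_health_events` helper is a hypothetical stand-in for the mapping inside usage_metrics_service.py, and the sample values are invented for the example. A real test would call the service itself and use the fixtures from tests/common/.

```python
from __future__ import annotations

from typing import Any


def to_instance_health_events(consensus: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the service's exhausted-transaction mapping."""
    return [
        {
            "type": "max_recovery_cycles_exhausted",
            "transactionHash": tx["hash"],
            "contractAddress": tx["contract_address"],
            "recoveryCount": tx["recovery_count"],
            "occurredAt": tx["exhausted_at"],
        }
        for tx in consensus.get("max_recovery_exhausted_transactions", [])
    ]


def test_instance_health_events_pin_exact_fields() -> None:
    # Illustrative sample entry; values are made up for the sketch.
    consensus = {
        "max_recovery_exhausted_transactions": [
            {
                "hash": "0xabc",
                "contract_address": "0xdef",
                "recovery_count": 3,
                "exhausted_at": 1_700_000_000,
            }
        ]
    }

    events = to_instance_health_events(consensus)

    # Comparing whole dicts fails on missing, renamed, or extra keys alike,
    # which is what catches field drift between the two layers.
    assert events == [
        {
            "type": "max_recovery_cycles_exhausted",
            "transactionHash": "0xabc",
            "contractAddress": "0xdef",
            "recoveryCount": 3,
            "occurredAt": 1_700_000_000,
        }
    ]
```

Asserting on the full list rather than on individual keys is a deliberate choice: a partial event would still pass key-by-key checks for the fields that survived.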

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/usage_metrics_service.py` around lines 106 - 119, Add a
focused pytest that pins the instanceHealthEvents payload mapping: create a test
that injects a sample
health_cache.services["consensus"]["max_recovery_exhausted_transactions"] entry
and asserts that usage_metrics_service builds
system_health["instanceHealthEvents"] with exactly the fields "type" (value
"max_recovery_cycles_exhausted"), "transactionHash" from event["hash"],
"contractAddress" from event["contract_address"], "recoveryCount" from
event["recovery_count"], and "occurredAt" from event["exhausted_at"]; use the
backend test fixtures from tests/common/ and mirror the canonical source in
protocol_rpc.health to prevent field drift.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 38de70ce-0005-4eb5-9227-6f86ff6d944c

📥 Commits

Reviewing files that changed from the base of the PR and between f3f90fc and 0434dbd.

📒 Files selected for processing (3)
  • backend/protocol_rpc/health.py
  • backend/services/usage_metrics_service.py
  • tests/db-sqlalchemy/test_health_orphan_detection.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unit/test_usage_metrics_service.py (1)

10-10: ⚡ Quick win

Add a return type hint to the async test function.

Use -> None on the test coroutine to satisfy the project typing rule.

Proposed patch
 @pytest.mark.asyncio
-async def test_system_health_metrics_include_max_recovery_events():
+async def test_system_health_metrics_include_max_recovery_events() -> None:

As per coding guidelines, **/*.py: Include type hints in all Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_usage_metrics_service.py` at line 10, The async test
coroutine test_system_health_metrics_include_max_recovery_events is missing a
return type hint; update its signature to include "-> None" (e.g., async def
test_system_health_metrics_include_max_recovery_events() -> None:) to satisfy
the project typing rule and ensure the test function is properly typed.
tests/unit/test_rpc_health_genvm_tracking.py (1)

283-338: ⚡ Quick win

Add type hints to the new async test and local fake classes.

These new definitions are untyped, which violates the repository Python typing rule.

Proposed patch
@@
 @pytest.mark.asyncio
 async def test_consensus_health_includes_max_recovery_exhaustion_events(
-        self, monkeypatch
-    ):
+        self, monkeypatch: pytest.MonkeyPatch
+    ) -> None:
@@
         class FakeResult:
-            def __init__(self, row=None, rows=None):
+            def __init__(
+                self,
+                row: SimpleNamespace | None = None,
+                rows: list[SimpleNamespace] | None = None,
+            ) -> None:
                 self.row = row
                 self.rows = rows or []
 
-            def fetchone(self):
+            def fetchone(self) -> SimpleNamespace | None:
                 return self.row
 
-            def fetchall(self):
+            def fetchall(self) -> list[SimpleNamespace]:
                 return self.rows
 
         class FakeConnection:
-            def __enter__(self):
+            def __enter__(self) -> "FakeConnection":
                 return self
 
-            def __exit__(self, exc_type, exc, tb):
+            def __exit__(self, exc_type: object, exc: object, tb: object) -> bool:
                 return False
 
-            def execute(self, statement, params=None):
+            def execute(self, statement: object, params: object | None = None) -> FakeResult:
                 query = str(statement)
@@
         class FakeEngine:
-            def connect(self):
+            def connect(self) -> FakeConnection:
                 return FakeConnection()

As per coding guidelines, **/*.py: Include type hints in all Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_rpc_health_genvm_tracking.py` around lines 283 - 338, The
test function test_consensus_health_includes_max_recovery_exhaustion_events and
its local helper classes (FakeResult, FakeConnection, FakeEngine) lack type
annotations; update the async test signature to include return type "-> None"
and add appropriate type hints for class attributes and methods (e.g.,
FakeResult.__init__(self, row: Optional[Any]=None, rows:
Optional[List[Any]]=None), fetchone(self) -> Optional[Any], fetchall(self) ->
List[Any]; FakeConnection.__enter__(self) -> "FakeConnection", __exit__(self,
exc_type: Optional[Type[BaseException]], exc: Optional[BaseException], tb:
Optional[TracebackType]) -> bool, execute(self, statement: Any, params:
Optional[Mapping[str, Any]] = None) -> FakeResult; FakeEngine.connect(self) ->
FakeConnection) and annotate exhausted_tx with a concrete type (e.g.,
SimpleNamespace or a TypedDict/NamedTuple) to satisfy repository typing rules.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2ae8af47-5251-477e-b7f8-000e91cd07dd

📥 Commits

Reviewing files that changed from the base of the PR and between 0434dbd and 28d80ac.

📒 Files selected for processing (5)
  • backend/protocol_rpc/health.py
  • backend/services/usage_metrics_service.py
  • tests/db-sqlalchemy/test_health_orphan_detection.py
  • tests/unit/test_rpc_health_genvm_tracking.py
  • tests/unit/test_usage_metrics_service.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/db-sqlalchemy/test_health_orphan_detection.py
  • backend/protocol_rpc/health.py
  • backend/services/usage_metrics_service.py

@MuncleUscles MuncleUscles merged commit 05759e2 into main May 28, 2026
28 checks passed
@MuncleUscles MuncleUscles deleted the fix/max-recovery-event-alerts branch May 28, 2026 10:17
@github-actions

🎉 This PR is included in version 0.120.17 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
