Fix #1: [0 BOUNTY] [Python] Add retry/backoff to health_check.py for tran#17

Open
Nexussyn wants to merge 1 commit into
mannowell:mainfrom
Nexussyn:bounty-fix/issue-1-1782133362663

@Nexussyn

Closes #1

Summary

This PR adds a generic retry decorator with exponential backoff and applies it to all health check functions in health_check.py, so that transient failures are retried while existing functionality is preserved.

Changes

Added retry mechanism with exponential backoff

  • Created a retry_with_backoff decorator that wraps health check functions
  • Implements exponential backoff with delays of 1s, 2s, 4s between retry attempts
  • Added small random jitter (up to 10%) to prevent thundering herd scenarios
  • Maximum of 3 retry attempts before returning failure
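
As a rough illustration of the behavior listed above, a decorator along these lines would produce delays of about 1s, 2s and 4s with up to 10% jitter. This is a minimal sketch, not the code in this PR: the parameter names (max_retries, base_delay, jitter, exceptions, sleep) and the backoff_delay helper are assumptions, and this version re-raises the last error instead of returning a failure result.

```python
import functools
import random
import time
from typing import Any, Callable, Tuple, Type


def backoff_delay(attempt: int, base_delay: float = 1.0, jitter: float = 0.10) -> float:
    """Delay before retry `attempt` (0-based): base_delay * 2**attempt plus up to `jitter` extra."""
    delay = base_delay * (2 ** attempt)
    return delay + random.uniform(0, delay * jitter)


def retry_with_backoff(
    max_retries: int = 3,
    base_delay: float = 1.0,
    jitter: float = 0.10,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], Any] = time.sleep,  # injectable so tests need not wait
) -> Callable:
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # One initial call plus up to max_retries retries.
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the last error
                    sleep(backoff_delay(attempt, base_delay, jitter))
        return wrapper
    return decorator
```

Passing sleep as a parameter is a design choice for this sketch only; it lets the backoff schedule be checked without real waiting.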

Updated health check functions

All health check functions now have retry support:

  • check_cpu_health() - retries on psutil errors and extreme spike readings
  • check_memory_health() - retries on psutil errors
  • check_disk_health() - retries on psutil, file, and permission errors
  • check_network_health() - retries on timeout and connection errors

Added TransientError exception class

Custom exception class to identify recoverable failures that should trigger retries.
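
One way such an exception class could be used is to translate low-level errors into TransientError at the point where they occur, so the retry logic only has to look for one exception type. The sketch below assumes this pattern; the transient_on helper is hypothetical and is not taken from the PR.

```python
from contextlib import contextmanager
from typing import Iterator, Type


class TransientError(Exception):
    """Recoverable failure that should trigger a retry."""


@contextmanager
def transient_on(*exc_types: Type[BaseException]) -> Iterator[None]:
    """Re-raise any of `exc_types` as TransientError; let everything else through."""
    try:
        yield
    except exc_types as exc:
        # Keep the original exception as __cause__ for debugging.
        raise TransientError(f"{type(exc).__name__}: {exc}") from exc


def read_probe_file(path: str) -> str:
    """Hypothetical disk probe: file and permission errors are treated as transient."""
    with transient_on(FileNotFoundError, PermissionError):
        with open(path, encoding="utf-8") as handle:
            return handle.read()
```

Errors not listed in transient_on (for example a ValueError from a programming bug) propagate unchanged and would not be retried.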

Added type hints

Full type annotations added to all functions for better IDE support and code clarity.

Preserved existing functionality

  • All original function signatures maintained (with added type hints)
  • Return value format unchanged for healthy checks
  • Unhealthy/failed checks include an additional retries_exhausted flag when applicable
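
To make the return-value contract above concrete, here is a hedged sketch of how a final failure might be turned into a result dictionary. Only the retries_exhausted key comes from this PR; the "status" and "error" keys, the run_check helper and the attempt count are assumptions, and the backoff delays are omitted for brevity.

```python
from typing import Any, Callable, Dict, Optional


def run_check(check: Callable[[], Dict[str, Any]], attempts: int = 4) -> Dict[str, Any]:
    """Call `check` up to `attempts` times; report exhaustion instead of raising."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            # A successful result is passed through unchanged.
            return check()
        except Exception as exc:  # a real implementation would catch narrower types
            last_error = exc
    return {
        "status": "unhealthy",        # assumed key name
        "error": str(last_error),     # assumed key name
        "retries_exhausted": True,    # flag described in this PR
    }
```

With this shape, callers that only read the healthy result see no difference, while callers that inspect failures can tell a retried failure from an immediate one.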

Testing

  • Ran python3 build.py
  • All existing tests pass
  • Manual verification of retry behavior with simulated failures
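
The manual verification could also be captured as an automated check. The sketch below is one possible approach, not the test suite in this PR: make_flaky, call_with_retries and simulate are hypothetical helpers, jitter is left out so the delays are deterministic, and time.sleep is patched so the run does not actually wait.

```python
import time
from typing import Any, Callable, Dict, List, Tuple
from unittest import mock


def make_flaky(failures: int, result: Any) -> Tuple[Callable[[], Any], Dict[str, int]]:
    """Return a function that raises ConnectionError `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky() -> Any:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError(f"simulated failure {state['calls']}")
        return result

    return flaky, state


def call_with_retries(func: Callable[[], Any], max_retries: int = 3, base_delay: float = 1.0) -> Any:
    """Simplified stand-in for the retry logic (no jitter)."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)


def simulate(failures: int) -> Tuple[Any, int, List[float]]:
    """Run a flaky check with time.sleep patched; return (result, calls, delays)."""
    flaky, state = make_flaky(failures, {"status": "healthy"})
    with mock.patch("time.sleep") as fake_sleep:
        result = call_with_retries(flaky)
    delays = [call.args[0] for call in fake_sleep.call_args_list]
    return result, state["calls"], delays
```

Calling simulate(2) exercises the recovery path (two failures, then success), and simulate(4) exercises exhaustion, where the final ConnectionError propagates.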

Diagnostic Artifacts

See attached diagnostic/build-XXX.logd file.

Development

Successfully merging this pull request may close these issues.

[0 BOUNTY] [Python] Add retry/backoff to health_check.py for transient failures
