added healthcheck support for agent #820
Open
+728
−69
Summary
This adds HTTP-based health check endpoints for the Calico VPP agent, replacing the existing restart-on-timeout behavior with Kubernetes readiness and liveness probes.

Previously, the agent container would restart frequently while waiting for Felix configuration updates. This caused pods to appear `Running` even when not fully initialized, making it difficult to distinguish between initialization delays and actual failures.

Now, we report initialization status through standard Kubernetes probes, keeping the container running during initialization by marking it as `Not Ready`. This allows Kubernetes to manage the pod lifecycle based on health check status.

Changes
1. New Health Package (`calico-vpp-agent/health/`)

Created a new package with:
- `health.go`: HTTP server with three endpoints:
  - `/liveness`: Basic health status (for liveness probe)
  - `/readiness`: Initialization status (for readiness probe)
  - `/status`: Detailed JSON status (for monitoring/debugging)

2. Configuration Changes (`config/config.go`)

Added healthcheck port configuration:
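
As a rough illustration of a configurable port, the sketch below resolves the port from a string value (for example one read from an environment variable populated by a ConfigMap) and falls back to a default. The variable name `EXAMPLE_HEALTHCHECK_PORT`, the default `9090`, and the function `parseHealthCheckPort` are assumptions made for this example, not names taken from the PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultHealthCheckPort is a placeholder default, not the value used by the PR.
const defaultHealthCheckPort = 9090

// healthCheckPortEnv is a hypothetical variable name chosen for this sketch.
const healthCheckPortEnv = "EXAMPLE_HEALTHCHECK_PORT"

// parseHealthCheckPort returns the port encoded in raw, or the default when
// raw is empty. It rejects values outside the valid TCP port range.
func parseHealthCheckPort(raw string) (int, error) {
	if raw == "" {
		return defaultHealthCheckPort, nil
	}
	port, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid healthcheck port %q: %w", raw, err)
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("healthcheck port %d out of range 1-65535", port)
	}
	return port, nil
}

func main() {
	port, err := parseHealthCheckPort(os.Getenv(healthCheckPortEnv))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("healthcheck server would listen on :%d\n", port)
}
```

Validating the value at startup means a mistyped port is reported as a clear error rather than as a probe that never succeeds.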
The healthcheck port can be customized via `ConfigMap`:

3. Deployment YAML Changes (`yaml/base/calico-vpp-daemonset.yaml`)

Added Kubernetes health probes to agent container:
Components Tracked
The health system tracks the initialization of these components:
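
One way such tracking could back the liveness and readiness probes is sketched below. The `Tracker` type, its methods, and the component names are hypothetical and chosen for illustration only; they are not taken from the PR's `health.go`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// Tracker records which components have finished initializing.
type Tracker struct {
	mu    sync.RWMutex
	ready map[string]bool
}

// NewTracker registers every component as not yet ready.
func NewTracker(components ...string) *Tracker {
	t := &Tracker{ready: make(map[string]bool, len(components))}
	for _, c := range components {
		t.ready[c] = false
	}
	return t
}

// MarkReady flags a single component as initialized.
func (t *Tracker) MarkReady(component string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.ready[component] = true
}

// AllReady reports whether every registered component is initialized.
func (t *Tracker) AllReady() bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	for _, ok := range t.ready {
		if !ok {
			return false
		}
	}
	return true
}

// Handler exposes /liveness and /readiness on top of the tracker.
func (t *Tracker) Handler() http.Handler {
	mux := http.NewServeMux()
	// Liveness only says the process is up and serving HTTP.
	mux.HandleFunc("/liveness", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness fails with 503 until every component is initialized.
	mux.HandleFunc("/readiness", func(w http.ResponseWriter, _ *http.Request) {
		if !t.AllReady() {
			http.Error(w, "initializing", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues an in-memory GET and returns the status code, mimicking
// what an HTTP probe would observe.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	t := NewTracker("component-a", "component-b") // placeholder names
	h := t.Handler()
	fmt.Println(probe(h, "/liveness"), probe(h, "/readiness"))
	t.MarkReady("component-a")
	t.MarkReady("component-b")
	fmt.Println(probe(h, "/readiness"))
}
```

Kubernetes HTTP probes treat any status code from 200 to 399 as success, so returning 503 from `/readiness` while `/liveness` keeps returning 200 leaves the container running but marked not ready.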
Monitoring
The `/status` endpoint provides detailed information about the healthcheck status. Here is an example status response:
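
A minimal sketch of how a handler could assemble a JSON body of this kind follows. The `Status` and `ComponentStatus` types and their field names are hypothetical and may not match the PR's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ComponentStatus describes one tracked component (hypothetical schema).
type ComponentStatus struct {
	Name        string `json:"name"`
	Initialized bool   `json:"initialized"`
}

// Status is the top-level document a status endpoint might return.
type Status struct {
	Ready      bool              `json:"ready"`
	Components []ComponentStatus `json:"components"`
}

// buildStatus derives the overall readiness from the per-component flags:
// the agent is ready only when no component is still initializing.
func buildStatus(components []ComponentStatus) Status {
	ready := true
	for _, c := range components {
		if !c.Initialized {
			ready = false
			break
		}
	}
	return Status{Ready: ready, Components: components}
}

func main() {
	s := buildStatus([]ComponentStatus{
		{Name: "component-a", Initialized: true},  // placeholder names
		{Name: "component-b", Initialized: false},
	})
	out, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(out))
}
```

Reporting each component separately lets an operator see which one is blocking readiness instead of only the aggregate result.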