fix: Race condition in webhook certificate renewal with cert-manager self-signed issuer without a dedicated CA certificate #4359
+174
−26
Fix cert-manager webhook certificate renewal race condition
Issue
Fixes #4019 - Race condition during cert-manager webhook certificate renewal causing webhook failures
Description
This PR addresses a critical race condition that occurs when cert-manager renews webhook certificates for the AWS Load Balancer Controller. The issue manifests as webhook validation/mutation failures during certificate renewal periods, causing intermittent service disruptions.
Root Cause Analysis
The original cert-manager integration used a single-tier certificate approach where:
Solution Architecture
Implemented a 3-tier certificate hierarchy that eliminates the race condition:
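As a rough illustration of why a CA-based hierarchy avoids the race: when a self-signed issuer signs the serving certificate directly, every renewal changes the trust anchor, so the caBundle injected into the webhook configuration and the certificate loaded by the controller can briefly disagree. With a long-lived CA in between, renewals of the serving certificate leave the trust anchor unchanged. The manifests below are a minimal sketch of one plausible reading of the 3-tier layout (self-signed bootstrap issuer, CA certificate with its CA issuer, serving certificate). All resource names, the namespace, the DNS names, and the durations are assumptions for illustration, not the values used in templates/cert-manager.yaml.

```yaml
# Tier 1: self-signed issuer, used only to bootstrap the CA certificate below.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-selfsigned-issuer        # hypothetical name
  namespace: kube-system                 # assumed namespace
spec:
  selfSigned: {}
---
# Tier 2: long-lived CA certificate, signed by the self-signed issuer,
# plus a CA issuer that signs with the key stored in the CA secret.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-root-ca                  # hypothetical name
  namespace: kube-system
spec:
  isCA: true
  commonName: example-root-ca
  secretName: example-root-ca-tls
  duration: 43800h                       # illustrative: about 5 years
  issuerRef:
    name: example-selfsigned-issuer
    kind: Issuer
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-ca-issuer                # hypothetical name
  namespace: kube-system
spec:
  ca:
    secretName: example-root-ca-tls
---
# Tier 3: short-lived webhook serving certificate, signed by the CA issuer.
# Renewing it does not change the CA that the API server trusts.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-webhook-serving-cert     # hypothetical name
  namespace: kube-system
spec:
  dnsNames:
    - example-webhook-service.kube-system.svc
    - example-webhook-service.kube-system.svc.cluster.local
  secretName: example-webhook-tls
  duration: 8760h                        # illustrative: 1 year
  renewBefore: 360h                      # illustrative: renew 15 days early
  issuerRef:
    name: example-ca-issuer
    kind: Issuer
```

In a layout like this, a webhook configuration annotated with cert-manager.io/inject-ca-from pointing at the serving certificate would receive the CA from that certificate's secret, which stays the same until the CA certificate itself is rotated.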
Key Benefits:
Implementation Details
New Resources Created:
templates/cert-manager.yaml - New 3-tier CA certificate hierarchy
templates/webhook.yaml - Updated to use new CA issuer
values.yaml - Added cert-manager configuration options
docs/deploy/cert-manager.md - Comprehensive documentation
Configuration Options:
Backward Compatibility
100% backward compatible with existing deployments:
enableCertManager: true when ready
certManager.issuerRef
Testing Scenarios Validated
✅ Core Functionality:
✅ Upgrade Scenarios:
✅ Template Quality:
clientConfig)
✅ Production Validation:
✅ Edge Cases:
Before/After Comparison
Before (Race Condition Present):
After (Race Condition Eliminated):
Checklist
README.md, or the docs directory) - Added docs/deploy/cert-manager.md
BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯
Impact: This change eliminates a critical production issue affecting webhook reliability during certificate renewals while maintaining 100% backward compatibility. The new architecture provides a stable, enterprise-ready certificate management solution that scales with organizational needs.