fix: improve default readiness probe config for shutdown #204

hvoigt · 2025-03-24T15:58:16Z

On shutdown, coredns waits for the configured lameduck duration and continues handling connections. During that time it fails its readiness probes.

To give the readiness checks a chance to remove the instance from the service before the lameduck period ends, we need to lower the probe's failure threshold and interval.

This is to avoid failing DNS requests in a busy cluster when coredns is being scaled down.
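The timing argument above can be sketched as a small calculation. This is a rough model, not the kubelet's actual logic: it assumes probes start failing the moment shutdown begins and it ignores probe timeouts and endpoint propagation delay. The function names are made up for illustration.

```python
def seconds_until_unready(period_seconds: int, failure_threshold: int) -> tuple[int, int]:
    """Return (best_case, worst_case) seconds until a pod is marked unready.

    Assumption: probes fail from the instant shutdown starts. The first
    failing probe can land anywhere between 0 and period_seconds after that,
    and failure_threshold consecutive failures are needed.
    """
    best = (failure_threshold - 1) * period_seconds
    worst = failure_threshold * period_seconds
    return best, worst


def removed_before_lameduck_ends(
    lameduck_seconds: int, period_seconds: int, failure_threshold: int
) -> bool:
    """True if even the worst case fits inside the lameduck window."""
    _, worst = seconds_until_unready(period_seconds, failure_threshold)
    return worst <= lameduck_seconds


# Values from the test setup: lameduck 5s, periodSeconds=10, failureThreshold=3.
print(seconds_until_unready(10, 3))            # (20, 30)
print(removed_before_lameduck_ends(5, 10, 3))  # False
```

Under this model, the instance is only reliably removed in time when periodSeconds multiplied by failureThreshold does not exceed the lameduck duration, which is why both values need to come down.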

Why is this pull request needed and what does it do?

We used a dns-test-container and followed the procedure below to get this result:

We configured coredns with the EKS defaults: lameduck 5s and readinessProbe with periodSeconds=10 and failureThreshold=3

Before the test we scaled coredns to 20 instances manually
We started the tests and waited roughly 10 seconds
We scaled coredns to 1 instance manually
We waited for the test to finish

This test was executed in a cluster running ~1000 pods.
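The dns-test-container itself is not shown here. As a hypothetical stand-in, a loss counter for such a test could look like the sketch below: it resolves a list of names and counts how many lookups fail. The function name and the injectable `resolve` parameter are assumptions for illustration; only `socket.gethostbyname` is a real standard-library call.

```python
import socket
from typing import Callable, Iterable


def count_failed_lookups(
    names: Iterable[str],
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> tuple[int, int]:
    """Resolve each name once and return (failed, total).

    Resolution errors from the socket module (socket.gaierror and friends)
    are subclasses of OSError, so catching OSError covers them.
    """
    total = 0
    failed = 0
    for name in names:
        total += 1
        try:
            resolve(name)
        except OSError:
            failed += 1
    return failed, total
```

Running such a loop continuously while coredns is scaled from 20 instances to 1 would surface lost resolutions as a non-zero failed count.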

With this change in place, we see no losses in DNS resolution when repeating this test.

Which issues (if any) are related?

The issue is described above.

Checklist:

I have bumped the chart version according to versioning.
I have updated the chart changelog with all the changes that come with this pull request according to changelog.
Any new values are backwards compatible and/or have sensible default.
I have signed off all my commits as required by DCO.

Changes are automatically published when merged to main. They are not published on branches.

Note on DCO

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

On shutdown coredns will wait what is configured for lameduck and continue handling connections. At the same time it will fail readiness probes. In order to give readiness checks a chance to remove the instance from the service we need to lower the failure threshold and the interval. This is to avoid failing DNS requests in a busy cluster when coredns is being scaled down.

Signed-off-by: Heiko Voigt <[email protected]>

hagaibarel · 2025-03-25T18:29:51Z

Thanks a lot for the effort!

hagaibarel merged commit ed11181 into coredns:master Mar 25, 2025
2 checks passed
