
feat: preflight for at least 1000 fs.inotify.max_user_instances #1763

Open · wants to merge 3 commits into main

Conversation

@laverya (Member) commented Jan 29, 2025

What this PR does / why we need it:

Which issue(s) this PR fixes:

Does this PR require a test?

Does this PR require a release note?


Does this PR require documentation?

github-actions bot commented Jan 29, 2025

This PR has been released (on staging) and is available for download with an embedded-cluster-smoke-test-staging-app license ID.

Online Installer:

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci/appver-dev-7d574c1" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Airgap Installer (may take a few minutes before the airgap bundle is built):

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci-airgap/appver-dev-7d574c1?airgap=true" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Happy debugging!

@adamancini (Member) left a comment


lgtm

adamancini previously approved these changes Jan 29, 2025
@ajp-io (Member) commented Jan 29, 2025

Why does the limit need to be this high? Can EC necessarily not work unless the limit is this high? When I get a GCP instance, the limit is 128. I'm hesitant to add preflights that will fail unless they're absolutely necessary.

@ajp-io (Member) commented Jan 29, 2025

If it is absolutely necessary, we could also consider setting this sysctl like we do for ip forwarding, and then the preflight only fails if we can't set it ourselves at the beginning. That way we don't introduce a lot of friction with failing preflights.

@adamancini (Member) commented:

@ajp-io the number can be up to 2^31, based on how much RAM is available to the system - a more precise estimate probably depends heavily on the # of pods that you deploy and what they're doing. 8192 is probably high enough to cover the majority of clusters, but this is kind of a "you have to tune this" parameter in the kernel. I have run into problems during upgrades in my EC cluster on GCP with the default 128 in Ubuntu.

I often don't hit this on a fresh cluster but only after application pods start to get deployed.

I imagine this environment had helm charts installed as part of EC, so a greater-than-average number of pods were started at first installation
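For reference, these are the kinds of diagnostics one might run on an affected host to see the current limit and how close the node is to it (the second command is an approximation; kernels don't expose a direct per-user usage counter here):

```shell
# Current per-user inotify instance limit
# (the Ubuntu default of 128 is the value discussed above)
cat /proc/sys/fs/inotify/max_user_instances

# Rough count of inotify instances currently open across all processes
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
```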

@adamancini (Member) commented:

@ajp-io I do think we should probably put it into a drop-in - there are some cases where sysctl drop-ins get overridden by end-user configs, e.g. via Puppet or Ansible - so I support a preflight, a support-bundle analyzer, and installing a sysctl drop-in config.
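A sysctl drop-in along these lines is what the comment above is suggesting (the file name and priority prefix are illustrative; the 1000 value matches the preflight floor in the PR title):

```
# /etc/sysctl.d/99-embedded-cluster.conf  (illustrative path)
fs.inotify.max_user_instances = 1000
```

Drop-ins are applied with `sysctl --system`; note that a file sorting later in lexical order (for example one managed by Puppet or Ansible) can still override this, which is why pairing the drop-in with a preflight and a support-bundle analyzer makes sense.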

@jtuchscherer (Contributor) commented:

Let's hold off on merging and talk through the implications of adding this pre-flight check

@laverya (Member, Author) commented Jan 29, 2025

Why does the limit need to be this high? Can EC necessarily not work unless the limit is this high? When I get a GCP instance, the limit is 128. I'm hesitant to add preflights that will fail unless they're absolutely necessary.

I was just on a support case where 128 was preventing k0s from functioning properly - that's what prompted this

@ajp-io (Member) commented Jan 30, 2025

a more precise estimate probably depends heavily on the # of pods that you deploy and what they're doing

If this is customer-specific, would it be better to support vendor-supplied host preflight checks and have ITRS and anyone else who needs it set this themselves? ITRS seems to have more charts (which I imagine translates to more pods) than many/most vendors. That makes me think many people won't be affected by this issue but will potentially face unneeded preflight failures anyway.

4 participants