feat: preflight for at least 1000 fs.inotify.max_user_instances #1763
base: main
Conversation
This PR has been released (on staging) and is available for download with an embedded-cluster-smoke-test-staging-app license ID. Online Installer:
Airgap Installer (may take a few minutes before the airgap bundle is built):
Happy debugging!
lgtm
Why does the limit need to be this high? Can EC necessarily not work unless the limit is this high? When I get a GCP instance, the limit is 128. I'm hesitant to add preflights that will fail unless they're absolutely necessary.
If it is absolutely necessary, we could also consider setting this sysctl like we do for ip forwarding, and then the preflight only fails if we can't set it ourselves at the beginning. That way we don't introduce a lot of friction with failing preflights.
@ajp-io the number can be up to 2^31, based on how much RAM is available to the system - a more precise estimate probably depends heavily on the # of pods that you deploy and what they're doing. 8192 is probably high enough to cover the majority of clusters, but this is kind of a "you have to tune this" parameter in the kernel. I have run into problems during upgrades in my EC cluster on GCP with the default 128 in Ubuntu. I often don't hit this on a fresh cluster but only after application pods start to get deployed. I imagine this environment had helm charts installed as part of EC, so a greater-than-average number of pods is started at first installation.
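For context, the kernel exposes the current limit under /proc/sys, so a host preflight can simply read and compare it. A minimal sketch in Go of what such a check could do, assuming a threshold of 1000 to match the PR title (the standalone program is illustrative, not the actual check shipped in this PR):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// minInstances is a hypothetical threshold matching the PR title;
// the real value would live in the preflight spec, not in code.
const minInstances = 1000

func main() {
	// The running value of fs.inotify.max_user_instances is exposed here.
	raw, err := os.ReadFile("/proc/sys/fs/inotify/max_user_instances")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read sysctl:", err)
		os.Exit(1)
	}
	current, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		fmt.Fprintln(os.Stderr, "unexpected sysctl value:", err)
		os.Exit(1)
	}
	if current < minInstances {
		fmt.Printf("fail: fs.inotify.max_user_instances is %d, want >= %d\n", current, minInstances)
		os.Exit(1)
	}
	fmt.Printf("pass: fs.inotify.max_user_instances is %d\n", current)
}
```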
@ajp-io I do think we should probably put it into a drop-in - there are some cases where sysctl drop-ins get overridden by end user configs, like via Puppet or Ansible - so I support a preflight and a support-bundle analyzer, and we should also install a sysctl drop-in config.
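For illustration, the drop-in described here would be a one-line file under /etc/sysctl.d; the filename and value below are assumptions, not what this PR ships:

```
# /etc/sysctl.d/99-inotify.conf (hypothetical name)
fs.inotify.max_user_instances = 1024
```

Files in /etc/sysctl.d are applied at boot and can be loaded immediately with `sysctl --system`, which is what makes the "only fail the preflight if we can't set it ourselves" approach above feasible.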
Let's hold off on merging and talk through the implications of adding this preflight check
I was just on a support case where 128 was preventing k0s from functioning properly - that's what prompted this |
If this is customer-specific, would it be better to support vendor-supplied host preflight checks and have ITRS and anyone else who needs it set this themselves? ITRS seems like they have more charts (which I imagine translates to more pods) than many/most vendors. That makes me think many people won't be affected by this issue but will potentially face unneeded preflight failures anyway.
What this PR does / why we need it:
Which issue(s) this PR fixes:
Does this PR require a test?
Does this PR require a release note?
Does this PR require documentation?