[Fleet Server] Added support for the fleet scalability settings as direct toggles in fleet ui #13766

philippkahr · 2025-05-02T16:15:52Z

Agent scalability

As you add more agents to a cluster, you should adjust the Fleet Server settings. In ECH this is not possible, because the Fleet Server policy there is managed and cannot be changed. I work with a lot of people who run air-gapped and/or on-premises setups where we add additional Fleet Servers.

The problem starts here: we make it very hard to know what to actually set on the Fleet Server. We have the scalability guide: https://www.elastic.co/guide/en/fleet/8.6/fleet-server-scalability.html#recommend-settings-scaling-agents which in version 8.6 still lists a table of suggested values to add to the Fleet Servers. Versions after 8.6 no longer have that table, which is a problem because I don't know exactly what I should add.

This also means that this PR is based on values that are many versions old and could therefore be outdated. A further major issue is that we simply list the settings without any reference values.

The simplest approach I found was to add a bool toggle for each sizing and append the corresponding settings to agent.yml.hbs. All toggles are off by default, in which case we use the default values that ECH, ECE, and ECK set. I do not want to change the defaults.
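A minimal sketch of what one such toggle could look like in agent.yml.hbs. The variable name fleet_scalability_5000 comes from the diff in this PR; the checkin_limit keys follow the Fleet Server limits settings as I understand them, the surrounding layout is assumed, and every number is a placeholder rather than a recommended value.

```yaml
server:
  # ...existing settings elided; layout of the real template is assumed...
  {{#if fleet_scalability_5000}}
  # Rendered only when the "5000 agents" toggle is switched on.
  limits:
    checkin_limit:
      interval: 5ms   # placeholder
      burst: 25       # placeholder
      max: 100        # placeholder
  {{/if}}
```

With the toggle off, nothing is rendered for this block, so the platform defaults stay in effect.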

Two things I would still like to do:

Allow only one toggle to be active at a time.
Remove the duplication in agent.yml.hbs and simplify it.

I tested this as follows. Since this is nonetheless a fairly major change, I hope we can test it even more thoroughly.

elastic-package test, which didn't show any errors.
elastic-package install, then checked out the new version, upgraded the integration in the UI, and watched the Fleet Server stay healthy. Took a diagnostic of the Fleet Server (elastic-agent-diagnostics-2025-05-02T16-02-33Z-00.zip) and saw that the config correctly included the settings for 5,000 agents.

Checklist

I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
~~I have verified that any added dashboard complies with Kibana's Dashboard good practices~~

Related issues

Relates: [REQUEST]: Fleet server scalability table ingest-docs#1790

Screenshots

elasticmachine · 2025-05-02T16:18:13Z

Pinging @elastic/fleet (Team:Fleet)

kpollich · 2025-05-02T16:25:32Z

packages/fleet_server/agent/input/agent.yml.hbs

@@ -14,3 +14,96 @@ server:
 {{#if custom}}
 {{custom}}
 {{/if}}
+


I think we'd still want to support overriding any of these settings via the custom YML box, so these should all appear above the custom block, assuming the last value in this file takes precedence.
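A sketch of the ordering suggested here, with the preset block placed before the custom block so that user-supplied YAML comes last. The indentation and surrounding keys are assumptions about the template, policy_throttle is used only as an example limit, and its value is a placeholder.

```yaml
server:
  # ...existing settings elided...
{{#if fleet_scalability_5000}}
  limits:
    policy_throttle: 5ms   # placeholder preset value
{{/if}}
# Custom YAML stays last so that it can override the preset above.
{{#if custom}}
{{custom}}
{{/if}}
```

Whether a later duplicate key actually wins depends on how the rendered YAML is parsed and merged, which this comment itself leaves as an assumption.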

cmacknz · 2025-05-02T19:58:03Z

packages/fleet_server/agent/input/agent.yml.hbs

@@ -14,3 +14,96 @@ server:
 {{#if custom}}
 {{custom}}
 {{/if}}
+
+
+{{#if fleet_scalability_5000}}


These limits are all encoded into Fleet server, with the max_agents field behaving as a simplified way to configure them. This is meant to work like the Elasticsearch output presets where the preset selector is the number of agents.

The limits for each number of agents are available in https://github.com/elastic/fleet-server/tree/main/internal/pkg/config/defaults which is where Fleet server reads them from when it is compiled.

I think we should allow customizing these, but I don't think we should duplicate them here, because they will go out of date once they exist in two places. We could put the values back into the documentation, though, for people to use as overrides.

Probably the documentation around this overall needs to improve.
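To illustrate the preset idea, here is a sketch of what a user might put into the custom YAML box. It assumes that max_agents sits under server.limits, that the custom box is rendered at the top level, and that a single limit can be overridden next to the preset selector; the numbers are placeholders.

```yaml
server:
  limits:
    # Per the comment above, Fleet Server derives its limits from this value.
    max_agents: 5000
    # Optional explicit override of one derived limit (placeholder number).
    checkin_limit:
      max: 200
```

Keeping only max_agents in the package and leaving explicit overrides to the user avoids maintaining a second copy of the defaults.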

philippkahr · 2025-05-02

OK, so max_agents is absolutely confusing. How should I know that? I think this should be renamed to something much more descriptive, such as "expected agents communicating with Fleet", and become a dropdown selector like this:

Expected agents connecting to Fleet: < 1,000 | < 5,000 | < 10,000 | < 30,000 | < 50,000

I would be way too scared to put anything into max_agents, because it sounds as if, when I put 100 in there and then want to connect a 101st agent, it won't work, and that sends me down a debugging spiral.

nimarezainia · 2025-05-05

Not sure why it's confusing. We are asking the user to tell us how many agents they have in their deployment; based on that, we determine the right values to configure for those variables. The user should not need to know what values are being configured. We reserve the right to change the value of any of those variables. As Craig mentions, these are similar to presets.
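One way the two positions in this thread could be combined is a dropdown that simply feeds max_agents. The following is a sketch under assumptions: the variable name expected_agents is hypothetical, the option tiers are the ones proposed in the thread, and it presumes the package spec's select variable type with text/value options is available for this package.

```yaml
# manifest.yml (sketch): one dropdown instead of one bool toggle per size.
vars:
  - name: expected_agents          # hypothetical variable name
    type: select
    title: Expected number of agents connecting to Fleet Server
    required: false
    show_user: true
    options:
      - text: "Fewer than 1,000"
        value: "1000"
      - text: "Fewer than 5,000"
        value: "5000"
      - text: "Fewer than 10,000"
        value: "10000"
      - text: "Fewer than 30,000"
        value: "30000"
      - text: "Fewer than 50,000"
        value: "50000"

# agent.yml.hbs (sketch): pass the selection through as the preset selector.
# {{#if expected_agents}}
#   limits:
#     max_agents: {{expected_agents}}
# {{/if}}
```

A single select also makes it impossible to enable two sizes at once, and the actual limit values stay in Fleet Server rather than in the package.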

elastic-sonarqube · 2025-05-05T11:35:52Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

elasticmachine · 2025-05-05T11:36:12Z

💚 Build Succeeded

Buildkite Build
Commit: 1747794

History

💚 Build #25447 succeeded 5af8d56

Added support for the fleet scalability

5af8d56

philippkahr added the enhancement New feature or request label May 2, 2025

philippkahr requested a review from a team as a code owner May 2, 2025 16:15

philippkahr mentioned this pull request May 2, 2025

[REQUEST]: Fleet server scalability table elastic/ingest-docs#1790

Open

kpollich added the Team:Fleet Fleet team [elastic/fleet] label May 2, 2025

kpollich added the Integration:fleet_server Fleet Server label May 2, 2025

kpollich changed the title ~~Added support for the fleet scalability settings as direct toggles int he fleet ui~~ [Fleet Server] Added support for the fleet scalability settings as direct toggles in fleet ui May 2, 2025

kpollich reviewed May 2, 2025


cmacknz reviewed May 2, 2025


Rework the wording of the variables?

1747794

