Add otel component manager to the coordinator #8529

Closed
wants to merge 6 commits into from

Conversation

swiatekm
Contributor

@swiatekm swiatekm commented Jun 16, 2025

What does this PR do?

Adds a new manager object for running components (in the sense of the agent's Component Model) in an otel collector.

Doing this involves two basic activities:

  • Translating agent configurations into beats receiver configurations for the otel collector.
  • Translating otel collector statuses into component states.

Up until now, the logic for these was haphazardly spread across the agent coordinator. This PR moves all of it into a new object - the OtelComponentManager - which can run both raw otel collector configurations and agent components in a single otel collector instance.

This new manager encapsulates all the logic involved in interfacing between the agent coordinator and the otel collector. In the near future, it will also take on additional responsibilities, like generating diagnostics for components it runs.
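
As an illustration of the second activity (translating collector statuses into component states), here is a minimal Go sketch. It is not the code from this PR: the `State` type, the `componentPrefix` naming convention, and the status strings are all assumptions chosen for the example. It only shows the general shape of such a translation, where pipelines generated from agent components are picked out by ID and the most severe status wins.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// State is a hypothetical, simplified stand-in for an agent component state.
type State string

const (
	StateHealthy  State = "HEALTHY"
	StateStarting State = "STARTING"
	StateDegraded State = "DEGRADED"
	StateFailed   State = "FAILED"
)

// componentPrefix is an assumed naming convention that marks pipelines
// generated from agent components (illustrative, not taken from the PR).
const componentPrefix = "_agent-component/"

// toState maps an assumed collector status name to a component state.
func toState(status string) State {
	switch status {
	case "StatusOK":
		return StateHealthy
	case "StatusStarting":
		return StateStarting
	case "StatusRecoverableError":
		return StateDegraded
	default:
		return StateFailed
	}
}

// severity orders states so the worst one can be kept per component.
func severity(s State) int {
	switch s {
	case StateHealthy:
		return 1
	case StateStarting:
		return 2
	case StateDegraded:
		return 3
	case StateFailed:
		return 4
	default:
		return 0 // no state recorded yet
	}
}

// ComponentStates derives one state per component from per-pipeline statuses.
// Pipelines without the component prefix are treated as user-defined and skipped.
func ComponentStates(pipelineStatus map[string]string) map[string]State {
	states := make(map[string]State)
	for pipelineID, status := range pipelineStatus {
		idx := strings.Index(pipelineID, componentPrefix)
		if idx < 0 {
			continue
		}
		componentID := pipelineID[idx+len(componentPrefix):]
		if s := toState(status); severity(s) > severity(states[componentID]) {
			states[componentID] = s
		}
	}
	return states
}

func main() {
	states := ComponentStates(map[string]string{
		"logs/_agent-component/filestream-default":    "StatusOK",
		"metrics/_agent-component/filestream-default": "StatusRecoverableError",
		"traces/user-defined":                         "StatusOK",
	})
	ids := make([]string, 0, len(states))
	for id := range states {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%s: %s\n", id, states[id])
	}
}
```

Keeping the worst status per component makes the result independent of map iteration order, which matters when one component owns several pipelines.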

The only new logic this PR introduces lives in the new manager's main loop, and has to do with how updates and configurations are moved around. The rest is either existing logic moved to a new location or new tests for that old logic.

Why is it important?

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Related issues

@swiatekm swiatekm added backport-8.19 Automated backport to the 8.19 branch skip-changelog chore Tasks that just need to be done, they are neither bug, nor enhancements labels Jun 16, 2025
@swiatekm swiatekm changed the title Chore/otel component manager Add otel component manager to the coordinator Jun 16, 2025

// MergedOtelConfig returns the merged Otel collector configuration, containing both the plain config and the
// component config.
MergedOtelConfig() *confmap.Conf

Contributor Author

Note: This is temporary, and is only necessary for diagnostics. The plan is to move all the Otel-related diagnostic hooks into the OtelComponentManager. Afterwards, there won't be any reason for the coordinator to track this value.
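
To make "merged" concrete, the following Go sketch shows one way a plain collector configuration and a component-derived configuration could be combined. It is an assumption-laden illustration, not the PR's implementation: it uses plain `map[string]any` values and a hypothetical `mergeMaps` helper to stay dependency-free, whereas the collector's `confmap.Conf` type provides its own `Merge` method. The receiver names in the example are also made up.

```go
package main

import "fmt"

// mergeMaps returns a new map containing dst overlaid with src.
// Nested maps present on both sides are merged recursively; for any other
// conflict the value from src wins. Neither top-level input is modified,
// but nested maps that appear on only one side are shared, not copied.
func mergeMaps(dst, src map[string]any) map[string]any {
	out := make(map[string]any, len(dst)+len(src))
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		srcMap, srcIsMap := v.(map[string]any)
		dstMap, dstIsMap := out[k].(map[string]any)
		if srcIsMap && dstIsMap {
			out[k] = mergeMaps(dstMap, srcMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Hypothetical user-supplied collector configuration.
	plain := map[string]any{
		"receivers": map[string]any{"otlp": map[string]any{}},
	}
	// Hypothetical configuration generated from agent components.
	component := map[string]any{
		"receivers": map[string]any{"filebeatreceiver/example": map[string]any{}},
	}
	merged := mergeMaps(plain, component)
	fmt.Println(len(merged["receivers"].(map[string]any)), "receivers after merge")
}
```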

@swiatekm swiatekm force-pushed the chore/otel-component-manager branch from 379032c to 5275554 Compare June 17, 2025 09:43
Contributor

mergify bot commented Jun 18, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b chore/otel-component-manager upstream/chore/otel-component-manager
git merge upstream/main
git push upstream chore/otel-component-manager

@swiatekm swiatekm force-pushed the chore/otel-component-manager branch 6 times, most recently from efa7f79 to fb1b5c6 Compare June 25, 2025 13:08
@swiatekm swiatekm marked this pull request as ready for review June 25, 2025 15:25
@swiatekm swiatekm requested a review from a team as a code owner June 25, 2025 15:25
@swiatekm swiatekm added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jun 25, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm added enhancement New feature or request and removed chore Tasks that just need to be done, they are neither bug, nor enhancements labels Jun 25, 2025
@swiatekm swiatekm requested review from cmacknz and blakerouse June 25, 2025 15:26
Comment on lines +121 to +147
for ctx.Err() == nil {
	select {
	case <-ctx.Done():
		break

	case collectorCfg := <-m.collectorUpdateChan:
		if err := m.handleCollectorUpdate(collectorCfg); err != nil {
			m.reportError(ctx, err)
		}

	case componentModel := <-m.componentUpdateChan:
		if err := m.handleComponentUpdate(componentModel); err != nil {
			m.reportError(ctx, err)
		}

	case err := <-m.otelManager.Errors():
		m.reportError(ctx, err)

	case otelStatus := <-m.otelManager.Watch():
		componentUpdates, err := m.handleOtelStatusUpdate(otelStatus)
		if err != nil {
			m.reportError(ctx, err)
		}
		m.sendCollectorStatusUpdate(ctx)
		m.sendComponentStateUpdates(ctx, componentUpdates)
	}
}
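
For readers less familiar with this pattern, here is a minimal, self-contained Go sketch of a channel-driven event loop that stops on context cancellation. The names `runLoop` and `runOnce` and the string payloads are hypothetical and stand in for the manager's real channels and handlers. One detail worth knowing: a bare `break` inside `select` only leaves the `select` statement, so the quoted loop relies on its `for ctx.Err() == nil` condition to terminate, while the sketch uses `return` instead.

```go
package main

import (
	"context"
	"fmt"
)

// runLoop is a hypothetical, simplified event loop: it receives updates,
// "handles" them, and forwards the result until the context is cancelled.
func runLoop(ctx context.Context, updates <-chan string, handled chan<- string) {
	for {
		select {
		case <-ctx.Done():
			return // leaves the loop; a bare break would only exit the select
		case u := <-updates:
			// Guard the send with ctx.Done() so the loop cannot block forever
			// when nobody is reading results during shutdown.
			select {
			case handled <- "handled:" + u:
			case <-ctx.Done():
				return
			}
		}
	}
}

// runOnce starts the loop, pushes a single update through it, and shuts it down.
func runOnce(update string) string {
	ctx, cancel := context.WithCancel(context.Background())
	updates := make(chan string)
	handled := make(chan string)
	done := make(chan struct{})
	go func() {
		defer close(done)
		runLoop(ctx, updates, handled)
	}()
	updates <- update
	result := <-handled
	cancel()
	<-done // wait for the loop goroutine to exit
	return result
}

func main() {
	fmt.Println(runOnce("config-1"))
}
```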
Contributor

The more I look at the newly introduced otel component manager, the more I think that it shouldn't exist and that every extra bit of logic here should become part of the existing otel manager. AFAICT, every channel or other input you have here already exists there, and moving the logic over seems straightforward. I think migrating the logic there will make the otel manager easier for us to maintain, debug, and extend, as we won't have another concurrent layer of channels to pass through. WDYT?

Contributor Author

I think we could do that, yeah. The current implementation is nice in that it's only concerned with translating to and from the component model, and then the OtelManager itself is only concerned with managing the otel collector, with no dependencies on the component model. But you're right that this separation may not be worth the mental overhead of the additional asynchronous behavior it introduces.

To be honest, one of the main reasons I did it this way was that it let me avoid interfering with #8248. Would you be ok with merging this PR as-is, and moving all of the logic into the otel manager after the dust settles?

Contributor

#8248 is ready to be merged; there was a hiccup with unit tests on Windows runners yesterday, but @cmacknz had approved it, so this is close 🙂 That said, how about avoiding this back and forth and starting with the transition of this code there?

Contributor Author

I'll see how much of a lift it is and get back to you. If it's simple enough, I don't mind doing it now rather than later.

Contributor

I believe that this has to go inside the otel manager either way. So yes, do have a look, but if it is too much of a lift alongside #8248, then this should go inside the otel manager as it is now on main, and #8248 will be adjusted to accommodate your changes.

Contributor Author

I did what you suggested in #8737. I'm not sure I feel better about it than this PR. OtelManager feels like it does too much with those changes imo. I don't have very strong feelings either way though, so please have a look and let me know what you think.

Contributor

mergify bot commented Jun 27, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b chore/otel-component-manager upstream/chore/otel-component-manager
git merge upstream/main
git push upstream chore/otel-component-manager

@swiatekm swiatekm force-pushed the chore/otel-component-manager branch from fb1b5c6 to 69d2900 Compare June 27, 2025 13:05
Contributor

mergify bot commented Jun 27, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b chore/otel-component-manager upstream/chore/otel-component-manager
git merge upstream/main
git push upstream chore/otel-component-manager

@swiatekm swiatekm force-pushed the chore/otel-component-manager branch from 69d2900 to 21a6306 Compare June 30, 2025 14:14
@elasticmachine
Collaborator

💚 Build Succeeded

History

cc @swiatekm

@leehinman leehinman requested a review from faec July 1, 2025 13:09
Contributor

@faec faec left a comment

This change looks great to me code-wise, though I think I agree with @pkoutsovasilis that it would be nicer to add the component management to the existing otel manager as in #8737. (But control plane consensus should take precedence.)

@swiatekm swiatekm mentioned this pull request Jul 4, 2025
4 tasks
@swiatekm swiatekm requested a review from pkoutsovasilis July 4, 2025 16:28
@swiatekm
Contributor Author

swiatekm commented Jul 4, 2025

This change looks great to me code-wise, though I think I agree with @pkoutsovasilis that it would be nicer to add the component management to the existing otel manager as in #8737. (But control plane consensus should take precedence.)

That's a second vote for the approach in #8737, so I'm going to move this one into draft and put the other one up for review instead.

@swiatekm swiatekm marked this pull request as draft July 4, 2025 16:29
Contributor

mergify bot commented Jul 8, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b chore/otel-component-manager upstream/chore/otel-component-manager
git merge upstream/main
git push upstream chore/otel-component-manager

@swiatekm
Contributor Author

#8737 was merged, closing this.

@swiatekm swiatekm closed this Jul 14, 2025
@swiatekm swiatekm deleted the chore/otel-component-manager branch July 14, 2025 12:07
Labels
backport-8.19 Automated backport to the 8.19 branch enhancement New feature or request skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
4 participants