Skip to content

Conversation

martinjt
Copy link
Contributor

@martinjt martinjt commented Mar 26, 2025

**Closes #602 **

PR Checklist

  • Created a feature/dev branch in your fork (vs. submitting directly from a commit on main)
  • Based off latest main branch of toolkit
  • PR doesn't include merge commits (always rebase on top of our main, if needed)
  • New integration
    • Docs are written
    • Added description of major feature to project description for NuGet package (4000 total character limit, so don't push entire description over that)
  • Tests for the changes have been added (for bug fixes / features) (if applicable)
  • Contains NO breaking changes
  • Every new API (including internal ones) has full XML docs
  • Code follows all style conventions

Other information

@martinjt
Copy link
Contributor Author

@dotnet-policy-service agree

@martinjt
Copy link
Contributor Author

I'm not sure how ready this is yet. I've been playing with the API a bit to give it enough flexibility around the certificates, images and config files.

Feedback very welcome, I wanted to get this up as is before I added any tests and worked on the documentation, etc.

Copy link
Member

@aaronpowell aaronpowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense that, rather than having a default collector image set, that it's a required property to be provided and there's a method that will set the defaults for you?

So, AddOpenTelemetryCollector is used if you want to specify a collector, whereas AddOpenTelemetryDefaultCollection uses the one that you have currently specified as a default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File no longer needed

@martinjt
Copy link
Contributor Author

Interesting approach, I'm honestly not sure, but maybe a .WithDefaultImage() might be better?

I was thinking that the approach I showed was more common, but happy if not.

I kinda wanted a minimal default command though.

@aaronpowell
Copy link
Member

Interesting approach, I'm honestly not sure, but maybe a .WithDefaultImage() might be better?

I was thinking that the approach I showed was more common, but happy if not.

I kinda wanted a minimal default command though.

The problem with a method like WithDefaultImage is that you could setup one without any image specified which would be a runtime error, which would be a less than desirable outcome.

@github-actions github-actions bot added the Stale label Apr 6, 2025
@aaronpowell aaronpowell added awaiting response Waiting for the author of the issue to provide more information or answer a question and removed Stale labels Apr 8, 2025
@martinjt
Copy link
Contributor Author

martinjt commented Apr 9, 2025

I agree it's not ideal, but I'd prefer to not have them need to know what the image link is to collector. So I believe the best option is to use the default image we recommend in Otel right now (which is latest) when the extension is called without parameters.

@martinjt martinjt force-pushed the otel-collector-component branch from 7ac9a9d to a38c078 Compare August 5, 2025 20:39
@martinjt
Copy link
Contributor Author

martinjt commented Aug 5, 2025

I'm coming back at this now @aaronpowell, with a hope to move it forward so I don't need to maintain a separate one (happy to be codeowner here).

On the devcerts, it seems those issues haven't got a resolution, would it be ok to leave the devcerts in this project for now, and consolidate/move later? This project would actually fix both of those issues as the users could run the non-ssl collector and forward their data through the collector as a way to unblock them.

The big outstanding issue is the collector image model, can you let me know what you'd suggest there. It should be latest since there isn't a v1 of the collector yet, and the advice from the collector team is to always use latest.

@aaronpowell
Copy link
Member

On the devcerts, it seems those issues haven't got a resolution, would it be ok to leave the devcerts in this project for now, and consolidate/move later? This project would actually fix both of those issues as the users could run the non-ssl collector and forward their data through the collector as a way to unblock them.

If you can pop it in the Shared folder we can then add it as a linked file into several projects that could benefit from it in the CT as well.

The big outstanding issue is the collector image model, can you let me know what you'd suggest there. It should be latest since there isn't a v1 of the collector yet, and the advice from the collector team is to always use latest.

What's the release frequency that they are going with? The main reason that we pin to a version is so that we don't risk any unexpected changes to command line args or other behaviour getting introduced that impact someone who hasn't opt-ed in to using a newer version. If they are releasing new versions daily, then pinned isn't ideal, but generally speaking, pinning a version is going to be preferred, and people can always specify a version override if we don't release an updated package quick enough (Ollama is a good example of where we get caught on that front).

@martinjt
Copy link
Contributor Author

martinjt commented Aug 6, 2025

The release cycle for the collector is every 2 weeks. The compatibility issues wouldn't be commandline args, its more likely the config of collector components that might change, but we haven't done a breaking change in about 6 months and those are expected to be the last.

The biggest issue with pinning a version is security issues, since the amount of components in the collector means that the updates contain those updates regularly.

I'll happily take guidance on what the best way to get this published is.

@martinjt martinjt force-pushed the otel-collector-component branch from 0ae0e03 to 7c26b63 Compare August 7, 2025 20:57
@martinjt
Copy link
Contributor Author

martinjt commented Aug 7, 2025

I've done a few major updates, so it probably needs a full re-review.

  • Implemented inbound SSL, which can also be forced off (this helps fix some of the clients that can't use HTTPS because of the certificate issues)
  • Outbound SSL was removed, it won't work until the Certificate supports host.docker.internal or the *.localhost stuff works.
  • Moved to using .WithConfig("config.yaml"), I'm not sure how I'd fail startup if they don't provide that, I'm happy to implement it.
  • .WithConfig() supports multiple configs
  • Moved to using Random ports supplied by Aspire
  • Added SSL certificates for HTTP and gRPC
  • Made the Environment variables detect the protocol that the App wants and use that (useful as Node, by default, supports HTTP not gRPC)
  • Support for both of the endpoint environment variable names ASPIRE_DASHBOARD_OTLP_ENDPOINT_URL and DOTNET_DASHBOARD_OTLP_ENDPOINT_URL
  • Updated DevCerts to not try and add the certificates twice (this will only be useful when the Outbound SSL functionality is available.

This does require the user to implement insecure_skip_verify, but I can't find a way around that until the new certificate work to make the authority match.

@martinjt martinjt marked this pull request as ready for review August 7, 2025 21:35
@Copilot Copilot AI review requested due to automatic review settings August 7, 2025 21:35
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds an OpenTelemetry Collector integration to the .NET Aspire Community Toolkit, enabling users to add an OpenTelemetry Collector container to their Aspire applications for observability data collection and forwarding.

  • Introduces a new hosting integration with configurable settings for the OpenTelemetry Collector
  • Provides automatic HTTPS certificate handling for secure connections in development mode
  • Implements environment variable hooks to automatically configure project resources to send telemetry data to the collector

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Shared/DevCertHostingExtensions.cs Shared utility for injecting ASP.NET Core HTTPS dev certificates into resources via environment variables
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/OpenTelemetryCollectorSettings.cs Configuration settings class for the OpenTelemetry Collector
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/EnvironmentVariableHook.cs Lifecycle hook to automatically configure project resources with collector endpoints
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector.csproj Project file for the OpenTelemetry Collector integration
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/CollectorResource.cs Resource definition for the OpenTelemetry Collector container
src/CommunityToolkit.Aspire.Hosting.OpenTelemetryCollector/CollectorExtensions.cs Extension methods for adding and configuring the OpenTelemetry Collector
CommunityToolkit.Aspire.slnx Solution file updated to include the new project

@davidfowl davidfowl removed the awaiting response Waiting for the author of the issue to provide more information or answer a question label Aug 9, 2025
Copy link
Member

@aaronpowell aaronpowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change here would be to use the event callbacks of 9.4.

Then would you be able to introduce an example project so I can get a better idea of how this works.

@martinjt
Copy link
Contributor Author

I've done everything suggested other than moving the EnvironmentVariableHook. Since this needs to occur after all the resources had endpoints allocated, I'm struggling to see how I might achieve that.

I did have a thought of using OnAfterEndpointsAllocatedEvent, but that's deprecated, and then thought of using OnResourceEndpointsAllocated event on the collector resource, then access each resource to add that same event to add the endpoint, but I'm not sure how to access the eventing model from there?

@aaronpowell
Copy link
Member

Yes, sorry, I was mistaken, OnResourceEndpointsAllocated is the event that you'd want on the collector resource.

Is the logic that you'd have multiple collectors registered, or is it a singleton sort of thing?

@martinjt
Copy link
Contributor Author

The collector is a service that intercepts the OTLP telemetry from all the other services, so it intercepts all the other project resources and overrides the OTLP environment variable to point it to the collector instead.

For that reason, I thought that using the resource focused events wasn't something I could do.

This is instead of telling the user to access every resource and reference it to the collector.

@martinjt
Copy link
Contributor Author

@davidfowl let me know what you think of the new DevCerts code. I used yours as a base and updated it with optional parameters.

If thats OK, I'll build out the examples.

@martinjt martinjt force-pushed the otel-collector-component branch from 5d2da46 to 5f9f442 Compare August 29, 2025 17:26
@martinjt
Copy link
Contributor Author

Other than tests, I think this is pretty much there now @aaronpowell @davidfowl

I've struggled with the tests because the main thing to test is the AppForwarding (testing that all OTHER resources have their OTEL endpoint environment variables updated to point to the collector). This means that the endpoints need to be allocated in the collector before the event is fired.

The only other tests I can think of that would be beneficial are the environment variables for the Dashboard endpoints, and the devcerts, I'm not sure how we can test those.

If you've got pointers on the tests then please let me know!

@martinjt
Copy link
Contributor Author

Looks like it should all be resolved now.

@aaronpowell
Copy link
Member

Tests need to be added to the workflow file, since we run a job-per-test. Check https://github.com/CommunityToolkit/Aspire/actions/runs/17362288128/job/49289169742?pr=603#step:4:11

Copy link
Member

@aaronpowell aaronpowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'd like to see is more tests included.

I understand that it's hard to test the running of this integration, so instead, let's focus the tests on the API surface area and that the things it does are what we expect.

We can have tests that the appropriate annotations are setup and that the devcert stuff can be configured (again, probably looking at annotations, maybe at the additional resource is included).

@martinjt
Copy link
Contributor Author

martinjt commented Sep 1, 2025

I've added some more tests. I don't think we'll be able to test the DevCerts because it requires run mode. This is because the annotations being added are based on the output from the DevCerts extraction.

I suppose we could change the code to make that do something "dummy" during non-run mode? However, that doesn't feel like the right thing to do?

Copy link
Member

@aaronpowell aaronpowell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a bunch more tests which sees the coverage boosted to a bit over 60% (which is our target).

@martinjt can you have a look over the tests to see if they make sense. Also, are you going to go down as the code owner, if so, can you add yourself to the CODEOWNERS file and I'll sort out the permissions.

</PropertyGroup>

<ItemGroup>
<ProjectReference Include="../CommunityToolkit.Aspire.Testing/CommunityToolkit.Aspire.Testing.csproj" />
Copy link
Contributor

@davidfowl davidfowl Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..\
../

😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go complain to the editor teams - I added the reference via devkit on my devcontainer 😝

@aaronpowell aaronpowell merged commit b72e5d5 into CommunityToolkit:main Sep 5, 2025
104 checks passed
if (!settings.ForceNonSecureReceiver && isHttpsEnabled && builder.ExecutionContext.IsRunMode && builder.Environment.IsDevelopment())
{
resourceBuilder.RunWithHttpsDevCertificate();
var certFilePath = Path.Combine(DevCertHostingExtensions.DEV_CERT_BIND_MOUNT_DEST_DIR, DevCertHostingExtensions.CERT_FILE_NAME);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is an issue with these two lines. On Windows, Path.Combine uses two backslashes, which causes the collector container to fall-over on startup, complaining about yaml format. I've tried replacing the line with a string.Join('/', ...) to force it to use forward slashes, which works correctly, e.g.

var certFilePath = string.Join('/', DevCertHostingExtensions.DEV_CERT_BIND_MOUNT_DEST_DIR, DevCertHostingExtensions.CERT_FILE_NAME);

Not the most elegant approach, but it has fixed the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OpenTelemetry Collector extension
4 participants