[Container Tunnel] OtlpEndpointReferenceGatherer hangs forever for containers with OtlpExporterAnnotation when no containers are tunnel-dependent #15041

@ElanHasson

Description

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Any container with OtlpExporterAnnotation (e.g. Keycloak, which gets it from AddKeycloak()'s internal .WithOtlpExporter() call) silently hangs during container creation. The Container object is never submitted to DCP — no errors are logged anywhere.

Minimal repro:

var builder = DistributedApplication.CreateBuilder(args);
builder.AddKeycloak("keycloak");
builder.Build().Run();

Root Cause

Introduced by PR #14557 (0b5acf03 — "Enable container tunnel by default"). There is a logic mismatch between three components in DcpExecutor:

Step 1 — PrepareServicesAsync detects OtlpExporterAnnotation on any container and adds the Aspire dashboard to the host dependencies list, creating container-network services for the dashboard endpoints (DcpExecutor.cs#L1250-L1263):

// OTLP exporters do not refer to the OTLP ingestion endpoint via EndpointReference...
if (containers.Any(c => c.TryGetAnnotationsOfType<OtlpExporterAnnotation>(out _)))
{
    var maybeDashboard = _model.Resources.Where(...)
    if (maybeDashboard is HostResourceWithEndpoints dashboardResource)
    {
        hostDependencies.Add(dashboardResource);  // Dashboard added as host dependency
    }
}

Step 2 — GetContainerCreationSetsAsync classifies each container as regular vs tunnel-dependent by checking if it has EndpointReference dependencies on host resources (DcpExecutor.cs#L3007-L3015). But OTLP env vars use HostUrl (not EndpointReference), so no container is classified as tunnel-dependent.

Step 3 — createTunnel task is skipped because tunnelDependent is empty (DcpExecutor.cs#L231). This means AddAllocatedEndpointInfo(executables, AllocatedEndpointsMode.ContainerTunnel) is never called — the dashboard's container-network endpoint is never allocated.

Step 4 — OtlpEndpointReferenceGatherer runs during container creation (DcpExecutor.cs#L2131). For containers with OtlpExporterAnnotation, it creates an EndpointReference to the dashboard OTLP endpoint in the container network context and calls GetValueAsync():

var endpointReference = new EndpointReference(dashboardResource, grpcEndpoint, resourceNetwork);
// ...
var url = await endpointReference.GetValueAsync(vpc, cancellationToken).ConfigureAwait(false);

This calls GetAllocatedEndpointAsync(networkID) → ValueSnapshot.GetValueAsync(), which waits on a TaskCompletionSource that never completes because the container-network endpoint was never allocated (Step 3 was skipped).
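To illustrate the failure mode in isolation, here is a minimal sketch that does not use any Aspire types. The class and method names (NeverCompletedDemo, GetNeverAllocatedValueAsync, TimesOutAsync) are hypothetical stand-ins, not Aspire APIs. It shows that awaiting a TaskCompletionSource that nobody completes produces no error at all, and that bounding the wait with Task.WaitAsync (available in .NET 6 and later) is one way to turn the silent hang into a visible timeout:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the stuck endpoint value; not Aspire code.
public static class NeverCompletedDemo
{
    // Returns a task backed by a TaskCompletionSource that is never completed,
    // mimicking a value whose endpoint is never allocated.
    public static Task<string> GetNeverAllocatedValueAsync()
    {
        var tcs = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        return tcs.Task;
    }

    // Awaits the pending task with an upper bound.
    // Returns true if the wait timed out, false if the task completed in time.
    public static async Task<bool> TimesOutAsync(Task<string> pending, TimeSpan timeout)
    {
        try
        {
            await pending.WaitAsync(timeout).ConfigureAwait(false);
            return false;
        }
        catch (TimeoutException)
        {
            return true;
        }
    }
}
```

Without the WaitAsync bound, the await on the pending task simply never resumes, which matches the "no errors are logged anywhere" symptom described above.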

Summary

| Component | What it does | The problem |
|---|---|---|
| PrepareServicesAsync | Adds dashboard to host deps when any container has OtlpExporterAnnotation | Creates container-network services for dashboard |
| GetContainerCreationSetsAsync | Classifies containers by EndpointReference dependencies | OTLP uses HostUrl, not EndpointReference → all containers classified as regular |
| createTunnel task | Allocates container-network endpoints for host resources | Skipped because no tunnel-dependent containers |
| OtlpEndpointReferenceGatherer | Resolves dashboard OTLP endpoint in container network context | Hangs forever because the endpoint is never allocated |
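The interaction between the four steps can be written down as a toy model. This is a sketch for reasoning only: ToyContainer and TunnelMismatchModel are hypothetical names, and the booleans stand in for the real checks in DcpExecutor, which are more involved.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of a container; not an Aspire type.
public sealed record ToyContainer(
    string Name,
    bool HasOtlpExporterAnnotation,
    bool HasHostEndpointReference);

public static class TunnelMismatchModel
{
    // Step 1: the dashboard becomes a host dependency when any container
    // carries the OTLP exporter annotation.
    public static bool DashboardIsHostDependency(IEnumerable<ToyContainer> containers) =>
        containers.Any(c => c.HasOtlpExporterAnnotation);

    // Steps 2 and 3: the tunnel (and with it the endpoint allocation) only runs
    // when at least one container is tunnel-dependent, and that classification
    // looks at EndpointReference dependencies only.
    public static bool TunnelIsCreated(IEnumerable<ToyContainer> containers) =>
        containers.Any(c => c.HasHostEndpointReference);

    // Step 4: a container with the annotation waits for the dashboard endpoint,
    // which is never allocated if the tunnel step was skipped.
    public static bool WouldHang(IReadOnlyCollection<ToyContainer> all, ToyContainer container) =>
        container.HasOtlpExporterAnnotation && !TunnelIsCreated(all);
}
```

In this model a lone Keycloak container (annotation present, no EndpointReference to a host resource) hangs, while the same container no longer hangs once any other container in the app is tunnel-dependent, which is consistent with the condition in the issue title.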

Affected Resources

Any container resource that has OtlpExporterAnnotation. Currently this includes Keycloak (via AddKeycloak → internal .WithOtlpExporter() call), and any container where the user explicitly calls .WithOtlpExporter().

Workaround

Strip the OtlpExporterAnnotation from affected containers in a BeforeStartEvent handler. The OTLP environment variables still work — they're set by a separate EnvironmentCallbackAnnotation that uses HostUrl and doesn't depend on endpoint allocation:

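// `keycloak` is the resource builder returned by builder.AddKeycloak("keycloak").
builder.Eventing.Subscribe<BeforeStartEvent>((_, _) =>
{
    // Copy the matches to a list first so the collection is not modified while it is enumerated.
    foreach (var a in keycloak.Resource.Annotations.OfType<OtlpExporterAnnotation>().ToList())
    {
        keycloak.Resource.Annotations.Remove(a);
    }

    return Task.CompletedTask;
});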
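If several containers are affected, the same workaround can be applied to every container in the model instead of naming each one. The following is an untested sketch of an AppHost Program.cs, assuming the Aspire.Hosting and Aspire.Hosting.Keycloak packages are referenced and that BeforeStartEvent.Model and the IsContainer() extension behave as in current Aspire releases:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Aspire.Hosting;
using Aspire.Hosting.ApplicationModel;

var builder = DistributedApplication.CreateBuilder(args);
builder.AddKeycloak("keycloak");

builder.Eventing.Subscribe<BeforeStartEvent>((e, _) =>
{
    // Only touch container resources; other resource types are left alone.
    foreach (var resource in e.Model.Resources.Where(r => r.IsContainer()))
    {
        // Snapshot the matches so removal does not invalidate the enumeration.
        foreach (var a in resource.Annotations.OfType<OtlpExporterAnnotation>().ToList())
        {
            resource.Annotations.Remove(a);
        }
    }

    return Task.CompletedTask;
});

builder.Build().Run();
```

This trades precision for convenience: it removes the annotation from all containers, including any that would not have hung, so it is best treated as a temporary measure until the regression is fixed.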

DCP Log Evidence

With DCP_DIAGNOSTICS_LOG_LEVEL=debug:

# No Container object is ever created for keycloak:
grep "ContainerReconciler" dcp-logs/*run-controllers*.log | grep "keycloak"
# Returns nothing

# No container is ever scheduled to start:
grep "Scheduling container start" dcp-logs/*run-controllers*.log | grep "keycloak"
# Returns nothing

DCP does create Service objects for keycloak's endpoints (http, https, management) — the services are set up but there is no container to back them.

Regression Source

Introduced by PR #14557 (0b5acf03 — "Enable container tunnel by default"), which added OtlpEndpointReferenceGatherer and refactored container creation into regular vs tunnel-dependent sets.

Previously attributed to PR #14663 — this is incorrect. The Aspire.Hosting.Keycloak package code is identical between the GOOD and BAD versions; the bug is entirely in Aspire.Hosting core.

Working version: 13.3.0-preview.1.26124.2
Broken version: 13.3.0-preview.1.26124.16

Expected Behavior

Running an AppHost with builder.AddKeycloak("keycloak") should start a Keycloak Docker container, visible in docker ps and in the Aspire dashboard as a running resource.

Steps To Reproduce

  1. Clone the minimal repro: https://github.com/ElanHasson/aspire-keycloak-bug-repro
  2. Run the AppHost:
    cd KeycloakBug.AppHost
    DCP_DIAGNOSTICS_LOG_LEVEL=debug DCP_DIAGNOSTICS_LOG_FOLDER=./dcp-logs DCP_PRESERVE_EXECUTABLE_LOGS=true dotnet run
  3. Observe:
    • The Aspire dashboard starts
    • The keycloak resource appears but never starts
    • docker ps shows no keycloak container
    • No errors in any logs
    • DCP logs confirm no Container object was submitted

Exceptions (if any)

No exceptions. ValueSnapshot<T>.GetValueAsync() waits silently on a TaskCompletionSource that never completes. There is no timeout and no error.

.NET Version info

.NET SDK:
 Version:           10.0.103
 Commit:            c2435c3e0f
 Workload version:  10.0.102
 MSBuild version:   18.0.11+c2435c3e0

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  24.04
 OS Platform: Linux
 RID:         linux-x64

Host:
  Version:      10.0.3
  Architecture: x64
  Commit:       c2435c3e0f

Anything else?
