Skip to content

Allow logs to fully flush before disposing of DcpExecutor. #8423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

afscrome
Copy link
Contributor

Description

Updated DcpExecutor to wait for log streams to fully flush during StopAsync, rather than immediately cancelling them. Log streams should shut down once the associated resource is shut down and the logs have been processed.

Previous behaviour could cause logs from the very end of a process to be lost as the log streams could be shut down before the logs were forwarded from DCP to ILogger. This was particularly problematic for DistributedApplicationBuilderTests failing whilst waiting for a resource to enter a particular state as the final lines of error are usually the ones you want to work out why a container / executable exited / failed to start.

Fixes #8206

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
      • If yes, have you done a threat model and had a security review?
        • Yes
        • No
    • No
  • Does the change require an update in our Aspire docs?

@afscrome afscrome requested a review from ReubenBond as a code owner March 30, 2025 21:48
@github-actions github-actions bot added the area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication label Mar 30, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 30, 2025
@afscrome afscrome force-pushed the dont-swallow-final-logs branch from fe13058 to 978ccb0 Compare March 30, 2025 23:14
Updated `DcpExecutor` to wait for log streams to fully flush during `StopAsync`, rather than immediately cancelling them.
 Log streams should shut down once the associated resource is shut down and the logs have been processed.

Previous behaviour could cause logs from the very end of a process to be lost as the log streams could be shut down before the logs were forwarded from DCP to ILogger.
This was particularly problematic for DistributedApplicationBuilderTests failing whilst waiting for a resource to enter a particular state as the final lines of error are usually the ones you want to work out why a container / executable exited / failed to start.

Fixes dotnet#8206
@afscrome afscrome force-pushed the dont-swallow-final-logs branch from 978ccb0 to 1b678fd Compare March 30, 2025 23:17
@danmoseley danmoseley requested a review from karolz-ms April 4, 2025 18:19
@danmoseley
Copy link
Member

Seems reasonable but it's not my area. Maybe Karol can review

if (_resourceWatchTask is { } resourceTask)
{
tasks.Add(resourceTask);
}

foreach (var (_, (cancellation, logTask)) in _logStreams)
{
cancellation.Cancel();
cancellationToken.Register(cancellation.Cancel);
Copy link
Member

@karolz-ms karolz-ms Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am not understanding the change, but I am not sure it has the effect you intended.

If you want to stop DCP and then stop the log streams, you should do just that, i.e. move the call that waits on the log stream tasks to end so that it happens after the call to stop DCP.

The cancellation token passed to StopAsync() is a cancellation token for the whole stopping operation. It may be cancelled after a long time, or never (CancellationToken.None), meaning that with your proposed change the log streams may never be cancelled, depending on what the cancellation token is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad, I meant to abandon this PR - see #8206 (comment). Whilst I got my failing test passing, I broke others as I stopped Aspire from shutting down any still running resources - unintended effects as you said.

I assume this needs some changes on DCP side to ensure when DCP is shutting down, it fully flushes logs before finishing the shutdown process (although without seeing DCP code it's hard for me to tell).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely no worries. We are in huge debt to you for all the help you have given us with making Aspire better.

I think though you were on the right track with this change. Log capture from services and streaming them to clients is asynchronous in DCP, so there is a bit of lag, but it will keep going until the API server shuts down, which is after the resource cleanup is complete. So at least part of the fix IMO should be to stop cancelling the log streams before app host requests DCP shutdown.

I also agree though that the second part might be fine-tuning how we handle the logs during shutdown inside DCP. Specifically, in the current implementation, there is no delay between the end of resource cleanup and the shutdown of the DCP/DCP API server. So there might be cases when logs are captured, resource (Executable or Container) is stopped, but a portion of the logs are not streamed to client before the stream is closed from DCP side because DCP is going away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Final container logs missing when disposing DistributedApplicationTestingBuilder
3 participants