Dispose the certificate chain elements with the chain #62531

jashook
Contributor

@jashook jashook commented Jul 1, 2025

Dispose the certificate chain elements within the chain

This PR should fix a series of native memory leaks we have seen at Roblox, caused by leaking the certificates on the chain. (fingers crossed)

  • You've read the Contributor Guide and Code of Conduct.
  • You've included unit or integration tests for your change, where applicable.
  • You've included inline docs for your change, where applicable.
  • There's an open issue for the PR that you are making. If you'd like to propose a new feature or change, please open an issue to discuss the change or find an existing issue.

@github-actions github-actions bot added the area-auth Includes: Authn, Authz, OAuth, OIDC, Bearer label Jul 1, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 1, 2025
@jashook jashook marked this pull request as ready for review July 2, 2025 00:42
@jashook jashook requested a review from halter73 as a code owner July 2, 2025 00:42
@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Jul 9, 2025
@jashook
Contributor Author

jashook commented Jul 9, 2025

Gentle ping @halter73

@jashook
Contributor Author

jashook commented Jul 11, 2025

cc @janvorli

@janvorli
Member

cc: @rzikm

Member

@rzikm rzikm left a comment

Yep, this is in line with what we do in SslStream

https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Security/src/System/Net/Security/SslStream.Protocol.cs#L1148-L1163

Note that this PR doesn't fix a "leak" per se. The cert instances will be eventually collected by GC and finalization will ensure the native resources are released. However, explicitly disposing the certs is definitely an improvement.
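
As an illustration of the pattern discussed here, the following is a minimal sketch, not the code from this PR or from SslStream. The `ChainCleanup` class, the `DisposeChainAndElements` helper, and the throwaway self-signed certificate are hypothetical names introduced only for this example. It assumes, as the discussion implies, that disposing an `X509Chain` does not dispose the `X509Certificate2` instances held by its elements.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

static class ChainCleanup
{
    // Hypothetical helper (not the PR's code): disposes every certificate
    // held by the chain's elements, then disposes the chain itself.
    // Returns the number of element certificates that were disposed.
    public static int DisposeChainAndElements(X509Chain chain)
    {
        int count = chain.ChainElements.Count;
        for (int i = 0; i < count; i++)
        {
            // Assumption stated above: X509Chain.Dispose() leaves these
            // X509Certificate2 instances alive, so release each explicitly.
            chain.ChainElements[i].Certificate.Dispose();
        }

        chain.Dispose();
        return count;
    }
}

static class Program
{
    static void Main()
    {
        // Throwaway self-signed certificate, used only so there is a chain to build.
        using RSA key = RSA.Create(2048);
        var request = new CertificateRequest(
            "CN=example.test", key, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        using X509Certificate2 certificate = request.CreateSelfSigned(
            DateTimeOffset.UtcNow.AddDays(-1), DateTimeOffset.UtcNow.AddDays(30));

        var chain = new X509Chain();
        chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
        bool trusted = chain.Build(certificate);

        // Release the native handles now instead of waiting for finalization.
        int disposed = ChainCleanup.DisposeChainAndElements(chain);
        Console.WriteLine($"trusted={trusted}, element certificates disposed={disposed}");
    }
}
```

Disposing eagerly means the native certificate handles are released when validation finishes, rather than whenever the finalizer thread gets to them.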

@rzikm
Member

rzikm commented Jul 11, 2025

@jashook
Contributor Author

jashook commented Jul 12, 2025

> Note that this PR doesn't fix a "leak" per se. The cert instances will be eventually collected by GC and finalization will ensure the native resources are released

Yes and no. For us, the rate at which we do TLS handshakes outpaces the rate of GC, which leads to unbounded memory growth until the GC collects aggressively, at which point the application fails its health check and dies.

As in: yes, you are correct that this is not a native leak in the runtime, but it is effectively a managed leak of native resources, which leads to the application degrading and restarting.

cc @leculver

@jashook
Contributor Author

jashook commented Jul 12, 2025

Will address the comment. Can we take this into net8?

@jashook
Contributor Author

jashook commented Jul 12, 2025

@jashook
Contributor Author

jashook commented Jul 12, 2025

Seems like this can be merged. cc @halter73

@jashook
Contributor Author

jashook commented Jul 12, 2025

:shipit:

@jashook
Contributor Author

jashook commented Jul 17, 2025

Gentle ping

Member

@MackinnonBuck MackinnonBuck left a comment

Looks like this pattern is followed elsewhere, so LGTM. Just a couple nits to better follow the coding style of this repo.

cc @halter73 in case you want to take a look as well

@MackinnonBuck
Member

Thanks for the contribution, @jashook!

@MackinnonBuck MackinnonBuck merged commit ee03bbd into dotnet:main Jul 18, 2025
28 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the 10.0-preview7 milestone Jul 18, 2025
@jashook
Contributor Author

jashook commented Jul 21, 2025

Can we take this into net8? @MackinnonBuck

@jashook jashook deleted the jashook/dispose_full_certificate_chain_on_validate branch July 21, 2025 21:01
@jashook
Contributor Author

jashook commented Jul 28, 2025

Gentle ping @MackinnonBuck

@MackinnonBuck
Member

Thanks, @jashook. We'll consider this fix for servicing in .NET 8 and .NET 9.

cc @bartonjs in case you have thoughts about the severity of this issue.

@bartonjs
Member

The general severity is pretty low. As was already pointed out, the GC will eventually catch up and release all of the stuff. But, since you have an external person asking for it, that gives it a boost.

The risk is also pretty low. And the change looks sound.

I don't actually participate in the servicing reviews, but, if I did, since it has a requestor, is straightforward, and isn't blazing any trails... I'd vote yes.

@jashook
Contributor Author

jashook commented Jul 29, 2025

> As was already pointed out, the GC will eventually catch up and release all of the stuff.

Right, I responded to this. It is an incorrect way of viewing this problem. It leaves this up to a game of chance between the rate of connection establishment and the gen 0 GC count. In our case, we have a high rate of re-establishments which are not going away quickly, and we want to keep our gen 0-1 GC count as low as possible. There is a workaround; however, I guarantee that Roblox is not the only affected customer here.

> I don't actually participate in the servicing reviews, but, if I did, since it has a requestor, is straightforward, and isn't blazing any trails... I'd vote yes.

Could we tag who is needed to get this into the next shiproom discussion? Thank you!

Note for anyone else stumbling on this, we are working around this issue by using SslStreamCertificateContext to build the chain once and cache it.

@MackinnonBuck
Member

/backport to release/9.0

Contributor

Started backporting to release/9.0: https://github.com/dotnet/aspnetcore/actions/runs/16605934603

@MackinnonBuck
Member

/backport to release/8.0

Contributor

Started backporting to release/8.0: https://github.com/dotnet/aspnetcore/actions/runs/16605950943

Contributor

@MackinnonBuck backporting to "release/8.0" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Dispose the certificate chain elements with the chain
Applying: Fix the missing brace
Applying: Remove snarky comment.
Applying: Add another choice using based on review feedback
error: sha1 information is lacking or useless (src/Shared/CertificateGeneration/UnixCertificateManager.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0004 Add another choice using based on review feedback
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

MackinnonBuck added a commit that referenced this pull request Jul 29, 2025
* Dispose the certificate chain elements with the chain

* Fix the missing brace

* Remove snarky comment.

* Add another choice using based on review feedback

* Styling fixes

---------

Co-authored-by: Mackinnon Buck <[email protected]>
MackinnonBuck added a commit that referenced this pull request Jul 29, 2025
* Dispose the certificate chain elements with the chain

* Fix the missing brace

* Remove snarky comment.

* Add another choice using based on review feedback

* Styling fixes

---------

Co-authored-by: Mackinnon Buck <[email protected]>
@rzikm
Member

rzikm commented Jul 30, 2025

> Note for anyone else stumbling on this, we are working around this issue by using SslStreamCertificateContext to build the chain once and cache it.

@jashook would you mind elaborating a bit on how that is expected to help? The changed code paths are on remote cert validation paths, which don't create their own SslStreamCertificateContext.

If you mean that you use a cached SslStreamCertificateContext to specify local certificates, that is the recommended practice, especially if your app sees a higher volume of traffic (in addition to avoiding rebuilding the cert chain, it also enables TLS resumption on Linux). AFAIK, in the case of a server certificate, unless you use the certificate selection callback, the ASP.NET Core runtime will even cache the context for you automatically. Making sure that SslStreamCertificateContext is not created fresh for each connection is just another way to save CPU cycles and reduce the number of X509Certificate2 instances flying around, and it is orthogonal to the changes in this PR.
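
For readers unfamiliar with the caching practice mentioned in this comment, here is a minimal sketch of building an `SslStreamCertificateContext` once and reusing it for every connection. The `CachedServerTls` class and the self-signed certificate are hypothetical and exist only for this example; this is neither Roblox's workaround nor ASP.NET Core's internal caching, and it covers only the local (server) certificate, not remote certificate validation.

```csharp
using System;
using System.Net.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Hypothetical holder that builds the local certificate context a single time.
sealed class CachedServerTls
{
    private readonly SslStreamCertificateContext _context;

    public CachedServerTls(X509Certificate2 serverCertificate)
    {
        // The local chain is built here, once. offline: true asks the runtime
        // not to download missing intermediate certificates from the network.
        _context = SslStreamCertificateContext.Create(
            serverCertificate, additionalCertificates: null, offline: true);
    }

    // Every call hands out options that share the same cached context,
    // so no per-connection chain build happens for the server certificate.
    public SslServerAuthenticationOptions CreateOptions() =>
        new SslServerAuthenticationOptions { ServerCertificateContext = _context };
}

static class Program
{
    static void Main()
    {
        // Throwaway self-signed certificate with a private key, for illustration only.
        using RSA key = RSA.Create(2048);
        var request = new CertificateRequest(
            "CN=example.test", key, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        using X509Certificate2 certificate = request.CreateSelfSigned(
            DateTimeOffset.UtcNow.AddDays(-1), DateTimeOffset.UtcNow.AddDays(30));

        var tls = new CachedServerTls(certificate);
        SslServerAuthenticationOptions first = tls.CreateOptions();
        SslServerAuthenticationOptions second = tls.CreateOptions();

        Console.WriteLine(ReferenceEquals(
            first.ServerCertificateContext, second.ServerCertificateContext));
    }
}
```

In a real server, each accepted connection would pass its options to `SslStream.AuthenticateAsServerAsync`; the point of the sketch is only that the context instance is shared rather than recreated.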
