Fix: High idle CPU in RDS/Terminal Server environments (SCARD_E_INVALID_HANDLE)#445
Fix: High idle CPU in RDS/Terminal Server environments (SCARD_E_INVALID_HANDLE)#445DennisDyallo wants to merge 4 commits intodevelopfrom
Conversation
…ard listener (#434) When an RDS/Terminal Server session is disconnected, the Windows Smart Card Service invalidates existing SCARDCONTEXT handles. DesktopSmartCardDeviceListener called SCardGetStatusChange in a tight loop with the stale handle — WinSCard internally raises a C++ exception (CxxThrowException) for each call, pegging a CPU core. Fix: - Add SCARD_E_INVALID_HANDLE, SCARD_E_SYSTEM_CANCELLED, ERROR_BROKEN_PIPE to the UpdateContextIfNonCritical recovery switch - Add Thread.Sleep(1000) backoff after recovery to prevent secondary tight loop - Guard UpdateCurrentContext against failed SCardEstablishContext - Extract ISCardInterop interface enabling cross-platform unit tests without hardware Tests: - DesktopSmartCardDeviceListenerSCardErrorTests: cross-platform mock tests (Track B) - DesktopSmartCardDeviceListenerWindowsTests: Windows CPU regression test (Track A) measuring TotalProcessorTime before/after handle invalidation via real WinSCard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a high-idle-CPU regression in DesktopSmartCardDeviceListener seen in RDS/Terminal Server reconnect scenarios by handling invalidated smart-card contexts, adding recovery backoff, and improving testability via an injectable SCard interop abstraction.
Changes:
- Extracts the WinSCard/PCSC calls behind
ISCardInteropwith a productionSCardInteropimplementation to enable deterministic unit testing. - Extends non-critical error recovery to include
SCARD_E_INVALID_HANDLE,SCARD_E_SYSTEM_CANCELLED, andERROR_BROKEN_PIPE, and adds a 1s recovery backoff to prevent tight retry loops. - Adds cross-platform mock tests and Windows-only integration/regression tests (including CPU consumption measurement) covering the RDS invalid-handle scenario.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
Yubico.Core/src/Yubico/Core/Devices/SmartCard/DesktopSmartCardDeviceListener.cs |
Adds injectable SCard interop, expands recoverable error handling, and introduces recovery backoff + safer context refresh logic. |
Yubico.Core/src/Yubico/PlatformInterop/Desktop/SCard/ISCardInterop.cs |
Introduces an internal abstraction for the subset of SCard APIs the listener uses. |
Yubico.Core/src/Yubico/PlatformInterop/Desktop/SCard/SCardInterop.cs |
Provides the production ISCardInterop implementation delegating to NativeMethods. |
Yubico.Core/tests/Yubico/Core/Devices/SmartCard/DesktopSmartCardDeviceListenerSCardErrorTests.cs |
Adds cross-platform mock-based tests to validate recovery and throttling behavior. |
Yubico.Core/tests/Yubico/Core/Devices/SmartCard/DesktopSmartCardDeviceListenerWindowsTests.cs |
Adds Windows-only tests that invalidate a real context and validate CPU + recovery + disposal behavior. |
fix-rds-scard-invalid-handle.md |
Documents root cause analysis, fix details, and how to run/validate tests. |
| /// Internal constructor that accepts a test double for the SCard API surface. | ||
| /// </summary> | ||
| internal DesktopSmartCardDeviceListener(ISCardInterop scard) | ||
| { |
There was a problem hiding this comment.
The injected scard dependency is assigned without a null check. If a caller accidentally passes null (including future internal callers/tests), this will produce a NullReferenceException later in the listener thread and be harder to diagnose. Consider validating scard up-front (e.g., throw ArgumentNullException) before assigning to _scard.
| { | |
| { | |
| if (scard is null) | |
| { | |
| throw new ArgumentNullException(nameof(scard)); | |
| } |
| SCardContext context = GetListenerContext(listener); | ||
| // SCardReleaseContext with the raw IntPtr tells WinSCard the handle is gone. | ||
| // Subsequent SCardGetStatusChange calls using this handle will fail immediately | ||
| // with SCARD_E_INVALID_HANDLE and trigger WinSCard's internal C++ exception path. | ||
| NativeMethods.SCardReleaseContext(context.DangerousGetHandle()); | ||
| } |
There was a problem hiding this comment.
InvalidateListenerContext ignores the return value from SCardReleaseContext. If the call fails (e.g., context already replaced/disposed, service transition), the tests may proceed without actually invalidating the handle and could give false confidence. Consider checking the returned error code and either Assert success or Skip with a message when invalidation cannot be performed.
|
|
||
| Skip.IfNot(ListenerIsActive(listener), | ||
| "Smart Card service (SCardSvr) is not running on this machine."); | ||
|
|
||
| Thread.Sleep(300); | ||
| InvalidateListenerContext(listener); | ||
|
|
||
| // Let recovery fire once (1000ms sleep inside the listener thread). | ||
| Thread.Sleep(1500); | ||
|
|
||
| // Now dispose — must complete well within 8 seconds. | ||
| var stopwatch = Stopwatch.StartNew(); | ||
| var exception = Record.Exception(() => listener.Dispose()); | ||
| stopwatch.Stop(); | ||
|
|
||
| Assert.Null(exception); | ||
| Assert.True( | ||
| stopwatch.ElapsedMilliseconds < 5000, | ||
| $"Dispose took {stopwatch.ElapsedMilliseconds}ms after handle invalidation. " + | ||
| "Expected < 5000ms. The listener thread may be blocked in the recovery sleep."); |
There was a problem hiding this comment.
In RealWinSCard_WhenHandleInvalidatedThenDisposed_DisposalCompletesCleanly, the listener is not in a using (or try/finally). If Skip.IfNot(ListenerIsActive(listener), ...) triggers, the test exits via exception and the listener is never disposed, leaking the background thread/context for the rest of the test run. Consider creating the listener with using var and still measuring a manual Dispose() call (double-dispose should be safe), or wrapping the listener in a try/finally that disposes it on all paths.
| Skip.IfNot(ListenerIsActive(listener), | |
| "Smart Card service (SCardSvr) is not running on this machine."); | |
| Thread.Sleep(300); | |
| InvalidateListenerContext(listener); | |
| // Let recovery fire once (1000ms sleep inside the listener thread). | |
| Thread.Sleep(1500); | |
| // Now dispose — must complete well within 8 seconds. | |
| var stopwatch = Stopwatch.StartNew(); | |
| var exception = Record.Exception(() => listener.Dispose()); | |
| stopwatch.Stop(); | |
| Assert.Null(exception); | |
| Assert.True( | |
| stopwatch.ElapsedMilliseconds < 5000, | |
| $"Dispose took {stopwatch.ElapsedMilliseconds}ms after handle invalidation. " + | |
| "Expected < 5000ms. The listener thread may be blocked in the recovery sleep."); | |
| try | |
| { | |
| Skip.IfNot(ListenerIsActive(listener), | |
| "Smart Card service (SCardSvr) is not running on this machine."); | |
| Thread.Sleep(300); | |
| InvalidateListenerContext(listener); | |
| // Let recovery fire once (1000ms sleep inside the listener thread). | |
| Thread.Sleep(1500); | |
| // Now dispose — must complete well within 8 seconds. | |
| var stopwatch = Stopwatch.StartNew(); | |
| var exception = Record.Exception(() => listener.Dispose()); | |
| stopwatch.Stop(); | |
| Assert.Null(exception); | |
| Assert.True( | |
| stopwatch.ElapsedMilliseconds < 5000, | |
| $"Dispose took {stopwatch.ElapsedMilliseconds}ms after handle invalidation. " + | |
| "Expected < 5000ms. The listener thread may be blocked in the recovery sleep."); | |
| } | |
| finally | |
| { | |
| listener.Dispose(); | |
| } |
…try/finally leak fix - Add ArgumentNullException guard for injected ISCardInterop in internal constructor - Check SCardReleaseContext return value in test helper and Skip on failure - Wrap disposal test listener in try/finally to prevent resource leak on Skip Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CPM override - Replace Assert.NotEqual(..., string) with Assert.True(..., string) — xunit 2.9.3 has no string-message overload for NotEqual (that's xunit 3.x only) - Add Directory.Packages.props to .gitignore (local-only workaround that prevents a parent project's CPM config from breaking restore in this branch) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Test Results: Windows 2 files 2 suites 29s ⏱️ Results for commit a9964fe. ♻️ This comment has been updated with latest results. |
Test Results: Ubuntu 2 files 2 suites 55s ⏱️ Results for commit a9964fe. ♻️ This comment has been updated with latest results. |
Test Results: MacOS 4 files 4 suites 58s ⏱️ Results for commit a9964fe. ♻️ This comment has been updated with latest results. |
| using Yubico.PlatformInterop; | ||
|
|
||
| namespace Yubico.Core.Devices.SmartCard.UnitTests | ||
| { |
There was a problem hiding this comment.
[Collection("WindowsOnlyTests")] groups these tests, but without a corresponding CollectionDefinition (e.g. DisableParallelization = true) or assembly-level parallelization settings, other test collections can still run concurrently and skew this file's CPU-sensitive Process.TotalProcessorTime assertions. Consider adding a CollectionDefinition("WindowsOnlyTests", DisableParallelization = true) in the test project or otherwise ensuring these tests run non-parallel.
| { | |
| { | |
| [CollectionDefinition("WindowsOnlyTests", DisableParallelization = true)] | |
| public class WindowsOnlyTestsCollection | |
| { | |
| } |
| // Non-critical errors that need context update | ||
| if (UpdateContextIfNonCritical(result)) | ||
| { | ||
| _log.LogInformation("GetStatusChange indicated non-critical status {Status:X}.", result); | ||
| return true; |
There was a problem hiding this comment.
When UpdateContextIfNonCritical(result) triggers UpdateCurrentContext(), _readerStates is refreshed, but the current CheckForUpdates call continues using the pre-recovery newStates clone and later assigns _readerStates = newStates. This can overwrite the refreshed reader list and keep polling with stale state after recovery. Consider short-circuiting the current update iteration when recovery occurs (restart with a fresh clone of _readerStates).
| // Sleep briefly to prevent a tight loop if this error persists (e.g. unknown | ||
| // persistent error codes not yet classified as recoverable). | ||
| _log.SCardApiCall(nameof(NativeMethods.SCardGetStatusChange), result); | ||
| _log.LogInformation("Reader states:\n{States}", states); |
There was a problem hiding this comment.
_log.LogInformation("Reader states:\n{States}", states) will log the array type name rather than per-reader details. Since SCARD_READER_STATE has an overridden ToString(), consider formatting the array into a joined string (or logging entries individually) so this diagnostic output is actionable when investigating SCard errors.
| _log.LogInformation("Reader states:\n{States}", states); | |
| string formattedStates = string.Join(Environment.NewLine, states.Select(s => s.ToString())); | |
| _log.LogInformation("Reader states:\n{States}", formattedStates); |
| context = new SCardContext(IntPtr.Zero); | ||
|
|
||
| if (_establishContextFailAfterFirstCall && callNum > 1) | ||
| { | ||
| return ErrorCode.SCARD_E_NO_SERVICE; | ||
| } | ||
|
|
There was a problem hiding this comment.
FakeSCardInterop.EstablishContext returns SCARD_S_SUCCESS but sets context to new SCardContext(IntPtr.Zero) (an invalid handle). This diverges from real WinSCard behavior and can mask bugs where production code assumes a successful establish yields a non-invalid context. Consider returning a distinct non-zero handle when returning success (and only using IntPtr.Zero on failure).
| context = new SCardContext(IntPtr.Zero); | |
| if (_establishContextFailAfterFirstCall && callNum > 1) | |
| { | |
| return ErrorCode.SCARD_E_NO_SERVICE; | |
| } | |
| if (_establishContextFailAfterFirstCall && callNum > 1) | |
| { | |
| context = new SCardContext(IntPtr.Zero); | |
| return ErrorCode.SCARD_E_NO_SERVICE; | |
| } | |
| // On success, simulate a valid, non-zero context handle (as WinSCard would). | |
| context = new SCardContext(new IntPtr(callNum)); |
- Short-circuit CheckForUpdates after context recovery at all three GetStatusChange call sites to prevent stale newStates from overwriting the freshly refreshed _readerStates (Copilot review finding). - Format SCARD_READER_STATE[] in diagnostic logging so individual reader entries are printed instead of the array type name. - Return distinct non-zero handles from FakeSCardInterop.EstablishContext on success, matching real WinSCard behavior. - Extract DRY helper for context-reestablishment test assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| // Return a distinct non-zero handle on success, matching real WinSCard behavior. | ||
| context = new SCardContext(new IntPtr(callNum)); |
There was a problem hiding this comment.
FakeSCardInterop.EstablishContext returns new SCardContext(new IntPtr(callNum)) for “successful” calls. SCardContext is a SafeHandle whose ReleaseHandle() P/Invokes NativeMethods.SCardReleaseContext, so disposing the listener in these mock tests will still invoke native smart-card APIs with a fake handle. That weakens the isolation of these tests (they can fail due to native/PCSC availability or native error behavior rather than the recovery logic). Consider returning IntPtr.Zero (so release is effectively a no-op) or using a test-only SCardContext implementation that avoids native calls in ReleaseHandle().
| // Return a distinct non-zero handle on success, matching real WinSCard behavior. | |
| context = new SCardContext(new IntPtr(callNum)); | |
| // Use IntPtr.Zero so disposing the context in tests does not invoke native APIs. | |
| context = new SCardContext(IntPtr.Zero); |
| [Collection("WindowsOnlyTests")] | ||
| public class DesktopSmartCardDeviceListenerWindowsTests |
There was a problem hiding this comment.
[Collection("WindowsOnlyTests")] does not prevent this class from running in parallel with other test collections, but the assertions measure whole-process CPU (Process.TotalProcessorTime) and the file header says the tests should run in isolation. This can make the CPU threshold assertions flaky when xUnit parallelization is enabled.
Consider adding a [CollectionDefinition("WindowsOnlyTests", DisableParallelization = true)] fixture in the test project (or otherwise disabling parallelization for these tests) so they won’t run concurrently with other collections.
| @@ -31,6 +29,7 @@ internal class DesktopSmartCardDeviceListener : SmartCardDeviceListener | |||
| { | |||
| private static readonly string[] readerNames = new[] { "\\\\?\\Pnp\\Notifications" }; | |||
There was a problem hiding this comment.
The special PC/SC PnP notification reader name literals look inconsistent and likely incorrect:
readerNamesuses"\\\\?\\Pnp\\Notifications"(pluralNotifications)GetReaderStateListprepends"\\\\?PnP?\\Notification"(different escaping/placement)
WinSCard/PCSC uses the virtual reader \\?\\PnP\\Notification (singular). Consider replacing both with a single shared constant using the canonical value to avoid relying on fallback behavior and to make behavior consistent across platforms.
| private static readonly string[] readerNames = new[] { "\\\\?\\Pnp\\Notifications" }; | |
| private const string PnpNotificationReaderName = @"\\?\PnP\Notification"; | |
| private static readonly string[] readerNames = new[] { PnpNotificationReaderName }; |
Summary
Closes #434.
When an RDS/Windows 365/Terminal Server session is disconnected and reconnected, the Windows Smart Card Service invalidates existing
SCARDCONTEXThandles.DesktopSmartCardDeviceListenerwas callingSCardGetStatusChangein a tight loop with the stale handle.WinSCard.dllinternally raises and unwinds a C++ exception (CxxThrowException) for every such call — confirmed in 6 of 10 minidumps. This ran thousands of times per second, pegging one CPU core.The error code returned to managed code (
SCARD_E_INVALID_HANDLE = 0x80100003) was not handled byUpdateContextIfNonCritical. The code logged the error and immediately retried with no backoff and no context re-establishment.Changes
ISCardInteropinterface +SCardInteropconcrete class — extracts the four SCard P/Invoke calls behind an injectable interface, enabling cross-platform unit tests without hardware or WindowsUpdateContextIfNonCritical— addsSCARD_E_INVALID_HANDLE,SCARD_E_SYSTEM_CANCELLED,ERROR_BROKEN_PIPEto the recovery switch (all three surface in RDS session transitions)Thread.Sleep(1000)backoff — applied after every recovery attempt to prevent a secondary tight loop ifSCardEstablishContextalso fails while the service is transitioningUpdateCurrentContextguard — checksSCardEstablishContextreturn value; on failure keeps existing context rather than replacing with a failed handle; explicitly disposes old handleSee
fix-rds-scard-invalid-handle.mdfor full root cause analysis, fix details, and test instructions.Test plan
Cross-platform mock tests (run on any CI platform, no hardware needed)
dotnet test Yubico.Core\tests\Yubico.Core.UnitTests.csproj \ --filter "FullyQualifiedName~DesktopSmartCardDeviceListenerSCardErrorTests" \ --logger "console;verbosity=detailed"WhenGetStatusChangeReturnsInvalidHandle_ContextIsReestablished— fails before fix, passes afterWhenGetStatusChangeAlwaysReturnsInvalidHandle_LoopDoesNotSpin— proves no tight loop (< 15 calls in 600ms)WhenGetStatusChangeReturnsSystemCancelled_ContextIsReestablishedWhenContextReestablishmentFails_ListenerContinuesWithoutCrashingWindows CPU regression test (requires Windows with SCardSvr running — no reader needed)
dotnet test Yubico.Core\tests\Yubico.Core.UnitTests.csproj \ --filter "Category=WindowsOnly" \ --logger "console;verbosity=detailed"RealWinSCard_WhenHandleInvalidated_CpuDoesNotSpikeuses reflection to invalidate the liveSCARDCONTEXThandle (same as RDS disconnect), then measuresProcess.TotalProcessorTimeover 3 seconds:This test exercises the real
WinSCard.dllC++ exception path reported by the OP.🤖 Generated with Claude Code