Fix: High idle CPU in RDS/Terminal Server environments (SCARD_E_INVALID_HANDLE) #445

Draft
DennisDyallo wants to merge 4 commits into develop from dennisdyallo/fix-rds-scard-invalid-handle

@DennisDyallo
Collaborator

Summary

Closes #434.

When an RDS/Windows 365/Terminal Server session is disconnected and reconnected, the Windows Smart Card Service invalidates existing SCARDCONTEXT handles. DesktopSmartCardDeviceListener was calling SCardGetStatusChange in a tight loop with the stale handle. WinSCard.dll internally raises and unwinds a C++ exception (CxxThrowException) for every such call — confirmed in 6 of 10 minidumps. This ran thousands of times per second, pegging one CPU core.

The error code returned to managed code (SCARD_E_INVALID_HANDLE = 0x80100003) was not handled by UpdateContextIfNonCritical. The code logged the error and immediately retried with no backoff and no context re-establishment.

Changes

  • ISCardInterop interface + SCardInterop concrete class — extracts the four SCard P/Invoke calls behind an injectable interface, enabling cross-platform unit tests without hardware or Windows
  • UpdateContextIfNonCritical — adds SCARD_E_INVALID_HANDLE, SCARD_E_SYSTEM_CANCELLED, ERROR_BROKEN_PIPE to the recovery switch (all three surface in RDS session transitions)
  • Thread.Sleep(1000) backoff — applied after every recovery attempt to prevent a secondary tight loop if SCardEstablishContext also fails while the service is transitioning
  • UpdateCurrentContext guard — checks SCardEstablishContext return value; on failure keeps existing context rather than replacing with a failed handle; explicitly disposes old handle
  • Default path backoff — unrecognised error codes also sleep 1000 ms (catch-all for unknown persistent errors)

See fix-rds-scard-invalid-handle.md for full root cause analysis, fix details, and test instructions.
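
The recovery behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the listener's actual code: the `RecoverySketch` class, its constructor, and the `establish` delegate are hypothetical stand-ins, only the three error codes named in this PR are classified (the real switch may contain others), and disposal of the old handle is not modeled.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the listener's recovery path; names are illustrative only.
public sealed class RecoverySketch
{
    public const uint SCARD_S_SUCCESS = 0x00000000;
    public const uint ERROR_BROKEN_PIPE = 0x0000006D;
    public const uint SCARD_E_INVALID_HANDLE = 0x80100003;
    public const uint SCARD_E_SYSTEM_CANCELLED = 0x80100012;
    public const uint SCARD_E_NO_SERVICE = 0x8010001D;

    private readonly Func<(uint Result, IntPtr Handle)> _establish;
    private readonly TimeSpan _backoff;

    public IntPtr Context { get; private set; }

    public RecoverySketch(IntPtr initial, Func<(uint Result, IntPtr Handle)> establish, TimeSpan backoff)
    {
        Context = initial;
        _establish = establish;
        _backoff = backoff;
    }

    // True for the error codes this PR adds to the recovery switch.
    public static bool IsRecoverable(uint errorCode)
    {
        switch (errorCode)
        {
            case SCARD_E_INVALID_HANDLE:
            case SCARD_E_SYSTEM_CANCELLED:
            case ERROR_BROKEN_PIPE:
                return true;
            default:
                return false;
        }
    }

    // Handles one failed poll. Returns true when the context was replaced.
    public bool HandleFailure(uint errorCode)
    {
        bool replaced = false;
        if (IsRecoverable(errorCode))
        {
            (uint result, IntPtr handle) = _establish();
            if (result == SCARD_S_SUCCESS)
            {
                Context = handle; // replace only on success; otherwise keep the old context
                replaced = true;
            }
        }

        Thread.Sleep(_backoff); // back off on every failure path, recoverable or not
        return replaced;
    }
}

public static class Program
{
    public static void Main()
    {
        var sketch = new RecoverySketch(
            new IntPtr(1),
            () => (RecoverySketch.SCARD_S_SUCCESS, new IntPtr(2)),
            TimeSpan.FromMilliseconds(10));

        bool replaced = sketch.HandleFailure(RecoverySketch.SCARD_E_INVALID_HANDLE);
        Console.WriteLine($"replaced={replaced}, context={sketch.Context}");
    }
}
```

Sleeping after the recovery attempt rather than before it means a successful re-establish still costs one backoff interval, but it guarantees that no failure path can loop without yielding the CPU.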

Test plan

Cross-platform mock tests (run on any CI platform, no hardware needed)

dotnet test Yubico.Core/tests/Yubico.Core.UnitTests.csproj --filter "FullyQualifiedName~DesktopSmartCardDeviceListenerSCardErrorTests" --logger "console;verbosity=detailed"
  • WhenGetStatusChangeReturnsInvalidHandle_ContextIsReestablished (fails before fix, passes after)
  • WhenGetStatusChangeAlwaysReturnsInvalidHandle_LoopDoesNotSpin — proves no tight loop (< 15 calls in 600ms)
  • WhenGetStatusChangeReturnsSystemCancelled_ContextIsReestablished
  • WhenContextReestablishmentFails_ListenerContinuesWithoutCrashing
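
The "loop does not spin" check can be illustrated with a fake that always reports an invalid handle and counts how often it is polled. This is a sketch of the technique only: `IStatusPoller`, `AlwaysInvalidHandlePoller`, and `LoopSketch` are hypothetical names, not the PR's `ISCardInterop` or its test doubles, and the polling loop here is a simplified stand-in for the listener thread.

```csharp
using System;
using System.Threading;

// Hypothetical single-method abstraction; the PR's ISCardInterop covers four SCard calls.
public interface IStatusPoller
{
    uint GetStatusChange();
}

// Test double that always fails with SCARD_E_INVALID_HANDLE and counts calls.
public sealed class AlwaysInvalidHandlePoller : IStatusPoller
{
    private int _calls;

    public int Calls => Volatile.Read(ref _calls);

    public uint GetStatusChange()
    {
        Interlocked.Increment(ref _calls);
        return 0x80100003; // SCARD_E_INVALID_HANDLE
    }
}

public static class LoopSketch
{
    // Runs a simplified polling loop for the given window and returns the call count.
    public static int CountCallsDuring(TimeSpan window, TimeSpan backoff)
    {
        var poller = new AlwaysInvalidHandlePoller();
        bool stop = false;

        var worker = new Thread(() =>
        {
            while (!Volatile.Read(ref stop))
            {
                if (poller.GetStatusChange() != 0)
                {
                    Thread.Sleep(backoff); // the backoff under test
                }
            }
        });

        worker.Start();
        Thread.Sleep(window);
        Volatile.Write(ref stop, true);
        worker.Join();
        return poller.Calls;
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = LoopSketch.CountCallsDuring(
            TimeSpan.FromMilliseconds(600),
            TimeSpan.FromMilliseconds(100));
        Console.WriteLine($"calls in window: {calls}");
    }
}
```

Asserting an upper bound on the call count, rather than an exact value, keeps such a test tolerant of timer granularity and scheduler jitter on shared CI machines.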

Windows CPU regression test (requires Windows with SCardSvr running — no reader needed)

dotnet test Yubico.Core/tests/Yubico.Core.UnitTests.csproj --filter "Category=WindowsOnly" --logger "console;verbosity=detailed"

RealWinSCard_WhenHandleInvalidated_CpuDoesNotSpike uses reflection to invalidate the live SCARDCONTEXT handle (same as RDS disconnect), then measures Process.TotalProcessorTime over 3 seconds:

  • Before fix: ≈ 2500–3000 ms CPU consumed → FAIL
  • After fix: ≈ 30–100 ms CPU consumed → PASS

This test exercises the real WinSCard.dll C++ exception path reported by the OP.
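
For readers unfamiliar with the measurement, a minimal sketch of sampling process CPU time over a fixed window might look like the following. The `CpuProbe` helper is hypothetical and is not the PR's test code; it only shows how `Process.TotalProcessorTime` can be read before and after a wait.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class CpuProbe
{
    // Returns the CPU time consumed by all threads of this process during the window.
    public static TimeSpan MeasureCpu(TimeSpan window)
    {
        using Process self = Process.GetCurrentProcess();

        self.Refresh(); // discard cached process information before sampling
        TimeSpan before = self.TotalProcessorTime;

        Thread.Sleep(window); // background threads keep running while this thread waits

        self.Refresh();
        TimeSpan after = self.TotalProcessorTime;

        return after - before;
    }
}

public static class Program
{
    public static void Main()
    {
        TimeSpan used = CpuProbe.MeasureCpu(TimeSpan.FromSeconds(1));
        Console.WriteLine($"CPU consumed: {used.TotalMilliseconds:F0} ms");
    }
}
```

Because the counter covers the whole process, any other busy thread inflates the reading, so thresholds need headroom and the measurement is most reliable when nothing else runs in the same test process.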

🤖 Generated with Claude Code

…ard listener (#434)

When an RDS/Terminal Server session is disconnected, the Windows Smart Card Service
invalidates existing SCARDCONTEXT handles. DesktopSmartCardDeviceListener called
SCardGetStatusChange in a tight loop with the stale handle — WinSCard internally raises
a C++ exception (CxxThrowException) for each call, pegging a CPU core.

Fix:
- Add SCARD_E_INVALID_HANDLE, SCARD_E_SYSTEM_CANCELLED, ERROR_BROKEN_PIPE to the
  UpdateContextIfNonCritical recovery switch
- Add Thread.Sleep(1000) backoff after recovery to prevent secondary tight loop
- Guard UpdateCurrentContext against failed SCardEstablishContext
- Extract ISCardInterop interface enabling cross-platform unit tests without hardware

Tests:
- DesktopSmartCardDeviceListenerSCardErrorTests: cross-platform mock tests (Track B)
- DesktopSmartCardDeviceListenerWindowsTests: Windows CPU regression test (Track A)
  measuring TotalProcessorTime before/after handle invalidation via real WinSCard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Fixes a high-idle-CPU regression in DesktopSmartCardDeviceListener seen in RDS/Terminal Server reconnect scenarios by handling invalidated smart-card contexts, adding recovery backoff, and improving testability via an injectable SCard interop abstraction.

Changes:

  • Extracts the WinSCard/PCSC calls behind ISCardInterop with a production SCardInterop implementation to enable deterministic unit testing.
  • Extends non-critical error recovery to include SCARD_E_INVALID_HANDLE, SCARD_E_SYSTEM_CANCELLED, and ERROR_BROKEN_PIPE, and adds a 1s recovery backoff to prevent tight retry loops.
  • Adds cross-platform mock tests and Windows-only integration/regression tests (including CPU consumption measurement) covering the RDS invalid-handle scenario.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Yubico.Core/src/Yubico/Core/Devices/SmartCard/DesktopSmartCardDeviceListener.cs | Adds injectable SCard interop, expands recoverable error handling, and introduces recovery backoff + safer context refresh logic. |
| Yubico.Core/src/Yubico/PlatformInterop/Desktop/SCard/ISCardInterop.cs | Introduces an internal abstraction for the subset of SCard APIs the listener uses. |
| Yubico.Core/src/Yubico/PlatformInterop/Desktop/SCard/SCardInterop.cs | Provides the production ISCardInterop implementation delegating to NativeMethods. |
| Yubico.Core/tests/Yubico/Core/Devices/SmartCard/DesktopSmartCardDeviceListenerSCardErrorTests.cs | Adds cross-platform mock-based tests to validate recovery and throttling behavior. |
| Yubico.Core/tests/Yubico/Core/Devices/SmartCard/DesktopSmartCardDeviceListenerWindowsTests.cs | Adds Windows-only tests that invalidate a real context and validate CPU + recovery + disposal behavior. |
| fix-rds-scard-invalid-handle.md | Documents root cause analysis, fix details, and how to run/validate tests. |

/// Internal constructor that accepts a test double for the SCard API surface.
/// </summary>
internal DesktopSmartCardDeviceListener(ISCardInterop scard)
{

Copilot AI Apr 1, 2026

The injected scard dependency is assigned without a null check. If a caller accidentally passes null (including future internal callers/tests), this will produce a NullReferenceException later in the listener thread and be harder to diagnose. Consider validating scard up-front (e.g., throw ArgumentNullException) before assigning to _scard.

Suggested change
{
    if (scard is null)
    {
        throw new ArgumentNullException(nameof(scard));
    }

Comment on lines +85 to +90
SCardContext context = GetListenerContext(listener);
// SCardReleaseContext with the raw IntPtr tells WinSCard the handle is gone.
// Subsequent SCardGetStatusChange calls using this handle will fail immediately
// with SCARD_E_INVALID_HANDLE and trigger WinSCard's internal C++ exception path.
NativeMethods.SCardReleaseContext(context.DangerousGetHandle());
}

Copilot AI Apr 1, 2026

InvalidateListenerContext ignores the return value from SCardReleaseContext. If the call fails (e.g., context already replaced/disposed, service transition), the tests may proceed without actually invalidating the handle and could give false confidence. Consider checking the returned error code and either Assert success or Skip with a message when invalidation cannot be performed.

Comment on lines +227 to +246

Skip.IfNot(ListenerIsActive(listener),
"Smart Card service (SCardSvr) is not running on this machine.");

Thread.Sleep(300);
InvalidateListenerContext(listener);

// Let recovery fire once (1000ms sleep inside the listener thread).
Thread.Sleep(1500);

// Now dispose — must complete well within 8 seconds.
var stopwatch = Stopwatch.StartNew();
var exception = Record.Exception(() => listener.Dispose());
stopwatch.Stop();

Assert.Null(exception);
Assert.True(
stopwatch.ElapsedMilliseconds < 5000,
$"Dispose took {stopwatch.ElapsedMilliseconds}ms after handle invalidation. " +
"Expected < 5000ms. The listener thread may be blocked in the recovery sleep.");

Copilot AI Apr 1, 2026

In RealWinSCard_WhenHandleInvalidatedThenDisposed_DisposalCompletesCleanly, the listener is not in a using (or try/finally). If Skip.IfNot(ListenerIsActive(listener), ...) triggers, the test exits via exception and the listener is never disposed, leaking the background thread/context for the rest of the test run. Consider creating the listener with using var and still measuring a manual Dispose() call (double-dispose should be safe), or wrapping the listener in a try/finally that disposes it on all paths.

Suggested change
try
{
    Skip.IfNot(ListenerIsActive(listener),
        "Smart Card service (SCardSvr) is not running on this machine.");
    Thread.Sleep(300);
    InvalidateListenerContext(listener);
    // Let recovery fire once (1000ms sleep inside the listener thread).
    Thread.Sleep(1500);
    // Now dispose — must complete well within 8 seconds.
    var stopwatch = Stopwatch.StartNew();
    var exception = Record.Exception(() => listener.Dispose());
    stopwatch.Stop();
    Assert.Null(exception);
    Assert.True(
        stopwatch.ElapsedMilliseconds < 5000,
        $"Dispose took {stopwatch.ElapsedMilliseconds}ms after handle invalidation. " +
        "Expected < 5000ms. The listener thread may be blocked in the recovery sleep.");
}
finally
{
    listener.Dispose();
}

DennisDyallo and others added 2 commits April 1, 2026 18:29
…try/finally leak fix

- Add ArgumentNullException guard for injected ISCardInterop in internal constructor
- Check SCardReleaseContext return value in test helper and Skip on failure
- Wrap disposal test listener in try/finally to prevent resource leak on Skip

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CPM override

- Replace Assert.NotEqual(..., string) with Assert.True(..., string) — xunit 2.9.3
  has no string-message overload for NotEqual (that's xunit 3.x only)
- Add Directory.Packages.props to .gitignore (local-only workaround that prevents
  a parent project's CPM config from breaking restore in this branch)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Test Results: Windows

    2 files      2 suites   29s ⏱️
4 060 tests 4 038 ✅ 22 💤 0 ❌
4 062 runs  4 040 ✅ 22 💤 0 ❌

Results for commit a9964fe.

♻️ This comment has been updated with latest results.

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Test Results: Ubuntu

    2 files      2 suites   55s ⏱️
4 052 tests 4 030 ✅ 22 💤 0 ❌
4 054 runs  4 032 ✅ 22 💤 0 ❌

Results for commit a9964fe.

♻️ This comment has been updated with latest results.

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Test Results: MacOS

    4 files      4 suites   58s ⏱️
4 034 tests 4 031 ✅ 3 💤 0 ❌
4 036 runs  4 033 ✅ 3 💤 0 ❌

Results for commit a9964fe.

♻️ This comment has been updated with latest results.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

using Yubico.PlatformInterop;

namespace Yubico.Core.Devices.SmartCard.UnitTests
{

Copilot AI Apr 1, 2026

[Collection("WindowsOnlyTests")] groups these tests, but without a corresponding CollectionDefinition (e.g. DisableParallelization = true) or assembly-level parallelization settings, other test collections can still run concurrently and skew this file's CPU-sensitive Process.TotalProcessorTime assertions. Consider adding a CollectionDefinition("WindowsOnlyTests", DisableParallelization = true) in the test project or otherwise ensuring these tests run non-parallel.

Suggested change
{
    [CollectionDefinition("WindowsOnlyTests", DisableParallelization = true)]
    public class WindowsOnlyTestsCollection
    {
    }

Comment on lines 511 to 515
// Non-critical errors that need context update
if (UpdateContextIfNonCritical(result))
{
_log.LogInformation("GetStatusChange indicated non-critical status {Status:X}.", result);
return true;

Copilot AI Apr 1, 2026

When UpdateContextIfNonCritical(result) triggers UpdateCurrentContext(), _readerStates is refreshed, but the current CheckForUpdates call continues using the pre-recovery newStates clone and later assigns _readerStates = newStates. This can overwrite the refreshed reader list and keep polling with stale state after recovery. Consider short-circuiting the current update iteration when recovery occurs (restart with a fresh clone of _readerStates).
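
The overwrite described here can be reproduced in miniature. The sketch below is hypothetical: `ListenerSketch` uses plain strings in place of reader states and a boolean in place of the real error handling, purely to show why returning early after recovery preserves the refreshed list.

```csharp
using System;

// Hypothetical miniature of the listener; strings stand in for SCARD_READER_STATE entries.
public sealed class ListenerSketch
{
    private string[] _readerStates = { "reader-A" };

    public string[] ReaderStates => _readerStates;

    // Simulates recovery: re-establishing the context refreshes the reader list.
    private void UpdateCurrentContext()
    {
        _readerStates = new[] { "reader-A", "reader-B" };
    }

    // Returns true when the pass ended early because recovery ran.
    public bool CheckForUpdates(bool pollFails, bool shortCircuit)
    {
        // Clone taken before the poll, as in the reviewed code.
        string[] newStates = (string[])_readerStates.Clone();

        if (pollFails)
        {
            UpdateCurrentContext();
            if (shortCircuit)
            {
                return true; // the next pass starts from the refreshed list
            }
        }

        _readerStates = newStates; // without the early return, this discards the refresh
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        var stale = new ListenerSketch();
        stale.CheckForUpdates(pollFails: true, shortCircuit: false);
        Console.WriteLine($"without short-circuit: {stale.ReaderStates.Length} reader(s)");

        var fresh = new ListenerSketch();
        fresh.CheckForUpdates(pollFails: true, shortCircuit: true);
        Console.WriteLine($"with short-circuit: {fresh.ReaderStates.Length} reader(s)");
    }
}
```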

// Sleep briefly to prevent a tight loop if this error persists (e.g. unknown
// persistent error codes not yet classified as recoverable).
_log.SCardApiCall(nameof(NativeMethods.SCardGetStatusChange), result);
_log.LogInformation("Reader states:\n{States}", states);

Copilot AI Apr 1, 2026

_log.LogInformation("Reader states:\n{States}", states) will log the array type name rather than per-reader details. Since SCARD_READER_STATE has an overridden ToString(), consider formatting the array into a joined string (or logging entries individually) so this diagnostic output is actionable when investigating SCard errors.

Suggested change
string formattedStates = string.Join(Environment.NewLine, states.Select(s => s.ToString()));
_log.LogInformation("Reader states:\n{States}", formattedStates);

Comment on lines +250 to +256
context = new SCardContext(IntPtr.Zero);

if (_establishContextFailAfterFirstCall && callNum > 1)
{
return ErrorCode.SCARD_E_NO_SERVICE;
}


Copilot AI Apr 1, 2026

FakeSCardInterop.EstablishContext returns SCARD_S_SUCCESS but sets context to new SCardContext(IntPtr.Zero) (an invalid handle). This diverges from real WinSCard behavior and can mask bugs where production code assumes a successful establish yields a non-invalid context. Consider returning a distinct non-zero handle when returning success (and only using IntPtr.Zero on failure).

Suggested change
if (_establishContextFailAfterFirstCall && callNum > 1)
{
    context = new SCardContext(IntPtr.Zero);
    return ErrorCode.SCARD_E_NO_SERVICE;
}
// On success, simulate a valid, non-zero context handle (as WinSCard would).
context = new SCardContext(new IntPtr(callNum));

- Short-circuit CheckForUpdates after context recovery at all three
  GetStatusChange call sites to prevent stale newStates from overwriting
  the freshly refreshed _readerStates (Copilot review finding).
- Format SCARD_READER_STATE[] in diagnostic logging so individual reader
  entries are printed instead of the array type name.
- Return distinct non-zero handles from FakeSCardInterop.EstablishContext
  on success, matching real WinSCard behavior.
- Extract DRY helper for context-reestablishment test assertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Code Coverage

| Package | Line Rate | Branch Rate | Complexity | Health |
| --- | --- | --- | --- | --- |
| Yubico.Core | 52% | 42% | 1551 | |
| Yubico.YubiKey | 50% | 46% | 7180 | |
| Summary | 50% (12840 / 25478) | 45% (3085 / 6882) | 8731 | |

Minimum allowed line rate is 40%

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Comment on lines +229 to +230
// Return a distinct non-zero handle on success, matching real WinSCard behavior.
context = new SCardContext(new IntPtr(callNum));

Copilot AI Apr 2, 2026

FakeSCardInterop.EstablishContext returns new SCardContext(new IntPtr(callNum)) for “successful” calls. SCardContext is a SafeHandle whose ReleaseHandle() P/Invokes NativeMethods.SCardReleaseContext, so disposing the listener in these mock tests will still invoke native smart-card APIs with a fake handle. That weakens the isolation of these tests (they can fail due to native/PCSC availability or native error behavior rather than the recovery logic). Consider returning IntPtr.Zero (so release is effectively a no-op) or using a test-only SCardContext implementation that avoids native calls in ReleaseHandle().

Suggested change
// Use IntPtr.Zero so disposing the context in tests does not invoke native APIs.
context = new SCardContext(IntPtr.Zero);
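
Why a zero handle makes release a no-op can be shown with a small counting handle. This sketch assumes SCardContext treats IntPtr.Zero as invalid, as handles derived from `SafeHandleZeroOrMinusOneIsInvalid` do; the PR page does not show its base class, and `CountingHandle` is a hypothetical type used only for illustration.

```csharp
using System;
using Microsoft.Win32.SafeHandles;

// Hypothetical handle that counts release calls instead of calling native code.
public sealed class CountingHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public static int ReleaseCalls;

    public CountingHandle(IntPtr value) : base(ownsHandle: true)
    {
        SetHandle(value);
    }

    // SafeHandle invokes this only when the handle is owned and not invalid.
    protected override bool ReleaseHandle()
    {
        ReleaseCalls++;
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        new CountingHandle(IntPtr.Zero).Dispose();
        Console.WriteLine($"after zero handle: {CountingHandle.ReleaseCalls} release call(s)");

        new CountingHandle(new IntPtr(1)).Dispose();
        Console.WriteLine($"after non-zero handle: {CountingHandle.ReleaseCalls} release call(s)");
    }
}
```

The trade-off is the one raised in the earlier review round: a zero handle keeps mock tests free of native calls, while a non-zero handle resembles a real successful establish more closely.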

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Comment on lines +57 to +58
[Collection("WindowsOnlyTests")]
public class DesktopSmartCardDeviceListenerWindowsTests

Copilot AI Apr 2, 2026

[Collection("WindowsOnlyTests")] does not prevent this class from running in parallel with other test collections, but the assertions measure whole-process CPU (Process.TotalProcessorTime) and the file header says the tests should run in isolation. This can make the CPU threshold assertions flaky when xUnit parallelization is enabled.

Consider adding a [CollectionDefinition("WindowsOnlyTests", DisableParallelization = true)] fixture in the test project (or otherwise disabling parallelization for these tests) so they won’t run concurrently with other collections.

@@ -31,6 +29,7 @@ internal class DesktopSmartCardDeviceListener : SmartCardDeviceListener
{
private static readonly string[] readerNames = new[] { "\\\\?\\Pnp\\Notifications" };

Copilot AI Apr 2, 2026

The special PC/SC PnP notification reader name literals are inconsistent:

  • readerNames uses "\\\\?\\Pnp\\Notifications" (plural Notifications, with a different separator layout)
  • GetReaderStateList prepends "\\\\?PnP?\\Notification"

The WinSCard and pcsc-lite documentation give the virtual reader name as \\?PnP?\Notification (singular), which is the value GetReaderStateList already uses. Consider replacing both literals with a single shared constant holding that value, to avoid relying on fallback behavior and to keep behavior consistent across platforms.

Suggested change
private const string PnpNotificationReaderName = @"\\?PnP?\Notification";
private static readonly string[] readerNames = new[] { PnpNotificationReaderName };


Development

Successfully merging this pull request may close these issues.

[BUG] High idle CPU cost of enumerating devices In certain terminal server environments

2 participants