test: harden AuthE2E connection params to fix migrator timeouts on CI #41

Merged
Theauxm merged 1 commit into main from test/share-authe2e-host-per-fixture on May 5, 2026
@Theauxm Theauxm commented May 5, 2026

Background

After the per-class DB split landed in #39, one AuthE2E test still flakes intermittently on CI:
```
RepeatedEvents_SamePrincipalEveryTime [1 m 6 s]
System.TimeoutException : The operation has timed out
at NpgsqlConnection.OpenAsync
at DatabaseMigrator.Migrate
```
The 1m6s wall time is consistent with 4 retries × the 15s Npgsql default `Timeout` (60s, plus a few seconds of overhead), which points to the migrator's first connection attempt failing repeatedly. CI runners sometimes can't accept a fresh TCP connection within 15s when the preceding tests have just torn theirs down (OS-level TIME_WAIT slots take ~60s to release on Linux).
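The arithmetic behind that estimate can be sketched as below. This is a minimal illustration, assuming the four attempts run back to back with no backoff between them; the attempt count and timeouts come from the description above, while `RetryBudget.WorstCaseWait` is a hypothetical helper and not part of Npgsql or the migrator.

```csharp
using System;

// 4 attempts at the 15s default: the ~60s seen in the failing run.
Console.WriteLine(RetryBudget.WorstCaseWait(4, TimeSpan.FromSeconds(15)));

// 4 attempts at the new 30s Timeout: each attempt gets twice as long to succeed.
Console.WriteLine(RetryBudget.WorstCaseWait(4, TimeSpan.FromSeconds(30)));

// Hypothetical helper: upper bound on time spent if every attempt times out.
static class RetryBudget
{
    public static TimeSpan WorstCaseWait(int attempts, TimeSpan perAttemptTimeout)
    {
        if (attempts < 0)
            throw new ArgumentOutOfRangeException(nameof(attempts));

        // Multiply via ticks to stay in exact integer arithmetic.
        return TimeSpan.FromTicks(checked(perAttemptTimeout.Ticks * attempts));
    }
}
```

Note that a longer `Timeout` raises the worst case if the server never answers; the bet here is that the runner does answer, just later than 15s.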

Fix

Three connection-string knobs:

  • `Timeout=30` — gives the migrator's first `OpenAsync` 30s of grace instead of 15.
  • `Tcp Keepalive=true` — surfaces half-closed peers before another test rents the pool slot.
  • `Pool Size=8` — small enough that 5 AuthE2E fixtures × 8 = 40 connections stays well under postgres's default max_connections=100.
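As a rough sketch of how the three knobs and the connection budget fit together: the base connection string below (host, database, credentials) is a placeholder, `PoolBudget` is a hypothetical helper, and the sketch assumes each fixture keeps its own pool. The pool cap is written with Npgsql's `Maximum Pool Size` keyword.

```csharp
using System;

// Placeholder base string; the real test fixtures supply their own values.
const string baseConnectionString =
    "Host=localhost;Database=authe2e;Username=test;Password=test";

// The three knobs from the list above, appended as connection-string keywords.
string hardened =
    baseConnectionString + ";Timeout=30;Tcp Keepalive=true;Maximum Pool Size=8";
Console.WriteLine(hardened);

// 5 fixtures and a cap of 8 are the figures quoted in the PR description.
int worstCase = PoolBudget.WorstCaseConnections(fixtures: 5, maxPoolSize: 8);
Console.WriteLine($"{worstCase} of 100 default max_connections");

// Hypothetical helper: connections held if every fixture fills its pool at once.
static class PoolBudget
{
    public static int WorstCaseConnections(int fixtures, int maxPoolSize) =>
        checked(fixtures * maxPoolSize);
}
```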

Also tried sharing a single host per fixture (one `OneTimeSetUp` call, all tests reuse it). That broke 9 tests because HotChocolate's WS layer only supports one socket interceptor per schema, so a host wired with both ApiKey and JWT subscriptions can't accept ApiKey tokens (the JWT interceptor wins and rejects them). The existing `SubscriptionE2ETests.BothSchemes_ApiKeyToken_RejectedByLastInterceptor` documents this as an intentional limitation. Reverted that approach.

Test plan

  • All 60 AuthE2E tests pass locally with the new connection params
  • `dotnet csharpier check .` — clean

Bump Timeout from the 15s Npgsql default to 30s, enable Tcp Keepalive,
and slightly enlarge the per-host pool. The original RepeatedEvents test
flake was a 60-second migrator timeout (4 retries × 15s default) caused
by OS-level TCP slot churn under the long sequence of host startups in
CI. Larger Timeout absorbs the brief unavailability; Keepalive prevents
pool slots from holding stale half-closed connections.

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@Theauxm Theauxm merged commit 5e3ecb8 into main May 5, 2026
2 checks passed
@Theauxm Theauxm deleted the test/share-authe2e-host-per-fixture branch May 5, 2026 20:56

traxsharp Bot commented May 8, 2026

This PR is included in version 1.25.0

Theauxm added a commit that referenced this pull request May 11, 2026
OperationsQueriesTests, WorkQueueOperationsTests, TraxHealthServiceTests
and LogQueriesTests each hand-rolled a connection string with

  Maximum Pool Size=4;Connection Idle Lifetime=1;Connection Pruning Interval=1

intended to keep the test suite under Postgres's max_connections cap.
The cure was worse than the disease: under CI contention, every SetUp
paid a full TCP+auth round-trip because the pool prunes every idle
connection within 1s. CI Postgres routinely needs >15s for that
handshake under load, so the 15s Npgsql default timeout fired and the
test failed in SetUp (see #50 CI run 25690772831, WorkQueueOperations
SetUp 24s TimeoutException).

PR #41 already solved this exact problem class for the AuthE2E tests by
dropping the pruning interval and bumping Idle Lifetime + Pool Size +
Timeout. Reusing those numbers here: Pool Size=8, Idle Lifetime=30,
Timeout=30, Tcp Keepalive=true. Eight connections across four fixtures
is 32, still well under postgres default max_connections=100.
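
One way to stop each fixture hand-rolling its own string would be a shared helper along these lines. This is only a sketch under stated assumptions: `SharedTestConnection.Harden` and the sample database name are hypothetical, and it uses the framework's generic `DbConnectionStringBuilder` rather than anything Npgsql-specific, so it rewrites keywords without validating them.

```csharp
using System;
using System.Data.Common;

// A hand-rolled string of the kind described above (host and database are placeholders).
string handRolled =
    "Host=localhost;Database=trax_tests;"
    + "Maximum Pool Size=4;Connection Idle Lifetime=1;Connection Pruning Interval=1";

Console.WriteLine(SharedTestConnection.Harden(handRolled));

// Hypothetical helper that applies the numbers reused from PR #41.
static class SharedTestConnection
{
    public static string Harden(string connectionString)
    {
        var builder = new DbConnectionStringBuilder { ConnectionString = connectionString };

        // Drop the aggressive pruning so idle connections survive between SetUp calls.
        builder.Remove("Connection Pruning Interval");

        builder["Maximum Pool Size"] = 8;
        builder["Connection Idle Lifetime"] = 30;
        builder["Timeout"] = 30;
        builder["Tcp Keepalive"] = true;

        return builder.ConnectionString;
    }
}
```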