test: harden AuthE2E connection params to fix migrator timeouts on CI #41
Bump Timeout from the 15s Npgsql default to 30s, enable Tcp Keepalive, and slightly enlarge the per-host pool. The original RepeatedEvents test flake was a 60-second migrator timeout (4 retries × 15s default) caused by OS-level TCP slot churn under the long sequence of host startups in CI. Larger Timeout absorbs the brief unavailability; Keepalive prevents pool slots from holding stale half-closed connections.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
This PR is included in version 1.25.0 |
Theauxm added a commit that referenced this pull request on May 11, 2026:
OperationsQueriesTests, WorkQueueOperationsTests, TraxHealthServiceTests and LogQueriesTests each hand-rolled a connection string with Maximum Pool Size=4;Connection Idle Lifetime=1;Connection Pruning Interval=1, intended to keep the test suite under Postgres's max_connections cap.

The cure was worse than the disease: under CI contention, every SetUp paid a full TCP+auth round-trip because the pool prunes every idle connection within 1s. CI Postgres routinely needs >15s for that handshake under load, so the 15s Npgsql default timeout fired and the test failed in SetUp (see #50 CI run 25690772831, WorkQueueOperations SetUp 24s TimeoutException).

PR #41 already solved this exact problem class for the AuthE2E tests by dropping the pruning interval and bumping Idle Lifetime + Pool Size + Timeout. Reusing those numbers here: Pool Size=8, Idle Lifetime=30, Timeout=30, Tcp Keepalive=true. Eight connections across four fixtures is 32, still well under the Postgres default max_connections=100.
Background
After the per-class DB split landed in #39, one AuthE2E test still flakes intermittently on CI:
```
RepeatedEvents_SamePrincipalEveryTime [1 m 6 s]
System.TimeoutException : The operation has timed out
at NpgsqlConnection.OpenAsync
at DatabaseMigrator.Migrate
```
The 1m6s wall time is roughly 4 retries × 15s (the Npgsql default Timeout), consistent with the migrator's first connection attempt failing repeatedly. CI runners sometimes can't accept a fresh TCP connection within 15s when the prior tests have just torn down theirs (on Linux, closed sockets linger in TIME_WAIT for ~60s before the OS releases them).
Fix
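Three connection-string knobs:
- Timeout: raised from the 15s Npgsql default to 30s, so a brief period of unavailability is absorbed instead of failing the connection attempt.
- Tcp Keepalive: enabled, so pool slots don't hold stale half-closed connections.
- Pool size: the per-host pool is slightly enlarged.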
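As a rough illustration, the sketch below shows one way such settings might be layered onto a test connection string. It is not the code from this PR. The class and method names are hypothetical, and the pool size of 8 is an assumption taken from the follow-up commit that says it reuses this PR's numbers. It uses the BCL `DbConnectionStringBuilder` with the same connection-string keywords quoted in this conversation, so it does not depend on any particular Npgsql builder API.

```csharp
using System;
using System.Data.Common;

// Hypothetical helper, not part of the PR: applies the hardened
// settings on top of whatever base connection string a fixture uses.
public static class AuthE2EConnectionStrings
{
    public static string Harden(string baseConnectionString)
    {
        // Parse the existing key/value pairs so that any keys already
        // present are overwritten rather than duplicated.
        var builder = new DbConnectionStringBuilder
        {
            ConnectionString = baseConnectionString
        };

        // Wait up to 30s (instead of the 15s default) to establish a connection.
        builder["Timeout"] = 30;

        // Ask the OS to probe idle connections so dead ones are detected.
        builder["Tcp Keepalive"] = true;

        // Assumed value: 8 is the pool size quoted in the follow-up commit.
        builder["Maximum Pool Size"] = 8;

        return builder.ConnectionString;
    }
}
```

A fixture could then call something like `AuthE2EConnectionStrings.Harden("Host=localhost;Database=auth_e2e")` (host and database names are placeholders) and pass the result to its data source. Keeping the overrides in one helper avoids each fixture hand-rolling its own connection string.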
Also tried sharing a single host per fixture (one `OneTimeSetUp` call, all tests reuse it). That broke 9 tests because HotChocolate's WS layer only supports one socket interceptor per schema, so a host wired with both ApiKey and JWT subscriptions can't accept ApiKey tokens (the JWT interceptor wins and rejects them). The existing `SubscriptionE2ETests.BothSchemes_ApiKeyToken_RejectedByLastInterceptor` documents this as an intentional limitation. Reverted that approach.
Test plan