DRIVERS-3134: pre-populate pool in flaky CSOT runCommandCursor test #1769

Draft: wants to merge 1 commit into base: master

@baileympearson baileympearson commented Mar 13, 2025

The "Non-tailable cursor lifetime remaining timeoutMS applied to getMore if timeoutMode is unset" test flakes fairly regularly in Node with the following error:

 AssertionError: Expected event count mismatch, expected [ 'commandStartedEvent', 'commandStartedEvent' ] but got [ 'CommandStartedEvent' ]

It turns out that connection establishment is taking roughly 40ms in the failing runs. This seems unusually high: measuring connection establishment while running this test repeatedly usually shows it taking 2-8ms. I also observe that, as the tests run, we occasionally hit periods where establishment increases to 20-40ms for a few iterations and then drops back down to the usual 2-8ms.

The delay in connection establishment causes the find to time out instead of the getMore. As a result, we still get a timeout error in the test, but we see too few CommandStartedEvents and the test fails.
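To make the failure mode concrete, here is a minimal sketch of a single timeout budget shared by connection establishment, the find, and the getMore. It is a simplified model, not driver code: the function `startedCommands`, the 100ms budget, and the 60ms failpoint block time are all illustrative assumptions rather than values taken from the spec test.

```typescript
// Simplified model: one timeoutMS budget is consumed by connection
// establishment first, then by each command the failpoint blocks.
// All names and numbers here are hypothetical, chosen for illustration.
function startedCommands(timeoutMS: number, connectMS: number, blockMS: number): string[] {
  const started: string[] = [];
  let remaining = timeoutMS - connectMS; // establishment eats into the budget
  for (const command of ["find", "getMore"]) {
    if (remaining <= 0) break; // budget exhausted before the command is sent
    started.push(command);     // a commandStartedEvent would be emitted here
    remaining -= blockMS;      // the failpoint blocks the command for blockMS
    if (remaining <= 0) break; // this command timed out; nothing else is sent
  }
  return started;
}

// Fast establishment: the find succeeds and the getMore is the one that times out.
console.log(startedCommands(100, 5, 60));
// Slow establishment: the find itself times out, so only one command starts.
console.log(startedCommands(100, 40, 60));
```

In this model both runs end in a timeout error, but only the first produces the two started commands the test expects.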

Node has seen this in other non-spec tests we wrote while implementing CSOT. The most reliable way to resolve the flakiness is to pre-populate the pool with a connection before starting the test. For any test that relies on failpoints to cause timeouts, it makes sense to pre-populate the pool with a connection so that we can remove as much variance in timing before the connection layer as possible.

Please complete the following before merging:

  • Update changelog.
  • Test changes in at least one language driver.
  • Test these changes against all server versions and topologies (including standalone, replica set, sharded
    clusters, and serverless).

@baileympearson baileympearson force-pushed the DRIVERS-3134-flaky-csot-test branch from 35f8b87 to 6a1c9fb Compare March 14, 2025 16:40
@baileympearson baileympearson changed the title DRIVERS-3134: increase timeouts in flaky CSOT runCommandCursor test DRIVERS-3134: pre-populate pool in flaky CSOT runCommandCursor test Mar 14, 2025