fix(rivetkit): fix engine driver tests, engine sqlite teardown logic #4293
🚅 Deployed to the rivet-pr-4293 environment in rivet-frontend
How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub, as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite.
PR Review: fix(rivetkit): fix engine driver tests, engine sqlite teardown logic

Good set of fixes addressing real-world flakiness and race conditions in the engine driver. The changes are well-targeted. Below is my feedback, organized by file and concern.

actor-driver.ts (most significant changes)

Brittle error string matching:

```ts
if (message.includes("WebSocket connection closed during shutdown")) {
	// ignore race
}
```

This is fragile. If the upstream error message changes, the guard silently stops matching and the shutdown race is no longer handled as intended. Better to use a typed error class or an error code. At minimum, store the string as a named constant so it is easy to find and update.
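A minimal sketch of the typed-error alternative follows. It assumes the driver controls the place where the shutdown error is raised. `WebSocketShutdownError`, `SHUTDOWN_RACE_MESSAGE`, `isShutdownRace`, and `handleStopError` are hypothetical names used for illustration. They are not existing rivetkit APIs. The string check is kept only as a fallback behind a named constant.

```ts
// Hypothetical names for illustration; not part of rivetkit's API.

// Fallback: the legacy message lives in one named constant so it is easy to find and update.
const SHUTDOWN_RACE_MESSAGE = "WebSocket connection closed during shutdown";

// Typed error raised at the point where the socket is closed during shutdown.
class WebSocketShutdownError extends Error {
	// A stable code keeps matching even if the human-readable message is reworded.
	readonly code = "WEBSOCKET_CLOSED_DURING_SHUTDOWN" as const;

	constructor(message: string = SHUTDOWN_RACE_MESSAGE) {
		super(message);
		this.name = "WebSocketShutdownError";
		// Keeps `instanceof` working even when compiled to older JS targets.
		Object.setPrototypeOf(this, new.target.prototype);
	}
}

// Prefer class and code checks; fall back to the constant for errors from code we don't control.
function isShutdownRace(error: unknown): boolean {
	if (error instanceof WebSocketShutdownError) return true;
	if (
		typeof error === "object" &&
		error !== null &&
		(error as { code?: unknown }).code === "WEBSOCKET_CLOSED_DURING_SHUTDOWN"
	) {
		return true;
	}
	return error instanceof Error && error.message.includes(SHUTDOWN_RACE_MESSAGE);
}

// Call site that previously matched on the raw string.
function handleStopError(error: unknown): void {
	if (isShutdownRace(error)) {
		// Expected race during shutdown; safe to ignore.
		return;
	}
	// Anything else is a real failure and must not be swallowed.
	throw error;
}
```

Checking `instanceof` or `code` first means that rewording the message cannot silently disable the guard. The substring check remains only for errors raised by code outside the driver's control.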
```ts
const specifier = "@rivetkit/" + "sqlite-vfs";
const { SqliteVfs } = await import(specifier);
```

This is the same bundler-escape trick used in

Silent swallow on

```ts
} catch (stopError) {
	logger().error({ ... });
	// not re-thrown
}
```

If

Shutdown timeout: The 15-second
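A 15-second shutdown timeout is easier to reason about when the bound lives in one small helper. The sketch below assumes that shutdown is a promise-returning operation. `withTimeout`, `stopWithTimeout`, `TimeoutError`, and `SHUTDOWN_TIMEOUT_MS` are hypothetical names, not the driver's actual code. The sketch logs a failure and then re-throws it rather than swallowing it, which is one way to address the silent-swallow point above.

```ts
// Hypothetical constant mirroring the 15-second value discussed in the review.
const SHUTDOWN_TIMEOUT_MS = 15_000;

class TimeoutError extends Error {
	constructor(label: string, ms: number) {
		super(`${label} did not finish within ${ms} ms`);
		this.name = "TimeoutError";
		Object.setPrototypeOf(this, new.target.prototype);
	}
}

// Races `work` against a timer and always clears the timer, so it cannot keep the process alive.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
	let timer: ReturnType<typeof setTimeout> | undefined;
	const timeout = new Promise<never>((_, reject) => {
		timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
	});
	try {
		return await Promise.race([work, timeout]);
	} finally {
		clearTimeout(timer);
	}
}

// Logs the failure (including a timeout) and re-throws it instead of swallowing it.
async function stopWithTimeout(
	stop: () => Promise<void>,
	log: (msg: string, err: unknown) => void,
	ms: number = SHUTDOWN_TIMEOUT_MS,
): Promise<void> {
	try {
		await withTimeout(stop(), ms, "actor shutdown");
	} catch (stopError) {
		log("actor shutdown failed", stopError);
		throw stopError;
	}
}
```

Whether the caller should re-throw or only log is a judgment call for the driver. Re-throwing makes a stuck shutdown visible in tests instead of hiding it behind a log line.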
instance/mod.ts

The early abort at destroy time is a good fix: calling

Changing

workflow/driver.ts

```ts
this.#actor.stateManager.saveState({
	immediate: true,
	allowStoppingState: true,
})
```

Good fix. Without this, workflow state could be lost during teardown if the state manager guards against saves in the stopping state. Low risk:

driver-test-suite/utils.ts

Correct fix for teardown ordering: disposing the client before calling

driver-test-suite/tests/actor-db.ts

Critical indentation fix: The
Migration failure test using

driver-test-suite/tests/actor-workflow.ts

Replacing the fixed 1-second wait with a polling loop (50 × 100 ms) is a clear improvement. The conditions checked (

driver-engine.test.ts

Polling loop for runner registration (120 × 100 ms, so up to 12 seconds): a good replacement for the fixed 1-second delay. If the runner takes longer than 12 seconds to register, the failure message may not be obvious. Consider logging something like "Runner did not register within 12 seconds" when the loop is exhausted.
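Both polling loops (50 × 100 ms and 120 × 100 ms) could share one helper that also produces the clearer failure message suggested for runner registration. This is a minimal sketch. `pollUntil` and `PollOptions` are hypothetical names, not an existing rivetkit test utility. `check` stands in for whatever condition the test actually waits on, such as the runner being registered.

```ts
// Hypothetical helper for illustration; not an existing rivetkit test utility.
interface PollOptions {
	attempts: number; // e.g. 120
	intervalMs: number; // e.g. 100, so 120 × 100 ms is about a 12 s budget
	description: string; // e.g. "runner registration", used in the failure message
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `check` up to `attempts` times, waiting `intervalMs` between tries.
// Resolves with the first truthy result; throws a descriptive error once the budget is spent.
async function pollUntil<T>(
	check: () => T | undefined | null | false | Promise<T | undefined | null | false>,
	{ attempts, intervalMs, description }: PollOptions,
): Promise<T> {
	for (let i = 0; i < attempts; i++) {
		const result = await check();
		if (result) return result;
		// No point sleeping after the final failed attempt.
		if (i < attempts - 1) await sleep(intervalMs);
	}
	const budgetSeconds = (attempts * intervalMs) / 1000;
	throw new Error(
		`Timed out waiting for ${description} (${attempts} attempts × ${intervalMs} ms ≈ ${budgetSeconds} s)`,
	);
}
```

With `{ attempts: 120, intervalMs: 100, description: "runner registration" }`, an exhausted loop fails with "Timed out waiting for runner registration (120 attempts × 100 ms ≈ 12 s)" instead of a generic assertion error.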
Minor

Summary

The fixes are sound and target real problems (actor lifecycle races, test flakiness, SQLite teardown data loss). The main items worth addressing before merge:
755725d to b61ca1a
b61ca1a to d041080
d041080 to 7129b83
