Skip to content

fix(rivetkit): fix engine driver tests, engine sqlite teardown logic#4293

Merged
NathanFlurry merged 1 commit intomainfrom
02-24-fix_rivetkit_fix_engine_driver_tests_engine_sqlite_teardown_logic
Feb 26, 2026
Merged

fix(rivetkit): fix engine driver tests, engine sqlite teardown logic#4293
NathanFlurry merged 1 commit intomainfrom
02-24-fix_rivetkit_fix_engine_driver_tests_engine_sqlite_teardown_logic

Conversation

@NathanFlurry
Copy link
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4293 February 24, 2026 10:44 Destroyed
@railway-app
Copy link

railway-app bot commented Feb 24, 2026

🚅 Deployed to the rivet-pr-4293 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-inspector 🕒 Building (View Logs) Web Feb 26, 2026 at 1:47 am
frontend-cloud 🕒 Building (View Logs) Web Feb 26, 2026 at 1:47 am
website 😴 Sleeping (View Logs) Web Feb 24, 2026 at 10:55 am
mcp-hub ✅ Success (View Logs) Web Feb 24, 2026 at 10:45 am
ladle ❌ Build Failed (View Logs) Web Feb 24, 2026 at 10:44 am

Copy link
Member Author


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Feb 24, 2026

PR Review: fix(rivetkit): fix engine driver tests, engine sqlite teardown logic

Good set of fixes addressing real-world flakiness and race conditions in the engine driver. The changes are well-targeted. Below is my feedback organized by file/concern.


actor-driver.ts — most significant changes

Brittle error string matching:

if (message.includes("WebSocket connection closed during shutdown")) {
    // ignore race
}

This is fragile — if the upstream error message changes, the guard silently stops working and errors get swallowed. Better to use a typed error class, an error code, or at minimum store the string as a named constant so it's easy to find and update.

createSqliteVfs() dynamic import trick:

const specifier = "@rivetkit/" + "sqlite-vfs";
const { SqliteVfs } = await import(specifier);

This is the same bundler-escape trick used in vfs.ts — intentional, but the "why" is not obvious to a reader. A brief comment explaining this prevents bundler from statically resolving the import would help both sites (vfs.ts and here).

Silent swallow on stopActor during start failure:

} catch (stopError) {
    logger().error({ ... });
    // not re-thrown
}

If stopActor throws during the failure-cleanup path, the actor entry may not get cleaned up, potentially leaking the #actors map entry. Consider ensuring this.#actors.delete(actorId) is always called in the catch regardless.

Shutdown timeout: The 15-second RUNNER_STOP_WAIT_MS timeout logs a warning but continues normally. This is reasonable for tests, but worth a note on whether this can cause issues in production if the runner is genuinely stuck.

actorStartError field pattern: Clean and correct. Storing the error for reuse on subsequent #loadActorHandler calls is the right approach.


instance/mod.ts

The early abort at destroy time is a good fix — calling abortController.abort() before waiting for the driver stop handshake correctly unblocks any in-flight waits. The wrapped try/catch is appropriate since abort may throw in edge cases.

Changing assertReady() to assertReady(true) for keepAwake is a subtle behavior change — worth a brief comment on why keepAwake is valid during stopping state (i.e., keep-alive pings should not fail just because shutdown is in progress).


workflow/driver.ts

this.#actor.stateManager.saveState({
    immediate: true,
    allowStoppingState: true,
})

Good fix. Without this, workflow state could be lost during teardown if the state manager guards against saves in stopping state. Low risk — allowStoppingState is opt-in.


driver-test-suite/utils.ts

Correct fix for teardown ordering: disposing the client before calling cleanup() prevents dangling connections. Awaiting cleanup() is also important since the previous version didn't await it. No concerns here.


driver-test-suite/tests/actor-db.ts

Critical indentation fix: The for (const variant of variants) loop being at the wrong indent level was a real correctness bug causing lifecycle tests to run once per variant instead of once. Good catch.

toBeGreaterThanOrEqual for real-timer mode: Appropriate loosening — shared namespaces mean exact counts aren't reliable. The baseline-count approach for the sleep persistence test is a clean solution.

Migration failure test using getTotalCleanupCount(): Reasonable workaround for the race where the failing actor's ID is not reliably resolvable. Could add a comment explaining why the per-actor ID approach was unreliable.


driver-test-suite/tests/actor-workflow.ts

Replacing the fixed 1-second wait with a polling loop (50 × 100ms) is a clear improvement. The conditions checked (runCount > 0 && history.length > 0 && guardTriggered) match what the test actually needs to verify. No concerns.


driver-engine.test.ts

Polling loop for runner registration (120 × 100ms = up to 12 seconds): Good replacement for the fixed 1-second delay. If the runner takes longer than 12 seconds, the failure message may not be obvious — consider logging something like "Runner did not register within 12 seconds" when the loop exhausts.

registry.parseConfig() moved after mutations: Correct fix — parsing config before setting namespace/runner values would bake in stale data.


Minor

  • Missing PR description: The PR template is not filled in. A one-paragraph summary of what's being fixed and why would help reviewers and future git bisect.
  • actor-db-drizzle.ts / actor-db-raw.ts: The 120-second action timeout is a pragmatic fix for slow engine tests, though it would be worth a comment noting this is specifically for engine driver test environments, not a production recommendation.

Summary

The fixes are sound and target real problems (actor lifecycle races, test flakiness, SQLite teardown data loss). The main items worth addressing before merge:

  1. Replace the error-string check with a typed/coded error guard in the shutdown path.
  2. Add a comment explaining the bundler-escape string-split pattern in both vfs.ts and actor-driver.ts.
  3. Ensure #actors.delete is always called even when stopActor throws during the start-failure cleanup path.
  4. Fill in the PR description.

@NathanFlurry NathanFlurry force-pushed the 02-24-fix_rivetkit_fix_engine_driver_tests_engine_sqlite_teardown_logic branch from 755725d to b61ca1a Compare February 26, 2026 01:36
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4293 February 26, 2026 01:36 Destroyed
@NathanFlurry NathanFlurry force-pushed the 02-24-fix_rivetkit_fix_engine_driver_tests_engine_sqlite_teardown_logic branch from b61ca1a to d041080 Compare February 26, 2026 01:39
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4293 February 26, 2026 01:39 Destroyed
@NathanFlurry NathanFlurry marked this pull request as ready for review February 26, 2026 01:40
@NathanFlurry NathanFlurry force-pushed the 02-24-fix_rivetkit_fix_engine_driver_tests_engine_sqlite_teardown_logic branch from d041080 to 7129b83 Compare February 26, 2026 01:47
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4293 February 26, 2026 01:47 Destroyed
@NathanFlurry NathanFlurry merged commit f0da0c3 into main Feb 26, 2026
10 of 22 checks passed
@NathanFlurry NathanFlurry deleted the 02-24-fix_rivetkit_fix_engine_driver_tests_engine_sqlite_teardown_logic branch February 26, 2026 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant