Skip to content

feat(rust/engine-runner): feature parity with typescript#4289

Draft
NathanFlurry wants to merge 1 commit intomainfrom
02-23-feat_rust_engine-runner_feature_parity_with_typescript
Draft

feat(rust/engine-runner): feature parity with typescript#4289
NathanFlurry wants to merge 1 commit intomainfrom
02-23-feat_rust_engine-runner_feature_parity_with_typescript

Conversation

@NathanFlurry
Copy link
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link

railway-app bot commented Feb 24, 2026

🚅 Deployed to the rivet-pr-4289 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-inspector 😴 Sleeping (View Logs) Web Feb 24, 2026 at 8:35 am
website 😴 Sleeping (View Logs) Web Feb 24, 2026 at 8:29 am
frontend-cloud 😴 Sleeping (View Logs) Web Feb 24, 2026 at 8:26 am
mcp-hub ✅ Success (View Logs) Web Feb 24, 2026 at 8:18 am
ladle ❌ Build Failed (View Logs) Web Feb 24, 2026 at 8:17 am

Copy link
Member Author


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Feb 24, 2026

PR Review: feat(rust/engine-runner): feature parity with typescript

This is a substantial and well-structured PR that transforms the Rust engine runner from a pure test harness into a production-capable SDK mirroring the TypeScript runner. The reconnect loop, tunnel protocol support, serverless mode, Axum-based actor routing, and e2e test suite are all significant improvements. Notes below are organized by severity.


Issues

1. Unbounded body collection in Axum handler (actor.rs)

let body = axum::body::to_bytes(body, usize::MAX)
    .await
    .context("failed to collect actor response body")?;

Using usize::MAX as the size limit means a malformed or adversarial actor response can exhaust all available memory before the runner can react. Set a reasonable maximum (e.g., 64 MiB or a configurable constant) to match documented KV/message size limits.


2. Race condition in serverless start handler (runner.rs)

if !runner.runner.inner.started.load(Ordering::SeqCst) {
    runner.runner.start().await.map_err(internal_error)?;
}

Two concurrent SSE connections can both observe started == false and then both call start(). The second call returns an error ("runner.start called more than once"), which bubbles up to the client as a 500. Since start() already guards with a swap, the right fix here is to ignore the error when start() returns due to it being already started, or use a Once/OnceLock cell so the error path is unreachable.


3. reqwest::Client::new() in SDK code (runner.rs, upsert_serverless_runner_config)

let client = reqwest::Client::new();

CLAUDE.md says to use rivet_pools::reqwest::client().await? instead of building a new client. Even though rivet_pools may not be available in this SDK crate, creating a fresh client per call skips connection pooling and incurs TLS handshake overhead on every serverless upsert. At minimum, consider caching the client (e.g., via OnceLock) or accepting it as a parameter.


4. Empty string as "no namespace" is non-idiomatic (api_actors_delete.rs and related tests)

// Before
namespace: None
// After
namespace: "".to_string()

Using an empty string to signal "no namespace" is not idiomatic Rust—Option<String> is the conventional representation. The diff doesn't show the DeleteQuery struct definition change, so the rationale isn't clear; if the proto/serialization layer requires a non-optional field this should at least be documented with a comment at the struct level.


5. Indentation inconsistency in test files

In api_runner_configs_list.rs and api_runner_configs_upsert.rs, the newly added metadata_poll_interval: None, lines use inconsistent tab depth compared to their surrounding struct fields (extra leading tab on the new line). This is cosmetic but will trigger cargo fmt warnings.


Nits / Suggestions

6. clone_for_handle() is misleading

fn clone_for_handle(&self) -> RunnerInner {
    self.clone()
}

This just calls Clone and is only used to supply Arc::new(self.clone_for_handle()). All the fields in RunnerInner are already Arc-wrapped, so cloning the inner and then re-wrapping it in a new Arc works correctly, but creates a second RunnerInner allocation for each RunnerHandle created inside message handling. Since RunnerHandle already wraps Arc<RunnerInner>, callers could simply clone the outer arc. Consider storing the arc directly and propagating it.

7. Retry on HTTP 400 in test helper

Ok(response) if response.status() == reqwest::StatusCode::BAD_REQUEST => {
    tokio::time::sleep(Duration::from_millis(250)).await;
    drop(response);
    continue;
}

400 Bad Request usually indicates a permanent client error; retrying it will mask bugs and make tests flaky. If this is intended to handle a specific transient case (e.g., actor not yet routable), the condition should be narrower (e.g., inspect a specific error body field).

8. serverless_metadata hardcodes "runtime": "rivetkit"

The Rust runner is not rivetkit (the TypeScript package). If the engine uses this field to distinguish runner implementations, this will misidentify Rust runners.

9. Duplicate builder entry points

Both Runner::builder(config) and RunnerBuilder::new(config) are exposed and do the same thing. Keeping only one reduces surface area; the example already uses Runner::builder, so RunnerBuilder::new could be left as a lower-level entry point or the convenience method removed.

10. wait_ready() loop could miss notification

pub async fn wait_ready(&self) -> Result<String> {
    loop {
        if let Some(runner_id) = self.inner.runner_id.lock().await.clone() {
            return Ok(runner_id);
        }
        self.inner.ready_notify.notified().await;
    }
}

If the ToClientInit message arrives (setting runner_id and calling notify_waiters) between the lock check and the notified().await call, the notification is missed and the loop spins until the next notify. Use Notify::notified() before releasing the lock, or use a watch channel which inherently handles this ordering.


Positive observations

  • The RunnerApp trait design is clean and mirrors TypeScript idioms well.
  • Exponential backoff for reconnects with explicit RECONNECT_INITIAL_DELAY_MS / RECONNECT_MAX_DELAY_MS constants is good practice.
  • Splitting the writer into a dedicated task and decoupling sends via mpsc::UnboundedSender<rp::ToServer> is a solid improvement over the previous mutable-stream-passed-by-ref approach.
  • Per-actor event history with ack-based pruning correctly replaces the old global event history.
  • The e2e test harness in engine/sdks/rust/engine-runner/tests/ with OnceLock-based engine binary caching and the serialized test lock (acquire_test_lock) are well-thought-out.
  • Updating CLAUDE.md with the engine runner parity guideline is exactly the right way to enforce the policy.

@NathanFlurry NathanFlurry force-pushed the 02-23-feat_rust_engine-runner_feature_parity_with_typescript branch from 20e4fd0 to 0c4e446 Compare February 24, 2026 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant