Skip to content

Conversation

@djeebus
Copy link
Contributor

@djeebus djeebus commented Nov 17, 2025

This helps us break apart the main function into components, which also helps us close cleanly.


Note

Replaces the orchestrator’s monolithic main with go.uber.org/fx modules (gRPC, HTTP health, Hyperloop, networking, storage, observability, template manager), enabling clean startup/shutdown and better testability, plus related API tweaks, tests, and CI/dev updates.

  • Orchestrator (core refactor):
    • Introduces go.uber.org/fx DI modules: cmux, grpc, health, hyperloop, network, observability, redis, sandboxes, storage, templateManager, events, graph.
    • New app wiring (ioc/*), main.go simplified to ioc.New(...).Run(); add NewConfig() and Validate().
    • Graceful shutdown: health transitions to draining; waits for sandboxes/template builds; coordinated cmux/HTTP/gRPC stop.
    • Adds tests (ioc/app_test.go) for graph validation and startup/shutdown.
  • Networking:
    • network.Storage now has Setup(ctx); Pool.Populate(ctx) error and returns early on errors.
    • StorageLocal/Memory/KV updated; local storage separated NewStorageLocal() + Setup().
  • Servers & services:
    • hyperloopserver.NewHyperloopServer(port, logger, sandboxes) (drops ctx).
    • server.New(cfg) (*Server, error); service info creation simplified; sandbox factory tracks active sandboxes (Wait, Add/Subtract).
  • Template/build & tests:
    • Update benchmark and cmd/build-template to new network/storage APIs; trim log whitespace.
    • Template cache hardens old-dir cleanup (handles ENOENT).
  • Shared:
    • Add logger.IsSyncError; minor catalog cleanup.
  • CI/Docs/Config:
    • CI enables NBD and udev rule; DEV guide uses pushd/popd and adds FAQ; new .env.local vars and .gitignore for graph.dot.
  • Go tooling/deps:
    • Set go 1.24.7 across modules; adjust several dep versions; scripts bump e2b and add dotenv.

Written by Cursor Bugbot for commit b3a59ff. This will update automatically on new commits. Configure here.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@djeebus djeebus marked this pull request as draft November 18, 2025 17:23
@djeebus djeebus marked this pull request as ready for review November 19, 2025 20:21
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

return err
}

return nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing error logging in invokeAsync goroutines

The invokeAsync function calls fx.Shutdowner.Shutdown when an error occurs, but the errors from input.CMUX.Serve(), httpServer.Serve(), and grpcServer.Serve() are not logged before triggering shutdown. This makes debugging difficult when these servers fail to start, as the error information is lost. The invokeAsync helper should log the error before calling Shutdown, or the errors should be logged at the call sites before being returned.

Fix in Cursor Fix in Web

# Conflicts:
#	packages/api/go.mod
#	packages/api/go.sum
#	packages/clickhouse/go.mod
#	packages/clickhouse/go.sum
#	packages/client-proxy/go.mod
#	packages/client-proxy/go.sum
#	packages/db/go.mod
#	packages/db/go.sum
#	packages/docker-reverse-proxy/go.mod
#	packages/docker-reverse-proxy/go.sum
#	packages/envd/go.mod
#	packages/envd/go.sum
#	packages/local-dev/go.mod
#	packages/local-dev/go.sum
#	packages/orchestrator/benchmark_test.go
#	packages/orchestrator/cmd/build-template/main.go
#	packages/orchestrator/go.mod
#	packages/orchestrator/go.sum
#	packages/orchestrator/internal/sandbox/nbd/testutils/template_rootfs.go
#	packages/orchestrator/internal/sandbox/template/cache.go
#	packages/orchestrator/main.go
#	packages/shared/go.mod
#	packages/shared/go.sum
#	tests/integration/go.mod
#	tests/integration/go.sum
)
require.NoError(t, err)
store.Start(t.Context())
t.Cleanup(store.Close)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Duplicate store initialization and cleanup calls

The TestDiffStoreDelayEviction test function calls store.Start() and t.Cleanup(store.Close) twice consecutively. This causes the store to be started twice and the close function to be registered for cleanup twice, which can lead to unexpected behavior or runtime errors when the store is closed multiple times.

Fix in Cursor Fix in Web


cleaner.Add(func(context.Context) error {
store.RemoveCache()
store.Close()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Store close function called twice in cleanup chain

The cleanup function at lines 99-108 registers store.Close() twice: once within a cleanup function that also calls store.RemoveCache(), and again in a separate cleanup function. This causes store.Close() to be called multiple times, which could result in panics or resource errors if the store's close method is not idempotent.

Fix in Cursor Fix in Web

// sbxlogger.SetSandboxLoggerExternal(logger)

slotStorage, err := network.NewStorageLocal(b.Context(), config.NetworkConfig)
slotStorage, err := network.NewStorageLocal(config.NetworkConfig)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Unhandled error from networkPool.Populate in benchmark

The networkPool.Populate method now returns an error, but the benchmark test ignores it. This could cause the test to proceed with an uninitialized network pool if Setup fails, leading to silent failures or panics during test execution.

Fix in Cursor Fix in Web

# Conflicts:
#	packages/orchestrator/internal/service/info.go
#	packages/orchestrator/main.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants