fix(warm-pool): Wire dispatch path, fix stale GCS paths#46
fix(warm-pool): Wire dispatch path, fix stale GCS paths#46
Conversation
Fixes from honest review of v0.3.0 warm pool implementation: 1. DISPATCH WIRED: GCERunBackend.launch() now passes warm_pool_idle_timeout through gce_launcher to build_startup_script. register_instance() called after fresh launch when pool has capacity. Previously the entire dispatch path was dead code — enabling warm pool had no effect. 2. GCS PATHS FIXED: Idle loop now updates GCS_STDOUT_PATH, GCS_STDERR_PATH, GCS_EXIT_CODE_PATH from the new run's run_path. Previously all subsequent jobs would upload logs to the FIRST run's GCS location. 3. EXIT CODE FIXED: Uses direct gsutil cp to GCS instead of stale EXIT_CODE_FILE local path that pointed to the first run's gcsfuse mount. 4. GCS_BUCKET EXPORTED: Idle loop needs bucket name to construct paths for subsequent runs. Now exported before idle loop definition. 5. POLL SCAN FIXED: _get_run_handle no longer scans all running warm instances. Uses single-row lookup by backend_handle (O(1) not O(n)). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Code Review — fix(warm-pool): Wire dispatch path, fix stale GCS pathsGood set of targeted fixes overall. The dispatch wiring, O(n)→O(1) poll fix, and instance metadata exit-code approach are all solid improvements. Found one functional bug that will cause all warm-pool re-runs to write logs to wrong GCS paths. Bug: Double
|
Fixes all issues from honest review: 1. DISPATCH WIRED IN HOT PATH: warm_pool_manager passed into gce_launcher.launch_instance(). Warm pool claim happens AFTER scripts are built but BEFORE instance creation — where all the pieces (env_map, pre_run_cmds, post_run_cmds, docker_cmd) are available. New _build_docker_cmd_script() generates standalone Docker run script for warm pool reuse. 2. GCS PATHS UPDATED PER JOB: Idle loop now updates GCS_STDOUT_PATH, GCS_STDERR_PATH, GCS_EXIT_CODE_PATH from new run's run_path. Exit code written directly to GCS via gsutil, not stale local path. 3. BACKGROUND PROCESSES RESET: Idle loop kills watchdog, supervisor, log syncer, and metadata syncer PIDs between jobs. Metadata syncer restarted for each new job. 4. POLL SCAN O(1): _get_run_handle uses single-row lookup by backend_handle instead of scanning all running warm instances. 5. INTEGRATION TESTS: 8 new tests covering full lifecycle — register, claim, ACK, timeout, release, reap, emergency cleanup, pool cap. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
test |
Goldfish Development Guide
Quick Reference# Development cycle
make lint # Ruff + mypy via pre-commit - run before commits
make test # Fast unit tests (<1s) - run frequently
make test-integration # Integration tests (~2min) - before pushing
make ci # Full CI suite (lint + all tests)
# First-time setup
uv pip install -e ".[dev]"
make install-hooks # REQUIRED: installs pre-commit hooksGolden rule: Never suppress lint errors—always fix the source. TDD: Test-Driven DevelopmentThis codebase uses TDD. Write tests BEFORE implementation. Why TDD matters for LLMs:
Workflow:
Test naming: No exceptions: Even "quick fixes" get tests first. The test documents the bug and prevents regression. What is Goldfish?An MCP server enabling Claude Code to conduct ML experiments by managing:
Core invariants:
Architecture at a GlanceKey Files
The Nine Abstractions1. Workspaces = Copy-Based IsolationKey operations: 2. Versions = Git Tags (100% Provenance)Every Stored in 3. Pipelines = YAMLstages:
- name: preprocess
inputs: {raw: {type: dataset, dataset: sales_v1}}
outputs: {features: {type: npy}}
- name: train
inputs: {features: {from_stage: preprocess, signal: features}}Parser validates: unique names, type compatibility, no cycles, datasets exist. 4. Stages = Docker Containers# modules/train.py - runs in container
from goldfish.io import load_input, save_output
features = load_input("features") # from /mnt/inputs/
save_output("model", model_dir) # to /mnt/outputs/5. Signals = Data Flow
Tracked in 6. Resource Profiles# configs/train.yaml
compute:
profile: "h100-spot" # Claude writes thisGoldfish resolves to: Built-in: 7. SVS (Semantic Validation System)Core System: SVS provides defense-in-depth through three phases:
Key Patterns:
Security:
8. Cloud Abstraction LayerCore System: The cloud abstraction layer isolates provider-specific code (GCP, AWS, local) from core Goldfish logic: Key Protocols:
BackendCapabilities - Behavior configuration instead of conditionals: @dataclass
class BackendCapabilities:
ack_timeout_seconds: float = 1.0 # How long to wait for ACK
has_launch_delay: bool = False # GCE has startup delay, local doesn't
timeout_becomes_pending: bool = False # GCE timeout = sync pending, local = failure
logs_unavailable_message: str = "Logs not available"
zone_resolution_method: str = "config" # "config" or "handle"Usage Pattern - Always use protocol, never direct launcher: # GOOD: Protocol-based
result = self.run_backend.launch(spec)
status = self.run_backend.get_status(handle)
logs = self.run_backend.get_logs(handle)
# BAD: Direct launcher access (violates abstraction)
self.gce_launcher.launch_instance(...) # NEVER do thisAdding a New Backend:
9. Configuration FlexibilityDefaults Section - Global settings for stage execution: # goldfish.yaml
defaults:
timeout_seconds: 7200 # 2 hours (default: 3600)
log_sync_interval: 15 # Sync logs every 15 seconds (default: 10)
backend: gce # Default compute backend: local, gce, kubernetesStorage Backend Configuration - Multi-provider storage support: # goldfish.yaml
storage:
backend: "gcs" # or "s3", "azure", "local"
# GCS configuration (when backend: gcs)
gcs:
bucket: "my-bucket"
sources_prefix: "sources/"
artifacts_prefix: "artifacts/"
# S3 configuration (when backend: s3) - adapter coming soon
s3:
bucket: "my-bucket"
region: "us-east-1"
endpoint_url: "http://localhost:9000" # For MinIO/S3-compatible
# Azure configuration (when backend: azure) - adapter coming soon
azure:
container: "my-container"
account: "mystorageaccount"Backend Selection Priority:
Per-Profile Backend Selection - Different compute backends per profile: # goldfish.yaml
gce:
project_id: my-project
profile_overrides:
# GPU workloads on GCE
h100-spot:
zones: ["us-central1-a"]
# CPU workloads could use different config
cpu-large:
zones: ["us-west1-a", "us-west1-b"]Config Model Hierarchy: Critical PatternsDatabase Access# ALWAYS use context manager
with self.db._conn() as conn:
conn.execute("INSERT INTO ...")
# Transaction auto-commits on success, auto-rollbacks on exceptionError Handling# ALWAYS use specific error types with details
raise WorkspaceNotFoundError(
f"Workspace '{name}' not found",
details={"available": available_workspaces}
)
# NEVER expose git internals
# BAD: raise Exception("fatal: not a valid object name")
# GOOD: raise WorkspaceNotFoundError("Workspace not found")TypedDict Returns from Database# When returning TypedDict, ALWAYS use cast()
from typing import cast
return cast(JobRow, dict(row)) if row else None
# For lists:
return [cast(SourceRow, dict(r)) for r in rows]MCP Tool Pattern@mcp.tool()
def my_tool(param: str) -> dict:
"""Docstring for Claude."""
try:
validate_workspace_name(param) # 1. Validate
result = manager.do_thing(param) # 2. Execute
ctx.db.record_audit("my_tool", {...}) # 3. Audit
return {"success": True, "result": result} # 4. Return
except GoldfishError as e:
return {"success": False, "error": e.message}Security Model (4 Layers)1. Input Validation (
|
| Input | Pattern | Example |
|---|---|---|
| Workspace name | ^[a-zA-Z0-9_-]+$ |
baseline_lstm |
| Snapshot ID | ^snap-[a-f0-9]{8}-\d{8}-\d{6}$ |
snap-abc12345-20251210-143000 |
| Stage run ID | ^stage-[a-f0-9]+$ |
stage-abc123 |
2. Path Traversal Protection
# ALWAYS validate paths
def validate_path_within_root(path: Path, root: Path) -> None:
if not path.resolve().is_relative_to(root.resolve()):
raise ValidationError("Path traversal")
# ALWAYS check symlinks (TOCTOU prevention)
if path.is_symlink():
raise InvalidLogPathError("Symlink detected")3. Docker Sandboxing (cloud/adapters/local/)
# Containers run with:
--memory 4g --cpus 2.0 --pids-limit 100
--user 1000:1000 # non-root
-v inputs:/mnt/inputs:ro # read-only inputs4. Git Error Translation (errors.py)
All git errors translated to Goldfish concepts before reaching Claude.
Stage Execution Flow
run("w1", stages=["train"])
│
├─▶ 1. Validate workspace mounted
├─▶ 2. SYNC: Copy user/w1 → dev-repo/branch (with delete semantics)
├─▶ 3. COMMIT: Auto-commit changes in dev-repo
├─▶ 4. PUSH: Push to remote (for GCE execution)
├─▶ 5. Auto-version (create git tag from committed SHA)
├─▶ 6. Load pipeline, validate stage exists
├─▶ 7. Resolve inputs (datasets or upstream signals)
├─▶ 8. Build Docker image
├─▶ 9. Launch container (local or GCE)
├─▶ 10. Monitor status, stream logs
└─▶ 11. Finalize: register outputs in signal_lineage
Key methods:
GitLayer.sync_slot_to_branch()- sync + commit (provenance guard)StageExecutor.run_stage()injobs/stage_executor.py
Database Schema (Key Tables)
workspace_versions(workspace_name, version, git_sha, created_by, created_at)
stage_runs(id, workspace_name, version, stage_name, status, backend_type, ...)
signal_lineage(stage_run_id, signal_name, signal_type, storage_location)
audit(operation, workspace, details_json, created_at)Full schema: db/schema.sql
Testing
Structure
tests/
├── unit/ # Over 2400 tests, <1s, pure logic, all mocked
├── integration/ # Over 1200 tests, ~2min, real DB + git
├── e2e/ # Full Docker tests
│ └── deluxe/ # GCE tests (@pytest.mark.deluxe_gce)
└── conftest.py # Fixtures: test_db, temp_git_repo
Key Fixtures
test_db # Fresh SQLite with schema
temp_git_repo # Initialized git repo with main branch
test_config # GoldfishConfig for testingWriting Tests
def test_feature(test_db, temp_git_repo):
"""What + Why in docstring."""
manager = WorkspaceManager(db=test_db, ...)
result = manager.create_workspace("test", "goal")
assert result.name == "test"
# Always verify DB state too
with test_db._conn() as conn:
row = conn.execute("SELECT ...").fetchone()
assert row is not NoneDO and DON'T
| DO | DON'T |
|---|---|
make lint before committing |
# type: ignore (fix the issue) |
Specific error types (WorkspaceNotFoundError) |
Expose git terminology to MCP clients |
cast() for TypedDict database returns |
Bare except: (use except Exception:) |
| Validate all inputs before operations | raise X without from e when re-raising |
| Record audit log for user-facing operations | Commit with failing tests or lint |
| Write tests for new functionality | Skip input validation |
| Focus on what needs to be done, not when | Provide time estimates (AI is ~100x faster than you think) |
Adding New Features
New MCP Tool
- Add to appropriate
server_tools/*.py - Follow the tool pattern (validate → execute → audit → return)
- Add tests in
tests/integration/ - Update tool count in README if significant
New Database Table
- Add schema to
db/schema.sql - Add CRUD methods to
db/database.py - Add TypedDict to
db/types.py - Add tests
New Signal Type
- Update
SignalDefinmodels.py - Update
pipeline/parser.pyvalidation - Update
io/__init__.pyload/save handling - Add tests
Debugging
# Database state (in dev repo)
sqlite3 ../myproject-dev/.goldfish/goldfish.db "SELECT * FROM stage_runs ORDER BY started_at DESC LIMIT 5"
# Git state (dev repo has all branches/tags)
cd ../myproject-dev && git log --all --oneline --graph
# Check workspace mount metadata
cat workspaces/w1/.goldfish-mount
# Docker
docker ps # Running containers
docker logs goldfish-workspace-v1 # Container logs
# Verbose logging
import logging; logging.basicConfig(level=logging.DEBUG)File Quick Reference
| Component | Files |
|---|---|
| Entry | server.py, cli.py, __main__.py |
| Context | context.py (ServerContext DI) |
| Models | models.py (Pydantic), db/types.py (TypedDict) |
| Validation | validation.py, errors.py |
| Cloud | cloud/protocols.py, cloud/contracts.py, cloud/factory.py, cloud/adapters/ |
| Workspace | workspace/manager.py, workspace/git_layer.py (copy-based + sync) |
| Execution | jobs/stage_executor.py, jobs/pipeline_executor.py |
| Pipeline | pipeline/parser.py, pipeline/manager.py |
| Infra | infra/docker_builder.py (use cloud/ adapters for execution) |
| Data | datasets/registry.py, sources/registry.py |
| State | state/state_md.py (per-workspace + global STATE.md) |
| IO | io/__init__.py (container load_input/save_output) |
| Tools | server_tools/*.py (40+ MCP tools) |
Conventions
- Ruff for linting/formatting (via pre-commit)
- mypy strict mode for type checking
- Google-style docstrings for public APIs
- Semantic error types (not generic Exception)
- Context managers for database transactions
- cast() for TypedDict returns from SQLite
Common Fixes
| Error | Fix |
|---|---|
| TypedDict return type mismatch | return cast(JobRow, dict(row)) |
Closure captures None-able var |
Assign to local: registry = self.registry then use in closure |
no-any-return from mypy |
Add explicit type annotation to return variable |
| Forward reference error | Add from __future__ import annotations |
| E402 module import order | Move ALL imports to top, constants below |
When getting ruff, mypy, or test errors: never cheat with ignores—always fix properly.
|
PR Review: fix(warm-pool): Wire dispatch path, fix stale GCS paths Good set of critical bug fixes — the dead dispatch path and stale GCS paths are real correctness issues. The startup script fixes are well-reasoned and the regression tests are a solid addition. A few issues worth addressing before merge. CRITICAL 1. O(n) scan introduced in run_backend.py (the very anti-pattern fixed in _stage_executor_impl.py) This fires on every launch() call and scans all running warm instances. The PR description calls out the O(n) scan as a fix in _stage_executor_impl.py, but then introduces the same pattern here. The simplest fix: have launch_instance() return a boolean flag indicating whether a warm instance was reused, or do a single-row SQL query like the _stage_executor_impl.py fix does. The misleading comment above this block (zone != empty string means it was warm) describes a check that never actually happens. 2. Private attribute access on WarmPoolManager run_backend.py reaches into self._warm_pool._config and self._warm_pool._db directly. This breaks encapsulation and makes both classes fragile to internal refactors. WarmPoolManager should expose public methods/properties: e.g. idle_timeout_seconds_for(profile). The DB lookup should also live inside WarmPoolManager itself — an is_registered(instance_name) method would eliminate issues 1 and 2 together. MEDIUM 3. Shell injection risk in _build_docker_cmd_script Env var keys are not shell-quoted — a key with a space or dollar-sign breaks the command or enables injection. Validate keys against the pattern ^[A-Z_][A-Z0-9_]*$. Also, image and cmd are interpolated directly without quoting. Wrap with shlex.quote(). 4. Hardcoded Docker mounts diverge from startup_builder.py The warm pool docker command and the cold-start command are now maintained separately. If startup_builder.py ever adds a mount (e.g. for a new signal type), warm pool runs will silently lack it. Extract a shared _docker_mounts() helper used by both paths. 5. Any type for warm_pool_manager parameter gce_launcher.py is in the same package as warm_pool.py. Use TYPE_CHECKING to import WarmPoolManager properly, or define a minimal Protocol with try_claim(). MINOR 6. import shlex inside method body — move to module-level imports per project conventions. 7. GCS_BUCKET not exported in bash The comment says Export bucket name... but the bash line does not use export. While script-scope variables are visible to functions in the same script, use export GCS_BUCKET to match the stated intent. TESTS Coverage is good — lifecycle tests hit the real DB and all three startup script regressions are covered. Two observations:
SUMMARY The core bugs are real and the approach is sound. The two critical items (O(n) scan and private attribute access) are easy fixes and should be addressed before merge — keeping them would undermine the quality improvements the PR is trying to make. |
Review feedback: if we reach the code after launch_instance(), the warm pool claim either wasn't attempted or returned None. This is always a fresh launch — no need to scan warm_instances to check. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
Hello from test |
Goldfish Development Guide
Quick Reference# Development cycle
make lint # Ruff + mypy via pre-commit - run before commits
make test # Fast unit tests (<1s) - run frequently
make test-integration # Integration tests (~2min) - before pushing
make ci # Full CI suite (lint + all tests)
# First-time setup
uv pip install -e ".[dev]"
make install-hooks # REQUIRED: installs pre-commit hooksGolden rule: Never suppress lint errors—always fix the source. TDD: Test-Driven DevelopmentThis codebase uses TDD. Write tests BEFORE implementation. Why TDD matters for LLMs:
Workflow:
Test naming: No exceptions: Even "quick fixes" get tests first. The test documents the bug and prevents regression. What is Goldfish?An MCP server enabling Claude Code to conduct ML experiments by managing:
Core invariants:
Architecture at a GlanceKey Files
The Nine Abstractions1. Workspaces = Copy-Based IsolationKey operations: 2. Versions = Git Tags (100% Provenance)Every Stored in 3. Pipelines = YAMLstages:
- name: preprocess
inputs: {raw: {type: dataset, dataset: sales_v1}}
outputs: {features: {type: npy}}
- name: train
inputs: {features: {from_stage: preprocess, signal: features}}Parser validates: unique names, type compatibility, no cycles, datasets exist. 4. Stages = Docker Containers# modules/train.py - runs in container
from goldfish.io import load_input, save_output
features = load_input("features") # from /mnt/inputs/
save_output("model", model_dir) # to /mnt/outputs/5. Signals = Data Flow
Tracked in 6. Resource Profiles# configs/train.yaml
compute:
profile: "h100-spot" # Claude writes thisGoldfish resolves to: Built-in: 7. SVS (Semantic Validation System)Core System: SVS provides defense-in-depth through three phases:
Key Patterns:
Security:
8. Cloud Abstraction LayerCore System: The cloud abstraction layer isolates provider-specific code (GCP, AWS, local) from core Goldfish logic: Key Protocols:
BackendCapabilities - Behavior configuration instead of conditionals: @dataclass
class BackendCapabilities:
ack_timeout_seconds: float = 1.0 # How long to wait for ACK
has_launch_delay: bool = False # GCE has startup delay, local doesn't
timeout_becomes_pending: bool = False # GCE timeout = sync pending, local = failure
logs_unavailable_message: str = "Logs not available"
zone_resolution_method: str = "config" # "config" or "handle"Usage Pattern - Always use protocol, never direct launcher: # GOOD: Protocol-based
result = self.run_backend.launch(spec)
status = self.run_backend.get_status(handle)
logs = self.run_backend.get_logs(handle)
# BAD: Direct launcher access (violates abstraction)
self.gce_launcher.launch_instance(...) # NEVER do thisAdding a New Backend:
9. Configuration FlexibilityDefaults Section - Global settings for stage execution: # goldfish.yaml
defaults:
timeout_seconds: 7200 # 2 hours (default: 3600)
log_sync_interval: 15 # Sync logs every 15 seconds (default: 10)
backend: gce # Default compute backend: local, gce, kubernetesStorage Backend Configuration - Multi-provider storage support: # goldfish.yaml
storage:
backend: "gcs" # or "s3", "azure", "local"
# GCS configuration (when backend: gcs)
gcs:
bucket: "my-bucket"
sources_prefix: "sources/"
artifacts_prefix: "artifacts/"
# S3 configuration (when backend: s3) - adapter coming soon
s3:
bucket: "my-bucket"
region: "us-east-1"
endpoint_url: "http://localhost:9000" # For MinIO/S3-compatible
# Azure configuration (when backend: azure) - adapter coming soon
azure:
container: "my-container"
account: "mystorageaccount"Backend Selection Priority:
Per-Profile Backend Selection - Different compute backends per profile: # goldfish.yaml
gce:
project_id: my-project
profile_overrides:
# GPU workloads on GCE
h100-spot:
zones: ["us-central1-a"]
# CPU workloads could use different config
cpu-large:
zones: ["us-west1-a", "us-west1-b"]Config Model Hierarchy: Critical PatternsDatabase Access# ALWAYS use context manager
with self.db._conn() as conn:
conn.execute("INSERT INTO ...")
# Transaction auto-commits on success, auto-rollbacks on exceptionError Handling# ALWAYS use specific error types with details
raise WorkspaceNotFoundError(
f"Workspace '{name}' not found",
details={"available": available_workspaces}
)
# NEVER expose git internals
# BAD: raise Exception("fatal: not a valid object name")
# GOOD: raise WorkspaceNotFoundError("Workspace not found")TypedDict Returns from Database# When returning TypedDict, ALWAYS use cast()
from typing import cast
return cast(JobRow, dict(row)) if row else None
# For lists:
return [cast(SourceRow, dict(r)) for r in rows]MCP Tool Pattern@mcp.tool()
def my_tool(param: str) -> dict:
"""Docstring for Claude."""
try:
validate_workspace_name(param) # 1. Validate
result = manager.do_thing(param) # 2. Execute
ctx.db.record_audit("my_tool", {...}) # 3. Audit
return {"success": True, "result": result} # 4. Return
except GoldfishError as e:
return {"success": False, "error": e.message}Security Model (4 Layers)1. Input Validation (
|
| Input | Pattern | Example |
|---|---|---|
| Workspace name | ^[a-zA-Z0-9_-]+$ |
baseline_lstm |
| Snapshot ID | ^snap-[a-f0-9]{8}-\d{8}-\d{6}$ |
snap-abc12345-20251210-143000 |
| Stage run ID | ^stage-[a-f0-9]+$ |
stage-abc123 |
2. Path Traversal Protection
# ALWAYS validate paths
def validate_path_within_root(path: Path, root: Path) -> None:
if not path.resolve().is_relative_to(root.resolve()):
raise ValidationError("Path traversal")
# ALWAYS check symlinks (TOCTOU prevention)
if path.is_symlink():
raise InvalidLogPathError("Symlink detected")3. Docker Sandboxing (cloud/adapters/local/)
# Containers run with:
--memory 4g --cpus 2.0 --pids-limit 100
--user 1000:1000 # non-root
-v inputs:/mnt/inputs:ro # read-only inputs4. Git Error Translation (errors.py)
All git errors translated to Goldfish concepts before reaching Claude.
Stage Execution Flow
run("w1", stages=["train"])
│
├─▶ 1. Validate workspace mounted
├─▶ 2. SYNC: Copy user/w1 → dev-repo/branch (with delete semantics)
├─▶ 3. COMMIT: Auto-commit changes in dev-repo
├─▶ 4. PUSH: Push to remote (for GCE execution)
├─▶ 5. Auto-version (create git tag from committed SHA)
├─▶ 6. Load pipeline, validate stage exists
├─▶ 7. Resolve inputs (datasets or upstream signals)
├─▶ 8. Build Docker image
├─▶ 9. Launch container (local or GCE)
├─▶ 10. Monitor status, stream logs
└─▶ 11. Finalize: register outputs in signal_lineage
Key methods:
GitLayer.sync_slot_to_branch()- sync + commit (provenance guard)StageExecutor.run_stage()injobs/stage_executor.py
Database Schema (Key Tables)
workspace_versions(workspace_name, version, git_sha, created_by, created_at)
stage_runs(id, workspace_name, version, stage_name, status, backend_type, ...)
signal_lineage(stage_run_id, signal_name, signal_type, storage_location)
audit(operation, workspace, details_json, created_at)Full schema: db/schema.sql
Testing
Structure
tests/
├── unit/ # Over 2400 tests, <1s, pure logic, all mocked
├── integration/ # Over 1200 tests, ~2min, real DB + git
├── e2e/ # Full Docker tests
│ └── deluxe/ # GCE tests (@pytest.mark.deluxe_gce)
└── conftest.py # Fixtures: test_db, temp_git_repo
Key Fixtures
test_db # Fresh SQLite with schema
temp_git_repo # Initialized git repo with main branch
test_config # GoldfishConfig for testingWriting Tests
def test_feature(test_db, temp_git_repo):
"""What + Why in docstring."""
manager = WorkspaceManager(db=test_db, ...)
result = manager.create_workspace("test", "goal")
assert result.name == "test"
# Always verify DB state too
with test_db._conn() as conn:
row = conn.execute("SELECT ...").fetchone()
assert row is not NoneDO and DON'T
| DO | DON'T |
|---|---|
make lint before committing |
# type: ignore (fix the issue) |
Specific error types (WorkspaceNotFoundError) |
Expose git terminology to MCP clients |
cast() for TypedDict database returns |
Bare except: (use except Exception:) |
| Validate all inputs before operations | raise X without from e when re-raising |
| Record audit log for user-facing operations | Commit with failing tests or lint |
| Write tests for new functionality | Skip input validation |
| Focus on what needs to be done, not when | Provide time estimates (AI is ~100x faster than you think) |
Adding New Features
New MCP Tool
- Add to appropriate
server_tools/*.py - Follow the tool pattern (validate → execute → audit → return)
- Add tests in
tests/integration/ - Update tool count in README if significant
New Database Table
- Add schema to
db/schema.sql - Add CRUD methods to
db/database.py - Add TypedDict to
db/types.py - Add tests
New Signal Type
- Update
SignalDefinmodels.py - Update
pipeline/parser.pyvalidation - Update
io/__init__.pyload/save handling - Add tests
Debugging
# Database state (in dev repo)
sqlite3 ../myproject-dev/.goldfish/goldfish.db "SELECT * FROM stage_runs ORDER BY started_at DESC LIMIT 5"
# Git state (dev repo has all branches/tags)
cd ../myproject-dev && git log --all --oneline --graph
# Check workspace mount metadata
cat workspaces/w1/.goldfish-mount
# Docker
docker ps # Running containers
docker logs goldfish-workspace-v1 # Container logs
# Verbose logging
import logging; logging.basicConfig(level=logging.DEBUG)File Quick Reference
| Component | Files |
|---|---|
| Entry | server.py, cli.py, __main__.py |
| Context | context.py (ServerContext DI) |
| Models | models.py (Pydantic), db/types.py (TypedDict) |
| Validation | validation.py, errors.py |
| Cloud | cloud/protocols.py, cloud/contracts.py, cloud/factory.py, cloud/adapters/ |
| Workspace | workspace/manager.py, workspace/git_layer.py (copy-based + sync) |
| Execution | jobs/stage_executor.py, jobs/pipeline_executor.py |
| Pipeline | pipeline/parser.py, pipeline/manager.py |
| Infra | infra/docker_builder.py (use cloud/ adapters for execution) |
| Data | datasets/registry.py, sources/registry.py |
| State | state/state_md.py (per-workspace + global STATE.md) |
| IO | io/__init__.py (container load_input/save_output) |
| Tools | server_tools/*.py (40+ MCP tools) |
Conventions
- Ruff for linting/formatting (via pre-commit)
- mypy strict mode for type checking
- Google-style docstrings for public APIs
- Semantic error types (not generic Exception)
- Context managers for database transactions
- cast() for TypedDict returns from SQLite
Common Fixes
| Error | Fix |
|---|---|
| TypedDict return type mismatch | return cast(JobRow, dict(row)) |
Closure captures None-able var |
Assign to local: registry = self.registry then use in closure |
no-any-return from mypy |
Add explicit type annotation to return variable |
| Forward reference error | Add from __future__ import annotations |
| E402 module import order | Move ALL imports to top, constants below |
When getting ruff, mypy, or test errors: never cheat with ignores—always fix properly.
P1 fixes:
1. Status polling: get_status() checks GCS exit code for warm instances
even when VM is RUNNING (idle loop keeps VM alive). Without this,
runs stay RUNNING indefinitely.
2. Skip register on warm reuse: GCELaunchResult.warm_reuse flag prevents
double-inserting the same instance_name (PK violation).
3. Register as 'running' not 'idle': Fresh launches registered as running
so the reaper won't delete them mid-job. Release to idle happens when
the startup script enters the idle loop.
4. Guard kill against PID 0: ${PID:-0} sends SIGTERM to entire process
group. Now checks PID is set and non-zero before killing.
5. Backend_handle identity: Warm pool claims use stage_run_id for GCS
path resolution so logs/exit code go to the correct run.
6. Double runs/ prefix: run_path is now just stage_run_id (not
runs/stage_run_id), matching the idle loop's gs://bucket/runs/$path.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Code Review — fix(warm-pool): Wire dispatch path, fix stale GCS pathsThe four bugs targeted here are real and the fixes are well-motivated. Below is my independent analysis of the changed code. Bug — Shell injection in _build_docker_cmd_script (gce_launcher.py)Only the value is quoted with shlex.quote; the key k is interpolated raw. Env var keys are typically safe, but if any key contains a space or shell metacharacter the generated bash script becomes malformed or injectable. Likewise image and cmd are interpolated directly into the heredoc without quoting. Minimal fix: Bug — Falsy check skips pool registration for timeout=0 (run_backend.py ~261)warm_pool_timeout is int | None. A value of 0 is falsy, so a zero-second timeout would silently skip register_instance(). Zero is unusual here, but the intent is "not None": Design — Private attribute access on WarmPoolManager (run_backend.py ~240)config is a private attribute. WarmPoolManager should expose a public idle_timeout_seconds() -> int property, and run_backend.py should call that instead. Same principle for any other . accesses added by this PR. Design — Any type annotation on warm_pool_manager (gce_launcher.py ~136)Any defeats mypy. Use TYPE_CHECKING to avoid circular imports and annotate with the concrete type WarmPoolManager | None. Design — import shlex inside method body (gce_launcher.py ~409)Move import shlex to module-level. Inline imports are a code smell and will be flagged by Ruff in CI. Design — Hardcoded Docker mounts risk skew with startup_builder.py (gce_launcher.py ~428)The mounts string (which hard-codes -v source:dest pairs for entrypoint.sh, gcs, inputs, and outputs) duplicates the mount list from startup_builder.py. If a new mount is added there (e.g. a secrets volume), warm pool reuse runs will silently lack it. Consider extracting a shared _docker_mount_flags() helper called from both sites. Test coverage gapstest_idle_loop_updates_gcs_paths_for_new_run only asserts that GCS_STDOUT_PATH= and new_run_path appear somewhere in the generated script — not that the assembled path is well-formed. A stronger assertion would also check for the double-prefix guard: assert "runs/runs/" not in script. test_try_claim_with_ack_timeout_releases patches time.sleep but the ACK poll loop still iterates 30 times. Consider parameterising the retry count so the test is readable and the number is explicit. Positive callouts
Summary
The dispatch wiring fix and GCS path staleness fix are both correct and necessary. The two bugs above should be addressed before merge; the design items are lower priority but improve long-term maintainability. |
Summary
Fixes critical issues found in honest review of v0.3.0 warm pool:
What was broken
WarmPoolManagerexisted but nobody called it. Enablingwarm_pool: truehad zero effect.EXIT_CODE_FILEpointed to first run's gcsfuse mount, never updated for subsequent jobs._get_run_handlescanned all running warm instances every 5s. Now single-row lookup by backend_handle.What's fixed
GCERunBackend.launch()passeswarm_pool_idle_timeout_secondsthrough to startup scriptregister_instance()called after fresh launch when pool has capacityGCS_STDOUT_PATH,GCS_STDERR_PATH,GCS_EXIT_CODE_PATHfrom new run'srun_pathgsutil cp, not stale local pathGCS_BUCKETexported for idle loop path constructionTest plan
🤖 Generated with Claude Code