EPIC: trios-railway-mcp v2 — 7-account fleet control (closes #61 + extends) #114

@gHashTag

Goal

Upgrade trios-railway-mcp to manage all 7 Railway accounts (acc0..acc6) from a single MCP endpoint — no operator context switching, no per-account deploy scripts, every tool call is routed to the correct account via an explicit account parameter.

Supersedes #61 (which specified only Acc0..Acc3). The 7th account arrived 2026-05-02 per operator credential drop; our canonical lane mapping (trios#445) now covers LEADER + FOLLOWER-A..E + SPRINT-X/Y/Z.

Anchor: phi^2 + phi^-2 = 3. R1: Rust-only. R5: no "done" without merged PR + green CI + evidence row.

Current State (R5-honest snapshot, 2026-05-03 04:40 UTC)

| Field | Value |
| --- | --- |
| MCP URL | trios-railway-production-d4d6.up.railway.app/mcp (acc0 only) |
| Tools today | railway_service_list/deploy/redeploy/delete, railway_template_deploy, railway_audit_migrate_sql, railway_experience_append |
| Auth | single RAILWAY_TOKEN env var (acc0 project token) |
| Multi-account gap | #61 blocker: every call hits one account silently |
| Fleet visible to operator | 9 services on acc0_new only |
| Fleet invisible to MCP | 5 services on acc1..acc5, 1 on acc6 (legacy, pre-acc0_new) |

Accounts (operator-supplied 2026-05-02)

| ID | Email | Project | Env | Token kind | Lane (per trios#445) |
| --- | --- | --- | --- | --- | --- |
| acc0 | kagler… | f29aa9dd | fade0d77 | project | FOLLOWER-A + SPRINT-X |
| acc1 | rumbo… | e4fe33bb | 54e293b9 | personal | LEADER |
| acc2 | brabb… | 12c508c7 | 441bd3a6 | personal | FOLLOWER-B + SPRINT-Y |
| acc3 | gondii… | 8ab06401 | cd2d987b | personal | FOLLOWER-C |
| acc4 | monge… | 0247abaa | 336c41a9 | personal | FOLLOWER-D + SPRINT-Z |
| acc5 | sadloa… | 475a2290 | 5724292a | project | FOLLOWER-E |
| acc6 | horse… | 475a2290 | 5724292a | project | SPRINT-X (reserve, shares project with acc5) |

Note: acc5 and acc6 share a project_id. Both tokens address the same Railway project but represent different operator-owned accounts, kept separate for cost accounting.

Phased plan

Phase M1 — RailwayMultiClient in tri-railway-core (blocks all else)

Port the #61 paste.txt prototype, extended from Acc0..Acc3 to Acc0..Acc6:

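// crates/tri-railway-core/src/multiclient.rs
// SecretString, ProjectId, EnvironmentId, Client and MultiClientError are
// assumed to be defined or imported elsewhere in the crate.
use std::collections::BTreeMap;

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum AccountId { Acc0, Acc1, Acc2, Acc3, Acc4, Acc5, Acc6 }

// Clone is required here because AccountCreds derives Clone.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum TokenKind { Project, Personal }

#[derive(Clone)]
pub struct AccountCreds {
    pub token: SecretString,
    pub project: ProjectId,
    pub env: EnvironmentId,
    pub kind: TokenKind,
}

pub struct RailwayMultiClient {
    // One Client per account, keyed by AccountId.
    clients: BTreeMap<AccountId, Client>,
}

// API sketch: signatures only, bodies are left to the implementing PR.
impl RailwayMultiClient {
    /// Load all 7 accounts from env: RAILWAY_TOKEN_ACC{0..6}, RAILWAY_PROJECT_ID_ACC{0..6}, ...
    pub fn from_env() -> Result<Self, MultiClientError>;
    /// Look up the client for one account.
    pub fn get(&self, acc: AccountId) -> Result<&Client, MultiClientError>;
    /// Iterate over the accounts that were loaded.
    pub fn accounts(&self) -> impl Iterator<Item = AccountId> + '_;
}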

Header routing: TokenKind::Project → Project-Access-Token header; TokenKind::Personal → Authorization: Bearer header (we already handle this split in trios-igla-ops::accounts, PR #113).
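A minimal sketch of the header switch described above, using only the standard library. The helper name `auth_header` is hypothetical and is not taken from trios-igla-ops::accounts; only the two header names come from this issue.

```rust
/// Which kind of Railway token an account holds (mirrors TokenKind in the M1 sketch).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum TokenKind {
    Project,
    Personal,
}

/// Returns the (header name, header value) pair to send for a token.
/// Hypothetical helper for illustration only.
pub fn auth_header(kind: TokenKind, token: &str) -> (&'static str, String) {
    match kind {
        // Project tokens travel in their own dedicated header, verbatim.
        TokenKind::Project => ("Project-Access-Token", token.to_string()),
        // Personal tokens use a bearer Authorization header.
        TokenKind::Personal => ("Authorization", format!("Bearer {token}")),
    }
}
```

Keeping the choice in one pure function makes the per-account routing unit-testable without any network client.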

Acceptance:

  • cargo test -p tri-railway-core multiclient — unit tests for per-account routing + header switch
  • RailwayMultiClient::from_env() gracefully reports missing tokens (no panic)
  • Each Client uses its own reqwest::Client instance (no header bleed)
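One way the "no panic on missing tokens" criterion could be met is sketched below. It is an assumption, not the planned implementation: `load_accounts` and `MissingVars` are hypothetical names, only the two env var families named in the doc comment above are checked, and the env lookup is injected as a closure so the logic can be tested without touching the process environment. Whether from_env should fail outright or load a partial fleet is not specified here; this sketch fails and lists every gap.

```rust
use std::collections::BTreeMap;

/// Error listing every env var that was absent, so the operator sees all gaps at once.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingVars(pub Vec<String>);

/// Collects RAILWAY_TOKEN_ACC{n} and RAILWAY_PROJECT_ID_ACC{n} for n in 0..=6.
/// `lookup` stands in for `|k| std::env::var(k).ok()`.
/// Returns account index -> (token, project id), or the full list of missing names.
pub fn load_accounts<F>(lookup: F) -> Result<BTreeMap<u8, (String, String)>, MissingVars>
where
    F: Fn(&str) -> Option<String>,
{
    let mut found = BTreeMap::new();
    let mut missing = Vec::new();
    for n in 0u8..=6 {
        let token_key = format!("RAILWAY_TOKEN_ACC{n}");
        let project_key = format!("RAILWAY_PROJECT_ID_ACC{n}");
        let token = lookup(&token_key);
        let project = lookup(&project_key);
        // Record each absent variable instead of stopping at the first one.
        if token.is_none() {
            missing.push(token_key);
        }
        if project.is_none() {
            missing.push(project_key);
        }
        if let (Some(t), Some(p)) = (token, project) {
            found.insert(n, (t, p));
        }
    }
    if missing.is_empty() {
        Ok(found)
    } else {
        Err(MissingVars(missing))
    }
}
```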

Phase M2 — Add account parameter to every MCP tool

Mutate existing tool signatures to accept an optional account: String (default acc0 for back-compat). Example:

#[tool(description = "Redeploy a Railway service on a specific account.")]
pub async fn railway_service_redeploy(
    &self,
    Parameters(params): Parameters<RedeployParams>,
) -> Result<..., McpError> {
    let acc = params.account.as_deref().unwrap_or("acc0").parse::<AccountId>()?;
    let client = self.multi.get(acc)?;
    client.service_redeploy(params.service, params.environment).await
}

Affected tools: railway_service_list / deploy / redeploy / delete, railway_template_deploy, railway_audit_migrate_sql.

Acceptance:

  • Every mutating tool logs the account to stderr and in the L7 experience row
  • railway_service_list defaults to acc0 but accepts "acc0..acc6" or "all" (fan-out)
  • Omitting account on a DELETE logs a warning but still runs on acc0 (safe default)
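The account resolution rules in this phase (default acc0, explicit acc0..acc6, and "all" for fan-out) could look roughly like the sketch below. `resolve_accounts` is a hypothetical helper, and the error type is a plain String for brevity where the real tools would return McpError. Per the acceptance list, "all" would only be accepted by railway_service_list; that per-tool restriction is not modelled here.

```rust
use std::str::FromStr;

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum AccountId { Acc0, Acc1, Acc2, Acc3, Acc4, Acc5, Acc6 }

impl AccountId {
    /// Every account, in order, for fan-out calls.
    pub const ALL: [AccountId; 7] = [
        AccountId::Acc0, AccountId::Acc1, AccountId::Acc2, AccountId::Acc3,
        AccountId::Acc4, AccountId::Acc5, AccountId::Acc6,
    ];
}

impl FromStr for AccountId {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "acc0" => Ok(AccountId::Acc0),
            "acc1" => Ok(AccountId::Acc1),
            "acc2" => Ok(AccountId::Acc2),
            "acc3" => Ok(AccountId::Acc3),
            "acc4" => Ok(AccountId::Acc4),
            "acc5" => Ok(AccountId::Acc5),
            "acc6" => Ok(AccountId::Acc6),
            other => Err(format!("unknown account: {other}")),
        }
    }
}

/// Resolves the optional `account` tool parameter into the accounts to call.
pub fn resolve_accounts(param: Option<&str>) -> Result<Vec<AccountId>, String> {
    match param {
        // Back-compat default: callers that omit the parameter keep hitting acc0.
        None => Ok(vec![AccountId::Acc0]),
        // Fan-out across the whole fleet.
        Some("all") => Ok(AccountId::ALL.to_vec()),
        // A single explicit account; unknown names become an error, not a panic.
        Some(s) => s.parse::<AccountId>().map(|a| vec![a]),
    }
}
```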

Phase M3 — New tool: railway_fleet_snapshot (O(1) read-all)

Single tool call that fans out to all 7 accounts in parallel and returns:

{
  "accounts": [
    {"id":"acc0","status":"OK","services":["IGLA-RAILWAY-FOLLOWER-A","IGLA-RAILWAY-SPRINT-X"]},
    {"id":"acc1","status":"AUTH_ERR","error":"Not Authorized","services":[]},
    ...
  ],
  "total_services": 9,
  "healthy": 6,
  "auth_err": 1
}

Direct port of trios-igla-ops::fleet_probe (PR #113) into the MCP tool surface.

Acceptance:

  • One call ≤ 5 s wall-clock (parallel fan-out, not serial)
  • Returns whether each account's token is usable
  • Flags canon violations (service whose name is not IGLA-RAILWAY-*)
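The parallel fan-out and the canon check can be sketched with scoped threads from the standard library. This is an illustration under assumptions, not a port of trios-igla-ops::fleet_probe: the `probe` closure stands in for the real per-account Railway call, every probe error is mapped to an auth error for simplicity, and the real tool would more likely use async tasks.

```rust
use std::thread;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProbeStatus {
    Ok,
    /// Simplification: any probe failure is reported as an auth error with its message.
    AuthErr(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AccountReport {
    pub id: String,
    pub status: ProbeStatus,
    pub services: Vec<String>,
}

/// Runs `probe` once per account, each on its own thread, and returns
/// the reports in the same order as `accounts`.
pub fn fleet_snapshot<F>(accounts: &[&str], probe: F) -> Vec<AccountReport>
where
    F: Fn(&str) -> Result<Vec<String>, String> + Sync,
{
    let probe = &probe;
    thread::scope(|s| {
        // Spawn all probes first so they run concurrently, then join in order.
        let handles: Vec<_> = accounts
            .iter()
            .map(|&id| {
                s.spawn(move || match probe(id) {
                    Ok(services) => AccountReport {
                        id: id.to_string(),
                        status: ProbeStatus::Ok,
                        services,
                    },
                    Err(error) => AccountReport {
                        id: id.to_string(),
                        status: ProbeStatus::AuthErr(error),
                        services: Vec::new(),
                    },
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("probe thread panicked"))
            .collect::<Vec<_>>()
    })
}

/// Service names that do not start with the IGLA-RAILWAY- canon prefix.
pub fn canon_violations(reports: &[AccountReport]) -> Vec<&str> {
    reports
        .iter()
        .flat_map(|r| r.services.iter())
        .map(String::as_str)
        .filter(|name| !name.starts_with("IGLA-RAILWAY-"))
        .collect()
}
```

Because the probes run concurrently, wall-clock time is bounded by the slowest account rather than the sum of all seven.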

Phase M4 — New tool: railway_fleet_deploy_canonical

One-shot: ensure the 6 canonical lanes (trios#445) exist across all accounts, create any missing service from ghcr.io/ghashtag/trios-trainer-igla:latest with canonical env (NEON_DATABASE_URL, TRIOS_TRAINER_BIN=scarab, RAILWAY_ACC=acc{N}, WAVE=…).

Acceptance:

  • Idempotent — running twice does not duplicate services
  • No global rollback on partial failure: accounts that succeed keep their services, and accounts that fail are clearly reported
  • Emits one L7 row per account
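Idempotency and per-account failure reporting can be illustrated with a small planning step: compute what is missing, create only that, and report how far the account got. The names `missing_services`, `ensure_services` and `Outcome` are hypothetical, and the `create` closure stands in for the real deploy call.

```rust
use std::collections::BTreeSet;

/// Result of reconciling one account against its desired services.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Everything already existed; nothing was created.
    Unchanged,
    /// These services were missing and have been created.
    Created(Vec<String>),
    /// Creation stopped at the first error; `created` lists what succeeded before it.
    Failed { created: Vec<String>, error: String },
}

/// Desired names that are not yet present, in the order they were requested.
pub fn missing_services(desired: &[&str], existing: &[&str]) -> Vec<String> {
    let have: BTreeSet<&str> = existing.iter().copied().collect();
    desired
        .iter()
        .copied()
        .filter(|name| !have.contains(name))
        .map(str::to_string)
        .collect()
}

/// Creates only the missing services, so a second run is a no-op.
pub fn ensure_services<F>(desired: &[&str], existing: &[&str], mut create: F) -> Outcome
where
    F: FnMut(&str) -> Result<(), String>,
{
    let todo = missing_services(desired, existing);
    if todo.is_empty() {
        return Outcome::Unchanged;
    }
    let mut created = Vec::new();
    for name in todo {
        match create(&name) {
            Ok(()) => created.push(name),
            // No rollback: report what was created and why the rest failed.
            Err(error) => return Outcome::Failed { created, error },
        }
    }
    Outcome::Created(created)
}
```

Each `Outcome` maps naturally onto one L7 row per account.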

Phase M5 — Operator guardrails

  • Global read-only env toggle TRIOS_MCP_READONLY=true that refuses every mutating tool (for emergency lockout).
  • Per-tool rate limit: no more than 3 deploys/redeploys per account per minute (stop accidental infinite loops).
  • Canon-name regex enforcement on every deploy — reject any name that doesn't match ^IGLA-RAILWAY-[A-Z]+(-[A-Z])?$ or ^IGLA-[A-Z0-9-]+-(binary32|GF16|bfloat16|…)-h[0-9]+-LR[0-9]{4}-rng[0-9]+ (trios-trainer-igla#93).
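The first two guardrails (read-only toggle and the 3-per-minute limit) might be combined in a single pre-flight check, sketched here as a sliding window per account. `Guard` and `GuardError` are hypothetical names; treating only the exact string "true" as enabling read-only mode is an assumption. The current time is passed in so the limiter can be tested deterministically.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum GuardError {
    ReadOnly,
    RateLimited { account: String },
}

pub struct Guard {
    readonly: bool,
    max_per_window: usize,
    window: Duration,
    /// Timestamps of recent deploys/redeploys, oldest first, per account.
    recent: BTreeMap<String, VecDeque<Instant>>,
}

impl Guard {
    pub fn new(readonly: bool) -> Self {
        Self {
            readonly,
            max_per_window: 3,
            window: Duration::from_secs(60),
            recent: BTreeMap::new(),
        }
    }

    /// Reads the TRIOS_MCP_READONLY toggle; unset or any other value means writable.
    pub fn from_env() -> Self {
        let readonly = std::env::var("TRIOS_MCP_READONLY")
            .map(|v| v == "true")
            .unwrap_or(false);
        Self::new(readonly)
    }

    /// Call before every deploy/redeploy; records the call when it is allowed.
    pub fn check_deploy(&mut self, account: &str, now: Instant) -> Result<(), GuardError> {
        if self.readonly {
            return Err(GuardError::ReadOnly);
        }
        let (window, max) = (self.window, self.max_per_window);
        let calls = self.recent.entry(account.to_string()).or_default();
        // Drop timestamps that have aged out of the sliding window.
        while calls.front().is_some_and(|&t| now.duration_since(t) >= window) {
            calls.pop_front();
        }
        if calls.len() >= max {
            return Err(GuardError::RateLimited { account: account.to_string() });
        }
        calls.push_back(now);
        Ok(())
    }
}
```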

Acceptance:

  • TRIOS_MCP_READONLY=true path has integration test
  • Rate-limit counter visible in L7 experience rows
  • Canon-name rejections return a structured error, not a stack trace
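A structured canon-name rejection could be as small as the sketch below. It hand-codes only the first pattern, ^IGLA-RAILWAY-[A-Z]+(-[A-Z])?$, to stay dependency-free; the second pattern is elided in this issue and is deliberately not reconstructed. `CanonNameError` and the code string "CANON_NAME_REJECTED" are hypothetical.

```rust
/// Structured rejection returned instead of a panic or stack trace.
#[derive(Debug, PartialEq, Eq)]
pub struct CanonNameError {
    pub code: &'static str,
    pub name: String,
}

/// True when `s` is one or more ASCII uppercase letters.
fn is_upper_run(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_uppercase())
}

/// Hand-written equivalent of ^IGLA-RAILWAY-[A-Z]+(-[A-Z])?$.
pub fn matches_lane_pattern(name: &str) -> bool {
    let Some(rest) = name.strip_prefix("IGLA-RAILWAY-") else {
        return false;
    };
    match rest.split_once('-') {
        // No suffix: the remainder must be a single uppercase run, e.g. LEADER.
        None => is_upper_run(rest),
        // With suffix: uppercase run, then exactly one uppercase letter, e.g. FOLLOWER-A.
        Some((head, tail)) => is_upper_run(head) && tail.len() == 1 && is_upper_run(tail),
    }
}

/// Validates a service name before any deploy is attempted.
pub fn check_canon_name(name: &str) -> Result<(), CanonNameError> {
    if matches_lane_pattern(name) {
        Ok(())
    } else {
        Err(CanonNameError {
            code: "CANON_NAME_REJECTED",
            name: name.to_string(),
        })
    }
}
```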

Call order (execution)

  1. M1 lands — blocks everything else.
  2. M2 + M3 can land in parallel (they share RailwayMultiClient but don't depend on each other).
  3. M4 after M2 (needs the multi-account deploy primitive).
  4. M5 last — requires all previous phases to be functional.

Observability / watchdog integration

Once M3 lands, the existing trios-gardener-watchdog cron should call railway_fleet_snapshot each tick instead of hitting Neon directly for scarab heartbeats. Benefit: watchdog will flag dead Railway deploys even when the scarab never registered in Neon (e.g. container crash before register_scarab runs — the pattern we saw on 2026-05-02).
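The watchdog check described above amounts to a set difference: services that Railway reports as deployed but that never registered a heartbeat. The sketch assumes, purely for illustration, that scarab registrations in Neon can be matched to Railway services by name; `silent_deploys` is a hypothetical helper.

```rust
use std::collections::BTreeSet;

/// Services present in the fleet snapshot that have no matching scarab registration.
/// `deployed` would come from railway_fleet_snapshot, `registered` from Neon.
pub fn silent_deploys<'a>(deployed: &[&'a str], registered: &[&str]) -> Vec<&'a str> {
    let seen: BTreeSet<&str> = registered.iter().copied().collect();
    deployed
        .iter()
        .copied()
        .filter(|name| !seen.contains(name))
        .collect()
}
```

A non-empty result is the case the Neon-only watchdog could not see: a container that crashed before register_scarab ran.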

Risk register

| Risk | Mitigation |
| --- | --- |
| Operator leaks acc1..acc5 personal tokens by pasting them into an issue | Tokens always live in .railway_creds.env / Railway environment vars, never in repo. SecretString wrapper prevents accidental Debug leaks. |
| railway_fleet_deploy_canonical wipes a legacy service by name collision | Pre-flight check: refuse to create if the target account already has any service with a prefix-matching name that was not created by a prior canonical deploy (distinguishable via L7 trail). |
| Acc1 / Acc2 / Acc3 banned-list drift (past incidents: Acc1 banned=true in railway_accounts) | M3 surfaces banned state; M4 skips banned accounts with a structured reason. |
| M5 readonly mode accidentally left on in production | Watchdog notification fires when readonly toggle has been on for >1 h. |

Explicit non-goals

  • Authoring new Railway features (networking, volumes, secrets management) — stay at the level of operational control of existing services.
  • Replacing the trainer / scarab / gardener binaries — these stay in trios-trainer-igla and trios-railway bin/.
  • Cross-project moves (e.g. migrate a service from acc1 to acc2) — explicit non-goal for v2; revisit in v3 if the race demands it.

🌻 phi^2 + phi^-2 = 3 · TRINITY · NEVER STOP
