ci(scenarios): nightly E2E pass for Glass Box scenario registry against live backends #1576

@roryford

Context

Glass Box P3/P4 scenarios run hermetically in CI (scripted mode via ScriptedGenerationBackend). The #1531 spec called for a nightly live-backend pass that exercises the same RuntimeScenarioRegistry entries against real inference backends — asserting only the structural event subsequence, not token content.

What's needed

  • A nightly CI job (.github/workflows/nightly.yml or added to the existing nightly tier) that runs test_allRegisteredScenarios_passInLiveMode against Ollama (already in the nightly trait set) and optionally CloudSaaS
  • The live-mode test method in ManifoldRuntimeTests (the scripted-mode gate test_allRegisteredScenarios_passInScriptedMode already exists; live mode needs a parallel method guarded by --traits Ollama)
  • Failure notifications distinct from the per-PR gate so a flaky live model doesn't block PRs

Scope

  • Ollama is the default live target (local model, deterministic enough for structural asserts)
  • CloudSaaS gated separately (requires env secrets); optional in v1
  • Each scenario's structural [ConversationEventKind] subsequence is the assertion oracle — same data the scripted gate uses
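A minimal sketch of the structural subsequence check described above, assuming the oracle tolerates extra events between the expected ones (which is what "subsequence" suggests, though the real gate may be stricter). `EventKind` and its cases are hypothetical stand-ins for the project's `ConversationEventKind`, and `containsSubsequence` is an illustrative helper, not an existing API.

```swift
// Hypothetical stand-in for ConversationEventKind; the real cases may differ.
enum EventKind: Equatable {
    case userTurn, toolCall, toolResult, tokenDelta, assistantTurn
}

/// Returns true when every element of `expected` appears in `observed`
/// in the same relative order. Extra events in between are ignored.
func containsSubsequence<T: Equatable>(_ expected: [T], in observed: [T]) -> Bool {
    var remaining = expected[...]  // slice of the kinds still to be matched
    for event in observed {
        guard let next = remaining.first else { break }  // everything matched
        if next == event {
            remaining = remaining.dropFirst()
        }
    }
    return remaining.isEmpty
}

// Illustrative data: a live model may emit any number of token deltas,
// but the structural kinds must still arrive in the expected order.
let expected: [EventKind] = [.userTurn, .toolCall, .toolResult, .assistantTurn]
let observed: [EventKind] = [.userTurn, .tokenDelta, .toolCall, .toolResult,
                             .tokenDelta, .tokenDelta, .assistantTurn]
let reordered: [EventKind] = [.userTurn, .toolResult, .toolCall, .assistantTurn]

print(containsSubsequence(expected, in: observed))   // true
print(containsSubsequence(expected, in: reordered))  // false
```

Because only the order of event kinds is compared, the same expected list can serve both the scripted gate and a live backend whose token content varies from run to run.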

Policy

Add to the existing nightly workflow tier rather than creating a new workflow file.
