Context
Glass Box P3/P4 scenarios run hermetically in CI (scripted mode via ScriptedGenerationBackend). The #1531 spec called for a nightly live-backend pass that exercises the same RuntimeScenarioRegistry entries against real inference backends — asserting only the structural event subsequence, not token content.
What's needed
- A nightly CI job (.github/workflows/nightly.yml or added to the existing nightly tier) that runs test_allRegisteredScenarios_passInLiveMode against Ollama (already in the nightly trait set) and optionally CloudSaaS
- The live-mode test method in ManifoldRuntimeTests (the scripted-mode gate test_allRegisteredScenarios_passInScriptedMode already exists; live mode needs a parallel method guarded by --traits Ollama)
- Failure notifications distinct from the per-PR gate so a flaky live model doesn't block PRs
Scope
- Ollama is the default live target (local model, deterministic enough for structural asserts)
- CloudSaaS gated separately (requires env secrets); optional in v1
- Each scenario's structural [ConversationEventKind] subsequence is the assertion oracle: the same data the scripted gate uses
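
The structural oracle above can be read as an ordered-subsequence check: the expected kinds must appear in the observed event stream in order, while extra events (for example, a variable number of token deltas from a live model) are ignored. The sketch below is a minimal illustration of that idea, not the project's implementation. It assumes the codebase is Swift (inferred from the [ConversationEventKind] notation); the enum cases and the helper name containsSubsequence are hypothetical stand-ins, since the real definitions are not shown in this issue.

```swift
// Hypothetical stand-in for the project's ConversationEventKind.
// The real cases are not listed in this issue; these are illustrative only.
enum ConversationEventKind: Equatable {
    case turnStarted
    case toolCallRequested
    case toolCallCompleted
    case tokenDelta
    case turnCompleted
}

/// Returns true when every element of `expected` occurs in `observed`
/// in the same relative order. Extra observed events are allowed anywhere.
/// (Hypothetical helper name; not taken from the codebase.)
func containsSubsequence<T: Equatable>(_ expected: [T], in observed: [T]) -> Bool {
    // Slice of expected kinds that have not been matched yet.
    var remaining = expected[...]
    for event in observed {
        // Stop early once everything has been matched.
        guard let next = remaining.first else { break }
        if event == next {
            remaining = remaining.dropFirst()
        }
    }
    return remaining.isEmpty
}

// Expected structural subsequence for an imaginary scenario.
let expected: [ConversationEventKind] = [.turnStarted, .toolCallRequested, .turnCompleted]

// What a live backend might emit: same structure, with an unpredictable
// number of token deltas interleaved.
let observed: [ConversationEventKind] = [
    .turnStarted, .tokenDelta, .tokenDelta,
    .toolCallRequested, .toolCallCompleted,
    .tokenDelta, .turnCompleted,
]

let structureMatches = containsSubsequence(expected, in: observed)
```

Matching on a subsequence rather than on exact equality is what lets the scripted and live gates share one oracle: token-level variation changes how many events are emitted, but not the order of the structural ones.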
Policy
Add to the existing nightly workflow tier rather than creating a new workflow file.