[PR-5] M5 compute/presentation dependency boundary tests #26

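> This task belongs to Epic #23. Shared context (overall goal, Agent Execution Contract, Non-goals, regression baseline) is in #23.
> Dependencies: none (standalone).
> Code locations were checked against the current repository version; if line numbers drift, go by function name / field name / mount-point semantics.

Completing this PR requires following the Agent Execution Contract and Test Execution Protocol of Epic #23. The PR description must list: changed files / tests added / tests run / before-after / non-goals.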

---

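## Goal

Lock in and verify that "the presentation layer does not depend on GPU or heavy model packages and can run in a pure-CPU environment", so that presentation, gating, and manuscript output can run independently of experiment execution.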

## Allowed changes
- import boundary tests
- offline rendering fixture tests
- materialize seam tests
- PR-5 fixtures

## Forbidden changes
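- Do not mock the GPU execution side to fake an end-to-end success
- Do not add a stub compute backend
- Do not let the presentation layer import `torch` / `transformers` / `vllm` / `experiment_forge` / `experiment_executor`
- Do not make the M5-4 GPU smoke test a required CPU CI check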

## Current state (verified)
- Transitive check confirms no GPU dependency: after `import agents.paper_completeness` / `agents.paper_orchestra_pipeline`, `sys.modules` contains no torch/transformers/vllm; a repo-wide grep of the transitively loaded internal modules also finds none of these dependencies.
- The only real torch dependency in the repo is in `agents/experiment_forge.py`, and it appears only inside string templates used to generate subprocess scripts (`:1977`, `:3718`). `experiment_executor.py` has no torch import.
- The real presentation entry point `generate_bundle_paper_orchestra(run_id)` (`paper_orchestra_pipeline.py:1421`) is DB-driven (reads sqlite); `paper_completeness.py:201-209` additionally reads `benchmark_summary.json` and `run_config.json`.
- The pure offline renderer is `assemble_main_tex(state, orchestrated, ...)` (`:432`): deterministic, no LLM calls. Full orchestra rendering goes through `_run_full_pipeline`, which calls the LLM (httpx) and matplotlib (the existing test at `test_vnext_manuscript.py:286` mocks it out).
- The CPU file seam is `materialize_deep_benchmark_artifacts` (`benchmark_artifacts.py`): it reads `raw_predictions.jsonl` (`min_lines=100`) and writes `benchmark_summary.json` / `seed_variance_table.json` / `per_dataset_results.json` / `ablation_table.json`, in pure Python.
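
The `sys.modules` check described in the first bullet could be sketched roughly as follows. This is only an illustration, not the repository's existing test: the helper name `leaked_modules` is hypothetical, and running the probe in a fresh interpreter is an assumption made here so that modules loaded by other tests in the same session cannot pollute the result.

```python
import subprocess
import sys

# Names taken from the "Forbidden changes" list; matched against any dotted component.
FORBIDDEN = ("torch", "transformers", "vllm", "experiment_forge", "experiment_executor")


def leaked_modules(imports, forbidden=FORBIDDEN):
    """Import `imports` in a fresh interpreter; return loaded modules that hit `forbidden`.

    Hypothetical helper. A module counts as a hit when any dotted component of its
    name is in `forbidden`, so both `torch.nn` and `agents.experiment_forge` match.
    """
    import json

    probe = (
        "import importlib, json, sys\n"
        f"for name in {list(imports)!r}:\n"
        "    importlib.import_module(name)\n"
        f"forbidden = set({list(forbidden)!r})\n"
        "hits = sorted(m for m in sys.modules if forbidden & set(m.split('.')))\n"
        "print(json.dumps(hits))\n"
    )
    # check=True surfaces import errors in the child as CalledProcessError.
    out = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    # Use the last line only, in case an imported module prints during import.
    return json.loads(out.stdout.strip().splitlines()[-1])
```

In the real test the call would presumably be `leaked_modules(["agents.paper_orchestra_pipeline", "agents.paper_completeness"])`, asserted equal to `[]`.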

## Acceptance cases

| # | Input | Expected output | Environment |
|---|---|---|---|
| M5-1 | `import agents.paper_orchestra_pipeline` + `agents.paper_completeness`, then inspect `sys.modules` | Contains none of `torch`/`transformers`/`vllm`/`experiment_forge`/`experiment_executor` | CPU |
| M5-2 | Recorded `state`/`orchestrated` snapshot fixtures | `assemble_main_tex` renders `main.tex`, with no GPU and no LLM calls throughout | CPU |
| M5-2′ (optional) | Same as above, but through `generate_bundle_paper_orchestra` | mock `_run_full_pipeline` (as in the existing test) → produces the bundle, no GPU | CPU |
| M5-3 | Hand-written `raw_predictions.jsonl` fixture with ≥100 lines | `materialize_deep_benchmark_artifacts` produces the 4 JSON files, with fields conforming to the #23 schema | CPU |
| M5-4 (GPU smoke) | `experiment_forge` benchmark script fed a real job | Actually produces `raw_predictions.jsonl` / `run_config.json` / `benchmark_summary.json`; end-to-end smoke | GPU |
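
For M5-2, one way to back up the "no LLM calls" expectation is a coarse network guard around the render call. The sketch below is an assumption, not something the repository is known to contain: `no_network` is a hypothetical helper, and it works by making `socket.socket` raise, which blocks anything that constructs a socket inside the block, not only LLM traffic.

```python
import contextlib
import socket
from unittest import mock


@contextlib.contextmanager
def no_network():
    """Make any attempt to construct a socket fail loudly inside the block.

    Hypothetical helper for offline rendering tests; deliberately coarse.
    """
    def _blocked(*args, **kwargs):
        raise RuntimeError("network access is not allowed in offline rendering tests")

    # Replace the socket class on the module for the duration of the block;
    # mock restores the original on exit, even if the body raises.
    with mock.patch.object(socket, "socket", side_effect=_blocked):
        yield
```

A test for M5-2 could then call `assemble_main_tex` on the recorded fixtures inside `with no_network():` and assert on the rendered `main.tex`.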

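> Key constraint: `experiment_forge.py:2830` contains `RuntimeError("Real LLM benchmark requires CUDA. No synthetic or mocked fallback is allowed.")`, so the GPU execution side must not be made to pass through synthetic or mocked runs. M5-3 therefore only verifies the existing CPU conversion seam (raw→materialize); it is not a "stub compute backend". An already-recorded jsonl can be used as the CPU fixture fed to materialize.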
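
A hand-written fixture for M5-3 could be generated deterministically along these lines. Treat this as a sketch under stated assumptions: `write_raw_predictions` is a hypothetical helper, and every field name in the rows (`example_id`, `seed`, `dataset`, `prediction`, `label`) is a placeholder, since the actual row schema is defined in #23 and not reproduced here. Only the ≥100-line requirement (`min_lines=100`) comes from the text above.

```python
import json
from pathlib import Path


def write_raw_predictions(path, n_rows=120, seeds=(0, 1, 2), datasets=("ds_a", "ds_b")):
    """Write a deterministic JSONL fixture with `n_rows` lines and return its path.

    All field names below are placeholders; align them with the #23 schema.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for i in range(n_rows):
            row = {
                "example_id": i,
                "seed": seeds[i % len(seeds)],
                "dataset": datasets[i % len(datasets)],
                "prediction": "A",
                # Every fourth row is a deliberate miss so metrics are not trivially 1.0.
                "label": "A" if i % 4 else "B",
            }
            fh.write(json.dumps(row, sort_keys=True) + "\n")
    return path
```

The default of 120 rows leaves headroom above the 100-line minimum; the resulting file would be placed where `materialize_deep_benchmark_artifacts` expects `raw_predictions.jsonl`.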

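## DoD (CPU CI and GPU smoke kept separate)
- **CPU CI required:** M5-1 / M5-2 / M5-3 must pass in pure-CPU CI.
- **Manual GPU smoke:** M5-4 is verified only in a real GPU environment, manually or in a separate pipeline, and is **not a blocking check for CPU CI**.
- Do not add a synthetic / mocked GPU fallback in the CPU environment in order to pass M5-4.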
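
The CPU/GPU split above might be expressed in the test suite as a skip condition rather than a fallback. The following is a minimal sketch, assuming the presence of `nvidia-smi` on `PATH` is an acceptable stand-in for "real GPU environment"; `gpu_present`, `requires_gpu`, and `GpuSmoke` are hypothetical names, and the detection deliberately avoids importing `torch`.

```python
import shutil
import unittest


def gpu_present():
    """Heuristic (an assumption): the host is GPU-capable if `nvidia-smi` is on PATH."""
    return shutil.which("nvidia-smi") is not None


# On CPU CI the test is reported as skipped, never as passed via a mock.
requires_gpu = unittest.skipUnless(
    gpu_present(), "M5-4 needs a real GPU; not part of CPU CI"
)


class GpuSmoke(unittest.TestCase):
    @requires_gpu
    def test_m5_4_end_to_end(self):
        # Placeholder body: the real smoke run would launch the experiment_forge
        # benchmark script against a real job. No mocked fallback belongs here.
        self.skipTest("wire up the real GPU job in the dedicated pipeline")
```

Skipping keeps M5-4 visible in CPU CI reports without letting it block or silently pass.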