Complete Epic #23 presentation evidence gates by Protocol-zero-0 · Pull Request #29 · billion-token-one-task/Deepgraph

Protocol-zero-0 · 2026-06-04T21:01:47Z

Summary

Completes Epic #23 across the requested issue order on convergence branch epic-23-evidence-ledger:

[PR-1] M1 significance numeric fix + minimal EvidenceLedger structure #24: numeric significance gating and minimal EvidenceLedger builder
[PR-2] M4 EvidenceLedger full traceability checker #27: EvidenceLedger validation and Abstract/Conclusion traceability checks
[PR-3] M2 four deterministic checks (unresolved / scaffold / cross-run / repetition) #25: deterministic LaTeX sanity checks for unresolved refs, scaffold leakage, cross-run identity, and repetition
[PR-4] M3 render gate integration (paper_orchestra_pipeline.py:1843) #28: render-gate integration for deterministic checks
[PR-5] M5 compute/presentation dependency boundary tests #26: CPU-only presentation/import/materialization boundary tests for M5-1/2/3

Also includes one preliminary, branch-only commit that stabilizes a regression fixture so the required baseline tests pass on this convergence branch. master was not modified.

Changed Files

agents/paper_completeness.py
agents/paper_orchestra_pipeline.py
tests/test_paper_completeness_m1.py
tests/test_paper_completeness_m4.py
tests/test_latex_sanity_m2.py
tests/test_vnext_manuscript.py
tests/test_presentation_cpu_boundary_m5.py
tests/fixtures/*

Tests Run

Preflight stabilization:

pytest tests/test_pipeline_contracts.py -> 12 passed
pytest tests/test_vnext_manuscript.py -> 5 passed

#24:

pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
milestone: pytest tests/test_paper_completeness_m1.py -> 7 passed
regression: pytest tests/test_pipeline_contracts.py -> 12 passed
regression: pytest tests/test_vnext_manuscript.py -> 5 passed

#27:

pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
milestone: pytest tests/test_paper_completeness_m4.py -> 6 passed
regression: pytest tests/test_pipeline_contracts.py -> 12 passed
regression: pytest tests/test_vnext_manuscript.py -> 5 passed
combined: pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py -> 13 passed

#25:

pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
milestone: pytest tests/test_latex_sanity_m2.py -> 8 passed
regression: pytest tests/test_pipeline_contracts.py -> 12 passed
regression: pytest tests/test_vnext_manuscript.py -> 5 passed

#28:

pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
milestone: pytest tests/test_vnext_manuscript.py -> 10 passed
regression: pytest tests/test_pipeline_contracts.py -> 12 passed
combined: pytest tests/test_latex_sanity_m2.py tests/test_vnext_manuscript.py -> 18 passed

#26:

pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
pre-change baseline: pytest tests/test_vnext_manuscript.py -> 10 passed
milestone: pytest tests/test_presentation_cpu_boundary_m5.py -> 3 passed
regression: pytest tests/test_pipeline_contracts.py -> 12 passed
regression: pytest tests/test_vnext_manuscript.py -> 10 passed
combined: pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py tests/test_latex_sanity_m2.py tests/test_presentation_cpu_boundary_m5.py -> 24 passed

Non-goals / Skips

Did not modify or merge into master.
Did not run or fake the M5-4 GPU smoke test from [PR-5] M5 compute/presentation dependency boundary tests #26.
Did not add synthetic or mocked compute backends.
Did not implement cross-file LaTeX \input{content/conclusion} traceability.
Did not modify contracts/pipeline.py or require_submission_ready().

Copilot

Pull request overview

This PR completes Epic #23’s “presentation evidence gates” work by (1) introducing an EvidenceLedger-based numeric significance gate, (2) adding deterministic LaTeX sanity checks (unresolved refs, scaffold leakage, cross-run identity, boilerplate repetition) and wiring them into the submission bundle render gate, and (3) adding CPU-only boundary tests to ensure presentation code paths don’t import GPU/execution dependencies.

Changes:

Add minimal build_evidence_ledger, numeric significance gating, and EvidenceLedger traceability/schema checks in agents/paper_completeness.py.
Expand latex_sanity_check with deterministic checks and pass state from the submission pipeline so cross-run identity gating can work.
Add milestone/regression tests and fixtures for M1/M2/M4/M5 gates, including submission-bundle blocking tests.
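As a rough illustration of the Abstract/Conclusion traceability idea named above, the sketch below flags numeric literals in a section that have no matching value in a ledger. The ledger shape (a list of dicts with a `value` key), the regex, and the `untraceable_numbers` helper are all assumptions made for this example; the actual checks in agents/paper_completeness.py are not shown in this excerpt and may work differently.

```python
import re
from typing import Dict, List

# Plain decimal literals that are not part of an identifier or a longer number.
NUMBER_RE = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?")


def untraceable_numbers(section_text: str, ledger: List[Dict[str, object]]) -> List[str]:
    """Return numeric literals in the text with no matching ledger value."""
    known = set()
    for entry in ledger:
        try:
            known.add(float(entry["value"]))
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed entries rather than guess
    missing = []
    for literal in NUMBER_RE.findall(section_text):
        if float(literal) not in known:
            missing.append(literal)
    return missing


# Hypothetical ledger and abstract, for illustration only.
ledger = [{"claim_id": "c1", "value": 0.031}, {"claim_id": "c2", "value": 4.2}]
abstract = "We observe a 4.2 point gain (p = 0.031) and a 9.7 point gain on the held-out set."
missing = untraceable_numbers(abstract, ledger)
```

Here `missing` contains only `"9.7"`, the one figure with no ledger entry behind it. Exact float matching is the simplest choice for a sketch; a real gate would likely need rounding or tolerance rules.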

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `agents/paper_completeness.py` | Adds EvidenceLedger builder, numeric significance gating, traceability/schema checks, and deterministic LaTeX sanity checks. |
| `agents/paper_orchestra_pipeline.py` | Passes `state` into `latex_sanity_check` to enable state-aware deterministic gating during bundle generation. |
| `tests/test_paper_completeness_m1.py` | Adds M1 tests for numeric significance gating and minimal EvidenceLedger builder contract. |
| `tests/test_paper_completeness_m4.py` | Adds M4 tests for EvidenceLedger schema validation and Abstract/Conclusion traceability checks. |
| `tests/test_latex_sanity_m2.py` | Adds M2 tests for deterministic LaTeX sanity rules (unresolved refs, placeholders, cross-run identity, repetition). |
| `tests/test_vnext_manuscript.py` | Adds submission-bundle integration tests ensuring render-gate blocks deterministic LaTeX violations and preserves existing gates. |
| `tests/test_presentation_cpu_boundary_m5.py` | Adds CPU-only boundary tests ensuring presentation modules don’t load GPU/execution dependencies and can render/materialize offline. |
| `tests/fixtures/*` | Adds fixtures for M1/M2/M4 deterministic checks and traceability scenarios. |
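One way to express the CPU-only import boundary described for the M5 tests is to import a module in a fresh interpreter and inspect `sys.modules` afterwards. The sketch below is an assumption about how such a test could look, not the contents of tests/test_presentation_cpu_boundary_m5.py; `forbidden_imports` is a hypothetical helper, and the stdlib `json` module stands in for a presentation module so the example runs anywhere.

```python
import subprocess
import sys
from typing import Iterable, List

# Runs in a child interpreter so modules already loaded by the test runner
# cannot mask or fake a dependency.
PROBE = (
    "import importlib, sys\n"
    "importlib.import_module(sys.argv[1])\n"
    "forbidden = sys.argv[2:]\n"
    "loaded = [m for m in forbidden if m in sys.modules]\n"
    "print(','.join(loaded))\n"
)


def forbidden_imports(module: str, forbidden: Iterable[str]) -> List[str]:
    """Import `module` in a fresh interpreter and report forbidden modules it pulled in."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, module, *forbidden],
        capture_output=True,
        text=True,
        check=True,
    )
    output = result.stdout.strip()
    return output.split(",") if output else []


# Stand-in target: importing `json` loads `json.decoder` but should not load `torch`.
leaked = forbidden_imports("json", ["torch", "json.decoder"])
```

A boundary test would then assert that the returned list is empty for each presentation module and each GPU or execution dependency it must avoid.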


+def _significance_alpha() -> float:
+    raw = os.environ.get("DEEPGRAPH_SIGNIFICANCE_ALPHA")
+    alpha = _numeric(raw)
+    if alpha is None or alpha <= 0:
+        return 0.05
+    return alpha


+    p_value = _numeric(packet.get("p_value"))
+    effect_size = _numeric(_first_present(packet.get("effect_size"), packet.get("effect_pct")))
+    metric = _text(_first_present(packet.get("metric_name"), summary.get("primary_metric"), summary.get("metric_name")))


+def _strip_latex_code_blocks(text: str) -> str:
+    stripped = re.sub(
+        r"\\begin\{(?:verbatim|lstlisting|minted)\}.*?\\end\{(?:verbatim|lstlisting|minted)\}",
+        "",
+        text or "",
+        flags=re.DOTALL | re.IGNORECASE,
+    )
+    stripped = re.sub(r"```.*?```", "", stripped, flags=re.DOTALL)
+    return stripped


+                    hits.append(
+                        _line_hit(
+                            "cross_run_identity",
+                            token,
+                            base_line + snippet.count("\n", 0, token_match.start()),
+                            snippet.splitlines()[0] if snippet.splitlines() else snippet,
+                        )
+                    )
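The hunk above derives the reported line number by adding the count of newlines before the match to `base_line`, while the reported text is the first line of the snippet. The sketch below isolates the offset arithmetic and returns the line that actually contains the token. `locate_token` is a hypothetical helper and the sample token is invented; whether the PR intends to report the first snippet line or the matched line is not clear from this excerpt.

```python
import re
from typing import Optional, Tuple


def locate_token(snippet: str, token: str, base_line: int) -> Optional[Tuple[int, str]]:
    """Hypothetical helper: (absolute line number, text of that line) for the first hit."""
    token_match = re.search(re.escape(token), snippet)
    if token_match is None:
        return None
    # Newlines before the match give the offset from the snippet's first line.
    offset = snippet.count("\n", 0, token_match.start())
    # Split on "\n" so the index always agrees with the newline count above.
    line_text = snippet.split("\n")[offset].rstrip("\r")
    return base_line + offset, line_text


# Invented sample data, for illustration only.
snippet = "Results for the main run.\nSee run_20260101 for details.\nEnd."
location = locate_token(snippet, "run_20260101", base_line=40)
```

For this input `location` is `(41, "See run_20260101 for details.")`: one newline precedes the match, so the hit lands one line below `base_line`.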


Protocol-zero-0 added 6 commits June 4, 2026 20:40

Stabilize vnext manuscript regression fixture

60fcc09

Fix significance evidence ledger gate (#24)

062cf55

Add EvidenceLedger traceability checks (#27)

28b2c63

Add deterministic LaTeX sanity checks (#25)

9cfc1cb

Wire deterministic sanity checks into render gate (#28)

2c769f9

Add CPU presentation boundary tests (#26)

3846642

Copilot AI review requested due to automatic review settings June 4, 2026 21:01

Copilot started reviewing on behalf of Protocol-zero-0 June 4, 2026 21:01

Copilot AI reviewed Jun 4, 2026

Protocol-zero-0 wants to merge 6 commits into master from epic-23-evidence-ledger
