Complete Epic #23 presentation evidence gates #29

Open: Protocol-zero-0 wants to merge 6 commits into master from epic-23-evidence-ledger

@Protocol-zero-0
Contributor

Summary

Completes Epic #23 on convergence branch epic-23-evidence-ledger, working through the issues in the requested order (#24, #27, #25, #28, #26).

Includes one preliminary, branch-only commit that stabilizes regression fixtures so the required baseline tests pass on this convergence branch. master was not modified.

Changed Files

  • agents/paper_completeness.py
  • agents/paper_orchestra_pipeline.py
  • tests/test_paper_completeness_m1.py
  • tests/test_paper_completeness_m4.py
  • tests/test_latex_sanity_m2.py
  • tests/test_vnext_manuscript.py
  • tests/test_presentation_cpu_boundary_m5.py
  • tests/fixtures/*

Tests Run

Preflight stabilization:

  • pytest tests/test_pipeline_contracts.py -> 12 passed
  • pytest tests/test_vnext_manuscript.py -> 5 passed

#24:

  • pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
  • pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
  • milestone: pytest tests/test_paper_completeness_m1.py -> 7 passed
  • regression: pytest tests/test_pipeline_contracts.py -> 12 passed
  • regression: pytest tests/test_vnext_manuscript.py -> 5 passed

#27:

  • pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
  • pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
  • milestone: pytest tests/test_paper_completeness_m4.py -> 6 passed
  • regression: pytest tests/test_pipeline_contracts.py -> 12 passed
  • regression: pytest tests/test_vnext_manuscript.py -> 5 passed
  • combined: pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py -> 13 passed

#25:

  • pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
  • pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
  • milestone: pytest tests/test_latex_sanity_m2.py -> 8 passed
  • regression: pytest tests/test_pipeline_contracts.py -> 12 passed
  • regression: pytest tests/test_vnext_manuscript.py -> 5 passed

#28:

  • pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
  • pre-change baseline: pytest tests/test_vnext_manuscript.py -> 5 passed
  • milestone: pytest tests/test_vnext_manuscript.py -> 10 passed
  • regression: pytest tests/test_pipeline_contracts.py -> 12 passed
  • combined: pytest tests/test_latex_sanity_m2.py tests/test_vnext_manuscript.py -> 18 passed

#26:

  • pre-change baseline: pytest tests/test_pipeline_contracts.py -> 12 passed
  • pre-change baseline: pytest tests/test_vnext_manuscript.py -> 10 passed
  • milestone: pytest tests/test_presentation_cpu_boundary_m5.py -> 3 passed
  • regression: pytest tests/test_pipeline_contracts.py -> 12 passed
  • regression: pytest tests/test_vnext_manuscript.py -> 10 passed
  • combined: pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py tests/test_latex_sanity_m2.py tests/test_presentation_cpu_boundary_m5.py -> 24 passed

Non-goals / Skips

  • Did not modify or merge into master.
  • Did not run or fake the M5-4 GPU smoke test from #26 ([PR-5] M5 compute/presentation dependency boundary tests).
  • Did not add synthetic or mocked compute backends.
  • Did not implement cross-file LaTeX \input{content/conclusion} traceability.
  • Did not modify contracts/pipeline.py or require_submission_ready().


Copilot AI left a comment

Pull request overview

This PR completes Epic #23’s “presentation evidence gates” work by (1) introducing an EvidenceLedger-based numeric significance gate, (2) adding deterministic LaTeX sanity checks (unresolved refs, scaffold leakage, cross-run identity, boilerplate repetition) and wiring them into the submission bundle render gate, and (3) adding CPU-only boundary tests to ensure presentation code paths don’t import GPU/execution dependencies.

Changes:

  • Add minimal build_evidence_ledger, numeric significance gating, and EvidenceLedger traceability/schema checks in agents/paper_completeness.py.
  • Expand latex_sanity_check with deterministic checks and pass state from the submission pipeline so cross-run identity gating can work.
  • Add milestone/regression tests and fixtures for M1/M2/M4/M5 gates, including submission-bundle blocking tests.
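
One way to check the CPU-only boundary described above is to import a presentation module in a fresh interpreter and inspect `sys.modules`. The following is a minimal sketch, not the PR's test code: the helper name `heavy_modules_loaded` and the forbidden-module list are hypothetical.

```python
import subprocess
import sys
from typing import Iterable, List


def heavy_modules_loaded(module: str, forbidden: Iterable[str]) -> List[str]:
    """Import `module` in a fresh interpreter and report forbidden imports."""
    # The probe runs in a child process so modules already loaded by the
    # test runner cannot mask or fake the result.
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"forbidden = {sorted(forbidden)!r}\n"
        "print(','.join(name for name in forbidden if name in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return [name for name in result.stdout.strip().split(",") if name]
```

In a pytest case, the assertion might read `assert heavy_modules_loaded("agents.paper_completeness", ["torch"]) == []`, assuming `torch` is the dependency to exclude and the repository root is on the import path.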

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| agents/paper_completeness.py | Adds EvidenceLedger builder, numeric significance gating, traceability/schema checks, and deterministic LaTeX sanity checks. |
| agents/paper_orchestra_pipeline.py | Passes state into latex_sanity_check to enable state-aware deterministic gating during bundle generation. |
| tests/test_paper_completeness_m1.py | Adds M1 tests for numeric significance gating and minimal EvidenceLedger builder contract. |
| tests/test_paper_completeness_m4.py | Adds M4 tests for EvidenceLedger schema validation and Abstract/Conclusion traceability checks. |
| tests/test_latex_sanity_m2.py | Adds M2 tests for deterministic LaTeX sanity rules (unresolved refs, placeholders, cross-run identity, repetition). |
| tests/test_vnext_manuscript.py | Adds submission-bundle integration tests ensuring render-gate blocks deterministic LaTeX violations and preserves existing gates. |
| tests/test_presentation_cpu_boundary_m5.py | Adds CPU-only boundary tests ensuring presentation modules don’t load GPU/execution dependencies and can render/materialize offline. |
| tests/fixtures/* | Adds fixtures for M1/M2/M4 deterministic checks and traceability scenarios. |

Comment on lines +166 to +171
def _significance_alpha() -> float:
    raw = os.environ.get("DEEPGRAPH_SIGNIFICANCE_ALPHA")
    alpha = _numeric(raw)
    if alpha is None or alpha <= 0:
        return 0.05
    return alpha
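
As quoted, the helper falls back to 0.05 only for missing or non-positive values, so an override such as `5` would pass through unchanged (assuming `_numeric` simply parses it to a float). A stricter variant could confine the value to the open interval (0, 1). This is a sketch under that assumption; `parse_alpha` and `DEFAULT_ALPHA` are hypothetical names, and plain `float()` stands in for the PR's `_numeric`.

```python
import math
import os
from typing import Optional

DEFAULT_ALPHA = 0.05  # same fallback as the quoted helper


def parse_alpha(raw: Optional[str], default: float = DEFAULT_ALPHA) -> float:
    """Return a usable significance level, falling back to `default`."""
    try:
        alpha = float(raw)  # raises TypeError on None, ValueError on bad text
    except (TypeError, ValueError):
        return default
    # Reject NaN, infinities, and anything outside the open interval (0, 1).
    if not math.isfinite(alpha) or not 0.0 < alpha < 1.0:
        return default
    return alpha


def significance_alpha() -> float:
    """Read the override from the environment variable used in the PR."""
    return parse_alpha(os.environ.get("DEEPGRAPH_SIGNIFICANCE_ALPHA"))
```

Silently falling back keeps the gate usable when the variable is malformed; raising an error instead would surface misconfiguration earlier, at the cost of failing the run.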

Comment on lines +184 to +186
p_value = _numeric(packet.get("p_value"))
effect_size = _numeric(_first_present(packet.get("effect_size"), packet.get("effect_pct")))
metric = _text(_first_present(packet.get("metric_name"), summary.get("primary_metric"), summary.get("metric_name")))
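
The fallback chains above depend on what `_first_present` counts as "present", which the quoted lines do not show. The sketch below is one plausible reading, not the PR's implementation: `first_present` is a hypothetical stand-in that skips `None` and blank strings but keeps a legitimate `0.0`, which a plain `or` chain would drop.

```python
from typing import Any, Optional


def first_present(*values: Any) -> Optional[Any]:
    """Return the first value that is not None and not a blank string."""
    for value in values:
        if value is None:
            continue
        if isinstance(value, str) and not value.strip():
            continue
        return value
    return None


# Illustrative packet; the numbers are made up for the example.
packet = {"effect_size": 0.0, "effect_pct": 12.5}

# `or` treats 0.0 as missing and silently falls through to effect_pct.
via_or = packet.get("effect_size") or packet.get("effect_pct")
# An explicit presence check keeps the legitimate zero.
via_helper = first_present(packet.get("effect_size"), packet.get("effect_pct"))
```

The fallback also treats `effect_size` and `effect_pct` as interchangeable; whether the two keys share a scale is not stated in the PR.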

Comment on lines +1064 to +1072
def _strip_latex_code_blocks(text: str) -> str:
    stripped = re.sub(
        r"\\begin\{(?:verbatim|lstlisting|minted)\}.*?\\end\{(?:verbatim|lstlisting|minted)\}",
        "",
        text or "",
        flags=re.DOTALL | re.IGNORECASE,
    )
    stripped = re.sub(r"```.*?```", "", stripped, flags=re.DOTALL)
    return stripped
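
Replacing a multi-line block with an empty string shifts every later line upward, so line numbers computed on the stripped text no longer match the source file. The variant below is a sketch, not the PR's code: it pairs each `\end` with its own `\begin` through a backreference, also accepts `verbatim*`, and keeps the newlines of whatever it removes. The name `strip_latex_code_blocks` is hypothetical, and the sketch drops `re.IGNORECASE` on the assumption that environment names are written in lowercase.

```python
import re

# \1 forces the \end name to equal the \begin name; `verbatim\*?` also
# covers the starred variant.
_CODE_ENV = re.compile(
    r"\\begin\{(verbatim\*?|lstlisting|minted)\}.*?\\end\{\1\}",
    flags=re.DOTALL,
)
_MD_FENCE = re.compile(r"```.*?```", flags=re.DOTALL)


def _blank_keep_newlines(match: "re.Match[str]") -> str:
    """Replace a matched block with just its newlines."""
    return "\n" * match.group(0).count("\n")


def strip_latex_code_blocks(text: str) -> str:
    """Remove code blocks while keeping the line count unchanged."""
    stripped = _CODE_ENV.sub(_blank_keep_newlines, text or "")
    return _MD_FENCE.sub(_blank_keep_newlines, stripped)
```

With this variant, a hit reported on line N of the stripped text is also on line N of the original file.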

Comment on lines +1109 to +1116
hits.append(
    _line_hit(
        "cross_run_identity",
        token,
        base_line + snippet.count("\n", 0, token_match.start()),
        snippet.splitlines()[0] if snippet.splitlines() else snippet,
    )
)
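
In the quoted call, the line number follows the token's position, while the excerpt is always the snippet's first line, so the two can disagree for multi-line snippets. The sketch below derives both from the match offset. The helper `locate`, the token pattern, and the sample text are hypothetical, and `base_line` is assumed to be the file line number of the snippet's first line.

```python
import re
from typing import Tuple


def locate(snippet: str, match: "re.Match[str]", base_line: int = 1) -> Tuple[int, str]:
    """Return (absolute line number, text of the line containing the match)."""
    # Newlines before the match give the zero-based line offset.
    offset = snippet.count("\n", 0, match.start())
    # rfind returns -1 when the match is on the first line, so +1 yields 0.
    line_start = snippet.rfind("\n", 0, match.start()) + 1
    line_end = snippet.find("\n", match.start())
    if line_end == -1:
        line_end = len(snippet)
    return base_line + offset, snippet[line_start:line_end]


snippet = "Results are summarized below.\nSee run-20240101 for details.\nEnd."
token_match = re.search(r"run-\d+", snippet)
line_no, excerpt = locate(snippet, token_match, base_line=40)
```

Here `line_no` is 41 and `excerpt` is the second line of the snippet, the one that actually contains the token.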