Complete Epic #23 presentation evidence gates #29
Open
Protocol-zero-0 wants to merge 6 commits into
This was referenced Jun 4, 2026
Pull request overview
This PR completes Epic #23’s “presentation evidence gates” work by (1) introducing an EvidenceLedger-based numeric significance gate, (2) adding deterministic LaTeX sanity checks (unresolved refs, scaffold leakage, cross-run identity, boilerplate repetition) and wiring them into the submission bundle render gate, and (3) adding CPU-only boundary tests to ensure presentation code paths don’t import GPU/execution dependencies.
Changes:
- Add minimal `build_evidence_ledger`, numeric significance gating, and EvidenceLedger traceability/schema checks in `agents/paper_completeness.py`.
- Expand `latex_sanity_check` with deterministic checks and pass `state` from the submission pipeline so cross-run identity gating can work.
- Add milestone/regression tests and fixtures for M1/M2/M4/M5 gates, including submission-bundle blocking tests.
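As a rough illustration of what a deterministic unresolved-reference check could look like, the sketch below compares reference keys against `\label` keys in a LaTeX source string. The function name `find_unresolved_refs` and the set of reference commands it recognizes are assumptions made for this example; the PR's actual `latex_sanity_check` rules are not shown in this excerpt and may differ.

```python
import re

# Hypothetical helper for illustration; not the PR's latex_sanity_check code.
_LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
_REF_RE = re.compile(r"\\(?:ref|eqref|autoref|cref)\{([^}]+)\}")


def find_unresolved_refs(tex: str) -> list[tuple[int, str]]:
    """Return (line_number, key) for every reference with no matching label."""
    labels = set(_LABEL_RE.findall(tex))
    hits = []
    for line_no, line in enumerate(tex.splitlines(), start=1):
        for match in _REF_RE.finditer(line):
            # \cref allows comma-separated keys, so split before comparing.
            for key in (k.strip() for k in match.group(1).split(",")):
                if key and key not in labels:
                    hits.append((line_no, key))
    return hits


sample = "\n".join([
    r"\section{Results}\label{sec:results}",
    r"See Table~\ref{tab:main} and Section~\ref{sec:results}.",
])
unresolved = find_unresolved_refs(sample)
```

Because the check is a pure function of the source text, the same input always yields the same hits, which is the property a render gate needs in order to block a bundle reproducibly.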
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `agents/paper_completeness.py` | Adds EvidenceLedger builder, numeric significance gating, traceability/schema checks, and deterministic LaTeX sanity checks. |
| `agents/paper_orchestra_pipeline.py` | Passes `state` into `latex_sanity_check` to enable state-aware deterministic gating during bundle generation. |
| `tests/test_paper_completeness_m1.py` | Adds M1 tests for numeric significance gating and minimal EvidenceLedger builder contract. |
| `tests/test_paper_completeness_m4.py` | Adds M4 tests for EvidenceLedger schema validation and Abstract/Conclusion traceability checks. |
| `tests/test_latex_sanity_m2.py` | Adds M2 tests for deterministic LaTeX sanity rules (unresolved refs, placeholders, cross-run identity, repetition). |
| `tests/test_vnext_manuscript.py` | Adds submission-bundle integration tests ensuring render-gate blocks deterministic LaTeX violations and preserves existing gates. |
| `tests/test_presentation_cpu_boundary_m5.py` | Adds CPU-only boundary tests ensuring presentation modules don’t load GPU/execution dependencies and can render/materialize offline. |
| `tests/fixtures/*` | Adds fixtures for M1/M2/M4 deterministic checks and traceability scenarios. |
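One way to approximate a CPU-only boundary check is a static scan of a module's source for imports of GPU or execution packages. The sketch below is an assumption-laden illustration: the deny-list `FORBIDDEN_ROOTS` and the function `forbidden_imports` are hypothetical, and the tests in `tests/test_presentation_cpu_boundary_m5.py` may well use a different technique, such as inspecting `sys.modules` after import.

```python
import ast

# Hypothetical deny-list; the PR does not enumerate which packages count
# as GPU/execution dependencies.
FORBIDDEN_ROOTS = frozenset({"torch", "tensorflow", "jax", "cupy"})


def forbidden_imports(source: str, forbidden: frozenset = FORBIDDEN_ROOTS) -> list:
    """Return sorted forbidden top-level packages imported anywhere in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots = [node.module.split(".")[0]]
        else:
            continue  # not an absolute import statement
        found.update(root for root in roots if root in forbidden)
    return sorted(found)


clean = "import re\nfrom pathlib import Path\n"
# A lazy import inside a function body is still found, because ast.walk
# visits nested nodes as well as module-level statements.
leaky = "import re\n\ndef render():\n    import torch.cuda\n    return torch.cuda.is_available()\n"
```

A static scan never executes the module, so it is safe to run offline, but it only sees direct imports in the scanned file. Transitive imports pulled in by other modules would need a runtime check instead.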
Comment on lines +166 to +171

    def _significance_alpha() -> float:
        raw = os.environ.get("DEEPGRAPH_SIGNIFICANCE_ALPHA")
        alpha = _numeric(raw)
        if alpha is None or alpha <= 0:
            return 0.05
        return alpha
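The function above falls back to `0.05` only when the value is missing or non-positive. If alpha is meant to be a probability strictly between 0 and 1 (an assumption, since the review comment text is not shown here), a stricter parse could look like the sketch below. The name `significance_alpha` and the use of `float()` in place of the PR's `_numeric` helper are choices made for this illustration.

```python
import math
import os

DEFAULT_ALPHA = 0.05  # same fallback value as the reviewed snippet


def significance_alpha(env_var: str = "DEEPGRAPH_SIGNIFICANCE_ALPHA") -> float:
    """Read alpha from the environment, falling back unless 0 < alpha < 1."""
    raw = os.environ.get(env_var)
    try:
        alpha = float(raw) if raw is not None else math.nan
    except ValueError:
        alpha = math.nan
    # NaN fails both comparisons, so it also falls through to the default.
    if not (0.0 < alpha < 1.0):
        return DEFAULT_ALPHA
    return alpha
```

With this variant, values such as `5`, `inf`, or `nan` are rejected rather than silently turning the significance gate into one that every p-value passes.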
Comment on lines +184 to +186

    p_value = _numeric(packet.get("p_value"))
    effect_size = _numeric(_first_present(packet.get("effect_size"), packet.get("effect_pct")))
    metric = _text(_first_present(packet.get("metric_name"), summary.get("primary_metric"), summary.get("metric_name")))
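The PR's `_first_present` and `_numeric` helpers are not shown in this excerpt, so the sketch below uses hypothetical stand-ins, `first_present` and `to_float`, to illustrate one behavior such helpers probably need: a legitimate `0.0` effect size should count as present, which an `a or b` chain would not guarantee.

```python
from typing import Any, Optional


def first_present(*values: Any) -> Any:
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def to_float(value: Any) -> Optional[float]:
    """Best-effort float conversion; None when the value is not numeric."""
    if value is None or isinstance(value, bool):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Illustrative packet; field names follow the reviewed snippet.
packet = {"effect_size": 0.0, "effect_pct": 12.5, "p_value": "0.03"}
effect_size = to_float(first_present(packet.get("effect_size"), packet.get("effect_pct")))
p_value = to_float(packet.get("p_value"))
```

Here `effect_size` resolves to `0.0` rather than falling through to `effect_pct`. Whether the two fields share a unit is a separate question that the fallback itself does not answer.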
Comment on lines +1064 to +1072

    def _strip_latex_code_blocks(text: str) -> str:
        stripped = re.sub(
            r"\\begin\{(?:verbatim|lstlisting|minted)\}.*?\\end\{(?:verbatim|lstlisting|minted)\}",
            "",
            text or "",
            flags=re.DOTALL | re.IGNORECASE,
        )
        stripped = re.sub(r"```.*?```", "", stripped, flags=re.DOTALL)
        return stripped
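In the pattern above, the `\begin` and `\end` alternations are independent, so a `verbatim` block can be closed early by a literal `\end{lstlisting}` appearing inside it. A possible variant, sketched here under the assumption that matched environment names are the intended behavior, captures the environment name and reuses it through a backreference. The function name `strip_latex_code_blocks` is chosen for this example only.

```python
import re

# \1 forces the closing environment name to equal the opening one.
_CODE_ENV_RE = re.compile(
    r"\\begin\{(verbatim|lstlisting|minted)\}.*?\\end\{\1\}",
    flags=re.DOTALL | re.IGNORECASE,
)


def strip_latex_code_blocks(text: str) -> str:
    """Remove LaTeX code environments and triple-backtick fences from text."""
    stripped = _CODE_ENV_RE.sub("", text or "")
    return re.sub(r"```.*?```", "", stripped, flags=re.DOTALL)


sample = "\n".join([
    "A",
    r"\begin{verbatim}",
    r"TODO \end{lstlisting} still code",
    r"\end{verbatim}",
    "B ```x``` C",
])
result = strip_latex_code_blocks(sample)
```

With the backreference, the whole `verbatim` block is removed, including the embedded `\end{lstlisting}` text. The original alternation would stop at that embedded token and leave `still code` behind for later checks to flag.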
Comment on lines +1109 to +1116

    hits.append(
        _line_hit(
            "cross_run_identity",
            token,
            base_line + snippet.count("\n", 0, token_match.start()),
            snippet.splitlines()[0] if snippet.splitlines() else snippet,
        )
    )
Summary
Completes Epic #23 across the requested issue order on convergence branch `epic-23-evidence-ledger`:

Includes one preliminary branch-only regression fixture stabilization commit so the required baseline tests are green on this convergence branch.
`master` was not modified.

Changed Files
- `agents/paper_completeness.py`
- `agents/paper_orchestra_pipeline.py`
- `tests/test_paper_completeness_m1.py`
- `tests/test_paper_completeness_m4.py`
- `tests/test_latex_sanity_m2.py`
- `tests/test_vnext_manuscript.py`
- `tests/test_presentation_cpu_boundary_m5.py`
- `tests/fixtures/*`

Tests Run
Preflight stabilization:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed

#24:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_paper_completeness_m1.py` -> 7 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed

#27:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_paper_completeness_m4.py` -> 6 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py` -> 13 passed

#25:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_latex_sanity_m2.py` -> 8 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed

#28:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 5 passed
- `pytest tests/test_vnext_manuscript.py` -> 10 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_latex_sanity_m2.py tests/test_vnext_manuscript.py` -> 18 passed

#26:
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 10 passed
- `pytest tests/test_presentation_cpu_boundary_m5.py` -> 3 passed
- `pytest tests/test_pipeline_contracts.py` -> 12 passed
- `pytest tests/test_vnext_manuscript.py` -> 10 passed
- `pytest tests/test_paper_completeness_m1.py tests/test_paper_completeness_m4.py tests/test_latex_sanity_m2.py tests/test_presentation_cpu_boundary_m5.py` -> 24 passed

Non-goals / Skips
- `master`.
- `\input{content/conclusion}` traceability.
- `contracts/pipeline.py` or `require_submission_ready()`.