Comprehensive quality assurance for research outputs
Location: infrastructure/validation/
Quick Reference: Modules Guide | API Reference
- PDF Validation: Structural integrity checks, xref table verification, trailer validation, text extraction
- Markdown Validation: Heading hierarchy, image paths, cross-references, math expression checks
- Output Integrity: File integrity, data consistency, academic standards verification
- Figure Validation: Figure registry completeness and correctness
- Repository Scanning: Accuracy and completeness audits across the entire repository
- No-Mock Enforcement: Ensures test suites comply with the no-mocks policy
- Issue Categorization: Severity assignment, false-positive filtering, prioritization
- CLI Interface: Unified command-line entry point for all validation tasks
from pathlib import Path
from infrastructure.validation import validate_pdf_rendering, extract_text_from_pdf, scan_for_issues
pdf_path = Path("output/template_code_project/pdf/template_code_project_combined.pdf")
# Validate structural integrity of a rendered PDF
results = validate_pdf_rendering(pdf_path)
# Extract text content for downstream analysis
text = extract_text_from_pdf(pdf_path)
# Scan extracted text for rendering issues
issues = scan_for_issues(text)

from pathlib import Path
from infrastructure.validation import discover_markdown_files, validate_markdown, validate_images, validate_refs, validate_math
from infrastructure.validation.content.markdown_validator import collect_symbols
repo_root = Path(".")
manuscript_dir = repo_root / "projects" / "templates" / "template_code_project" / "manuscript"
# Validate all markdown files in a manuscript directory
problems, exit_code = validate_markdown(manuscript_dir, repo_root)
# Individual checks
md_files = [str(p) for p in discover_markdown_files(manuscript_dir, scope="tree")]
labels, anchors = collect_symbols(md_files)
image_issues = validate_images(md_files, repo_root)
ref_issues = validate_refs(md_files, repo_root, labels, anchors)
math_issues = validate_math(md_files, repo_root)

from pathlib import Path
from infrastructure.validation import (
verify_output_integrity, verify_file_integrity,
verify_cross_references, verify_data_consistency,
verify_academic_standards, generate_integrity_report,
)
output_dir = Path("output/template_code_project")
manuscript_dir = Path("projects/templates/template_code_project/manuscript")
markdown_files = sorted(manuscript_dir.glob("*.md"))
# Full integrity check across all output artifacts
report = verify_output_integrity(output_dir)
# Targeted checks (each takes a list of paths)
verify_file_integrity([output_dir / "pdf" / "template_code_project_combined.pdf"])
verify_cross_references(markdown_files)
verify_data_consistency(sorted((output_dir / "data").glob("*")))
verify_academic_standards(markdown_files)
# Generate a structured integrity report from the IntegrityReport
integrity_report = generate_integrity_report(report)

from pathlib import Path
from infrastructure.validation import validate_figure_registry
success, issues = validate_figure_registry(
Path("projects/templates/template_code_project/output/figures/figure_registry.json"),
Path("projects/templates/template_code_project/manuscript"),
)

Both registry shapes are accepted:
- Dict shape: {"fig:label": {...}, ...} (emitted by FigureManager).
- List shape: [{"label": "fig:label", ...}, ...] (emitted by project-side scripts, e.g. cognitive_case_diagrams/scripts/generate_diagrams.py).

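
To make the two shapes concrete, here is a minimal sketch of how either one could be normalized into a single label-to-entry mapping before further checks. This is an illustration only: `normalize_registry` and the `path` field are hypothetical names, not part of `infrastructure.validation`, and the source does not show how `validate_figure_registry` handles the shapes internally.

```python
import json
from typing import Any


def normalize_registry(raw: Any) -> dict[str, dict[str, Any]]:
    """Return a label -> entry mapping for either registry shape (hypothetical helper)."""
    if isinstance(raw, dict):
        # Dict shape: the keys are already the figure labels.
        return {label: dict(entry) for label, entry in raw.items()}
    if isinstance(raw, list):
        normalized: dict[str, dict[str, Any]] = {}
        for entry in raw:
            # List shape: each entry carries its own "label" field.
            fields = dict(entry)
            label = fields.pop("label")
            normalized[label] = fields
        return normalized
    raise TypeError(f"unsupported registry shape: {type(raw).__name__}")


# The "path" field is an assumed example field, not a documented schema.
dict_shape = json.loads('{"fig:loss": {"path": "figures/loss.png"}}')
list_shape = json.loads('[{"label": "fig:loss", "path": "figures/loss.png"}]')

# Both shapes describe the same figure once normalized.
assert normalize_registry(dict_shape) == normalize_registry(list_shape)
```

Normalizing up front would let the rest of a validator work against one structure, regardless of which tool wrote the registry file.
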
from pathlib import Path
from infrastructure.validation import (
run_comprehensive_audit, generate_audit_report,
categorize_by_type, assign_severity,
filter_false_positives, prioritize_issues, generate_issue_summary,
)
project_path = Path("projects/templates/template_code_project")
# Run all validation checks in a single pass
audit_results = run_comprehensive_audit(project_path)
report = generate_audit_report(audit_results)
# Process and triage discovered issues
categorized = categorize_by_type(audit_results)
filtered = filter_false_positives(categorized)
prioritized = prioritize_issues(filtered)
summary = generate_issue_summary(prioritized)

from pathlib import Path
from infrastructure.validation import validate_output_structure, validate_copied_outputs
validate_output_structure(Path("output/template_code_project"))
validate_copied_outputs(
source_dir=Path("projects/templates/template_code_project/output"),
dest_dir=Path("output/template_code_project"),
)

from pathlib import Path
from infrastructure.validation import LinkValidator
validator = LinkValidator()
results = validator.check_all(Path("docs"))

# Validate PDFs in an output directory
uv run python -m infrastructure.validation.cli pdf output/{project}/pdf/
# Validate markdown manuscript files
uv run python -m infrastructure.validation.cli markdown projects/{name}/manuscript/
# Both are subcommands of the unified CLI; list all subcommands and options with --help
uv run python -m infrastructure.validation.cli --help

The validation package is organized into four logical subpackage groups:

| Group | Modules | Purpose |
|---|---|---|
| Content | pdf_validator, markdown_validator, figure_validator | File-type-specific validation |
| Integrity | integrity, link_validator, check_links | Cross-reference and link verification |
| Repository | scanner, audit_orchestrator, issue_categorizer | Project-wide scanning and triage |
| Output | validator | Pipeline output structure checks |

All public functions are re-exported from infrastructure.validation for convenient single-import access.
- Rendering Module -- generates the outputs that validation checks
- Reporting Module -- aggregates validation results into pipeline reports
- Testing Strategy -- no-mocks policy and coverage requirements
- PDF Validation Guide -- detailed PDF validation reference