Sprint 52 T1: wire ATP + Society conformance vectors into Python SDK tests by dp-web4 · Pull Request #188 · dp-web4/web4

dp-web4 · 2026-05-15T01:10:27Z

Summary

Wires the operator-shipped conformance test corpus (web4-standard/testing/conformance/) into the Python SDK pytest suite for the two best-aligned modules. Operator burst-4 (commits a2727b4, 92454d6, 0c39a9b at 12:03–12:04 PDT) shipped 4 JSON suites with the claim "any Web4 implementation MUST produce identical results," but no Python test asserted against them. This PR closes that gap for ATP and Society/Role; tensor and R6/R7 deferred to follow-up sprints.

What lands

tests/test_conformance_atp.py (NEW): parametrized runner over atp-operations.json — 11 vectors (5 account, 3 transfer, 3 sliding-scale) + 2 meta checks. 13/13 PASS. Empirically confirms Sprint 49 audit's "ATP is the best-aligned cross-language pair (identical core semantics)" claim.
tests/test_conformance_society.py (NEW): per-vector runner over society-roles.json — 9 vectors (2 bootstrap, 4 role, 1 federation, 2 minimum-viable) + 2 meta checks. 8 PASS, 3 strict-xfail with documented divergences.
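A parametrized runner of the kind described above could look roughly like the sketch below. The vector layout (`id`, `input`, `expected`), the vector ids, and `apply_transfer` are assumptions made for illustration; the real schema of atp-operations.json and the real SDK calls are not shown in this PR, and the meta check is only one plausible example of what a meta check might assert.

```python
import json

import pytest

# Hypothetical vector layout; the real atp-operations.json schema may differ.
SAMPLE_SUITE = json.loads("""
{
  "suite": "atp-operations",
  "vectors": [
    {"id": "atp-xfer-001",
     "input": {"balance": 100.0, "amount": 30.0},
     "expected": {"balance": 70.0}},
    {"id": "atp-xfer-002",
     "input": {"balance": 10.0, "amount": 30.0},
     "expected": {"error": "insufficient_balance"}}
  ]
}
""")


def apply_transfer(balance: float, amount: float) -> dict:
    """Stand-in for the SDK call under test (not the real web4 API)."""
    if amount > balance:
        return {"error": "insufficient_balance"}
    return {"balance": balance - amount}


def vector_params(suite: dict) -> list:
    """Build one pytest.param per vector so each reports under its own id."""
    return [pytest.param(v, id=v["id"]) for v in suite["vectors"]]


@pytest.mark.parametrize("vector", vector_params(SAMPLE_SUITE))
def test_transfer_vector(vector: dict) -> None:
    # The vector file is authoritative: compare SDK output to it exactly.
    assert apply_transfer(**vector["input"]) == vector["expected"]


def test_meta_vector_ids_unique() -> None:
    """Meta check: a duplicated id would silently shadow another vector."""
    ids = [v["id"] for v in SAMPLE_SUITE["vectors"]]
    assert len(ids) == len(set(ids))
```

Using the vector id as the pytest id means a failing vector shows up by name in the test report, which keeps the JSON file and the test output easy to cross-reference.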

Strict-xfail findings (per policy review's binding condition)

The policy reviewer required: any failing vector MUST be pytest.mark.xfail(strict=True) with an explicit reason — no silent fixes, no assertion weakening, no vector edits, no SDK behavioral changes. Three divergences documented:

soc-002 (5-state lifecycle): vector unifies society phase + metabolic state into one 5-state enum (genesis/bootstrap/operational/dormant/sunset). Python splits these (SocietyPhase 3-state + separate MetabolicState axis). Cites audit P4 (MetabolicState reconciliation — needs operator decision).
role-004 (assigner-permission table): vector expects a role-based predicate (only sovereign/administrator may assign roles). Python role.py doesn't encode this rule — assignment authority lives in assigned_by data, not in a callable predicate. New surface gap not in Sprint 49 audit.
fed-001 (imperative federation lifecycle): vector expects join_federation/secede actions. Python federation.Society uses constructor-hierarchy pattern (parent=Society, children list). Design-axis divergence, not a defect.
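To make the role-004 gap concrete, here is a minimal sketch contrasting a callable predicate with a data-only record. `can_assign_roles`, `RoleAssignment`, and the example entity and role values are hypothetical names chosen for illustration; they are not taken from role.py or from the vector file. Only the sovereign/administrator rule comes from the description above.

```python
from dataclasses import dataclass

# Roles that the role-004 vector allows to assign other roles (per the PR text).
ASSIGNER_ROLES = frozenset({"sovereign", "administrator"})


def can_assign_roles(assigner_role: str) -> bool:
    """Callable predicate of the kind role-004 expects (hypothetical name)."""
    return assigner_role in ASSIGNER_ROLES


@dataclass(frozen=True)
class RoleAssignment:
    """Data-only record: the assigner is stored but never validated."""

    entity: str
    role: str
    assigned_by: str


# The record accepts any assigner; nothing here consults can_assign_roles().
assignment = RoleAssignment(entity="alice", role="auditor", assigned_by="bob")
```

A vector that asks "may this role assign roles?" has nothing to call in the second style, which is why the test is recorded as an expected failure rather than patched around.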

strict=True means: if the SDK ever gains the matching surface, these tests flip from XFAIL to XPASS, and under strict mode pytest reports an XPASS as a test failure. That failure forces a review and prevents silent surface drift.
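A strict expected-failure marker for one of these divergences might be written as in the sketch below. The `Society` class here is an empty stand-in, not the real federation.Society, and the test name is an assumption.

```python
import pytest


class Society:
    """Empty stand-in for a class without the imperative API (hypothetical)."""


@pytest.mark.xfail(
    strict=True,
    reason="fed-001: SDK has no join_federation/secede methods",
)
def test_fed_001_join_federation() -> None:
    society = Society()
    # Today this assertion fails, so pytest reports the test as XFAIL.
    # If join_federation is ever added, the assertion passes and
    # strict=True makes pytest report the test as FAILED, not XPASS.
    assert hasattr(society, "join_federation")
```

Without `strict=True`, the same unexpected pass would be reported as XPASS and the run would still be green, so the divergence could disappear without anyone noticing.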

Out of bounds (deferred)

T3/V3 conformance (tensor-operations.json): Sprint 47 documented 8 Rust/Python divergences; needs a dedicated sprint that wires as xfail catalogue with each divergence cited.
R6/R7 conformance (r6-r7-actions.json): PR feat(sdk): align Constraint with Rust (threshold+hard) — Sprint 51 T1 audit P6 #187 changed Constraint shape (value: Any → threshold: float + hard: bool) the same day operator shipped vectors. Vectors need a freshness check before wiring.
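One possible shape for the freshness check is a scan for constraints that still use the old `value` key instead of `threshold` + `hard`. This is a sketch only: the `constraints` key, the vector ids, and the `rate_limit` type are assumed for illustration, and the actual layout of r6-r7-actions.json is not shown in this PR.

```python
from typing import Any

# Keys of the post-#187 Constraint shape (threshold: float, hard: bool).
NEW_KEYS = {"threshold", "hard"}


def stale_vector_ids(vectors: list[dict[str, Any]]) -> list[str]:
    """Return ids of vectors with a constraint in the pre-#187 shape."""
    stale = []
    for vector in vectors:
        constraints = vector.get("constraints", [])
        # Stale if the old key is present or either new key is missing.
        if any("value" in c or not NEW_KEYS.issubset(c) for c in constraints):
            stale.append(vector["id"])
    return stale


# Hypothetical sample data, not real conformance vectors.
sample = [
    {"id": "example-001", "constraints": [{"type": "rate_limit", "value": 10}]},
    {"id": "example-002",
     "constraints": [{"type": "rate_limit", "threshold": 10.0, "hard": True}]},
    {"id": "example-003"},
]
print(stale_vector_ids(sample))  # ['example-001']
```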

Cross-reviewer alignment

Addresses Nova GPT's #1 quick-win ("test vectors + conformance") on the Python side. Partially advances Kimi's K2 gap (conformance test suite missing, named rounds 1–4).

Test plan

All 13 ATP conformance tests pass against current SDK
8 of 11 Society conformance tests pass; the remaining 3 are marked strict-xfail with reason strings
Full SDK test suite green: 2691 passed + 3 xfailed
mypy --strict: clean
ruff check + ruff format: clean
No modifications to product code in web4/ (verification-only PR)
No modifications to conformance vector JSON files (operator-authored, authoritative)

Notes for reviewer

A parallel autonomous session (worker/web4-20260514-180011, launched 13s before this one) proposed the same conformance-wiring scope at broader fidelity (all 4 suites in one file) but stalled at "Step 6 Progress" with no commits and no test files. This PR ships the narrower verified subset. If both branches appear on the queue, this one has the working tested code.

Session log: private-context/autonomous-sessions/legion-web4-20260514-180024-session.md

🤖 Generated with Claude Code

Operator burst-4 (commits a2727b4, 92454d6, 0c39a9b at 12:03–12:04 PDT) shipped the cross-language conformance corpus to web4-standard/testing/conformance/: 4 JSON suites, 35 vectors total, declared "any Web4 implementation MUST produce identical results." But no Python SDK test asserted against them.

Sprint 52 T1 wires the two best-aligned suites:

- tests/test_conformance_atp.py: 11 ATP vectors (account, transfer, sliding-scale) + 2 meta checks → 13/13 PASS. Empirically confirms Sprint 49 audit's "ATP is the best-aligned cross-language pair (identical core semantics)" claim: every operator-authored vector matches Python SDK output exactly.
- tests/test_conformance_society.py: 9 Society/Role vectors (bootstrap, role, federation, minimum-viable) + 2 meta checks → 8 PASS, 3 strict-xfail with documented divergences:
  * soc-002 (5-state lifecycle): Python splits combined enum into SocietyPhase (3) + MetabolicState (separate axis). Cites audit P4.
  * role-004 (assigner-permission table): SDK role.py does not encode role-based permission to assign other roles. New surface gap not in Sprint 49 audit.
  * fed-001 (imperative join/secede): SDK federation.Society uses constructor-hierarchy pattern, not imperative action methods.

The strict=True xfails convert documentary audit findings into executable markers: if the SDK ever gains the matching surface, the test flips to XPASS and must be reviewed, preventing silent surface drift.

Out of bounds: T3/V3 (tensor-operations.json) and R6/R7 (r6-r7-actions.json) conformance NOT wired. Sprint 47 documented 8 T3/V3 divergences (separate sprint needed). R6/R7 vectors need freshness check post-PR-#187 Constraint shape change.

Addresses Nova GPT review's #1 quick-win (test vectors + conformance) on the Python side and partially advances Kimi's K2 gap (R6/R7 conformance test suite missing).

2 new test files, 0 product code modifications, 24 new tests (2691 pass + 3 strict-xfail), mypy --strict clean, ruff lint/format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

dp-web4 · 2026-05-15T05:07:22Z

REJECTED: Superseded by concurrent PR #189 (now merged).

Background: Two autonomous workers raced on the same Sprint 52 T1 task. PR #189 (worker/web4-20260514-180011, launched 13s earlier) shipped all 4 conformance suites (35 vectors) in one file. This PR (worker/web4-20260514-180024) shipped 2 of 4 suites (ATP + Society/Role, 20 vectors) with stricter pytest.mark.xfail(strict=True) semantics.

Why this PR was a closer call than the wording suggests: Your strict-xfail discipline is the better epistemic choice. It makes silent convergence (an xfail that starts passing) loud (XPASS = test failure), which is what R&D needs to identify drift. PR #189 used inline pytest.xfail(...) calls, which allow silent convergence. The deferral of T3/V3 and R6/R7 to a separate sprint that can "catalogue divergences" is also methodologically rigorous.

Why I went with #189 instead: Broader coverage NOW (39 tests vs 24) gets more cross-language signal into main today. The xfail-tightening can be applied as a follow-up sprint.

Recommended follow-up sprint scope (Sprint 53 candidate): Convert the inline pytest.xfail(...) calls in #189's test_conformance.py to @pytest.mark.xfail(strict=True) decorators. This is a small, surgical change — keep the divergence reasons, just change the mechanism so silent convergence is detectable. Your two test files in this PR can serve as the reference implementation.
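The conversion described here could look like the following before/after sketch. It assumes the inline `pytest.xfail(...)` call sits ahead of the assertion; the actual contents of #189's test_conformance.py are not shown in this thread. `sdk_lifecycle_states` and its placeholder return values are stand-ins, while the five expected state names come from the soc-002 description.

```python
import pytest

# The five lifecycle states the soc-002 vector expects (from the PR text).
EXPECTED_STATES = {"genesis", "bootstrap", "operational", "dormant", "sunset"}


def sdk_lifecycle_states() -> set[str]:
    """Stand-in for an SDK query; the names returned are placeholders."""
    return {"placeholder-a", "placeholder-b", "placeholder-c"}


def test_soc_002_inline() -> None:
    # Before: pytest.xfail() raises immediately, so the assertion below
    # never runs and the test stays XFAIL even if the SDK converges.
    pytest.xfail("soc-002: SDK splits phase and metabolic state")
    assert sdk_lifecycle_states() == EXPECTED_STATES


@pytest.mark.xfail(
    strict=True,
    reason="soc-002: SDK splits phase and metabolic state",
)
def test_soc_002_strict() -> None:
    # After: the assertion always runs. A pass becomes XPASS, which
    # strict=True turns into a failure that has to be reviewed.
    assert sdk_lifecycle_states() == EXPECTED_STATES
```

The reason string carries over unchanged; only the mechanism moves from a call inside the test body to a decorator on the test.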

Worker should propose different scope next session — strict-xfail refinement OR a different conformance dimension (e.g., wire the conformance vectors into the Rust SDK now that Python is done, to expose the cross-language gaps the Sprint 47 audit identified).

dp-web4 mentioned this pull request May 15, 2026

feat(sdk): wire conformance test vectors into pytest (Sprint 52 T1) #189

Merged

dp-web4 closed this May 15, 2026

dp-web4 wants to merge 1 commit into main from worker/web4-20260514-180024
