feat(sdk): wire conformance test vectors into pytest (Sprint 52 T1) #189
Exercise 35 operator-created conformance vectors from testing/conformance/ against the Python SDK. Four suites: tensor operations, ATP/ADP, R6/R7 actions, society/roles.

39 tests total: 31 pass, 8 xfail. The xfails are documented conformance gaps:
- T3 aggregate formula divergence
- success-flag direction
- talent decay invariant
- V3 valuation scope
- constraint checking
- role authorization
- federation API shape
- sub-dimension rollup

2709 total tests (2701 passed, 8 xfailed). mypy --strict clean, ruff lint/format clean.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
APPROVED: Wires all 4 conformance suites (35 operator-shipped vectors) into pytest. Closes Sprint 52 T1 completely.

- Sprint alignment: Sprint 52 T1 (introduces it). The conformance corpus at
- Scope match: Diff matches description. 4 suites, 39 tests, 31 pass + 8 xfail conformance gaps.
- File count: 1 new file (
- Drift check: No generic algorithms, no standalone files.
- Integration: Tests live alongside other SDK tests, run in same pytest suite ✓
- Test quality: 39 tests covering 4 distinct vector categories, not padding. Each of the 8 xfails documents a specific Sprint 47/49 audit class (T3 aggregate weighting, success-flag direction, talent decay invariant, V3 valuation scope, constraint checking, role auth, federation API shape, sub-dimension rollup). Reasonable count for the scope.
- Concurrent PR #188 note: A parallel worker (
Catalogues the 8 Sprint 52 conformance xfails (PR #189), maps each to its audit origin, classifies by actionability tier, and proposes Sprint 53+ candidate buckets.

Key findings:
- 3 of 8 xfails restate Sprint 47 T3/V3 cross-language audit findings (Talent decay CRITICAL, weighted composite HIGH, update formula HIGH).
- 5 of 8 xfails (62.5%) are NEW surface gaps not in any prior audit: constraint enforcement, V3 valuation as behavioral vs economic, role-004 assigner predicate, fed-001 child- vs parent-initiated federation, sub-dimension rollup.
- Code-reading audits and behavioral-conformance audits are complementary; neither subsumes the other.
- No Sprint 52 xfail is purely autonomous-actionable. Each either needs the Rust web4-trust-core toolchain (3 xfails) or an operator architectural decision (5 xfails).
- Counter-finding: ATP suite is 11/11 exact pass, so the Sprint 49 audit's "ATP is best-aligned pair" claim is now operationally confirmed.

Analysis only. No SDK code, no test, no spec changes.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Summary
- testing/conformance/ into the Python SDK's pytest suite
- tests/test_conformance.py exercises all 4 vector suites: tensor operations (8), ATP/ADP (11), R6/R7 actions (8), society/roles (8)
- pytest.mark.xfail with clear explanations

Conformance Gaps (8 xfail)
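The approach in the Summary (load each vector, then mark known gaps as expected failures with an explanation) can be sketched roughly as below. This is a minimal illustration, not the contents of tests/test_conformance.py. The JSON layout (a list of objects with an `id` field), the `Case` and `load_cases` names, the `KNOWN_GAPS` registry, and the `role-001` vector id are all assumptions made for the example. Only the ids `role-004` and `fed-001` appear in the PR discussion.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass(frozen=True)
class Case:
    """One conformance vector plus its expected-failure reason, if any."""

    vector_id: str
    vector: dict[str, Any]
    xfail_reason: Optional[str]  # None means the vector is expected to pass


# Hypothetical registry: vector id -> why the SDK is known to diverge.
KNOWN_GAPS: dict[str, str] = {
    "role-004": "role authorization: assigner predicate",
    "fed-001": "federation API shape: child- vs parent-initiated",
}


def load_cases(suite_dir: Path) -> list[Case]:
    """Read every *.json file in suite_dir and tag vectors listed in KNOWN_GAPS."""
    cases: list[Case] = []
    for path in sorted(suite_dir.glob("*.json")):
        # Assumed layout: each file holds a JSON list of vector objects.
        for vector in json.loads(path.read_text(encoding="utf-8")):
            vector_id = vector["id"]
            cases.append(Case(vector_id, vector, KNOWN_GAPS.get(vector_id)))
    return cases


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway suite directory and split its vectors by expectation."""
    with tempfile.TemporaryDirectory() as tmp:
        suite = Path(tmp)
        vectors = [{"id": "role-001"}, {"id": "role-004"}, {"id": "fed-001"}]
        (suite / "society.json").write_text(json.dumps(vectors), encoding="utf-8")
        cases = load_cases(suite)
    passing = [c.vector_id for c in cases if c.xfail_reason is None]
    gaps = [c.vector_id for c in cases if c.xfail_reason is not None]
    return passing, gaps
```

In a pytest module, each `Case` with a reason could be wrapped as `pytest.param(case, id=case.vector_id, marks=pytest.mark.xfail(reason=case.xfail_reason))` and passed to `pytest.mark.parametrize`, so that every gap is reported as xfailed with its explanation instead of failing the run.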
Test Results
- mypy --strict: 0 errors
- ruff check + format: clean

Test plan
- pytest tests/test_conformance.py -v: all 39 pass (31 + 8 xfail)
- pytest tests/ -q: 2709 total, no regressions
- mypy --strict web4/: clean
- ruff check + format: clean

🤖 Generated with Claude Code