feat(sdk): wire conformance test vectors into pytest (Sprint 52 T1) by dp-web4 · Pull Request #189 · dp-web4/web4

dp-web4 · 2026-05-15T01:19:37Z

Summary

Wires the 35 operator-created conformance test vectors from testing/conformance/ into the Python SDK's pytest suite
New tests/test_conformance.py exercises all 4 vector suites: tensor operations (8), ATP/ADP (11), R6/R7 actions (8), society/roles (8)
Documents 8 cross-language conformance gaps as pytest.mark.xfail with clear explanations
Addresses Kimi's K2 gap (conformance test suite) with a concrete, executable foundation

Conformance Gaps (8 xfail)

Gap	Description	Class
T3 aggregate	Weighted vs unweighted mean	Sprint 47 audit
T3 update direction	Success flag ignored (quality-only)	Semantic divergence
T3 decay invariant	Talent stable in SDK, vector expects decay	Sprint 44 normative
V3 reputation scope	Valuation not in behavioral update	Economic dimension
Constraint checking	validate() defers to PolicyGate	Architecture
Role assignment auth	Governance-layer check	Not in data types
Federation API	incorporate_child() vs join/secede	API shape
Sub-dimension rollup	Ontology-defined, not runtime	Not implemented

Test Results

39 conformance tests: 31 passed, 8 xfailed
Full suite: 2709 total (2701 passed, 8 xfailed)
mypy --strict: 0 errors
ruff check + format: clean

Test plan

pytest tests/test_conformance.py -v — all 39 pass (31 + 8 xfail)
pytest tests/ -q — 2709 total, no regressions
mypy --strict web4/ — clean
ruff check + format — clean

🤖 Generated with Claude Code

Exercise 35 operator-created conformance vectors from testing/conformance/ against the Python SDK. Four suites: tensor operations, ATP/ADP, R6/R7 actions, society/roles.

39 tests total: 31 pass, 8 xfail (documented conformance gaps including T3 aggregate formula divergence, success-flag direction, talent decay invariant, V3 valuation scope, constraint checking, role authorization, federation API shape, sub-dimension rollup).

2709 total tests (2701 passed, 8 xfailed). mypy --strict clean, ruff lint/format clean.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

dp-web4 · 2026-05-15T05:07:02Z

APPROVED: Wires all 4 conformance suites (35 operator-shipped vectors) into pytest. Closes Sprint 52 T1 completely.

Sprint alignment: Sprint 52 T1 (introduces it). The conformance corpus at web4-standard/testing/conformance/ was shipped direct-to-main by operator (commits a2727b4, 92454d6, 0c39a9b 12:03–12:04 PDT). No Python test asserted against it before this PR. Addresses Kimi K2 gap (conformance test suite) and Nova GPT #1 quick-win (test vectors).

Scope match: Diff matches description. 4 suites, 39 tests, 31 pass + 8 xfail conformance gaps.

File count: 1 new file (test_conformance.py 902 lines) + 2 doc updates. 3 total ✓

Drift check: No generic algorithms, no standalone files. test_conformance.py imports from product modules (web4.atp, web4.r6, web4.role, web4.trust) and exercises real SDK surface ✓

Integration: Tests live alongside other SDK tests, run in same pytest suite ✓

Test quality: 39 tests covering 4 distinct vector categories — not padding. 8 xfails each document a specific Sprint 47/49 audit class (T3 aggregate weighting, success-flag direction, talent decay invariant, V3 valuation scope, constraint checking, role auth, federation API shape, sub-dimension rollup). Reasonable count for the scope.

Concurrent PR #188 note: A parallel worker (worker/web4-20260514-180024) proposed the same Sprint 52 T1 with narrower scope (2 of 4 suites) but stricter xfail semantics (pytest.mark.xfail(strict=True) vs the inline pytest.xfail() calls here). I am rejecting #188 as superseded scope. Follow-up recommendation: a future sprint should convert these inline pytest.xfail(...) calls to @pytest.mark.xfail(strict=True) decorators so that silent convergence (an xfail that starts passing) becomes a loud XPASS failure. The R&D ethos values identifying surface drift loudly; conditional xfail allows silent fixes. Not a blocker for this PR — broader coverage in main now is worth more than narrower coverage with stricter semantics.
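
The difference between the two xfail styles discussed above can be sketched as follows. `sdk_aggregate`, `EXPECTED_WEIGHTED`, and both test names are hypothetical stand-ins, not code from this PR; only `pytest.xfail()` and `pytest.mark.xfail(strict=True)` are real pytest APIs.

```python
import pytest

# Hypothetical expectation taken from a conformance vector (illustrative value only).
EXPECTED_WEIGHTED = 0.62


def sdk_aggregate(values):
    """Hypothetical stand-in for an SDK call: plain unweighted mean."""
    return sum(values) / len(values)


def test_aggregate_inline_xfail():
    result = sdk_aggregate([0.5, 0.6, 0.7])
    if result != pytest.approx(EXPECTED_WEIGHTED):
        # Imperative xfail: stops the test here and reports it as xfailed.
        pytest.xfail("SDK uses unweighted mean; vector expects weighted")
    # Reached only once the SDK converges. The test then passes quietly,
    # and nothing flags that the xfail branch has gone dead.
    assert result == pytest.approx(EXPECTED_WEIGHTED)


@pytest.mark.xfail(strict=True, reason="SDK uses unweighted mean; vector expects weighted")
def test_aggregate_strict_xfail():
    # While the gap exists this assertion fails and pytest reports xfailed.
    # If the SDK converges, the unexpected pass (XPASS) is reported as a failure.
    assert sdk_aggregate([0.5, 0.6, 0.7]) == pytest.approx(EXPECTED_WEIGHTED)
```

With the strict decorator, closing a gap forces someone to remove the marker in the same change, which is the loud signal the review asks for.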

Catalogues the 8 Sprint 52 conformance xfails (PR #189), maps each to its audit origin, classifies by actionability tier, and proposes Sprint 53+ candidate buckets.

Key findings:
- 3 of 8 xfails restate Sprint 47 T3/V3 cross-language audit findings (Talent decay CRITICAL, weighted composite HIGH, update formula HIGH).
- 5 of 8 xfails (62.5%) are NEW surface gaps not in any prior audit: constraint enforcement, V3 valuation as behavioral vs economic, role-004 assigner predicate, fed-001 child- vs parent-initiated federation, sub-dimension rollup.
- Code-reading audits and behavioral-conformance audits are complementary; neither subsumes the other.
- No Sprint 52 xfail is purely autonomous-actionable. Each either needs the Rust web4-trust-core toolchain (3 xfails) or an operator architectural decision (5 xfails).
- Counter-finding: ATP suite is 11/11 exact pass; Sprint 49 audit's "ATP is best-aligned pair" claim is now operationally confirmed.

Analysis only. No SDK code, no test, no spec changes.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

dp-web4 merged commit 381904a into main May 15, 2026
5 of 6 checks passed

dp-web4 deleted the worker/web4-20260514-180011 branch May 15, 2026 05:07

This was referenced May 15, 2026

Sprint 52 T1: wire ATP + Society conformance vectors into Python SDK tests #188

Closed

docs(audits): Sprint 52 conformance-gap consolidation #190

Merged

dp-web4 merged 1 commit into main from worker/web4-20260514-180011
