
feat(sdk): wire conformance test vectors into pytest (Sprint 52 T1) #189

Merged: dp-web4 merged 1 commit into main from worker/web4-20260514-180011 on May 15, 2026


dp-web4 commented May 15, 2026

Summary

  • Wires the 35 operator-created conformance test vectors from testing/conformance/ into the Python SDK's pytest suite
  • New tests/test_conformance.py exercises all 4 vector suites: tensor operations (8), ATP/ADP (11), R6/R7 actions (8), society/roles (8)
  • Documents 8 cross-language conformance gaps as pytest.mark.xfail with clear explanations
  • Addresses Kimi's K2 gap (conformance test suite) with a concrete, executable foundation

Conformance Gaps (8 xfail)

| Gap | Description | Class |
| --- | --- | --- |
| T3 aggregate | Weighted vs unweighted mean | Sprint 47 audit |
| T3 update direction | Success flag ignored (quality-only) | Semantic divergence |
| T3 decay invariant | Talent stable in SDK, vector expects decay | Sprint 44 normative |
| V3 reputation scope | Valuation not in behavioral update | Economic dimension |
| Constraint checking | validate() defers to PolicyGate | Architecture |
| Role assignment auth | Governance-layer check | Not in data types |
| Federation API | incorporate_child() vs join/secede | API shape |
| Sub-dimension rollup | Ontology-defined, not runtime | Not implemented |
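
To make the first gap concrete, here is a small sketch of how a weighted and an unweighted mean diverge on the same inputs. The scores and weights are invented for illustration; the PR does not state the actual weights, nor which side (SDK or vector) uses which formula.

```python
def unweighted_mean(scores: list[float]) -> float:
    """Plain arithmetic mean: every dimension counts equally."""
    return sum(scores) / len(scores)


def weighted_mean(scores: list[float], weights: list[float]) -> float:
    """Mean with per-dimension weights, normalised by the weight total."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


scores = [0.9, 0.6, 0.3]   # illustrative dimension scores (made up)
weights = [0.5, 0.3, 0.2]  # hypothetical weights, not from the spec

print(f"unweighted: {unweighted_mean(scores):.2f}")
print(f"weighted:   {weighted_mean(scores, weights):.2f}")
```

With these made-up numbers the two aggregates come out at roughly 0.60 and 0.69, which is the kind of mismatch an exact-match conformance vector would flag.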

Test Results

  • 39 conformance tests: 31 passed, 8 xfailed
  • Full suite: 2709 total (2701 passed, 8 xfailed)
  • mypy --strict: 0 errors
  • ruff check + format: clean

Test plan

  • pytest tests/test_conformance.py -v — all 39 pass (31 + 8 xfail)
  • pytest tests/ -q — 2709 total, no regressions
  • mypy --strict web4/ — clean
  • ruff check + format — clean

🤖 Generated with Claude Code

Exercise 35 operator-created conformance vectors from
testing/conformance/ against the Python SDK. Four suites:
tensor operations, ATP/ADP, R6/R7 actions, society/roles.

39 tests total: 31 pass, 8 xfail (documented conformance gaps
including T3 aggregate formula divergence, success-flag direction,
talent decay invariant, V3 valuation scope, constraint checking,
role authorization, federation API shape, sub-dimension rollup).

2709 total tests (2701 passed, 8 xfailed). mypy --strict clean,
ruff lint/format clean.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

dp-web4 commented May 15, 2026

APPROVED: Wires all 4 conformance suites (35 operator-shipped vectors) into pytest. Closes Sprint 52 T1 completely.

Sprint alignment: Sprint 52 T1 (introduces it). The conformance corpus at web4-standard/testing/conformance/ was shipped direct-to-main by operator (commits a2727b4, 92454d6, 0c39a9b 12:03–12:04 PDT). No Python test asserted against it before this PR. Addresses Kimi K2 gap (conformance test suite) and Nova GPT #1 quick-win (test vectors).

Scope match: Diff matches description. 4 suites, 39 tests, 31 pass + 8 xfail conformance gaps.

File count: 1 new file (test_conformance.py, 902 lines) + 2 doc updates = 3 total ✓

Drift check: No generic algorithms, no standalone files. test_conformance.py imports from product modules (web4.atp, web4.r6, web4.role, web4.trust) and exercises real SDK surface ✓

Integration: Tests live alongside other SDK tests, run in same pytest suite ✓

Test quality: 39 tests covering 4 distinct vector categories — not padding. 8 xfails each document a specific Sprint 47/49 audit class (T3 aggregate weighting, success-flag direction, talent decay invariant, V3 valuation scope, constraint checking, role auth, federation API shape, sub-dimension rollup). Reasonable count for the scope.

Concurrent PR #188 note: A parallel worker (worker/web4-20260514-180024) proposed the same Sprint 52 T1 with narrower scope (2 of 4 suites) but stricter xfail semantics (pytest.mark.xfail(strict=True) vs the inline pytest.xfail() calls here). I am rejecting #188 as superseded scope. Follow-up recommendation: a future sprint should convert these inline pytest.xfail(...) calls to @pytest.mark.xfail(strict=True) decorators so that silent convergence (an xfail that starts passing) becomes a loud XPASS failure. The R&D ethos values identifying surface drift loudly; conditional xfail allows silent fixes. Not a blocker for this PR — broader coverage in main now is worth more than narrower coverage with stricter semantics.

dp-web4 merged commit 381904a into main on May 15, 2026
5 of 6 checks passed
dp-web4 deleted the worker/web4-20260514-180011 branch May 15, 2026 05:07
dp-web4 added a commit that referenced this pull request May 15, 2026
Catalogues the 8 Sprint 52 conformance xfails (PR #189), maps each to
its audit origin, classifies by actionability tier, and proposes Sprint
53+ candidate buckets.

Key findings:
- 3 of 8 xfails restate Sprint 47 T3/V3 cross-language audit findings
  (Talent decay CRITICAL, weighted composite HIGH, update formula HIGH).
- 5 of 8 xfails (62.5%) are NEW surface gaps not in any prior audit:
  constraint enforcement, V3 valuation as behavioral vs economic,
  role-004 assigner predicate, fed-001 child- vs parent-initiated
  federation, sub-dimension rollup.
- Code-reading audits and behavioral-conformance audits are
  complementary; neither subsumes the other.
- No Sprint 52 xfail is purely autonomous-actionable. Each either
  needs the Rust web4-trust-core toolchain (3 xfails) or an operator
  architectural decision (5 xfails).
- Counter-finding: ATP suite is 11/11 exact pass — Sprint 49 audit's
  "ATP is best-aligned pair" claim is now operationally confirmed.

Analysis only. No SDK code, no test, no spec changes.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>