dp-web4 · dp-web4 · May 15, 2026 · May 15, 2026
diff --git a/docs/audits/sprint-52-conformance-gap-consolidation-2026-05-15.md b/docs/audits/sprint-52-conformance-gap-consolidation-2026-05-15.md
@@ -0,0 +1,229 @@
+# Sprint 52 Conformance Gap Consolidation
+
+**Date**: 2026-05-15
+**Sprint context**: Follow-up to Sprint 52 T1 (PR #189, merged 2026-05-15T05:07Z)
+**Scope**: Catalogue the 8 conformance xfails wired by Sprint 52, map each to its
+audit origin, classify by actionability tier, and propose Sprint 53+ candidates.
+**Source**: `web4-standard/implementation/sdk/tests/test_conformance.py` and
+`web4-standard/testing/conformance/{tensor,atp,r6-r7,society}-*.json`.
+
+## Why this memo exists
+
+Sprint 52 wired the operator-shipped conformance vectors into pytest. Of the
+39 tests, **8 land as `pytest.mark.xfail` (strict) with documented divergence
+reasons**. Each xfail is a real surface gap, but their nature is heterogeneous:
+some restate findings from prior code-reading audits (Sprint 47, Sprint 49),
+some surface gaps that no prior audit caught, and some are architectural design
+splits where it isn't yet clear whether the SDK or the vector should change.
+
+A "fix the conformance xfails" sprint would be a category error: the xfails
+are not a uniform fix queue. This memo separates the queue.
+
+## The 8 xfails
+
+| # | Test | Suite | Failure source |
+|---|------|-------|----------------|
+| 1 | `t3-002` weighted vs unweighted aggregate | tensor-operations | `pytest.xfail()` at `test_conformance.py:115` |
+| 2 | `t3-004` update direction (quality vs success flag) | tensor-operations | `pytest.xfail()` at `test_conformance.py:174` |
+| 3 | `t3-006` talent decay vs talent invariant | tensor-operations | `pytest.xfail()` at `test_conformance.py:213` |
+| 4 | `r6-val-004` witness quorum constraint enforcement | r6-r7-actions | `@pytest.mark.xfail` at `test_conformance.py:526` |
+| 5 | `r7-rep-001` V3 valuation in reputation delta | r6-r7-actions | `pytest.xfail()` at `test_conformance.py:575` |
+| 6 | `role-004` assigner authorization predicate | society-roles | `@pytest.mark.xfail` at `test_conformance.py:807` |
+| 7 | `fed-001` join/secede vs incorporate_child | society-roles | `@pytest.mark.xfail` at `test_conformance.py:826` |
+| 8 | `sub-001` T3 sub-dimension rollup | tensor-operations | `@pytest.mark.xfail` at `test_conformance.py:887` |
+
+## Audit-origin mapping
+
+### Known-class (3 of 8): Sprint 47 T3/V3 cross-language audit
+
+Sprint 47 documented 8 divergences between the Rust web4-trust-core
+implementation and the spec/Python SDK. Of those 8, three correspond directly
+to Sprint 52 xfails. In each case the Python SDK matches the spec; the Rust
+SDK and the conformance vectors (authored alongside Rust) diverge.
+
+| Sprint 52 xfail | Sprint 47 audit finding | Severity in Sprint 47 |
+|-----------------|-------------------------|-----------------------|
+| #1 t3-002 weighted vs unweighted | Finding #2 (T3 composite: unweighted) | HIGH |
+| #2 t3-004 update direction | Finding #4 (T3 update: wrong formula) | HIGH |
+| #3 t3-006 talent decay | Finding #1 (Talent decay applied — spec violation) | CRITICAL |
+
+These three xfails do not surface new information. They are conformance-runner
+restatements of the Sprint 47 findings. The relevant fact is **which side aligns
+to the canonical spec**: the Python SDK matches the spec's normative invariants
+(weighted composites, quality-based update, no Talent decay per Sprint 44).
+The vectors were authored against the Rust implementation's behavior, which
+diverges from spec. Closing these xfails by changing the Python SDK would
+move the SDK away from canonical alignment.
+
+### New surface gaps (5 of 8): not in any prior audit
+
+The remaining five xfails surface gaps that neither Sprint 47 nor Sprint 49
+caught. This is the more informative half of the consolidation.
+
+| Sprint 52 xfail | Why this isn't in Sprint 47/49 audits |
+|-----------------|---------------------------------------|
+| #4 r6-val-004 constraint enforcement | Sprint 49 #10 covered Constraint **shape** (closed by Sprint 51 PR #187). Constraint **enforcement at validate-time** was not examined — the audit treated Constraint as a data type, not an enforcement point. |
+| #5 r7-rep-001 V3 valuation in reputation | Neither audit examined which V3 dimensions are updated by `compute_reputation()`. The SDK splits V3 into a behavioral subset (veracity, validity) and an economic dimension (valuation, updated via ATP settlement). The vector treats all three as behavioral. This is a **conceptual split** the audits missed. |
+| #6 role-004 assigner authorization | Sprint 49 mapped Society/Role data types; it did not model an "is_allowed_to_assign_roles(role) → bool" predicate. The vector introduces one. **Not flagged anywhere in the audits.** |
+| #7 fed-001 join/secede vs incorporate_child | Sprint 49 #5 noted "composite architecture difference" but did not enumerate the federation API. The conformance vector concretizes the difference: child-initiated `join()`/`secede()` vs parent-initiated `incorporate_child()`. |
+| #8 sub-001 sub-dimension rollup | The ontology defines `web4:subDimensionOf` (T3/V3 ontology TTL). Neither audit examined whether the runtime implements rollup. The vector exposes this as an ontology-vs-runtime gap. |
+
+**Headline finding**: 5 of 8 xfails (62.5%) are new surface gaps not surfaced
+by either code-reading audit. The conformance-vector instrument finds gaps that
+code-reading audits don't, because vectors encode **behavioral expectations**
+while audits compare **structural shapes**. The two instruments are complementary;
+neither subsumes the other.
+
+## Actionability tier classification
+
+Three tiers, with one note: no Sprint 52 xfail is purely autonomous-actionable.
+Each either depends on external toolchain (Rust web4-trust-core) or on operator
+architectural decisions. This means a Sprint 53 framed as "fix the xfails"
+would block on inputs the current track cannot provide.
+
+### Tier A — CROSS-LANGUAGE-EXTERNAL-TOOLCHAIN (3 xfails)
+
+Sprint 47's audit recommendations live in the Rust web4-trust-core repo, not
+this one. Until that toolchain is exercised, these xfails persist by design.
+
+- **#1 t3-002 weighted composite** — Fix Rust to use weighted average (Sprint 47 Recommendation 2).
+- **#2 t3-004 update formula** — Fix Rust to adopt `0.02 * (quality - 0.5)` (Sprint 47 Recommendation 3).
+- **#3 t3-006 talent decay** — Fix Rust to honor Talent no-decay invariant (Sprint 47 Recommendation 1; also vector author should regenerate the vector to match the invariant).
+
+Resolution cost from this track: zero (cannot resolve). Resolution cost from
+Rust track: small (line-level edits per Sprint 47 audit), but requires the
+Rust toolchain. The xfails are the right outcome here — they make the cross-language
+divergence executable on every Python test run.
+
+### Tier B — DESIGN-QUESTION-NEEDS-OPERATOR (4 xfails)
+
+These are not "implement the missing thing." They are "decide which side is
+right, then implement." The decision is architectural and lives above the
+autonomous track.
+
+- **#4 r6-val-004 constraint enforcement** — Decision: should `R7Action.validate()` enforce Constraint satisfaction, or is enforcement strictly PolicyGate's responsibility?
+  - The current SDK splits responsibility: `validate()` checks structural correctness, PolicyGate checks policy/constraint satisfaction. This is a defensible architecture.
+  - If the decision is "validate() enforces constraints," the implementation is **small** (one method on R7Action that iterates `self.constraints` and emits errors).
+  - If the decision is "PolicyGate-only," the conformance vector itself should be retargeted (or the xfail rephrased as a documented design split, not a gap).
+
+- **#5 r7-rep-001 V3 valuation in reputation** — Decision: is V3.valuation a behavioral or an economic dimension?
+  - The SDK answer: economic (updated via ATP settlement, not via R7Action quality).
+  - The vector answer: behavioral (`compute_reputation()` should produce a valuation delta).
+  - One answer is wrong. This is a single decision with implementation consequences either way.
+
+- **#6 role-004 assigner authorization** — Decision: is "who can assign which role" data (lives in role.py) or governance (lives in PolicyGate/SocietyState)?
+  - The conformance vector implies a data-layer predicate: `is_allowed_to_assign_roles(role) → bool`, returning True for Sovereign/Administrator and False for others.
+  - The SDK comment in the xfail block places this in the governance layer, not in `web4/role.py`.
+  - Autonomous implementation cost in role.py: **small** (5 lines + tests). But this would lock in the data-layer answer to the architectural question.
+
+- **#7 fed-001 join/secede vs incorporate_child** — Decision: is federation membership child-initiated or parent-initiated?
+  - The SDK chose parent-initiated (`incorporate_child(parent_state, child_state, timestamp)`).
+  - The vector chose child-initiated (`join(parent) → is_constituent=True`, `secede() → is_constituent=False`).
+  - The two patterns express different governance semantics. Web4's sovereignty stance (sub-society sovereignty is intrinsic, not granted) leans **child-initiated** — which makes the SDK's parent-initiated API the candidate to revisit.
+
+Resolution cost from this track once the decision lands: small to medium per item.
+Without the decision, autonomous work risks locking in the wrong side.
+
+### Tier C — NEEDS-SPEC-SCOPING (1 xfail)
+
+- **#8 sub-001 T3 sub-dimension rollup** — Decision: is sub-dimensional T3 a runtime construct or an ontology-metadata construct?
+  - The ontology TTL defines `web4:subDimensionOf` (e.g., `talent:python web4:subDimensionOf talent`).
+  - The runtime T3 dataclass has three scalar fields (talent, training, temperament); it does not accept sub-dimensional attestations and project them via the ontology.
+  - If sub-dimensions are purely metadata for human consumption: the vector itself is over-reaching, and the xfail should be reframed as a documented non-gap (closed via vector retraction or test removal).
+  - If sub-dimensions are runtime constructs: this is a substantial T3 redesign (~medium-large autonomous-session cost — new data structure, ingestion path, rollup math, persistence implications). It also needs to define how sub-dimension attestations compose with the existing scalar fields.
+
+This is the only xfail where the choice isn't "implement vs not" but "is the
+feature scoped at runtime at all" — which is a spec-level question.
+
+## Counter-finding from Sprint 52 (worth preserving)
+
+Of the **27 non-xfail conformance assertions in tensor + ATP + R6 + Society
+suites that the Python SDK passes**, the **11 ATP vectors all pass exactly**
+(no near-misses, no behavioral-but-not-numeric equivalence). The Sprint 49 audit's
+documentary claim "ATP is the best-aligned cross-language pair (identical core
+semantics)" is now operationally confirmed: every conformance vector the operator
+could author against Rust/spec matches what the Python SDK produces. This is a
+stronger statement than the audit made — it's the audit's claim hardened into
+a continuously-executed check.
+
+This counter-finding belongs alongside the xfail catalogue because it answers
+the natural follow-up: "OK, but does anything actually work cross-language?"
+Yes. ATP works fully. T3/V3 works behaviorally (level classifications agree;
+numeric composites diverge). R7 works for happy-path reputation. Society/Role
+bootstrap and rotation work. The xfails are the edge surface, not the whole surface.
+
+## Sprint 53+ candidate buckets
+
+### Autonomous-pickable bucket (no operator unblocking needed)
+
+Most natural candidates are not "fix xfails" but adjacent work that benefits
+from Sprint 52's now-executable conformance baseline:
+
+- **C1 — MCP-as-inter-society-protocol audit** (memory candidate D1): compare
+  the Python SDK's MCP/protocol module against `mcp-protocol.md` §7.3–7.6
+  (added in v0.1.3 amendment) to identify missing implementation surface for
+  inter-society R7 transactions, LCT envelopes, witnessing, reputation
+  propagation, and failure modes. Pure audit; produces one memo. Independent
+  of any operator decision.
+
+- **C2 — mcp-protocol.md internal-consistency audit** (memory candidate D4):
+  read mcp-protocol.md end-to-end and surface internal inconsistencies between
+  §1.1 framing, §7.3–7.6 normative content, and §7.7 WIP. Pure documentation
+  audit. Produces one memo.
+
+- **C3 — §7.7 promotion tracking stub** (memory candidate D2): write a small
+  memo capturing what would need to be true to promote §7.7 (referent-grounded
+  exchange rate negotiation) from v0.1.0-draft to v0.1.0-normative. Two-page
+  artifact. Independent of any other work.
+
+- **C4 — Pre-merge conformance-vector freshness check process**: define a
+  lightweight process (probably a markdown doc + a CI hook design memo) that
+  flags conformance vectors that depend on data structures the SDK has recently
+  changed shape on. This was the open question from session 180024: vectors
+  were authored against R6 Constraint shape and Constraint shape changed in
+  PR #187 the same day vectors landed. The process avoids that re-emerging.
+
+### Operator-blocked bucket (cannot pick autonomously)
+
+- **B1 — Decide #4 r6-val-004**: validate() enforces constraints, or
+  PolicyGate-only? (Decides whether to add `R7Action.check_constraints()`.)
+
+- **B2 — Decide #5 r7-rep-001**: V3.valuation behavioral or economic?
+  (Decides reputation delta API shape.)
+
+- **B3 — Decide #6 role-004**: assigner authorization data or governance layer?
+  (Decides whether `role.py` gets the predicate or `PolicyGate` does.)
+
+- **B4 — Decide #7 fed-001**: federation child-initiated or parent-initiated?
+  (Affects API design across society + federation modules.)
+
+- **B5 — Decide #8 sub-001**: sub-dimensional T3 runtime or metadata-only?
+  (Spec-level scoping question; precedes any implementation.)
+
+- **B6 — P4 carryover**: MetabolicState 5-state vs 7-state reconciliation
+  (Sprint 49 audit; operator-blocked since Sprint 49 surfaced it).
+
+- **B7 — P7 carryover**: SocietyState role-integration architecture (Sprint
+  49 audit; operator-blocked since Sprint 49 surfaced it).
+
+### External-track-blocked bucket (needs Rust toolchain or vector regeneration)
+
+- **R1, R2, R3** — Sprint 47 recommendations #1, #2, #4 (Talent no-decay,
+  weighted composites, update formula) — fix Rust web4-trust-core to match
+  spec. Cannot be done from web4 repo. Vector regeneration follows.
+
+## Closing observation
+
+The Sprint 47 audit identified 8 cross-language T3/V3 divergences; we now see
+that 3 of them surface as Sprint 52 conformance xfails. The Sprint 49 audit
+identified 14 cross-language Society/Role/ATP/R6 items; only 1 of them (Constraint
+shape, P6 / Sprint 51) directly mapped to a Sprint 52 vector, and **none** of
+the new Sprint 52 surface gaps were in Sprint 49's queue. This asymmetry says
+something about the audit methodology: **code-reading audits and behavioral
+conformance audits are not redundant**. Doing one well does not predict what
+the other will find.
+
+A reasonable governance posture going forward: run both periodically, and treat
+disagreements between them (vectors find a gap an audit missed, or an audit
+finds a gap no vector exercises) as signal about which surfaces are
+under-instrumented.