Epic: Schema-First Capability Metadata Redesign #37

@normenmueller

Requirements Pack — Schema-First Capability Metadata Redesign

Origin: adm/pbl/ai4x-cap-metadata-schema-redesign.md

1. Problem Statement

The ai4X capability metadata schema is currently implicit — defined only through scattered validation logic (capability-meta.mjs), rendering code (capability-indexes.mjs), and test fixtures. Schema evolution requires synchronized changes across all three locations, producing recurring deficits (Issues #29, #31). Additionally, multiple metadata fields duplicate information already canonically owned by Git history or filesystem structure, violating the Single Authoritative Source principle. Before curate or other features build on the capability corpus, the metadata foundation must be explicit, minimal, and machine-verifiable from a single schema definition.

2. In-Scope and Out-of-Scope

In-Scope

  • Central declarative schema definition file under adm/gdl/dev/schemas/.
  • Field removal: approved_by, approved_at, scope, status, review_due, migration_note.
  • Field reclassification: owner becomes required; do_not_use_when, distinguish_from, sources become optional.
  • Update of utl/cap/checks/capability-meta.mjs to validate against the central schema file.
  • Update of ai4x doctor to validate against the central schema file.
  • Update of utl/cap/checks/capability-indexes.mjs to consume the new schema.
  • Big-Bang migration of all existing .meta.yaml files (~20) in one dedicated Story.
  • Update of capability-authoring-governance.md Metadata Contract section to reflect the new field inventory.
  • Schema reference documentation in doc/arc/08_concepts.md.
  • Architecture Decision Record in doc/arc/09_architecture_decisions.md.
  • Addition of token_estimate optional field (integer) for deterministic token estimation per capability.
  • CI enforcement of conflicts symmetry invariant (if A conflicts B, B must conflict A).
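The conflicts symmetry invariant in the last item can be illustrated with a minimal sketch. The `ConflictMap` shape and the function name `findAsymmetricConflicts` are hypothetical; the sketch assumes the `conflicts` arrays of all capabilities have already been read into memory and does not reflect the actual CI script.

```typescript
// Hypothetical in-memory shape: capability id -> ids listed in its `conflicts` field.
type ConflictMap = Record<string, string[]>;

// Returns one message per asymmetric pair; an empty array means the invariant holds.
function findAsymmetricConflicts(conflicts: ConflictMap): string[] {
  const errors: string[] = [];
  for (const [id, targets] of Object.entries(conflicts)) {
    for (const target of targets) {
      // A target with no entry at all is treated as declaring no conflicts.
      const back = conflicts[target] ?? [];
      if (!back.includes(id)) {
        errors.push(`${id} conflicts ${target}, but ${target} does not conflict ${id}`);
      }
    }
  }
  return errors;
}
```

A CI check built this way would fail whenever the returned array is non-empty.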

Out-of-Scope

  • Deprecation/retirement lifecycle mechanisms (superseded_by, status-driven lifecycle).
  • Introduction of a tags field or any tagging taxonomy.
  • Per-file schema_version fields.
  • Changes to capability text contract (section shape, prose rules).
  • curate command implementation (downstream consumer of schema, not part of this Epic).
  • Context budget ceiling and tiered truncation strategy (depend on curate pipeline design; parked until curate is conceptually defined).
  • Runtime selection index generation logic changes beyond schema-driven field consumption.
  • Changes to the Admission Process or Portfolio Review Cadence governance sections.

3. Constraints and Assumptions

Constraints

| ID | Constraint | Source |
| --- | --- | --- |
| C1 | Schema file must be the single authoritative source for validation in both CI and CLI. | Engineering Quality Contract: Single Authoritative Source |
| C2 | Text (.md) remains canonical for the cognitive contract; metadata (.meta.yaml) remains canonical for machine-checkable signals. No duplication. | Capability Authoring Governance: Metadata-Text Boundary |
| C3 | Consumer priority for design conflicts: Agent-Selection > Portfolio-Governance > curate-Materialization. | Interview Decision A1 |
| C4 | Migration must be Big-Bang in a single Story. | Interview Decision E11 |
| C5 | All artifacts in English. | Engineering Quality Contract: Language Rule |

Assumptions

| ID | Assumption | Risk if Wrong |
| --- | --- | --- |
| A1 | The capability portfolio remains small (~20 capabilities) at migration time. | If the portfolio grows significantly before migration, the Big-Bang approach may require additional coordination, but the approach remains viable since there are no external consumers. |
| A2 | No external consumers depend on the current .meta.yaml field structure. | If external consumers exist, migration would require a deprecation notice period. Known to be safe: ai4X is a single-repository project with no published API. |
| A3 | arc42 template structure (PR #36) is merged before documentation Stories begin. | If not merged, documentation Stories are blocked. Mitigation: PR #36 is a prerequisite dependency. |

4. Acceptance Criteria

Schema Definition

AC-01 (Ubiquitous): The system shall maintain a central schema definition file at adm/gdl/dev/schemas/capability-meta.schema.yaml that declares all valid metadata fields, their types, and their required/optional status.

AC-02 (Ubiquitous): The schema definition file shall declare exactly three required fields: id, version, owner.

AC-03 (Ubiquitous): The schema definition file shall declare exactly six optional fields: do_not_use_when (string array), distinguish_from (array of objects with properties id and boundary), requires (string array), conflicts (string array), sources (array of objects), token_estimate (integer).

AC-04 (Unwanted): If a .meta.yaml file contains a field not declared in the central schema definition, then the validation tooling shall report it as an error.
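As an illustration of AC-01 through AC-04, the field inventory can be written down as data and checked against a parsed `.meta.yaml` object. This is only a sketch: the on-disk schema format is still open (see U1), the `FieldSpec` shape and `unknownFields` helper are hypothetical, and the `string` types given for `id`, `version`, and `owner` are assumptions that the ACs do not state.

```typescript
// Assumed in-memory form of the parsed schema file; not the decided file format.
type FieldType = "string" | "integer" | "string[]" | "object[]";
interface FieldSpec { type: FieldType; required: boolean }
type Schema = Record<string, FieldSpec>;

const schema: Schema = {
  // Required fields (AC-02). Their `string` type is an assumption.
  id: { type: "string", required: true },
  version: { type: "string", required: true },
  owner: { type: "string", required: true },
  // Optional fields (AC-03).
  do_not_use_when: { type: "string[]", required: false },
  distinguish_from: { type: "object[]", required: false },
  requires: { type: "string[]", required: false },
  conflicts: { type: "string[]", required: false },
  sources: { type: "object[]", required: false },
  token_estimate: { type: "integer", required: false },
};

// AC-04: every key of a metadata object that the schema does not declare.
function unknownFields(meta: Record<string, unknown>, declared: Schema): string[] {
  return Object.keys(meta).filter(
    (key) => !Object.prototype.hasOwnProperty.call(declared, key),
  );
}
```

Under this sketch, a file that still carries `status` would be reported because `status` is not a key of the schema.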

Field Removal

AC-05 (Ubiquitous): The schema definition file shall not declare the fields approved_by, approved_at, scope, status, review_due, or migration_note.

AC-06 (Ubiquitous): No .meta.yaml file under dev/cap/ shall contain the fields approved_by, approved_at, scope, status, review_due, or migration_note after migration.

Field Retention and Reclassification

AC-07 (Ubiquitous): The owner field shall be required in the schema definition and shall represent "Semantic Authority over the cognitive contract".

AC-08 (Ubiquitous): The do_not_use_when field shall be typed as a string array in the schema definition.

AC-09 (Ubiquitous): The distinguish_from field shall be typed as an array of objects, each containing exactly the properties id (string) and boundary (string).

AC-10 (Unwanted): If a required field (id, version, owner) is absent from a .meta.yaml file, then the validation tooling shall report it as an error.
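The checks behind AC-09 and AC-10 might look roughly like the following. Both function names are hypothetical, the list of required fields is passed in rather than hard-coded (in line with AC-11), and treating a `null` value as absent is an assumption about how an empty YAML key should be handled.

```typescript
// AC-10: names of required fields that are absent from a metadata object.
// Assumption: a key with a null value (an empty YAML key) counts as absent.
function missingRequired(
  meta: Record<string, unknown>,
  required: readonly string[],
): string[] {
  return required.filter((field) => meta[field] === undefined || meta[field] === null);
}

// AC-09: an entry must be an object with exactly the string properties
// `id` and `boundary`, and nothing else.
function isDistinguishFromEntry(value: unknown): boolean {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const entry = value as Record<string, unknown>;
  const keys = Object.keys(entry).sort();
  return (
    keys.length === 2 &&
    keys[0] === "boundary" &&
    keys[1] === "id" &&
    typeof entry["id"] === "string" &&
    typeof entry["boundary"] === "string"
  );
}
```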

Validation Tooling

AC-11 (Ubiquitous): The CI validation script utl/cap/checks/capability-meta.mjs shall derive its validation rules from the central schema definition file, not from inline constants.

AC-12 (Ubiquitous): The ai4x doctor command shall validate .meta.yaml files against the central schema definition file.

AC-13 (Event-driven): When the central schema definition file is modified, both utl/cap/checks/capability-meta.mjs and ai4x doctor shall enforce the updated schema without code changes to their validation logic.

AC-14 (Unwanted): If the central schema definition file is absent or unparseable, then both validation tools shall fail with an explicit error message naming the missing or malformed file.
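A shared loader is one way both tools could satisfy AC-14. The sketch below is hypothetical: `loadSchema` and the injected `ReadFile` function are illustrative names, and JSON stands in for the schema format only because that format is undecided (U1) and `JSON.parse` needs no extra dependency.

```typescript
// The file reader is injected so the sketch stays free of runtime-specific imports.
type ReadFile = (path: string) => string;

// Loads and parses the schema, failing with a message that names the file.
function loadSchema(path: string, read: ReadFile): Record<string, unknown> {
  let text: string;
  try {
    text = read(path);
  } catch {
    throw new Error(`Schema file missing or unreadable: ${path}`);
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(text); // stand-in parser; the real format is an open decision
  } catch {
    throw new Error(`Schema file malformed: ${path}`);
  }
  // A schema must be a mapping of field names to definitions, not a scalar or list.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error(`Schema file malformed: ${path}`);
  }
  return parsed as Record<string, unknown>;
}
```

In Node.js, the reader could be `(p) => readFileSync(p, "utf8")` from `node:fs`; injecting it also keeps the loader easy to test without touching the filesystem.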

Index Generation

AC-15 (Ubiquitous): The index generation script utl/cap/checks/capability-indexes.mjs shall consume field definitions from the central schema definition file to determine which metadata fields to render.
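One possible reading of AC-15 is that the index generator takes its column list from the schema instead of naming fields itself. The sketch below assumes exactly that; `indexRow` is a hypothetical helper, and rendering non-string values with `JSON.stringify` is a placeholder choice, not the actual output format of capability-indexes.mjs.

```typescript
// Builds one index row: one cell per schema-declared field, in schema order.
// Fields absent from the metadata render as empty cells; fields that the
// schema does not declare are never rendered.
function indexRow(
  meta: Record<string, unknown>,
  schemaFields: readonly string[],
): string[] {
  return schemaFields.map((field) => {
    const value = meta[field];
    if (value === undefined || value === null) {
      return "";
    }
    return typeof value === "string" ? value : JSON.stringify(value);
  });
}
```

Because the column list is an input, removing a field from the schema removes its column without touching the rendering code.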

Migration

AC-16 (Ubiquitous): All existing .meta.yaml files under dev/cap/ shall be migrated to conform to the new schema in a single atomic commit.

AC-17 (Event-driven): When migration is complete, the CI pipeline shall pass with zero validation errors against the new schema.
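The field-removal part of the migration (AC-06, AC-16) amounts to dropping six keys from every parsed metadata object. A minimal sketch follows, with `stripRemovedFields` as a hypothetical helper name; reading and writing the YAML files, and supplying an `owner` where one is missing, are deliberately left out.

```typescript
// The six fields removed by this Epic (AC-05, AC-06).
const REMOVED_FIELDS = new Set([
  "approved_by",
  "approved_at",
  "scope",
  "status",
  "review_due",
  "migration_note",
]);

// Returns a copy of the metadata object without the removed fields.
// The input object is left unmodified.
function stripRemovedFields(meta: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(meta)) {
    if (!REMOVED_FIELDS.has(key)) {
      result[key] = value;
    }
  }
  return result;
}
```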

Governance Documentation

AC-18 (Ubiquitous): The Metadata Contract section of adm/gdl/dev/contracts/capability-authoring-governance.md shall list only the fields declared in the central schema definition file, with correct required/optional classification.

AC-19 (Ubiquitous): The doc/arc/08_concepts.md file shall contain a schema reference section documenting the capability metadata schema structure, field semantics, and ownership rules.

AC-20 (Ubiquitous): The doc/arc/09_architecture_decisions.md file shall contain an ADR documenting the Schema-First Capability Metadata Redesign decision, including context, decision, and consequences.

5. Rejected Alternatives and Rationale

| Area | Rejected Alternative | Rationale |
| --- | --- | --- |
| Schema location | Per-file schema_version field in each .meta.yaml | Violates Single Authoritative Source. Central schema file eliminates version drift and synchronization burden. |
| Lifecycle modeling | Metadata status field with explicit state machine | Duplicates information already canonical in Git branching (trunk = released, feature branch = WIP). Violates Single Authoritative Source. |
| Approval tracking | approved_by / approved_at metadata fields | Git commit metadata (author, timestamp, merge evidence) is the canonical approval record. Duplicating it in YAML creates drift. |
| Scope declaration | Explicit scope metadata field | Directory path under dev/cap/ already encodes scope unambiguously. Redundant field. |
| Migration strategy | Incremental migration with backward-compatible transition period | Portfolio is small (~20), no external consumers. Big-Bang eliminates transition complexity and dual-schema validation overhead. |
| Disambiguation | Tags/taxonomy field for capability categorization | Semantic disambiguation via Purpose + Trigger + do_not_use_when + distinguish_from is richer and more precise than flat tags. |

6. Risks and Unresolved Decisions

Risks

| ID | Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- | --- |
| R1 | Schema file format choice (YAML vs. JSON Schema) may affect tooling compatibility with ai4x doctor (TypeScript). | Medium | Low | Both formats are parseable in TypeScript. Format decision is a Story-level implementation detail, not an Epic constraint. |
| R2 | capability-indexes.mjs depends on fields being present that are now removed (e.g., status for filtering). | Medium | Medium | Change Impact analysis during Story implementation must identify all field consumers before migration. |
| R3 | Governance document update (AC-18) creates a merge dependency between the migration Story and the governance update Story. | Low | Low | Sequence Stories so governance update follows migration. |

Unresolved Decisions

| ID | Decision | Resolution Path |
| --- | --- | --- |
| U1 | Exact schema file format (JSON Schema subset in YAML, custom YAML DSL, or pure JSON Schema). | Resolve during Story-level design of the schema definition Story. Constraint: must be parseable by both Node.js (CI scripts) and the TypeScript CLI without external schema-validation libraries beyond what is already in package.json. |
| U2 | Whether sources objects require all five properties (title, organization, url, kind, accessed_at) or a subset. | Resolve during schema definition Story. Current governance requires all five, but the redesign makes sources optional, so the internal structure of each entry needs an explicit decision. |

AC Coverage Matrix

| AC | Description | Story |
| --- | --- | --- |
| AC-01 | Central schema definition file | #38 (S1) |
| AC-02 | Three required fields: id, version, owner | #38 (S1) |
| AC-03 | Six optional fields | #38 (S1) |
| AC-04 | Unknown fields → error | #39 (S2) |
| AC-05 | Removed fields not in schema | #38 (S1) |
| AC-06 | Removed fields not in any .meta.yaml after migration | #40 (S3) |
| AC-07 | owner required, Semantic Authority | #38 (S1) |
| AC-08 | do_not_use_when typed as string array | #38 (S1) |
| AC-09 | distinguish_from typed as [{id, boundary}] | #38 (S1) |
| AC-10 | Missing required field → error | #39 (S2) |
| AC-11 | CI validation derives rules from schema | #39 (S2) |
| AC-12 | ai4x doctor validates against schema | #41 (S4) |
| AC-13 | Schema change → no code change needed | #39 (S2) |
| AC-14 | Missing/unparseable schema → explicit error | #39 (S2) |
| AC-15 | Index generation consumes schema | #39 (S2) |
| AC-16 | All .meta.yaml migrated in single atomic commit | #40 (S3) |
| AC-17 | CI passes with zero errors after migration | #40 (S3) |
| AC-18 | Governance Metadata Contract updated | #42 (S5) |
| AC-19 | Schema reference in doc/arc/08_concepts.md | #42 (S5) |
| AC-20 | ADR in doc/arc/09_architecture_decisions.md | #42 (S5) |
| AC-21 | token_estimate optional field in schema | #72 (S6) |
| AC-22 | CI computes and verifies token_estimate | #72 (S6) |
| AC-23 | All .meta.yaml populated with token_estimate | #72 (S6) |
| AC-24 | token_estimate deviation → error | #72 (S6) |
| AC-25 | conflicts symmetry invariant in CI | #39 (S2) |

Coverage: 25/25 ACs covered. No gaps, no orphans.
