Analysis: Slot URI mapping coverage and alignment with Fairscape/ROCrate #130

@cmungall

Summary

This issue analyzes how D4D schema slots map to standard vocabularies, and how those mappings align (or conflict) with the Fairscape/ROCrate Pydantic models, which already have a D4D conversion layer (fairscape_models/conversion/mapping/d4d.py).

D4D Slot URI Coverage

Of ~414 domain-specific attributes across the D4D modules, approximately 28% are mapped to standard vocabulary URIs via slot_uri:

| Module | Mapped/Total | Coverage |
|---|---|---|
| Distribution | 3/3 | 100% |
| Ethics | 7/7 | 100% |
| Motivation | 8/9 | 88% |
| Base (core) | 41/65 | 63% |
| Maintenance | 7/13 | 53% |
| Uses | 4/10 | 40% |
| Variables | 9/27 | 33% |
| Composition | 20/62 | 32% |
| Preprocessing | 6/22 | 27% |
| Collection | 5/20 | 25% |
| Data Governance | 6/39 | 15% |
| Evaluation Summary | 0/122 | 0% |
| Human | 0/14 | 0% |
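
As a cross-check on the figures above, here is a minimal sketch that recomputes per-module and overall coverage from the mapped/total counts in the table. The function names are illustrative, and percentages are truncated to whole numbers on the assumption that this is how the table was produced (8/9 is listed as 88%). The rows sum to 116 mapped out of 413, consistent with the approximate totals quoted above.

```python
# (mapped, total) pairs copied from the coverage table.
COVERAGE = {
    "Distribution": (3, 3),
    "Ethics": (7, 7),
    "Motivation": (8, 9),
    "Base (core)": (41, 65),
    "Maintenance": (7, 13),
    "Uses": (4, 10),
    "Variables": (9, 27),
    "Composition": (20, 62),
    "Preprocessing": (6, 22),
    "Collection": (5, 20),
    "Data Governance": (6, 39),
    "Evaluation Summary": (0, 122),
    "Human": (0, 14),
}


def percent(mapped: int, total: int) -> int:
    """Whole-number percentage, truncated rather than rounded."""
    return mapped * 100 // total


def summarize(coverage: dict) -> tuple:
    """Return (mapped, total, percent) across all modules."""
    mapped = sum(m for m, _ in coverage.values())
    total = sum(t for _, t in coverage.values())
    return mapped, total, percent(mapped, total)


if __name__ == "__main__":
    for module, (m, t) in COVERAGE.items():
        print(f"{module}: {m}/{t} = {percent(m, t)}%")
    print("Overall: {}/{} = {}%".format(*summarize(COVERAGE)))
```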

Vocabularies used in D4D

  • dcterms: (Dublin Core) — 70 mappings (dominant)
  • schema: (Schema.org) — 33
  • dcat: (Data Catalog) — 9
  • prov:, skos:, qudt:, DUO: — 1 each

Notable gaps

  • Evaluation Summary (122 slots, 0 mapped) — the largest module with zero URI mappings
  • Human (14 slots, 0 mapped) — human subjects data with no standard vocab links
  • Data Governance (39 slots, 15%) — governance/consent terms could map to DUO/ODRL
  • No mappings to DATS, OBI, IAO, or other biomedical metadata standards

Fairscape URI Approach

Fairscape uses JSON-LD with @vocab: "https://schema.org/" as the default namespace, plus evi: ("https://w3id.org/EVI#") for extensions and rai: for responsible-AI fields. Key mappings in Fairscape:

| Fairscape field | Effective URI | Standard |
|---|---|---|
| name | schema:name | Schema.org |
| description | schema:description | Schema.org |
| @id | schema:identifier | Schema.org |
| dateCreated | schema:dateCreated | Schema.org |
| dateModified | schema:dateModified | Schema.org |
| contentUrl | schema:contentUrl | Schema.org |
| license | schema:license | Schema.org |
| author | schema:author | Schema.org |
| contentSize | schema:contentSize | Schema.org |
| evi:formats | https://w3id.org/EVI#formats | EVI (custom) |
| rai:dataUseCases | RAI namespace | Custom |
| rai:dataBiases | RAI namespace | Custom |
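
To illustrate how a default @vocab and a prefix yield the effective URIs listed above, here is a simplified sketch of term expansion. It is not a JSON-LD processor and not Fairscape code: the `expand` helper is hypothetical, only the two namespace IRIs given above are included, and the rai: prefix is left unexpanded because its IRI is not stated here. JSON-LD keywords such as @id are returned unchanged, since keyword handling is outside the scope of this sketch.

```python
# Namespace IRIs taken from the text; "rai" is deliberately absent.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "evi": "https://w3id.org/EVI#",
}


def expand(term: str, context: dict = CONTEXT) -> str:
    """Expand a field name or CURIE to a full IRI using a minimal context."""
    if term.startswith("@"):
        return term  # JSON-LD keyword, not a vocabulary term
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local  # known prefix, e.g. evi:
    if sep:
        return term  # unknown prefix: leave the CURIE as-is
    return context["@vocab"] + term  # bare term falls back to @vocab


if __name__ == "__main__":
    for field in ("name", "dateCreated", "evi:formats", "rai:dataBiases"):
        print(field, "->", expand(field))
```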

D4D ↔ Fairscape Alignment at the URI Level

Both schemas use Schema.org for core metadata. Where they overlap:

| D4D slot_uri | Fairscape JSON-LD | Alignment |
|---|---|---|
| schema:name | schema:name | ✅ Exact |
| schema:description | schema:description | ✅ Exact |
| schema:identifier | schema:identifier | ✅ Exact |
| schema:license | schema:license | ✅ Exact |
| schema:url | schema:url | ✅ Exact |
| dcterms:created | schema:dateCreated | ⚠️ Same concept, different vocab |
| dcterms:modified | schema:dateModified | ⚠️ Same concept, different vocab |
| dcterms:creator | schema:author | ⚠️ Same concept, different vocab |
| dcat:downloadURL | schema:contentUrl | ⚠️ Same concept, different vocab |

The core tension

D4D leans on Dublin Core (dcterms:) for provenance and dates, while Fairscape uses Schema.org throughout. Both are valid standard vocabularies, but the split means equivalent terms must be translated on every conversion, which is avoidable mapping friction. For example:

  • dcterms:created vs schema:dateCreated — semantically identical
  • dcterms:creator vs schema:author — nearly identical
  • dcat:downloadURL vs schema:contentUrl — same concept
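
The friction shows up as soon as two records are compared by URI. The sketch below classifies a pair of CURIEs the same way the alignment table does, using a small hand-written equivalence table. The dictionary and function names are hypothetical, and the equivalences are limited to the pairs listed above.

```python
# Cross-vocabulary pairs taken from the alignment table (D4D -> Fairscape).
EQUIVALENT = {
    "dcterms:created": "schema:dateCreated",
    "dcterms:modified": "schema:dateModified",
    "dcterms:creator": "schema:author",
    "dcat:downloadURL": "schema:contentUrl",
}


def classify(d4d_uri: str, fairscape_uri: str) -> str:
    """Label a D4D/Fairscape URI pair as exact, cross-vocab, or unaligned."""
    if d4d_uri == fairscape_uri:
        return "exact"
    if EQUIVALENT.get(d4d_uri) == fairscape_uri:
        return "same concept, different vocab"
    return "unaligned"


if __name__ == "__main__":
    print(classify("schema:name", "schema:name"))
    print(classify("dcterms:created", "schema:dateCreated"))
    print(classify("dcterms:created", "schema:author"))
```

Without a table like `EQUIVALENT`, a naive string comparison would report the four cross-vocabulary pairs as unaligned.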

Fairscape's existing D4D mapping

fairscape_models/conversion/mapping/d4d.py already contains a ROCRATE_TO_D4D_MAPPING dict that maps D4D field names to ROCrate/Fairscape source keys. However, this mapping operates at the field name level, not at the URI level. A formal URI-level alignment (e.g., via SSSOM) would:

  1. Make the mapping vocabulary-aware and auditable
  2. Capture the dcterms↔schema.org equivalences explicitly
  3. Identify D4D slots that have no Fairscape equivalent (and vice versa)
  4. Enable automated interoperability tooling
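
As a rough idea of what such an alignment could look like, the sketch below writes the four cross-vocabulary pairs as SSSOM-style TSV rows using only the standard library. The predicate choices are assumptions made for illustration (skos:exactMatch where the pair is described as the same concept, skos:closeMatch for the "nearly identical" creator/author pair), as is the single mapping_justification value. A real SSSOM file would also carry a metadata header with a CURIE map, which is omitted here.

```python
import csv
import io

# Core SSSOM columns; further columns are optional and not shown.
COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]

# (D4D slot URI, assumed predicate, Fairscape/ROCrate URI)
ROWS = [
    ("dcterms:created", "skos:exactMatch", "schema:dateCreated"),
    ("dcterms:modified", "skos:exactMatch", "schema:dateModified"),
    ("dcterms:creator", "skos:closeMatch", "schema:author"),
    ("dcat:downloadURL", "skos:exactMatch", "schema:contentUrl"),
]


def to_sssom_tsv(rows) -> str:
    """Serialize mapping triples as tab-separated SSSOM-style rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for subject, predicate, obj in rows:
        # Assumption: all rows are hand-curated.
        writer.writerow([subject, predicate, obj, "semapv:ManualMappingCuration"])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_sssom_tsv(ROWS), end="")
```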

Proposed Next Steps

  1. Add slot_uri mappings to the currently unmapped D4D modules, prioritizing Evaluation Summary (0%) and Human (0%)
  2. Consider harmonizing the dcterms vs schema.org choice — or at minimum, add exact_mappings cross-references between them
  3. Produce a formal SSSOM mapping between D4D slot URIs and Fairscape/ROCrate URIs
  4. Consider adding mappings to domain-relevant standards: DUO (consent), OBI (assays), IAO (information artifacts)
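
Steps 1 and 2 could start from a simple audit of the schema. The sketch below walks a LinkML-style schema that has already been parsed into a dict (for example from YAML) and lists attributes with no slot_uri. The class and attribute names in `SCHEMA` are invented for illustration and are not taken from the D4D schema; the second attribute shows how an exact_mappings cross-reference might sit alongside a dcterms slot_uri.

```python
# Hypothetical, hand-written fragment in the shape of a parsed LinkML schema.
SCHEMA = {
    "classes": {
        "Dataset": {
            "attributes": {
                "title": {"slot_uri": "dcterms:title"},
                "created_on": {
                    "slot_uri": "dcterms:created",
                    "exact_mappings": ["schema:dateCreated"],
                },
            }
        },
        "HumanSubjectResearch": {
            "attributes": {
                "involves_human_subjects": {},  # no slot_uri yet
            }
        },
    }
}


def unmapped_attributes(schema: dict) -> list:
    """Return sorted 'Class.attribute' names that lack a slot_uri."""
    missing = []
    for class_name, cls in schema.get("classes", {}).items():
        for attr_name, attr in (cls.get("attributes") or {}).items():
            if not (attr or {}).get("slot_uri"):
                missing.append(f"{class_name}.{attr_name}")
    return sorted(missing)


if __name__ == "__main__":
    for name in unmapped_attributes(SCHEMA):
        print("missing slot_uri:", name)
```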
