diff --git a/.claude/agents/d4d-description-reviewer.md b/.claude/agents/d4d-description-reviewer.md new file mode 100644 index 00000000..20ccba82 --- /dev/null +++ b/.claude/agents/d4d-description-reviewer.md @@ -0,0 +1,395 @@ +--- +name: d4d-description-reviewer +description: | + When to use: Semantic meaning and quality review of all descriptions in the D4D schema across the full schema and all modules. + Examples: + - "Review all descriptions in the D4D schema" + - "Check description quality and semantic accuracy across all modules" + - "Find descriptions that don't match their field semantics" + - "Audit schema descriptions for correctness and consistency" + - "Run semantic description review on D4D schema" +model: claude-opus-4-6 +color: orange +--- + +# D4D Schema Description Semantic Reviewer + +You are an expert LinkML schema reviewer specializing in **semantic accuracy and quality of field descriptions** across the D4D (Datasheets for Datasets) schema. Your job is to evaluate whether each description is **semantically correct, complete, consistent, and well-aligned** with the field's actual role in the schema. + +## What This Agent Does + +This agent performs a **deep semantic review** of every description in the D4D schema — across all 17 modules — evaluating: + +1. **Semantic accuracy**: Does the description correctly describe what the field *actually* stores? +2. **Range alignment**: Does the description match the field's declared `range` type (string, boolean, enum, class reference)? +3. **Ontology alignment**: Does the description align with the semantic intent of the `slot_uri` / `exact_mappings`? +4. **Cardinality alignment**: If `multivalued: true`, does the description reflect that multiple values are expected? +5. **Cross-module consistency**: Are the same concepts described consistently when the same field name appears in different modules? +6. **Completeness**: Is the description specific enough to be actionable, or is it generic boilerplate? 
+7. **Structural correctness**: Are there placeholder brackets, stub text, or malformed sentences? + +## Schema Files to Review + +All schema files live in `src/data_sheets_schema/schema/`: + +| File | Scope | +|------|-------| +| `data_sheets_schema.yaml` | Main aggregation schema — Dataset class attributes | +| `D4D_Base_import.yaml` | Foundational classes, shared slots, enums | +| `D4D_Motivation.yaml` | Why was the dataset created? | +| `D4D_Composition.yaml` | What does it contain? | +| `D4D_Collection.yaml` | How was data collected? | +| `D4D_Preprocessing.yaml` | What preprocessing was applied? | +| `D4D_Uses.yaml` | Intended and discouraged uses | +| `D4D_Distribution.yaml` | How is it distributed? | +| `D4D_Maintenance.yaml` | How is it maintained? | +| `D4D_Ethics.yaml` | Ethics review and data protection | +| `D4D_Human.yaml` | Human subjects research | +| `D4D_Data_Governance.yaml` | Licensing, IP, regulatory | +| `D4D_Variables.yaml` | Variable-level metadata | +| `D4D_FileCollection.yaml` | File collection metadata | +| `D4D_Evaluation_Summary.yaml` | Evaluation summary records | +| `D4D_Metadata.yaml` | Metadata-specific definitions | +| `D4D_Minimal.yaml` | Minimal required subset | + +## Review Procedure + +### Step 1: Read Each Module + +For each schema file, use the Read tool to load it and inspect every element with a `description` field: +- Module-level description +- Class descriptions +- Attribute descriptions (within classes) +- Top-level slot descriptions +- Enum descriptions +- Enum permissible value descriptions + +### Step 2: Evaluate Each Description + +Apply these semantic checks to every description found: + +#### Check A: Semantic Accuracy +Does the description correctly describe what the field actually stores? 
+ +**RED FLAGS:** +- Description says "boolean indicating X" but range is `string` +- Description says "List of Y" but `multivalued: false` (or not set) +- Description says "URL to Z" but range is a class (not `uri` or `string`) +- Description uses the wrong ontology concept (e.g., says "MIME type" for a field mapped to `dcterms:description`) +- Description is actually about a related but different concept + +**Example — BAD:** +```yaml +archival: + description: "URL to the archived version of this resource." + range: boolean # ← description says URL but range is boolean! +``` + +**Example — GOOD:** +```yaml +archival: + description: "Indicates whether an official archival version of this external resource is included in the dataset." + range: boolean +``` + +#### Check B: Range Alignment +Does the description reflect the correct data type? + +| Range Type | Description Should Mention | +|------------|---------------------------| +| `boolean` | "Indicates whether", "True if", "Flag for" | +| `integer` | Number, count, size in [units] | +| `string` | Text, name, identifier, description | +| `uri` / `uriorcurie` | URL, URI, identifier, link | +| Named enum | The controlled vocabulary / enum name | +| Named class | Object, record, information about | + +#### Check C: Ontology Alignment +Does the description match the `slot_uri` semantic intent? + +Key mappings to verify: +- `slot_uri: dcterms:title` → description should reference the title/name concept +- `slot_uri: dcat:byteSize` → description should mention file size in bytes +- `slot_uri: schema:license` → description should reference licensing +- `slot_uri: d4d:*` (custom) → description should be specific to D4D's use +- `slot_uri: schema:contactPoint` → description should reference contact/person + +Flag when description contradicts the ontology term's established meaning. 
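Checks A and B lend themselves to a mechanical first pass before any semantic judgment is applied. The following is a minimal sketch, not project code: the cue words are loosely taken from the Check B table, and `RANGE_CUES` and `range_mismatch` are hypothetical names introduced only for illustration. It uses plain substring matching, so it will produce false positives (for example "count" inside "country") and is meant only to surface candidates for review.

```python
from typing import Optional

# Hypothetical cue table loosely based on the Check B table above.
# Ranges not listed here (strings, enums, classes) need human judgment.
RANGE_CUES = {
    "boolean": ("indicates whether", "true if", "flag for"),
    "integer": ("number", "count", "size"),
    "uri": ("url", "uri", "identifier", "link"),
    "uriorcurie": ("url", "uri", "identifier", "link"),
}


def range_mismatch(description: str, declared_range: str) -> Optional[str]:
    """Return a short problem statement, or None if nothing is detected."""
    text = description.lower()
    cues = RANGE_CUES.get(declared_range)
    if cues is None:
        return None  # no cue list for this range, so nothing to compare
    if any(cue in text for cue in cues):
        return None  # description mentions a cue expected for its range
    # Report the first other range whose cues the description does match.
    for other, other_cues in RANGE_CUES.items():
        if other != declared_range and any(c in text for c in other_cues):
            return f"Description reads like '{other}' but range is '{declared_range}'"
    return f"Description has no cue words for range '{declared_range}'"
```

Applied to the BAD `archival` example above, the sketch reports a `uri`-style description on a `boolean` range; the GOOD example passes because it contains "Indicates whether".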
+ +#### Check D: Cardinality Alignment +- `multivalued: true` fields should use plural language ("List of...", "One or more...", "Multiple...") +- `multivalued: false` (or unset) fields should use singular language ("The...", "A...", "Indicates...") +- Aggregator slots in `data_sheets_schema.yaml` with `inlined_as_list: true` should say "List of [ClassName] objects..." + +#### Check E: Cross-Module Consistency +Check whether the same concept is described consistently: +- Fields named `response` in different Motivation classes (Purpose, Task, AddressingGap) should describe the same thing consistently +- `contact_person` appears in multiple classes — descriptions should be parallel +- `description` attributes across all DatasetProperty subclasses should follow a consistent pattern + +#### Check F: Completeness and Specificity +Is the description useful to someone filling out a D4D datasheet? + +**Too vague (flag):** +- "Description of the data." (what data? what kind of description?) +- "Information about this element." +- Single-word or 1-2 word descriptions + +**Appropriately specific:** +- "DOI or URL identifying the erratum or correction notice for this dataset." +- "List of identifier types removed during de-identification (e.g., 'name', 'date of birth', 'SSN', 'email address')." + +#### Check G: Structural Issues +- Placeholder brackets: `[ClassName]`, `[TODO]`, `[PLACEHOLDER]` +- Unfinished sentences (ends mid-thought) +- HTML or markdown artifacts in YAML descriptions +- Duplicate sentences within a single description + +#### Check H: Style Consistency +- **Terminal punctuation**: Every description — including short enum value phrases — must end with `.`, `?`, or `!`. Add a period to any that do not. +- **Capitalization**: All descriptions must begin with an uppercase letter. 
+- **Inline implementation notes**: Rationale for design decisions (e.g., "Note: roles are specified here rather than on Person…") belongs in a YAML comment (`#`), not in the user-facing description field. + +### Step 3: Classify Issues by Severity + +**CRITICAL** — Semantically wrong; description contradicts the field's actual behavior: +- Range mismatch (description says URL but range is boolean) +- Ontology contradiction (description directly opposes the slot_uri meaning) +- Placeholder text never replaced + +**HIGH** — Misleading or significantly incomplete: +- Description implies wrong data type +- Missing cardinality signal on a heavily-used multivalued field +- Cross-module inconsistency on a key concept +- Description so vague it provides no information + +**MEDIUM** — Correct but could mislead or confuse: +- Cardinality not mentioned but range is multivalued +- Description technically accurate but doesn't explain purpose/context +- Terminology differs from module-level concepts without reason + +**LOW** — Style and polish: +- Could benefit from a concrete example +- Minor phrasing inconsistency +- Brief but not wrong +- Missing terminal period on a description (Check H) +- Implementation rationale in description instead of YAML comment (Check H) + +### Step 4: Generate Report + +Produce a structured report with: +1. **Executive Summary** — total elements reviewed, issues by severity, modules affected +2. **Critical Issues** — each with: location (file:class.attribute), current description, what's wrong, recommended fix +3. **High Issues** — same format +4. **Medium Issues** — grouped by category (cardinality, specificity, consistency) +5. **Low Issues** — listed without individual recommendations (batch fix suggestions) +6. **Module-by-Module Summary** — table of issue counts per module +7. 
**Positive Findings** — exemplary descriptions worth preserving as style references + +## Output Format + +### Console Output (default) +Structured markdown report with all findings. + +### Save to File (if requested) +``` +reports/description_semantic_review.md ← Full report +reports/description_semantic_review.json ← Machine-readable findings +``` + +### JSON Structure +```json +{ + "metadata": { + "generated": "ISO 8601 timestamp", + "modules_reviewed": 17, + "total_elements": 772 + }, + "summary": { + "critical": 0, + "high": 0, + "medium": 0, + "low": 0 + }, + "issues": [ + { + "severity": "CRITICAL", + "category": "range_mismatch", + "file": "D4D_Composition.yaml", + "location": "Deidentification.archival", + "current_description": "URL to the archived version...", + "range": "boolean", + "slot_uri": "schema:archivedAt", + "problem": "Description implies URL but range is boolean", + "recommended_fix": "Indicates whether an official archival version is included..." + } + ], + "exemplary_descriptions": [ + { + "file": "D4D_Data_Governance.yaml", + "location": "LicenseAndUseTerms.data_use_permission", + "description": "...", + "why_exemplary": "Specific, references DUO ontology, explains both permitted and restricted uses" + } + ] +} +``` + +## Workflow + +### Single Module Review +``` +User: Review descriptions in D4D_Composition.yaml + +Agent: +1. Reads D4D_Composition.yaml +2. Applies all 8 checks (A-H) to each element +3. Reports issues with specific locations and fixes +``` + +### Full Schema Review +``` +User: Review all descriptions across the full D4D schema + +Agent: +1. Reads data_sheets_schema.yaml + all D4D_*.yaml files (17 total) +2. Applies all 8 checks (A-H) systematically +3. Cross-checks consistency across modules +4. Produces full report with executive summary +``` + +### Targeted Review +``` +User: Check only multivalued field descriptions for cardinality alignment + +Agent: +1. Reads all schema files +2. 
Filters to multivalued: true fields only +3. Applies Check D (cardinality) specifically +4. Reports fields missing plurality indicators +``` + +### Fix Verification +``` +User: Verify description fixes in data_sheets_schema.yaml + +Agent: +1. Reads data_sheets_schema.yaml +2. Evaluates all recently-added descriptions +3. Confirms fixes are semantically correct +4. Flags any remaining issues +``` + +## Common Patterns to Flag + +### Pattern 1: Boolean Described as URL or String +```yaml +# BAD +archival: + description: "Archival version URL of external resources." + range: boolean + +# GOOD +archival: + description: "Indicates whether an official archival version of this external resource is included." + range: boolean +``` + +### Pattern 2: Aggregator Slot Missing "List of" Pattern +```yaml +# BAD (data_sheets_schema.yaml Dataset attribute) +purposes: + range: Purpose + multivalued: true + # description: missing or says "Purpose information" + +# GOOD +purposes: + description: >- + Purposes for which the dataset was created. List of Purpose objects + from the Motivation module, each describing a specific creation goal. + range: Purpose + multivalued: true +``` + +### Pattern 3: Generic Catch-All Description +```yaml +# BAD +preprocessing_details: + description: "Details on preprocessing." # What kind? What format? What's expected? + range: string + multivalued: true + +# GOOD +preprocessing_details: + description: >- + Free-text description of preprocessing steps applied, including + tools used, parameters, and order of operations. + range: string + multivalued: true +``` + +### Pattern 4: Ontology Mismatch +```yaml +# BAD — slot_uri is dcterms:description but description implies identifier +field_name: + description: "Unique identifier for this element." 
+ slot_uri: dcterms:description # dcterms:description is for text, not identifiers + +# Should be slot_uri: dcterms:identifier +``` + +### Pattern 5: Cross-Module Inconsistency +```yaml +# D4D_Ethics.yaml +contact_person: + description: "Contact person for ethics questions." + +# D4D_Data_Governance.yaml +contact_person: + description: "Person responsible for licensing." # OK — different context + +# BUT — if both say fundamentally different things about the same concept +# (e.g., one says single person, other implies organization), flag it +``` + +## Quality Benchmarks + +Use these as reference points when scoring: + +**Excellent description (90-100):** Specific, accurate, mentions context, aligned with range and ontology, no ambiguity. + +**Good description (70-89):** Accurate, reasonably specific, minor omissions (e.g., missing cardinality signal). + +**Fair description (50-69):** Technically correct but vague; useful but incomplete. + +**Poor description (<50):** Misleading, wrong range semantics, placeholder text, or missing. + +## Integration with Other Tools + +The analysis scripts in `scripts/` provide automated metrics: +```bash +# Coverage and quality metrics +python scripts/description_quality_analyzer.py + +# Automated issue detection (complements semantic review) +python scripts/description_comprehensive_review.py +``` + +Use these outputs as a **starting point**, then apply semantic reasoning to identify issues the scripts cannot catch (ontology mismatches, conceptual inaccuracies, cross-module inconsistencies). 
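As an illustration of the mechanical side of this split, Check H's punctuation and capitalization rules could be pre-screened automatically, leaving the semantic checks to the reviewer. The block below is a minimal sketch under that assumption; `style_issues` and its messages are hypothetical and are not taken from the scripts listed above.

```python
from typing import List

TERMINAL_PUNCTUATION = (".", "?", "!")


def style_issues(description: str) -> List[str]:
    """Return the Check H style problems found in one description string."""
    text = description.strip()
    if not text:
        return ["Description is empty"]

    issues = []
    # Check H: every description must end with '.', '?', or '!'.
    if not text.endswith(TERMINAL_PUNCTUATION):
        issues.append("Missing terminal punctuation")
    # Check H: descriptions must begin with an uppercase letter.
    # Only alphabetic first characters are tested, so a leading digit
    # or quotation mark is not reported.
    if text[0].isalpha() and text[0].islower():
        issues.append("Does not begin with an uppercase letter")
    # Check H: design rationale belongs in a YAML comment. This is a
    # crude heuristic and only marks the text for human review.
    if "note:" in text.lower():
        issues.append("Possible inline implementation note")
    return issues
```

A call such as `style_issues("Latin-1 (Western European languages)")` reports only the missing terminal punctuation, which corresponds to a LOW-severity finding in Step 3.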
+ +## When to Use This Agent + +**Use this agent when:** +- Adding new fields to the schema and want descriptions reviewed +- After bulk description additions, to verify semantic correctness +- Before a major schema release, to audit documentation quality +- When range types change and descriptions may be stale +- To identify inconsistencies across modules in the same concept +- After slot_uri or mapping changes, to verify descriptions still align + +**Don't use this agent for:** +- Checking description *presence* only (use `scripts/description_quality_analyzer.py`) +- Validating D4D data files (use `d4d-validator`) +- Schema structural statistics (use `schema-stats`) +- Ontology term existence checks (use `d4d-validator` with linkml-term-validator) diff --git a/.claude/commands/d4d-agent.md b/.claude/commands/d4d-agent.md index bbcd866e..f23b613b 100644 --- a/.claude/commands/d4d-agent.md +++ b/.claude/commands/d4d-agent.md @@ -70,7 +70,9 @@ For each project (AI_READI, CM4AI, VOICE, CHORUS): - For each class you'll use (Purpose, Task, Creator, etc.), extract EXACT field names - **Critical**: Do NOT invent field names based on semantics -4. **Common Field Name Mistakes to AVOID**: +4. **Schema `d4d:docExample` annotations are illustrations, NOT defaults**: The schema YAML may contain `annotations: {"d4d:docExample": "..."}` on fields. These are documentation only — do NOT copy them into D4D records. All values in generated D4D YAML must come from the source documents. + +5. **Common Field Name Mistakes to AVOID**: ```yaml # ❌ WRONG - Semantic field names (not in schema) purposes: diff --git a/.github/workflows/d4d_assistant_create.md b/.github/workflows/d4d_assistant_create.md index 52d88e5e..67cd0ff2 100644 --- a/.github/workflows/d4d_assistant_create.md +++ b/.github/workflows/d4d_assistant_create.md @@ -227,6 +227,8 @@ make full-schema **Problem**: Agents often invent semantic field names that "make sense" but aren't in the schema. 
+> **Note on schema `d4d:docExample`**: The schema YAML contains `annotations: {"d4d:docExample": "..."}` on some fields. These are documentation only — do NOT copy them into D4D records. All values must come from the source documents. + ```yaml # ❌ WRONG - Invented semantic field names (validation will FAIL) purposes: diff --git a/DESCRIPTION_STYLE_GUIDE.md b/DESCRIPTION_STYLE_GUIDE.md new file mode 100644 index 00000000..5659427c --- /dev/null +++ b/DESCRIPTION_STYLE_GUIDE.md @@ -0,0 +1,362 @@ +# D4D Schema Description Style Guide + +## Purpose + +This guide establishes consistent standards for writing descriptions in D4D (Datasheets for Datasets) schema modules. Clear, consistent descriptions improve schema usability, documentation quality, and user understanding. + +## Element-Specific Patterns + +Different schema elements require different description styles based on their purpose and usage: + +| Element Type | Structure | Target Length | Examples Needed? | Completeness | +|--------------|-----------|---------------|------------------|--------------| +| **Class** | Complete sentence(s) | 15-30 words | Optional (26% currently) | Explain purpose and context | +| **Attribute** | Complete sentence | 8-15 words | Recommended (target 40%) | Describe what it represents | +| **Slot** | Complete sentence | 10-15 words | Optional (20% currently) | Define the property | +| **Enum** | Complete sentence or fragment | 8-15 words | Rare (5%) | Categorize the value set | +| **Enum Value** | Fragment acceptable | 3-10 words | Only if clarifying | Identify the specific value | + +## Quality Criteria + +### All Descriptions Should: + +✅ **Be technically accurate** - No errors in technical details or standards +✅ **Use consistent terminology** - Same terms for same concepts across modules +✅ **Be clear and concise** - Direct language, avoid wordiness +✅ **Avoid unnecessary jargon** - Use technical terms only when required +✅ **Include practical context** - Explain when/why to use 
something + +### Examples Should: + +✅ **Show common values/formats** - E.g., `"MIT", "Apache-2.0"` for licenses +✅ **Clarify ambiguous cases** - When the name alone isn't clear +✅ **Be brief** - In parentheses or following "e.g." +✅ **Not repeat the obvious** - Don't explain what's clear from the name + +--- + +## Writing Guidelines by Element Type + +### Classes + +**Structure**: Complete sentence(s) describing the class purpose +**Length**: 15-30 words +**Format**: Often starts with a question (D4D modules) or statement +**Examples**: Optional but helpful for abstract concepts + +**Good Examples:** + +```yaml +Purpose: + description: "For what purpose was the dataset created?" + +Software: + description: "A software program or library." + +Person: + description: >- + An individual human being. This class represents a person in the context + of a specific dataset. Attributes like affiliation and email represent + the person's current or most relevant contact information for this dataset. +``` + +**Pattern**: +- D4D module classes often phrase as questions to reflect datasheet format +- Base classes use declarative statements +- Multi-sentence descriptions acceptable for complex classes + +--- + +### Attributes + +**Structure**: Complete sentence ending with period +**Length**: 8-15 words +**Format**: Descriptive statement about what the attribute represents +**Examples**: Recommended (~40% target) especially for: +- Format specifications (e.g., DOI patterns, version formats) +- Enumerations with many possible values +- Fields where example clarifies usage + +**Good Examples:** + +```yaml +version: + description: The version identifier of the software (e.g., "1.0.0", "2.3.1-beta"). + +license: + description: >- + The license under which the software is distributed (e.g., "MIT", "Apache-2.0", "GPL-3.0"). + +orcid: + description: >- + ORCID (Open Researcher and Contributor ID) - a persistent digital identifier + for researchers. 
Format: 0000-0000-0000-0000 (16 characters in groups of 4; the final character is a check character, 0-9 or X). +``` + +**Poor Examples (Too Brief):** + +```yaml +# ❌ TOO BRIEF +version: + description: Software version + +# ❌ TOO BRIEF +email: + description: Email address +``` + +**Pattern**: Subject + verb + object + (optional example) + +--- + +### Slots + +**Structure**: Complete sentence ending with period +**Length**: 10-15 words +**Format**: Define what the slot represents +**Examples**: Optional, use when format/pattern needs clarification + +**Good Examples:** + +```yaml +publisher: + description: The organization or entity responsible for making the resource available. + +issued: + description: Date of formal issuance or publication of the resource. + +doi: + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification. + pattern: "10\\.\\d{4,}\\/.+" +``` + +**Pattern**: Declarative statement about the property's meaning + +--- + +### Enums + +**Structure**: Complete sentence or informative fragment +**Length**: 8-15 words +**Format**: Categorize or explain the value set +**Examples**: Rare (only when the enum name isn't self-explanatory) + +**Good Examples:** + +```yaml +FormatEnum: + description: Common file format extensions for data files and documents. + +EncodingEnum: + description: Character encoding schemes for text representation in different languages and scripts. + +BiasTypeEnum: + description: >- + Types of bias that may be present in datasets. Values are mapped to the + Artificial Intelligence Ontology (AIO) bias taxonomy from BioPortal. 
+``` + +**Pattern**: Noun phrase describing the category + context if specialized + +--- + +### Enum Values + +**Structure**: Fragment acceptable, complete sentence if needed for clarity +**Length**: 3-10 words +**Format**: Identify the specific value, add context only if helpful +**Examples**: Only use when the value needs clarification beyond its name + +**Good Examples:** + +```yaml +# Brief and clear +CSV: + description: Comma-Separated Values - tabular data format + +# Contextual information added +gzip: + description: GNU zip compression (commonly used with .gz extension) + +# Technical detail for clarity +UTF-8: + description: Unicode Transformation Format 8-bit (variable-width, most common Unicode encoding) + +# Language context for ISO standards +ISO-8859-1: + description: Latin-1 (Western European languages) +``` + +**Poor Examples:** + +```yaml +# ❌ Too verbose for enum value +CSV: + description: >- + Comma-Separated Values is a tabular data format that uses commas to + separate individual values and is widely used for data interchange + between different applications and systems. + +# ❌ Too brief when context needed +ISO-8859-1: + description: Latin-1 +``` + +**Pattern**: +- Format name + brief purpose (for formats) +- Algorithm + common usage (for compression) +- Standard + language scope (for encodings) + +--- + +## Decision Tree: When to Add Examples + +``` +Is the property name self-explanatory? +├─ YES → Example optional +└─ NO + └─ Does it accept multiple formats/values? + ├─ YES → Add 2-3 examples + └─ NO + └─ Does it have a specific pattern/format? 
+ ├─ YES → Add 1 format example + └─ NO → Example optional +``` + +**Always add examples for:** +- Pattern-based fields (DOI, ORCID, version numbers) +- Enumerations with >5 possible values (license types, keywords) +- Fields where format isn't obvious (dates, identifiers) + +**Optional examples for:** +- Boolean fields (obvious values) +- Single-format fields (URL, email) +- Self-explanatory names (title, name, description) + +--- + +## Common Patterns + +### Format Specifications + +```yaml +# Pattern: Type + format explanation + example +doi: + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx + pattern: "10\\.\\d{4,}\\/.+" +``` + +### Temporal Fields + +```yaml +# Pattern: Purpose + what timestamp represents +created_on: + description: The date and time when the resource was created. + range: datetime +``` + +### Organizational Fields + +```yaml +# Pattern: Role + context + example +publisher: + description: The organization or entity responsible for making the resource available (e.g., "University of California", "NIH", "Zenodo"). +``` + +### Technical Specifications + +```yaml +# Pattern: Technology + purpose + context +compression: + description: >- + Compression format used, if any (e.g., gzip, bzip2, zip). +``` + +--- + +## Anti-Patterns to Avoid + +### ❌ Too Brief + +```yaml +# Bad +bytes: "Size in bytes" + +# Good +bytes: "Size of the data in bytes." +``` + +### ❌ Redundant with Name + +```yaml +# Bad +name: "The name of the thing" + +# Good +name: "A human-readable name for a thing." +``` + +### ❌ Overly Technical Without Context + +```yaml +# Bad (for enum value) +ISO-2022-JP: + description: "JIS X 0208-1983 and JIS X 0201 character encoding" + +# Good +ISO-2022-JP: + description: "ISO-2022 encoding for Japanese" +``` + +### ❌ Missing Examples When Needed + +```yaml +# Bad +keywords: + description: "Keywords or tags describing the resource." 
+ +# Good +keywords: + description: "Keywords or tags describing the resource (e.g., ['genomics', 'cancer', 'RNA-seq'])." +``` + +--- + +## Tools for Validation + +### Check Description Quality + +```bash +python check_description_quality.py +``` + +Checks for: +- Missing descriptions +- Too-brief descriptions (<5 words) +- Missing periods on attributes/slots +- Recommended examples not present + +### Before Committing + +```bash +# Validate schema +make test-schema + +# Check quality +python check_description_quality.py + +# Regenerate artifacts +make regen-all + +# Run tests +make test +``` + +--- + +## Revision History + +- **2026-04-08**: Initial style guide created based on analysis of 122 recently-added descriptions +- Establishes element-specific patterns +- Defines quality criteria and common patterns +- Provides decision tree for example usage diff --git a/Makefile b/Makefile index bfd1f21a..71b1d3db 100644 --- a/Makefile +++ b/Makefile @@ -341,6 +341,7 @@ SSSOM_SCRIPT = src/alignment/generate_sssom_mapping.py SSSOM_URI_SCRIPT = src/alignment/generate_sssom_uri_mapping.py SSSOM_URI_COMPREHENSIVE_SCRIPT = src/alignment/generate_comprehensive_sssom_uri.py SSSOM_COMPREHENSIVE_SCRIPT = src/alignment/generate_comprehensive_sssom.py +SSSOM_STRUCTURAL_SCRIPT = src/alignment/generate_structural_mapping.py SKOS_ALIGNMENT = src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl ROCRATE_JSON = data/ro-crate/profiles/fairscape/full-ro-crate-metadata.json INTERFACE_MAPPING = data/ro-crate_mapping/d4d_rocrate_interface_mapping.tsv @@ -351,12 +352,13 @@ SSSOM_SUBSET = src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping_subset SSSOM_URI = src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_mapping.tsv SSSOM_URI_COMPREHENSIVE = src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_comprehensive.tsv SSSOM_COMPREHENSIVE = src/data_sheets_schema/alignment/d4d_rocrate_sssom_comprehensive.tsv +SSSOM_STRUCTURAL = 
data/mappings/d4d_rocrate_structural_mapping.sssom.tsv -.PHONY: gen-sssom gen-sssom-full gen-sssom-subset gen-sssom-uri gen-sssom-uri-comprehensive gen-sssom-comprehensive gen-sssom-all clean-sssom +.PHONY: gen-sssom gen-sssom-full gen-sssom-subset gen-sssom-uri gen-sssom-uri-comprehensive gen-sssom-comprehensive gen-sssom-structural gen-sssom-all clean-sssom gen-sssom: gen-sssom-full gen-sssom-subset ## Generate SSSOM property-level mappings (full and subset) -gen-sssom-all: gen-sssom gen-sssom-uri gen-sssom-uri-comprehensive gen-sssom-comprehensive ## Generate all SSSOM mappings (property + URI + comprehensive) +gen-sssom-all: gen-sssom gen-sssom-uri gen-sssom-uri-comprehensive gen-sssom-comprehensive gen-sssom-structural ## Generate all SSSOM mappings (property + URI + comprehensive + structural) gen-sssom-full: $(SSSOM_FULL) ## Generate full SSSOM mapping from SKOS alignment @@ -404,8 +406,15 @@ $(SSSOM_COMPREHENSIVE): $(D4D_SCHEMA_ALL) $(SKOS_ALIGNMENT) $(URI_RECOMMENDATION --recommendations $(URI_RECOMMENDATIONS) \ --output $(SSSOM_COMPREHENSIVE) +gen-sssom-structural: $(SSSOM_STRUCTURAL) ## Generate structure-aware D4D ↔ RO-Crate SSSOM mapping + +$(SSSOM_STRUCTURAL): $(D4D_SCHEMA_ALL) $(ROCRATE_JSON) $(SSSOM_STRUCTURAL_SCRIPT) + @echo "Generating structural SSSOM mapping..." 
+ $(RUN) python $(SSSOM_STRUCTURAL_SCRIPT) + @echo "✓ Structural mapping: $(SSSOM_STRUCTURAL)" + clean-sssom: ## Remove generated SSSOM files - rm -f $(SSSOM_FULL) $(SSSOM_SUBSET) $(SSSOM_URI) $(SSSOM_URI_COMPREHENSIVE) $(SSSOM_COMPREHENSIVE) + rm -f $(SSSOM_FULL) $(SSSOM_SUBSET) $(SSSOM_URI) $(SSSOM_URI_COMPREHENSIVE) $(SSSOM_COMPREHENSIVE) $(SSSOM_STRUCTURAL) ## ------------------------------------------------------------------ ## FAIRSCAPE ↔ D4D Bidirectional Conversion @@ -446,6 +455,40 @@ fairscape-to-d4d: ## Convert FAIRSCAPE RO-Crate to D4D YAML (INPUT=, OUTPUT=) --output $(OUTPUT) \ --sssom $(SSSOM_FULL) +## ------------------------------------------------------------------ +## Semantic Review Tools +## ------------------------------------------------------------------ + +.PHONY: semantic-review semantic-review-conflicts semantic-review-ranges semantic-review-data semantic-review-report + +semantic-review: semantic-review-conflicts semantic-review-ranges semantic-review-data semantic-review-report ## Run full semantic review + +semantic-review-conflicts: ## Detect slot_uri conflicts + @echo "Detecting slot_uri conflicts..." + $(RUN) python scripts/slot_uri_conflict_detector.py --output reports/slot_uri_conflicts.json || true + @echo "✓ Conflicts report: reports/slot_uri_conflicts.json" + +semantic-review-ranges: ## Check range-description alignment + @echo "Checking range-description alignment..." + $(RUN) python scripts/range_description_checker.py --output reports/range_mismatches.json || true + @echo "✓ Range mismatches report: reports/range_mismatches.json" + +semantic-review-data: ## Analyze actual data values + @echo "Analyzing actual D4D data values..." 
+ $(RUN) python scripts/data_value_analyzer.py --output reports/data_value_analysis.json + @echo "✓ Data analysis report: reports/data_value_analysis.json" + +semantic-review-report: reports/slot_uri_conflicts.json reports/range_mismatches.json reports/data_value_analysis.json ## Generate consolidated report + @echo "Generating comprehensive semantic review report..." + $(RUN) python scripts/generate_semantic_review_report.py \ + reports/slot_uri_conflicts.json \ + reports/range_mismatches.json \ + reports/data_value_analysis.json + @echo "✓ Semantic review report: reports/semantic_review_report.md" + @echo "" + @echo "Summary:" + @grep -A 5 "Executive Summary" reports/semantic_review_report.md || true + ## ------------------------------------------------------------------ clean: diff --git a/analyze_property_distribution.py b/analyze_property_distribution.py new file mode 100644 index 00000000..0e21f701 --- /dev/null +++ b/analyze_property_distribution.py @@ -0,0 +1,294 @@ +#!/usr/bin/env python3 +""" +Property Distribution Analysis for FileCollection Implementation + +Analyzes existing D4D YAML files to determine which properties should be +at Dataset level vs FileCollection level based on actual data patterns. 
+""" + +import yaml +from pathlib import Path +from collections import defaultdict +from typing import Dict, List, Any, Set +import json + + +class PropertyAnalyzer: + def __init__(self): + self.projects = ['AI_READI', 'CHORUS', 'CM4AI', 'VOICE'] + self.methods = ['curated', 'claudecode', 'claudecode_agent'] + self.property_stats = defaultdict(lambda: { + 'count': 0, + 'files': [], + 'example_values': [], + 'is_uniform': None, + 'is_list': False, + 'is_complex': False + }) + + def analyze_all_files(self): + """Analyze all D4D YAML files.""" + print("🔍 Analyzing existing D4D YAML files...\n") + + for project in self.projects: + for method in self.methods: + file_path = Path(f'data/d4d_concatenated/{method}/{project}_d4d.yaml') + if file_path.exists(): + self.analyze_file(file_path, project, method) + else: + print(f"⚠️ File not found: {file_path}") + + # Count distinct files: each file ID is recorded once per property, + # so summing the per-property lists would overstate the total + analyzed_files = {f for stats in self.property_stats.values() for f in stats['files']} + print(f"\n✅ Analyzed {len(analyzed_files)} files") + print(f"✅ Found {len(self.property_stats)} unique properties\n") + + def analyze_file(self, file_path: Path, project: str, method: str): + """Analyze a single D4D YAML file.""" + print(f"📄 Analyzing: {project} ({method})") + + # Read as UTF-8 explicitly so non-ASCII text loads the same on every platform + with open(file_path, encoding='utf-8') as f: + data = yaml.safe_load(f) + + # Flatten and analyze all properties + self._analyze_dict(data, '', f"{project}_{method}") + + def _analyze_dict(self, data: Any, prefix: str, file_id: str): + """Recursively analyze dictionary structure.""" + if not isinstance(data, dict): + return + + for key, value in data.items(): + prop_path = f"{prefix}.{key}" if prefix else key + + # Skip metadata properties + if key in ['@context', '@type', '@id']: + continue + + # Record property stats + stats = self.property_stats[prop_path] + stats['count'] += 1 + stats['files'].append(file_id) + + # Analyze value type + if value is not None: + if isinstance(value, list): + stats['is_list'] = True + if len(value) > 0: + stats['example_values'].append(f"List[{len(value)} items]") + if 
isinstance(value[0], dict): + stats['is_complex'] = True + elif isinstance(value, dict): + stats['is_complex'] = True + stats['example_values'].append("Complex object") + # Recurse into nested objects + self._analyze_dict(value, prop_path, file_id) + else: + # Store first 3 example values + if len(stats['example_values']) < 3: + stats['example_values'].append(str(value)[:100]) + + def categorize_properties(self) -> Dict[str, List[str]]: + """Categorize properties into Dataset vs FileCollection based on patterns.""" + categories = { + 'file_technical': [], # bytes, format, encoding, etc. + 'file_distribution': [], # distribution_formats, distribution_dates + 'file_collection': [], # instances, data_collectors, collection_timeframes + 'file_preprocessing': [], # preprocessing_strategies, cleaning_strategies + 'file_variables': [], # variables (field-level metadata) + 'file_content': [], # confidential_elements, content_warnings + 'dataset_motivation': [], # purposes, tasks, creators, funders + 'dataset_ethics': [], # ethical_reviews, license_and_use_terms + 'dataset_uses': [], # intended_uses, prohibited_uses + 'dataset_maintenance': [], # maintainers, updates, retention_limit + 'dataset_identity': [], # id, name, title, description, doi + 'dataset_relationships': [], # parent_datasets, related_datasets + 'uncertain': [] # Needs review + } + + # Technical metadata patterns + tech_keywords = ['bytes', 'format', 'encoding', 'hash', 'md5', 'sha256', + 'media_type', 'path', 'download', 'content', 'compression', 'dialect'] + + # Distribution patterns + dist_keywords = ['distribution_format', 'distribution_date'] + + # Collection patterns + collection_keywords = ['instance', 'data_collector', 'collection_timeframe', + 'raw_data_source', 'acquisition_method', 'collection_mechanism', + 'sampling_strategy', 'missing_data', 'direct_collection'] + + # Preprocessing patterns + preproc_keywords = ['preprocessing', 'cleaning', 'labeling', 'raw_source', + 'imputation', 
'annotation', 'machine_annotation'] + + # Variable patterns + var_keywords = ['variable'] + + # Content/sensitivity patterns + content_keywords = ['confidential', 'content_warning', 'subpopulation', + 'sensitive_element', 'is_deidentified', 'anomal', 'bias', 'limitation'] + + # Dataset-level patterns + motivation_keywords = ['purpose', 'task', 'addressing_gap', 'creator', 'funder', 'grant'] + ethics_keywords = ['ethical_review', 'data_protection', 'human_subject', + 'informed_consent', 'at_risk', 'participant', 'license', + 'regulatory', 'ip_restriction'] + uses_keywords = ['existing_use', 'use_repository', 'other_task', 'future_use', + 'discouraged_use', 'intended_use', 'prohibited_use'] + maint_keywords = ['maintainer', 'erratum', 'update', 'retention_limit', + 'version_access', 'extension_mechanism'] + identity_keywords = ['id', 'name', 'title', 'description', 'keyword', + 'doi', 'citation', 'issued', 'publisher', 'version', + 'created_on', 'last_updated', 'language', 'status'] + rel_keywords = ['parent_dataset', 'related_dataset', 'external_resource', + 'was_derived_from', 'same_as'] + + for prop, stats in self.property_stats.items(): + prop_lower = prop.lower() + + # Categorize based on keywords + if any(kw in prop_lower for kw in tech_keywords): + categories['file_technical'].append(prop) + elif any(kw in prop_lower for kw in dist_keywords): + categories['file_distribution'].append(prop) + elif any(kw in prop_lower for kw in collection_keywords): + categories['file_collection'].append(prop) + elif any(kw in prop_lower for kw in preproc_keywords): + categories['file_preprocessing'].append(prop) + elif any(kw in prop_lower for kw in var_keywords): + categories['file_variables'].append(prop) + elif any(kw in prop_lower for kw in content_keywords): + categories['file_content'].append(prop) + elif any(kw in prop_lower for kw in motivation_keywords): + categories['dataset_motivation'].append(prop) + elif any(kw in prop_lower for kw in ethics_keywords): + 
categories['dataset_ethics'].append(prop) + elif any(kw in prop_lower for kw in uses_keywords): + categories['dataset_uses'].append(prop) + elif any(kw in prop_lower for kw in maint_keywords): + categories['dataset_maintenance'].append(prop) + elif any(kw in prop_lower for kw in identity_keywords): + categories['dataset_identity'].append(prop) + elif any(kw in prop_lower for kw in rel_keywords): + categories['dataset_relationships'].append(prop) + else: + categories['uncertain'].append(prop) + + return categories + + def generate_report(self, output_path: str = '/tmp/property_distribution_analysis.md'): + """Generate comprehensive analysis report.""" + categories = self.categorize_properties() + + # Calculate totals + file_level_count = sum(len(props) for cat, props in categories.items() + if cat.startswith('file_')) + dataset_level_count = sum(len(props) for cat, props in categories.items() + if cat.startswith('dataset_')) + uncertain_count = len(categories['uncertain']) + + report = [] + report.append("# Property Distribution Analysis for FileCollection\n") + report.append("## Executive Summary\n") + report.append(f"- **Total Properties Analyzed**: {len(self.property_stats)}") + report.append(f"- **Recommended for FileCollection**: {file_level_count} properties") + report.append(f"- **Recommended for Dataset**: {dataset_level_count} properties") + report.append(f"- **Needs Review**: {uncertain_count} properties\n") + + report.append("## Analysis Methodology\n") + report.append("Analyzed existing D4D YAML files across:") + report.append(f"- Projects: {', '.join(self.projects)}") + report.append(f"- Methods: {', '.join(self.methods)}") + report.append("- Total files analyzed: " + str(len(set(f for stats in self.property_stats.values() for f in stats['files'])))) + report.append("") + + report.append("## Recommended Property Distribution\n") + + # FileCollection properties + report.append("### Properties for FileCollection\n") + report.append("These properties 
describe file-level characteristics and should move to FileCollection:\n") + + file_categories = { + 'file_technical': 'File Technical Metadata', + 'file_distribution': 'Distribution Properties', + 'file_collection': 'Collection-Specific Metadata', + 'file_preprocessing': 'Preprocessing Properties', + 'file_variables': 'Variable/Field Metadata', + 'file_content': 'Content Characteristics' + } + + for cat_key, cat_name in file_categories.items(): + if categories[cat_key]: + report.append(f"\n#### {cat_name} ({len(categories[cat_key])} properties)") + for prop in sorted(categories[cat_key]): + stats = self.property_stats[prop] + report.append(f"- `{prop}` (in {stats['count']} files)") + if stats['example_values']: + report.append(f" - Examples: {', '.join(stats['example_values'][:2])}") + + # Dataset properties + report.append("\n### Properties for Dataset\n") + report.append("These properties describe dataset-level characteristics and should remain at Dataset:\n") + + dataset_categories = { + 'dataset_motivation': 'Motivation & Purpose', + 'dataset_ethics': 'Ethics & Governance', + 'dataset_uses': 'Uses & Impact', + 'dataset_maintenance': 'Maintenance & Versioning', + 'dataset_identity': 'Identity & Metadata', + 'dataset_relationships': 'Relationships' + } + + for cat_key, cat_name in dataset_categories.items(): + if categories[cat_key]: + report.append(f"\n#### {cat_name} ({len(categories[cat_key])} properties)") + for prop in sorted(categories[cat_key]): + stats = self.property_stats[prop] + report.append(f"- `{prop}` (in {stats['count']} files)") + + # Uncertain properties + if categories['uncertain']: + report.append("\n### Properties Needing Review\n") + report.append("These properties don't clearly fit existing patterns:\n") + for prop in sorted(categories['uncertain']): + stats = self.property_stats[prop] + report.append(f"- `{prop}` (in {stats['count']} files)") + if stats['example_values']: + report.append(f" - Examples: {', 
'.join(stats['example_values'][:2])}") + + # Property details table + report.append("\n## Detailed Property Statistics\n") + report.append("| Property | Files | Type | Examples |") + report.append("|----------|-------|------|----------|") + + for prop in sorted(self.property_stats.keys()): + stats = self.property_stats[prop] + prop_type = "List" if stats['is_list'] else "Complex" if stats['is_complex'] else "Simple" + examples = ', '.join(stats['example_values'][:2]) if stats['example_values'] else 'N/A' + report.append(f"| `{prop}` | {stats['count']} | {prop_type} | {examples[:50]} |") + + # Write report + report_text = '\n'.join(report) + with open(output_path, 'w') as f: + f.write(report_text) + + print(f"\n✅ Report generated: {output_path}") + return report_text + + +def main(): + analyzer = PropertyAnalyzer() + analyzer.analyze_all_files() + report = analyzer.generate_report() + + # Print summary + print("\n" + "="*80) + print("PROPERTY DISTRIBUTION SUMMARY") + print("="*80) + lines = report.split('\n') + for line in lines[:20]: # Print first 20 lines + print(line) + print("\n... (see full report at /tmp/property_distribution_analysis.md)") + + +if __name__ == '__main__': + main() diff --git a/check_description_quality.py b/check_description_quality.py new file mode 100644 index 00000000..1457b41d --- /dev/null +++ b/check_description_quality.py @@ -0,0 +1,378 @@ +#!/usr/bin/env python3 +""" +Check quality of descriptions in D4D schema module files. + +Analyzes descriptions against style guide criteria and generates +a prioritized report of quality issues with specific recommendations. 
+""" + +import yaml +import json +import argparse +from pathlib import Path +from collections import defaultdict +from typing import Dict, List, Tuple, Optional + +# Quality thresholds from style guide +BREVITY_THRESHOLD = 5 # words +TARGET_WITH_EXAMPLES = { + 'attribute': 0.40, # 40% of attributes should have examples + 'slot': 0.20, # 20% of slots should have examples +} + +def count_words(text: str) -> int: + """Count words in a description.""" + if not text: + return 0 + return len(text.split()) + +def has_example(text: str) -> bool: + """Check if description contains an example.""" + if not text: + return False + # Only consider explicit example markers, not any parenthesis + indicators = ['e.g.', 'for example', 'such as', 'for instance'] + text_lower = text.lower() + return any(indicator in text_lower for indicator in indicators) + +def is_complete_sentence(text: str) -> bool: + """Check if text appears to be a complete sentence.""" + if not text: + return False + # Must end with period, question mark, or be very short (enum values) + return text.rstrip().endswith(('.', '?', '!')) or count_words(text) <= 5 + +def check_description_quality(description: str, element_type: str, element_name: str) -> List[str]: + """ + Check description quality against style guide criteria. 
+ + Returns list of issue codes: + - TOO_BRIEF: Less than 5 words + - MISSING_PERIOD: Attributes/slots should end with period + - CONSIDER_EXAMPLE: Attributes should have examples + """ + issues = [] + + if not description: + return ['MISSING'] # Shouldn't happen, but check anyway + + word_count = count_words(description) + + # Check brevity (applies to all types) + if word_count < BREVITY_THRESHOLD: + issues.append('TOO_BRIEF') + + # Attributes and slots should always end with a period + if element_type in ['attribute', 'slot']: + if not description.rstrip().endswith('.'): + issues.append('MISSING_PERIOD') + + # Check for examples in attributes + if element_type == 'attribute': + if not has_example(description): + # Only suggest example if description is substantial enough + if word_count >= BREVITY_THRESHOLD: + issues.append('CONSIDER_EXAMPLE') + + return issues + +def analyze_module(module_path: Path) -> Dict: + """Analyze a single module for description quality.""" + with open(module_path, 'r') as f: + data = yaml.safe_load(f) + + module_name = module_path.stem + results = { + 'module': module_name, + 'issues': [], + 'stats': defaultdict(int), + 'quality_metrics': {} + } + + # Track elements for metrics + elements_by_type = defaultdict(list) + + # Check module-level description + if data.get('description'): + elements_by_type['module'].append({ + 'name': 'module', + 'description': data['description'], + 'line': 1 + }) + + # Check classes + classes = data.get('classes', {}) + for class_name, class_def in classes.items(): + if not class_def: + continue + + desc = class_def.get('description', '') + elements_by_type['class'].append({ + 'name': class_name, + 'description': desc, + 'line': None + }) + + issues = check_description_quality(desc, 'class', class_name) + for issue in issues: + results['issues'].append({ + 'type': issue, + 'element_type': 'class', + 'element_name': class_name, + 'location': f"{module_name}::classes::{class_name}", + 'description': desc, + 
'priority': 'HIGH' if issue == 'TOO_BRIEF' else 'MEDIUM' + }) + + # Check attributes + attributes = class_def.get('attributes', {}) + for attr_name, attr_def in attributes.items(): + if not attr_def: + continue + + desc = attr_def.get('description', '') + elements_by_type['attribute'].append({ + 'name': attr_name, + 'description': desc, + 'line': None + }) + + issues = check_description_quality(desc, 'attribute', attr_name) + for issue in issues: + results['issues'].append({ + 'type': issue, + 'element_type': 'attribute', + 'element_name': f"{class_name}.{attr_name}", + 'location': f"{module_name}::classes::{class_name}::attributes::{attr_name}", + 'description': desc, + 'priority': 'HIGH' if issue == 'TOO_BRIEF' else 'LOW' + }) + + # Check slots + slots = data.get('slots', {}) + for slot_name, slot_def in slots.items(): + if not slot_def: + continue + + desc = slot_def.get('description', '') + elements_by_type['slot'].append({ + 'name': slot_name, + 'description': desc, + 'line': None + }) + + issues = check_description_quality(desc, 'slot', slot_name) + for issue in issues: + results['issues'].append({ + 'type': issue, + 'element_type': 'slot', + 'element_name': slot_name, + 'location': f"{module_name}::slots::{slot_name}", + 'description': desc, + 'priority': 'HIGH' if issue == 'TOO_BRIEF' else 'MEDIUM' + }) + + # Check enums + enums = data.get('enums', {}) + for enum_name, enum_def in enums.items(): + if not enum_def: + continue + + desc = enum_def.get('description', '') + elements_by_type['enum'].append({ + 'name': enum_name, + 'description': desc, + 'line': None + }) + + issues = check_description_quality(desc, 'enum', enum_name) + for issue in issues: + results['issues'].append({ + 'type': issue, + 'element_type': 'enum', + 'element_name': enum_name, + 'location': f"{module_name}::enums::{enum_name}", + 'description': desc, + 'priority': 'MEDIUM' + }) + + # Check permissible values + permissible_values = enum_def.get('permissible_values', {}) + for pv_name, 
pv_def in permissible_values.items(): + if not pv_def: + continue + + desc = pv_def.get('description', '') + elements_by_type['enum_value'].append({ + 'name': pv_name, + 'description': desc, + 'line': None + }) + + issues = check_description_quality(desc, 'enum_value', pv_name) + # Enum values can be brief, so only flag severe issues + for issue in issues: + if issue == 'TOO_BRIEF' and count_words(desc) <= 2: + results['issues'].append({ + 'type': issue, + 'element_type': 'enum_value', + 'element_name': f"{enum_name}.{pv_name}", + 'location': f"{module_name}::enums::{enum_name}::permissible_values::{pv_name}", + 'description': desc, + 'priority': 'LOW' + }) + + # Calculate quality metrics + for elem_type, elements in elements_by_type.items(): + total = len(elements) + if total == 0: + continue + + with_examples = sum(1 for e in elements if has_example(e['description'])) + complete_sentences = sum(1 for e in elements if is_complete_sentence(e['description'])) + brief = sum(1 for e in elements if count_words(e['description']) < BREVITY_THRESHOLD) + + results['quality_metrics'][elem_type] = { + 'total': total, + 'with_examples': with_examples, + 'with_examples_pct': (with_examples / total * 100) if total > 0 else 0, + 'complete_sentences': complete_sentences, + 'complete_sentences_pct': (complete_sentences / total * 100) if total > 0 else 0, + 'too_brief': brief, + 'too_brief_pct': (brief / total * 100) if total > 0 else 0, + } + + return results + +def generate_report(all_results: List[Dict], output_format: str = 'text') -> str: + """Generate quality report in specified format.""" + + # Aggregate statistics + total_issues = sum(len(r['issues']) for r in all_results) + issues_by_type = defaultdict(int) + issues_by_priority = defaultdict(int) + + for result in all_results: + for issue in result['issues']: + issues_by_type[issue['type']] += 1 + issues_by_priority[issue['priority']] += 1 + + if output_format == 'json': + return json.dumps({ + 'summary': { + 
'total_issues': total_issues, + 'by_type': dict(issues_by_type), + 'by_priority': dict(issues_by_priority) + }, + 'modules': all_results + }, indent=2) + + # Text format + lines = [] + lines.append("=" * 80) + lines.append("D4D SCHEMA DESCRIPTION QUALITY REPORT") + lines.append("=" * 80) + lines.append("") + + # Summary + lines.append("SUMMARY") + lines.append("-" * 80) + lines.append(f"Total issues found: {total_issues}") + lines.append("") + + lines.append("By Issue Type:") + for issue_type in sorted(issues_by_type.keys()): + count = issues_by_type[issue_type] + lines.append(f" {issue_type}: {count}") + lines.append("") + + lines.append("By Priority:") + for priority in ['HIGH', 'MEDIUM', 'LOW']: + count = issues_by_priority.get(priority, 0) + lines.append(f" {priority}: {count}") + lines.append("") + + # Quality metrics by module + lines.append("QUALITY METRICS BY MODULE") + lines.append("-" * 80) + for result in all_results: + lines.append(f"\n{result['module']}:") + for elem_type, metrics in result.get('quality_metrics', {}).items(): + lines.append(f" {elem_type}:") + lines.append(f" Total: {metrics['total']}") + lines.append(f" Complete sentences: {metrics['complete_sentences']} ({metrics['complete_sentences_pct']:.1f}%)") + lines.append(f" With examples: {metrics['with_examples']} ({metrics['with_examples_pct']:.1f}%)") + lines.append(f" Too brief: {metrics['too_brief']} ({metrics['too_brief_pct']:.1f}%)") + lines.append("") + + # Detailed issues by priority + lines.append("DETAILED ISSUES") + lines.append("-" * 80) + + for priority in ['HIGH', 'MEDIUM', 'LOW']: + priority_issues = [] + for result in all_results: + priority_issues.extend([i for i in result['issues'] if i['priority'] == priority]) + + if priority_issues: + lines.append(f"\n{priority} PRIORITY ({len(priority_issues)} issues):") + lines.append("-" * 80) + + # Group by type + by_type = defaultdict(list) + for issue in priority_issues: + by_type[issue['type']].append(issue) + + for issue_type, 
issues in sorted(by_type.items()): + lines.append(f"\n {issue_type} ({len(issues)} occurrences):") + for issue in issues[:10]: # Limit to first 10 per type + lines.append(f" • {issue['location']}") + if issue.get('description'): + desc_preview = issue['description'][:60] + if len(issue['description']) > 60: + desc_preview += "..." + lines.append(f" Current: \"{desc_preview}\"") + if len(issues) > 10: + lines.append(f" ... and {len(issues) - 10} more") + + lines.append("") + lines.append("=" * 80) + + return "\n".join(lines) + +def main(): + parser = argparse.ArgumentParser(description='Check D4D schema description quality') + parser.add_argument('--report', help='Write the report to this file (in the chosen --format) instead of stdout') + parser.add_argument('--format', choices=['text', 'json'], default='text', + help='Output format (default: text)') + args = parser.parse_args() + + schema_dir = Path('src/data_sheets_schema/schema') + d4d_modules = sorted(schema_dir.glob('D4D_*.yaml')) + + all_results = [] + for module_path in d4d_modules: + results = analyze_module(module_path) + all_results.append(results) + + # Generate report + report = generate_report(all_results, output_format=args.format) + + # Output report + if args.report: + with open(args.report, 'w') as f: + f.write(report) + print(f"Report saved to: {args.report}") + else: + print(report) + + # Exit with error code if high-priority issues found + high_priority_count = sum( + len([i for i in r['issues'] if i['priority'] == 'HIGH']) + for r in all_results + ) + + return 0 if high_priority_count == 0 else 1 + +if __name__ == '__main__': + exit(main()) diff --git a/check_missing_descriptions.py b/check_missing_descriptions.py new file mode 100644 index 00000000..825662ce --- /dev/null +++ b/check_missing_descriptions.py @@ -0,0 +1,91 @@ +#!/usr/bin/env python3 +""" +Check for missing descriptions in D4D schema module files. 
+""" + +import yaml +from pathlib import Path + +def check_module(module_path): + """Check a single module for missing descriptions.""" + with open(module_path, 'r') as f: + data = yaml.safe_load(f) + + issues = [] + module_name = module_path.stem + + # Check module-level description + if not data.get('description'): + issues.append("Module missing description") + + # Check classes + classes = data.get('classes', {}) + for class_name, class_def in classes.items(): + if class_def is None or not class_def.get('description'): + issues.append(f"Class '{class_name}' missing description") + + # Check attributes (an empty class definition loads as None) + attributes = (class_def or {}).get('attributes') or {} + for attr_name, attr_def in attributes.items(): + if attr_def is None or not attr_def.get('description'): + issues.append(f"Class '{class_name}', attribute '{attr_name}' missing description") + + # Check slots + slots = data.get('slots', {}) + for slot_name, slot_def in slots.items(): + if slot_def is None or not slot_def.get('description'): + issues.append(f"Slot '{slot_name}' missing description") + + # Check enums + enums = data.get('enums', {}) + for enum_name, enum_def in enums.items(): + if enum_def is None or not enum_def.get('description'): + issues.append(f"Enum '{enum_name}' missing description") + + # Check permissible values + if enum_def: + permissible_values = enum_def.get('permissible_values', {}) + for pv_name, pv_def in permissible_values.items(): + if pv_def is None or not pv_def.get('description'): + issues.append(f"Enum '{enum_name}', value '{pv_name}' missing description") + + return module_name, issues + +def main(): + schema_dir = Path('src/data_sheets_schema/schema') + d4d_modules = sorted(schema_dir.glob('D4D_*.yaml')) + + all_issues = {} + total_issues = 0 + + for module_path in d4d_modules: + module_name, issues = check_module(module_path) + if issues: + all_issues[module_name] = issues + total_issues += len(issues) + + # Print summary + print("=" * 80) + print("MISSING DESCRIPTIONS REPORT") + print("=" 
* 80) + print() + + if not all_issues: + print("✅ No missing descriptions found!") + else: + print(f"Found {total_issues} missing descriptions across {len(all_issues)} modules:\n") + + for module_name in sorted(all_issues.keys()): + issues = all_issues[module_name] + print(f"\n{module_name} ({len(issues)} issues):") + print("-" * 80) + for issue in issues: + print(f" • {issue}") + + print() + print("=" * 80) + return total_issues + +if __name__ == '__main__': + total = main() + exit(0 if total == 0 else 1) diff --git a/data/mappings/d4d_rocrate_sssom_comprehensive.tsv b/data/mappings/d4d_rocrate_sssom_comprehensive.tsv index da06ca69..dc041d0c 100644 --- a/data/mappings/d4d_rocrate_sssom_comprehensive.tsv +++ b/data/mappings/d4d_rocrate_sssom_comprehensive.tsv @@ -1,352 +1,331 @@ # Comprehensive SSSOM Mapping - ALL D4D Attributes # Includes mapped, recommended, novel, free text, and unmapped attributes -# Date: 2026-03-19T23:43:47.985143 -# Total attributes: 270 +# Date: 2026-04-09T10:17:24.818193 +# Total attributes: 284 # # Status breakdown: -# free_text: 54 -# mapped: 67 -# novel_d4d: 42 +# free_text: 55 +# mapped: 63 +# novel_d4d: 45 # recommended: 69 -# unmapped: 38 +# unmapped: 52 # -# d4d_module: D4D schema module containing this attribute -# -d4d_schema_path subject_id subject_label d4d_module predicate_id rocrate_json_path object_id object_label mapping_justification confidence comment author_id mapping_date subject_source object_source mapping_set_id mapping_set_version mapping_status d4d_description d4d_module -Dataset.access_details d4d:access_details Access Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Information on how to access or retrieve the raw source data. 
-" Unknown -Dataset.access_url d4d:access_url Access Url Unknown skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the raw data. Unknown -Dataset.access_urls d4d:access_urls Access Urls Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Details of the distribution channel(s) or format(s). Unknown -Dataset.acquisition_details d4d:acquisition_details Acquisition Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on how data was acquired for each instance. 
-" Unknown -Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Collection -Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['d4d:addressing_gaps'] d4d:addressing_gaps addressing_gaps semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Motivation -Dataset.affected_subsets d4d:affected_subsets Affected Subsets Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Specific subsets or features of the dataset affected by this bias. -" Unknown -Dataset.affiliation d4d:affiliation Affiliation Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The organization(s) to which the person belongs in the context of this dataset. May vary across data... 
Unknown -Dataset.affiliations d4d:affiliations Affiliations Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Organizations with which the creator or team is affiliated. Unknown -Dataset.agreement_metric d4d:agreement_metric Agreement Metric Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percentage agreem... Unknown -Dataset.analysis_method d4d:analysis_method Analysis Method Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Methodology used to assess annotation quality and resolve disagreements. -" Unknown -Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['d4d:annotation_analyses'] d4d:annotation_analyses annotation_analyses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Analysis of annotation quality and inter-annotator agreement. 
D4D_Preprocessing -Dataset.annotation_quality_details d4d:annotation_quality_details Annotation Quality Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Additional details on annotation quality assessment and findings. -" Unknown -Dataset.annotations_per_item d4d:annotations_per_item Annotations Per Item Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Number of annotations collected per data item. Multiple annotations per item enable calculation of i... Unknown -Dataset.annotator_demographics d4d:annotator_demographics Annotator Demographics Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Demographic information about annotators, if available and relevant (e.g., geographic location, lang... 
Unknown -Dataset.anomalies d4d:anomalies Anomalies D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:anomalies'] d4d:anomalies anomalies semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Composition -Dataset.anomaly_details d4d:anomaly_details Anomaly Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on errors, noise sources, or redundancies in the dataset. -" Unknown -Dataset.anonymization_method d4d:anonymization_method Anonymization Method Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text What methods were used to anonymize or de-identify participant data? Include technical details of pr... Unknown -Dataset.archival d4d:archival Archival Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indication whether official archival versions of external resources are included. -" Unknown -Dataset.assent_procedures d4d:assent_procedures Assent Procedures Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For research involving minors, what assent procedures were used? 
How was developmentally appropriate... Unknown -Dataset.bias_description d4d:bias_description Bias Description Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of how this bias manifests in the dataset, including affected populations, feat... Unknown -Dataset.bias_type d4d:bias_type Bias Type Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of bias identified, using standardized categories from the Artificial Intelligence Ontology... Unknown -Dataset.bytes d4d:bytes Bytes D4D_Base skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Size of the data in bytes. D4D_Base -Dataset.categories d4d:categories Categories Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The permitted categories or values for a categorical variable. Each entry should describe a possible... 
Unknown -Dataset.citation d4d:citation Citation D4D_Base skos:exactMatch @graph[?@type='Dataset']['citation'] schema:citation citation semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Recommended citation for this dataset in DataCite or BibTeX format. Provides a standard way to cite ... D4D_Base -Dataset.cleaning_details d4d:cleaning_details Cleaning Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on data cleaning procedures applied. -" Unknown -Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['d4d:cleaning_strategies'] d4d:cleaning_strategies cleaning_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Preprocessing -Dataset.collection_details d4d:collection_details Collection Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on direct vs. indirect collection methods and sources. 
-" Unknown -Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Collection -Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes D4D_Collection skos:exactMatch @graph[?@type='Dataset']['d4d:dataCollectionTimeframe'] d4d:dataCollectionTimeframe dataCollectionTimeframe semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Collection -Dataset.collector_details d4d:collector_details Collector Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on who collected the data and their compensation. 
-" Unknown -Dataset.comment_prefix d4d:comment_prefix Comment Prefix Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Unknown -Dataset.compensation_amount d4d:compensation_amount Compensation Amount Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_amount'] d4d:compensation_amount compensation_amount semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "What was the amount or value of compensation provided? Include currency or equivalent value. -" Unknown -Dataset.compensation_provided d4d:compensation_provided Compensation Provided Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_provided'] d4d:compensation_provided compensation_provided semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Were participants compensated for their participation? Unknown -Dataset.compensation_rationale d4d:compensation_rationale Compensation Rationale Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_rationale'] d4d:compensation_rationale compensation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What was the rationale for the compensation structure? How was the amount determined to be appropria... 
Unknown -Dataset.compensation_type d4d:compensation_type Compensation Type Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_type'] d4d:compensation_type compensation_type semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What type of compensation was provided (e.g., monetary payment, gift cards, course credit, other inc... Unknown -Dataset.compression d4d:compression Compression Unknown skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped compression format used, if any. e.g., gzip, bzip2, zip Unknown -Dataset.confidential_elements d4d:confidential_elements Confidential Elements D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements'] d4d:confidential_elements confidential_elements semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Composition -Dataset.confidential_elements_present d4d:confidential_elements_present Confidential Elements Present Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements_present'] d4d:confidential_elements_present confidential_elements_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any 
confidential data elements are present. Unknown -Dataset.confidentiality_details d4d:confidentiality_details Confidentiality Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on confidential data elements and handling procedures. -" Unknown -Dataset.confidentiality_level d4d:confidentiality_level Confidentiality Level Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:confidentiality_level'] d4d:confidentiality_level confidentiality_level semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Confidentiality classification of the dataset indicating level of access restrictions and sensitivit... 
Unknown -Dataset.conforms_to d4d:conforms_to Conforms To Unknown skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.conforms_to_class d4d:conforms_to_class Conforms To Class Unknown skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.conforms_to_schema d4d:conforms_to_schema Conforms To Schema Unknown skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.consent_details d4d:consent_details Consent Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on how consent was requested, provided, and documented. -" Unknown -Dataset.consent_documentation d4d:consent_documentation Consent Documentation Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "How is consent documented? Include references to consent forms or procedures used. 
-" Unknown -Dataset.consent_obtained d4d:consent_obtained Consent Obtained Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was informed consent obtained from all participants? Unknown -Dataset.consent_scope d4d:consent_scope Consent Scope Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "What specific uses did participants consent to? Are there limitations on data use based on consent? -" Unknown -Dataset.consent_type d4d:consent_type Consent Type Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What type of consent was obtained (e.g., written, verbal, electronic, implied through participation)... Unknown -Dataset.contact_person d4d:contact_person Contact Person Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:contact_person'] d4d:contact_person contact_person semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for questions about ethical review. Provides structured contact information including... 
Unknown -Dataset.content_warnings d4d:content_warnings Content Warnings D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings'] d4d:content_warnings content_warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Composition -Dataset.content_warnings_present d4d:content_warnings_present Content Warnings Present Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings_present'] d4d:content_warnings_present content_warnings_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any content warnings are needed. Unknown -Dataset.contribution_url d4d:contribution_url Contribution Url Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL for contribution guidelines or process. Unknown -Dataset.counts d4d:counts Counts Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "How many instances are there in total (of each type, if appropriate)? 
-" Unknown -Dataset.created_by d4d:created_by Created By Unknown skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.created_on d4d:created_on Created On Unknown skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.creators d4d:creators Creators D4D_Motivation skos:closeMatch @graph[?@type='Dataset']['author'] schema:author author semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Motivation -Dataset.credit_roles d4d:credit_roles Credit Roles Unknown skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or cr... Unknown -Dataset.data_annotation_platform d4d:data_annotation_platform Data Annotation Platform Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Platform or tool used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom an... 
Unknown -Dataset.data_annotation_protocol d4d:data_annotation_protocol Data Annotation Protocol Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:data_annotation_protocol'] d4d:data_annotation_protocol data_annotation_protocol semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Annotation methodology, tasks, and protocols followed during labeling. Includes annotation guideline... Unknown -Dataset.data_collectors d4d:data_collectors Data Collectors D4D_Collection skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Collection -Dataset.data_linkage d4d:data_linkage Data Linkage Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Can this dataset be linked to other datasets in ways that might compromise participant privacy? 
-" Unknown -Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts D4D_Ethics skos:exactMatch @graph[?@type='Dataset']['d4d:data_protection_impacts'] d4d:data_protection_impacts data_protection_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Ethics -Dataset.data_substrate d4d:data_substrate Data Substrate Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Type of data (e.g., raw text, images) from Bridge2AI standards. -" Unknown -Dataset.data_topic d4d:data_topic Data Topic Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "General topic of each instance (e.g., from Bridge2AI standards). -" Unknown -Dataset.data_type d4d:data_type Data Type Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The data type of the variable (e.g., integer, float, string, boolean, date, categorical). Use standa... Unknown -Dataset.data_use_permission d4d:data_use_permission Data Use Permission Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Structured data use permissions using the Data Use Ontology (DUO). 
Specifies permitted uses (e.g., g... Unknown -Dataset.deidentification_details d4d:deidentification_details Deidentification Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on de-identification procedures and residual risks. -" Unknown -Dataset.delimiter d4d:delimiter Delimiter Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Unknown -Dataset.derivation d4d:derivation Derivation Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of how this variable was derived or calculated from other variables, if applicable. Unknown -Dataset.description d4d:description Description Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text A human-readable description for a thing. Unknown -Dataset.dialect d4d:dialect Dialect D4D_Base skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Specific format dialect or variation (e.g., CSV dialect, JSON-LD profile). 
D4D_Base -Dataset.disagreement_patterns d4d:disagreement_patterns Disagreement Patterns Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Systematic patterns in annotator disagreements (e.g., by demographic group, annotation difficulty, t... Unknown -Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Uses -Dataset.discouragement_details d4d:discouragement_details Discouragement Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on tasks for which the dataset should not be used. 
-" Unknown -Dataset.distribution d4d:distribution Distribution Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Unknown -Dataset.distribution_dates d4d:distribution_dates Distribution Dates D4D_Distribution skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Distribution -Dataset.distribution_formats d4d:distribution_formats Distribution Formats D4D_Distribution skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Distribution -Dataset.doi d4d:doi Doi Unknown skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped digital object identifier Unknown -Dataset.double_quote d4d:double_quote Double Quote Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Unknown -Dataset.download_url d4d:download_url Download Url Unknown skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment 
https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped URL from which the data can be downloaded. This is not the same as the landing page, which is a page... Unknown -Dataset.email d4d:email Email Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The email address of the person. Represents current/preferred contact information in the context of ... Unknown -Dataset.encoding d4d:encoding Encoding D4D_Base skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped the character encoding of the data D4D_Base -Dataset.end_date d4d:end_date End Date Unknown skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended End date of data collection Unknown -Dataset.errata d4d:errata Errata D4D_Maintenance skos:exactMatch @graph[?@type='Dataset']['d4d:errata'] d4d:errata errata semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Maintenance -Dataset.erratum_details d4d:erratum_details Erratum Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed 
https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on any errata or corrections to the dataset. -" Unknown -Dataset.erratum_url d4d:erratum_url Erratum Url Unknown skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the erratum. Unknown -Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews D4D_Ethics skos:exactMatch @graph[?@type='Dataset']['d4d:ethical_reviews'] d4d:ethical_reviews ethical_reviews semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Ethics -Dataset.ethics_review_board d4d:ethics_review_board Ethics Review Board Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "What ethics review board(s) reviewed this research? Include institution names and approval details. -" Unknown -Dataset.examples d4d:examples Examples Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of examples of known/previous uses of the dataset. 
Unknown -Dataset.existing_uses d4d:existing_uses Existing Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Uses -Dataset.extension_details d4d:extension_details Extension Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on extension mechanisms, contribution validation, and communication. -" Unknown -Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism D4D_Maintenance skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Maintenance -Dataset.external_resources d4d:external_resources External Resources D4D_Base skos:closeMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Links or identifiers for external resources. Can be used either as a list of ExternalResource object... 
D4D_Base -Dataset.format d4d:format Format D4D_Base skos:exactMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The file format, physical medium, or dimensions of a resource. This should be a file extension or MI... D4D_Base -Dataset.frequency d4d:frequency Frequency Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped How often updates are planned (e.g., quarterly, annually). Unknown -Dataset.funders d4d:funders Funders D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Motivation -Dataset.future_guarantees d4d:future_guarantees Future Guarantees Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of any commitments that external resources will remain available and stable over time. 
-" Unknown -Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts D4D_Uses skos:exactMatch @graph[?@type='Dataset']['d4d:future_use_impacts'] d4d:future_use_impacts future_use_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Uses -Dataset.governance_committee_contact d4d:governance_committee_contact Governance Committee Contact Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:governance_committee_contact'] d4d:governance_committee_contact governance_committee_contact semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for data governance committee. This person can answer questions about data governance... Unknown -Dataset.grant_number d4d:grant_number Grant Number Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The alphanumeric identifier for the grant. Unknown -Dataset.grantor d4d:grantor Grantor Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Name/identifier of the organization providing monetary or resource support. 
Unknown -Dataset.grants d4d:grants Grants Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Grant mechanisms supporting dataset creation. Multiple grants may fund a single dataset. Unknown -Dataset.guardian_consent d4d:guardian_consent Guardian Consent Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For participants unable to provide their own consent, how was guardian or surrogate consent obtained... Unknown -Dataset.handling_strategy d4d:handling_strategy Handling Strategy Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:handling_strategy'] d4d:handling_strategy handling_strategy semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Strategy used to handle missing data (e.g., deletion, imputation, flagging, multiple imputation). 
-" Unknown -Dataset.hash d4d:hash Hash D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped hash of the data D4D_Base -Dataset.header d4d:header Header Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Unknown -Dataset.hipaa_compliant d4d:hipaa_compliant Hipaa Compliant Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates compliance with the Health Insurance Portability and Accountability Act (HIPAA). HIPAA app... Unknown -Dataset.human_subject_research d4d:human_subject_research Human Subject Research D4D_Human skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about whether dataset involves human subjects research, including IRB approval, ethics r... D4D_Human -Dataset.id d4d:id Id Unknown skos:exactMatch @graph[?@type='Dataset']['ID'] rdf:ID ID semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ unknown d4d-rocrate-comprehensive-v1 1.0 mapped A unique identifier for a thing. 
Unknown -Dataset.identifiable_elements_present d4d:identifiable_elements_present Identifiable Elements Present Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether data subjects can be identified. Unknown -Dataset.identification d4d:identification Identification Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Unknown -Dataset.identifiers_removed d4d:identifiers_removed Identifiers Removed Unknown skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended List of identifier types removed during de-identification. Unknown -Dataset.impact_details d4d:impact_details Impact Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on potential impacts, risks, and mitigation strategies. 
-" Unknown -Dataset.imputation_method d4d:imputation_method Imputation Method Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_method'] d4d:imputation_method imputation_method semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Specific imputation technique used (mean, median, mode, forward fill, backward fill, interpolation, ... Unknown -Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_protocols'] d4d:imputation_protocols imputation_protocols semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data imputation methodology and techniques. D4D_Preprocessing -Dataset.imputation_rationale d4d:imputation_rationale Imputation Rationale Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_rationale'] d4d:imputation_rationale imputation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Justification for the imputation approach chosen, including assumptions made about missing data mech... 
Unknown -Dataset.imputation_validation d4d:imputation_validation Imputation Validation Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_validation'] d4d:imputation_validation imputation_validation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Methods used to validate imputation quality (if any). -" Unknown -Dataset.imputed_fields d4d:imputed_fields Imputed Fields Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:imputed_fields'] d4d:imputed_fields imputed_fields semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Fields or columns where imputation was applied. -" Unknown -Dataset.informed_consent d4d:informed_consent Informed Consent D4D_Human semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Details about informed consent procedures, including consent type, documentation, and withdrawal mec... D4D_Human -Dataset.instance_type d4d:instance_type Instance Type Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Multiple types of instances? (e.g., movies, users, and ratings). 
-" Unknown -Dataset.instances d4d:instances Instances D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Composition -Dataset.intended_uses d4d:intended_uses Intended Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['d4d:intended_uses'] d4d:intended_uses intended_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicit intended and recommended uses for this dataset. Complements future_use_impacts by focusing ... D4D_Uses -Dataset.inter_annotator_agreement d4d:inter_annotator_agreement Inter Annotator Agreement Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Measure of agreement between annotators (e.g., Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, p... Unknown -Dataset.inter_annotator_agreement_score d4d:inter_annotator_agreement_score Inter Annotator Agreement Score Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Measured agreement between annotators (e.g., Cohen's kappa value, Fleiss' kappa, Krippendorff's alph... 
Unknown -Dataset.involves_human_subjects d4d:involves_human_subjects Involves Human Subjects Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does this dataset involve human subjects research? Unknown -Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Data_Governance -Dataset.irb_approval d4d:irb_approval Irb Approval Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was Institutional Review Board (IRB) approval obtained? Include approval number and institution if a... Unknown -Dataset.is_data_split d4d:is_data_split Is Data Split D4D_Base semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a split of the larger dataset, e.g., is it a set for model training, testing, or vali... 
D4D_Base -Dataset.is_deidentified d4d:is_deidentified Is Deidentified D4D_Base skos:exactMatch @graph[?@type='Dataset']['d4d:is_deidentified'] d4d:is_deidentified is_deidentified semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Base -Dataset.is_direct d4d:is_direct Is Direct Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether collection was direct from individuals Unknown -Dataset.is_identifier d4d:is_identifier Is Identifier Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable serves as a unique identifier or key for records in the dataset. Unknown -Dataset.is_random d4d:is_random Is Random Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether the sample is random. Unknown -Dataset.is_representative d4d:is_representative Is Representative Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indicates whether the sample is representative of the larger set. 
-" Unknown -Dataset.is_sample d4d:is_sample Is Sample Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether it is a sample of a larger set. Unknown -Dataset.is_sensitive d4d:is_sensitive Is Sensitive Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable contains sensitive information (e.g., personal data, protected healt... Unknown -Dataset.is_shared d4d:is_shared Is Shared Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Boolean indicating whether the dataset is distributed to parties external to the dataset-creating en... Unknown -Dataset.is_subpopulation d4d:is_subpopulation Is Subpopulation D4D_Base semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a subpopulation of the larger dataset, e.g., is it a set of data for a specific demog... 
D4D_Base -Dataset.is_tabular d4d:is_tabular Is Tabular D4D_Base skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Base -Dataset.issued d4d:issued Issued Unknown skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.keywords d4d:keywords Keywords Unknown skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.known_biases d4d:known_biases Known Biases D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:known_biases'] d4d:known_biases known_biases semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known biases present in the dataset that may affect fairness, representativeness, or model performan... 
D4D_Composition -Dataset.known_limitations d4d:known_limitations Known Limitations D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:known_limitations'] d4d:known_limitations known_limitations semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known limitations of the dataset that may affect its use or interpretation. Distinct from biases (sy... D4D_Composition -Dataset.label d4d:label Label Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Is there a label or target associated with each instance? -" Unknown -Dataset.label_description d4d:label_description Label Description Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "If labeled, what pattern or format do labels follow? -" Unknown -Dataset.labeling_details d4d:labeling_details Labeling Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on labeling/annotation procedures and quality metrics. 
-" Unknown -Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['d4d:labeling_strategies'] d4d:labeling_strategies labeling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Preprocessing -Dataset.language d4d:language Language Unknown skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped language in which the information is expressed Unknown -Dataset.last_updated_on d4d:last_updated_on Last Updated On Unknown skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.latest_version_doi d4d:latest_version_doi Latest Version Doi Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended DOI or URL of the latest dataset version. 
Unknown -Dataset.license d4d:license License Unknown skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Data_Governance -Dataset.license_terms d4d:license_terms License Terms Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of the dataset's license and terms of use (including links, costs, or usage constraints)... Unknown -Dataset.limitation_description d4d:limitation_description Limitation Description Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Detailed description of the limitation and its implications. 
-" Unknown -Dataset.limitation_type d4d:limitation_type Limitation Type Unknown skos:closeMatch @graph[?@type='Dataset']['temporalCoverage'] schema:temporalCoverage temporalCoverage semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Category of limitation (e.g., scope, coverage, temporal, methodological). -" Unknown -Dataset.machine_annotation_tools d4d:machine_annotation_tools Machine Annotation Tools D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:machineAnnotationTools'] rai:machineAnnotationTools machineAnnotationTools semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Automated annotation tools used in dataset creation. D4D_Preprocessing -Dataset.maintainer_details d4d:maintainer_details Maintainer Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on who will support, host, or maintain the dataset. 
-" Unknown -Dataset.maintainers d4d:maintainers Maintainers D4D_Maintenance skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Maintenance -Dataset.maximum_value d4d:maximum_value Maximum Value Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The maximum value that the variable can take. Applicable to numeric variables. Unknown -Dataset.md5 d4d:md5 Md5 D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped md5 hash of the data D4D_Base -Dataset.measurement_technique d4d:measurement_technique Measurement Technique Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "The technique or method used to measure this variable. Examples: ""mass spectrometry"", ""self-report s..." Unknown -Dataset.mechanism_details d4d:mechanism_details Mechanism Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on mechanisms or procedures used to collect the data. 
-" Unknown -Dataset.media_type d4d:media_type Media Type D4D_Base skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The media type of the data. This should be a MIME type. D4D_Base -Dataset.method d4d:method Method Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Method used for de-identification (e.g., HIPAA Safe Harbor). Unknown -Dataset.minimum_value d4d:minimum_value Minimum Value Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The minimum value that the variable can take. Applicable to numeric variables. Unknown -Dataset.missing d4d:missing Missing Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the missing data fields or elements. -" Unknown -Dataset.missing_data_causes d4d:missing_data_causes Missing Data Causes Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Known or suspected causes of missing data (e.g., sensor failures, participant dropout, privacy const... 
Unknown -Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation D4D_Collection semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Documentation of missing data patterns and handling strategies. D4D_Collection -Dataset.missing_data_patterns d4d:missing_data_patterns Missing Data Patterns Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of patterns in missing data (e.g., missing completely at random, missing at random, miss... Unknown -Dataset.missing_information d4d:missing_information Missing Information Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "References to one or more MissingInfo objects describing missing data. -" Unknown -Dataset.missing_value_code d4d:missing_value_code Missing Value Code Unknown skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Code(s) used to represent missing values for this variable. Examples: ""NA"", ""-999"", ""null"", """". Mult..." 
Unknown -Dataset.mitigation_strategy d4d:mitigation_strategy Mitigation Strategy Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Steps taken or recommended to mitigate this bias. -" Unknown -Dataset.modified_by d4d:modified_by Modified By Unknown skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.name d4d:name Name Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A human-readable name for a thing. Unknown -Dataset.notification_details d4d:notification_details Notification Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on how individuals were notified about data collection. -" Unknown -Dataset.orcid d4d:orcid Orcid Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format... 
Unknown
-Dataset.other_compliance d4d:other_compliance Other Compliance Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Other regulatory compliance frameworks applicable to this dataset (e.g., CCPA, PIPEDA, industry-spec... Unknown
-Dataset.other_tasks d4d:other_tasks Other Tasks D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Uses
-Dataset.page d4d:page Page Unknown skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown
-Dataset.parent_datasets d4d:parent_datasets Parent Datasets D4D_Base skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Parent datasets that this dataset is part of or derived from. Enables hierarchical dataset compositi...
D4D_Base
-Dataset.participant_compensation d4d:participant_compensation Participant Compensation D4D_Human skos:exactMatch @graph[?@type='Dataset']['d4d:participant_compensation'] d4d:participant_compensation participant_compensation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Compensation or incentives provided to human research participants. D4D_Human
-Dataset.participant_privacy d4d:participant_privacy Participant Privacy D4D_Human skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Privacy protections and anonymization procedures for human research participants, including reidenti... D4D_Human
-Dataset.path d4d:path Path D4D_Base skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Base
-Dataset.precision d4d:precision Precision Unknown skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The precision or number of decimal places for numeric variables.
Unknown
-Dataset.preprocessing_details d4d:preprocessing_details Preprocessing Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on preprocessing steps applied to the data.
-" Unknown
-Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['d4d:preprocessing_strategies'] d4d:preprocessing_strategies preprocessing_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Preprocessing
-Dataset.principal_investigator d4d:principal_investigator Principal Investigator Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A key individual (Principal Investigator) responsible for or overseeing dataset creation. Unknown
-Dataset.privacy_techniques d4d:privacy_techniques Privacy Techniques Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What privacy-preserving techniques were applied (e.g., differential privacy, k-anonymity, data maski...
Unknown
-Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['d4d:prohibited_uses'] d4d:prohibited_uses prohibited_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicitly prohibited or forbidden uses for this dataset. Stronger than discouraged_uses - these are... D4D_Uses
-Dataset.prohibition_reason d4d:prohibition_reason Prohibition Reason Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:prohibition_reason'] d4d:prohibition_reason prohibition_reason semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Reason why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal c...
Unknown
-Dataset.publisher d4d:publisher Publisher Unknown skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown
-Dataset.purposes d4d:purposes Purposes D4D_Motivation skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Motivation
-Dataset.quality_notes d4d:quality_notes Quality Notes Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Notes about data quality, reliability, or known issues specific to this variable. Unknown
-Dataset.quote_char d4d:quote_char Quote Char Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Unknown
-Dataset.raw_data_details d4d:raw_data_details Raw Data Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on raw data availability and access procedures.
-" Unknown
-Dataset.raw_data_format d4d:raw_data_format Raw Data Format Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Format of the raw data before any preprocessing.
-" Unknown
-Dataset.raw_data_sources d4d:raw_data_sources Raw Data Sources D4D_Collection semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of raw data sources before preprocessing. D4D_Collection
-Dataset.raw_sources d4d:raw_sources Raw Sources D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Preprocessing
-Dataset.recommended_mitigation d4d:recommended_mitigation Recommended Mitigation Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Recommended approaches for users to address this limitation.
-" Unknown
-Dataset.regulatory_compliance d4d:regulatory_compliance Regulatory Compliance Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What regulatory frameworks govern this human subjects research (e.g., 45 CFR 46, HIPAA)?
-" Unknown
-Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Data_Governance
-Dataset.reidentification_risk d4d:reidentification_risk Reidentification Risk Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What is the assessed risk of re-identification? What measures were taken to minimize this risk?
-" Unknown
-Dataset.related_datasets d4d:related_datasets Related Datasets D4D_Base skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Related datasets with typed relationships (e.g., supplements, derives from, is version of). Use Data... D4D_Base
-Dataset.relationship_details d4d:relationship_details Relationship Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on relationships between instances (e.g., graph edges, ratings).
-" Unknown
-Dataset.relationship_type d4d:relationship_type Relationship Type Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of relationship (e.g., derives_from, supplements, is_version_of). Uses DatasetRelationshipT... Unknown
-Dataset.release_dates d4d:release_dates Release Dates Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled relea... Unknown
-Dataset.repository_details d4d:repository_details Repository Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on the repository of known dataset uses.
-" Unknown
-Dataset.repository_url d4d:repository_url Repository Url Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL to a repository of known dataset uses.
Unknown
-Dataset.representative_verification d4d:representative_verification Representative Verification Unknown skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Explanation of how representativeness was validated or verified.
-" Unknown
-Dataset.resources d4d:resources Resources D4D_Base skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Sub-resources or component datasets. Used in DatasetCollection to contain Dataset objects, and in Da... D4D_Base
-Dataset.response d4d:response Response Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Short explanation describing the primary purpose of creating the dataset. Unknown
-Dataset.restrictions d4d:restrictions Restrictions Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of any restrictions or fees associated with external resources.
-" Unknown
-Dataset.retention_details d4d:retention_details Retention Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on data retention limits and enforcement procedures.
-" Unknown
-Dataset.retention_limit d4d:retention_limit Retention Limit D4D_Maintenance skos:exactMatch @graph[?@type='Dataset']['d4d:retention_limit'] d4d:retention_limit retention_limit semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Maintenance
-Dataset.retention_period d4d:retention_period Retention Period Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:retention_period'] d4d:retention_period retention_period semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Time period for data retention. Unknown
-Dataset.review_details d4d:review_details Review Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on ethical review processes, outcomes, and supporting documentation.
-" Unknown
-Dataset.reviewing_organization d4d:reviewing_organization Reviewing Organization Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:reviewing_organization'] d4d:reviewing_organization reviewing_organization semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Organization that conducted the ethical review (e.g., Institutional Review Board, Ethics Committee, ... Unknown
-Dataset.revocation_details d4d:revocation_details Revocation Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on consent revocation mechanisms and procedures.
-" Unknown
-Dataset.role d4d:role Role Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Role of the data collector (e.g., researcher, crowdworker) Unknown
-Dataset.sampling_strategies d4d:sampling_strategies Sampling Strategies D4D_Collection skos:exactMatch @graph[?@type='Dataset']['d4d:sampling_strategies'] d4d:sampling_strategies sampling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d D4D_Collection
-Dataset.scope_impact d4d:scope_impact Scope Impact Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "How this limitation affects the scope or applicability of the dataset.
-" Unknown
-Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements D4D_Composition skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Composition
-Dataset.sensitive_elements_present d4d:sensitive_elements_present Sensitive Elements Present Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether sensitive data elements are present. Unknown
-Dataset.sensitivity_details d4d:sensitivity_details Sensitivity Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on sensitive data elements present and handling procedures.
-" Unknown
-Dataset.sha256 d4d:sha256 Sha256 D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped sha256 hash of the data D4D_Base
-Dataset.source_data d4d:source_data Source Data Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the larger set from which the sample was drawn, if any.
-" Unknown
-Dataset.source_description d4d:source_description Source Description Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collec... Unknown
-Dataset.source_type d4d:source_type Source Type Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Type of raw source (sensor, database, user input, web scraping, etc.).
-" Unknown
-Dataset.special_populations d4d:special_populations Special Populations Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does the research involve any special populations that require additional protections (e.g., minors,...
Unknown
-Dataset.special_protections d4d:special_protections Special Protections Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:special_protections'] d4d:special_protections special_protections semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What additional protections were implemented for vulnerable populations? Include safeguards, modifie... Unknown
-Dataset.split_details d4d:split_details Split Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on recommended data splits and their rationale.
-" Unknown
-Dataset.start_date d4d:start_date Start Date Unknown skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Start date of data collection Unknown
-Dataset.status d4d:status Status Unknown skos:exactMatch @graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown
-Dataset.strategies d4d:strategies Strategies Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the sampling strategy (deterministic, probabilistic, etc.).
-" Unknown
-Dataset.subpopulation_elements_present d4d:subpopulation_elements_present Subpopulation Elements Present Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether any subpopulations are explicitly identified. Unknown
-Dataset.subpopulations d4d:subpopulations Subpopulations D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Composition
-Dataset.subsets d4d:subsets Subsets D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Composition
-Dataset.target_dataset d4d:target_dataset Target Dataset Unknown skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object...
Unknown
-Dataset.task_details d4d:task_details Task Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on other potential tasks the dataset could be used for.
-" Unknown
-Dataset.tasks d4d:tasks Tasks D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Motivation
-Dataset.timeframe_details d4d:timeframe_details Timeframe Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on the collection timeframe and relationship to data creation dates.
-" Unknown
-Dataset.title d4d:title Title Unknown skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped the official title of the element Unknown
-Dataset.tool_accuracy d4d:tool_accuracy Tool Accuracy Unknown skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Known accuracy or performance metrics for the automated tools (if available). Include metric name an...
Unknown
-Dataset.tool_descriptions d4d:tool_descriptions Tool Descriptions Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Descriptions of what each tool does in the annotation process and what types of annotations it produ... Unknown
-Dataset.tools d4d:tools Tools Unknown skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "List of automated annotation tools with their versions. Format each entry as ""ToolName version"" (e.g..." Unknown
-Dataset.unit d4d:unit Unit Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The unit of measurement for the variable, preferably using QUDT units (http://qudt.org/vocab/unit/).... Unknown
-Dataset.update_details d4d:update_details Update Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on update plans, responsible parties, and communication methods.
-" Unknown
-Dataset.updates d4d:updates Updates D4D_Maintenance skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Maintenance
-Dataset.url d4d:url Url Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Unknown
-Dataset.usage_notes d4d:usage_notes Usage Notes Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Notes or caveats about using the dataset for intended purposes. Unknown
-Dataset.use_category d4d:use_category Use Category Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Category of intended use (e.g., research, clinical, educational, commercial, policy).
Unknown
-Dataset.use_repository d4d:use_repository Use Repository D4D_Uses skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Uses
-Dataset.used_software d4d:used_software Used Software Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What software was used as part of this dataset property? Unknown
-Dataset.variable_name d4d:variable_name Variable Name Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The name or identifier of the variable as it appears in the data files. Unknown
-Dataset.variables d4d:variables Variables D4D_Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Metadata describing individual variables, fields, or columns in the dataset.
D4D_Variables -Dataset.version d4d:version Version Unknown skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.version_access d4d:version_access Version Access D4D_Maintenance skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped D4D_Maintenance -Dataset.version_details d4d:version_details Version Details Unknown semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on version support policies and obsolescence communication. -" Unknown -Dataset.versions_available d4d:versions_available Versions Available Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of available versions with metadata. 
Unknown -Dataset.vulnerable_groups_included d4d:vulnerable_groups_included Vulnerable Groups Included Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:vulnerable_groups_included'] d4d:vulnerable_groups_included vulnerable_groups_included semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Are any vulnerable populations included (e.g., children, pregnant women, prisoners, cognitively impa... Unknown -Dataset.vulnerable_populations d4d:vulnerable_populations Vulnerable Populations Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:vulnerable_populations'] d4d:vulnerable_populations vulnerable_populations semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Information about protections for vulnerable populations (e.g., minors, pregnant women, prisoners) i... 
Unknown -Dataset.warnings d4d:warnings Warnings Unknown skos:exactMatch @graph[?@type='Dataset']['d4d:warnings'] d4d:warnings warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Unknown -Dataset.was_derived_from d4d:was_derived_from Was Derived From Unknown skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Unknown -Dataset.was_directly_observed d4d:was_directly_observed Was Directly Observed Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was directly observed Unknown -Dataset.was_inferred_derived d4d:was_inferred_derived Was Inferred Derived Unknown skos:closeMatch @graph[?@type='Dataset']['wasDerivedFrom'] prov:wasDerivedFrom wasDerivedFrom semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://www.w3.org/ns/prov# d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was inferred or derived from other data Unknown -Dataset.was_reported_by_subjects d4d:was_reported_by_subjects Was Reported By Subjects Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was reported directly by the 
subjects themselves Unknown -Dataset.was_validated_verified d4d:was_validated_verified Was Validated Verified Unknown skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was validated or verified in any way Unknown -Dataset.why_missing d4d:why_missing Why Missing Unknown semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of why each piece of data is missing. -" Unknown -Dataset.why_not_representative d4d:why_not_representative Why Not Representative Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Explanation of why the sample is not representative, if applicable. -" Unknown -Dataset.withdrawal_mechanism d4d:withdrawal_mechanism Withdrawal Mechanism Unknown semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended How can participants withdraw their consent? What procedures are in place for data deletion upon wit... 
Unknown +d4d_schema_path subject_id subject_label predicate_id rocrate_json_path object_id object_label mapping_justification confidence comment author_id mapping_date subject_source object_source mapping_set_id mapping_set_version mapping_status d4d_description +Dataset.access_details d4d:access_details Access Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Information on how to access or retrieve the raw source data. +" +Dataset.access_url d4d:access_url Access Url skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the raw data. +Dataset.access_urls d4d:access_urls Access Urls semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped One or more URLs providing access to the distribution channel(s) or format(s). +Dataset.acquisition_details d4d:acquisition_details Acquisition Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how data was acquired for each instance, including instruments, protocols, ... 
+Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Methods used to acquire or obtain dataset instances. List of InstanceAcquisition objects from the Co... +Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps skos:exactMatch @graph[?@type='Dataset']['d4d:addressing_gaps'] d4d:addressing_gaps addressing_gaps semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Research or practical gaps this dataset addresses. List of AddressingGap objects from the Motivation... +Dataset.affected_subsets d4d:affected_subsets Affected Subsets semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more specific subsets or features of the dataset affected by this bias (e.g., ""female partici..." +Dataset.affiliation d4d:affiliation Affiliation semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The organization(s) to which the person belongs in the context of this dataset. May vary across data... 
+Dataset.affiliations d4d:affiliations Affiliations semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Organizations with which the creator or team is affiliated. +Dataset.agreement_metric d4d:agreement_metric Agreement Metric semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percentage agreem... +Dataset.analysis_method d4d:analysis_method Analysis Method semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Methodology used to assess annotation quality and resolve disagreements. +" +Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses skos:exactMatch @graph[?@type='Dataset']['d4d:annotation_analyses'] d4d:annotation_analyses annotation_analyses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Analysis of annotation quality and inter-annotator agreement. 
+Dataset.annotation_quality_details d4d:annotation_quality_details Annotation Quality Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Additional details on annotation quality assessment and findings. +" +Dataset.annotations_per_item d4d:annotations_per_item Annotations Per Item semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Number of annotations collected per data item. Multiple annotations per item enable calculation of i... +Dataset.annotator_demographics d4d:annotator_demographics Annotator Demographics semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more demographic characteristics of the annotators, if available and relevant (e.g., geograph... +Dataset.anomalies d4d:anomalies Anomalies skos:exactMatch @graph[?@type='Dataset']['d4d:dataAnomalies'] d4d:dataAnomalies dataAnomalies semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Known data quality issues, errors, or irregularities in the dataset. List of DataAnomaly objects fro... 
+Dataset.anomaly_details d4d:anomaly_details Anomaly Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of errors, noise sources, or redundancies in the dataset, including their know... +Dataset.anonymization_method d4d:anonymization_method Anonymization Method semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text What methods were used to anonymize or de-identify participant data? Include technical details of pr... +Dataset.archival d4d:archival Archival semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indicates whether official archival versions of external resources are included in the dataset. +" +Dataset.assent_procedures d4d:assent_procedures Assent Procedures semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For research involving minors, what assent procedures were used? How was developmentally appropriate... 
+Dataset.at_risk_groups_included d4d:at_risk_groups_included At Risk Groups Included semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Are any at-risk populations included (e.g., children, pregnant women, prisoners, cognitively impaire... +Dataset.at_risk_populations d4d:at_risk_populations At Risk Populations skos:exactMatch @graph[?@type='Dataset']['d4d:atRiskPopulations'] d4d:atRiskPopulations atRiskPopulations semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about protections for at-risk populations (e.g., minors, pregnant women, prisoners) incl... +Dataset.bias_description d4d:bias_description Bias Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of how this bias manifests in the dataset, including affected populations, feat... +Dataset.bias_type d4d:bias_type Bias Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of bias identified, using standardized categories from the Artificial Intelligence Ontology... 
+Dataset.bytes d4d:bytes Bytes skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Size of the data in bytes. +Dataset.categories d4d:categories Categories semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped One or more permitted categories or values for a categorical variable. Each entry should describe a ... +Dataset.citation d4d:citation Citation skos:exactMatch @graph[?@type='Dataset']['citation'] schema:citation citation semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Recommended citation for this dataset in DataCite or BibTeX format. Provides a standard way to cite ... +Dataset.cleaning_details d4d:cleaning_details Cleaning Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of data cleaning procedures applied, including criteria for removing or correc... 
+Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:cleaning_strategies'] d4d:cleaning_strategies cleaning_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data cleaning and quality control procedures applied to the dataset. List of CleaningStrategy object... +Dataset.collection_consents d4d:collection_consents Collection Consents semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Consent obtained from individuals for data collection and use. List of CollectionConsent objects fro... +Dataset.collection_details d4d:collection_details Collection Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of whether data was collected directly from individuals or obtained via third ... +Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Mechanisms, instruments, or tools used for data collection. List of CollectionMechanism objects from... 
+Dataset.collection_notifications d4d:collection_notifications Collection Notifications semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Notifications provided to individuals about data collection. List of CollectionNotification objects ... +Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes skos:exactMatch @graph[?@type='Dataset']['d4d:collection_timeframes'] d4d:collection_timeframes collection_timeframes semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Time periods during which data was collected. List of CollectionTimeframe objects from the Collectio... +Dataset.collection_type d4d:collection_type Collection Type semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Type(s) of content in this file collection. A collection may have multiple types, for example a coll... +Dataset.collector_details d4d:collector_details Collector Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of who was involved in data collection (e.g., students, crowdworkers, contract... 
+Dataset.comment_prefix d4d:comment_prefix Comment Prefix semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Character(s) used to indicate comment lines (e.g., ""#"" for CSV comments)." +Dataset.compensation_amount d4d:compensation_amount Compensation Amount skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_amount'] d4d:compensation_amount compensation_amount semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "What was the amount or value of compensation provided? Include currency or equivalent value. +" +Dataset.compensation_provided d4d:compensation_provided Compensation Provided skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_provided'] d4d:compensation_provided compensation_provided semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Were participants compensated for their participation? +Dataset.compensation_rationale d4d:compensation_rationale Compensation Rationale skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_rationale'] d4d:compensation_rationale compensation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What was the rationale for the compensation structure? How was the amount determined to be appropria... 
+Dataset.compensation_type d4d:compensation_type Compensation Type skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_type'] d4d:compensation_type compensation_type semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What type of compensation was provided (e.g., monetary payment, gift cards, course credit, other inc... +Dataset.compression d4d:compression Compression skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped Compression format used, if any (e.g., gzip, bzip2, zip). +Dataset.confidential_elements d4d:confidential_elements Confidential Elements skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements'] d4d:confidential_elements confidential_elements semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Confidential or restricted information within the dataset that requires access controls. List of Con... 
+Dataset.confidential_elements_present d4d:confidential_elements_present Confidential Elements Present skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements_present'] d4d:confidential_elements_present confidential_elements_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any confidential data elements are present. +Dataset.confidentiality_details d4d:confidentiality_details Confidentiality Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of which data elements are confidential, the basis for confidentiality (e.g., ... +Dataset.confidentiality_level d4d:confidentiality_level Confidentiality Level skos:exactMatch @graph[?@type='Dataset']['d4d:confidentiality_level'] d4d:confidentiality_level confidentiality_level semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Confidentiality classification of the dataset indicating level of access restrictions and sensitivit... +Dataset.conforms_to d4d:conforms_to Conforms To skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped An established standard, specification, or schema to which the resource conforms. 
+Dataset.conforms_to_class d4d:conforms_to_class Conforms To Class skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The specific class or type within a schema to which the resource conforms. +Dataset.conforms_to_schema d4d:conforms_to_schema Conforms To Schema skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The schema or data model to which the resource conforms. +Dataset.consent_details d4d:consent_details Consent Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how consent was requested (e.g., opt-in form, verbal agreement), provided, ... +Dataset.consent_documentation d4d:consent_documentation Consent Documentation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "How is consent documented? Include references to consent forms or procedures used. 
+" +Dataset.consent_obtained d4d:consent_obtained Consent Obtained semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was informed consent obtained from all participants? +Dataset.consent_revocations d4d:consent_revocations Consent Revocations semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Mechanisms for individuals to revoke previously given consent. List of ConsentRevocation objects fro... +Dataset.consent_scope d4d:consent_scope Consent Scope semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "What specific uses did participants consent to? Are there limitations on data use based on consent? +" +Dataset.consent_type d4d:consent_type Consent Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What type of consent was obtained (e.g., written, verbal, electronic, implied through participation)... +Dataset.contact_person d4d:contact_person Contact Person skos:exactMatch @graph[?@type='Dataset']['d4d:contact_person'] d4d:contact_person contact_person semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for questions about ethical review. 
Provides structured contact information including... +Dataset.content_warnings d4d:content_warnings Content Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings'] d4d:content_warnings content_warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Content warnings for potentially harmful, offensive, or disturbing material in the dataset. List of ... +Dataset.content_warnings_present d4d:content_warnings_present Content Warnings Present skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings_present'] d4d:content_warnings_present content_warnings_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any content warnings are needed. +Dataset.contribution_url d4d:contribution_url Contribution Url semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL for contribution guidelines or process. +Dataset.counts d4d:counts Counts semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "How many instances are there in total (of each type, if appropriate)? 
+" +Dataset.created_by d4d:created_by Created By skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The person or organization primarily responsible for creating the resource. +Dataset.created_on d4d:created_on Created On skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The date and time when the resource was created. +Dataset.creators d4d:creators Creators skos:closeMatch @graph[?@type='Dataset']['author'] schema:author author semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individuals or organizations who created the dataset. List of Creator objects describing authorship,... +Dataset.credit_roles d4d:credit_roles Credit Roles skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal invest... 
+Dataset.data_annotation_platform d4d:data_annotation_platform Data Annotation Platform semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped One or more platforms or tools used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical T... +Dataset.data_annotation_protocol d4d:data_annotation_protocol Data Annotation Protocol skos:exactMatch @graph[?@type='Dataset']['d4d:data_annotation_protocol'] d4d:data_annotation_protocol data_annotation_protocol semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Annotation methodology, tasks, and protocols followed during labeling. Includes annotation guideline... +Dataset.data_collectors d4d:data_collectors Data Collectors skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individuals or organizations responsible for collecting the data. List of DataCollector objects from... +Dataset.data_linkage d4d:data_linkage Data Linkage semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Can this dataset be linked to other datasets in ways that might compromise participant privacy? 
+" +Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts skos:exactMatch @graph[?@type='Dataset']['d4d:data_protection_impacts'] d4d:data_protection_impacts data_protection_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data protection impact assessments (DPIAs) conducted for the dataset. List of DataProtectionImpact o... +Dataset.data_substrate d4d:data_substrate Data Substrate semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Type of data (e.g., raw text, images) from Bridge2AI standards. +" +Dataset.data_topic d4d:data_topic Data Topic semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "General topic of each instance (e.g., from Bridge2AI standards). +" +Dataset.data_type d4d:data_type Data Type semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The data type of the variable (e.g., integer, float, string, boolean, date, categorical). Use standa... 
+Dataset.data_use_permission d4d:data_use_permission Data Use Permission semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., g... +Dataset.deidentification_details d4d:deidentification_details Deidentification Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on de-identification procedures and residual risks. +" +Dataset.delimiter d4d:delimiter Delimiter semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Field delimiter character (e.g., "","" for CSV, ""\t"" for TSV)." +Dataset.derivation d4d:derivation Derivation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of how this variable was derived or calculated from other variables, if applicable. +Dataset.description d4d:description Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text A human-readable description for a thing. 
+Dataset.dialect d4d:dialect Dialect skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Specific format dialect or variation (e.g., CSV dialect, JSON-LD profile). +Dataset.direct_collection d4d:direct_collection Direct Collection semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Whether data was collected directly from individuals or via third parties. List of DirectCollection ... +Dataset.disagreement_patterns d4d:disagreement_patterns Disagreement Patterns semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Systematic patterns in annotator disagreements (e.g., by demographic group, annotation difficulty, t... +Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses skos:exactMatch @graph[?@type='Dataset']['d4d:discouraged_uses'] d4d:discouraged_uses discouraged_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Uses that are not recommended for this dataset due to limitations, risks, or ethical concerns. List ... 
+Dataset.discouragement_details d4d:discouragement_details Discouragement Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of tasks or applications for which the dataset is not recommended, with explan... +Dataset.distribution d4d:distribution Distribution semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The distribution of instances across identified subpopulations, including counts, percentages, or pr... +Dataset.distribution_dates d4d:distribution_dates Distribution Dates skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Dates when the dataset was or will be distributed or released. List of DistributionDate objects from... +Dataset.distribution_formats d4d:distribution_formats Distribution Formats skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped Formats in which the dataset is distributed or made available. List of DistributionFormat objects fr... 
+Dataset.doi d4d:doi Doi skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '... +Dataset.double_quote d4d:double_quote Double Quote semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Whether quotes within quoted fields are escaped by doubling them. Expected values: ""true"" or ""false""..." +Dataset.download_url d4d:download_url Download Url skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped URL from which the data can be downloaded. This is not the same as the landing page, which is a page... +Dataset.email d4d:email Email semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The email address of the person. Represents current/preferred contact information in the context of ... 
+Dataset.encoding d4d:encoding Encoding skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped The character encoding of the data. +Dataset.end_date d4d:end_date End Date skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended End date of data collection. +Dataset.errata d4d:errata Errata skos:exactMatch @graph[?@type='Dataset']['d4d:errata'] d4d:errata errata semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known errors or corrections to the dataset since publication. List of Erratum objects from the Maint... +Dataset.erratum_details d4d:erratum_details Erratum Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the error, its scope, the affected data or records, and the correction appl... 
+Dataset.erratum_url d4d:erratum_url Erratum Url skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the erratum. +Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews skos:exactMatch @graph[?@type='Dataset']['d4d:ethical_reviews'] d4d:ethical_reviews ethical_reviews semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Ethical reviews and institutional oversight for the dataset. List of EthicalReview objects from the ... +Dataset.ethics_review_board d4d:ethics_review_board Ethics Review Board semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "What ethics review board(s) reviewed this research? Include institution names and approval details. +" +Dataset.examples d4d:examples Examples semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of examples of known/previous uses of the dataset. 
+Dataset.existing_uses d4d:existing_uses Existing Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Known existing uses of the dataset at the time of publication. List of ExistingUse objects from the ... +Dataset.extension_details d4d:extension_details Extension Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how third parties can contribute to the dataset, how contributions are vali... +Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Mechanisms for extending or contributing to the dataset. ExtensionMechanism object from the Maintena... +Dataset.external_resources d4d:external_resources External Resources semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text External resources referenced at the dataset level (e.g., related publications, repositories, docume... 
+Dataset.file_collections d4d:file_collections File Collections semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Collections of files within this dataset. Each collection represents a logical grouping of files wit... +Dataset.file_count d4d:file_count File Count semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Number of files in this collection. +Dataset.file_type d4d:file_type File Type semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Semantic type or purpose of this file (e.g., data_file, code_file, documentation_file, metadata_file... +Dataset.format d4d:format Format skos:exactMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The file format, physical medium, or dimensions of a resource. This should be a file extension or MI... +Dataset.frequency d4d:frequency Frequency semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped How often updates are planned (e.g., quarterly, annually). 
+Dataset.funders d4d:funders Funders skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Funding mechanisms that supported dataset creation. List of FundingMechanism objects describing gran... +Dataset.future_guarantees d4d:future_guarantees Future Guarantees semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of any commitments that external resources will remain available and stable over time. +" +Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts skos:exactMatch @graph[?@type='Dataset']['d4d:future_use_impacts'] d4d:future_use_impacts future_use_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Anticipated impacts of future uses, including risks and benefits. List of FutureUseImpact objects fr... +Dataset.governance_committee_contact d4d:governance_committee_contact Governance Committee Contact skos:exactMatch @graph[?@type='Dataset']['d4d:governance_committee_contact'] d4d:governance_committee_contact governance_committee_contact semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for data governance committee. This person can answer questions about data governance... 
+Dataset.grant_number d4d:grant_number Grant Number semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The alphanumeric identifier for the grant. +Dataset.grantor d4d:grantor Grantor semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Name/identifier of the organization providing monetary or resource support. +Dataset.grants d4d:grants Grants semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Grant mechanisms supporting dataset creation. Multiple grants may fund a single dataset. +Dataset.guardian_consent d4d:guardian_consent Guardian Consent semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For participants unable to provide their own consent, how was guardian or surrogate consent obtained... +Dataset.handling_strategy d4d:handling_strategy Handling Strategy skos:exactMatch @graph[?@type='Dataset']['d4d:handling_strategy'] d4d:handling_strategy handling_strategy semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d The primary strategy used to handle missing data (e.g., listwise deletion, mean imputation, multiple... 
+Dataset.hash d4d:hash Hash skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped Cryptographic hash value of the data for integrity verification (e.g., SHA-256: 'e3b0c44298fc1c149af... +Dataset.header d4d:header Header semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Whether the first row of the file contains column headers. Expected values: ""true"" or ""false"" (as st..." +Dataset.hipaa_compliant d4d:hipaa_compliant Hipaa Compliant semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates compliance with the Health Insurance Portability and Accountability Act (HIPAA). HIPAA app... +Dataset.human_subject_research d4d:human_subject_research Human Subject Research skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about whether dataset involves human subjects research, including IRB approval, ethics r... 
+Dataset.id d4d:id Id skos:exactMatch @graph[?@type='Dataset']['ID'] rdf:ID ID semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ unknown d4d-rocrate-comprehensive-v1 1.0 mapped A unique identifier for a thing. +Dataset.identifiable_elements_present d4d:identifiable_elements_present Identifiable Elements Present semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether data subjects can be identified. +Dataset.identification d4d:identification Identification semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disea... +Dataset.identifiers_removed d4d:identifiers_removed Identifiers Removed skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended List of identifier types removed during de-identification (e.g., 'name', 'date of birth', 'SSN', 'em... +Dataset.impact_details d4d:impact_details Impact Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of potential future impacts or risks arising from the dataset's composition or... 
+Dataset.imputation_method d4d:imputation_method Imputation Method skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_method'] d4d:imputation_method imputation_method semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Specific imputation technique used (mean, median, mode, forward fill, backward fill, interpolation, ... +Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_protocols'] d4d:imputation_protocols imputation_protocols semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data imputation protocols applied to handle missing values. List of ImputationProtocol objects from ... +Dataset.imputation_rationale d4d:imputation_rationale Imputation Rationale skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_rationale'] d4d:imputation_rationale imputation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Justification for the imputation approach chosen, including assumptions made about missing data mech... 
+Dataset.imputation_validation d4d:imputation_validation Imputation Validation skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_validation'] d4d:imputation_validation imputation_validation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Methods used to validate imputation quality (if any). +" +Dataset.imputed_fields d4d:imputed_fields Imputed Fields skos:exactMatch @graph[?@type='Dataset']['d4d:imputed_fields'] d4d:imputed_fields imputed_fields semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Fields or columns where imputation was applied. +" +Dataset.informed_consent d4d:informed_consent Informed Consent semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Details about informed consent procedures, including consent type, documentation, and withdrawal mec... +Dataset.instance_type d4d:instance_type Instance Type semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "The type or types of instances in the dataset (e.g., ""movie"", ""user"", ""rating"", ""clinical record""). ..." 
+Dataset.instances d4d:instances Instances skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individual data instances or records in the dataset. List of Instance objects from the Composition m... +Dataset.intended_uses d4d:intended_uses Intended Uses skos:exactMatch @graph[?@type='Dataset']['d4d:intended_uses'] d4d:intended_uses intended_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicit intended and recommended uses for this dataset. Complements future_use_impacts by focusing ... +Dataset.inter_annotator_agreement d4d:inter_annotator_agreement Inter Annotator Agreement semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Measure of agreement between annotators (e.g., Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, p... +Dataset.inter_annotator_agreement_score d4d:inter_annotator_agreement_score Inter Annotator Agreement Score semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Measured agreement between annotators (e.g., Cohen's kappa value, Fleiss' kappa, Krippendorff's alph... 
+Dataset.involves_human_subjects d4d:involves_human_subjects Involves Human Subjects semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does this dataset involve human subjects research? +Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions skos:exactMatch @graph[?@type='Dataset']['d4d:ip_restrictions'] d4d:ip_restrictions ip_restrictions semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Intellectual property restrictions on dataset use or redistribution. IPRestrictions object from the ... +Dataset.irb_approval d4d:irb_approval Irb Approval semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was Institutional Review Board (IRB) approval obtained? Include approval number and institution if a... +Dataset.is_data_split d4d:is_data_split Is Data Split semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a split of the larger dataset, e.g., is it a set for model training, testing, or vali... 
+Dataset.is_deidentified d4d:is_deidentified Is Deidentified skos:exactMatch @graph[?@type='Dataset']['d4d:is_deidentified'] d4d:is_deidentified is_deidentified semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d De-identification status and procedures applied to the dataset. Deidentification object describing w... +Dataset.is_direct d4d:is_direct Is Direct semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether collection was direct from individuals. +Dataset.is_identifier d4d:is_identifier Is Identifier semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable serves as a unique identifier or key for records in the dataset. +Dataset.is_random d4d:is_random Is Random semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether the sample is random. +Dataset.is_representative d4d:is_representative Is Representative semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indicates whether the sample is representative of the larger set. 
+" +Dataset.is_sample d4d:is_sample Is Sample semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether it is a sample of a larger set. +Dataset.is_sensitive d4d:is_sensitive Is Sensitive semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable contains sensitive information (e.g., personal data, protected healt... +Dataset.is_shared d4d:is_shared Is Shared semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Boolean indicating whether the dataset is distributed to parties external to the dataset-creating en... +Dataset.is_subpopulation d4d:is_subpopulation Is Subpopulation semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a subpopulation of the larger dataset, e.g., is it a set of data for a specific demog... +Dataset.is_tabular d4d:is_tabular Is Tabular skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Whether the dataset is in tabular format (rows and columns). True if the data is structured as a tab... 
+Dataset.issued d4d:issued Issued skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Date of formal issuance or publication of the resource. +Dataset.keywords d4d:keywords Keywords skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Keywords or tags describing the resource for discovery and classification. +Dataset.known_biases d4d:known_biases Known Biases skos:exactMatch @graph[?@type='Dataset']['d4d:known_biases'] d4d:known_biases known_biases semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known biases present in the dataset that may affect fairness, representativeness, or model performan... +Dataset.known_limitations d4d:known_limitations Known Limitations skos:exactMatch @graph[?@type='Dataset']['d4d:known_limitations'] d4d:known_limitations known_limitations semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known limitations of the dataset that may affect its use or interpretation. Distinct from biases (sy... 
+Dataset.label d4d:label Label semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Is there a label or target associated with each instance? +" +Dataset.label_description d4d:label_description Label Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "If labeled, what pattern or format do labels follow? +" +Dataset.labeling_details d4d:labeling_details Labeling Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the labeling or annotation procedures, including annotation guidelines, tas... +Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:labeling_strategies'] d4d:labeling_strategies labeling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Labeling or annotation methodologies applied to the data. List of LabelingStrategy objects from the ... 
+Dataset.language d4d:language Language skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Language in which the information is expressed. +Dataset.last_updated_on d4d:last_updated_on Last Updated On skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The date and time when the resource was most recently modified or updated. +Dataset.latest_version_doi d4d:latest_version_doi Latest Version Doi semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended DOI or URL identifying the latest version of this dataset (e.g., '10.5281/zenodo.1234567' for a DOI ... +Dataset.license d4d:license License skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped "The legal license under which the resource is made available (e.g., ""MIT"", ""CC-BY-4.0"")." 
+Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms skos:exactMatch @graph[?@type='Dataset']['d4d:license_and_use_terms'] d4d:license_and_use_terms license_and_use_terms semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d License and usage terms governing dataset access and use. LicenseAndUseTerms object from the Data Go... +Dataset.license_terms d4d:license_terms License Terms semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of the dataset's license and terms of use, including links, costs, or usage constraints ... +Dataset.limitation_description d4d:limitation_description Limitation Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Detailed description of the limitation and its implications. +" +Dataset.limitation_type d4d:limitation_type Limitation Type skos:closeMatch @graph[?@type='Dataset']['temporalCoverage'] schema:temporalCoverage temporalCoverage semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Category of limitation (e.g., scope, coverage, temporal, methodological). 
+" +Dataset.machine_annotation_tools d4d:machine_annotation_tools Machine Annotation Tools skos:closeMatch @graph[?@type='Dataset']['rai:machineAnnotationTools'] rai:machineAnnotationTools machineAnnotationTools semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Automated annotation tools used in dataset creation. +Dataset.maintainer_details d4d:maintainer_details Maintainer Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the organization, team, or individual responsible for maintaining the datas... +Dataset.maintainers d4d:maintainers Maintainers skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individuals or organizations responsible for maintaining the dataset. List of Maintainer objects fro... +Dataset.maximum_value d4d:maximum_value Maximum Value semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The maximum value that the variable can take. Applicable to numeric variables. 
+Dataset.md5 d4d:md5 Md5 skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped MD5 hash value of the data (128-bit cryptographic hash). +Dataset.measurement_technique d4d:measurement_technique Measurement Technique semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "The technique or method used to measure this variable. Examples: ""mass spectrometry"", ""self-report s..." +Dataset.mechanism_details d4d:mechanism_details Mechanism Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the specific mechanisms or procedures used to collect the data (e.g., hardw... +Dataset.media_type d4d:media_type Media Type skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The media type of the data. This should be a MIME type. +Dataset.method d4d:method Method semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Method used for de-identification (e.g., HIPAA Safe Harbor). 
+Dataset.minimum_value d4d:minimum_value Minimum Value semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The minimum value that the variable can take. Applicable to numeric variables. +Dataset.missing d4d:missing Missing semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the missing data fields or elements. +" +Dataset.missing_data_causes d4d:missing_data_causes Missing Data Causes semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Known or suspected causes of missing data (e.g., sensor failures, participant dropout, privacy const... +Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Documentation of missing data patterns and handling strategies. +Dataset.missing_data_patterns d4d:missing_data_patterns Missing Data Patterns semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of patterns in missing data (e.g., missing completely at random, missing at random, miss... 
+Dataset.missing_information d4d:missing_information Missing Information semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "References to one or more MissingInfo objects describing missing data. +" +Dataset.missing_value_code d4d:missing_value_code Missing Value Code skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Code(s) used to represent missing values for this variable. Examples: ""NA"", ""-999"", ""null"", """". Mult..." +Dataset.mitigation_strategy d4d:mitigation_strategy Mitigation Strategy semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Steps taken or recommended to mitigate this bias. +" +Dataset.modified_by d4d:modified_by Modified By skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped A person or organization that contributed to modifying or updating the resource. +Dataset.name d4d:name Name semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A human-readable name for a thing. 
+Dataset.notification_details d4d:notification_details Notification Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how individuals were notified about data collection, including the notifica... +Dataset.orcid d4d:orcid Orcid semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format... +Dataset.other_compliance d4d:other_compliance Other Compliance semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Other regulatory compliance frameworks applicable to this dataset (e.g., CCPA, PIPEDA, industry-spec... +Dataset.other_tasks d4d:other_tasks Other Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Additional tasks the dataset may support beyond its original intent. List of OtherTask objects from ... 
+Dataset.page d4d:page Page skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped A landing page or web page providing access to or information about the resource. +Dataset.parent_datasets d4d:parent_datasets Parent Datasets skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Parent datasets that this dataset is part of or derived from. Enables hierarchical dataset compositi... +Dataset.participant_compensation d4d:participant_compensation Participant Compensation skos:exactMatch @graph[?@type='Dataset']['d4d:participant_compensation'] d4d:participant_compensation participant_compensation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Information about compensation or incentives provided to human research participants. +Dataset.participant_privacy d4d:participant_privacy Participant Privacy skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about privacy protections and anonymization procedures for human research participants. 
+Dataset.path d4d:path Path skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The file path or URL where the content is located. +Dataset.precision d4d:precision Precision skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The precision or number of decimal places for numeric variables. +Dataset.preprocessing_details d4d:preprocessing_details Preprocessing Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of preprocessing steps applied to the data, including tools used, parameters, ... +Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:preprocessing_strategies'] d4d:preprocessing_strategies preprocessing_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Preprocessing steps applied to the raw data. List of PreprocessingStrategy objects from the Preproce... 
+Dataset.principal_investigator d4d:principal_investigator Principal Investigator semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A key individual (Principal Investigator) responsible for or overseeing dataset creation. +Dataset.privacy_techniques d4d:privacy_techniques Privacy Techniques semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What privacy-preserving techniques were applied (e.g., differential privacy, k-anonymity, data maski... +Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses skos:exactMatch @graph[?@type='Dataset']['d4d:prohibited_uses'] d4d:prohibited_uses prohibited_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicitly prohibited or forbidden uses for this dataset. Stronger than discouraged_uses - these are... +Dataset.prohibition_reason d4d:prohibition_reason Prohibition Reason skos:exactMatch @graph[?@type='Dataset']['d4d:prohibition_reason'] d4d:prohibition_reason prohibition_reason semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d One or more reasons why this use is prohibited (e.g., license restriction, ethical concern, privacy ... 
+Dataset.publisher d4d:publisher Publisher skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The organization or entity responsible for making the resource available. +Dataset.purposes d4d:purposes Purposes skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Purposes for which the dataset was created. List of Purpose objects from the Motivation module, each... +Dataset.quality_notes d4d:quality_notes Quality Notes semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Notes about data quality, reliability, or known issues specific to this variable. +Dataset.quote_char d4d:quote_char Quote Char semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Character used for quoting fields (e.g., '""' for CSV)." +Dataset.raw_data_details d4d:raw_data_details Raw Data Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of raw data availability, access procedures, and any conditions or restriction... 
+Dataset.raw_data_format d4d:raw_data_format Raw Data Format semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more formats of the raw data before any preprocessing (e.g., CSV, DICOM, JSON). +" +Dataset.raw_data_sources d4d:raw_data_sources Raw Data Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped List of raw data sources before preprocessing. Each RawDataSource object describes where the origina... +Dataset.raw_sources d4d:raw_sources Raw Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Raw, unprocessed source data before any preprocessing was applied. List of RawData objects from the ... +Dataset.recommended_mitigation d4d:recommended_mitigation Recommended Mitigation semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Recommended approaches for users to address this limitation. 
+" +Dataset.regulatory_compliance d4d:regulatory_compliance Regulatory Compliance semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What regulatory frameworks govern this human subjects research (e.g., 45 CFR 46, HIPAA)? +" +Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions skos:exactMatch @graph[?@type='Dataset']['d4d:regulatory_restrictions'] d4d:regulatory_restrictions regulatory_restrictions semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Regulatory and export control restrictions applicable to the dataset. ExportControlRegulatoryRestric... +Dataset.reidentification_risk d4d:reidentification_risk Reidentification Risk semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What is the assessed risk of re-identification? What measures were taken to minimize this risk? +" +Dataset.related_datasets d4d:related_datasets Related Datasets skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Related datasets with typed relationships (e.g., supplements, derives from, is version of). Use Data... 
+Dataset.relationship_details d4d:relationship_details Relationship Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how relationships between instances are represented (e.g., graph edges, rat... +Dataset.relationship_type d4d:relationship_type Relationship Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of relationship (e.g., derives_from, supplements, is_version_of). Uses DatasetRelationshipT... +Dataset.relationships d4d:relationships Relationships semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Explicit relationships between individual instances in the dataset. List of Relationships objects fr... +Dataset.release_dates d4d:release_dates Release Dates semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "One or more dates or timeframes for dataset release, in ISO 8601 format (e.g., ""2024-03-15"") or as a..." 
+Dataset.repository_details d4d:repository_details Repository Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the repository of known dataset uses, including how it is maintained and ho... +Dataset.repository_url d4d:repository_url Repository Url semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL to a repository of known dataset uses. +Dataset.representative_verification d4d:representative_verification Representative Verification skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more explanations of how representativeness was validated or verified (e.g., statistical test... +Dataset.resources d4d:resources Resources skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Sub-resources or component items. In DatasetCollection, contains Dataset objects. In Dataset, contai... 
+Dataset.response d4d:response Response semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Short explanation describing the primary purpose of creating the dataset. +Dataset.restrictions d4d:restrictions Restrictions semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text One or more descriptions of restrictions or fees associated with accessing these external resources ... +Dataset.retention_details d4d:retention_details Retention Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of applicable retention limits, legal or ethical basis for those limits, and h... +Dataset.retention_limit d4d:retention_limit Retention Limit skos:exactMatch @graph[?@type='Dataset']['d4d:retention_limit'] d4d:retention_limit retention_limit semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data retention policies and limits for the dataset. RetentionLimits object from the Maintenance modu... 
+Dataset.retention_period d4d:retention_period Retention Period skos:exactMatch @graph[?@type='Dataset']['d4d:retention_period'] d4d:retention_period retention_period semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Time period for data retention. +Dataset.review_details d4d:review_details Review Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the ethical review process, board decisions, outcomes, and any supporting d... +Dataset.reviewing_organization d4d:reviewing_organization Reviewing Organization skos:exactMatch @graph[?@type='Dataset']['d4d:reviewing_organization'] d4d:reviewing_organization reviewing_organization semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Organization that conducted the ethical review (e.g., Institutional Review Board, Ethics Committee, ... +Dataset.revocation_details d4d:revocation_details Revocation Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the mechanism provided for individuals to revoke consent (e.g., opt-out por... 
+Dataset.role d4d:role Role semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Role of the data collector (e.g., researcher, crowdworker). +Dataset.sampling_strategies d4d:sampling_strategies Sampling Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:sampling_strategies'] d4d:sampling_strategies sampling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Strategies used to select data instances from a larger population. List of SamplingStrategy objects ... +Dataset.scope_impact d4d:scope_impact Scope Impact semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "How this limitation affects the scope or applicability of the dataset. +" +Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Sensitive data elements requiring special handling or access controls. List of SensitiveElement obje... 
+Dataset.sensitive_elements_present d4d:sensitive_elements_present Sensitive Elements Present semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether sensitive data elements are present. +Dataset.sensitivity_details d4d:sensitivity_details Sensitivity Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on sensitive data elements present and handling procedures. +" +Dataset.sha256 d4d:sha256 Sha256 skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped SHA-256 hash value of the data (256-bit cryptographic hash, recommended). +Dataset.source_data d4d:source_data Source Data semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "One or more descriptions of the larger sets from which the sample was drawn, if applicable. +" +Dataset.source_description d4d:source_description Source Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collec... 
+Dataset.source_type d4d:source_type Source Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more types of raw source (e.g., sensor, database, user input, web scraping). +" +Dataset.special_populations d4d:special_populations Special Populations semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does the research involve any special populations that require additional protections (e.g., minors,... +Dataset.special_protections d4d:special_protections Special Protections semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped What additional protections were implemented for at-risk populations? Include safeguards, modified p... +Dataset.split_details d4d:split_details Split Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the recommended data splits (e.g., 80/10/10 train/ validation/test), how th... +Dataset.splits d4d:splits Splits semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Recommended data splits for this dataset. List of Splits objects from the Composition module describ... 
+Dataset.start_date d4d:start_date Start Date skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Start date of data collection. +Dataset.status d4d:status Status skos:exactMatch @graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The status of the resource (e.g., draft, published, deprecated). +Dataset.strategies d4d:strategies Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:strategies'] d4d:strategies strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d One or more sampling strategies used (e.g., deterministic, simple random, stratified, cluster, syste... +Dataset.subpopulation_elements_present d4d:subpopulation_elements_present Subpopulation Elements Present semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether any subpopulations are explicitly identified. 
+Dataset.subpopulations d4d:subpopulations Subpopulations skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Subpopulations represented within the dataset. List of Subpopulation objects from the Composition mo... +Dataset.subsets d4d:subsets Subsets skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Subsets or splits of this dataset. List of DataSubset objects from the Composition module, each repr... +Dataset.target_dataset d4d:target_dataset Target Dataset skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object... +Dataset.task_details d4d:task_details Task Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of other potential tasks the dataset could support, including any prerequisite... 
+Dataset.tasks d4d:tasks Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Tasks the dataset is intended to support. List of Task objects from the Motivation module describing... +Dataset.third_party_sharing d4d:third_party_sharing Third Party Sharing semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Third-party distribution policies for the dataset. List of ThirdPartySharing objects from the Distri... +Dataset.timeframe_details d4d:timeframe_details Timeframe Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the data collection period and whether this timeframe matches the creation ... +Dataset.title d4d:title Title skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The official title of the element. 
+Dataset.tool_accuracy d4d:tool_accuracy Tool Accuracy skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more known accuracy or performance metrics for the automated tools (if available). Include me... +Dataset.tool_descriptions d4d:tool_descriptions Tool Descriptions semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Descriptions of what each tool does in the annotation process and what types of annotations it produ... +Dataset.tools d4d:tools Tools skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "List of automated annotation tools with their versions. Format each entry as ""ToolName version"" (e.g..." +Dataset.total_bytes d4d:total_bytes Total Bytes semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Total size of all files in this collection, in bytes (integer). Maps to dcat:byteSize. 
+Dataset.total_file_count d4d:total_file_count Total File Count semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Total number of files across all file collections in this dataset. Can be aggregated from file_colle... +Dataset.total_size_bytes d4d:total_size_bytes Total Size Bytes semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Total size of all files in bytes across all file collections. Can be aggregated from file_collection... +Dataset.unit d4d:unit Unit semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The unit of measurement for the variable, preferably using QUDT units (http://qudt.org/vocab/unit/).... +Dataset.update_details d4d:update_details Update Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of planned update types (e.g., corrections, additions, deletions), responsible... 
+Dataset.updates d4d:updates Updates skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Plans for future updates or versioning of the dataset. UpdatePlan object from the Maintenance module... +Dataset.url d4d:url Url semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text URL where the software can be found (e.g., homepage, repository, or documentation). +Dataset.usage_notes d4d:usage_notes Usage Notes semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text A note or caveat about using the dataset for its intended purposes. +Dataset.use_category d4d:use_category Use Category semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more categories of intended use (e.g., research, clinical, educational, commercial, policy). 
+Dataset.use_repository d4d:use_repository Use Repository skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Repositories or registries tracking how the dataset has been used. List of UseRepository objects fro... +Dataset.used_software d4d:used_software Used Software semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What software was used as part of this dataset property? +Dataset.variable_name d4d:variable_name Variable Name semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The name or identifier of the variable as it appears in the data files. +Dataset.variables d4d:variables Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Metadata describing individual variables, fields, or columns in the dataset. +Dataset.version d4d:version Version skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped "The version identifier of the resource (e.g., ""1.0"", ""2.3.1"")." 
+Dataset.version_access d4d:version_access Version Access skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about access to different versions of the dataset. VersionAccess object from the Mainten... +Dataset.version_details d4d:version_details Version Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of version support policies, how long older versions will be hosted, and how d... +Dataset.versions_available d4d:versions_available Versions Available semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of available versions with metadata. +Dataset.warnings d4d:warnings Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:warnings'] d4d:warnings warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d One or more specific content warnings describing potentially offensive, insulting, threatening, or a... 
+Dataset.was_derived_from d4d:was_derived_from Was Derived From skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped A resource from which this resource was derived, in whole or in part. +Dataset.was_directly_observed d4d:was_directly_observed Was Directly Observed semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended True if the data was directly observed by a researcher or instrument; false if it was obtained throu... +Dataset.was_inferred_derived d4d:was_inferred_derived Was Inferred Derived skos:closeMatch @graph[?@type='Dataset']['wasDerivedFrom'] prov:wasDerivedFrom wasDerivedFrom semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://www.w3.org/ns/prov# d4d-rocrate-comprehensive-v1 1.0 recommended True if the data was computationally inferred or derived from other data (e.g., model outputs, imput... +Dataset.was_reported_by_subjects d4d:was_reported_by_subjects Was Reported By Subjects semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended True if the data was self-reported directly by the subjects themselves (e.g., survey responses, ques... 
+Dataset.was_validated_verified d4d:was_validated_verified Was Validated Verified skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended True if the data underwent a validation or verification process (e.g., expert review, cross-checking... +Dataset.why_missing d4d:why_missing Why Missing semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of why each piece of data is missing. +" +Dataset.why_not_representative d4d:why_not_representative Why Not Representative semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more explanations of why the sample is not representative of the larger set, if applicable. +" +Dataset.withdrawal_mechanism d4d:withdrawal_mechanism Withdrawal Mechanism semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended How can participants withdraw their consent? What procedures are in place for data deletion upon wit... 
diff --git a/data/mappings/d4d_rocrate_sssom_mapping.tsv b/data/mappings/d4d_rocrate_sssom_mapping.tsv
index 3f412002..662516da 100644
--- a/data/mappings/d4d_rocrate_sssom_mapping.tsv
+++ b/data/mappings/d4d_rocrate_sssom_mapping.tsv
@@ -1,104 +1,102 @@
 # SSSOM (Simple Standard for Sharing Ontology Mappings)
 # Generated from D4D SKOS alignment
-# Date: 2026-03-19T23:07:51.068512
+# Date: 2026-03-25T22:41:05.055671
 # Subset: False
 # Total mappings: 95
 #
-# d4d_module: D4D schema module containing this attribute
-#
-d4d_schema_path subject_id subject_label d4d_module predicate_id rocrate_json_path object_id object_label mapping_justification confidence comment author_id mapping_date subject_source object_source mapping_set_id mapping_set_version in_rocrate_json in_pydantic_model in_interface_mapping d4d_module
-Dataset.Dataset d4d:Dataset Dataset Unknown skos:exactMatch @graph[?@type='Dataset']['Dataset'] schema:Dataset Dataset semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false Unknown
-Dataset.title d4d:title Title Unknown skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.description d4d:description Description Unknown skos:exactMatch @graph[?@type='Dataset']['description'] schema:description description semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.doi d4d:doi Doi Unknown skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier
identifier semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.keywords d4d:keywords Keywords Unknown skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.language d4d:language Language Unknown skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.license d4d:license License Unknown skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.publisher d4d:publisher Publisher Unknown skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.version d4d:version Version Unknown skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.page d4d:page Page Unknown skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.download_url d4d:download_url Download Url Unknown skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.bytes d4d:bytes Bytes D4D_Base skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base
-Dataset.md5 d4d:md5 Md5 D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.sha256 d4d:sha256 Sha256 D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.hash d4d:hash Hash D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.created_on d4d:created_on Created On Unknown skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.issued d4d:issued Issued Unknown skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown
-Dataset.last_updated_on d4d:last_updated_on Last Updated On Unknown skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.status d4d:status Status Unknown skos:exactMatch @graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.conforms_to d4d:conforms_to Conforms To Unknown skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false true Unknown
-Dataset.was_derived_from d4d:was_derived_from Was Derived From Unknown skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['d4d:addressingGaps'] d4d:addressingGaps addressingGaps semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true D4D_Motivation
-Dataset.anomalies d4d:anomalies Anomalies D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:anomalies'] d4d:anomalies anomalies semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.content_warnings d4d:content_warnings Content Warnings D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:contentWarnings'] d4d:contentWarnings contentWarnings semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.informed_consent d4d:informed_consent Informed Consent D4D_Human skos:exactMatch @graph[?@type='Dataset']['d4d:informedConsent'] d4d:informedConsent informedConsent semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true D4D_Human
-Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection
-Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection
-Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes D4D_Collection skos:exactMatch @graph[?@type='Dataset']['d4d:dataCollectionTimeframe'] d4d:dataCollectionTimeframe dataCollectionTimeframe semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection
-Dataset.confidential_elements d4d:confidential_elements Confidential Elements D4D_Composition skos:exactMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts D4D_Ethics skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Ethics
-Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses
-Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Uses
-Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Uses
-Dataset.distribution_dates d4d:distribution_dates Distribution Dates D4D_Distribution skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Distribution
-Dataset.errata d4d:errata Errata D4D_Maintenance skos:exactMatch @graph[?@type='Dataset']['correction'] schema:correction correction semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Maintenance
-Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews D4D_Ethics skos:exactMatch @graph[?@type='Dataset']['rai:ethicalReview'] rai:ethicalReview ethicalReview semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Ethics
-Dataset.existing_uses d4d:existing_uses Existing Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses
-Dataset.intended_uses d4d:intended_uses Intended Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses
-Dataset.other_tasks d4d:other_tasks Other Tasks D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses
-Dataset.tasks d4d:tasks Tasks D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Motivation
-Dataset.purposes d4d:purposes Purposes D4D_Motivation skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Motivation
-Dataset.known_biases d4d:known_biases Known Biases D4D_Composition skos:exactMatch @graph[?@type='Dataset']['rai:dataBiases'] rai:dataBiases dataBiases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.known_limitations d4d:known_limitations Known Limitations D4D_Composition skos:exactMatch @graph[?@type='Dataset']['rai:dataLimitations'] rai:dataLimitations dataLimitations semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['rai:imputationProtocol'] rai:imputationProtocol imputationProtocol semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing
-Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionMissingData'] rai:dataCollectionMissingData dataCollectionMissingData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection
-Dataset.raw_data_sources d4d:raw_data_sources Raw Data Sources D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection
-Dataset.raw_sources d4d:raw_sources Raw Sources D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing
-Dataset.updates d4d:updates Updates D4D_Maintenance skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Maintenance
-Dataset.human_subject_research d4d:human_subject_research Human Subject Research D4D_Human skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Human
-Dataset.vulnerable_populations d4d:vulnerable_populations Vulnerable Populations Unknown skos:exactMatch @graph[?@type='Dataset']['rai:atRiskPopulations'] rai:atRiskPopulations atRiskPopulations semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false true true Unknown
-Dataset.distribution_formats d4d:distribution_formats Distribution Formats D4D_Distribution skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Distribution
-Dataset.encoding d4d:encoding Encoding D4D_Base skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.funders d4d:funders Funders D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Motivation
-Dataset.citation d4d:citation Citation D4D_Base skos:exactMatch @graph[?@type='Dataset']['citation'] schema:citation citation semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true false D4D_Base
-Dataset.format d4d:format Format D4D_Base skos:exactMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false D4D_Base
-DatasetCollection.parent_datasets d4d:parent_datasets Parent Datasets D4D_Base skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base
-DatasetCollection.related_datasets d4d:related_datasets Related Datasets D4D_Base skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.same_as d4d:same_as Same As Unknown skos:exactMatch @graph[?@type='Dataset']['sameAs'] schema:sameAs sameAs semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false Unknown
-Dataset.variables d4d:variables Variables D4D_Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Variables
-Dataset.id d4d:id Id Unknown skos:exactMatch @graph[?@type='Dataset']['@id'] rdf:ID ID semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ unknown d4d-rocrate-alignment-v1 1.0 false false false Unknown
-Dataset.participant_compensation d4d:participant_compensation Participant Compensation D4D_Human skos:exactMatch @graph[?@type='Dataset']['d4d:participantCompensation'] d4d:participantCompensation participantCompensation semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false false D4D_Human
-Dataset.creators d4d:creators Creators D4D_Motivation skos:closeMatch @graph[?@type='Dataset']['author'] schema:author author semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Motivation
-Dataset.created_by d4d:created_by Created By Unknown skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.modified_by d4d:modified_by Modified By Unknown skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements D4D_Composition skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:dataManipulationProtocol'] rai:dataManipulationProtocol dataManipulationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing
-Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:dataPreprocessingProtocol'] rai:dataPreprocessingProtocol dataPreprocessingProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing
-Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationProtocol'] rai:dataAnnotationProtocol dataAnnotationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing
-Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationAnalysis'] rai:dataAnnotationAnalysis dataAnnotationAnalysis semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing
-Dataset.machine_annotation_tools d4d:machine_annotation_tools Machine Annotation Tools D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:machineAnnotationTools'] rai:machineAnnotationTools machineAnnotationTools semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false false D4D_Preprocessing
-Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Data_Governance
-Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Data_Governance
-Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism D4D_Maintenance skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Maintenance
-Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Data_Governance
-Dataset.compression d4d:compression Compression Unknown skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true Unknown
-Dataset.dialect d4d:dialect Dialect D4D_Base skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.media_type d4d:media_type Media Type D4D_Base skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.participant_privacy d4d:participant_privacy Participant Privacy D4D_Human skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false false D4D_Human
-Dataset.themes d4d:themes Themes Unknown skos:closeMatch @graph[?@type='Dataset']['about'] schema:about about semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false false Unknown
-Dataset.external_resources d4d:external_resources External Resources D4D_Base skos:closeMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false D4D_Base
-Dataset.instances d4d:instances Instances D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.subpopulations d4d:subpopulations Subpopulations D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition
-Dataset.resources d4d:resources Resources D4D_Base skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base
-Dataset.data_collectors d4d:data_collectors Data Collectors D4D_Collection skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection
-Dataset.maintainers d4d:maintainers Maintainers D4D_Maintenance skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Maintenance
-Dataset.subsets d4d:subsets Subsets D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Composition
-Dataset.sampling_strategies d4d:sampling_strategies Sampling Strategies D4D_Collection skos:relatedMatch @graph[?@type='Dataset']['evi:samplingPlan'] evi:samplingPlan samplingPlan semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection
-Dataset.version_access d4d:version_access Version Access D4D_Maintenance skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Maintenance
-Dataset.use_repository d4d:use_repository Use Repository D4D_Uses skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses
-Dataset.path d4d:path Path D4D_Base skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.is_deidentified d4d:is_deidentified Is Deidentified D4D_Base skos:narrowMatch @graph[?@type='Dataset']['rai:confidentialityLevel'] rai:confidentialityLevel confidentialityLevel semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base
-Dataset.is_tabular d4d:is_tabular Is Tabular D4D_Base skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base
-Dataset.retention_limit d4d:retention_limit Retention Limit D4D_Maintenance skos:narrowMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Maintenance
-Dataset.conforms_to_class d4d:conforms_to_class Conforms To Class Unknown skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false false Unknown
-Dataset.conforms_to_schema d4d:conforms_to_schema Conforms To Schema Unknown skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false false Unknown
+d4d_schema_path subject_id subject_label predicate_id rocrate_json_path object_id object_label mapping_justification confidence comment author_id mapping_date subject_source object_source mapping_set_id mapping_set_version in_rocrate_json in_pydantic_model in_interface_mapping
+Dataset.Dataset d4d:Dataset Dataset skos:exactMatch @graph[?@type='Dataset']['Dataset'] schema:Dataset Dataset semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false
+Dataset.title d4d:title Title skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.description d4d:description Description skos:exactMatch @graph[?@type='Dataset']['description'] schema:description description semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.doi d4d:doi Doi skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.keywords d4d:keywords Keywords skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.language d4d:language Language skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true
+Dataset.license d4d:license License skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.publisher d4d:publisher Publisher skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.version d4d:version Version skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.page d4d:page Page skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true
+Dataset.download_url d4d:download_url Download Url skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true
+Dataset.bytes d4d:bytes Bytes skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
+Dataset.md5 d4d:md5 Md5 skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true
+Dataset.sha256 d4d:sha256 Sha256 skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true
+Dataset.hash d4d:hash Hash skos:exactMatch
@graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.created_on d4d:created_on Created On skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.issued d4d:issued Issued skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.last_updated_on d4d:last_updated_on Last Updated On skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.status d4d:status Status skos:exactMatch @graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.conforms_to d4d:conforms_to Conforms To skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ 
https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false true +Dataset.was_derived_from d4d:was_derived_from Was Derived From skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps skos:exactMatch @graph[?@type='Dataset']['d4d:addressingGaps'] d4d:addressingGaps addressingGaps semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true +Dataset.anomalies d4d:anomalies Anomalies skos:exactMatch @graph[?@type='Dataset']['d4d:dataAnomalies'] d4d:dataAnomalies dataAnomalies semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true +Dataset.content_warnings d4d:content_warnings Content Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:contentWarning'] d4d:contentWarning contentWarning semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true +Dataset.informed_consent d4d:informed_consent Informed Consent skos:exactMatch @graph[?@type='Dataset']['d4d:informedConsent'] d4d:informedConsent informedConsent semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true 
+Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionTimeframe'] rai:dataCollectionTimeframe dataCollectionTimeframe semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.confidential_elements d4d:confidential_elements Confidential Elements skos:exactMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 
https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.distribution_dates d4d:distribution_dates Distribution Dates skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.errata d4d:errata Errata skos:exactMatch @graph[?@type='Dataset']['correction'] schema:correction correction semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ 
https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews skos:exactMatch @graph[?@type='Dataset']['rai:ethicalReview'] rai:ethicalReview ethicalReview semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.existing_uses d4d:existing_uses Existing Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.intended_uses d4d:intended_uses Intended Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.other_tasks d4d:other_tasks Other Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.tasks d4d:tasks Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.purposes d4d:purposes Purposes 
skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.known_biases d4d:known_biases Known Biases skos:exactMatch @graph[?@type='Dataset']['rai:dataBiases'] rai:dataBiases dataBiases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.known_limitations d4d:known_limitations Known Limitations skos:exactMatch @graph[?@type='Dataset']['rai:dataLimitations'] rai:dataLimitations dataLimitations semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols skos:exactMatch @graph[?@type='Dataset']['rai:imputationProtocol'] rai:imputationProtocol imputationProtocol semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionMissingData'] rai:dataCollectionMissingData dataCollectionMissingData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.raw_data_sources 
d4d:raw_data_sources Raw Data Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.raw_sources d4d:raw_sources Raw Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.updates d4d:updates Updates skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.human_subject_research d4d:human_subject_research Human Subject Research skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.at_risk_populations d4d:at_risk_populations At Risk Populations skos:exactMatch @graph[?@type='Dataset']['d4d:atRiskPopulations'] d4d:atRiskPopulations atRiskPopulations semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ 
d4d-rocrate-alignment-v1 1.0 false true false +Dataset.distribution_formats d4d:distribution_formats Distribution Formats skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.encoding d4d:encoding Encoding skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.funders d4d:funders Funders skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.citation d4d:citation Citation skos:exactMatch @graph[?@type='Dataset']['citation'] schema:citation citation semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true false +Dataset.format d4d:format Format skos:exactMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false +DatasetCollection.parent_datasets d4d:parent_datasets Parent Datasets skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic 
https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +DatasetCollection.related_datasets d4d:related_datasets Related Datasets skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.same_as d4d:same_as Same As skos:exactMatch @graph[?@type='Dataset']['sameAs'] schema:sameAs sameAs semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false +Dataset.variables d4d:variables Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.id d4d:id Id skos:exactMatch @graph[?@type='Dataset']['@id'] rdf:ID ID semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ unknown d4d-rocrate-alignment-v1 1.0 false false false +Dataset.participant_compensation d4d:participant_compensation Participant Compensation skos:exactMatch @graph[?@type='Dataset']['d4d:participantCompensation'] d4d:participantCompensation participantCompensation semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false false +Dataset.creators 
d4d:creators Creators skos:closeMatch @graph[?@type='Dataset']['author'] schema:author author semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.created_by d4d:created_by Created By skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.modified_by d4d:modified_by Modified By skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies skos:closeMatch @graph[?@type='Dataset']['rai:dataManipulationProtocol'] rai:dataManipulationProtocol dataManipulationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies skos:closeMatch 
@graph[?@type='Dataset']['rai:dataPreprocessingProtocol'] rai:dataPreprocessingProtocol dataPreprocessingProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationProtocol'] rai:dataAnnotationProtocol dataAnnotationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationAnalysis'] rai:dataAnnotationAnalysis dataAnnotationAnalysis semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.machine_annotation_tools d4d:machine_annotation_tools Machine Annotation Tools skos:closeMatch @graph[?@type='Dataset']['rai:machineAnnotationTools'] rai:machineAnnotationTools machineAnnotationTools semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false false +Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ 
d4d-rocrate-alignment-v1 1.0 true true true +Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.compression d4d:compression Compression skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.dialect d4d:dialect Dialect skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.media_type d4d:media_type Media Type skos:closeMatch 
@graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.participant_privacy d4d:participant_privacy Participant Privacy skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false false +Dataset.themes d4d:themes Themes skos:closeMatch @graph[?@type='Dataset']['about'] schema:about about semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false false +Dataset.external_resources d4d:external_resources External Resources skos:closeMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false false +Dataset.instances d4d:instances Instances skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.subpopulations d4d:subpopulations Subpopulations skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 
Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.resources d4d:resources Resources skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.data_collectors d4d:data_collectors Data Collectors skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.maintainers d4d:maintainers Maintainers skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.subsets d4d:subsets Subsets skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.sampling_strategies d4d:sampling_strategies Sampling Strategies skos:relatedMatch @graph[?@type='Dataset']['evi:samplingPlan'] evi:samplingPlan samplingPlan semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.version_access 
d4d:version_access Version Access skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.use_repository d4d:use_repository Use Repository skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.path d4d:path Path skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.is_deidentified d4d:is_deidentified Is Deidentified skos:narrowMatch @graph[?@type='Dataset']['rai:confidentialityLevel'] rai:confidentialityLevel confidentialityLevel semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.is_tabular d4d:is_tabular Is Tabular skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.retention_limit d4d:retention_limit Retention Limit skos:narrowMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess 
semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.conforms_to_class d4d:conforms_to_class Conforms To Class skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false false +Dataset.conforms_to_schema d4d:conforms_to_schema Conforms To Schema skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false false diff --git a/data/mappings/d4d_rocrate_sssom_mapping_subset.tsv b/data/mappings/d4d_rocrate_sssom_mapping_subset.tsv index c60c13e5..e0e4e7b0 100644 --- a/data/mappings/d4d_rocrate_sssom_mapping_subset.tsv +++ b/data/mappings/d4d_rocrate_sssom_mapping_subset.tsv @@ -1,92 +1,89 @@ # SSSOM (Simple Standard for Sharing Ontology Mappings) # Generated from D4D SKOS alignment -# Date: 2026-03-19T23:07:51.071089 +# Date: 2026-03-25T22:41:05.058238 # Subset: True -# Total mappings: 83 +# Total mappings: 82 # -# d4d_module: D4D schema module containing this attribute -# -d4d_schema_path subject_id subject_label d4d_module predicate_id rocrate_json_path object_id object_label mapping_justification confidence comment author_id mapping_date subject_source object_source mapping_set_id mapping_set_version in_rocrate_json in_pydantic_model in_interface_mapping d4d_module -Dataset.title d4d:title Title Unknown skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + 
Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.description d4d:description Description Unknown skos:exactMatch @graph[?@type='Dataset']['description'] schema:description description semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.doi d4d:doi Doi Unknown skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.keywords d4d:keywords Keywords Unknown skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.language d4d:language Language Unknown skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.license d4d:license License Unknown skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.publisher 
d4d:publisher Publisher Unknown skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.version d4d:version Version Unknown skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.page d4d:page Page Unknown skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.download_url d4d:download_url Download Url Unknown skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.bytes d4d:bytes Bytes D4D_Base skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base -Dataset.md5 d4d:md5 Md5 D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 
https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.sha256 d4d:sha256 Sha256 D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.hash d4d:hash Hash D4D_Base skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.created_on d4d:created_on Created On Unknown skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.issued d4d:issued Issued Unknown skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true Unknown -Dataset.last_updated_on d4d:last_updated_on Last Updated On Unknown skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.status d4d:status Status Unknown skos:exactMatch 
@graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.conforms_to d4d:conforms_to Conforms To Unknown skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false true Unknown -Dataset.was_derived_from d4d:was_derived_from Was Derived From Unknown skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['d4d:addressingGaps'] d4d:addressingGaps addressingGaps semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true D4D_Motivation -Dataset.anomalies d4d:anomalies Anomalies D4D_Composition skos:exactMatch @graph[?@type='Dataset']['d4d:anomalies'] d4d:anomalies anomalies semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.content_warnings d4d:content_warnings Content Warnings D4D_Composition skos:exactMatch 
@graph[?@type='Dataset']['d4d:contentWarnings'] d4d:contentWarnings contentWarnings semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.informed_consent d4d:informed_consent Informed Consent D4D_Human skos:exactMatch @graph[?@type='Dataset']['d4d:informedConsent'] d4d:informedConsent informedConsent semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true D4D_Human -Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection -Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection -Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes D4D_Collection skos:exactMatch @graph[?@type='Dataset']['d4d:dataCollectionTimeframe'] d4d:dataCollectionTimeframe dataCollectionTimeframe semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 
https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection -Dataset.confidential_elements d4d:confidential_elements Confidential Elements D4D_Composition skos:exactMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts D4D_Ethics skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Ethics -Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses -Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Uses -Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses D4D_Uses 
skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Uses -Dataset.distribution_dates d4d:distribution_dates Distribution Dates D4D_Distribution skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Distribution -Dataset.errata d4d:errata Errata D4D_Maintenance skos:exactMatch @graph[?@type='Dataset']['correction'] schema:correction correction semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Maintenance -Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews D4D_Ethics skos:exactMatch @graph[?@type='Dataset']['rai:ethicalReview'] rai:ethicalReview ethicalReview semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Ethics -Dataset.existing_uses d4d:existing_uses Existing Uses D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses -Dataset.intended_uses d4d:intended_uses Intended Uses 
D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses -Dataset.other_tasks d4d:other_tasks Other Tasks D4D_Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses -Dataset.tasks d4d:tasks Tasks D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Motivation -Dataset.purposes d4d:purposes Purposes D4D_Motivation skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Motivation -Dataset.known_biases d4d:known_biases Known Biases D4D_Composition skos:exactMatch @graph[?@type='Dataset']['rai:dataBiases'] rai:dataBiases dataBiases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.known_limitations d4d:known_limitations Known Limitations D4D_Composition 
skos:exactMatch @graph[?@type='Dataset']['rai:dataLimitations'] rai:dataLimitations dataLimitations semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['rai:imputationProtocol'] rai:imputationProtocol imputationProtocol semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing -Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionMissingData'] rai:dataCollectionMissingData dataCollectionMissingData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection -Dataset.raw_data_sources d4d:raw_data_sources Raw Data Sources D4D_Collection skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection -Dataset.raw_sources d4d:raw_sources Raw Sources D4D_Preprocessing skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification 
https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing -Dataset.updates d4d:updates Updates D4D_Maintenance skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Maintenance -Dataset.human_subject_research d4d:human_subject_research Human Subject Research D4D_Human skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Human -Dataset.vulnerable_populations d4d:vulnerable_populations Vulnerable Populations Unknown skos:exactMatch @graph[?@type='Dataset']['rai:atRiskPopulations'] rai:atRiskPopulations atRiskPopulations semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false true true Unknown -Dataset.distribution_formats d4d:distribution_formats Distribution Formats D4D_Distribution skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Distribution -Dataset.encoding d4d:encoding Encoding D4D_Base skos:closeMatch 
@graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.funders d4d:funders Funders D4D_Motivation skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Motivation -DatasetCollection.parent_datasets d4d:parent_datasets Parent Datasets D4D_Base skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base -DatasetCollection.related_datasets d4d:related_datasets Related Datasets D4D_Base skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.variables d4d:variables Variables D4D_Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Variables -Dataset.creators d4d:creators Creators D4D_Motivation skos:closeMatch @graph[?@type='Dataset']['author'] schema:author author semapv:ManualMappingCuration 0.9 Source: 
RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Motivation -Dataset.created_by d4d:created_by Created By Unknown skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.modified_by d4d:modified_by Modified By Unknown skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements D4D_Composition skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:dataManipulationProtocol'] rai:dataManipulationProtocol dataManipulationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing -Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies D4D_Preprocessing skos:closeMatch 
@graph[?@type='Dataset']['rai:dataPreprocessingProtocol'] rai:dataPreprocessingProtocol dataPreprocessingProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing -Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationProtocol'] rai:dataAnnotationProtocol dataAnnotationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing -Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses D4D_Preprocessing skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationAnalysis'] rai:dataAnnotationAnalysis dataAnnotationAnalysis semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Preprocessing -Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Data_Governance -Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic 
https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Data_Governance -Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism D4D_Maintenance skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Maintenance -Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions D4D_Data_Governance skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Data_Governance -Dataset.compression d4d:compression Compression Unknown skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true Unknown -Dataset.dialect d4d:dialect Dialect D4D_Base skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.media_type d4d:media_type Media Type D4D_Base skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification 
https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.instances d4d:instances Instances D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.subpopulations d4d:subpopulations Subpopulations D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Composition -Dataset.resources d4d:resources Resources D4D_Base skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base -Dataset.data_collectors d4d:data_collectors Data Collectors D4D_Collection skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection -Dataset.maintainers d4d:maintainers Maintainers D4D_Maintenance skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 
https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Maintenance -Dataset.subsets d4d:subsets Subsets D4D_Composition skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Composition -Dataset.sampling_strategies d4d:sampling_strategies Sampling Strategies D4D_Collection skos:relatedMatch @graph[?@type='Dataset']['evi:samplingPlan'] evi:samplingPlan samplingPlan semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true D4D_Collection -Dataset.version_access d4d:version_access Version Access D4D_Maintenance skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Maintenance -Dataset.use_repository d4d:use_repository Use Repository D4D_Uses skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Uses -Dataset.path d4d:path Path D4D_Base skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ 
d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.is_deidentified d4d:is_deidentified Is Deidentified D4D_Base skos:narrowMatch @graph[?@type='Dataset']['rai:confidentialityLevel'] rai:confidentialityLevel confidentialityLevel semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Base -Dataset.is_tabular d4d:is_tabular Is Tabular D4D_Base skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true D4D_Base -Dataset.retention_limit d4d:retention_limit Retention Limit D4D_Maintenance skos:narrowMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-19 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true D4D_Maintenance +d4d_schema_path subject_id subject_label predicate_id rocrate_json_path object_id object_label mapping_justification confidence comment author_id mapping_date subject_source object_source mapping_set_id mapping_set_version in_rocrate_json in_pydantic_model in_interface_mapping +Dataset.title d4d:title Title skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.description d4d:description Description skos:exactMatch 
@graph[?@type='Dataset']['description'] schema:description description semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.doi d4d:doi Doi skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.keywords d4d:keywords Keywords skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.language d4d:language Language skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.license d4d:license License skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.publisher d4d:publisher Publisher skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true 
true +Dataset.version d4d:version Version skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.page d4d:page Page skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.download_url d4d:download_url Download Url skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.bytes d4d:bytes Bytes skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.md5 d4d:md5 Md5 skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.sha256 d4d:sha256 Sha256 skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.hash 
d4d:hash Hash skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.created_on d4d:created_on Created On skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.issued d4d:issued Issued skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.last_updated_on d4d:last_updated_on Last Updated On skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.status d4d:status Status skos:exactMatch @graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.conforms_to d4d:conforms_to Conforms To skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON https://orcid.org/0000-0000-0000-0000 2026-03-25 
https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true false true +Dataset.was_derived_from d4d:was_derived_from Was Derived From skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps skos:exactMatch @graph[?@type='Dataset']['d4d:addressingGaps'] d4d:addressingGaps addressingGaps semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true +Dataset.anomalies d4d:anomalies Anomalies skos:exactMatch @graph[?@type='Dataset']['d4d:dataAnomalies'] d4d:dataAnomalies dataAnomalies semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true +Dataset.content_warnings d4d:content_warnings Content Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:contentWarning'] d4d:contentWarning contentWarning semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true +Dataset.informed_consent d4d:informed_consent Informed Consent skos:exactMatch @graph[?@type='Dataset']['d4d:informedConsent'] d4d:informedConsent informedConsent semapv:ManualMappingCuration 1.0 Source: Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ 
https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false true true +Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionTimeframe'] rai:dataCollectionTimeframe dataCollectionTimeframe semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.confidential_elements d4d:confidential_elements Confidential Elements skos:exactMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact 
semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts skos:exactMatch @graph[?@type='Dataset']['rai:dataSocialImpact'] rai:dataSocialImpact dataSocialImpact semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.distribution_dates d4d:distribution_dates Distribution Dates skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.errata d4d:errata Errata skos:exactMatch @graph[?@type='Dataset']['correction'] schema:correction correction semapv:ManualMappingCuration 1.0 Source: Specification 
https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews skos:exactMatch @graph[?@type='Dataset']['rai:ethicalReview'] rai:ethicalReview ethicalReview semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.existing_uses d4d:existing_uses Existing Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.intended_uses d4d:intended_uses Intended Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.other_tasks d4d:other_tasks Other Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.tasks d4d:tasks Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ 
d4d-rocrate-alignment-v1 1.0 false false true +Dataset.purposes d4d:purposes Purposes skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.known_biases d4d:known_biases Known Biases skos:exactMatch @graph[?@type='Dataset']['rai:dataBiases'] rai:dataBiases dataBiases semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.known_limitations d4d:known_limitations Known Limitations skos:exactMatch @graph[?@type='Dataset']['rai:dataLimitations'] rai:dataLimitations dataLimitations semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols skos:exactMatch @graph[?@type='Dataset']['rai:imputationProtocol'] rai:imputationProtocol imputationProtocol semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionMissingData'] rai:dataCollectionMissingData dataCollectionMissingData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ 
http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.raw_data_sources d4d:raw_data_sources Raw Data Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.raw_sources d4d:raw_sources Raw Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.updates d4d:updates Updates skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.human_subject_research d4d:human_subject_research Human Subject Research skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.distribution_formats d4d:distribution_formats Distribution Formats skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 
https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.encoding d4d:encoding Encoding skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.funders d4d:funders Funders skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +DatasetCollection.parent_datasets d4d:parent_datasets Parent Datasets skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +DatasetCollection.related_datasets d4d:related_datasets Related Datasets skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.variables d4d:variables Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.creators d4d:creators Creators skos:closeMatch @graph[?@type='Dataset']['author'] 
schema:author author semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.created_by d4d:created_by Created By skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.modified_by d4d:modified_by Modified By skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies skos:closeMatch @graph[?@type='Dataset']['rai:dataManipulationProtocol'] rai:dataManipulationProtocol dataManipulationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies skos:closeMatch @graph[?@type='Dataset']['rai:dataPreprocessingProtocol'] 
rai:dataPreprocessingProtocol dataPreprocessingProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationProtocol'] rai:dataAnnotationProtocol dataAnnotationProtocol semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses skos:closeMatch @graph[?@type='Dataset']['rai:dataAnnotationAnalysis'] rai:dataAnnotationAnalysis dataAnnotationAnalysis semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism 
skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.compression d4d:compression Compression skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.dialect d4d:dialect Dialect skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.media_type d4d:media_type Media Type skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.instances d4d:instances Instances skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Source: Specification 
https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.subpopulations d4d:subpopulations Subpopulations skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.resources d4d:resources Resources skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.data_collectors d4d:data_collectors Data Collectors skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.maintainers d4d:maintainers Maintainers skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.subsets d4d:subsets Subsets skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.sampling_strategies d4d:sampling_strategies 
Sampling Strategies skos:relatedMatch @graph[?@type='Dataset']['evi:samplingPlan'] evi:samplingPlan samplingPlan semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-alignment-v1 1.0 false false true +Dataset.version_access d4d:version_access Version Access skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.use_repository d4d:use_repository Use Repository skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.path d4d:path Path skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Source: Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true +Dataset.is_deidentified d4d:is_deidentified Is Deidentified skos:narrowMatch @graph[?@type='Dataset']['rai:confidentialityLevel'] rai:confidentialityLevel confidentialityLevel semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-alignment-v1 1.0 true true true +Dataset.is_tabular d4d:is_tabular Is Tabular skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Source: 
Specification https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 false false true
+Dataset.retention_limit d4d:retention_limit Retention Limit skos:narrowMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.8 Source: RO-Crate JSON + Pydantic https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-alignment-v1 1.0 true true true
diff --git a/data/mappings/d4d_rocrate_sssom_uri_comprehensive_v1.tsv b/data/mappings/d4d_rocrate_sssom_uri_comprehensive_v1.tsv
index c2141e9d..353dd4b0 100644
--- a/data/mappings/d4d_rocrate_sssom_uri_comprehensive_v1.tsv
+++ b/data/mappings/d4d_rocrate_sssom_uri_comprehensive_v1.tsv
@@ -1,288 +1,300 @@
 # Comprehensive URI-level SSSOM - ALL D4D Attributes
 # Shows current and recommended slot_uri for every attribute
-# Date: 2026-03-19T23:47:16.197869
-# Total attributes: 270
+# Date: 2026-04-09T10:17:21.922299
+# Total attributes: 284
 #
 # Status breakdown:
-# free_text: 54
-# mapped: 67
-# novel_d4d: 42
+# free_text: 55
+# mapped: 63
+# novel_d4d: 45
 # recommended: 69
-# unmapped: 38
+# unmapped: 52
 #
-# Current slot_uri coverage: 31/270 (11.5%)
-# Attributes needing slot_uri: 111/270 (41.1%)
+# Current slot_uri coverage: 31/284 (10.9%)
+# Attributes needing slot_uri: 114/284 (40.1%)
 #
-# d4d_module: D4D schema module containing this attribute
-#
-d4d_slot_name d4d_module d4d_slot_uri_current subject_source predicate_id d4d_slot_uri_recommended object_id object_label object_source confidence mapping_justification comment mapping_status needs_slot_uri vocab_crosswalk author_id mapping_date mapping_set_id mapping_set_version d4d_module
-access_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A
https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -access_url Unknown skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -access_urls Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -acquisition_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -acquisition_methods D4D_Collection skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Collection -addressing_gaps D4D_Motivation skos:exactMatch d4d:addressing_gaps d4d:addressing_gaps addressing_gaps https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Motivation -affected_subsets Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -affiliation Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown 
-affiliations Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -agreement_metric Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -analysis_method Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -annotation_analyses D4D_Preprocessing skos:exactMatch d4d:annotation_analyses d4d:annotation_analyses annotation_analyses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Preprocessing -annotation_quality_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -annotations_per_item Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -annotator_demographics Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -anomalies D4D_Composition skos:exactMatch d4d:anomalies d4d:anomalies anomalies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary 
mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition -anomaly_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -anonymization_method Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -archival Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -assent_procedures Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -bias_description Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -bias_type Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -bytes D4D_Base dcat:byteSize https://www.w3.org/ns/dcat# skos:exactMatch schema:contentSize schema:contentSize contentSize https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base -categories Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A 
https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -citation D4D_Base skos:exactMatch schema:citation schema:citation citation https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base -cleaning_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -cleaning_strategies D4D_Preprocessing skos:exactMatch d4d:cleaning_strategies d4d:cleaning_strategies cleaning_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Preprocessing -collection_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -collection_mechanisms D4D_Collection skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Collection -collection_timeframes D4D_Collection skos:exactMatch d4d:dataCollectionTimeframe d4d:dataCollectionTimeframe dataCollectionTimeframe https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Collection -collector_details Unknown semapv:UnmappableProperty 0.0 
semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -comment_prefix Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -compensation_amount Unknown skos:exactMatch d4d:compensation_amount d4d:compensation_amount compensation_amount https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -compensation_provided Unknown skos:exactMatch d4d:compensation_provided d4d:compensation_provided compensation_provided https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -compensation_rationale Unknown skos:exactMatch d4d:compensation_rationale d4d:compensation_rationale compensation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -compensation_type Unknown skos:exactMatch d4d:compensation_type d4d:compensation_type compensation_type https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -compression Unknown dcat:compressFormat https://www.w3.org/ns/dcat# skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 
semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -confidential_elements D4D_Composition skos:exactMatch d4d:confidential_elements d4d:confidential_elements confidential_elements https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition -confidential_elements_present Unknown skos:exactMatch d4d:confidential_elements_present d4d:confidential_elements_present confidential_elements_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -confidentiality_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -confidentiality_level Unknown skos:exactMatch d4d:confidentiality_level d4d:confidentiality_level confidentiality_level https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -conforms_to Unknown dcterms:conformsTo http://purl.org/dc/terms/ skos:exactMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -conforms_to_class Unknown dcterms:conformsTo http://purl.org/dc/terms/ skos:narrowMatch schema:conformsTo 
schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -conforms_to_schema Unknown dcterms:conformsTo http://purl.org/dc/terms/ skos:narrowMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -consent_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -consent_documentation Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -consent_obtained Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -consent_scope Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -consent_type Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -contact_person Unknown skos:exactMatch d4d:contact_person d4d:contact_person contact_person https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A 
https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -content_warnings D4D_Composition skos:exactMatch d4d:content_warnings d4d:content_warnings content_warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition -content_warnings_present Unknown skos:exactMatch d4d:content_warnings_present d4d:content_warnings_present content_warnings_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -contribution_url Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -counts Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -created_by Unknown dcterms:creator http://purl.org/dc/terms/ skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -created_on Unknown dcterms:created http://purl.org/dc/terms/ skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -creators D4D_Motivation skos:closeMatch schema:author schema:author 
author https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Motivation -credit_roles Unknown skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -data_annotation_platform Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -data_annotation_protocol Unknown skos:exactMatch d4d:data_annotation_protocol d4d:data_annotation_protocol data_annotation_protocol https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -data_collectors D4D_Collection skos:relatedMatch schema:contributor schema:contributor contributor https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Collection -data_linkage Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -data_protection_impacts D4D_Ethics skos:exactMatch d4d:data_protection_impacts d4d:data_protection_impacts data_protection_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 
d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Ethics -data_substrate Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -data_topic Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -data_type Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -data_use_permission Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -deidentification_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -delimiter Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -derivation Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -description Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -dialect D4D_Base schema:encodingFormat https://schema.org/ skos:closeMatch 
schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base -disagreement_patterns Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -discouraged_uses D4D_Uses skos:exactMatch rai:prohibitedUses rai:prohibitedUses prohibitedUses http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Uses -discouragement_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -distribution Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -distribution_dates D4D_Distribution skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Distribution -distribution_formats D4D_Distribution skos:exactMatch evi:formats evi:formats formats https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Distribution -doi Unknown dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch 
schema:identifier schema:identifier identifier https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -double_quote Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -download_url Unknown dcat:downloadURL https://www.w3.org/ns/dcat# skos:exactMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -email Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -encoding D4D_Base dcat:mediaType https://www.w3.org/ns/dcat# skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base -end_date Unknown skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -errata D4D_Maintenance skos:exactMatch d4d:errata d4d:errata errata https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Maintenance -erratum_details Unknown semapv:UnmappableProperty 0.0 
semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -erratum_url Unknown skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -ethical_reviews D4D_Ethics skos:exactMatch d4d:ethical_reviews d4d:ethical_reviews ethical_reviews https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Ethics -ethics_review_board Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -examples Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -existing_uses D4D_Uses skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Uses -extension_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -extension_mechanism D4D_Maintenance skos:closeMatch schema:license schema:license license https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate 
vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Maintenance -external_resources D4D_Base dcterms:references http://purl.org/dc/terms/ skos:closeMatch schema:relatedLink schema:relatedLink relatedLink https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base -format D4D_Base dcterms:format http://purl.org/dc/terms/ skos:exactMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base -frequency Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -funders D4D_Motivation skos:exactMatch schema:funder schema:funder funder https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Motivation -future_guarantees Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -future_use_impacts D4D_Uses skos:exactMatch d4d:future_use_impacts d4d:future_use_impacts future_use_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Uses -governance_committee_contact Unknown skos:exactMatch 
d4d:governance_committee_contact d4d:governance_committee_contact governance_committee_contact https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -grant_number Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -grantor Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -grants Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -guardian_consent Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -handling_strategy Unknown skos:exactMatch d4d:handling_strategy d4d:handling_strategy handling_strategy https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -hash D4D_Base dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base -header Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri 
(confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-hipaa_compliant Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-human_subject_research D4D_Human skos:exactMatch d4d:humanSubject d4d:humanSubject humanSubject https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Human
-id Unknown skos:exactMatch rdf:ID rdf:ID ID unknown 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-identifiable_elements_present Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-identification Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-identifiers_removed Unknown skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-impact_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-imputation_method Unknown skos:exactMatch d4d:imputation_method d4d:imputation_method imputation_method https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-imputation_protocols D4D_Preprocessing skos:exactMatch d4d:imputation_protocols d4d:imputation_protocols imputation_protocols https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Preprocessing
-imputation_rationale Unknown skos:exactMatch d4d:imputation_rationale d4d:imputation_rationale imputation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-imputation_validation Unknown skos:exactMatch d4d:imputation_validation d4d:imputation_validation imputation_validation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-imputed_fields Unknown skos:exactMatch d4d:imputed_fields d4d:imputed_fields imputed_fields https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-informed_consent D4D_Human semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Human
-instance_type Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-instances D4D_Composition skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition
-intended_uses D4D_Uses skos:exactMatch d4d:intended_uses d4d:intended_uses intended_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Uses
-inter_annotator_agreement Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-inter_annotator_agreement_score Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-involves_human_subjects Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-ip_restrictions D4D_Data_Governance skos:closeMatch schema:conditionsOfAccess schema:conditionsOfAccess conditionsOfAccess https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Data_Governance
-irb_approval Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_data_split D4D_Base semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-is_deidentified D4D_Base skos:exactMatch d4d:is_deidentified d4d:is_deidentified is_deidentified https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-is_direct Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_identifier Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_random Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_representative Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_sample Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_sensitive Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_shared Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-is_subpopulation D4D_Base semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-is_tabular D4D_Base skos:narrowMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-issued Unknown dcterms:issued http://purl.org/dc/terms/ skos:exactMatch schema:datePublished schema:datePublished datePublished https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-keywords Unknown dcat:keyword https://www.w3.org/ns/dcat# skos:exactMatch schema:keywords schema:keywords keywords https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-known_biases D4D_Composition skos:exactMatch d4d:known_biases d4d:known_biases known_biases https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition
-known_limitations D4D_Composition skos:exactMatch d4d:known_limitations d4d:known_limitations known_limitations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition
-label Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-label_description Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-labeling_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-labeling_strategies D4D_Preprocessing skos:exactMatch d4d:labeling_strategies d4d:labeling_strategies labeling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Preprocessing
-language Unknown dcterms:language http://purl.org/dc/terms/ skos:exactMatch schema:inLanguage schema:inLanguage inLanguage https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-last_updated_on Unknown dcterms:modified http://purl.org/dc/terms/ skos:exactMatch schema:dateModified schema:dateModified dateModified https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-latest_version_doi Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-license Unknown dcterms:license http://purl.org/dc/terms/ skos:exactMatch schema:license schema:license license https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-license_and_use_terms D4D_Data_Governance skos:closeMatch schema:license schema:license license https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Data_Governance
-license_terms Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-limitation_description Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-limitation_type Unknown skos:closeMatch schema:temporalCoverage schema:temporalCoverage temporalCoverage https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-machine_annotation_tools D4D_Preprocessing skos:closeMatch rai:machineAnnotationTools rai:machineAnnotationTools machineAnnotationTools http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Preprocessing
-maintainer_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-maintainers D4D_Maintenance skos:relatedMatch schema:maintainer schema:maintainer maintainer https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Maintenance
-maximum_value Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-md5 D4D_Base dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-measurement_technique Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-mechanism_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-media_type D4D_Base dcat:mediaType https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-method Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-minimum_value Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-missing Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-missing_data_causes Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-missing_data_documentation D4D_Collection semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Collection
-missing_data_patterns Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-missing_information Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-missing_value_code Unknown skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-mitigation_strategy Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-modified_by Unknown dcterms:contributor http://purl.org/dc/terms/ skos:closeMatch schema:contributor schema:contributor contributor https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-name Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-notification_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-orcid Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-other_compliance Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-other_tasks D4D_Uses skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Uses
-page Unknown dcat:landingPage https://www.w3.org/ns/dcat# skos:exactMatch schema:url schema:url url https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-parent_datasets D4D_Base skos:exactMatch schema:isPartOf schema:isPartOf isPartOf https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-participant_compensation D4D_Human skos:exactMatch d4d:participant_compensation d4d:participant_compensation participant_compensation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Human
-participant_privacy D4D_Human skos:closeMatch rai:personalSensitiveInformation rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Human
-path D4D_Base schema:contentUrl https://schema.org/ skos:narrowMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-precision Unknown skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-preprocessing_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-preprocessing_strategies D4D_Preprocessing skos:exactMatch d4d:preprocessing_strategies d4d:preprocessing_strategies preprocessing_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Preprocessing
-principal_investigator Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-privacy_techniques Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-prohibited_uses D4D_Uses skos:exactMatch d4d:prohibited_uses d4d:prohibited_uses prohibited_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Uses
-prohibition_reason Unknown skos:exactMatch d4d:prohibition_reason d4d:prohibition_reason prohibition_reason https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-publisher Unknown dcterms:publisher http://purl.org/dc/terms/ skos:exactMatch schema:publisher schema:publisher publisher https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-purposes D4D_Motivation skos:closeMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Motivation
-quality_notes Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-quote_char Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-raw_data_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-raw_data_format Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-raw_data_sources D4D_Collection semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Collection
-raw_sources D4D_Preprocessing skos:exactMatch rai:dataCollectionRawData rai:dataCollectionRawData dataCollectionRawData http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Preprocessing
-recommended_mitigation Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-regulatory_compliance Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-regulatory_restrictions D4D_Data_Governance skos:closeMatch schema:conditionsOfAccess schema:conditionsOfAccess conditionsOfAccess https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Data_Governance
-reidentification_risk Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-related_datasets D4D_Base skos:exactMatch schema:isRelatedTo schema:isRelatedTo isRelatedTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-relationship_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-relationship_type Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-release_dates Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-repository_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-repository_url Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-representative_verification Unknown skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-resources D4D_Base schema:hasPart https://schema.org/ skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-response Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-restrictions Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-retention_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-retention_limit D4D_Maintenance skos:exactMatch d4d:retention_limit d4d:retention_limit retention_limit https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Maintenance
-retention_period Unknown skos:exactMatch d4d:retention_period d4d:retention_period retention_period https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-review_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-reviewing_organization Unknown skos:exactMatch d4d:reviewing_organization d4d:reviewing_organization reviewing_organization https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-revocation_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-role Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-sampling_strategies D4D_Collection skos:exactMatch d4d:sampling_strategies d4d:sampling_strategies sampling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Collection
-scope_impact Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-sensitive_elements D4D_Composition skos:closeMatch rai:personalSensitiveInformation rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition
-sensitive_elements_present Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-sensitivity_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-sha256 D4D_Base dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch evi:sha256 evi:sha256 sha256 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Base
-source_data Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-source_description Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-source_type Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-special_populations Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-special_protections Unknown skos:exactMatch d4d:special_protections d4d:special_protections special_protections https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-split_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-start_date Unknown skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-status Unknown dcterms:type http://purl.org/dc/terms/ skos:exactMatch schema:creativeWorkStatus schema:creativeWorkStatus creativeWorkStatus https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-strategies Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-subpopulation_elements_present Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-subpopulations D4D_Composition skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition
-subsets D4D_Composition skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Composition
-target_dataset Unknown skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-task_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-tasks D4D_Motivation skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Motivation
-timeframe_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-title Unknown dcterms:title http://purl.org/dc/terms/ skos:exactMatch schema:name schema:name name https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-tool_accuracy Unknown skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-tool_descriptions Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-tools Unknown skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-unit Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-update_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-updates D4D_Maintenance skos:exactMatch rai:dataReleaseMaintenancePlan rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Maintenance
-url Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-usage_notes Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-use_category Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-use_repository D4D_Uses skos:relatedMatch schema:relatedLink schema:relatedLink relatedLink https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Uses
-used_software Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-variable_name Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-variables D4D_Variables skos:exactMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Variables
-version Unknown dcterms:hasVersion http://purl.org/dc/terms/ skos:exactMatch schema:version schema:version version https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-version_access D4D_Maintenance skos:relatedMatch schema:version schema:version version https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 D4D_Maintenance
-version_details Unknown semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown
-versions_available Unknown semapv:UnmappedProperty 0.5
semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -vulnerable_groups_included Unknown skos:exactMatch d4d:vulnerable_groups_included d4d:vulnerable_groups_included vulnerable_groups_included https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -vulnerable_populations Unknown skos:exactMatch d4d:vulnerable_populations d4d:vulnerable_populations vulnerable_populations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -warnings Unknown skos:exactMatch d4d:warnings d4d:warnings warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -was_derived_from Unknown prov:wasDerivedFrom http://www.w3.org/ns/prov# skos:exactMatch schema:isBasedOn schema:isBasedOn isBasedOn https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -was_directly_observed Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -was_inferred_derived Unknown skos:closeMatch prov:wasDerivedFrom prov:wasDerivedFrom wasDerivedFrom http://www.w3.org/ns/prov# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) 
recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -was_reported_by_subjects Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -was_validated_verified Unknown skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -why_missing Unknown semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -why_not_representative Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown -withdrawal_mechanism Unknown semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-comprehensive-v1 1.0 Unknown +d4d_slot_name d4d_slot_uri_current subject_source predicate_id d4d_slot_uri_recommended object_id object_label object_source confidence mapping_justification comment mapping_status needs_slot_uri vocab_crosswalk author_id mapping_date mapping_set_id mapping_set_version +access_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +access_url skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) 
recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +access_urls semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +acquisition_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +acquisition_methods skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +addressing_gaps skos:exactMatch d4d:addressing_gaps d4d:addressing_gaps addressing_gaps https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +affected_subsets semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +affiliation semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +affiliations semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +agreement_metric semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 
d4d-rocrate-uri-comprehensive-v1 1.0 +analysis_method semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotation_analyses skos:exactMatch d4d:annotation_analyses d4d:annotation_analyses annotation_analyses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotation_quality_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotations_per_item semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotator_demographics semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +anomalies skos:exactMatch d4d:dataAnomalies d4d:dataAnomalies dataAnomalies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +anomaly_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +anonymization_method semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 
d4d-rocrate-uri-comprehensive-v1 1.0 +archival semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +assent_procedures semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +at_risk_groups_included semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +at_risk_populations skos:exactMatch d4d:atRiskPopulations d4d:atRiskPopulations atRiskPopulations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +bias_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +bias_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +bytes dcat:byteSize https://www.w3.org/ns/dcat# skos:exactMatch schema:contentSize schema:contentSize contentSize https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +categories semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +citation skos:exactMatch 
schema:citation schema:citation citation https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +cleaning_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +cleaning_strategies skos:exactMatch d4d:cleaning_strategies d4d:cleaning_strategies cleaning_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_consents semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_mechanisms skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_notifications semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_timeframes skos:exactMatch d4d:collection_timeframes d4d:collection_timeframes collection_timeframes https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use 
d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_type semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collector_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +comment_prefix semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_amount skos:exactMatch d4d:compensation_amount d4d:compensation_amount compensation_amount https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_provided skos:exactMatch d4d:compensation_provided d4d:compensation_provided compensation_provided https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_rationale skos:exactMatch d4d:compensation_rationale d4d:compensation_rationale compensation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_type skos:exactMatch d4d:compensation_type d4d:compensation_type compensation_type https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D 
concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compression dcat:compressFormat https://www.w3.org/ns/dcat# skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidential_elements skos:exactMatch d4d:confidential_elements d4d:confidential_elements confidential_elements https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidential_elements_present skos:exactMatch d4d:confidential_elements_present d4d:confidential_elements_present confidential_elements_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidentiality_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidentiality_level skos:exactMatch d4d:confidentiality_level d4d:confidentiality_level confidentiality_level https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +conforms_to dcterms:conformsTo http://purl.org/dc/terms/ skos:exactMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true 
https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +conforms_to_class d4d:conformsToClass https://w3id.org/bridge2ai/data-sheets-schema/ skos:narrowMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +conforms_to_schema d4d:conformsToSchema https://w3id.org/bridge2ai/data-sheets-schema/ skos:narrowMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_documentation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_obtained semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_revocations semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_scope semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) 
recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +contact_person skos:exactMatch d4d:contact_person d4d:contact_person contact_person https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +content_warnings skos:exactMatch d4d:content_warnings d4d:content_warnings content_warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +content_warnings_present skos:exactMatch d4d:content_warnings_present d4d:content_warnings_present content_warnings_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +contribution_url semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +counts semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +created_by dcterms:creator http://purl.org/dc/terms/ skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +created_on dcterms:created http://purl.org/dc/terms/ skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 
semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +creators skos:closeMatch schema:author schema:author author https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +credit_roles skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_annotation_platform semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_annotation_protocol skos:exactMatch d4d:data_annotation_protocol d4d:data_annotation_protocol data_annotation_protocol https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_collectors skos:relatedMatch schema:contributor schema:contributor contributor https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_linkage semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_protection_impacts skos:exactMatch d4d:data_protection_impacts d4d:data_protection_impacts data_protection_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use 
d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_substrate semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_topic semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_type semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_use_permission semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +deidentification_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +delimiter semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +derivation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +dialect schema:encodingFormat https://schema.org/ skos:closeMatch schema:encodingFormat schema:encodingFormat encodingFormat 
https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +direct_collection semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +disagreement_patterns semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +discouraged_uses skos:exactMatch d4d:discouraged_uses d4d:discouraged_uses discouraged_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +discouragement_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +distribution semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +distribution_dates skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +distribution_formats skos:exactMatch evi:formats evi:formats formats https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +doi d4d:doiIdentifier 
https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch schema:identifier schema:identifier identifier https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+double_quote semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+download_url dcat:downloadURL https://www.w3.org/ns/dcat# skos:exactMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+email semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+encoding d4d:characterEncoding https://w3id.org/bridge2ai/data-sheets-schema/ skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+end_date skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+errata skos:exactMatch d4d:errata d4d:errata errata https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+erratum_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+erratum_url skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+ethical_reviews skos:exactMatch d4d:ethical_reviews d4d:ethical_reviews ethical_reviews https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+ethics_review_board semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+examples semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+existing_uses skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+extension_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+extension_mechanism skos:closeMatch schema:license schema:license license https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+external_resources dcterms:references http://purl.org/dc/terms/ semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+file_collections semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+file_count semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+file_type semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+format dcterms:format http://purl.org/dc/terms/ skos:exactMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+frequency semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+funders skos:exactMatch schema:funder schema:funder funder https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+future_guarantees semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+future_use_impacts skos:exactMatch d4d:future_use_impacts d4d:future_use_impacts future_use_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+governance_committee_contact skos:exactMatch d4d:governance_committee_contact d4d:governance_committee_contact governance_committee_contact https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+grant_number semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+grantor semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+grants semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+guardian_consent semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+handling_strategy skos:exactMatch d4d:handling_strategy d4d:handling_strategy handling_strategy https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+hash d4d:hashValue https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+header semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+hipaa_compliant semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+human_subject_research skos:exactMatch d4d:humanSubject d4d:humanSubject humanSubject https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+id skos:exactMatch rdf:ID rdf:ID ID unknown 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+identifiable_elements_present semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+identification semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+identifiers_removed skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+impact_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_method skos:exactMatch d4d:imputation_method d4d:imputation_method imputation_method https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_protocols skos:exactMatch d4d:imputation_protocols d4d:imputation_protocols imputation_protocols https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_rationale skos:exactMatch d4d:imputation_rationale d4d:imputation_rationale imputation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_validation skos:exactMatch d4d:imputation_validation d4d:imputation_validation imputation_validation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputed_fields skos:exactMatch d4d:imputed_fields d4d:imputed_fields imputed_fields https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+informed_consent semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+instance_type semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+instances skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+intended_uses skos:exactMatch d4d:intended_uses d4d:intended_uses intended_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+inter_annotator_agreement semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+inter_annotator_agreement_score semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+involves_human_subjects semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+ip_restrictions skos:exactMatch d4d:ip_restrictions d4d:ip_restrictions ip_restrictions https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+irb_approval semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_data_split semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_deidentified skos:exactMatch d4d:is_deidentified d4d:is_deidentified is_deidentified https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_direct semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_identifier semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_random semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_representative semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_sample semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_sensitive semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_shared semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_subpopulation semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_tabular skos:narrowMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+issued dcterms:issued http://purl.org/dc/terms/ skos:exactMatch schema:datePublished schema:datePublished datePublished https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+keywords dcat:keyword https://www.w3.org/ns/dcat# skos:exactMatch schema:keywords schema:keywords keywords https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+known_biases skos:exactMatch d4d:known_biases d4d:known_biases known_biases https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+known_limitations skos:exactMatch d4d:known_limitations d4d:known_limitations known_limitations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+label semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+label_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+labeling_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+labeling_strategies skos:exactMatch d4d:labeling_strategies d4d:labeling_strategies labeling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+language dcterms:language http://purl.org/dc/terms/ skos:exactMatch schema:inLanguage schema:inLanguage inLanguage https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+last_updated_on dcterms:modified http://purl.org/dc/terms/ skos:exactMatch schema:dateModified schema:dateModified dateModified https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+latest_version_doi semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+license dcterms:license http://purl.org/dc/terms/ skos:exactMatch schema:license schema:license license https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+license_and_use_terms skos:exactMatch d4d:license_and_use_terms d4d:license_and_use_terms license_and_use_terms https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+license_terms semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+limitation_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+limitation_type skos:closeMatch schema:temporalCoverage schema:temporalCoverage temporalCoverage https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+machine_annotation_tools skos:closeMatch rai:machineAnnotationTools rai:machineAnnotationTools machineAnnotationTools http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+maintainer_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+maintainers skos:relatedMatch schema:maintainer schema:maintainer maintainer https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+maximum_value semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+md5 d4d:md5Checksum https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+measurement_technique semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+mechanism_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+media_type dcat:mediaType https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+method semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+minimum_value semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_data_causes semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_data_documentation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_data_patterns semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_information semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_value_code skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+mitigation_strategy semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+modified_by dcterms:contributor http://purl.org/dc/terms/ skos:closeMatch schema:contributor schema:contributor contributor https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+name semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+notification_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+orcid semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+other_compliance semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+other_tasks skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+page dcat:landingPage https://www.w3.org/ns/dcat# skos:exactMatch schema:url schema:url url https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+parent_datasets skos:exactMatch schema:isPartOf schema:isPartOf isPartOf https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+participant_compensation skos:exactMatch d4d:participant_compensation d4d:participant_compensation participant_compensation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+participant_privacy skos:closeMatch rai:personalSensitiveInformation rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+path schema:contentUrl https://schema.org/ skos:narrowMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+precision skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+preprocessing_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+preprocessing_strategies skos:exactMatch d4d:preprocessing_strategies d4d:preprocessing_strategies preprocessing_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+principal_investigator semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+privacy_techniques semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+prohibited_uses skos:exactMatch d4d:prohibited_uses d4d:prohibited_uses prohibited_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+prohibition_reason skos:exactMatch d4d:prohibition_reason d4d:prohibition_reason prohibition_reason https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+publisher dcterms:publisher http://purl.org/dc/terms/ skos:exactMatch schema:publisher schema:publisher publisher https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+purposes skos:closeMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+quality_notes semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+quote_char semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_data_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_data_format semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_data_sources skos:exactMatch rai:dataCollectionRawData rai:dataCollectionRawData dataCollectionRawData http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_sources skos:exactMatch rai:dataCollectionRawData rai:dataCollectionRawData dataCollectionRawData http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+recommended_mitigation semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+regulatory_compliance semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+regulatory_restrictions skos:exactMatch d4d:regulatory_restrictions d4d:regulatory_restrictions regulatory_restrictions https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+reidentification_risk semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+related_datasets skos:exactMatch schema:isRelatedTo schema:isRelatedTo isRelatedTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+relationship_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+relationship_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+relationships semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+release_dates semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+repository_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+repository_url semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+representative_verification skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+resources schema:hasPart https://schema.org/ skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+response semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+restrictions semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+retention_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+retention_limit skos:exactMatch d4d:retention_limit d4d:retention_limit retention_limit https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+retention_period skos:exactMatch d4d:retention_period d4d:retention_period retention_period https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+review_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+reviewing_organization skos:exactMatch d4d:reviewing_organization d4d:reviewing_organization reviewing_organization https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+revocation_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+role semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+sampling_strategies skos:exactMatch d4d:sampling_strategies d4d:sampling_strategies sampling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+scope_impact semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+sensitive_elements skos:closeMatch rai:personalSensitiveInformation rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+sensitive_elements_present semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+sensitivity_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+sha256 schema:sha256 https://schema.org/ skos:exactMatch evi:sha256 evi:sha256 sha256 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+source_data semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+source_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+source_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+special_populations semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+special_protections semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+split_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+splits semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+start_date skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+status d4d:publicationStatus https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch schema:creativeWorkStatus schema:creativeWorkStatus creativeWorkStatus https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+strategies skos:exactMatch d4d:strategies d4d:strategies strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+subpopulation_elements_present semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low)
recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +subpopulations skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +subsets skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +target_dataset skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +task_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +tasks skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +third_party_sharing semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +timeframe_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +title dcterms:title http://purl.org/dc/terms/ skos:exactMatch schema:name schema:name name https://schema.org/ 1.0 
semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +tool_accuracy skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +tool_descriptions semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +tools skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +total_bytes semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +total_file_count semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +total_size_bytes semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +unit semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +update_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +updates 
skos:exactMatch rai:dataReleaseMaintenancePlan rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +url semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +usage_notes semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +use_category semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +use_repository skos:relatedMatch schema:relatedLink schema:relatedLink relatedLink https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +used_software semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +variable_name semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +variables skos:exactMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +version schema:version https://schema.org/ 
skos:exactMatch schema:version schema:version version https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +version_access skos:relatedMatch schema:version schema:version version https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +version_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +versions_available semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +warnings skos:exactMatch d4d:warnings d4d:warnings warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_derived_from prov:wasDerivedFrom http://www.w3.org/ns/prov# skos:exactMatch schema:isBasedOn schema:isBasedOn isBasedOn https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_directly_observed semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_inferred_derived skos:closeMatch prov:wasDerivedFrom prov:wasDerivedFrom wasDerivedFrom http://www.w3.org/ns/prov# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A 
https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_reported_by_subjects semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_validated_verified skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +why_missing semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +why_not_representative semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +withdrawal_mechanism semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 diff --git a/data/mappings/d4d_rocrate_sssom_uri_mapping.tsv b/data/mappings/d4d_rocrate_sssom_uri_mapping.tsv index f8e1f447..3596eaf1 100644 --- a/data/mappings/d4d_rocrate_sssom_uri_mapping.tsv +++ b/data/mappings/d4d_rocrate_sssom_uri_mapping.tsv @@ -1,45 +1,43 @@ # SSSOM URI-level Mapping (D4D slot URIs ↔ RO-Crate property URIs) # Generated from D4D LinkML schema slot_uri definitions -# Date: 2026-03-19T23:15:33.148007 +# Date: 2026-04-09T10:17:17.452134 # Total mappings: 33 # # Maps at the vocabulary/semantic level using: # - D4D: slot_uri from LinkML schema (dcterms, dcat, schema, prov) # - RO-Crate: JSON-LD property URIs (schema.org, EVI, RAI, D4D) # -# d4d_module: D4D schema module containing this attribute -# -subject_id subject_label d4d_module 
subject_source predicate_id object_id object_label object_source mapping_justification confidence comment author_id mapping_date mapping_set_id mapping_set_version d4d_slot_name vocab_crosswalk d4d_module -schema:sameAs sameAs Unknown https://schema.org/ skos:exactMatch schema:sameAs sameAs https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'same_as' (slot_uri: schema:sameAs) → RO-Crate 'schema:sameAs' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 same_as false Unknown -dcat:theme theme Unknown https://www.w3.org/ns/dcat# skos:closeMatch schema:about about https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'themes' (slot_uri: dcat:theme) → RO-Crate 'schema:about' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 themes true Unknown -dcterms:title title Unknown http://purl.org/dc/terms/ skos:closeMatch schema:name name https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'title' (slot_uri: dcterms:title) → RO-Crate 'schema:name' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 title true Unknown -dcterms:language language Unknown http://purl.org/dc/terms/ skos:closeMatch schema:inLanguage inLanguage https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'language' (slot_uri: dcterms:language) → RO-Crate 'schema:inLanguage' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 language true Unknown -dcterms:publisher publisher Unknown http://purl.org/dc/terms/ skos:closeMatch schema:publisher publisher https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'publisher' (slot_uri: dcterms:publisher) → RO-Crate 'schema:publisher' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 publisher true Unknown -dcterms:issued issued Unknown http://purl.org/dc/terms/ skos:closeMatch schema:datePublished datePublished https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 
'issued' (slot_uri: dcterms:issued) → RO-Crate 'schema:datePublished' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 issued true Unknown -dcat:landingPage landingPage Unknown https://www.w3.org/ns/dcat# skos:closeMatch schema:url url https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'page' (slot_uri: dcat:landingPage) → RO-Crate 'schema:url' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 page true Unknown -schema:encodingFormat encodingFormat D4D_Base https://schema.org/ skos:exactMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'dialect' (slot_uri: schema:encodingFormat) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 dialect false D4D_Base -dcat:byteSize byteSize D4D_Base https://www.w3.org/ns/dcat# skos:closeMatch schema:contentSize contentSize https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'bytes' (slot_uri: dcat:byteSize) → RO-Crate 'schema:contentSize' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 bytes true D4D_Base -schema:contentUrl contentUrl D4D_Base https://schema.org/ skos:exactMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'path' (slot_uri: schema:contentUrl) → RO-Crate 'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 path false D4D_Base -dcat:downloadURL downloadURL Unknown https://www.w3.org/ns/dcat# skos:closeMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'download_url' (slot_uri: dcat:downloadURL) → RO-Crate 'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 download_url true Unknown -dcterms:format format D4D_Base http://purl.org/dc/terms/ skos:closeMatch schema:encodingFormat encodingFormat 
https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'format' (slot_uri: dcterms:format) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 format true D4D_Base -dcat:mediaType mediaType D4D_Base https://www.w3.org/ns/dcat# skos:closeMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.9 D4D slot 'encoding' (slot_uri: dcat:mediaType) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 encoding true D4D_Base -dcat:compressFormat compressFormat Unknown https://www.w3.org/ns/dcat# skos:closeMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.9 D4D slot 'compression' (slot_uri: dcat:compressFormat) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 compression true Unknown -dcat:mediaType mediaType D4D_Base https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'media_type' (slot_uri: dcat:mediaType) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 media_type true D4D_Base -dcterms:identifier identifier D4D_Base http://purl.org/dc/terms/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'hash' (slot_uri: dcterms:identifier) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 hash true D4D_Base -dcterms:identifier identifier D4D_Base http://purl.org/dc/terms/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'md5' (slot_uri: dcterms:identifier) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 md5 true D4D_Base -dcterms:identifier identifier D4D_Base http://purl.org/dc/terms/ skos:relatedMatch 
evi:sha256 sha256 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'sha256' (slot_uri: dcterms:identifier) → RO-Crate 'evi:sha256' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 sha256 true D4D_Base -dcterms:conformsTo conformsTo Unknown http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 conforms_to true Unknown -dcterms:conformsTo conformsTo Unknown http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to_schema' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_schema true Unknown -dcterms:conformsTo conformsTo Unknown http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to_class' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_class true Unknown -dcterms:license license Unknown http://purl.org/dc/terms/ skos:closeMatch schema:license license https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'license' (slot_uri: dcterms:license) → RO-Crate 'schema:license' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 license true Unknown -dcat:keyword keyword Unknown https://www.w3.org/ns/dcat# skos:closeMatch schema:keywords keywords https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'keywords' (slot_uri: dcat:keyword) → RO-Crate 'schema:keywords' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 keywords true Unknown 
-dcterms:hasVersion hasVersion Unknown http://purl.org/dc/terms/ skos:closeMatch schema:version version https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'version' (slot_uri: dcterms:hasVersion) → RO-Crate 'schema:version' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 version true Unknown -dcterms:creator creator Unknown http://purl.org/dc/terms/ skos:closeMatch schema:creator creator https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'created_by' (slot_uri: dcterms:creator) → RO-Crate 'schema:creator' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 created_by true Unknown -dcterms:created created Unknown http://purl.org/dc/terms/ skos:closeMatch schema:dateCreated dateCreated https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'created_on' (slot_uri: dcterms:created) → RO-Crate 'schema:dateCreated' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 created_on true Unknown -dcterms:modified modified Unknown http://purl.org/dc/terms/ skos:closeMatch schema:dateModified dateModified https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'last_updated_on' (slot_uri: dcterms:modified) → RO-Crate 'schema:dateModified' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 last_updated_on true Unknown -dcterms:contributor contributor Unknown http://purl.org/dc/terms/ skos:closeMatch schema:contributor contributor https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'modified_by' (slot_uri: dcterms:contributor) → RO-Crate 'schema:contributor' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 modified_by true Unknown -dcterms:type type Unknown http://purl.org/dc/terms/ skos:closeMatch schema:creativeWorkStatus creativeWorkStatus https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'status' (slot_uri: dcterms:type) → RO-Crate 'schema:creativeWorkStatus' 
https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 status true Unknown -prov:wasDerivedFrom wasDerivedFrom Unknown http://www.w3.org/ns/prov# skos:relatedMatch schema:isBasedOn isBasedOn https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'was_derived_from' (slot_uri: prov:wasDerivedFrom) → RO-Crate 'schema:isBasedOn' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 was_derived_from true Unknown -dcterms:identifier identifier Unknown http://purl.org/dc/terms/ skos:closeMatch schema:identifier identifier https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'doi' (slot_uri: dcterms:identifier) → RO-Crate 'schema:identifier' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 doi true Unknown -dcterms:references references D4D_Base http://purl.org/dc/terms/ skos:closeMatch schema:relatedLink relatedLink https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'external_resources' (slot_uri: dcterms:references) → RO-Crate 'schema:relatedLink' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 external_resources true D4D_Base -schema:hasPart hasPart D4D_Base https://schema.org/ skos:exactMatch schema:hasPart hasPart https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'resources' (slot_uri: schema:hasPart) → RO-Crate 'schema:hasPart' https://orcid.org/0000-0000-0000-0000 2026-03-19 d4d-rocrate-uri-alignment-v1 1.0 resources false D4D_Base +subject_id subject_label subject_source predicate_id object_id object_label object_source mapping_justification confidence comment author_id mapping_date mapping_set_id mapping_set_version d4d_slot_name vocab_crosswalk +schema:sameAs sameAs https://schema.org/ skos:exactMatch schema:sameAs sameAs https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'same_as' (slot_uri: schema:sameAs) → RO-Crate 'schema:sameAs' https://orcid.org/0000-0000-0000-0000 2026-04-09 
d4d-rocrate-uri-alignment-v1 1.0 same_as false +dcat:theme theme https://www.w3.org/ns/dcat# skos:closeMatch schema:about about https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'themes' (slot_uri: dcat:theme) → RO-Crate 'schema:about' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 themes true +dcterms:title title http://purl.org/dc/terms/ skos:closeMatch schema:name name https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'title' (slot_uri: dcterms:title) → RO-Crate 'schema:name' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 title true +dcterms:language language http://purl.org/dc/terms/ skos:closeMatch schema:inLanguage inLanguage https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'language' (slot_uri: dcterms:language) → RO-Crate 'schema:inLanguage' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 language true +dcterms:publisher publisher http://purl.org/dc/terms/ skos:closeMatch schema:publisher publisher https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'publisher' (slot_uri: dcterms:publisher) → RO-Crate 'schema:publisher' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 publisher true +dcterms:issued issued http://purl.org/dc/terms/ skos:closeMatch schema:datePublished datePublished https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'issued' (slot_uri: dcterms:issued) → RO-Crate 'schema:datePublished' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 issued true +dcat:landingPage landingPage https://www.w3.org/ns/dcat# skos:closeMatch schema:url url https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'page' (slot_uri: dcat:landingPage) → RO-Crate 'schema:url' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 page true +schema:encodingFormat encodingFormat https://schema.org/ skos:exactMatch 
schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'dialect' (slot_uri: schema:encodingFormat) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 dialect false +dcat:byteSize byteSize https://www.w3.org/ns/dcat# skos:closeMatch schema:contentSize contentSize https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'bytes' (slot_uri: dcat:byteSize) → RO-Crate 'schema:contentSize' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 bytes true +schema:contentUrl contentUrl https://schema.org/ skos:exactMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'path' (slot_uri: schema:contentUrl) → RO-Crate 'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 path false +dcat:downloadURL downloadURL https://www.w3.org/ns/dcat# skos:closeMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'download_url' (slot_uri: dcat:downloadURL) → RO-Crate 'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 download_url true +dcterms:format format http://purl.org/dc/terms/ skos:closeMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'format' (slot_uri: dcterms:format) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 format true +d4d:characterEncoding characterEncoding https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'encoding' (slot_uri: d4d:characterEncoding) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 encoding true +dcat:compressFormat compressFormat 
https://www.w3.org/ns/dcat# skos:closeMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.9 D4D slot 'compression' (slot_uri: dcat:compressFormat) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 compression true +dcat:mediaType mediaType https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'media_type' (slot_uri: dcat:mediaType) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 media_type true +d4d:hashValue hashValue https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'hash' (slot_uri: d4d:hashValue) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 hash true +d4d:md5Checksum md5Checksum https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'md5' (slot_uri: d4d:md5Checksum) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 md5 true +schema:sha256 sha256 https://schema.org/ skos:relatedMatch evi:sha256 sha256 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'sha256' (slot_uri: schema:sha256) → RO-Crate 'evi:sha256' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 sha256 true +dcterms:conformsTo conformsTo http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 conforms_to true +d4d:conformsToSchema conformsToSchema https://w3id.org/bridge2ai/data-sheets-schema/ 
skos:relatedMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'conforms_to_schema' (slot_uri: d4d:conformsToSchema) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_schema true +d4d:conformsToClass conformsToClass https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'conforms_to_class' (slot_uri: d4d:conformsToClass) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_class true +dcterms:license license http://purl.org/dc/terms/ skos:closeMatch schema:license license https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'license' (slot_uri: dcterms:license) → RO-Crate 'schema:license' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 license true +dcat:keyword keyword https://www.w3.org/ns/dcat# skos:closeMatch schema:keywords keywords https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'keywords' (slot_uri: dcat:keyword) → RO-Crate 'schema:keywords' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 keywords true +schema:version version https://schema.org/ skos:exactMatch schema:version version https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'version' (slot_uri: schema:version) → RO-Crate 'schema:version' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 version false +dcterms:creator creator http://purl.org/dc/terms/ skos:closeMatch schema:creator creator https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'created_by' (slot_uri: dcterms:creator) → RO-Crate 'schema:creator' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 created_by true +dcterms:created created http://purl.org/dc/terms/ skos:closeMatch 
schema:dateCreated dateCreated https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'created_on' (slot_uri: dcterms:created) → RO-Crate 'schema:dateCreated' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 created_on true +dcterms:modified modified http://purl.org/dc/terms/ skos:closeMatch schema:dateModified dateModified https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'last_updated_on' (slot_uri: dcterms:modified) → RO-Crate 'schema:dateModified' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 last_updated_on true +dcterms:contributor contributor http://purl.org/dc/terms/ skos:closeMatch schema:contributor contributor https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'modified_by' (slot_uri: dcterms:contributor) → RO-Crate 'schema:contributor' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 modified_by true +d4d:publicationStatus publicationStatus https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch schema:creativeWorkStatus creativeWorkStatus https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'status' (slot_uri: d4d:publicationStatus) → RO-Crate 'schema:creativeWorkStatus' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 status true +prov:wasDerivedFrom wasDerivedFrom http://www.w3.org/ns/prov# skos:relatedMatch schema:isBasedOn isBasedOn https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'was_derived_from' (slot_uri: prov:wasDerivedFrom) → RO-Crate 'schema:isBasedOn' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 was_derived_from true +d4d:doiIdentifier doiIdentifier https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch schema:identifier identifier https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'doi' (slot_uri: d4d:doiIdentifier) → RO-Crate 'schema:identifier' https://orcid.org/0000-0000-0000-0000 
2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 doi true
+dcterms:references references http://purl.org/dc/terms/ skos:closeMatch schema:relatedLink relatedLink https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'external_resources' (slot_uri: dcterms:references) → RO-Crate 'schema:relatedLink' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 external_resources true
+schema:hasPart hasPart https://schema.org/ skos:exactMatch schema:hasPart hasPart https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'resources' (slot_uri: schema:hasPart) → RO-Crate 'schema:hasPart' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 resources false
diff --git a/data/mappings/d4d_rocrate_structural_mapping.sssom.tsv b/data/mappings/d4d_rocrate_structural_mapping.sssom.tsv
index ead6df02..68725263 100644
--- a/data/mappings/d4d_rocrate_structural_mapping.sssom.tsv
+++ b/data/mappings/d4d_rocrate_structural_mapping.sssom.tsv
@@ -1,145 +1,150 @@
-# d4d_module: D4D schema module containing this attribute
-#
-subject_id subject_label subject_category d4d_module predicate_id object_id object_label mapping_justification confidence subject_source object_source subject_type subject_multivalued object_type type_compatible composition_path structural_notes warnings d4d_module
-d4d:Purpose/name name Purpose D4D_Motivation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Purpose D4D_Motivation
-d4d:Purpose/description description Purpose D4D_Motivation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Purpose D4D_Motivation
-d4d:Task/name name Task D4D_Motivation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped
via DatasetProperty hierarchy from Task D4D_Motivation -d4d:Task/description description Task D4D_Motivation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Task D4D_Motivation -d4d:AddressingGap/name name AddressingGap D4D_Motivation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AddressingGap D4D_Motivation -d4d:AddressingGap/description description AddressingGap D4D_Motivation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AddressingGap D4D_Motivation -d4d:Creator/principal_investigator principal_investigator Creator D4D_Motivation skos:exactMatch principalInvestigator principalInvestigator semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape Person False str True Mapped via DatasetProperty hierarchy from Creator D4D_Motivation -d4d:Creator/name name Creator D4D_Motivation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Creator D4D_Motivation -d4d:Creator/description description Creator D4D_Motivation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Creator D4D_Motivation -d4d:FundingMechanism/name name FundingMechanism D4D_Motivation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from FundingMechanism D4D_Motivation -d4d:FundingMechanism/description description FundingMechanism D4D_Motivation skos:exactMatch description description 
semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from FundingMechanism D4D_Motivation -d4d:Instance/name name Instance D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Instance D4D_Composition -d4d:Instance/description description Instance D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Instance D4D_Composition -d4d:SamplingStrategy/name name SamplingStrategy D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SamplingStrategy D4D_Composition -d4d:SamplingStrategy/description description SamplingStrategy D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SamplingStrategy D4D_Composition -d4d:MissingInfo/name name MissingInfo D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingInfo D4D_Composition -d4d:MissingInfo/description description MissingInfo D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingInfo D4D_Composition -d4d:Relationships/name name Relationships D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Relationships D4D_Composition -d4d:Relationships/description 
description Relationships D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Relationships D4D_Composition -d4d:Splits/name name Splits D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Splits D4D_Composition -d4d:Splits/description description Splits D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Splits D4D_Composition -d4d:DataAnomaly/name name DataAnomaly D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataAnomaly D4D_Composition -d4d:DataAnomaly/description description DataAnomaly D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataAnomaly D4D_Composition -d4d:DatasetBias/name name DatasetBias D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetBias D4D_Composition -d4d:DatasetBias/description description DatasetBias D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetBias D4D_Composition -d4d:DatasetLimitation/name name DatasetLimitation D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from 
DatasetLimitation D4D_Composition -d4d:DatasetLimitation/description description DatasetLimitation D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetLimitation D4D_Composition -d4d:ExternalResource/name name ExternalResource D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExternalResource D4D_Composition -d4d:ExternalResource/description description ExternalResource D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExternalResource D4D_Composition -d4d:Confidentiality/name name Confidentiality D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Confidentiality D4D_Composition -d4d:Confidentiality/description description Confidentiality D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Confidentiality D4D_Composition -d4d:ContentWarning/name name ContentWarning D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ContentWarning D4D_Composition -d4d:ContentWarning/description description ContentWarning D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ContentWarning D4D_Composition -d4d:Subpopulation/name name Subpopulation D4D_Composition 
skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Subpopulation D4D_Composition -d4d:Subpopulation/description description Subpopulation D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Subpopulation D4D_Composition -d4d:Deidentification/name name Deidentification D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Deidentification D4D_Composition -d4d:Deidentification/description description Deidentification D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Deidentification D4D_Composition -d4d:SensitiveElement/name name SensitiveElement D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SensitiveElement D4D_Composition -d4d:SensitiveElement/description description SensitiveElement D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SensitiveElement D4D_Composition -d4d:DatasetRelationship/description description DatasetRelationship D4D_Composition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetRelationship D4D_Composition -d4d:DatasetRelationship/name name DatasetRelationship D4D_Composition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema 
rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetRelationship D4D_Composition -d4d:InstanceAcquisition/name name InstanceAcquisition D4D_Collection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InstanceAcquisition D4D_Collection -d4d:InstanceAcquisition/description description InstanceAcquisition D4D_Collection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InstanceAcquisition D4D_Collection -d4d:CollectionMechanism/name name CollectionMechanism D4D_Collection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionMechanism D4D_Collection -d4d:CollectionMechanism/description description CollectionMechanism D4D_Collection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionMechanism D4D_Collection -d4d:DataCollector/name name DataCollector D4D_Collection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataCollector D4D_Collection -d4d:DataCollector/description description DataCollector D4D_Collection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataCollector D4D_Collection -d4d:CollectionTimeframe/name name CollectionTimeframe D4D_Collection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from 
CollectionTimeframe D4D_Collection -d4d:CollectionTimeframe/description description CollectionTimeframe D4D_Collection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionTimeframe D4D_Collection -d4d:DirectCollection/name name DirectCollection D4D_Collection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DirectCollection D4D_Collection -d4d:DirectCollection/description description DirectCollection D4D_Collection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DirectCollection D4D_Collection -d4d:MissingDataDocumentation/name name MissingDataDocumentation D4D_Collection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingDataDocumentation D4D_Collection -d4d:MissingDataDocumentation/description description MissingDataDocumentation D4D_Collection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingDataDocumentation D4D_Collection -d4d:RawDataSource/name name RawDataSource D4D_Collection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawDataSource D4D_Collection -d4d:RawDataSource/description description RawDataSource D4D_Collection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawDataSource D4D_Collection 
-d4d:PreprocessingStrategy/name name PreprocessingStrategy D4D_Preprocessing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from PreprocessingStrategy D4D_Preprocessing -d4d:PreprocessingStrategy/description description PreprocessingStrategy D4D_Preprocessing skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from PreprocessingStrategy D4D_Preprocessing -d4d:CleaningStrategy/name name CleaningStrategy D4D_Preprocessing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CleaningStrategy D4D_Preprocessing -d4d:CleaningStrategy/description description CleaningStrategy D4D_Preprocessing skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CleaningStrategy D4D_Preprocessing -d4d:LabelingStrategy/name name LabelingStrategy D4D_Preprocessing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LabelingStrategy D4D_Preprocessing -d4d:LabelingStrategy/description description LabelingStrategy D4D_Preprocessing skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LabelingStrategy D4D_Preprocessing -d4d:RawData/name name RawData D4D_Preprocessing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawData D4D_Preprocessing -d4d:RawData/description description RawData D4D_Preprocessing skos:exactMatch 
description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawData D4D_Preprocessing -d4d:ImputationProtocol/name name ImputationProtocol D4D_Preprocessing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ImputationProtocol D4D_Preprocessing -d4d:ImputationProtocol/description description ImputationProtocol D4D_Preprocessing skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ImputationProtocol D4D_Preprocessing -d4d:AnnotationAnalysis/name name AnnotationAnalysis D4D_Preprocessing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AnnotationAnalysis D4D_Preprocessing -d4d:AnnotationAnalysis/description description AnnotationAnalysis D4D_Preprocessing skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AnnotationAnalysis D4D_Preprocessing -d4d:MachineAnnotationTools/name name MachineAnnotationTools D4D_Preprocessing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MachineAnnotationTools D4D_Preprocessing -d4d:MachineAnnotationTools/description description MachineAnnotationTools D4D_Preprocessing skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MachineAnnotationTools D4D_Preprocessing -d4d:ExistingUse/name name ExistingUse D4D_Uses skos:exactMatch name name semapv:StructuralMapping 
1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExistingUse D4D_Uses -d4d:ExistingUse/description description ExistingUse D4D_Uses skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExistingUse D4D_Uses -d4d:UseRepository/name name UseRepository D4D_Uses skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UseRepository D4D_Uses -d4d:UseRepository/description description UseRepository D4D_Uses skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UseRepository D4D_Uses -d4d:OtherTask/name name OtherTask D4D_Uses skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from OtherTask D4D_Uses -d4d:OtherTask/description description OtherTask D4D_Uses skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from OtherTask D4D_Uses -d4d:FutureUseImpact/name name FutureUseImpact D4D_Uses skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from FutureUseImpact D4D_Uses -d4d:FutureUseImpact/description description FutureUseImpact D4D_Uses skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from FutureUseImpact D4D_Uses -d4d:DiscouragedUse/name name DiscouragedUse D4D_Uses skos:exactMatch name name semapv:StructuralMapping 1.0 
d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DiscouragedUse D4D_Uses -d4d:DiscouragedUse/description description DiscouragedUse D4D_Uses skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DiscouragedUse D4D_Uses -d4d:IntendedUse/name name IntendedUse D4D_Uses skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IntendedUse D4D_Uses -d4d:IntendedUse/description description IntendedUse D4D_Uses skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IntendedUse D4D_Uses -d4d:ProhibitedUse/name name ProhibitedUse D4D_Uses skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ProhibitedUse D4D_Uses -d4d:ProhibitedUse/description description ProhibitedUse D4D_Uses skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ProhibitedUse D4D_Uses -d4d:ThirdPartySharing/name name ThirdPartySharing D4D_Distribution skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ThirdPartySharing D4D_Distribution -d4d:ThirdPartySharing/description description ThirdPartySharing D4D_Distribution skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ThirdPartySharing D4D_Distribution -d4d:DistributionFormat/name name DistributionFormat 
D4D_Distribution skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionFormat D4D_Distribution -d4d:DistributionFormat/description description DistributionFormat D4D_Distribution skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionFormat D4D_Distribution -d4d:DistributionDate/name name DistributionDate D4D_Distribution skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionDate D4D_Distribution -d4d:DistributionDate/description description DistributionDate D4D_Distribution skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionDate D4D_Distribution -d4d:Maintainer/name name Maintainer D4D_Maintenance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Maintainer D4D_Maintenance -d4d:Maintainer/description description Maintainer D4D_Maintenance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Maintainer D4D_Maintenance -d4d:Erratum/name name Erratum D4D_Maintenance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Erratum D4D_Maintenance -d4d:Erratum/description description Erratum D4D_Maintenance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via 
DatasetProperty hierarchy from Erratum D4D_Maintenance -d4d:UpdatePlan/name name UpdatePlan D4D_Maintenance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UpdatePlan D4D_Maintenance -d4d:UpdatePlan/description description UpdatePlan D4D_Maintenance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UpdatePlan D4D_Maintenance -d4d:RetentionLimits/name name RetentionLimits D4D_Maintenance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RetentionLimits D4D_Maintenance -d4d:RetentionLimits/description description RetentionLimits D4D_Maintenance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RetentionLimits D4D_Maintenance -d4d:VersionAccess/name name VersionAccess D4D_Maintenance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VersionAccess D4D_Maintenance -d4d:VersionAccess/description description VersionAccess D4D_Maintenance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VersionAccess D4D_Maintenance -d4d:ExtensionMechanism/name name ExtensionMechanism D4D_Maintenance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExtensionMechanism D4D_Maintenance -d4d:ExtensionMechanism/description description ExtensionMechanism D4D_Maintenance skos:exactMatch 
description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExtensionMechanism D4D_Maintenance -d4d:EthicalReview/name name EthicalReview D4D_Ethics skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from EthicalReview D4D_Ethics -d4d:EthicalReview/description description EthicalReview D4D_Ethics skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from EthicalReview D4D_Ethics -d4d:DataProtectionImpact/name name DataProtectionImpact D4D_Ethics skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataProtectionImpact D4D_Ethics -d4d:DataProtectionImpact/description description DataProtectionImpact D4D_Ethics skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataProtectionImpact D4D_Ethics -d4d:CollectionNotification/name name CollectionNotification D4D_Ethics skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionNotification D4D_Ethics -d4d:CollectionNotification/description description CollectionNotification D4D_Ethics skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionNotification D4D_Ethics -d4d:CollectionConsent/name name CollectionConsent D4D_Ethics skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via 
DatasetProperty hierarchy from CollectionConsent D4D_Ethics -d4d:CollectionConsent/description description CollectionConsent D4D_Ethics skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionConsent D4D_Ethics -d4d:ConsentRevocation/name name ConsentRevocation D4D_Ethics skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ConsentRevocation D4D_Ethics -d4d:ConsentRevocation/description description ConsentRevocation D4D_Ethics skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ConsentRevocation D4D_Ethics -d4d:HumanSubjectResearch/name name HumanSubjectResearch D4D_Human skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from HumanSubjectResearch D4D_Human -d4d:HumanSubjectResearch/description description HumanSubjectResearch D4D_Human skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from HumanSubjectResearch D4D_Human -d4d:InformedConsent/name name InformedConsent D4D_Human skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InformedConsent D4D_Human -d4d:InformedConsent/description description InformedConsent D4D_Human skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InformedConsent D4D_Human -d4d:ParticipantPrivacy/name name ParticipantPrivacy D4D_Human 
skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ParticipantPrivacy D4D_Human -d4d:ParticipantPrivacy/description description ParticipantPrivacy D4D_Human skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ParticipantPrivacy D4D_Human -d4d:HumanSubjectCompensation/name name HumanSubjectCompensation D4D_Human skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from HumanSubjectCompensation D4D_Human -d4d:HumanSubjectCompensation/description description HumanSubjectCompensation D4D_Human skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from HumanSubjectCompensation D4D_Human -d4d:VulnerablePopulations/name name VulnerablePopulations Unknown skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VulnerablePopulations Unknown -d4d:VulnerablePopulations/description description VulnerablePopulations Unknown skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VulnerablePopulations Unknown -d4d:LicenseAndUseTerms/name name LicenseAndUseTerms D4D_Data_Governance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LicenseAndUseTerms D4D_Data_Governance -d4d:LicenseAndUseTerms/description description LicenseAndUseTerms D4D_Data_Governance skos:exactMatch description description 
semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LicenseAndUseTerms D4D_Data_Governance -d4d:IPRestrictions/name name IPRestrictions D4D_Data_Governance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IPRestrictions D4D_Data_Governance -d4d:IPRestrictions/description description IPRestrictions D4D_Data_Governance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IPRestrictions D4D_Data_Governance -d4d:ExportControlRegulatoryRestrictions/confidentiality_level confidentiality_level ExportControlRegulatoryRestrictions D4D_Data_Governance skos:exactMatch confidentialityLevel confidentialityLevel semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape ConfidentialityLevelEnum False str True Mapped via DatasetProperty hierarchy from ExportControlRegulatoryRestrictions D4D_Data_Governance -d4d:ExportControlRegulatoryRestrictions/name name ExportControlRegulatoryRestrictions D4D_Data_Governance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExportControlRegulatoryRestrictions D4D_Data_Governance -d4d:ExportControlRegulatoryRestrictions/description description ExportControlRegulatoryRestrictions D4D_Data_Governance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExportControlRegulatoryRestrictions D4D_Data_Governance -d4d:VariableMetadata/name name VariableMetadata D4D_Variables skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via 
DatasetProperty hierarchy from VariableMetadata D4D_Variables -d4d:VariableMetadata/description description VariableMetadata D4D_Variables skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VariableMetadata D4D_Variables -d4d:Dataset/anomalies anomalies Dataset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies Composition path: anomalies Unknown -d4d:Dataset/anomaly_details anomaly_details Dataset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.anomaly_details Composition path: anomalies.anomaly_details Unknown -d4d:Dataset/id id Dataset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.id Composition path: anomalies.id Unknown -d4d:Dataset/name name Dataset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.name Composition path: anomalies.name Unknown -d4d:Dataset/description description Dataset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.description Composition path: anomalies.description Unknown -d4d:Dataset/used_software used_software Dataset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.used_software Composition path: anomalies.used_software Unknown -d4d:DataSubset/anomalies anomalies DataSubset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies 
semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies Composition path: anomalies Unknown -d4d:DataSubset/anomaly_details anomaly_details DataSubset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.anomaly_details Composition path: anomalies.anomaly_details Unknown -d4d:DataSubset/id id DataSubset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.id Composition path: anomalies.id Unknown -d4d:DataSubset/name name DataSubset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.name Composition path: anomalies.name Unknown -d4d:DataSubset/description description DataSubset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.description Composition path: anomalies.description Unknown -d4d:DataSubset/used_software used_software DataSubset Unknown skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.used_software Composition path: anomalies.used_software Unknown +subject_id subject_label subject_category predicate_id object_id object_label mapping_justification confidence subject_source object_source subject_type subject_multivalued object_type type_compatible composition_path structural_notes warnings +d4d:Dataset/addressing_gaps addressing_gaps Dataset skos:closeMatch d4d:addressingGaps d4d:addressingGaps semapv:SemanticSimilarity 0.6 d4d:data_sheets_schema rocrate:fairscape AddressingGap True str False slot_uri mapping: d4d:addressingGaps Cardinality mismatch: multivalued slot mapping 
to single value +d4d:Dataset/informed_consent informed_consent Dataset skos:closeMatch d4d:informedConsent d4d:informedConsent semapv:SemanticSimilarity 0.6 d4d:data_sheets_schema rocrate:fairscape InformedConsent True str False slot_uri mapping: d4d:informedConsent Cardinality mismatch: multivalued slot mapping to single value +d4d:Dataset/at_risk_populations at_risk_populations Dataset skos:exactMatch d4d:atRiskPopulations d4d:atRiskPopulations semapv:SemanticSimilarity 0.9 d4d:data_sheets_schema rocrate:fairscape AtRiskPopulations False str True slot_uri mapping: d4d:atRiskPopulations +d4d:DataSubset/addressing_gaps addressing_gaps DataSubset skos:closeMatch d4d:addressingGaps d4d:addressingGaps semapv:SemanticSimilarity 0.6 d4d:data_sheets_schema rocrate:fairscape AddressingGap True str False slot_uri mapping: d4d:addressingGaps Cardinality mismatch: multivalued slot mapping to single value +d4d:DataSubset/informed_consent informed_consent DataSubset skos:closeMatch d4d:informedConsent d4d:informedConsent semapv:SemanticSimilarity 0.6 d4d:data_sheets_schema rocrate:fairscape InformedConsent True str False slot_uri mapping: d4d:informedConsent Cardinality mismatch: multivalued slot mapping to single value +d4d:DataSubset/at_risk_populations at_risk_populations DataSubset skos:exactMatch d4d:atRiskPopulations d4d:atRiskPopulations semapv:SemanticSimilarity 0.9 d4d:data_sheets_schema rocrate:fairscape AtRiskPopulations False str True slot_uri mapping: d4d:atRiskPopulations +d4d:LabelingStrategy/data_annotation_platform data_annotation_platform LabelingStrategy skos:exactMatch rai:dataAnnotationPlatform rai:dataAnnotationPlatform semapv:SemanticSimilarity 0.9 d4d:data_sheets_schema rocrate:fairscape string True list True slot_uri mapping: rai:dataAnnotationPlatform +d4d:Purpose/name name Purpose skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Purpose 
+d4d:Purpose/description description Purpose skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Purpose +d4d:Task/name name Task skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Task +d4d:Task/description description Task skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Task +d4d:AddressingGap/name name AddressingGap skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AddressingGap +d4d:AddressingGap/description description AddressingGap skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AddressingGap +d4d:Creator/principal_investigator principal_investigator Creator skos:exactMatch principalInvestigator principalInvestigator semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape Person False str True Mapped via DatasetProperty hierarchy from Creator +d4d:Creator/name name Creator skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Creator +d4d:Creator/description description Creator skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Creator +d4d:FundingMechanism/name name FundingMechanism skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy 
from FundingMechanism +d4d:FundingMechanism/description description FundingMechanism skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from FundingMechanism +d4d:Instance/name name Instance skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Instance +d4d:Instance/description description Instance skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Instance +d4d:SamplingStrategy/name name SamplingStrategy skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SamplingStrategy +d4d:SamplingStrategy/description description SamplingStrategy skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SamplingStrategy +d4d:MissingInfo/name name MissingInfo skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingInfo +d4d:MissingInfo/description description MissingInfo skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingInfo +d4d:Relationships/name name Relationships skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Relationships +d4d:Relationships/description description Relationships skos:exactMatch description description semapv:StructuralMapping 1.0 
d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Relationships +d4d:Splits/name name Splits skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Splits +d4d:Splits/description description Splits skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Splits +d4d:DataAnomaly/name name DataAnomaly skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataAnomaly +d4d:DataAnomaly/description description DataAnomaly skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataAnomaly +d4d:DatasetBias/name name DatasetBias skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetBias +d4d:DatasetBias/description description DatasetBias skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetBias +d4d:DatasetLimitation/name name DatasetLimitation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetLimitation +d4d:DatasetLimitation/description description DatasetLimitation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetLimitation +d4d:ExternalResource/name name ExternalResource skos:exactMatch name name 
semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExternalResource +d4d:ExternalResource/description description ExternalResource skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExternalResource +d4d:Confidentiality/name name Confidentiality skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Confidentiality +d4d:Confidentiality/description description Confidentiality skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Confidentiality +d4d:ContentWarning/name name ContentWarning skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ContentWarning +d4d:ContentWarning/description description ContentWarning skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ContentWarning +d4d:Subpopulation/name name Subpopulation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Subpopulation +d4d:Subpopulation/description description Subpopulation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Subpopulation +d4d:Deidentification/name name Deidentification skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via 
DatasetProperty hierarchy from Deidentification +d4d:Deidentification/description description Deidentification skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Deidentification +d4d:SensitiveElement/name name SensitiveElement skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SensitiveElement +d4d:SensitiveElement/description description SensitiveElement skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from SensitiveElement +d4d:DatasetRelationship/description description DatasetRelationship skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetRelationship +d4d:DatasetRelationship/name name DatasetRelationship skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DatasetRelationship +d4d:InstanceAcquisition/name name InstanceAcquisition skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InstanceAcquisition +d4d:InstanceAcquisition/description description InstanceAcquisition skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InstanceAcquisition +d4d:CollectionMechanism/name name CollectionMechanism skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from 
CollectionMechanism +d4d:CollectionMechanism/description description CollectionMechanism skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionMechanism +d4d:DataCollector/name name DataCollector skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataCollector +d4d:DataCollector/description description DataCollector skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataCollector +d4d:CollectionTimeframe/name name CollectionTimeframe skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionTimeframe +d4d:CollectionTimeframe/description description CollectionTimeframe skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionTimeframe +d4d:DirectCollection/name name DirectCollection skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DirectCollection +d4d:DirectCollection/description description DirectCollection skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DirectCollection +d4d:MissingDataDocumentation/name name MissingDataDocumentation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingDataDocumentation 
+d4d:MissingDataDocumentation/description description MissingDataDocumentation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MissingDataDocumentation +d4d:RawDataSource/name name RawDataSource skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawDataSource +d4d:RawDataSource/description description RawDataSource skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawDataSource +d4d:PreprocessingStrategy/name name PreprocessingStrategy skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from PreprocessingStrategy +d4d:PreprocessingStrategy/description description PreprocessingStrategy skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from PreprocessingStrategy +d4d:CleaningStrategy/name name CleaningStrategy skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CleaningStrategy +d4d:CleaningStrategy/description description CleaningStrategy skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CleaningStrategy +d4d:LabelingStrategy/name name LabelingStrategy skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LabelingStrategy +d4d:LabelingStrategy/description 
description LabelingStrategy skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LabelingStrategy +d4d:RawData/name name RawData skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawData +d4d:RawData/description description RawData skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RawData +d4d:ImputationProtocol/name name ImputationProtocol skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ImputationProtocol +d4d:ImputationProtocol/description description ImputationProtocol skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ImputationProtocol +d4d:AnnotationAnalysis/name name AnnotationAnalysis skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AnnotationAnalysis +d4d:AnnotationAnalysis/description description AnnotationAnalysis skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AnnotationAnalysis +d4d:MachineAnnotationTools/name name MachineAnnotationTools skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MachineAnnotationTools +d4d:MachineAnnotationTools/description description MachineAnnotationTools skos:exactMatch description description 
semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from MachineAnnotationTools +d4d:ExistingUse/name name ExistingUse skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExistingUse +d4d:ExistingUse/description description ExistingUse skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExistingUse +d4d:UseRepository/name name UseRepository skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UseRepository +d4d:UseRepository/description description UseRepository skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UseRepository +d4d:OtherTask/name name OtherTask skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from OtherTask +d4d:OtherTask/description description OtherTask skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from OtherTask +d4d:FutureUseImpact/name name FutureUseImpact skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from FutureUseImpact +d4d:FutureUseImpact/description description FutureUseImpact skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from FutureUseImpact 
+d4d:DiscouragedUse/name name DiscouragedUse skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DiscouragedUse +d4d:DiscouragedUse/description description DiscouragedUse skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DiscouragedUse +d4d:IntendedUse/name name IntendedUse skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IntendedUse +d4d:IntendedUse/description description IntendedUse skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IntendedUse +d4d:ProhibitedUse/name name ProhibitedUse skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ProhibitedUse +d4d:ProhibitedUse/description description ProhibitedUse skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ProhibitedUse +d4d:ThirdPartySharing/name name ThirdPartySharing skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ThirdPartySharing +d4d:ThirdPartySharing/description description ThirdPartySharing skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ThirdPartySharing +d4d:DistributionFormat/name name DistributionFormat skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema 
rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionFormat +d4d:DistributionFormat/description description DistributionFormat skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionFormat +d4d:DistributionDate/name name DistributionDate skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionDate +d4d:DistributionDate/description description DistributionDate skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DistributionDate +d4d:Maintainer/name name Maintainer skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Maintainer +d4d:Maintainer/description description Maintainer skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Maintainer +d4d:Erratum/name name Erratum skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Erratum +d4d:Erratum/description description Erratum skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from Erratum +d4d:UpdatePlan/name name UpdatePlan skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UpdatePlan +d4d:UpdatePlan/description description UpdatePlan skos:exactMatch description 
description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from UpdatePlan +d4d:RetentionLimits/name name RetentionLimits skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RetentionLimits +d4d:RetentionLimits/description description RetentionLimits skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from RetentionLimits +d4d:VersionAccess/name name VersionAccess skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VersionAccess +d4d:VersionAccess/description description VersionAccess skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VersionAccess +d4d:ExtensionMechanism/name name ExtensionMechanism skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExtensionMechanism +d4d:ExtensionMechanism/description description ExtensionMechanism skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExtensionMechanism +d4d:EthicalReview/name name EthicalReview skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from EthicalReview +d4d:EthicalReview/description description EthicalReview skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped 
via DatasetProperty hierarchy from EthicalReview +d4d:DataProtectionImpact/name name DataProtectionImpact skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataProtectionImpact +d4d:DataProtectionImpact/description description DataProtectionImpact skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from DataProtectionImpact +d4d:CollectionNotification/name name CollectionNotification skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionNotification +d4d:CollectionNotification/description description CollectionNotification skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionNotification +d4d:CollectionConsent/name name CollectionConsent skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionConsent +d4d:CollectionConsent/description description CollectionConsent skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from CollectionConsent +d4d:ConsentRevocation/name name ConsentRevocation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ConsentRevocation +d4d:ConsentRevocation/description description ConsentRevocation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via 
DatasetProperty hierarchy from ConsentRevocation +d4d:HumanSubjectResearch/name name HumanSubjectResearch skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from HumanSubjectResearch +d4d:HumanSubjectResearch/description description HumanSubjectResearch skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from HumanSubjectResearch +d4d:InformedConsent/name name InformedConsent skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InformedConsent +d4d:InformedConsent/description description InformedConsent skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from InformedConsent +d4d:ParticipantPrivacy/name name ParticipantPrivacy skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ParticipantPrivacy +d4d:ParticipantPrivacy/description description ParticipantPrivacy skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ParticipantPrivacy +d4d:HumanSubjectCompensation/name name HumanSubjectCompensation skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from HumanSubjectCompensation +d4d:HumanSubjectCompensation/description description HumanSubjectCompensation skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via 
DatasetProperty hierarchy from HumanSubjectCompensation +d4d:AtRiskPopulations/name name AtRiskPopulations skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AtRiskPopulations +d4d:AtRiskPopulations/description description AtRiskPopulations skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from AtRiskPopulations +d4d:LicenseAndUseTerms/name name LicenseAndUseTerms skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LicenseAndUseTerms +d4d:LicenseAndUseTerms/description description LicenseAndUseTerms skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from LicenseAndUseTerms +d4d:IPRestrictions/name name IPRestrictions skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IPRestrictions +d4d:IPRestrictions/description description IPRestrictions skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from IPRestrictions +d4d:ExportControlRegulatoryRestrictions/confidentiality_level confidentiality_level ExportControlRegulatoryRestrictions skos:exactMatch confidentialityLevel confidentialityLevel semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape ConfidentialityLevelEnum False str True Mapped via DatasetProperty hierarchy from ExportControlRegulatoryRestrictions +d4d:ExportControlRegulatoryRestrictions/name name ExportControlRegulatoryRestrictions skos:exactMatch name name 
semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExportControlRegulatoryRestrictions +d4d:ExportControlRegulatoryRestrictions/description description ExportControlRegulatoryRestrictions skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from ExportControlRegulatoryRestrictions +d4d:VariableMetadata/name name VariableMetadata skos:exactMatch name name semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VariableMetadata +d4d:VariableMetadata/description description VariableMetadata skos:exactMatch description description semapv:StructuralMapping 1.0 d4d:data_sheets_schema rocrate:fairscape string False str True Mapped via DatasetProperty hierarchy from VariableMetadata +d4d:Dataset/anomalies anomalies Dataset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies Composition path: anomalies +d4d:Dataset/anomaly_details anomaly_details Dataset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.anomaly_details Composition path: anomalies.anomaly_details +d4d:Dataset/id id Dataset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.id Composition path: anomalies.id +d4d:Dataset/name name Dataset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.name Composition path: anomalies.name +d4d:Dataset/description description Dataset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies 
semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.description Composition path: anomalies.description +d4d:Dataset/used_software used_software Dataset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.used_software Composition path: anomalies.used_software +d4d:DataSubset/anomalies anomalies DataSubset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies Composition path: anomalies +d4d:DataSubset/anomaly_details anomaly_details DataSubset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.anomaly_details Composition path: anomalies.anomaly_details +d4d:DataSubset/id id DataSubset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.id Composition path: anomalies.id +d4d:DataSubset/name name DataSubset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.name Composition path: anomalies.name +d4d:DataSubset/description description DataSubset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.description Composition path: anomalies.description +d4d:DataSubset/used_software used_software DataSubset skos:closeMatch d4d:dataAnomalies d4d:dataAnomalies semapv:StructuralMapping 0.7 d4d:data_sheets_schema rocrate:fairscape string False str True anomalies.used_software Composition path: anomalies.used_software diff --git a/data/mappings/d4d_rocrate_structural_mapping_summary.md 
b/data/mappings/d4d_rocrate_structural_mapping_summary.md index f675f26b..fe72f8d3 100644 --- a/data/mappings/d4d_rocrate_structural_mapping_summary.md +++ b/data/mappings/d4d_rocrate_structural_mapping_summary.md @@ -1,6 +1,48 @@ # D4D to RO-Crate Schema-Structure-Aware Mapping Summary +## SEMANTIC Mappings (7) + +- **Dataset.addressing_gaps** → **d4d:addressingGaps** + - Confidence: 0.6 + - Type compatible: False + - Notes: slot_uri mapping: d4d:addressingGaps + - ⚠️ Warnings: Cardinality mismatch: multivalued slot mapping to single value + +- **Dataset.informed_consent** → **d4d:informedConsent** + - Confidence: 0.6 + - Type compatible: False + - Notes: slot_uri mapping: d4d:informedConsent + - ⚠️ Warnings: Cardinality mismatch: multivalued slot mapping to single value + +- **Dataset.at_risk_populations** → **d4d:atRiskPopulations** + - Confidence: 0.9 + - Type compatible: True + - Notes: slot_uri mapping: d4d:atRiskPopulations + +- **DataSubset.addressing_gaps** → **d4d:addressingGaps** + - Confidence: 0.6 + - Type compatible: False + - Notes: slot_uri mapping: d4d:addressingGaps + - ⚠️ Warnings: Cardinality mismatch: multivalued slot mapping to single value + +- **DataSubset.informed_consent** → **d4d:informedConsent** + - Confidence: 0.6 + - Type compatible: False + - Notes: slot_uri mapping: d4d:informedConsent + - ⚠️ Warnings: Cardinality mismatch: multivalued slot mapping to single value + +- **DataSubset.at_risk_populations** → **d4d:atRiskPopulations** + - Confidence: 0.9 + - Type compatible: True + - Notes: slot_uri mapping: d4d:atRiskPopulations + +- **LabelingStrategy.data_annotation_platform** → **rai:dataAnnotationPlatform** + - Confidence: 0.9 + - Type compatible: True + - Notes: slot_uri mapping: rai:dataAnnotationPlatform + + ## STRUCTURAL Mappings (142) - **Purpose.name** → **name** diff --git a/docs/ontology_mapping_guide.md b/docs/ontology_mapping_guide.md new file mode 100644 index 00000000..82c37de5 --- /dev/null +++ 
b/docs/ontology_mapping_guide.md @@ -0,0 +1,314 @@ +# D4D Ontology Mapping Guide + +## Overview + +Ontology mappings in the D4D schema serve two purposes: **RDF serialization** and **semantic +interoperability**. Every `slot_uri` becomes the RDF predicate when D4D YAML is serialized to +Turtle, JSON-LD, or other RDF formats. `exact_mappings`, `broad_mappings`, and `close_mappings` +declare alignment with terms in external vocabularies, enabling federated queries and FAIR +data discovery without requiring consumers to adopt the D4D namespace. + +Where a well-established ontology term exists and is semantically accurate, it is used directly +as the `slot_uri`. Where no suitable standard term exists, a custom `d4d:` term is minted. + +--- + +## Namespace Prefixes + +| Prefix | Full URI | Ontology / Standard | +|--------|----------|---------------------| +| `d4d` | `https://w3id.org/bridge2ai/data-sheets-schema/` | D4D custom terms (this project) | +| `data_sheets_schema` | `https://w3id.org/bridge2ai/data-sheets-schema/` | Alias for `d4d` used as default prefix | +| `dcterms` | `http://purl.org/dc/terms/` | Dublin Core Terms — general metadata | +| `dcat` | `http://www.w3.org/ns/dcat#` | Data Catalog Vocabulary — dataset/distribution metadata | +| `schema` | `http://schema.org/` | Schema.org — broad structured data vocabulary | +| `prov` | `http://www.w3.org/ns/prov#` | PROV-O — provenance ontology | +| `qudt` | `http://qudt.org/schema/qudt/` | QUDT — quantities, units, dimensions, types | +| `skos` | `http://www.w3.org/2004/02/skos/core#` | SKOS — knowledge organization / concept schemes | +| `AIO` | `https://w3id.org/aio/` | Artificial Intelligence Ontology — AI/ML concepts | +| `DUO` | `http://purl.obolibrary.org/obo/DUO_` | Data Use Ontology — data access and use conditions | +| `biolink` | `https://w3id.org/biolink/vocab/` | Biolink Model — biomedical knowledge graph vocabulary | +| `sh` | `https://w3id.org/shacl/` | SHACL — shapes constraint language | +| 
`linkml` | `https://w3id.org/linkml/` | LinkML framework built-ins |
+| `mediatypes` | `https://www.iana.org/assignments/media-types/` | IANA media type registry |
+| `B2AI_TOPIC` | `https://w3id.org/bridge2ai/b2ai-standards-registry/` | Bridge2AI standards registry |
+| `rai` | `http://mlcommons.org/croissant/RAI/` | Responsible AI (Croissant/MLCommons) |
+
+---
+
+## Mapping Strategy
+
+D4D uses four LinkML mechanisms to tie slots and classes to ontology terms:
+
+| Mechanism | Meaning | Use in D4D |
+|-----------|---------|------------|
+| `slot_uri` | The primary RDF predicate; the slot's canonical identity in RDF | Used for every slot; chosen to be the best single match |
+| `exact_mappings` | Semantically equivalent term in another ontology | Used when a second vocabulary has an identical concept |
+| `broad_mappings` | A related but broader or less precise term | Used when no exact match exists; indicates approximate alignment |
+| `close_mappings` | Similar but not identical; overlapping semantics | Used at the class level or when meaning is analogous but scope differs |
+
+The guiding rule: prefer `dcterms:`, `dcat:`, or `schema:` terms for `slot_uri` when an accurate
+match exists. Fall back to `d4d:` for D4D-specific concepts (consent details, bias types,
+annotation protocols, etc.).
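The guiding rule can be sketched in a few lines of Python. This is a minimal illustration and not part of the D4D tooling: `choose_slot_uri` and `PREFERRED_PREFIXES` are hypothetical names, and the sketch assumes a reviewer has already filtered the candidates down to semantically accurate terms. The guide also lists `prov:`, `qudt:`, and `rai:` predicates used as `slot_uri`, so the preferred set here is deliberately narrower than actual practice.

```python
# Hypothetical helper illustrating the slot_uri selection rule from this guide.
# The preferred prefixes are the three named in the guiding rule.
PREFERRED_PREFIXES = {"dcterms", "dcat", "schema"}


def choose_slot_uri(candidates, custom_term):
    """Return the first candidate CURIE with a preferred prefix.

    `candidates` are CURIEs already judged semantically accurate.
    If none uses a preferred prefix, fall back to the custom d4d: term.
    """
    for curie in candidates:
        prefix = curie.split(":", 1)[0]
        if prefix in PREFERRED_PREFIXES:
            return curie
    return custom_term


# `title` has an accurate Dublin Core match, so the fallback
# ("d4d:title", a hypothetical name) is never used.
print(choose_slot_uri(["dcterms:title"], "d4d:title"))

# Consent type has no standard match, so the custom term is used.
print(choose_slot_uri([], "d4d:consentType"))
```

The fallback is passed in explicitly because minting a `d4d:` term is a modelling decision that the sketch does not try to automate.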
+ +--- + +## Core Slot Mappings (D4D_Base_import.yaml) + +| Slot | `slot_uri` | Ontology | Rationale | +|------|-----------|----------|-----------| +| `title` | `dcterms:title` | Dublin Core | Standard bibliographic title term | +| `language` | `dcterms:language` | Dublin Core | Standard language declaration; `schema:inLanguage` as exact mapping | +| `publisher` | `dcterms:publisher` | Dublin Core | Entity responsible for making resource available | +| `issued` | `dcterms:issued` | Dublin Core | Formal publication/issuance date | +| `page` | `dcat:landingPage` | DCAT | Web page describing (not delivering) the resource | +| `download_url` | `dcat:downloadURL` | DCAT | Direct data download link; `schema:url` as exact mapping | +| `bytes` | `dcat:byteSize` | DCAT | File size in bytes — DCAT distribution property | +| `format` | `dcterms:format` | Dublin Core | File format/extension | +| `media_type` | `dcat:mediaType` | DCAT | MIME type; `schema:encodingFormat` as exact mapping | +| `compression` | `dcat:compressFormat` | DCAT | Compression algorithm used | +| `keywords` | `dcat:keyword` | DCAT | Discovery keywords for the resource | +| `license` | `dcterms:license` | Dublin Core | Legal license term | +| `version` | `schema:version` | Schema.org | Version string | +| `created_by` | `dcterms:creator` | Dublin Core | Primary creator | +| `created_on` | `dcterms:created` | Dublin Core | Creation datetime | +| `last_updated_on` | `dcterms:modified` | Dublin Core | Last modification datetime | +| `modified_by` | `dcterms:contributor` | Dublin Core | Contributor to update | +| `conforms_to` | `dcterms:conformsTo` | Dublin Core | Standard or schema conformed to | +| `was_derived_from` | `prov:wasDerivedFrom` | PROV-O | Derivation provenance; `dcterms:source` as exact mapping | +| `external_resources` | `dcterms:references` | Dublin Core | Links to related resources | +| `resources` | `schema:hasPart` | Schema.org | Sub-datasets or component files | +| `sha256` | 
`schema:sha256` | Schema.org | Schema.org has a dedicated sha256 property | +| `hash` | `d4d:hashValue` | D4D custom | Generic hash; no standard single-algorithm-neutral term | +| `md5` | `d4d:md5Checksum` | D4D custom | MD5-specific; schema.org only defines sha256 | +| `encoding` | `d4d:characterEncoding` | D4D custom | Character encoding; no precise DCAT/DC term at enum level | +| `doi` | `d4d:doiIdentifier` | D4D custom | DOI-specific; `dcterms:identifier` is broad mapping | +| `status` | `d4d:publicationStatus` | D4D custom | Publication lifecycle status; no standard enum-backed term | +| `conforms_to_schema` | `d4d:conformsToSchema` | D4D custom | Schema-specific refinement of `dcterms:conformsTo` | +| `conforms_to_class` | `d4d:conformsToClass` | D4D custom | Class-specific refinement of `dcterms:conformsTo` | + +### Base Class URIs + +| Class | `class_uri` | Rationale | +|-------|------------|-----------| +| `NamedThing` | `schema:Thing` | Root Schema.org class for any identifiable entity | +| `Organization` | `schema:Organization` | Direct Schema.org match | +| `Person` | `schema:Person` | Direct Schema.org match | +| `Software` | `schema:SoftwareApplication` | Direct Schema.org match | +| `Information` | — (inherits `NamedThing`) | Abstract grouping; close mapping to `schema:CreativeWork` | +| `DatasetCollection` | — | Exact mapping: `dcat:Dataset`; close mapping: `dcat:Catalog` | + +--- + +## Module-Specific Mapping Decisions + +### D4D_Distribution.yaml — DCAT alignment +Distribution metadata (access URLs, release dates, formats, access rights) maps heavily to +DCAT. `dcat:accessURL` is used for access points, `dcterms:accessRights` for access conditions, +`dcterms:rights` for rights statements, and `dcat:theme` for thematic classification. 
+ +### D4D_Data_Governance.yaml — DUO for data use permissions +Data Use Ontology (DUO) terms are used as `meaning:` values on `DataUsePermissionEnum` permissible +values (e.g., `DUO:0000004` for no restriction, `DUO:0000007` for disease-specific research). +The enum itself uses `d4d:` for the slot_uri since DUO is an OBO ontology for values, not +predicates. Regulatory compliance and license slots use `dcterms:` terms. + +### D4D_Collection.yaml and D4D_Preprocessing.yaml — RAI/Croissant +The Responsible AI (RAI) vocabulary from MLCommons/Croissant is used for `data_annotation_platform` +(`rai:dataAnnotationPlatform`). Other collection-specific concepts (consent, collector demographics, +acquisition methods) use `d4d:` terms, as no standard ontology covers these at the required +specificity. + +### D4D_Variables.yaml — QUDT for units +Variable unit declarations use `qudt:unit` as the `slot_uri`. This is the primary QUDT predicate +for associating a unit of measurement with a quantity. Range is `uriorcurie` to allow QUDT unit +URIs (e.g., `qudt:KilogramPerCubicMeter`) or custom values. + +### D4D_Motivation.yaml — Schema.org for funding and contributors +Funder and funding information use `schema:funder` and `schema:funding`. Creator credit roles +use `d4d:creditRoles` (no standard slot for CRediT taxonomy associations). + +### D4D_Composition.yaml — AIO for bias types +`BiasTypeEnum` permissible values use `meaning:` to map to AIO ontology terms +(e.g., `AIO:MeasurementBias`, `AIO:HistoricalBias`). For bias types without a 1:1 AIO match, +`broad_mappings` to the closest AIO term is used instead. + +### D4D_Maintenance.yaml and D4D_Distribution.yaml — DCTERMS for versions +`dcterms:hasVersion` covers version availability; `dcterms:available` is used for availability +dates. Errata and update tracking use `d4d:` terms. + +--- + +## Custom d4d: Terms + +The following `d4d:` slot URIs were minted because no standard ontology provided a +sufficiently precise predicate. 
All resolve under `https://w3id.org/bridge2ai/data-sheets-schema/`. + +### Identity and Provenance +| d4d term | Used for | +|----------|---------| +| `d4d:doiIdentifier` | DOI-specific identifier (broader `dcterms:identifier` used as broad mapping) | +| `d4d:orcidIdentifier` | ORCID researcher identifier | +| `d4d:hashValue` | Generic cryptographic hash (algorithm-neutral) | +| `d4d:md5Checksum` | MD5-specific checksum | +| `d4d:conformsToSchema` | Schema-specific conformance (refinement of `dcterms:conformsTo`) | +| `d4d:conformsToClass` | Class-specific conformance (refinement of `dcterms:conformsTo`) | +| `d4d:publicationStatus` | Publication lifecycle status | +| `d4d:characterEncoding` | Character encoding scheme | + +### Collection and Annotation +| d4d term | Used for | +|----------|---------| +| `d4d:collectionMechanisms` | Data collection mechanisms | +| `d4d:collectionType` | Type of collection (direct, inferred, etc.) | +| `d4d:collectionTimeframes` | Time period(s) of data collection | +| `d4d:dataCollectors` | Who collected the data | +| `d4d:dataAnnotationProtocol` | Annotation protocol description | +| `d4d:annotationsPerItem` | Number of annotations per data item | +| `d4d:annotatorDemographics` | Demographics of annotators | +| `d4d:agreementMetric` | Inter-annotator agreement metric | +| `d4d:disagreementPatterns` | Patterns in annotation disagreements | +| `d4d:wasDirectlyObserved` | Observation acquisition method flag | +| `d4d:wasReportedBySubjects` | Self-report acquisition flag | +| `d4d:wasInferred` | Inference/derivation acquisition flag | +| `d4d:wasValidated` | Validation acquisition flag | +| `d4d:directCollection` | Whether directly collected from subjects | + +### Consent and Ethics +| d4d term | Used for | +|----------|---------| +| `d4d:consentObtained` | Whether consent was obtained | +| `d4d:consentType` | Type of consent obtained | +| `d4d:consentScope` | Scope/breadth of consent | +| `d4d:consentDocumentation` | Where consent 
documentation lives | +| `d4d:consentRevocations` | Revocation information | +| `d4d:collectionConsents` | Consent details for collection | +| `d4d:collectionNotifications` | Notifications given to data subjects | +| `d4d:assentProcedures` | Assent procedures for minors/vulnerable groups | +| `d4d:withdrawalMechanism` | How subjects can withdraw data | +| `d4d:ethicalReviews` | IRB/ethics review details | +| `d4d:ethicsReviewBoard` | Name of reviewing ethics board | +| `d4d:ethicsContactPoint` | Contact for ethics questions | +| `d4d:dataProtectionImpacts` | Data protection impact assessments | + +### Sensitive and Confidential Data +| d4d term | Used for | +|----------|---------| +| `d4d:sensitiveElements` | Description of sensitive data elements | +| `d4d:sensitive_elements_present` | Boolean flag for sensitivity | +| `d4d:confidentialElements` | Confidential element descriptions | +| `d4d:confidential_elements_present` | Boolean flag for confidentiality | +| `d4d:confidentialityLevel` | Level of confidentiality | +| `d4d:contentWarnings` | Content warning descriptions | +| `d4d:content_warnings_present` | Boolean flag for content warnings | +| `d4d:reidentificationRisk` | Re-identification risk description | +| `d4d:anonymizationMethod` | Method used for anonymization | +| `d4d:removedIdentifierTypes` | Types of identifiers removed | + +### Preprocessing and Cleaning +| d4d term | Used for | +|----------|---------| +| `d4d:cleaningStrategies` | Data cleaning strategy objects | +| `d4d:strategies` | Generic preprocessing strategy list | + +### Uses and Distribution +| d4d term | Used for | +|----------|---------| +| `d4d:useCategory` | Category of intended use | +| `d4d:discouragedUses` | Uses that are discouraged but not prohibited | +| `d4d:existingUses` | Known prior uses of the dataset | +| `d4d:useRepository` | Repository indexing dataset uses | +| `d4d:distributionFormats` | Formats in which dataset is distributed | +| `d4d:distributionDates` | Distribution 
date information | +| `d4d:thirdPartySharing` | Third-party sharing arrangements | +| `d4d:regulatoryRestrictions` | Legal/regulatory distribution restrictions | +| `d4d:externalResourceRestrictions` | Restrictions on external resource access | +| `d4d:availabilityGuarantee` | Long-term availability commitments | +| `d4d:retentionPeriod` | Data retention period | +| `d4d:retentionLimit` | Maximum retention limit | + +### Maintenance and Versioning +| d4d term | Used for | +|----------|---------| +| `d4d:versionsAvailable` | Available dataset versions | +| `d4d:versionAccess` | How to access specific versions | +| `d4d:updates` | Planned or past update information | +| `d4d:errata` | Error corrections | +| `d4d:erratumURL` | URL for errata publication | +| `d4d:extensionMechanism` | How the dataset can be extended | +| `d4d:contributionURL` | How to contribute to the dataset | + +### Human Subjects and At-Risk Populations +| d4d term | Used for | +|----------|---------| +| `d4d:atRiskPopulations` | At-risk population documentation | +| `d4d:atRiskGroupsIncluded` | Specific at-risk groups present | +| `d4d:specialProtections` | Special protections in place | +| `d4d:specialPopulations` | Other special population designations | + +### Composition and Subpopulations +| d4d term | Used for | +|----------|---------| +| `d4d:subpopulations` | Subpopulation objects | +| `d4d:subpopulationIdentification` | How subpopulations are identified | +| `d4d:subpopulationDistribution` | Distribution across subpopulations | +| `d4d:subpopulationElementsPresent` | Whether subpopulation info is present | +| `d4d:samplingStrategies` | Sampling strategy objects | +| `d4d:sourceData` | Source of sampled data | +| `d4d:whyNotRepresentative` | Explanation of non-representativeness | +| `d4d:relationships` | Relationships between dataset instances | +| `d4d:dataSubset` | Subset description | +| `d4d:splits` | Train/val/test split information | + +### Miscellaneous +| d4d term | Used for | 
+|----------|---------|
+| `d4d:usedSoftware` | Software used in this property/process |
+| `d4d:tasks` | ML tasks the dataset supports |
+| `d4d:biasType` | Type of bias (linked to AIO enum) |
+| `d4d:anomalies` | Dataset anomaly objects |
+| `d4d:fileCollections` | Nested file collection list |
+| `d4d:fileCount` | Number of files |
+| `d4d:totalFileCount` | Total file count across nested collections |
+| `d4d:variableName` | Name of a variable/feature |
+| `d4d:regulatoryCompliance` | Regulatory compliance declarations |
+| `d4d:teamAffiliation` | Team or institutional affiliation |
+
+---
+
+## Known Limitations
+
+### Approximate (broad_mappings) mappings
+The following slots use `broad_mappings` because no ontology provides an exact match:
+
+- **`d4d:hashValue`** → `broad_mappings: dcterms:identifier`; a hash value is a form of identifier,
+  but `dcterms:identifier` is not specific to cryptographic hashes.
+- **`d4d:md5Checksum`** → same mapping and rationale; `schema:sha256` exists but MD5 has no Schema.org term.
+- **`d4d:doiIdentifier`** → `broad_mappings: dcterms:identifier`; DOI is a specific identifier scheme,
+  but no dedicated ontology predicate for DOI is widely used in DCAT/DC contexts.
+- **`d4d:conformsToSchema` / `d4d:conformsToClass`** → `broad_mappings: dcterms:conformsTo`;
+  these are refinements of conformance, but LinkML has no standard sub-property for schema/class
+  specificity.
+- **`d4d:orcidIdentifier`** → `broad_mappings: schema:identifier`; ORCID has no dedicated
+  Schema.org property, though `schema:identifier` with a type qualifier is the recommended
+  approach.
+
+### DUO as value vocabulary, not predicate vocabulary
+DUO terms are used as `meaning:` values on enum members in `DataUsePermissionEnum` and
+`DataUseModifierEnum`. They are not used as `slot_uri` predicates because DUO defines controlled
+vocabulary *concepts*, not RDF predicates. The slot predicate remains `d4d:` while the enum value's
+semantic meaning points to the DUO term.
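To make the predicate/value split concrete, the sketch below expands CURIEs using a subset of the namespace prefixes listed in this guide. It is an illustration under stated assumptions: `expand_curie` is a hypothetical helper rather than a LinkML API, and `d4d:dataUsePermission` is a made-up predicate name, since the guide says only that the slot predicate stays in the `d4d:` namespace.

```python
# Subset of the namespace prefixes listed in this guide.
PREFIXES = {
    "d4d": "https://w3id.org/bridge2ai/data-sheets-schema/",
    "dcterms": "http://purl.org/dc/terms/",
    "DUO": "http://purl.obolibrary.org/obo/DUO_",
}


def expand_curie(curie):
    """Expand a prefix:local CURIE to a full IRI using PREFIXES."""
    prefix, local = curie.split(":", 1)
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


# Hypothetical predicate name; only the d4d: namespace is given by the guide.
predicate_iri = expand_curie("d4d:dataUsePermission")

# DUO:0000004 is the "no restriction" term cited in this guide.
value_iri = expand_curie("DUO:0000004")

print(predicate_iri)  # predicate lives in the D4D namespace
print(value_iri)      # value concept lives in the OBO/DUO namespace
```

In RDF terms, the first IRI would sit in the predicate position of a triple and the second in the object position, which is why DUO never appears as a `slot_uri`.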
+ +### RAI/Croissant coverage +The RAI (Responsible AI) vocabulary from MLCommons/Croissant is only partially developed. +Only `rai:dataAnnotationPlatform` is currently used. Many other collection and annotation +concepts (demographics, protocols, agreement metrics) have no RAI equivalent and use `d4d:` terms. +As RAI matures, these may be replaced with canonical RAI predicates. + +### AIO ontology gaps +For `BiasTypeEnum`, some bias types have no 1:1 AIO mapping: +- `selection_bias` → `broad_mappings: AIO:SelectionAndSamplingBias` (covers both selection and sampling) +- `sampling_bias` → same broad mapping as selection_bias +- `aggregation_bias` → `broad_mappings: AIO:EcologicalFallacyBias` (closest available) +- `algorithmic_bias` → `broad_mappings: AIO:ProcessingBias, AIO:ComputationalBias` (split concept) +- `annotation_bias` → `broad_mappings: AIO:AnnotatorReportingBias` (narrower than intended) diff --git a/project/jsonld/data_sheets_schema.jsonld b/project/jsonld/data_sheets_schema.jsonld index a449322d..b220d28d 100644 --- a/project/jsonld/data_sheets_schema.jsonld +++ b/project/jsonld/data_sheets_schema.jsonld @@ -478,262 +478,380 @@ { "name": "FormatEnum", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/FormatEnum", + "description": "Common file format extensions for data files and documents.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "permissible_values": [ { - "text": "CSV" + "text": "CSV", + "description": "Comma-Separated Values - tabular data format." }, { - "text": "TSV" + "text": "TSV", + "description": "Tab-Separated Values - tabular data format with tab delimiters." }, { - "text": "XML" + "text": "XML", + "description": "Extensible Markup Language - structured markup format." }, { - "text": "JSON" + "text": "JSON", + "description": "JavaScript Object Notation - structured data interchange format." }, { - "text": "JSONL" + "text": "JSONL", + "description": "JSON Lines - newline-delimited JSON format." 
}, { - "text": "YAML" + "text": "YAML", + "description": "YAML Ain't Markup Language - human-readable data serialization format." }, { - "text": "HTML" + "text": "HTML", + "description": "HyperText Markup Language - web page markup format." }, { - "text": "PDF" + "text": "PDF", + "description": "Portable Document Format - fixed-layout document format." }, { - "text": "DOCX" + "text": "DOCX", + "description": "Microsoft Word Open XML Document - word processing document." }, { - "text": "XLSX" + "text": "XLSX", + "description": "Microsoft Excel Open XML Spreadsheet - spreadsheet format." }, { - "text": "PPTX" + "text": "PPTX", + "description": "Microsoft PowerPoint Open XML Presentation - presentation format." }, { - "text": "TXT" + "text": "TXT", + "description": "Plain text file." }, { - "text": "MD" + "text": "MD", + "description": "Markdown - lightweight markup language." }, { - "text": "ZIP" + "text": "ZIP", + "description": "ZIP archive - compressed file container." }, { - "text": "TAR" + "text": "TAR", + "description": "Tape Archive - file archive format." }, { - "text": "GZ" + "text": "GZ", + "description": "Gzip compressed file." }, { - "text": "BZ2" + "text": "BZ2", + "description": "Bzip2 compressed file." }, { - "text": "XZ" + "text": "XZ", + "description": "XZ compressed file." } ] }, { "name": "MediaTypeEnum", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/MediaTypeEnum", + "description": "MIME media types (Internet Media Types) for file content identification.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "permissible_values": [ { - "text": "text/csv" + "text": "text/csv", + "description": "MIME type for CSV (Comma-Separated Values) files." }, { - "text": "text/tab-separated-values" + "text": "text/tab-separated-values", + "description": "MIME type for TSV (Tab-Separated Values) files." 
}, { - "text": "application/json" + "text": "application/json", + "description": "MIME type for JSON (JavaScript Object Notation) files." }, { - "text": "application/xml" + "text": "application/xml", + "description": "MIME type for XML (Extensible Markup Language) files." }, { - "text": "text/xml" + "text": "text/xml", + "description": "Alternative MIME type for XML files (text variant)." }, { - "text": "application/yaml" + "text": "application/yaml", + "description": "MIME type for YAML files." }, { - "text": "text/yaml" + "text": "text/yaml", + "description": "Alternative MIME type for YAML files (text variant)." }, { - "text": "text/html" + "text": "text/html", + "description": "MIME type for HTML (HyperText Markup Language) files." }, { - "text": "application/pdf" + "text": "application/pdf", + "description": "MIME type for PDF (Portable Document Format) files." }, { - "text": "application/vnd.openxmlformats-officedocument.wordprocessingml.document" + "text": "application/vnd.openxmlformats-officedocument.wordprocessingml.document", + "description": "MIME type for Microsoft Word DOCX files." }, { - "text": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" + "text": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", + "description": "MIME type for Microsoft Excel XLSX files." }, { - "text": "application/vnd.openxmlformats-officedocument.presentationml.presentation" + "text": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "description": "MIME type for Microsoft PowerPoint PPTX files." }, { - "text": "text/plain" + "text": "text/plain", + "description": "MIME type for plain text files." }, { - "text": "text/markdown" + "text": "text/markdown", + "description": "MIME type for Markdown files." }, { - "text": "application/zip" + "text": "application/zip", + "description": "MIME type for ZIP archive files." 
}, { - "text": "application/x-tar" + "text": "application/x-tar", + "description": "MIME type for TAR archive files." }, { - "text": "application/gzip" + "text": "application/gzip", + "description": "MIME type for Gzip compressed files." }, { - "text": "application/x-bzip2" + "text": "application/x-bzip2", + "description": "MIME type for Bzip2 compressed files." }, { - "text": "application/x-xz" + "text": "application/x-xz", + "description": "MIME type for XZ compressed files." } ] }, { "name": "CompressionEnum", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/CompressionEnum", + "description": "Compression algorithms and formats for file compression.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "permissible_values": [ { - "text": "gzip" + "text": "gzip", + "description": "GNU zip compression (commonly used with .gz extension)." }, { - "text": "bzip2" + "text": "bzip2", + "description": "Burrows-Wheeler block-sorting compression (commonly used with .bz2 extension)." }, { - "text": "zip" + "text": "zip", + "description": "ZIP archive compression format." }, { - "text": "tar" + "text": "tar", + "description": "Tape Archive format (typically combined with gzip or bzip2)." }, { - "text": "xz" + "text": "xz", + "description": "XZ Utils compression using LZMA2 algorithm." }, { - "text": "lzma" + "text": "lzma", + "description": "Lempel-Ziv-Markov chain algorithm compression." }, { - "text": "compress" + "text": "compress", + "description": "Unix compress utility (LZW compression)." } ] }, { "name": "EncodingEnum", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/EncodingEnum", + "description": "Character encoding schemes for text representation in different languages and scripts.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "permissible_values": [ { - "text": "ASCII" + "text": "ASCII", + "description": "American Standard Code for Information Interchange (7-bit, English characters only)." 
}, { - "text": "Big5" + "text": "Big5", + "description": "Traditional Chinese character encoding (primarily Taiwan and Hong Kong)." }, { - "text": "EUC-JP" + "text": "EUC-JP", + "description": "Extended Unix Code for Japanese." }, { - "text": "EUC-KR" + "text": "EUC-KR", + "description": "Extended Unix Code for Korean." }, { - "text": "EUC-TW" + "text": "EUC-TW", + "description": "Extended Unix Code for Traditional Chinese." }, { - "text": "GB2312" + "text": "GB2312", + "description": "Simplified Chinese character encoding standard." }, { - "text": "HZ-GB-2312" + "text": "HZ-GB-2312", + "description": "7-bit encoding for Simplified Chinese (GB2312)." }, { - "text": "ISO-2022-CN-EXT" + "text": "ISO-2022-CN-EXT", + "description": "Extended ISO-2022 encoding for Chinese (includes both Simplified and Traditional)." }, { - "text": "ISO-2022-CN" + "text": "ISO-2022-CN", + "description": "ISO-2022 encoding for Chinese." }, { - "text": "ISO-2022-JP-2" + "text": "ISO-2022-JP-2", + "description": "Extended ISO-2022 encoding for Japanese (includes additional character sets)." }, { - "text": "ISO-2022-JP" + "text": "ISO-2022-JP", + "description": "ISO-2022 encoding for Japanese." }, { - "text": "ISO-2022-KR" + "text": "ISO-2022-KR", + "description": "ISO-2022 encoding for Korean." }, { - "text": "ISO-8859-10" + "text": "ISO-8859-10", + "description": "Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic)." }, { - "text": "ISO-8859-11" + "text": "ISO-8859-11", + "description": "Latin/Thai encoding." }, { - "text": "ISO-8859-13" + "text": "ISO-8859-13", + "description": "Latin-7 (Baltic Rim languages)." }, { - "text": "ISO-8859-14" + "text": "ISO-8859-14", + "description": "Latin-8 (Celtic languages)." }, { - "text": "ISO-8859-15" + "text": "ISO-8859-15", + "description": "Latin-9 (Western European with Euro sign)." }, { - "text": "ISO-8859-16" + "text": "ISO-8859-16", + "description": "Latin-10 (South-Eastern European languages)." 
}, { - "text": "ISO-8859-1" + "text": "ISO-8859-1", + "description": "Latin-1 (Western European languages)." }, { - "text": "ISO-8859-2" + "text": "ISO-8859-2", + "description": "Latin-2 (Central European languages)." }, { - "text": "ISO-8859-3" + "text": "ISO-8859-3", + "description": "Latin-3 (South European languages - Turkish, Maltese, Esperanto)." }, { - "text": "ISO-8859-4" + "text": "ISO-8859-4", + "description": "Latin-4 (North European languages)." }, { - "text": "ISO-8859-5" + "text": "ISO-8859-5", + "description": "Latin/Cyrillic encoding." }, { - "text": "ISO-8859-6" + "text": "ISO-8859-6", + "description": "Latin/Arabic encoding." }, { - "text": "ISO-8859-7" + "text": "ISO-8859-7", + "description": "Latin/Greek encoding." }, { - "text": "ISO-8859-8" + "text": "ISO-8859-8", + "description": "Latin/Hebrew encoding." }, { - "text": "ISO-8859-9" + "text": "ISO-8859-9", + "description": "Latin-5 (Turkish)." }, { - "text": "KOI8-R" + "text": "KOI8-R", + "description": "Russian character encoding (Kod Obmena Informatsiey)." }, { - "text": "KOI8-U" + "text": "KOI8-U", + "description": "Ukrainian character encoding." }, { - "text": "Shift_JIS" + "text": "Shift_JIS", + "description": "Japanese character encoding (Microsoft and other systems)." }, { - "text": "UTF-16" + "text": "UTF-16", + "description": "Unicode Transformation Format 16-bit (variable-width encoding)." }, { - "text": "UTF-32" + "text": "UTF-32", + "description": "Unicode Transformation Format 32-bit (fixed-width encoding)." }, { - "text": "UTF-7" + "text": "UTF-7", + "description": "Unicode Transformation Format 7-bit (for 7-bit channels)." }, { - "text": "UTF-8" + "text": "UTF-8", + "description": "Unicode Transformation Format 8-bit (variable-width, most common Unicode encoding)." + }, + { + "text": "Windows-1250", + "description": "Windows code page for Central European languages." + }, + { + "text": "Windows-1251", + "description": "Windows code page for Cyrillic script." 
+ }, + { + "text": "Windows-1252", + "description": "Windows code page for Western European languages." + }, + { + "text": "Windows-1253", + "description": "Windows code page for Greek." + }, + { + "text": "Windows-1254", + "description": "Windows code page for Turkish." + }, + { + "text": "Windows-1255", + "description": "Windows code page for Hebrew." + }, + { + "text": "Windows-1256", + "description": "Windows code page for Arabic." + }, + { + "text": "Windows-1257", + "description": "Windows code page for Baltic languages." + }, + { + "text": "Windows-1258", + "description": "Windows code page for Vietnamese." } ] }, @@ -745,27 +863,27 @@ "permissible_values": [ { "text": "conceptualization", - "description": "Ideas; formulation or evolution of overarching research goals and aims" + "description": "Ideas; formulation or evolution of overarching research goals and aims." }, { "text": "methodology", - "description": "Development or design of methodology; creation of models" + "description": "Development or design of methodology; creation of models." }, { "text": "software", - "description": "Programming, software development; designing computer programs" + "description": "Programming, software development; designing computer programs." }, { "text": "validation", - "description": "Verification of the overall replication/reproducibility of results" + "description": "Verification of the overall replication/reproducibility of results." }, { "text": "formal_analysis", - "description": "Application of statistical, mathematical, or other formal techniques" + "description": "Application of statistical, mathematical, or other formal techniques." }, { "text": "investigation", - "description": "Conducting the research and investigation process" + "description": "Conducting the research and investigation process." 
}, { "text": "resources", @@ -773,31 +891,31 @@ }, { "text": "data_curation", - "description": "Management activities to annotate, scrub data and maintain research data" + "description": "Management activities to annotate, scrub data and maintain research data." }, { "text": "writing_original_draft", - "description": "Preparation, creation and/or presentation of the published work" + "description": "Preparation, creation and/or presentation of the published work." }, { "text": "writing_review_editing", - "description": "Critical review, commentary or revision of the work" + "description": "Critical review, commentary or revision of the work." }, { "text": "visualization", - "description": "Preparation, creation and/or presentation of visualizations/data presentation" + "description": "Preparation, creation and/or presentation of visualizations/data presentation." }, { "text": "supervision", - "description": "Oversight and leadership responsibility for the research activity" + "description": "Oversight and leadership responsibility for the research activity." }, { "text": "project_administration", - "description": "Management and coordination responsibility for the research activity" + "description": "Management and coordination responsibility for the research activity." }, { "text": "funding_acquisition", - "description": "Acquisition of the financial support for the project" + "description": "Acquisition of the financial support for the project." } ] }, @@ -909,42 +1027,15 @@ "permissible_values": [ { "text": "MAJOR", - "description": "Incompatible changes, breaking backward compatibility" + "description": "Incompatible changes, breaking backward compatibility." }, { "text": "MINOR", - "description": "Backward-compatible new functionality or enhancements" + "description": "Backward-compatible new functionality or enhancements." 
}, { "text": "PATCH", - "description": "Backward-compatible bug fixes or minor corrections" - }, - { - "text": "Windows-1250" - }, - { - "text": "Windows-1251" - }, - { - "text": "Windows-1252" - }, - { - "text": "Windows-1253" - }, - { - "text": "Windows-1254" - }, - { - "text": "Windows-1255" - }, - { - "text": "Windows-1256" - }, - { - "text": "Windows-1257" - }, - { - "text": "Windows-1258" + "description": "Backward-compatible bug fixes or minor corrections." } ] }, @@ -1021,18 +1112,22 @@ { "name": "Boolean", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/Boolean", + "description": "Three-valued boolean logic supporting true, false, and unknown states.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "permissible_values": [ { "text": "true", + "description": "Affirmative or positive value.", "title": "True" }, { "text": "false", + "description": "Negative or false value.", "title": "False" }, { "text": "unknown", + "description": "Unknown, uncertain, or not applicable value.", "title": "Unknown" } ] @@ -1208,112 +1303,112 @@ "permissible_values": [ { "text": "no_restriction", - "description": "No restriction on data use", + "description": "No restriction on data use.", "meaning": "DUO:0000004" }, { "text": "general_research_use", - "description": "Data available for any research purpose (GRU)", + "description": "Data available for any research purpose (GRU).", "meaning": "DUO:0000042" }, { "text": "health_medical_biomedical_research", - "description": "Data limited to health, medical, or biomedical research (HMB)", + "description": "Data limited to health, medical, or biomedical research (HMB).", "meaning": "DUO:0000006" }, { "text": "disease_specific_research", - "description": "Data limited to research on specified disease(s) (DS)", + "description": "Data limited to research on specified disease(s) (DS).", "meaning": "DUO:0000007" }, { "text": "population_origins_ancestry_research", - "description": "Data limited to 
population origins or ancestry research (POA)", + "description": "Data limited to population origins or ancestry research (POA).", "meaning": "DUO:0000011" }, { "text": "clinical_care_use", - "description": "Data available for clinical care and applications (CC)", + "description": "Data available for clinical care and applications (CC).", "meaning": "DUO:0000043" }, { "text": "no_commercial_use", - "description": "Data use limited to non-commercial purposes (NCU)", + "description": "Data use limited to non-commercial purposes (NCU).", "meaning": "DUO:0000046" }, { "text": "non_profit_use_only", - "description": "Data use limited to not-for-profit organizations (NPU)", + "description": "Data use limited to not-for-profit organizations (NPU).", "meaning": "DUO:0000045" }, { "text": "non_profit_use_and_non_commercial_use", - "description": "Data limited to not-for-profit organizations and non-commercial use (NPUNCU)", + "description": "Data limited to not-for-profit organizations and non-commercial use (NPUNCU).", "meaning": "DUO:0000018" }, { "text": "no_methods_development", - "description": "Data cannot be used for methods or software development (NMDS)", + "description": "Data cannot be used for methods or software development (NMDS).", "meaning": "DUO:0000015" }, { "text": "genetic_studies_only", - "description": "Data limited to genetic studies only (GSO)", + "description": "Data limited to genetic studies only (GSO).", "meaning": "DUO:0000016" }, { "text": "ethics_approval_required", - "description": "Ethics approval (e.g., IRB/ERB) required for data use (IRB)", + "description": "Ethics approval (e.g., IRB/ERB) required for data use (IRB).", "meaning": "DUO:0000021" }, { "text": "collaboration_required", - "description": "Collaboration with primary investigator required (COL)", + "description": "Collaboration with primary investigator required (COL).", "meaning": "DUO:0000020" }, { "text": "publication_required", - "description": "Results must be 
published/shared with research community (PUB)", + "description": "Results must be published/shared with research community (PUB).", "meaning": "DUO:0000019" }, { "text": "geographic_restriction", - "description": "Data use limited to specific geographic region (GS)", + "description": "Data use limited to specific geographic region (GS).", "meaning": "DUO:0000022" }, { "text": "institution_specific", - "description": "Data use limited to approved institutions (IS)", + "description": "Data use limited to approved institutions (IS).", "meaning": "DUO:0000028" }, { "text": "project_specific", - "description": "Data use limited to approved project(s) (PS)", + "description": "Data use limited to approved project(s) (PS).", "meaning": "DUO:0000027" }, { "text": "user_specific", - "description": "Data use limited to approved users (US)", + "description": "Data use limited to approved users (US).", "meaning": "DUO:0000026" }, { "text": "time_limit", - "description": "Data use approved for limited time period (TS)", + "description": "Data use approved for limited time period (TS).", "meaning": "DUO:0000025" }, { "text": "return_to_database", - "description": "Derived data must be returned to database/resource (RTN)", + "description": "Derived data must be returned to database/resource (RTN).", "meaning": "DUO:0000029" }, { "text": "publication_moratorium", - "description": "Publication restricted until specified date (MOR)", + "description": "Publication restricted until specified date (MOR).", "meaning": "DUO:0000024" }, { "text": "no_population_ancestry_research", - "description": "Population/ancestry research prohibited (NPOA)", + "description": "Population/ancestry research prohibited (NPOA).", "meaning": "DUO:0000044" } ] @@ -1426,47 +1521,47 @@ "permissible_values": [ { "text": "data_file", - "description": "A data file containing dataset content", + "description": "A data file containing dataset content.", "meaning": "schema:DataDownload" }, { "text": "code_file", - 
"description": "A source code or script file", + "description": "A source code or script file.", "meaning": "schema:SoftwareSourceCode" }, { "text": "documentation_file", - "description": "A documentation file (README, guide, etc.)", + "description": "A documentation file (README, guide, etc.).", "meaning": "schema:Documentation" }, { "text": "metadata_file", - "description": "A metadata or annotation file", + "description": "A metadata or annotation file.", "meaning": "dcat:CatalogRecord" }, { "text": "configuration_file", - "description": "A configuration or settings file", + "description": "A configuration or settings file.", "meaning": "d4d:ConfigurationFile" }, { "text": "notebook_file", - "description": "A computational notebook file (Jupyter, R Markdown, etc.)", + "description": "A computational notebook file (Jupyter, R Markdown, etc.).", "meaning": "d4d:NotebookFile" }, { "text": "image_file", - "description": "An image or visualization file", + "description": "An image or visualization file.", "meaning": "schema:ImageObject" }, { "text": "archive_file", - "description": "An archive or compressed file", + "description": "An archive or compressed file.", "meaning": "d4d:ArchiveFile" }, { "text": "other", - "description": "Other file type", + "description": "Other file type.", "meaning": "d4d:OtherFile" } ] @@ -1479,52 +1574,52 @@ "permissible_values": [ { "text": "raw_data", - "description": "Raw, unprocessed data files", + "description": "Raw, unprocessed data files.", "meaning": "d4d:RawData" }, { "text": "processed_data", - "description": "Cleaned, processed, or transformed data files", + "description": "Cleaned, processed, or transformed data files.", "meaning": "d4d:ProcessedData" }, { "text": "training_split", - "description": "Files designated for model training", + "description": "Files designated for model training.", "meaning": "d4d:TrainingSplit" }, { "text": "test_split", - "description": "Files designated for model testing", + "description": 
"Files designated for model testing.", "meaning": "d4d:TestSplit" }, { "text": "validation_split", - "description": "Files designated for model validation", + "description": "Files designated for model validation.", "meaning": "d4d:ValidationSplit" }, { "text": "documentation", - "description": "Documentation files (README, codebook, etc.)", + "description": "Documentation files (README, codebook, etc.).", "meaning": "schema:Documentation" }, { "text": "metadata", - "description": "Metadata or annotation files", + "description": "Metadata or annotation files.", "meaning": "dcat:CatalogRecord" }, { "text": "code", - "description": "Code or script files", + "description": "Code or script files.", "meaning": "schema:SoftwareSourceCode" }, { "text": "supplementary", - "description": "Supplementary materials", + "description": "Supplementary materials.", "meaning": "schema:SupplementalMaterial" }, { "text": "other", - "description": "Other file collection type", + "description": "Other file collection type.", "meaning": "d4d:OtherFileCollection" } ] @@ -1534,7 +1629,14 @@ { "name": "same_as", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/same_as", - "description": "URL of a reference web resource that is the same as this dataset. Used to link to canonical or alternative representations of the same dataset on different platforms (e.g., DOI resolver, institutional repository, data catalog).", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "doi:10.XXXXX/example-dataset", + "@type": "Annotation" + } + ], + "description": "One or more URLs or URIs identifying equivalent or related representations of this dataset. 
Used to link to canonical or alternative representations of the same dataset on different platforms (e.g., DOI resolver, institutional repository, data catalog).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "http://schema.org/sameAs" @@ -1565,7 +1667,7 @@ { "name": "title", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/title", - "description": "the official title of the element", + "description": "The official title of the element.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/title" @@ -1581,7 +1683,7 @@ { "name": "language", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/language", - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/language" @@ -1600,6 +1702,14 @@ { "name": "publisher", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/publisher", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "ror:04t3en479 # use a ROR ID, DOI, or URL \u2014 not a plain name", + "@type": "Annotation" + } + ], + "description": "The organization or entity responsible for making the resource available.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/publisher" @@ -1615,6 +1725,7 @@ { "name": "issued", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/issued", + "description": "Date of formal issuance or publication of the resource.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/issued" @@ -1630,6 +1741,7 @@ { "name": "page", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/page", + "description": "A landing page or web page providing access to or information about the 
resource.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://www.w3.org/ns/dcat#landingPage" @@ -1677,6 +1789,7 @@ { "name": "path", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/path", + "description": "The file path or URL where the content is located.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://schema.org/contentUrl" @@ -1728,12 +1841,12 @@ { "name": "encoding", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/encoding", - "description": "the character encoding of the data", + "description": "The character encoding of the data.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ - "http://www.w3.org/ns/dcat#mediaType" + "https://w3id.org/bridge2ai/data-sheets-schema/characterEncoding" ], - "slot_uri": "http://www.w3.org/ns/dcat#mediaType", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/characterEncoding", "owner": "File", "domain_of": [ "File" @@ -1744,7 +1857,7 @@ { "name": "compression", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/compression", - "description": "compression format used, if any. 
e.g., gzip, bzip2, zip", + "description": "Compression format used, if any (e.g., gzip, bzip2, zip).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://www.w3.org/ns/dcat#compressFormat" @@ -1781,12 +1894,15 @@ { "name": "hash", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/hash", - "description": "hash of the data", + "description": "Cryptographic hash value of the data for integrity verification (e.g., SHA-256: 'e3b0c44298fc1c149afb...', MD5: 'd41d8cd98f00b204e9800998ecf8427e').", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/hashValue" + ], + "broad_mappings": [ "http://purl.org/dc/terms/identifier" ], - "slot_uri": "http://purl.org/dc/terms/identifier", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/hashValue", "owner": "File", "domain_of": [ "File" @@ -1797,12 +1913,15 @@ { "name": "md5", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/md5", - "description": "md5 hash of the data", + "description": "MD5 hash value of the data (128-bit cryptographic hash).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/md5Checksum" + ], + "broad_mappings": [ "http://purl.org/dc/terms/identifier" ], - "slot_uri": "http://purl.org/dc/terms/identifier", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/md5Checksum", "owner": "File", "domain_of": [ "File" @@ -1813,12 +1932,12 @@ { "name": "sha256", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/sha256", - "description": "sha256 hash of the data", + "description": "SHA-256 hash value of the data (256-bit cryptographic hash, recommended).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ - "http://purl.org/dc/terms/identifier" + "http://schema.org/sha256" ], - "slot_uri": "http://purl.org/dc/terms/identifier", 
+ "slot_uri": "http://schema.org/sha256", "owner": "File", "domain_of": [ "File" @@ -1829,6 +1948,7 @@ { "name": "conforms_to", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/conforms_to", + "description": "An established standard, specification, or schema to which the resource conforms.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/conformsTo" @@ -1844,11 +1964,15 @@ { "name": "conforms_to_schema", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/conforms_to_schema", + "description": "The schema or data model to which the resource conforms.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/conformsToSchema" + ], + "broad_mappings": [ "http://purl.org/dc/terms/conformsTo" ], - "slot_uri": "http://purl.org/dc/terms/conformsTo", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/conformsToSchema", "owner": "Information", "domain_of": [ "Information" @@ -1859,11 +1983,15 @@ { "name": "conforms_to_class", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/conforms_to_class", + "description": "The specific class or type within a schema to which the resource conforms.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/conformsToClass" + ], + "broad_mappings": [ "http://purl.org/dc/terms/conformsTo" ], - "slot_uri": "http://purl.org/dc/terms/conformsTo", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/conformsToClass", "owner": "Information", "domain_of": [ "Information" @@ -1874,6 +2002,7 @@ { "name": "license", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/license", + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", 
"mappings": [ "http://purl.org/dc/terms/license" @@ -1889,6 +2018,7 @@ { "name": "keywords", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/keywords", + "description": "Keywords or tags describing the resource for discovery and classification.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://www.w3.org/ns/dcat#keyword" @@ -1905,11 +2035,12 @@ { "name": "version", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/version", + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ - "http://purl.org/dc/terms/hasVersion" + "http://schema.org/version" ], - "slot_uri": "http://purl.org/dc/terms/hasVersion", + "slot_uri": "http://schema.org/version", "owner": "Information", "domain_of": [ "Information" @@ -1920,6 +2051,7 @@ { "name": "created_by", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/created_by", + "description": "The person or organization primarily responsible for creating the resource.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/creator" @@ -1935,6 +2067,7 @@ { "name": "created_on", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/created_on", + "description": "The date and time when the resource was created.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/created" @@ -1950,6 +2083,7 @@ { "name": "last_updated_on", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/last_updated_on", + "description": "The date and time when the resource was most recently modified or updated.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/modified" @@ -1965,6 +2099,7 @@ { "name": "modified_by", "definition_uri": 
"https://w3id.org/bridge2ai/data-sheets-schema/modified_by", + "description": "A person or organization that contributed to modifying or updating the resource.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://purl.org/dc/terms/contributor" @@ -1980,11 +2115,12 @@ { "name": "status", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/status", + "description": "The status of the resource (e.g., draft, published, deprecated).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ - "http://purl.org/dc/terms/type" + "https://w3id.org/bridge2ai/data-sheets-schema/publicationStatus" ], - "slot_uri": "http://purl.org/dc/terms/type", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/publicationStatus", "owner": "Information", "domain_of": [ "Information" @@ -1995,6 +2131,7 @@ { "name": "was_derived_from", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/was_derived_from", + "description": "A resource from which this resource was derived, in whole or in part.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://www.w3.org/ns/prov#wasDerivedFrom" @@ -2013,12 +2150,18 @@ { "name": "doi", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/doi", - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/doiIdentifier" + ], + "exact_mappings": [ + "http://schema.org/identifier" + ], + "broad_mappings": [ "http://purl.org/dc/terms/identifier" ], - "slot_uri": "http://purl.org/dc/terms/identifier", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/doiIdentifier", "owner": "Information", "domain_of": [ "Information" 
@@ -2089,6 +2232,13 @@ }, { "name": "dataset__total_file_count", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "156", + "@type": "Annotation" + } + ], "description": "Total number of files across all file collections in this dataset. Can be aggregated from file_collections[].file_count.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ @@ -2105,6 +2255,13 @@ }, { "name": "dataset__total_size_bytes", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "10737418240 (10 GiB = 10 \u00d7 1024\u00b3 bytes)", + "@type": "Annotation" + } + ], "description": "Total size of all files in bytes across all file collections. Can be aggregated from file_collections[].total_bytes.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ @@ -2121,6 +2278,7 @@ }, { "name": "dataset__purposes", + "description": "Purposes for which the dataset was created. List of Purpose objects from the Motivation module, each describing a specific creation goal or intended application.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/purposes" @@ -2139,6 +2297,7 @@ }, { "name": "dataset__tasks", + "description": "Tasks the dataset is intended to support. List of Task objects from the Motivation module describing specific machine learning, research, or analytical tasks.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/tasks" @@ -2157,6 +2316,7 @@ }, { "name": "dataset__addressing_gaps", + "description": "Research or practical gaps this dataset addresses. 
List of AddressingGap objects from the Motivation module, each describing a gap in existing datasets or knowledge that this dataset fills.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/addressingGaps" @@ -2175,6 +2335,7 @@ }, { "name": "dataset__creators", + "description": "Individuals or organizations who created the dataset. List of Creator objects describing authorship, roles, and affiliations of dataset creators.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "http://schema.org/creator" @@ -2193,6 +2354,7 @@ }, { "name": "dataset__funders", + "description": "Funding mechanisms that supported dataset creation. List of FundingMechanism objects describing grants, contracts, or other funding sources including grantors and grant identifiers.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "http://schema.org/funder" @@ -2211,14 +2373,12 @@ }, { "name": "dataset__subsets", + "description": "Subsets or splits of this dataset. List of DataSubset objects from the Composition module, each representing a logical partition such as training, validation, or test splits, or demographic subgroups.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ - "http://www.w3.org/ns/dcat#distribution" + "https://w3id.org/bridge2ai/data-sheets-schema/dataSubset" ], - "exact_mappings": [ - "http://schema.org/distribution" - ], - "slot_uri": "http://www.w3.org/ns/dcat#distribution", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/dataSubset", "alias": "subsets", "owner": "Dataset", "domain_of": [ @@ -2232,6 +2392,7 @@ }, { "name": "dataset__instances", + "description": "Individual data instances or records in the dataset. 
List of Instance objects from the Composition module describing what each data point represents, its type, and associated label information.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/instances" @@ -2250,6 +2411,7 @@ }, { "name": "dataset__anomalies", + "description": "Known data quality issues, errors, or irregularities in the dataset. List of DataAnomaly objects from the Composition module, each documenting a specific anomaly and its potential impact.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/anomalies" @@ -2306,6 +2468,7 @@ }, { "name": "dataset__confidential_elements", + "description": "Confidential or restricted information within the dataset that requires access controls. List of Confidentiality objects describing what is confidential and why it cannot be released.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/confidentialElements" @@ -2324,6 +2487,7 @@ }, { "name": "dataset__content_warnings", + "description": "Content warnings for potentially harmful, offensive, or disturbing material in the dataset. List of ContentWarning objects alerting users to sensitive content categories.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/contentWarnings" @@ -2342,6 +2506,7 @@ }, { "name": "dataset__subpopulations", + "description": "Subpopulations represented within the dataset. 
List of Subpopulation objects from the Composition module describing demographic or other groups, their representation, and any imbalances.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/subpopulations" @@ -2360,6 +2525,7 @@ }, { "name": "dataset__sensitive_elements", + "description": "Sensitive data elements requiring special handling or access controls. List of SensitiveElement objects identifying sensitive attributes such as personal identifiers, protected health information, or legally sensitive content.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/sensitiveElements" @@ -2376,8 +2542,47 @@ "inlined_as_list": true, "@type": "SlotDefinition" }, + { + "name": "dataset__relationships", + "description": "Explicit relationships between individual instances in the dataset. List of Relationships objects from the Composition module describing how instances relate (e.g., graph edges, ratings, social network links).", + "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", + "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/relationships" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/relationships", + "alias": "relationships", + "owner": "Dataset", + "domain_of": [ + "Dataset" + ], + "range": "Relationships", + "multivalued": true, + "inlined": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "dataset__splits", + "description": "Recommended data splits for this dataset. 
List of Splits objects from the Composition module describing train/validation/test partitions and the rationale for each split strategy.", + "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", + "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/splits" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/splits", + "alias": "splits", + "owner": "Dataset", + "domain_of": [ + "Dataset" + ], + "range": "Splits", + "multivalued": true, + "inlined": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, { "name": "dataset__acquisition_methods", + "description": "Methods used to acquire or obtain dataset instances. List of InstanceAcquisition objects from the Collection module describing how data was sourced, whether directly observed or derived.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/acquisitionMethods" @@ -2396,6 +2601,7 @@ }, { "name": "dataset__collection_mechanisms", + "description": "Mechanisms, instruments, or tools used for data collection. List of CollectionMechanism objects from the Collection module describing sensors, surveys, APIs, or other collection instruments.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/collectionMechanisms" @@ -2414,6 +2620,7 @@ }, { "name": "dataset__sampling_strategies", + "description": "Strategies used to select data instances from a larger population. List of SamplingStrategy objects from the Collection module describing sampling methodology, inclusion criteria, and limitations.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/samplingStrategies" @@ -2432,6 +2639,7 @@ }, { "name": "dataset__data_collectors", + "description": "Individuals or organizations responsible for collecting the data. 
List of DataCollector objects from the Collection module describing who performed data collection and their roles.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/dataCollectors" @@ -2450,6 +2658,7 @@ }, { "name": "dataset__collection_timeframes", + "description": "Time periods during which data was collected. List of CollectionTimeframe objects from the Collection module describing collection start and end dates, and any gaps in the collection period.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/collectionTimeframes" @@ -2466,6 +2675,82 @@ "inlined_as_list": true, "@type": "SlotDefinition" }, + { + "name": "dataset__direct_collection", + "description": "Whether data was collected directly from individuals or via third parties. List of DirectCollection objects from the Collection module describing direct vs. indirect collection methods and sources.", + "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", + "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/directCollection" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/directCollection", + "alias": "direct_collection", + "owner": "Dataset", + "domain_of": [ + "Dataset" + ], + "range": "DirectCollection", + "multivalued": true, + "inlined": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "dataset__collection_notifications", + "description": "Notifications provided to individuals about data collection. 
List of CollectionNotification objects from the Ethics module describing how and when individuals were informed about the data collection.", + "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", + "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/collectionNotifications" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/collectionNotifications", + "alias": "collection_notifications", + "owner": "Dataset", + "domain_of": [ + "Dataset" + ], + "range": "CollectionNotification", + "multivalued": true, + "inlined": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "dataset__collection_consents", + "description": "Consent obtained from individuals for data collection and use. List of CollectionConsent objects from the Ethics module describing how consent was requested, provided, and documented.", + "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", + "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/collectionConsents" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/collectionConsents", + "alias": "collection_consents", + "owner": "Dataset", + "domain_of": [ + "Dataset" + ], + "range": "CollectionConsent", + "multivalued": true, + "inlined": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "dataset__consent_revocations", + "description": "Mechanisms for individuals to revoke previously given consent. 
List of ConsentRevocation objects from the Ethics module describing how revocation works and what happens to data after revocation.", + "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", + "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/consentRevocations" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/consentRevocations", + "alias": "consent_revocations", + "owner": "Dataset", + "domain_of": [ + "Dataset" + ], + "range": "ConsentRevocation", + "multivalued": true, + "inlined": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, { "name": "dataset__missing_data_documentation", "description": "Documentation of missing data patterns and handling strategies.", @@ -2487,7 +2772,7 @@ }, { "name": "dataset__raw_data_sources", - "description": "Description of raw data sources before preprocessing.", + "description": "List of raw data sources before preprocessing. Each RawDataSource object describes where the original data came from and how it can be accessed.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/rawDataSources" @@ -2506,6 +2791,7 @@ }, { "name": "dataset__ethical_reviews", + "description": "Ethical reviews and institutional oversight for the dataset. List of EthicalReview objects from the Ethics module describing IRB approvals, ethics committee reviews, and compliance certifications.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/ethicalReviews" @@ -2524,6 +2810,7 @@ }, { "name": "dataset__data_protection_impacts", + "description": "Data protection impact assessments (DPIAs) conducted for the dataset. 
List of DataProtectionImpact objects from the Ethics module documenting privacy risk assessments and mitigation measures.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/dataProtectionImpacts" @@ -2633,6 +2920,7 @@ }, { "name": "dataset__preprocessing_strategies", + "description": "Preprocessing steps applied to the raw data. List of PreprocessingStrategy objects from the Preprocessing module describing normalization, transformation, and other preparation steps.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/preprocessingStrategies" @@ -2651,6 +2939,7 @@ }, { "name": "dataset__cleaning_strategies", + "description": "Data cleaning and quality control procedures applied to the dataset. List of CleaningStrategy objects from the Preprocessing module describing outlier removal, deduplication, and error correction steps.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/cleaningStrategies" @@ -2669,6 +2958,7 @@ }, { "name": "dataset__labeling_strategies", + "description": "Labeling or annotation methodologies applied to the data. List of LabelingStrategy objects from the Preprocessing module describing annotation procedures, annotator qualifications, and quality controls.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/labelingStrategies" @@ -2687,6 +2977,7 @@ }, { "name": "dataset__raw_sources", + "description": "Raw, unprocessed source data before any preprocessing was applied. 
List of RawData objects from the Preprocessing module describing original data sources and their formats.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/rawSources" @@ -2705,7 +2996,7 @@ }, { "name": "dataset__imputation_protocols", - "description": "Data imputation methodology and techniques.", + "description": "Data imputation protocols applied to handle missing values. List of ImputationProtocol objects from the Preprocessing module describing the imputation technique, affected variables, and rationale.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/imputation_protocols" @@ -2759,6 +3050,7 @@ }, { "name": "dataset__existing_uses", + "description": "Known existing uses of the dataset at the time of publication. List of ExistingUse objects from the Uses module describing research, commercial, or other applications of the dataset.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/existingUses" @@ -2777,6 +3069,7 @@ }, { "name": "dataset__use_repository", + "description": "Repositories or registries tracking how the dataset has been used. List of UseRepository objects from the Uses module pointing to papers with code, citation indices, or other use-tracking resources.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/useRepository" @@ -2795,6 +3088,7 @@ }, { "name": "dataset__other_tasks", + "description": "Additional tasks the dataset may support beyond its original intent. 
List of OtherTask objects from the Uses module describing potential applications not originally planned by the dataset creators.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/otherTasks" @@ -2813,6 +3107,7 @@ }, { "name": "dataset__future_use_impacts", + "description": "Anticipated impacts of future uses, including risks and benefits. List of FutureUseImpact objects from the Uses module describing foreseeable consequences of using this dataset in new applications.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/futureUseImpacts" @@ -2831,6 +3126,7 @@ }, { "name": "dataset__discouraged_uses", + "description": "Uses that are not recommended for this dataset due to limitations, risks, or ethical concerns. List of DiscouragedUse objects from the Uses module explaining why certain applications should be avoided.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/discouragedUses" @@ -2887,6 +3183,7 @@ }, { "name": "dataset__distribution_formats", + "description": "Formats in which the dataset is distributed or made available. List of DistributionFormat objects from the Distribution module describing file formats, compression, and access methods.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/distributionFormats" @@ -2905,6 +3202,7 @@ }, { "name": "dataset__distribution_dates", + "description": "Dates when the dataset was or will be distributed or released. 
List of DistributionDate objects from the Distribution module describing initial release dates, version release dates, and planned future releases.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/distributionDates" @@ -2921,8 +3219,28 @@ "inlined_as_list": true, "@type": "SlotDefinition" }, + { + "name": "dataset__third_party_sharing", + "description": "Third-party distribution policies for the dataset. List of ThirdPartySharing objects from the Distribution module describing whether and how the dataset is shared with entities outside the creating organization.", + "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", + "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/thirdPartySharing" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/thirdPartySharing", + "alias": "third_party_sharing", + "owner": "Dataset", + "domain_of": [ + "Dataset" + ], + "range": "ThirdPartySharing", + "multivalued": true, + "inlined": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, { "name": "dataset__license_and_use_terms", + "description": "License and usage terms governing dataset access and use. LicenseAndUseTerms object from the Data Governance module describing the applicable license, permitted uses, and any restrictions.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "http://schema.org/license" @@ -2939,6 +3257,7 @@ }, { "name": "dataset__ip_restrictions", + "description": "Intellectual property restrictions on dataset use or redistribution. 
IPRestrictions object from the Data Governance module describing copyright, trademark, or other IP considerations.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/ipRestrictions" @@ -2955,6 +3274,7 @@ }, { "name": "dataset__regulatory_restrictions", + "description": "Regulatory and export control restrictions applicable to the dataset. ExportControlRegulatoryRestrictions object from the Data Governance module describing compliance requirements such as ITAR, EAR, or GDPR.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/regulatoryRestrictions" @@ -2971,6 +3291,7 @@ }, { "name": "dataset__maintainers", + "description": "Individuals or organizations responsible for maintaining the dataset. List of Maintainer objects from the Maintenance module describing maintenance contacts, roles, and support channels.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/maintainers" @@ -2989,6 +3310,7 @@ }, { "name": "dataset__errata", + "description": "Known errors or corrections to the dataset since publication. List of Erratum objects from the Maintenance module describing discovered errors, affected records, and correction procedures.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/errata" @@ -3007,6 +3329,7 @@ }, { "name": "dataset__updates", + "description": "Plans for future updates or versioning of the dataset. 
UpdatePlan object from the Maintenance module describing update frequency, versioning policy, and planned enhancements.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/updates" @@ -3023,6 +3346,7 @@ }, { "name": "dataset__retention_limit", + "description": "Data retention policies and limits for the dataset. RetentionLimits object from the Maintenance module describing how long the dataset will be available and any deletion schedules.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/retentionLimit" @@ -3039,11 +3363,12 @@ }, { "name": "dataset__version_access", + "description": "Information about access to different versions of the dataset. VersionAccess object from the Maintenance module describing where older versions can be found and how version history is maintained.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ - "http://www.w3.org/ns/dcat#accessURL" + "https://w3id.org/bridge2ai/data-sheets-schema/versionAccess" ], - "slot_uri": "http://www.w3.org/ns/dcat#accessURL", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/versionAccess", "alias": "version_access", "owner": "Dataset", "domain_of": [ @@ -3055,6 +3380,7 @@ }, { "name": "dataset__extension_mechanism", + "description": "Mechanisms for extending or contributing to the dataset. ExtensionMechanism object from the Maintenance module describing how others can propose additions, corrections, or expansions.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/extensionMechanism" @@ -3093,6 +3419,7 @@ }, { "name": "dataset__is_deidentified", + "description": "De-identification status and procedures applied to the dataset. 
Deidentification object describing whether the dataset contains personal data, what de-identification methods were applied, and any residual re-identification risks.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/isDeidentified" @@ -3109,11 +3436,12 @@ }, { "name": "dataset__is_tabular", + "description": "Whether the dataset is in tabular format (rows and columns). True if the data is structured as a table (e.g., CSV, TSV, relational database); false for unstructured formats such as images or free text.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema", "mappings": [ - "http://schema.org/encodingFormat" + "https://w3id.org/bridge2ai/data-sheets-schema/isTabular" ], - "slot_uri": "http://schema.org/encodingFormat", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/isTabular", "alias": "is_tabular", "owner": "Dataset", "domain_of": [ @@ -3210,6 +3538,13 @@ }, { "name": "namedThing__id", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/my-dataset-001", + "@type": "Annotation" + } + ], "description": "A unique identifier for a thing.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ @@ -3260,6 +3595,13 @@ }, { "name": "datasetProperty__id", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/property-001", + "@type": "Annotation" + } + ], "description": "An optional identifier for this property.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ @@ -3327,6 +3669,7 @@ }, { "name": "software__version", + "description": "The version identifier of the software (e.g., \"1.0.0\", \"2.3.1-beta\").", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://schema.org/softwareVersion" @@ -3342,6 +3685,7 @@ }, { "name": "software__license", + "description": "The license under which the software 
is distributed (e.g., \"MIT\", \"Apache-2.0\", \"GPL-3.0\").", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://schema.org/license" @@ -3357,6 +3701,7 @@ }, { "name": "software__url", + "description": "URL where the software can be found (e.g., homepage, repository, or documentation).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ "http://schema.org/url" @@ -3367,7 +3712,7 @@ "domain_of": [ "Software" ], - "range": "string", + "range": "uri", "@type": "SlotDefinition" }, { @@ -3405,15 +3750,22 @@ }, { "name": "person__orcid", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "0000-0001-2345-6789", + "@type": "Annotation" + } + ], "description": "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "mappings": [ - "http://schema.org/identifier" + "https://w3id.org/bridge2ai/data-sheets-schema/orcidIdentifier" ], - "exact_mappings": [ + "broad_mappings": [ "http://schema.org/identifier" ], - "slot_uri": "http://schema.org/identifier", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/orcidIdentifier", "alias": "orcid", "owner": "Person", "domain_of": [ @@ -3425,6 +3777,7 @@ }, { "name": "formatDialect__comment_prefix", + "description": "Character(s) used to indicate comment lines (e.g., \"#\" for CSV comments).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/comment_prefix", "alias": "comment_prefix", @@ -3437,6 +3790,7 @@ }, { "name": "formatDialect__delimiter", + "description": "Field delimiter character (e.g., \",\" for CSV, \"\\t\" for TSV).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "slot_uri": 
"https://w3id.org/bridge2ai/data-sheets-schema/delimiter", "alias": "delimiter", @@ -3449,6 +3803,7 @@ }, { "name": "formatDialect__double_quote", + "description": "Whether quotes within quoted fields are escaped by doubling them. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). Follows the W3C CSV-on-the-Web dialect specification.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/double_quote", "alias": "double_quote", @@ -3461,6 +3816,7 @@ }, { "name": "formatDialect__header", + "description": "Whether the first row of the file contains column headers. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). Follows the W3C CSV-on-the-Web dialect specification.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/header", "alias": "header", @@ -3473,6 +3829,7 @@ }, { "name": "formatDialect__quote_char", + "description": "Character used for quoting fields (e.g., '\"' for CSV).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/quote_char", "alias": "quote_char", @@ -3488,9 +3845,12 @@ "description": "Short explanation describing the primary purpose of creating the dataset.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/motivation", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/questionResponse" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/questionResponse", "alias": "response", "owner": "Purpose", "domain_of": [ @@ -3504,9 +3864,12 @@ "description": "Short explanation describing the specific task or tasks for which this dataset was created.", "from_schema": 
"https://w3id.org/bridge2ai/data-sheets-schema/motivation", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/questionResponse" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/questionResponse", "alias": "response", "owner": "Task", "domain_of": [ @@ -3520,9 +3883,12 @@ "description": "Short explanation of the knowledge or resource gap that this dataset was intended to address.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/motivation", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/questionResponse" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/questionResponse", "alias": "response", "owner": "AddressingGap", "domain_of": [ @@ -3536,12 +3902,13 @@ "description": "A key individual (Principal Investigator) responsible for or overseeing dataset creation.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/motivation", "mappings": [ - "http://purl.org/dc/terms/creator" + "https://w3id.org/bridge2ai/data-sheets-schema/principalInvestigator" ], - "exact_mappings": [ + "broad_mappings": [ + "http://purl.org/dc/terms/creator", "http://schema.org/creator" ], - "slot_uri": "http://purl.org/dc/terms/creator", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/principalInvestigator", "alias": "principal_investigator", "owner": "Creator", "domain_of": [ @@ -3555,9 +3922,12 @@ "description": "Organizations with which the creator or team is affiliated.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/motivation", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/teamAffiliation" + ], + "broad_mappings": [ "http://schema.org/affiliation" ], - "slot_uri": "http://schema.org/affiliation", + "slot_uri": 
"https://w3id.org/bridge2ai/data-sheets-schema/teamAffiliation", "alias": "affiliations", "owner": "Creator", "domain_of": [ @@ -3571,7 +3941,7 @@ }, { "name": "creator__credit_roles", - "description": "Contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or creator team. Specifies the specific contributions made to this dataset (e.g., Conceptualization, Data Curation, Methodology). Note: roles are specified here rather than on Person directly, since the same person may have different roles across different datasets.", + "description": "One or more contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or creator team (e.g., Conceptualization, Data Curation, Methodology).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/motivation", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/creditRoles" @@ -3626,9 +3996,12 @@ "description": "The alphanumeric identifier for the grant.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/motivation", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/grantIdentifier" + ], + "broad_mappings": [ "http://schema.org/identifier" ], - "slot_uri": "http://schema.org/identifier", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/grantIdentifier", "alias": "grant_number", "owner": "Grant", "domain_of": [ @@ -3658,12 +4031,15 @@ }, { "name": "instance__instance_type", - "description": "Multiple types of instances? (e.g., movies, users, and ratings).\n", + "description": "The type or types of instances in the dataset (e.g., \"movie\", \"user\", \"rating\", \"clinical record\"). 
Use when the dataset contains multiple instance types with different structures.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/instanceType" + ], + "broad_mappings": [ "http://purl.org/dc/terms/type" ], - "slot_uri": "http://purl.org/dc/terms/type", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/instanceType", "alias": "instance_type", "owner": "Instance", "domain_of": [ @@ -3677,12 +4053,12 @@ "description": "Type of data (e.g., raw text, images) from Bridge2AI standards.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ - "http://purl.org/dc/terms/format" + "http://purl.org/dc/terms/type" ], "values_from": [ "B2AI_SUBSTRATE" ], - "slot_uri": "http://purl.org/dc/terms/format", + "slot_uri": "http://purl.org/dc/terms/type", "alias": "data_substrate", "owner": "Instance", "domain_of": [ @@ -3693,6 +4069,13 @@ }, { "name": "instance__counts", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "42000 (42,000 patient records)", + "@type": "Annotation" + } + ], "description": "How many instances are there in total (of each type, if appropriate)?\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ @@ -3728,9 +4111,12 @@ "description": "If labeled, what pattern or format do labels follow?\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/labelPattern" + ], + "broad_mappings": [ "http://schema.org/description" ], - "slot_uri": "http://schema.org/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/labelPattern", "alias": "label_description", "owner": "Instance", "domain_of": [ @@ -3789,7 +4175,6 @@ "SamplingStrategy" ], "range": "boolean", - "multivalued": true, "@type": "SlotDefinition" }, { @@ -3806,12 +4191,11 @@ "SamplingStrategy" ], "range": "boolean", 
- "multivalued": true, "@type": "SlotDefinition" }, { "name": "samplingStrategy__source_data", - "description": "Description of the larger set from which the sample was drawn, if any.\n", + "description": "One or more descriptions of the larger sets from which the sample was drawn, if applicable.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/sourceData" @@ -3840,17 +4224,19 @@ "SamplingStrategy" ], "range": "boolean", - "multivalued": true, "@type": "SlotDefinition" }, { "name": "samplingStrategy__representative_verification", - "description": "Explanation of how representativeness was validated or verified.\n", + "description": "One or more explanations of how representativeness was validated or verified (e.g., statistical tests, domain expert review).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/verificationDescription" + ], + "broad_mappings": [ "http://schema.org/description" ], - "slot_uri": "http://schema.org/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/verificationDescription", "alias": "representative_verification", "owner": "SamplingStrategy", "domain_of": [ @@ -3862,7 +4248,7 @@ }, { "name": "samplingStrategy__why_not_representative", - "description": "Explanation of why the sample is not representative, if applicable.\n", + "description": "One or more explanations of why the sample is not representative of the larger set, if applicable.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/whyNotRepresentative" @@ -3879,7 +4265,7 @@ }, { "name": "samplingStrategy__strategies", - "description": "Description of the sampling strategy (deterministic, probabilistic, etc.).\n", + "description": "One or more sampling strategies used (e.g., deterministic, 
simple random, stratified, cluster, systematic).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/strategies" @@ -3899,9 +4285,12 @@ "description": "Description of the missing data fields or elements.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/missingDataDescription" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/missingDataDescription", "alias": "missing", "owner": "MissingInfo", "domain_of": [ @@ -3916,9 +4305,12 @@ "description": "Explanation of why each piece of data is missing.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/missingDataCause" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/missingDataCause", "alias": "why_missing", "owner": "MissingInfo", "domain_of": [ @@ -3930,7 +4322,7 @@ }, { "name": "relationships__relationship_details", - "description": "Details on relationships between instances (e.g., graph edges, ratings).\n", + "description": "Free-text description of how relationships between instances are represented (e.g., graph edges, ratings matrices, foreign keys), including relationship types and any associated metadata.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "http://purl.org/dc/terms/description" @@ -3947,7 +4339,7 @@ }, { "name": "splits__split_details", - "description": "Details on recommended data splits and their rationale.\n", + "description": "Free-text description of the recommended data splits (e.g., 80/10/10 train/ 
validation/test), how they are defined, and the rationale for the split strategy.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "http://purl.org/dc/terms/description" @@ -3964,12 +4356,15 @@ }, { "name": "dataAnomaly__anomaly_details", - "description": "Details on errors, noise sources, or redundancies in the dataset.\n", + "description": "Free-text description of errors, noise sources, or redundancies in the dataset, including their known causes and estimated prevalence.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/anomalyDetails" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/anomalyDetails", "alias": "anomaly_details", "owner": "DataAnomaly", "domain_of": [ @@ -4029,7 +4424,7 @@ }, { "name": "datasetBias__affected_subsets", - "description": "Specific subsets or features of the dataset affected by this bias.\n", + "description": "One or more specific subsets or features of the dataset affected by this bias (e.g., \"female participants\", \"non-English text\", \"images taken at night\").\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/affectedSubsets" @@ -4113,9 +4508,12 @@ "description": "Explanation of any commitments that external resources will remain available and stable over time.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/availabilityGuarantee" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/availabilityGuarantee", "alias": "future_guarantees", "owner": 
"ExternalResource", "domain_of": [ @@ -4127,29 +4525,31 @@ }, { "name": "externalResource__archival", - "description": "Indication whether official archival versions of external resources are included.\n", + "description": "Indicates whether official archival versions of external resources are included in the dataset.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ - "http://schema.org/archivedAt" + "https://w3id.org/bridge2ai/data-sheets-schema/hasArchivalVersion" ], - "slot_uri": "http://schema.org/archivedAt", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/hasArchivalVersion", "alias": "archival", "owner": "ExternalResource", "domain_of": [ "ExternalResource" ], "range": "boolean", - "multivalued": true, "@type": "SlotDefinition" }, { "name": "externalResource__restrictions", - "description": "Description of any restrictions or fees associated with external resources.\n", + "description": "One or more descriptions of restrictions or fees associated with accessing these external resources (e.g., paywalls, registration requirements, API limits).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/externalResourceRestrictions" + ], + "broad_mappings": [ "http://purl.org/dc/terms/accessRights" ], - "slot_uri": "http://purl.org/dc/terms/accessRights", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/externalResourceRestrictions", "alias": "restrictions", "owner": "ExternalResource", "domain_of": [ @@ -4177,7 +4577,7 @@ }, { "name": "confidentiality__confidentiality_details", - "description": "Details on confidential data elements and handling procedures.\n", + "description": "Free-text description of which data elements are confidential, the basis for confidentiality (e.g., legal privilege, patient data), and how they are handled or restricted.\n", "from_schema": 
"https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "http://purl.org/dc/terms/description" @@ -4210,6 +4610,7 @@ }, { "name": "contentWarning__warnings", + "description": "One or more specific content warnings describing potentially offensive, insulting, threatening, or anxiety-provoking content present in the dataset (e.g., violence, profanity, explicit imagery, hate speech).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ "http://purl.org/dc/terms/description" @@ -4242,11 +4643,15 @@ }, { "name": "subpopulation__identification", + "description": "How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disease status, or other demographic/clinical characteristics).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/subpopulationIdentification" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/subpopulationIdentification", "alias": "identification", "owner": "Subpopulation", "domain_of": [ @@ -4258,11 +4663,15 @@ }, { "name": "subpopulation__distribution", + "description": "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/subpopulationDistribution" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/subpopulationDistribution", "alias": "distribution", "owner": "Subpopulation", "domain_of": [ @@ -4303,12 +4712,12 @@ }, { "name": "deidentification__identifiers_removed", - "description": "List of 
identifier types removed during de-identification.", + "description": "List of identifier types removed during de-identification (e.g., 'name', 'date of birth', 'SSN', 'email address', 'geographic subdivision').", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ - "http://schema.org/identifier" + "https://w3id.org/bridge2ai/data-sheets-schema/removedIdentifierTypes" ], - "slot_uri": "http://schema.org/identifier", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/removedIdentifierTypes", "alias": "identifiers_removed", "owner": "Deidentification", "domain_of": [ @@ -4373,9 +4782,9 @@ "description": "The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/composition", "mappings": [ - "http://schema.org/identifier" + "http://purl.org/dc/terms/relation" ], - "slot_uri": "http://schema.org/identifier", + "slot_uri": "http://purl.org/dc/terms/relation", "alias": "target_dataset", "owner": "DatasetRelationship", "domain_of": [ @@ -4417,7 +4826,7 @@ }, { "name": "instanceAcquisition__was_directly_observed", - "description": "Whether the data was directly observed", + "description": "True if the data was directly observed by a researcher or instrument; false if it was obtained through other means (e.g., reported, inferred).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/wasDirectlyObserved" @@ -4433,7 +4842,7 @@ }, { "name": "instanceAcquisition__was_reported_by_subjects", - "description": "Whether the data was reported directly by the subjects themselves", + "description": "True if the data was self-reported directly by the subjects themselves (e.g., survey responses, questionnaires); false otherwise.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ 
"https://w3id.org/bridge2ai/data-sheets-schema/wasReportedBySubjects" @@ -4449,7 +4858,7 @@ }, { "name": "instanceAcquisition__was_inferred_derived", - "description": "Whether the data was inferred or derived from other data", + "description": "True if the data was computationally inferred or derived from other data (e.g., model outputs, imputed values); false otherwise.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/wasInferred" @@ -4465,7 +4874,7 @@ }, { "name": "instanceAcquisition__was_validated_verified", - "description": "Whether the data was validated or verified in any way", + "description": "True if the data underwent a validation or verification process (e.g., expert review, cross-checking with ground truth); false otherwise.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/wasValidated" @@ -4481,7 +4890,7 @@ }, { "name": "instanceAcquisition__acquisition_details", - "description": "Details on how data was acquired for each instance.\n", + "description": "Free-text description of how data was acquired for each instance, including instruments, protocols, and any manual steps involved.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://purl.org/dc/terms/description" @@ -4498,7 +4907,7 @@ }, { "name": "collectionMechanism__mechanism_details", - "description": "Details on mechanisms or procedures used to collect the data.\n", + "description": "Free-text description of the specific mechanisms or procedures used to collect the data (e.g., hardware model, software API, manual curation process), including how those mechanisms were validated.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://purl.org/dc/terms/description" @@ -4515,7 +4924,7 @@ }, { "name": "dataCollector__role", - 
"description": "Role of the data collector (e.g., researcher, crowdworker)", + "description": "Role of the data collector (e.g., researcher, crowdworker).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://schema.org/roleName" @@ -4531,7 +4940,7 @@ }, { "name": "dataCollector__collector_details", - "description": "Details on who collected the data and their compensation.\n", + "description": "Free-text description of who was involved in data collection (e.g., students, crowdworkers, contractors), their training or qualifications, and how they were compensated.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://purl.org/dc/terms/description" @@ -4548,7 +4957,7 @@ }, { "name": "collectionTimeframe__start_date", - "description": "Start date of data collection", + "description": "Start date of data collection.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://schema.org/startDate" @@ -4564,7 +4973,7 @@ }, { "name": "collectionTimeframe__end_date", - "description": "End date of data collection", + "description": "End date of data collection.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://schema.org/endDate" @@ -4580,7 +4989,7 @@ }, { "name": "collectionTimeframe__timeframe_details", - "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "description": "Free-text description of the data collection period and whether this timeframe matches the creation timeframe of the underlying data (e.g., historical records, prospective collection).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://purl.org/dc/terms/description" @@ -4597,7 +5006,7 @@ }, { "name": "directCollection__is_direct", - "description": "Whether collection was direct from individuals", + "description": "Whether collection 
was direct from individuals.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/isDirect" @@ -4613,7 +5022,7 @@ }, { "name": "directCollection__collection_details", - "description": "Details on direct vs. indirect collection methods and sources.\n", + "description": "Free-text description of whether data was collected directly from individuals or obtained via third parties or other indirect sources, and what those sources are.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "http://purl.org/dc/terms/description" @@ -4664,7 +5073,7 @@ }, { "name": "missingDataDocumentation__handling_strategy", - "description": "Strategy used to handle missing data (e.g., deletion, imputation, flagging, multiple imputation).\n", + "description": "The primary strategy used to handle missing data (e.g., listwise deletion, mean imputation, multiple imputation, flagging with sentinel values).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/handlingStrategy" @@ -4697,12 +5106,15 @@ }, { "name": "rawDataSource__source_type", - "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", + "description": "One or more types of raw source (e.g., sensor, database, user input, web scraping).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/sourceType" + ], + "broad_mappings": [ "http://purl.org/dc/terms/type" ], - "slot_uri": "http://purl.org/dc/terms/type", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/sourceType", "alias": "source_type", "owner": "RawDataSource", "domain_of": [ @@ -4730,7 +5142,7 @@ }, { "name": "rawDataSource__raw_data_format", - "description": "Format of the raw data before any preprocessing.\n", + "description": "One 
or more formats of the raw data before any preprocessing (e.g., CSV, DICOM, JSON).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/collection", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/rawDataFormat" @@ -4747,7 +5159,7 @@ }, { "name": "preprocessingStrategy__preprocessing_details", - "description": "Details on preprocessing steps applied to the data.\n", + "description": "Free-text description of preprocessing steps applied to the data, including tools used, parameters, order of operations, and rationale for each step.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ "http://purl.org/dc/terms/description" @@ -4764,7 +5176,7 @@ }, { "name": "cleaningStrategy__cleaning_details", - "description": "Details on data cleaning procedures applied.\n", + "description": "Free-text description of data cleaning procedures applied, including criteria for removing or correcting instances, tools used, and how removed instances are accounted for.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ "http://purl.org/dc/terms/description" @@ -4781,21 +5193,19 @@ }, { "name": "labelingStrategy__data_annotation_platform", - "description": "Platform or tool used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool).", + "description": "One or more platforms or tools used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ - "http://schema.org/instrument" - ], - "exact_mappings": [ "http://mlcommons.org/croissant/RAI/dataAnnotationPlatform" ], - "slot_uri": "http://schema.org/instrument", + "slot_uri": "http://mlcommons.org/croissant/RAI/dataAnnotationPlatform", "alias": "data_annotation_platform", "owner": "LabelingStrategy", 
"domain_of": [ "LabelingStrategy" ], "range": "string", + "multivalued": true, "@type": "SlotDefinition" }, { @@ -4820,6 +5230,13 @@ }, { "name": "labelingStrategy__annotations_per_item", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "3 (three independent annotators per item)", + "@type": "Annotation" + } + ], "description": "Number of annotations collected per data item. Multiple annotations per item enable calculation of inter-annotator agreement.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ @@ -4855,7 +5272,7 @@ }, { "name": "labelingStrategy__annotator_demographics", - "description": "Demographic information about annotators, if available and relevant (e.g., geographic location, language background, expertise level).", + "description": "One or more demographic characteristics of the annotators, if available and relevant (e.g., geographic location, language background, expertise level, native language).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/annotatorDemographics" @@ -4875,7 +5292,7 @@ }, { "name": "labelingStrategy__labeling_details", - "description": "Details on labeling/annotation procedures and quality metrics.\n", + "description": "Free-text description of the labeling or annotation procedures, including annotation guidelines, task definitions, and quality control metrics.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ "http://purl.org/dc/terms/description" @@ -4892,12 +5309,19 @@ }, { "name": "rawData__access_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/raw/raw-data.zip", + "@type": "Annotation" + } + ], "description": "URL or access point for the raw data.", "from_schema": 
"https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ - "http://www.w3.org/ns/dcat#accessURL" + "https://w3id.org/bridge2ai/data-sheets-schema/rawDataAccessURL" ], - "slot_uri": "http://www.w3.org/ns/dcat#accessURL", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/rawDataAccessURL", "alias": "access_url", "owner": "RawData", "domain_of": [ @@ -4908,7 +5332,7 @@ }, { "name": "rawData__raw_data_details", - "description": "Details on raw data availability and access procedures.\n", + "description": "Free-text description of raw data availability, access procedures, and any conditions or restrictions on accessing the raw data.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ "http://purl.org/dc/terms/description" @@ -5077,9 +5501,12 @@ "description": "List of automated annotation tools with their versions. Format each entry as \"ToolName version\" (e.g., \"spaCy 3.5.0\", \"NLTK 3.8\", \"GPT-4 turbo\"). Use \"unknown\" for version if not available (e.g., \"Custom NER Model unknown\").\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/toolNames" + ], + "broad_mappings": [ "http://schema.org/name" ], - "slot_uri": "http://schema.org/name", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/toolNames", "alias": "tools", "owner": "MachineAnnotationTools", "domain_of": [ @@ -5108,7 +5535,7 @@ }, { "name": "machineAnnotationTools__tool_accuracy", - "description": "Known accuracy or performance metrics for the automated tools (if available). Include metric name and value (e.g., \"spaCy F1: 0.95\", \"GPT-4 Accuracy: 92%\").\n", + "description": "One or more known accuracy or performance metrics for the automated tools (if available). 
Include metric name and value (e.g., \"spaCy F1: 0.95\", \"GPT-4 Accuracy: 92%\").\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/toolAccuracy" @@ -5142,6 +5569,13 @@ }, { "name": "useRepository__repository_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/known-uses", + "@type": "Annotation" + } + ], "description": "URL to a repository of known dataset uses.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/uses#repository_url", @@ -5155,7 +5589,7 @@ }, { "name": "useRepository__repository_details", - "description": "Details on the repository of known dataset uses.\n", + "description": "Free-text description of the repository of known dataset uses, including how it is maintained and how to contribute new use cases.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "mappings": [ "http://purl.org/dc/terms/description" @@ -5172,7 +5606,7 @@ }, { "name": "otherTask__task_details", - "description": "Details on other potential tasks the dataset could be used for.\n", + "description": "Free-text description of other potential tasks the dataset could support, including any prerequisites or limitations for those uses.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "mappings": [ "http://purl.org/dc/terms/description" @@ -5189,7 +5623,7 @@ }, { "name": "futureUseImpact__impact_details", - "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "description": "Free-text description of potential future impacts or risks arising from the dataset's composition or collection (e.g., unfair treatment, privacy violations, legal or financial risks), and any recommended mitigation strategies.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", 
"mappings": [ "http://purl.org/dc/terms/description" @@ -5206,7 +5640,7 @@ }, { "name": "discouragedUse__discouragement_details", - "description": "Details on tasks for which the dataset should not be used.\n", + "description": "Free-text description of tasks or applications for which the dataset is not recommended, with explanation of why (e.g., out-of-scope, risk of harm, poor coverage).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "mappings": [ "http://purl.org/dc/terms/description" @@ -5237,7 +5671,7 @@ }, { "name": "intendedUse__usage_notes", - "description": "Notes or caveats about using the dataset for intended purposes.", + "description": "A note or caveat about using the dataset for its intended purposes.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/uses#usage_notes", "alias": "usage_notes", @@ -5250,7 +5684,7 @@ }, { "name": "intendedUse__use_category", - "description": "Category of intended use (e.g., research, clinical, educational, commercial, policy).", + "description": "One or more categories of intended use (e.g., research, clinical, educational, commercial, policy).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/useCategory" @@ -5267,7 +5701,7 @@ }, { "name": "prohibitedUse__prohibition_reason", - "description": "Reason why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint).", + "description": "One or more reasons why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "mappings": [ "https://w3id.org/bridge2ai/data-sheets-schema/prohibitionReason" @@ -5287,9 +5721,9 @@ "description": "Boolean indicating whether the dataset is distributed to parties external to the dataset-creating 
entity.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/distribution", "mappings": [ - "http://purl.org/dc/terms/accessRights" + "https://w3id.org/bridge2ai/data-sheets-schema/isExternallyShared" ], - "slot_uri": "http://purl.org/dc/terms/accessRights", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/isExternallyShared", "alias": "is_shared", "owner": "ThirdPartySharing", "domain_of": [ @@ -5300,7 +5734,14 @@ }, { "name": "distributionFormat__access_urls", - "description": "Details of the distribution channel(s) or format(s).", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/download", + "@type": "Annotation" + } + ], + "description": "One or more URLs providing access to the distribution channel(s) or format(s).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/distribution", "mappings": [ "http://www.w3.org/ns/dcat#accessURL" @@ -5311,13 +5752,13 @@ "domain_of": [ "DistributionFormat" ], - "range": "string", + "range": "uri", "multivalued": true, "@type": "SlotDefinition" }, { "name": "distributionDate__release_dates", - "description": "Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled releases.\n", + "description": "One or more dates or timeframes for dataset release, in ISO 8601 format (e.g., \"2024-03-15\") or as a descriptive string (e.g., \"Q2 2024\"). 
Use multiple values for staged or scheduled releases.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/distribution", "mappings": [ "http://purl.org/dc/terms/available" @@ -5350,7 +5791,7 @@ }, { "name": "maintainer__maintainer_details", - "description": "Details on who will support, host, or maintain the dataset.\n", + "description": "Free-text description of the organization, team, or individual responsible for maintaining the dataset, including contact information and hosting arrangements.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ "http://purl.org/dc/terms/description" @@ -5367,12 +5808,19 @@ }, { "name": "erratum__erratum_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/errata/2024-01-15", + "@type": "Annotation" + } + ], "description": "URL or access point for the erratum.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ - "http://www.w3.org/ns/dcat#accessURL" + "https://w3id.org/bridge2ai/data-sheets-schema/erratumURL" ], - "slot_uri": "http://www.w3.org/ns/dcat#accessURL", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/erratumURL", "alias": "erratum_url", "owner": "Erratum", "domain_of": [ @@ -5383,7 +5831,7 @@ }, { "name": "erratum__erratum_details", - "description": "Details on any errata or corrections to the dataset.\n", + "description": "Free-text description of the error, its scope, the affected data or records, and the correction applied.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ "http://purl.org/dc/terms/description" @@ -5416,7 +5864,7 @@ }, { "name": "updatePlan__update_details", - "description": "Details on update plans, responsible parties, and communication methods.\n", + "description": "Free-text description of planned update types (e.g., corrections, additions, deletions), responsible parties, and how updates will be 
communicated to users.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ "http://purl.org/dc/terms/description" @@ -5449,7 +5897,7 @@ }, { "name": "retentionLimits__retention_details", - "description": "Details on data retention limits and enforcement procedures.\n", + "description": "Free-text description of applicable retention limits, legal or ethical basis for those limits, and how they will be enforced (e.g., automated deletion, anonymization after the retention period).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ "http://purl.org/dc/terms/description" @@ -5466,18 +5914,18 @@ }, { "name": "versionAccess__latest_version_doi", - "description": "DOI or URL of the latest dataset version.", + "description": "DOI or URL identifying the latest version of this dataset, given either as a CURIE (e.g., 'doi:10.5281/zenodo.1234567') or as a full URL (e.g., 'https://doi.org/10.5281/zenodo.1234567'). A bare DOI without the 'doi:' prefix is not a valid value for this field.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ - "http://schema.org/identifier" + "http://purl.org/dc/terms/hasVersion" ], - "slot_uri": "http://schema.org/identifier", + "slot_uri": "http://purl.org/dc/terms/hasVersion", "alias": "latest_version_doi", "owner": "VersionAccess", "domain_of": [ "VersionAccess" ], - "range": "string", + "range": "uriorcurie", "@type": "SlotDefinition" }, { @@ -5499,7 +5947,7 @@ }, { "name": "versionAccess__version_details", - "description": "Details on version support policies and obsolescence communication.\n", + "description": "Free-text description of version support policies, how long older versions will be hosted, and how dataset consumers will be notified when versions become obsolete.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ "http://purl.org/dc/terms/description" @@ -5516,12 +5964,19 @@ }, { "name": 
"extensionMechanism__contribution_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/contributing", + "@type": "Annotation" + } + ], "description": "URL for contribution guidelines or process.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ - "http://www.w3.org/ns/dcat#landingPage" + "https://w3id.org/bridge2ai/data-sheets-schema/contributionURL" ], - "slot_uri": "http://www.w3.org/ns/dcat#landingPage", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/contributionURL", "alias": "contribution_url", "owner": "ExtensionMechanism", "domain_of": [ @@ -5532,7 +5987,7 @@ }, { "name": "extensionMechanism__extension_details", - "description": "Details on extension mechanisms, contribution validation, and communication.\n", + "description": "Free-text description of how third parties can contribute to the dataset, how contributions are validated (e.g., peer review, automated tests), and how accepted contributions will be communicated to the community.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/maintenance", "mappings": [ "http://purl.org/dc/terms/description" @@ -5552,12 +6007,12 @@ "description": "Contact person for questions about ethical review. 
Provides structured contact information including name, email, affiliation, and optional ORCID.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/ethics", "mappings": [ - "http://schema.org/contactPoint" + "https://w3id.org/bridge2ai/data-sheets-schema/ethicsContactPoint" ], - "exact_mappings": [ + "broad_mappings": [ "http://schema.org/contactPoint" ], - "slot_uri": "http://schema.org/contactPoint", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/ethicsContactPoint", "alias": "contact_person", "owner": "EthicalReview", "domain_of": [ @@ -5587,7 +6042,7 @@ }, { "name": "ethicalReview__review_details", - "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "description": "Free-text description of the ethical review process, board decisions, outcomes, and any supporting documentation (e.g., IRB approval number, ethics committee name).\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/ethics", "mappings": [ "http://purl.org/dc/terms/description" @@ -5604,7 +6059,7 @@ }, { "name": "dataProtectionImpact__impact_details", - "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "description": "Free-text description of the data protection impact analysis, including methodology, privacy risks identified, mitigation measures taken, and any regulatory findings.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/ethics", "mappings": [ "http://purl.org/dc/terms/description" @@ -5621,7 +6076,7 @@ }, { "name": "collectionNotification__notification_details", - "description": "Details on how individuals were notified about data collection.\n", + "description": "Free-text description of how individuals were notified about data collection, including the notification method (e.g., email, poster, in-person), timing, and the language or text of the notification itself if available.\n", "from_schema": 
"https://w3id.org/bridge2ai/data-sheets-schema/ethics", "mappings": [ "http://purl.org/dc/terms/description" @@ -5638,7 +6093,7 @@ }, { "name": "collectionConsent__consent_details", - "description": "Details on how consent was requested, provided, and documented.\n", + "description": "Free-text description of how consent was requested (e.g., opt-in form, verbal agreement), provided, and documented, including the language individuals consented to.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/ethics", "mappings": [ "http://purl.org/dc/terms/description" @@ -5655,7 +6110,7 @@ }, { "name": "consentRevocation__revocation_details", - "description": "Details on consent revocation mechanisms and procedures.\n", + "description": "Free-text description of the mechanism provided for individuals to revoke consent (e.g., opt-out portal, written request), the scope of revocation (full withdrawal or specific uses), and what happens to their data after revocation.\n", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/ethics", "mappings": [ "http://purl.org/dc/terms/description" @@ -6042,12 +6497,16 @@ }, { "name": "licenseAndUseTerms__license_terms", - "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", + "description": "Description of the dataset's license and terms of use, including links, costs, or usage constraints (e.g., 'CC BY 4.0', 'Apache 2.0', 'MIT', 'CC BY-NC-SA 4.0', 'proprietary - contact data@example.org for access').", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/data-governance", "mappings": [ - "http://purl.org/dc/terms/license" + "https://w3id.org/bridge2ai/data-sheets-schema/licenseDescription" ], - "slot_uri": "http://purl.org/dc/terms/license", + "broad_mappings": [ + "http://purl.org/dc/terms/license", + "http://purl.org/dc/terms/rights" + ], + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/licenseDescription", "alias": 
"license_terms", "owner": "LicenseAndUseTerms", "domain_of": [ @@ -6059,7 +6518,7 @@ }, { "name": "licenseAndUseTerms__data_use_permission", - "description": "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO", + "description": "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/data-governance", "mappings": [ "http://purl.obolibrary.org/obo/DUO_0000001" @@ -6082,12 +6541,12 @@ "description": "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. 
This person can answer questions about licensing terms, usage restrictions, fees, and permissions.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/data-governance", "mappings": [ - "http://schema.org/contactPoint" + "https://w3id.org/bridge2ai/data-sheets-schema/licenseContactPoint" ], - "exact_mappings": [ + "broad_mappings": [ "http://schema.org/contactPoint" ], - "slot_uri": "http://schema.org/contactPoint", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/licenseContactPoint", "alias": "contact_person", "owner": "LicenseAndUseTerms", "domain_of": [ @@ -6098,7 +6557,7 @@ }, { "name": "iPRestrictions__restrictions", - "description": "Explanation of third-party IP restrictions.", + "description": "One or more explanations of third-party IP restrictions or associated fees.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/data-governance", "mappings": [ "http://purl.org/dc/terms/rights" @@ -6119,7 +6578,7 @@ }, { "name": "exportControlRegulatoryRestrictions__regulatory_restrictions", - "description": "Export or regulatory restrictions on the dataset.", + "description": "One or more export controls or regulatory restrictions applicable to the dataset (e.g., HIPAA, ITAR, GDPR).", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/data-governance", "mappings": [ "http://purl.org/dc/terms/accessRights" @@ -6193,12 +6652,12 @@ "description": "Contact person for data governance committee. 
This person can answer questions about data governance policies, access procedures, and oversight mechanisms.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/data-governance", "mappings": [ - "http://schema.org/contactPoint" + "https://w3id.org/bridge2ai/data-sheets-schema/governanceContactPoint" ], - "exact_mappings": [ + "broad_mappings": [ "http://schema.org/contactPoint" ], - "slot_uri": "http://schema.org/contactPoint", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/governanceContactPoint", "alias": "governance_committee_contact", "owner": "ExportControlRegulatoryRestrictions", "domain_of": [ @@ -6212,12 +6671,13 @@ "description": "The name or identifier of the variable as it appears in the data files.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/variables", "mappings": [ - "http://schema.org/name" + "https://w3id.org/bridge2ai/data-sheets-schema/variableName" ], - "exact_mappings": [ - "http://schema.org/name" + "broad_mappings": [ + "http://schema.org/name", + "http://schema.org/identifier" ], - "slot_uri": "http://schema.org/name", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/variableName", "alias": "variable_name", "owner": "VariableMetadata", "domain_of": [ @@ -6285,6 +6745,13 @@ }, { "name": "variableMetadata__minimum_value", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "0.0", + "@type": "Annotation" + } + ], "description": "The minimum value that the variable can take. Applicable to numeric variables.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/variables", "mappings": [ @@ -6301,6 +6768,13 @@ }, { "name": "variableMetadata__maximum_value", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "100.0", + "@type": "Annotation" + } + ], "description": "The maximum value that the variable can take. 
Applicable to numeric variables.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/variables", "mappings": [ @@ -6317,7 +6791,7 @@ }, { "name": "variableMetadata__categories", - "description": "The permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning.", + "description": "One or more permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/variables", "mappings": [ "http://schema.org/valueReference" @@ -6354,9 +6828,9 @@ "description": "Indicates whether this variable serves as a unique identifier or key for records in the dataset.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/variables", "mappings": [ - "http://schema.org/identifier" + "https://w3id.org/bridge2ai/data-sheets-schema/isIdentifier" ], - "slot_uri": "http://schema.org/identifier", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/isIdentifier", "alias": "is_identifier", "owner": "VariableMetadata", "domain_of": [ @@ -6383,6 +6857,13 @@ }, { "name": "variableMetadata__precision", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "2 (two decimal places, e.g., 3.14)", + "@type": "Annotation" + } + ], "description": "The precision or number of decimal places for numeric variables.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/variables", "mappings": [ @@ -6434,9 +6915,12 @@ "description": "Notes about data quality, reliability, or known issues specific to this variable.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/variables", "mappings": [ + "https://w3id.org/bridge2ai/data-sheets-schema/qualityNotes" + ], + "broad_mappings": [ "http://purl.org/dc/terms/description" ], - "slot_uri": "http://purl.org/dc/terms/description", + "slot_uri": "https://w3id.org/bridge2ai/data-sheets-schema/qualityNotes", "alias": "quality_notes", 
"owner": "VariableMetadata", "domain_of": [ @@ -6481,6 +6965,13 @@ }, { "name": "fileCollection__file_count", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "47", + "@type": "Annotation" + } + ], "description": "Number of files in this collection.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/file-collection", "mappings": [ @@ -6497,7 +6988,14 @@ }, { "name": "fileCollection__total_bytes", - "description": "Total size of all files in bytes.", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "1073741824 (1 GiB = 1024\u00b3 bytes)", + "@type": "Annotation" + } + ], + "description": "Total size of all files in this collection, in bytes (integer). Maps to dcat:byteSize.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/file-collection", "mappings": [ "http://www.w3.org/ns/dcat#byteSize" @@ -6815,11 +7313,17 @@ "dataset__content_warnings", "dataset__subpopulations", "dataset__sensitive_elements", + "dataset__relationships", + "dataset__splits", "dataset__acquisition_methods", "dataset__collection_mechanisms", "dataset__sampling_strategies", "dataset__data_collectors", "dataset__collection_timeframes", + "dataset__direct_collection", + "dataset__collection_notifications", + "dataset__collection_consents", + "dataset__consent_revocations", "dataset__missing_data_documentation", "dataset__raw_data_sources", "dataset__ethical_reviews", @@ -6845,6 +7349,7 @@ "dataset__prohibited_uses", "dataset__distribution_formats", "dataset__distribution_dates", + "dataset__third_party_sharing", "dataset__license_and_use_terms", "dataset__ip_restrictions", "dataset__regulatory_restrictions", @@ -6877,6 +7382,13 @@ }, { "name": "total_file_count", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "156", + "@type": "Annotation" + } + ], "description": "Total number of files across all file collections in this dataset. 
Can be aggregated from file_collections[].file_count.", "slot_uri": "d4d:totalFileCount", "range": "integer", @@ -6884,6 +7396,13 @@ }, { "name": "total_size_bytes", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "10737418240 (10 GiB = 10 \u00d7 1024\u00b3 bytes)", + "@type": "Annotation" + } + ], "description": "Total size of all files in bytes across all file collections. Can be aggregated from file_collections[].total_bytes.", "slot_uri": "dcat:byteSize", "range": "integer", @@ -6891,6 +7410,7 @@ }, { "name": "purposes", + "description": "Purposes for which the dataset was created. List of Purpose objects from the Motivation module, each describing a specific creation goal or intended application.", "slot_uri": "d4d:purposes", "range": "Purpose", "multivalued": true, @@ -6899,6 +7419,7 @@ }, { "name": "tasks", + "description": "Tasks the dataset is intended to support. List of Task objects from the Motivation module describing specific machine learning, research, or analytical tasks.", "slot_uri": "d4d:tasks", "range": "Task", "multivalued": true, @@ -6907,6 +7428,7 @@ }, { "name": "addressing_gaps", + "description": "Research or practical gaps this dataset addresses. List of AddressingGap objects from the Motivation module, each describing a gap in existing datasets or knowledge that this dataset fills.", "slot_uri": "d4d:addressingGaps", "range": "AddressingGap", "multivalued": true, @@ -6915,6 +7437,7 @@ }, { "name": "creators", + "description": "Individuals or organizations who created the dataset. List of Creator objects describing authorship, roles, and affiliations of dataset creators.", "slot_uri": "schema:creator", "range": "Creator", "multivalued": true, @@ -6923,6 +7446,7 @@ }, { "name": "funders", + "description": "Funding mechanisms that supported dataset creation. 
List of FundingMechanism objects describing grants, contracts, or other funding sources including grantors and grant identifiers.", "slot_uri": "schema:funder", "range": "FundingMechanism", "multivalued": true, @@ -6931,10 +7455,8 @@ }, { "name": "subsets", - "exact_mappings": [ - "schema:distribution" - ], - "slot_uri": "dcat:distribution", + "description": "Subsets or splits of this dataset. List of DataSubset objects from the Composition module, each representing a logical partition such as training, validation, or test splits, or demographic subgroups.", + "slot_uri": "d4d:dataSubset", "range": "DataSubset", "multivalued": true, "inlined_as_list": true, @@ -6942,6 +7464,7 @@ }, { "name": "instances", + "description": "Individual data instances or records in the dataset. List of Instance objects from the Composition module describing what each data point represents, its type, and associated label information.", "slot_uri": "d4d:instances", "range": "Instance", "multivalued": true, @@ -6950,6 +7473,7 @@ }, { "name": "anomalies", + "description": "Known data quality issues, errors, or irregularities in the dataset. List of DataAnomaly objects from the Composition module, each documenting a specific anomaly and its potential impact.", "slot_uri": "d4d:anomalies", "range": "DataAnomaly", "multivalued": true, @@ -6976,6 +7500,7 @@ }, { "name": "confidential_elements", + "description": "Confidential or restricted information within the dataset that requires access controls. List of Confidentiality objects describing what is confidential and why it cannot be released.", "slot_uri": "d4d:confidentialElements", "range": "Confidentiality", "multivalued": true, @@ -6984,6 +7509,7 @@ }, { "name": "content_warnings", + "description": "Content warnings for potentially harmful, offensive, or disturbing material in the dataset. 
List of ContentWarning objects alerting users to sensitive content categories.", "slot_uri": "d4d:contentWarnings", "range": "ContentWarning", "multivalued": true, @@ -6992,6 +7518,7 @@ }, { "name": "subpopulations", + "description": "Subpopulations represented within the dataset. List of Subpopulation objects from the Composition module describing demographic or other groups, their representation, and any imbalances.", "slot_uri": "d4d:subpopulations", "range": "Subpopulation", "multivalued": true, @@ -7000,14 +7527,34 @@ }, { "name": "sensitive_elements", + "description": "Sensitive data elements requiring special handling or access controls. List of SensitiveElement objects identifying sensitive attributes such as personal identifiers, protected health information, or legally sensitive content.", "slot_uri": "d4d:sensitiveElements", "range": "SensitiveElement", "multivalued": true, "inlined_as_list": true, "@type": "SlotDefinition" }, + { + "name": "relationships", + "description": "Explicit relationships between individual instances in the dataset. List of Relationships objects from the Composition module describing how instances relate (e.g., graph edges, ratings, social network links).", + "slot_uri": "d4d:relationships", + "range": "Relationships", + "multivalued": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "splits", + "description": "Recommended data splits for this dataset. List of Splits objects from the Composition module describing train/validation/test partitions and the rationale for each split strategy.", + "slot_uri": "d4d:splits", + "range": "Splits", + "multivalued": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, { "name": "acquisition_methods", + "description": "Methods used to acquire or obtain dataset instances. 
List of InstanceAcquisition objects from the Collection module describing how data was sourced, whether directly observed or derived.", "slot_uri": "d4d:acquisitionMethods", "range": "InstanceAcquisition", "multivalued": true, @@ -7016,6 +7563,7 @@ }, { "name": "collection_mechanisms", + "description": "Mechanisms, instruments, or tools used for data collection. List of CollectionMechanism objects from the Collection module describing sensors, surveys, APIs, or other collection instruments.", "slot_uri": "d4d:collectionMechanisms", "range": "CollectionMechanism", "multivalued": true, @@ -7024,6 +7572,7 @@ }, { "name": "sampling_strategies", + "description": "Strategies used to select data instances from a larger population. List of SamplingStrategy objects from the Collection module describing sampling methodology, inclusion criteria, and limitations.", "slot_uri": "d4d:samplingStrategies", "range": "SamplingStrategy", "multivalued": true, @@ -7032,6 +7581,7 @@ }, { "name": "data_collectors", + "description": "Individuals or organizations responsible for collecting the data. List of DataCollector objects from the Collection module describing who performed data collection and their roles.", "slot_uri": "d4d:dataCollectors", "range": "DataCollector", "multivalued": true, @@ -7040,12 +7590,49 @@ }, { "name": "collection_timeframes", + "description": "Time periods during which data was collected. List of CollectionTimeframe objects from the Collection module describing collection start and end dates, and any gaps in the collection period.", "slot_uri": "d4d:collectionTimeframes", "range": "CollectionTimeframe", "multivalued": true, "inlined_as_list": true, "@type": "SlotDefinition" }, + { + "name": "direct_collection", + "description": "Whether data was collected directly from individuals or via third parties. List of DirectCollection objects from the Collection module describing direct vs. 
indirect collection methods and sources.", + "slot_uri": "d4d:directCollection", + "range": "DirectCollection", + "multivalued": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "collection_notifications", + "description": "Notifications provided to individuals about data collection. List of CollectionNotification objects from the Ethics module describing how and when individuals were informed about the data collection.", + "slot_uri": "d4d:collectionNotifications", + "range": "CollectionNotification", + "multivalued": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "collection_consents", + "description": "Consent obtained from individuals for data collection and use. List of CollectionConsent objects from the Ethics module describing how consent was requested, provided, and documented.", + "slot_uri": "d4d:collectionConsents", + "range": "CollectionConsent", + "multivalued": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, + { + "name": "consent_revocations", + "description": "Mechanisms for individuals to revoke previously given consent. List of ConsentRevocation objects from the Ethics module describing how revocation works and what happens to data after revocation.", + "slot_uri": "d4d:consentRevocations", + "range": "ConsentRevocation", + "multivalued": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, { "name": "missing_data_documentation", "description": "Documentation of missing data patterns and handling strategies.", @@ -7057,7 +7644,7 @@ }, { "name": "raw_data_sources", - "description": "Description of raw data sources before preprocessing.", + "description": "List of raw data sources before preprocessing. 
Each RawDataSource object describes where the original data came from and how it can be accessed.", "slot_uri": "d4d:rawDataSources", "range": "RawDataSource", "multivalued": true, @@ -7066,6 +7653,7 @@ }, { "name": "ethical_reviews", + "description": "Ethical reviews and institutional oversight for the dataset. List of EthicalReview objects from the Ethics module describing IRB approvals, ethics committee reviews, and compliance certifications.", "slot_uri": "d4d:ethicalReviews", "range": "EthicalReview", "multivalued": true, @@ -7074,6 +7662,7 @@ }, { "name": "data_protection_impacts", + "description": "Data protection impact assessments (DPIAs) conducted for the dataset. List of DataProtectionImpact objects from the Ethics module documenting privacy risk assessments and mitigation measures.", "slot_uri": "d4d:dataProtectionImpacts", "range": "DataProtectionImpact", "multivalued": true, @@ -7125,6 +7714,7 @@ }, { "name": "preprocessing_strategies", + "description": "Preprocessing steps applied to the raw data. List of PreprocessingStrategy objects from the Preprocessing module describing normalization, transformation, and other preparation steps.", "slot_uri": "d4d:preprocessingStrategies", "range": "PreprocessingStrategy", "multivalued": true, @@ -7133,6 +7723,7 @@ }, { "name": "cleaning_strategies", + "description": "Data cleaning and quality control procedures applied to the dataset. List of CleaningStrategy objects from the Preprocessing module describing outlier removal, deduplication, and error correction steps.", "slot_uri": "d4d:cleaningStrategies", "range": "CleaningStrategy", "multivalued": true, @@ -7141,6 +7732,7 @@ }, { "name": "labeling_strategies", + "description": "Labeling or annotation methodologies applied to the data. 
List of LabelingStrategy objects from the Preprocessing module describing annotation procedures, annotator qualifications, and quality controls.", "slot_uri": "d4d:labelingStrategies", "range": "LabelingStrategy", "multivalued": true, @@ -7149,6 +7741,7 @@ }, { "name": "raw_sources", + "description": "Raw, unprocessed source data before any preprocessing was applied. List of RawData objects from the Preprocessing module describing original data sources and their formats.", "slot_uri": "d4d:rawSources", "range": "RawData", "multivalued": true, @@ -7157,7 +7750,7 @@ }, { "name": "imputation_protocols", - "description": "Data imputation methodology and techniques.", + "description": "Data imputation protocols applied to handle missing values. List of ImputationProtocol objects from the Preprocessing module describing the imputation technique, affected variables, and rationale.", "slot_uri": "d4d:imputation_protocols", "range": "ImputationProtocol", "multivalued": true, @@ -7183,6 +7776,7 @@ }, { "name": "existing_uses", + "description": "Known existing uses of the dataset at the time of publication. List of ExistingUse objects from the Uses module describing research, commercial, or other applications of the dataset.", "slot_uri": "d4d:existingUses", "range": "ExistingUse", "multivalued": true, @@ -7191,6 +7785,7 @@ }, { "name": "use_repository", + "description": "Repositories or registries tracking how the dataset has been used. List of UseRepository objects from the Uses module pointing to papers with code, citation indices, or other use-tracking resources.", "slot_uri": "d4d:useRepository", "range": "UseRepository", "multivalued": true, @@ -7199,6 +7794,7 @@ }, { "name": "other_tasks", + "description": "Additional tasks the dataset may support beyond its original intent. 
List of OtherTask objects from the Uses module describing potential applications not originally planned by the dataset creators.", "slot_uri": "d4d:otherTasks", "range": "OtherTask", "multivalued": true, @@ -7207,6 +7803,7 @@ }, { "name": "future_use_impacts", + "description": "Anticipated impacts of future uses, including risks and benefits. List of FutureUseImpact objects from the Uses module describing foreseeable consequences of using this dataset in new applications.", "slot_uri": "d4d:futureUseImpacts", "range": "FutureUseImpact", "multivalued": true, @@ -7215,6 +7812,7 @@ }, { "name": "discouraged_uses", + "description": "Uses that are not recommended for this dataset due to limitations, risks, or ethical concerns. List of DiscouragedUse objects from the Uses module explaining why certain applications should be avoided.", "slot_uri": "d4d:discouragedUses", "range": "DiscouragedUse", "multivalued": true, @@ -7241,6 +7839,7 @@ }, { "name": "distribution_formats", + "description": "Formats in which the dataset is distributed or made available. List of DistributionFormat objects from the Distribution module describing file formats, compression, and access methods.", "slot_uri": "d4d:distributionFormats", "range": "DistributionFormat", "multivalued": true, @@ -7249,14 +7848,25 @@ }, { "name": "distribution_dates", + "description": "Dates when the dataset was or will be distributed or released. List of DistributionDate objects from the Distribution module describing initial release dates, version release dates, and planned future releases.", "slot_uri": "d4d:distributionDates", "range": "DistributionDate", "multivalued": true, "inlined_as_list": true, "@type": "SlotDefinition" }, + { + "name": "third_party_sharing", + "description": "Third-party distribution policies for the dataset. 
List of ThirdPartySharing objects from the Distribution module describing whether and how the dataset is shared with entities outside the creating organization.", + "slot_uri": "d4d:thirdPartySharing", + "range": "ThirdPartySharing", + "multivalued": true, + "inlined_as_list": true, + "@type": "SlotDefinition" + }, { "name": "license_and_use_terms", + "description": "License and usage terms governing dataset access and use. LicenseAndUseTerms object from the Data Governance module describing the applicable license, permitted uses, and any restrictions.", "slot_uri": "schema:license", "range": "LicenseAndUseTerms", "inlined": true, @@ -7264,6 +7874,7 @@ }, { "name": "ip_restrictions", + "description": "Intellectual property restrictions on dataset use or redistribution. IPRestrictions object from the Data Governance module describing copyright, trademark, or other IP considerations.", "slot_uri": "d4d:ipRestrictions", "range": "IPRestrictions", "inlined": true, @@ -7271,6 +7882,7 @@ }, { "name": "regulatory_restrictions", + "description": "Regulatory and export control restrictions applicable to the dataset. ExportControlRegulatoryRestrictions object from the Data Governance module describing compliance requirements such as ITAR, EAR, or GDPR.", "slot_uri": "d4d:regulatoryRestrictions", "range": "ExportControlRegulatoryRestrictions", "inlined": true, @@ -7278,6 +7890,7 @@ }, { "name": "maintainers", + "description": "Individuals or organizations responsible for maintaining the dataset. List of Maintainer objects from the Maintenance module describing maintenance contacts, roles, and support channels.", "slot_uri": "d4d:maintainers", "range": "Maintainer", "multivalued": true, @@ -7286,6 +7899,7 @@ }, { "name": "errata", + "description": "Known errors or corrections to the dataset since publication. 
List of Erratum objects from the Maintenance module describing discovered errors, affected records, and correction procedures.", "slot_uri": "d4d:errata", "range": "Erratum", "multivalued": true, @@ -7294,6 +7908,7 @@ }, { "name": "updates", + "description": "Plans for future updates or versioning of the dataset. UpdatePlan object from the Maintenance module describing update frequency, versioning policy, and planned enhancements.", "slot_uri": "d4d:updates", "range": "UpdatePlan", "inlined": true, @@ -7301,6 +7916,7 @@ }, { "name": "retention_limit", + "description": "Data retention policies and limits for the dataset. RetentionLimits object from the Maintenance module describing how long the dataset will be available and any deletion schedules.", "slot_uri": "d4d:retentionLimit", "range": "RetentionLimits", "inlined": true, @@ -7308,13 +7924,15 @@ }, { "name": "version_access", - "slot_uri": "dcat:accessURL", + "description": "Information about access to different versions of the dataset. VersionAccess object from the Maintenance module describing where older versions can be found and how version history is maintained.", + "slot_uri": "d4d:versionAccess", "range": "VersionAccess", "inlined": true, "@type": "SlotDefinition" }, { "name": "extension_mechanism", + "description": "Mechanisms for extending or contributing to the dataset. ExtensionMechanism object from the Maintenance module describing how others can propose additions, corrections, or expansions.", "slot_uri": "d4d:extensionMechanism", "range": "ExtensionMechanism", "inlined": true, @@ -7334,6 +7952,7 @@ }, { "name": "is_deidentified", + "description": "De-identification status and procedures applied to the dataset. 
Deidentification object describing whether the dataset contains personal data, what de-identification methods were applied, and any residual re-identification risks.", "slot_uri": "d4d:isDeidentified", "range": "Deidentification", "inlined": true, @@ -7341,7 +7960,8 @@ }, { "name": "is_tabular", - "slot_uri": "schema:encodingFormat", + "description": "Whether the dataset is in tabular format (rows and columns). True if the data is structured as a table (e.g., CSV, TSV, relational database); false for unstructured formats such as images or free text.", + "slot_uri": "d4d:isTabular", "range": "boolean", "@type": "SlotDefinition" }, @@ -7429,11 +8049,17 @@ "dataset__content_warnings", "dataset__subpopulations", "dataset__sensitive_elements", + "dataset__relationships", + "dataset__splits", "dataset__acquisition_methods", "dataset__collection_mechanisms", "dataset__sampling_strategies", "dataset__data_collectors", "dataset__collection_timeframes", + "dataset__direct_collection", + "dataset__collection_notifications", + "dataset__collection_consents", + "dataset__consent_revocations", "dataset__missing_data_documentation", "dataset__raw_data_sources", "dataset__ethical_reviews", @@ -7459,6 +8085,7 @@ "dataset__prohibited_uses", "dataset__distribution_formats", "dataset__distribution_dates", + "dataset__third_party_sharing", "dataset__license_and_use_terms", "dataset__ip_restrictions", "dataset__regulatory_restrictions", @@ -7512,6 +8139,13 @@ "attributes": [ { "name": "id", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/my-dataset-001", + "@type": "Annotation" + } + ], "description": "A unique identifier for a thing.", "slot_uri": "schema:identifier", "identifier": true, @@ -7569,6 +8203,13 @@ "attributes": [ { "name": "id", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/property-001", + "@type": "Annotation" + } + ], "description": "An optional identifier for this property.", 
"slot_uri": "schema:identifier", "range": "uriorcurie", @@ -7625,20 +8266,23 @@ "attributes": [ { "name": "version", + "description": "The version identifier of the software (e.g., \"1.0.0\", \"2.3.1-beta\").", "slot_uri": "schema:softwareVersion", "range": "string", "@type": "SlotDefinition" }, { "name": "license", + "description": "The license under which the software is distributed (e.g., \"MIT\", \"Apache-2.0\", \"GPL-3.0\").", "slot_uri": "schema:license", "range": "string", "@type": "SlotDefinition" }, { "name": "url", + "description": "URL where the software can be found (e.g., homepage, repository, or documentation).", "slot_uri": "schema:url", - "range": "string", + "range": "uri", "@type": "SlotDefinition" } ], @@ -7681,11 +8325,18 @@ }, { "name": "orcid", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "0000-0001-2345-6789", + "@type": "Annotation" + } + ], "description": "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). 
Use this for stable cross-dataset identification.", - "exact_mappings": [ + "broad_mappings": [ "schema:identifier" ], - "slot_uri": "schema:identifier", + "slot_uri": "d4d:orcidIdentifier", "range": "string", "pattern": "^\\d{4}-\\d{4}-\\d{4}-\\d{3}[0-9X]$", "@type": "SlotDefinition" @@ -7735,7 +8386,7 @@ { "name": "FormatDialect", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/FormatDialect", - "description": "Additional format information for a file", + "description": "Additional format information for a file.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/base", "slots": [ "formatDialect__comment_prefix", @@ -7748,22 +8399,27 @@ "attributes": [ { "name": "comment_prefix", + "description": "Character(s) used to indicate comment lines (e.g., \"#\" for CSV comments).", "@type": "SlotDefinition" }, { "name": "delimiter", + "description": "Field delimiter character (e.g., \",\" for CSV, \"\\t\" for TSV).", "@type": "SlotDefinition" }, { "name": "double_quote", + "description": "Whether quotes within quoted fields are escaped by doubling them. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). Follows the W3C CSV-on-the-Web dialect specification.", "@type": "SlotDefinition" }, { "name": "header", + "description": "Whether the first row of the file contains column headers. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). 
Follows the W3C CSV-on-the-Web dialect specification.", "@type": "SlotDefinition" }, { "name": "quote_char", + "description": "Character used for quoting fields (e.g., '\"' for CSV).", "@type": "SlotDefinition" } ], @@ -7788,7 +8444,10 @@ { "name": "response", "description": "Short explanation describing the primary purpose of creating the dataset.", - "slot_uri": "dcterms:description", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:questionResponse", "range": "string", "@type": "SlotDefinition" } @@ -7814,7 +8473,10 @@ { "name": "response", "description": "Short explanation describing the specific task or tasks for which this dataset was created.", - "slot_uri": "dcterms:description", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:questionResponse", "range": "string", "@type": "SlotDefinition" } @@ -7840,7 +8502,10 @@ { "name": "response", "description": "Short explanation of the knowledge or resource gap that this dataset was intended to address.", - "slot_uri": "dcterms:description", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:questionResponse", "range": "string", "@type": "SlotDefinition" } @@ -7868,17 +8533,21 @@ { "name": "principal_investigator", "description": "A key individual (Principal Investigator) responsible for or overseeing dataset creation.", - "exact_mappings": [ + "broad_mappings": [ + "dcterms:creator", "schema:creator" ], - "slot_uri": "dcterms:creator", + "slot_uri": "d4d:principalInvestigator", "range": "Person", "@type": "SlotDefinition" }, { "name": "affiliations", "description": "Organizations with which the creator or team is affiliated.", - "slot_uri": "schema:affiliation", + "broad_mappings": [ + "schema:affiliation" + ], + "slot_uri": "d4d:teamAffiliation", "range": "Organization", "multivalued": true, "inlined_as_list": true, @@ -7886,7 +8555,7 @@ }, { "name": "credit_roles", - "description": "Contributor roles using the CRediT (Contributor Roles Taxonomy) for the 
principal investigator or creator team. Specifies the specific contributions made to this dataset (e.g., Conceptualization, Data Curation, Methodology). Note: roles are specified here rather than on Person directly, since the same person may have different roles across different datasets.", + "description": "One or more contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or creator team (e.g., Conceptualization, Data Curation, Methodology).", "slot_uri": "d4d:creditRoles", "range": "CRediTRoleEnum", "multivalued": true, @@ -7964,7 +8633,10 @@ { "name": "grant_number", "description": "The alphanumeric identifier for the grant.", - "slot_uri": "schema:identifier", + "broad_mappings": [ + "schema:identifier" + ], + "slot_uri": "d4d:grantIdentifier", "range": "string", "@type": "SlotDefinition" } @@ -8006,8 +8678,11 @@ }, { "name": "instance_type", - "description": "Multiple types of instances? (e.g., movies, users, and ratings).\n", - "slot_uri": "dcterms:type", + "description": "The type or types of instances in the dataset (e.g., \"movie\", \"user\", \"rating\", \"clinical record\"). 
Use when the dataset contains multiple instance types with different structures.\n", + "broad_mappings": [ + "dcterms:type" + ], + "slot_uri": "d4d:instanceType", "range": "string", "@type": "SlotDefinition" }, @@ -8017,12 +8692,19 @@ "values_from": [ "B2AI_SUBSTRATE" ], - "slot_uri": "dcterms:format", + "slot_uri": "dcterms:type", "range": "uriorcurie", "@type": "SlotDefinition" }, { "name": "counts", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "42000 (42,000 patient records)", + "@type": "Annotation" + } + ], "description": "How many instances are there in total (of each type, if appropriate)?\n", "slot_uri": "schema:numberOfItems", "range": "integer", @@ -8038,7 +8720,10 @@ { "name": "label_description", "description": "If labeled, what pattern or format do labels follow?\n", - "slot_uri": "schema:description", + "broad_mappings": [ + "schema:description" + ], + "slot_uri": "d4d:labelPattern", "range": "string", "@type": "SlotDefinition" }, @@ -8088,7 +8773,6 @@ "description": "Indicates whether it is a sample of a larger set.", "slot_uri": "d4d:isSample", "range": "boolean", - "multivalued": true, "@type": "SlotDefinition" }, { @@ -8096,12 +8780,11 @@ "description": "Indicates whether the sample is random.", "slot_uri": "d4d:isRandom", "range": "boolean", - "multivalued": true, "@type": "SlotDefinition" }, { "name": "source_data", - "description": "Description of the larger set from which the sample was drawn, if any.\n", + "description": "One or more descriptions of the larger sets from which the sample was drawn, if applicable.\n", "slot_uri": "d4d:sourceData", "range": "string", "multivalued": true, @@ -8112,20 +8795,22 @@ "description": "Indicates whether the sample is representative of the larger set.\n", "slot_uri": "d4d:isRepresentative", "range": "boolean", - "multivalued": true, "@type": "SlotDefinition" }, { "name": "representative_verification", - "description": "Explanation of how representativeness was validated or verified.\n", - 
"slot_uri": "schema:description", + "description": "One or more explanations of how representativeness was validated or verified (e.g., statistical tests, domain expert review).\n", + "broad_mappings": [ + "schema:description" + ], + "slot_uri": "d4d:verificationDescription", "range": "string", "multivalued": true, "@type": "SlotDefinition" }, { "name": "why_not_representative", - "description": "Explanation of why the sample is not representative, if applicable.\n", + "description": "One or more explanations of why the sample is not representative of the larger set, if applicable.\n", "slot_uri": "d4d:whyNotRepresentative", "range": "string", "multivalued": true, @@ -8133,7 +8818,7 @@ }, { "name": "strategies", - "description": "Description of the sampling strategy (deterministic, probabilistic, etc.).\n", + "description": "One or more sampling strategies used (e.g., deterministic, simple random, stratified, cluster, systematic).\n", "slot_uri": "d4d:strategies", "range": "string", "multivalued": true, @@ -8162,7 +8847,10 @@ { "name": "missing", "description": "Description of the missing data fields or elements.\n", - "slot_uri": "dcterms:description", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:missingDataDescription", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -8170,7 +8858,10 @@ { "name": "why_missing", "description": "Explanation of why each piece of data is missing.\n", - "slot_uri": "dcterms:description", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:missingDataCause", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -8196,7 +8887,7 @@ "attributes": [ { "name": "relationship_details", - "description": "Details on relationships between instances (e.g., graph edges, ratings).\n", + "description": "Free-text description of how relationships between instances are represented (e.g., graph edges, ratings matrices, foreign keys), including relationship types and any 
associated metadata.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8223,7 +8914,7 @@ "attributes": [ { "name": "split_details", - "description": "Details on recommended data splits and their rationale.\n", + "description": "Free-text description of the recommended data splits (e.g., 80/10/10 train/validation/test), how they are defined, and the rationale for the split strategy.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8250,8 +8941,11 @@ "attributes": [ { "name": "anomaly_details", - "description": "Details on errors, noise sources, or redundancies in the dataset.\n", - "slot_uri": "dcterms:description", + "description": "Free-text description of errors, noise sources, or redundancies in the dataset, including their known causes and estimated prevalence.\n", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:anomalyDetails", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -8304,7 +8998,7 @@ }, { "name": "affected_subsets", - "description": "Specific subsets or features of the dataset affected by this bias.\n", + "description": "One or more specific subsets or features of the dataset affected by this bias (e.g., \"female participants\", \"non-English text\", \"images taken at night\").\n", "slot_uri": "d4d:affectedSubsets", "range": "string", "multivalued": true, @@ -8388,23 +9082,28 @@ { "name": "future_guarantees", "description": "Explanation of any commitments that external resources will remain available and stable over time.\n", - "slot_uri": "dcterms:description", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:availabilityGuarantee", "range": "string", "multivalued": true, "@type": "SlotDefinition" }, { "name": "archival", - "description": "Indication whether official archival versions of external resources are included.\n", - "slot_uri": "schema:archivedAt", + "description": "Indicates whether official archival versions 
of external resources are included in the dataset.\n", + "slot_uri": "d4d:hasArchivalVersion", "range": "boolean", - "multivalued": true, "@type": "SlotDefinition" }, { "name": "restrictions", - "description": "Description of any restrictions or fees associated with external resources.\n", - "slot_uri": "dcterms:accessRights", + "description": "One or more descriptions of restrictions or fees associated with accessing these external resources (e.g., paywalls, registration requirements, API limits).\n", + "broad_mappings": [ + "dcterms:accessRights" + ], + "slot_uri": "d4d:externalResourceRestrictions", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -8438,7 +9137,7 @@ }, { "name": "confidentiality_details", - "description": "Details on confidential data elements and handling procedures.\n", + "description": "Free-text description of which data elements are confidential, the basis for confidentiality (e.g., legal privilege, patient data), and how they are handled or restricted.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8473,6 +9172,7 @@ }, { "name": "warnings", + "description": "One or more specific content warnings describing potentially offensive, insulting, threatening, or anxiety-provoking content present in the dataset (e.g., violence, profanity, explicit imagery, hate speech).", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8508,14 +9208,22 @@ }, { "name": "identification", - "slot_uri": "dcterms:description", + "description": "How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disease status, or other demographic/clinical characteristics).", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:subpopulationIdentification", "range": "string", "multivalued": true, "@type": "SlotDefinition" }, { "name": "distribution", - "slot_uri": "dcterms:description", + "description": "The distribution of instances across 
identified subpopulations, including counts, percentages, or proportions for each subgroup.", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:subpopulationDistribution", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -8557,8 +9265,8 @@ }, { "name": "identifiers_removed", - "description": "List of identifier types removed during de-identification.", - "slot_uri": "schema:identifier", + "description": "List of identifier types removed during de-identification (e.g., 'name', 'date of birth', 'SSN', 'email address', 'geographic subdivision').", + "slot_uri": "d4d:removedIdentifierTypes", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -8632,7 +9340,7 @@ { "name": "target_dataset", "description": "The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object.", - "slot_uri": "schema:identifier", + "slot_uri": "dcterms:relation", "range": "string", "required": true, "@type": "SlotDefinition" @@ -8676,35 +9384,35 @@ "attributes": [ { "name": "was_directly_observed", - "description": "Whether the data was directly observed", + "description": "True if the data was directly observed by a researcher or instrument; false if it was obtained through other means (e.g., reported, inferred).", "slot_uri": "d4d:wasDirectlyObserved", "range": "boolean", "@type": "SlotDefinition" }, { "name": "was_reported_by_subjects", - "description": "Whether the data was reported directly by the subjects themselves", + "description": "True if the data was self-reported directly by the subjects themselves (e.g., survey responses, questionnaires); false otherwise.", "slot_uri": "d4d:wasReportedBySubjects", "range": "boolean", "@type": "SlotDefinition" }, { "name": "was_inferred_derived", - "description": "Whether the data was inferred or derived from other data", + "description": "True if the data was computationally inferred or derived from other data (e.g., model outputs, imputed values); false 
otherwise.", "slot_uri": "d4d:wasInferred", "range": "boolean", "@type": "SlotDefinition" }, { "name": "was_validated_verified", - "description": "Whether the data was validated or verified in any way", + "description": "True if the data underwent a validation or verification process (e.g., expert review, cross-checking with ground truth); false otherwise.", "slot_uri": "d4d:wasValidated", "range": "boolean", "@type": "SlotDefinition" }, { "name": "acquisition_details", - "description": "Details on how data was acquired for each instance.\n", + "description": "Free-text description of how data was acquired for each instance, including instruments, protocols, and any manual steps involved.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8734,7 +9442,7 @@ "attributes": [ { "name": "mechanism_details", - "description": "Details on mechanisms or procedures used to collect the data.\n", + "description": "Free-text description of the specific mechanisms or procedures used to collect the data (e.g., hardware model, software API, manual curation process), including how those mechanisms were validated.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8762,14 +9470,14 @@ "attributes": [ { "name": "role", - "description": "Role of the data collector (e.g., researcher, crowdworker)", + "description": "Role of the data collector (e.g., researcher, crowdworker).", "slot_uri": "schema:roleName", "range": "string", "@type": "SlotDefinition" }, { "name": "collector_details", - "description": "Details on who collected the data and their compensation.\n", + "description": "Free-text description of who was involved in data collection (e.g., students, crowdworkers, contractors), their training or qualifications, and how they were compensated.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8801,21 +9509,21 @@ "attributes": [ { "name": "start_date", - "description": "Start date of 
data collection", + "description": "Start date of data collection.", "slot_uri": "schema:startDate", "range": "date", "@type": "SlotDefinition" }, { "name": "end_date", - "description": "End date of data collection", + "description": "End date of data collection.", "slot_uri": "schema:endDate", "range": "date", "@type": "SlotDefinition" }, { "name": "timeframe_details", - "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "description": "Free-text description of the data collection period and whether this timeframe matches the creation timeframe of the underlying data (e.g., historical records, prospective collection).\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8843,14 +9551,14 @@ "attributes": [ { "name": "is_direct", - "description": "Whether collection was direct from individuals", + "description": "Whether collection was direct from individuals.", "slot_uri": "d4d:isDirect", "range": "boolean", "@type": "SlotDefinition" }, { "name": "collection_details", - "description": "Details on direct vs. 
indirect collection methods and sources.\n", + "description": "Free-text description of whether data was collected directly from individuals or obtained via third parties or other indirect sources, and what those sources are.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -8898,7 +9606,7 @@ }, { "name": "handling_strategy", - "description": "Strategy used to handle missing data (e.g., deletion, imputation, flagging, multiple imputation).\n", + "description": "The primary strategy used to handle missing data (e.g., listwise deletion, mean imputation, multiple imputation, flagging with sentinel values).\n", "slot_uri": "d4d:handlingStrategy", "range": "string", "@type": "SlotDefinition" @@ -8938,8 +9646,11 @@ }, { "name": "source_type", - "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", - "slot_uri": "dcterms:type", + "description": "One or more types of raw source (e.g., sensor, database, user input, web scraping).\n", + "broad_mappings": [ + "dcterms:type" + ], + "slot_uri": "d4d:sourceType", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -8953,7 +9664,7 @@ }, { "name": "raw_data_format", - "description": "Format of the raw data before any preprocessing.\n", + "description": "One or more formats of the raw data before any preprocessing (e.g., CSV, DICOM, JSON).\n", "slot_uri": "d4d:rawDataFormat", "range": "string", "multivalued": true, @@ -8983,7 +9694,7 @@ "attributes": [ { "name": "preprocessing_details", - "description": "Details on preprocessing steps applied to the data.\n", + "description": "Free-text description of preprocessing steps applied to the data, including tools used, parameters, order of operations, and rationale for each step.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9013,7 +9724,7 @@ "attributes": [ { "name": "cleaning_details", - "description": "Details on data cleaning procedures applied.\n", + 
"description": "Free-text description of data cleaning procedures applied, including criteria for removing or correcting instances, tools used, and how removed instances are accounted for.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9045,12 +9756,10 @@ "attributes": [ { "name": "data_annotation_platform", - "description": "Platform or tool used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool).", - "exact_mappings": [ - "rai:dataAnnotationPlatform" - ], - "slot_uri": "schema:instrument", + "description": "One or more platforms or tools used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool).", + "slot_uri": "rai:dataAnnotationPlatform", "range": "string", + "multivalued": true, "@type": "SlotDefinition" }, { @@ -9066,6 +9775,13 @@ }, { "name": "annotations_per_item", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "3 (three independent annotators per item)", + "@type": "Annotation" + } + ], "description": "Number of annotations collected per data item. 
Multiple annotations per item enable calculation of inter-annotator agreement.", "exact_mappings": [ "rai:annotationsPerItem" @@ -9083,7 +9799,7 @@ }, { "name": "annotator_demographics", - "description": "Demographic information about annotators, if available and relevant (e.g., geographic location, language background, expertise level).", + "description": "One or more demographic characteristics of the annotators, if available and relevant (e.g., geographic location, language background, expertise level, native language).", "exact_mappings": [ "rai:annotatorDemographics" ], @@ -9094,7 +9810,7 @@ }, { "name": "labeling_details", - "description": "Details on labeling/annotation procedures and quality metrics.\n", + "description": "Free-text description of the labeling or annotation procedures, including annotation guidelines, task definitions, and quality control metrics.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9122,14 +9838,21 @@ "attributes": [ { "name": "access_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/raw/raw-data.zip", + "@type": "Annotation" + } + ], "description": "URL or access point for the raw data.", - "slot_uri": "dcat:accessURL", + "slot_uri": "d4d:rawDataAccessURL", "range": "uri", "@type": "SlotDefinition" }, { "name": "raw_data_details", - "description": "Details on raw data availability and access procedures.\n", + "description": "Free-text description of raw data availability, access procedures, and any conditions or restrictions on accessing the raw data.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9281,7 +10004,10 @@ { "name": "tools", "description": "List of automated annotation tools with their versions. Format each entry as \"ToolName version\" (e.g., \"spaCy 3.5.0\", \"NLTK 3.8\", \"GPT-4 turbo\"). 
Use \"unknown\" for version if not available (e.g., \"Custom NER Model unknown\").\n", - "slot_uri": "schema:name", + "broad_mappings": [ + "schema:name" + ], + "slot_uri": "d4d:toolNames", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -9296,7 +10022,7 @@ }, { "name": "tool_accuracy", - "description": "Known accuracy or performance metrics for the automated tools (if available). Include metric name and value (e.g., \"spaCy F1: 0.95\", \"GPT-4 Accuracy: 92%\").\n", + "description": "One or more known accuracy or performance metrics for the automated tools (if available). Include metric name and value (e.g., \"spaCy F1: 0.95\", \"GPT-4 Accuracy: 92%\").\n", "slot_uri": "d4d:toolAccuracy", "range": "string", "multivalued": true, @@ -9336,7 +10062,7 @@ { "name": "UseRepository", "definition_uri": "https://w3id.org/bridge2ai/data-sheets-schema/uses#UseRepository", - "description": "Is there a repository that links to any or all papers or systems that use the dataset? If so, provide a link or other access point.\n", + "description": "A repository or registry of known uses of this dataset by third parties. 
Documents where the dataset has been applied, enabling discoverability of downstream use cases and impact tracking.", "from_schema": "https://w3id.org/bridge2ai/data-sheets-schema/uses", "is_a": "DatasetProperty", "slots": [ @@ -9351,13 +10077,20 @@ "attributes": [ { "name": "repository_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/known-uses", + "@type": "Annotation" + } + ], "description": "URL to a repository of known dataset uses.", "range": "uri", "@type": "SlotDefinition" }, { "name": "repository_details", - "description": "Details on the repository of known dataset uses.\n", + "description": "Free-text description of the repository of known dataset uses, including how it is maintained and how to contribute new use cases.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9384,7 +10117,7 @@ "attributes": [ { "name": "task_details", - "description": "Details on other potential tasks the dataset could be used for.\n", + "description": "Free-text description of other potential tasks the dataset could support, including any prerequisites or limitations for those uses.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9414,7 +10147,7 @@ "attributes": [ { "name": "impact_details", - "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "description": "Free-text description of potential future impacts or risks arising from the dataset's composition or collection (e.g., unfair treatment, privacy violations, legal or financial risks), and any recommended mitigation strategies.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9441,7 +10174,7 @@ "attributes": [ { "name": "discouragement_details", - "description": "Details on tasks for which the dataset should not be used.\n", + "description": "Free-text description of tasks or applications for which the dataset is not recommended, with 
explanation of why (e.g., out-of-scope, risk of harm, poor coverage).\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9480,13 +10213,13 @@ }, { "name": "usage_notes", - "description": "Notes or caveats about using the dataset for intended purposes.", + "description": "A note or caveat about using the dataset for its intended purposes.", "range": "string", "@type": "SlotDefinition" }, { "name": "use_category", - "description": "Category of intended use (e.g., research, clinical, educational, commercial, policy).", + "description": "One or more categories of intended use (e.g., research, clinical, educational, commercial, policy).", "slot_uri": "d4d:useCategory", "range": "string", "multivalued": true, @@ -9513,7 +10246,7 @@ "attributes": [ { "name": "prohibition_reason", - "description": "Reason why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint).", + "description": "One or more reasons why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint).", "slot_uri": "d4d:prohibitionReason", "range": "string", "multivalued": true, @@ -9541,7 +10274,7 @@ { "name": "is_shared", "description": "Boolean indicating whether the dataset is distributed to parties external to the dataset-creating entity.\n", - "slot_uri": "dcterms:accessRights", + "slot_uri": "d4d:isExternallyShared", "range": "boolean", "@type": "SlotDefinition" } @@ -9566,9 +10299,16 @@ "attributes": [ { "name": "access_urls", - "description": "Details of the distribution channel(s) or format(s).", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/download", + "@type": "Annotation" + } + ], + "description": "One or more URLs providing access to the distribution channel(s) or format(s).", "slot_uri": "dcat:accessURL", - "range": "string", + "range": "uri", "multivalued": true, "@type": "SlotDefinition" } @@ -9593,7 +10333,7 @@ "attributes": [ { 
"name": "release_dates", - "description": "Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled releases.\n", + "description": "One or more dates or timeframes for dataset release, in ISO 8601 format (e.g., \"2024-03-15\") or as a descriptive string (e.g., \"Q2 2024\"). Use multiple values for staged or scheduled releases.\n", "slot_uri": "dcterms:available", "range": "string", "multivalued": true, @@ -9628,7 +10368,7 @@ }, { "name": "maintainer_details", - "description": "Details on who will support, host, or maintain the dataset.\n", + "description": "Free-text description of the organization, team, or individual responsible for maintaining the dataset, including contact information and hosting arrangements.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9656,14 +10396,21 @@ "attributes": [ { "name": "erratum_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/errata/2024-01-15", + "@type": "Annotation" + } + ], "description": "URL or access point for the erratum.", - "slot_uri": "dcat:accessURL", + "slot_uri": "d4d:erratumURL", "range": "uri", "@type": "SlotDefinition" }, { "name": "erratum_details", - "description": "Details on any errata or corrections to the dataset.\n", + "description": "Free-text description of the error, its scope, the affected data or records, and the correction applied.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9701,7 +10448,7 @@ }, { "name": "update_details", - "description": "Details on update plans, responsible parties, and communication methods.\n", + "description": "Free-text description of planned update types (e.g., corrections, additions, deletions), responsible parties, and how updates will be communicated to users.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9736,7 +10483,7 @@ }, { "name": "retention_details", - "description": 
"Details on data retention limits and enforcement procedures.\n", + "description": "Free-text description of applicable retention limits, legal or ethical basis for those limits, and how they will be enforced (e.g., automated deletion, anonymization after the retention period).\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9765,9 +10512,9 @@ "attributes": [ { "name": "latest_version_doi", - "description": "DOI or URL of the latest dataset version.", - "slot_uri": "schema:identifier", - "range": "string", + "description": "DOI or URL identifying the latest version of this dataset (e.g., '10.5281/zenodo.1234567' for a DOI or 'https://doi.org/10.5281/zenodo.1234567' for a full URL). Use CURIE format for DOIs (e.g., 'doi:10.5281/zenodo.1234567').", + "slot_uri": "dcterms:hasVersion", + "range": "uriorcurie", "@type": "SlotDefinition" }, { @@ -9780,7 +10527,7 @@ }, { "name": "version_details", - "description": "Details on version support policies and obsolescence communication.\n", + "description": "Free-text description of version support policies, how long older versions will be hosted, and how dataset consumers will be notified when versions become obsolete.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9808,14 +10555,21 @@ "attributes": [ { "name": "contribution_url", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "https://example.org/dataset/contributing", + "@type": "Annotation" + } + ], "description": "URL for contribution guidelines or process.", - "slot_uri": "dcat:landingPage", + "slot_uri": "d4d:contributionURL", "range": "uri", "@type": "SlotDefinition" }, { "name": "extension_details", - "description": "Details on extension mechanisms, contribution validation, and communication.\n", + "description": "Free-text description of how third parties can contribute to the dataset, how contributions are validated (e.g., peer review, automated tests), and how accepted 
contributions will be communicated to the community.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9845,10 +10599,10 @@ { "name": "contact_person", "description": "Contact person for questions about ethical review. Provides structured contact information including name, email, affiliation, and optional ORCID.", - "exact_mappings": [ + "broad_mappings": [ "schema:contactPoint" ], - "slot_uri": "schema:contactPoint", + "slot_uri": "d4d:ethicsContactPoint", "range": "Person", "@type": "SlotDefinition" }, @@ -9864,7 +10618,7 @@ }, { "name": "review_details", - "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "description": "Free-text description of the ethical review process, board decisions, outcomes, and any supporting documentation (e.g., IRB approval number, ethics committee name).\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9891,7 +10645,7 @@ "attributes": [ { "name": "impact_details", - "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "description": "Free-text description of the data protection impact analysis, including methodology, privacy risks identified, mitigation measures taken, and any regulatory findings.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9918,7 +10672,7 @@ "attributes": [ { "name": "notification_details", - "description": "Details on how individuals were notified about data collection.\n", + "description": "Free-text description of how individuals were notified about data collection, including the notification method (e.g., email, poster, in-person), timing, and the language or text of the notification itself if available.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9945,7 +10699,7 @@ "attributes": [ { "name": "consent_details", - "description": "Details on how consent was requested, provided, and 
documented.\n", + "description": "Free-text description of how consent was requested (e.g., opt-in form, verbal agreement), provided, and documented, including the language individuals consented to.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -9972,7 +10726,7 @@ "attributes": [ { "name": "revocation_details", - "description": "Details on consent revocation mechanisms and procedures.\n", + "description": "Free-text description of the mechanism provided for individuals to revoke consent (e.g., opt-out portal, written request), the scope of revocation (full withdrawal or specific uses), and what happens to their data after revocation.\n", "slot_uri": "dcterms:description", "range": "string", "multivalued": true, @@ -10285,15 +11039,19 @@ "attributes": [ { "name": "license_terms", - "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", - "slot_uri": "dcterms:license", + "description": "Description of the dataset's license and terms of use, including links, costs, or usage constraints (e.g., 'CC BY 4.0', 'Apache 2.0', 'MIT', 'CC BY-NC-SA 4.0', 'proprietary - contact data@example.org for access').", + "broad_mappings": [ + "dcterms:license", + "dcterms:rights" + ], + "slot_uri": "d4d:licenseDescription", "range": "string", "multivalued": true, "@type": "SlotDefinition" }, { "name": "data_use_permission", - "description": "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO", + "description": "Structured data use permissions using the Data Use Ontology (DUO). 
Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO.", "exact_mappings": [ "DUO:0000001" ], @@ -10305,10 +11063,10 @@ { "name": "contact_person", "description": "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. This person can answer questions about licensing terms, usage restrictions, fees, and permissions.", - "exact_mappings": [ + "broad_mappings": [ "schema:contactPoint" ], - "slot_uri": "schema:contactPoint", + "slot_uri": "d4d:licenseContactPoint", "range": "Person", "@type": "SlotDefinition" } @@ -10333,7 +11091,7 @@ "attributes": [ { "name": "restrictions", - "description": "Explanation of third-party IP restrictions.", + "description": "One or more explanations of third-party IP restrictions or associated fees.", "broad_mappings": [ "DUO:0000046", "DUO:0000045" @@ -10368,7 +11126,7 @@ "attributes": [ { "name": "regulatory_restrictions", - "description": "Export or regulatory restrictions on the dataset.", + "description": "One or more export controls or regulatory restrictions applicable to the dataset (e.g., HIPAA, ITAR, GDPR).", "broad_mappings": [ "DUO:0000021", "DUO:0000022", @@ -10404,10 +11162,10 @@ { "name": "governance_committee_contact", "description": "Contact person for data governance committee. 
This person can answer questions about data governance policies, access procedures, and oversight mechanisms.", - "exact_mappings": [ + "broad_mappings": [ "schema:contactPoint" ], - "slot_uri": "schema:contactPoint", + "slot_uri": "d4d:governanceContactPoint", "range": "Person", "@type": "SlotDefinition" } @@ -10452,10 +11210,11 @@ { "name": "variable_name", "description": "The name or identifier of the variable as it appears in the data files.", - "exact_mappings": [ - "schema:name" + "broad_mappings": [ + "schema:name", + "schema:identifier" ], - "slot_uri": "schema:name", + "slot_uri": "d4d:variableName", "range": "string", "required": true, "@type": "SlotDefinition" @@ -10491,6 +11250,13 @@ }, { "name": "minimum_value", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "0.0", + "@type": "Annotation" + } + ], "description": "The minimum value that the variable can take. Applicable to numeric variables.", "slot_uri": "schema:minValue", "range": "float", @@ -10498,6 +11264,13 @@ }, { "name": "maximum_value", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "100.0", + "@type": "Annotation" + } + ], "description": "The maximum value that the variable can take. Applicable to numeric variables.", "slot_uri": "schema:maxValue", "range": "float", @@ -10505,7 +11278,7 @@ }, { "name": "categories", - "description": "The permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning.", + "description": "One or more permitted categories or values for a categorical variable. 
Each entry should describe a possible value and its meaning.", "slot_uri": "schema:valueReference", "range": "string", "multivalued": true, @@ -10522,7 +11295,7 @@ { "name": "is_identifier", "description": "Indicates whether this variable serves as a unique identifier or key for records in the dataset.", - "slot_uri": "schema:identifier", + "slot_uri": "d4d:isIdentifier", "range": "boolean", "@type": "SlotDefinition" }, @@ -10535,6 +11308,13 @@ }, { "name": "precision", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "2 (two decimal places, e.g., 3.14)", + "@type": "Annotation" + } + ], "description": "The precision or number of decimal places for numeric variables.", "slot_uri": "schema:valuePrecision", "range": "integer", @@ -10557,7 +11337,10 @@ { "name": "quality_notes", "description": "Notes about data quality, reliability, or known issues specific to this variable.", - "slot_uri": "dcterms:description", + "broad_mappings": [ + "dcterms:description" + ], + "slot_uri": "d4d:qualityNotes", "range": "string", "multivalued": true, "@type": "SlotDefinition" @@ -10694,6 +11477,13 @@ }, { "name": "file_count", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "47", + "@type": "Annotation" + } + ], "description": "Number of files in this collection.", "slot_uri": "d4d:fileCount", "range": "integer", @@ -10701,7 +11491,14 @@ }, { "name": "total_bytes", - "description": "Total size of all files in bytes.", + "annotations": [ + { + "tag": "d4d:docExample", + "value": "1073741824 (1 GiB = 1024\u00b3 bytes)", + "@type": "Annotation" + } + ], + "description": "Total size of all files in this collection, in bytes (integer). 
Maps to dcat:byteSize.", "slot_uri": "dcat:byteSize", "range": "integer", "@type": "SlotDefinition" @@ -10713,9 +11510,9 @@ ], "metamodel_version": "1.7.0", "source_file": "data_sheets_schema.yaml", - "source_file_date": "2026-04-07T13:01:39", - "source_file_size": 18558, - "generation_date": "2026-04-07T13:03:27", + "source_file_date": "2026-04-09T00:48:57", + "source_file_size": 31873, + "generation_date": "2026-04-09T10:17:07", "@type": "SchemaDefinition", "@context": [ "project/jsonld/data_sheets_schema.context.jsonld", diff --git a/project/jsonschema/data_sheets_schema.schema.json b/project/jsonschema/data_sheets_schema.schema.json index c97dc721..a6d992cb 100644 --- a/project/jsonschema/data_sheets_schema.schema.json +++ b/project/jsonschema/data_sheets_schema.schema.json @@ -219,7 +219,7 @@ "type": "string" }, "Boolean": { - "description": "", + "description": "Three-valued boolean logic supporting true, false, and unknown states.", "enum": [ "true", "false", @@ -254,7 +254,7 @@ "description": "Was any cleaning of the data done (e.g., removal of instances, processing of missing values)?", "properties": { "cleaning_details": { - "description": "Details on data cleaning procedures applied.\n", + "description": "Free-text description of data cleaning procedures applied, including criteria for removing or correcting instances, tools used, and how removed instances are accounted for.\n", "items": { "type": "string" }, @@ -303,7 +303,7 @@ "description": "Did the individuals in question consent to the collection and use of their data? 
If so, how was consent requested and provided, and what language did individuals consent to?", "properties": { "consent_details": { - "description": "Details on how consent was requested, provided, and documented.\n", + "description": "Free-text description of how consent was requested (e.g., opt-in form, verbal agreement), provided, and documented, including the language individuals consented to.\n", "items": { "type": "string" }, @@ -366,7 +366,7 @@ ] }, "mechanism_details": { - "description": "Details on mechanisms or procedures used to collect the data.\n", + "description": "Free-text description of the specific mechanisms or procedures used to collect the data (e.g., hardware model, software API, manual curation process), including how those mechanisms were validated.\n", "items": { "type": "string" }, @@ -422,7 +422,7 @@ ] }, "notification_details": { - "description": "Details on how individuals were notified about data collection.\n", + "description": "Free-text description of how individuals were notified about data collection, including the notification method (e.g., email, poster, in-person), timing, and the language or text of the notification itself if available.\n", "items": { "type": "string" }, @@ -457,7 +457,7 @@ ] }, "end_date": { - "description": "End date of data collection", + "description": "End date of data collection.", "format": "date", "type": [ "string", @@ -479,7 +479,7 @@ ] }, "start_date": { - "description": "Start date of data collection", + "description": "Start date of data collection.", "format": "date", "type": [ "string", @@ -487,7 +487,7 @@ ] }, "timeframe_details": { - "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "description": "Free-text description of the data collection period and whether this timeframe matches the creation timeframe of the underlying data (e.g., historical records, prospective collection).\n", "items": { "type": "string" }, @@ -523,7 +523,7 @@ "type": 
"string" }, "CompressionEnum": { - "description": "", + "description": "Compression algorithms and formats for file compression.", "enum": [ "gzip", "bzip2", @@ -548,7 +548,7 @@ ] }, "confidentiality_details": { - "description": "Details on confidential data elements and handling procedures.\n", + "description": "Free-text description of which data elements are confidential, the basis for confidentiality (e.g., legal privilege, patient data), and how they are handled or restricted.\n", "items": { "type": "string" }, @@ -628,7 +628,7 @@ ] }, "revocation_details": { - "description": "Details on consent revocation mechanisms and procedures.\n", + "description": "Free-text description of the mechanism provided for individuals to revoke consent (e.g., opt-out portal, written request), the scope of revocation (full withdrawal or specific uses), and what happens to their data after revocation.\n", "items": { "type": "string" }, @@ -694,6 +694,7 @@ ] }, "warnings": { + "description": "One or more specific content warnings describing potentially offensive, insulting, threatening, or anxiety-provoking content present in the dataset (e.g., violence, profanity, explicit imagery, hate speech).", "items": { "type": "string" }, @@ -721,7 +722,7 @@ ] }, "credit_roles": { - "description": "Contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or creator team. Specifies the specific contributions made to this dataset (e.g., Conceptualization, Data Curation, Methodology). 
Note: roles are specified here rather than on Person directly, since the same person may have different roles across different datasets.", + "description": "One or more contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or creator team (e.g., Conceptualization, Data Curation, Methodology).", "items": { "$ref": "#/$defs/CRediTRoleEnum" }, @@ -795,7 +796,7 @@ "description": "Are there any errors, sources of noise, or redundancies in the dataset?", "properties": { "anomaly_details": { - "description": "Details on errors, noise sources, or redundancies in the dataset.\n", + "description": "Free-text description of errors, noise sources, or redundancies in the dataset, including their known causes and estimated prevalence.\n", "items": { "type": "string" }, @@ -844,7 +845,7 @@ "description": "Who was involved in the data collection (e.g., students, crowdworkers, contractors), and how they were compensated.", "properties": { "collector_details": { - "description": "Details on who collected the data and their compensation.\n", + "description": "Free-text description of who was involved in data collection (e.g., students, crowdworkers, contractors), their training or qualifications, and how they were compensated.\n", "items": { "type": "string" }, @@ -875,7 +876,7 @@ ] }, "role": { - "description": "Role of the data collector (e.g., researcher, crowdworker)", + "description": "Role of the data collector (e.g., researcher, crowdworker).", "type": [ "string", "null" @@ -914,7 +915,7 @@ ] }, "impact_details": { - "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "description": "Free-text description of the data protection impact analysis, including methodology, privacy risks identified, mitigation measures taken, and any regulatory findings.\n", "items": { "type": "string" }, @@ -949,6 +950,7 @@ "description": "A subset of a dataset, likely containing multiple files of multiple potential 
purposes and properties.", "properties": { "acquisition_methods": { + "description": "Methods used to acquire or obtain dataset instances. List of InstanceAcquisition objects from the Collection module describing how data was sourced, whether directly observed or derived.", "items": { "$ref": "#/$defs/InstanceAcquisition" }, @@ -958,6 +960,7 @@ ] }, "addressing_gaps": { + "description": "Research or practical gaps this dataset addresses. List of AddressingGap objects from the Motivation module, each describing a gap in existing datasets or knowledge that this dataset fills.", "items": { "$ref": "#/$defs/AddressingGap" }, @@ -977,6 +980,7 @@ ] }, "anomalies": { + "description": "Known data quality issues, errors, or irregularities in the dataset. List of DataAnomaly objects from the Composition module, each documenting a specific anomaly and its potential impact.", "items": { "$ref": "#/$defs/DataAnomaly" }, @@ -1004,6 +1008,7 @@ ] }, "cleaning_strategies": { + "description": "Data cleaning and quality control procedures applied to the dataset. List of CleaningStrategy objects from the Preprocessing module describing outlier removal, deduplication, and error correction steps.", "items": { "$ref": "#/$defs/CleaningStrategy" }, @@ -1012,7 +1017,18 @@ "null" ] }, + "collection_consents": { + "description": "Consent obtained from individuals for data collection and use. List of CollectionConsent objects from the Ethics module describing how consent was requested, provided, and documented.", + "items": { + "$ref": "#/$defs/CollectionConsent" + }, + "type": [ + "array", + "null" + ] + }, "collection_mechanisms": { + "description": "Mechanisms, instruments, or tools used for data collection. 
List of CollectionMechanism objects from the Collection module describing sensors, surveys, APIs, or other collection instruments.", "items": { "$ref": "#/$defs/CollectionMechanism" }, @@ -1021,7 +1037,18 @@ "null" ] }, + "collection_notifications": { + "description": "Notifications provided to individuals about data collection. List of CollectionNotification objects from the Ethics module describing how and when individuals were informed about the data collection.", + "items": { + "$ref": "#/$defs/CollectionNotification" + }, + "type": [ + "array", + "null" + ] + }, "collection_timeframes": { + "description": "Time periods during which data was collected. List of CollectionTimeframe objects from the Collection module describing collection start and end dates, and any gaps in the collection period.", "items": { "$ref": "#/$defs/CollectionTimeframe" }, @@ -1032,9 +1059,10 @@ }, "compression": { "$ref": "#/$defs/CompressionEnum", - "description": "compression format used, if any. e.g., gzip, bzip2, zip" + "description": "Compression format used, if any (e.g., gzip, bzip2, zip)." }, "confidential_elements": { + "description": "Confidential or restricted information within the dataset that requires access controls. List of Confidentiality objects describing what is confidential and why it cannot be released.", "items": { "$ref": "#/$defs/Confidentiality" }, @@ -1044,24 +1072,38 @@ ] }, "conforms_to": { + "description": "An established standard, specification, or schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_class": { + "description": "The specific class or type within a schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_schema": { + "description": "The schema or data model to which the resource conforms.", "type": [ "string", "null" ] }, + "consent_revocations": { + "description": "Mechanisms for individuals to revoke previously given consent. 
List of ConsentRevocation objects from the Ethics module describing how revocation works and what happens to data after revocation.", + "items": { + "$ref": "#/$defs/ConsentRevocation" + }, + "type": [ + "array", + "null" + ] + }, "content_warnings": { + "description": "Content warnings for potentially harmful, offensive, or disturbing material in the dataset. List of ContentWarning objects alerting users to sensitive content categories.", "items": { "$ref": "#/$defs/ContentWarning" }, @@ -1071,12 +1113,14 @@ ] }, "created_by": { + "description": "The person or organization primarily responsible for creating the resource.", "type": [ "string", "null" ] }, "created_on": { + "description": "The date and time when the resource was created.", "format": "date-time", "type": [ "string", @@ -1084,6 +1128,7 @@ ] }, "creators": { + "description": "Individuals or organizations who created the dataset. List of Creator objects describing authorship, roles, and affiliations of dataset creators.", "items": { "$ref": "#/$defs/Creator" }, @@ -1093,6 +1138,7 @@ ] }, "data_collectors": { + "description": "Individuals or organizations responsible for collecting the data. List of DataCollector objects from the Collection module describing who performed data collection and their roles.", "items": { "$ref": "#/$defs/DataCollector" }, @@ -1102,6 +1148,7 @@ ] }, "data_protection_impacts": { + "description": "Data protection impact assessments (DPIAs) conducted for the dataset. List of DataProtectionImpact objects from the Ethics module documenting privacy risk assessments and mitigation measures.", "items": { "$ref": "#/$defs/DataProtectionImpact" }, @@ -1117,7 +1164,18 @@ "null" ] }, + "direct_collection": { + "description": "Whether data was collected directly from individuals or via third parties. List of DirectCollection objects from the Collection module describing direct vs. 
indirect collection methods and sources.", + "items": { + "$ref": "#/$defs/DirectCollection" + }, + "type": [ + "array", + "null" + ] + }, "discouraged_uses": { + "description": "Uses that are not recommended for this dataset due to limitations, risks, or ethical concerns. List of DiscouragedUse objects from the Uses module explaining why certain applications should be avoided.", "items": { "$ref": "#/$defs/DiscouragedUse" }, @@ -1127,6 +1185,7 @@ ] }, "distribution_dates": { + "description": "Dates when the dataset was or will be distributed or released. List of DistributionDate objects from the Distribution module describing initial release dates, version release dates, and planned future releases.", "items": { "$ref": "#/$defs/DistributionDate" }, @@ -1136,6 +1195,7 @@ ] }, "distribution_formats": { + "description": "Formats in which the dataset is distributed or made available. List of DistributionFormat objects from the Distribution module describing file formats, compression, and access methods.", "items": { "$ref": "#/$defs/DistributionFormat" }, @@ -1145,7 +1205,7 @@ ] }, "doi": { - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "pattern": "10\\.\\d{4,}\\/.+", "type": [ "string", @@ -1160,6 +1220,7 @@ ] }, "errata": { + "description": "Known errors or corrections to the dataset since publication. List of Erratum objects from the Maintenance module describing discovered errors, affected records, and correction procedures.", "items": { "$ref": "#/$defs/Erratum" }, @@ -1169,6 +1230,7 @@ ] }, "ethical_reviews": { + "description": "Ethical reviews and institutional oversight for the dataset. 
List of EthicalReview objects from the Ethics module describing IRB approvals, ethics committee reviews, and compliance certifications.", "items": { "$ref": "#/$defs/EthicalReview" }, @@ -1178,6 +1240,7 @@ ] }, "existing_uses": { + "description": "Known existing uses of the dataset at the time of publication. List of ExistingUse objects from the Uses module describing research, commercial, or other applications of the dataset.", "items": { "$ref": "#/$defs/ExistingUse" }, @@ -1194,7 +1257,8 @@ { "type": "null" } - ] + ], + "description": "Mechanisms for extending or contributing to the dataset. ExtensionMechanism object from the Maintenance module describing how others can propose additions, corrections, or expansions." }, "external_resources": { "description": "External resources referenced at the dataset level (e.g., related publications, repositories, documentation). For file-level external resources, use FileCollection.external_resources.", @@ -1217,6 +1281,7 @@ ] }, "funders": { + "description": "Funding mechanisms that supported dataset creation. List of FundingMechanism objects describing grants, contracts, or other funding sources including grantors and grant identifiers.", "items": { "$ref": "#/$defs/FundingMechanism" }, @@ -1226,6 +1291,7 @@ ] }, "future_use_impacts": { + "description": "Anticipated impacts of future uses, including risks and benefits. List of FutureUseImpact objects from the Uses module describing foreseeable consequences of using this dataset in new applications.", "items": { "$ref": "#/$defs/FutureUseImpact" }, @@ -1250,7 +1316,7 @@ "type": "string" }, "imputation_protocols": { - "description": "Data imputation methodology and techniques.", + "description": "Data imputation protocols applied to handle missing values. 
List of ImputationProtocol objects from the Preprocessing module describing the imputation technique, affected variables, and rationale.", "items": { "$ref": "#/$defs/ImputationProtocol" }, @@ -1270,6 +1336,7 @@ ] }, "instances": { + "description": "Individual data instances or records in the dataset. List of Instance objects from the Composition module describing what each data point represents, its type, and associated label information.", "items": { "$ref": "#/$defs/Instance" }, @@ -1296,7 +1363,8 @@ { "type": "null" } - ] + ], + "description": "Intellectual property restrictions on dataset use or redistribution. IPRestrictions object from the Data Governance module describing copyright, trademark, or other IP considerations." }, "is_data_split": { "description": "Is this subset a split of the larger dataset, e.g., is it a set for model training, testing, or validation?", @@ -1313,7 +1381,8 @@ { "type": "null" } - ] + ], + "description": "De-identification status and procedures applied to the dataset. Deidentification object describing whether the dataset contains personal data, what de-identification methods were applied, and any residual re-identification risks." }, "is_subpopulation": { "description": "Is this subset a subpopulation of the larger dataset, e.g., is it a set of data for a specific demographic?", @@ -1323,12 +1392,14 @@ ] }, "is_tabular": { + "description": "Whether the dataset is in tabular format (rows and columns). 
True if the data is structured as a table (e.g., CSV, TSV, relational database); false for unstructured formats such as images or free text.", "type": [ "boolean", "null" ] }, "issued": { + "description": "Date of formal issuance or publication of the resource.", "format": "date-time", "type": [ "string", @@ -1336,6 +1407,7 @@ ] }, "keywords": { + "description": "Keywords or tags describing the resource for discovery and classification.", "items": { "type": "string" }, @@ -1365,6 +1437,7 @@ ] }, "labeling_strategies": { + "description": "Labeling or annotation methodologies applied to the data. List of LabelingStrategy objects from the Preprocessing module describing annotation procedures, annotator qualifications, and quality controls.", "items": { "$ref": "#/$defs/LabelingStrategy" }, @@ -1374,13 +1447,14 @@ ] }, "language": { - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "type": [ "string", "null" ] }, "last_updated_on": { + "description": "The date and time when the resource was most recently modified or updated.", "format": "date-time", "type": [ "string", @@ -1388,6 +1462,7 @@ ] }, "license": { + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "type": [ "string", "null" @@ -1401,7 +1476,8 @@ { "type": "null" } - ] + ], + "description": "License and usage terms governing dataset access and use. LicenseAndUseTerms object from the Data Governance module describing the applicable license, permitted uses, and any restrictions." }, "machine_annotation_tools": { "description": "Automated annotation tools used in dataset creation.", @@ -1414,6 +1490,7 @@ ] }, "maintainers": { + "description": "Individuals or organizations responsible for maintaining the dataset. 
List of Maintainer objects from the Maintenance module describing maintenance contacts, roles, and support channels.", "items": { "$ref": "#/$defs/Maintainer" }, @@ -1433,6 +1510,7 @@ ] }, "modified_by": { + "description": "A person or organization that contributed to modifying or updating the resource.", "type": [ "string", "null" @@ -1446,6 +1524,7 @@ ] }, "other_tasks": { + "description": "Additional tasks the dataset may support beyond its original intent. List of OtherTask objects from the Uses module describing potential applications not originally planned by the dataset creators.", "items": { "$ref": "#/$defs/OtherTask" }, @@ -1455,6 +1534,7 @@ ] }, "page": { + "description": "A landing page or web page providing access to or information about the resource.", "type": [ "string", "null" @@ -1491,6 +1571,7 @@ ] }, "preprocessing_strategies": { + "description": "Preprocessing steps applied to the raw data. List of PreprocessingStrategy objects from the Preprocessing module describing normalization, transformation, and other preparation steps.", "items": { "$ref": "#/$defs/PreprocessingStrategy" }, @@ -1510,12 +1591,14 @@ ] }, "publisher": { + "description": "The organization or entity responsible for making the resource available.", "type": [ "string", "null" ] }, "purposes": { + "description": "Purposes for which the dataset was created. List of Purpose objects from the Motivation module, each describing a specific creation goal or intended application.", "items": { "$ref": "#/$defs/Purpose" }, @@ -1525,7 +1608,7 @@ ] }, "raw_data_sources": { - "description": "Description of raw data sources before preprocessing.", + "description": "List of raw data sources before preprocessing. Each RawDataSource object describes where the original data came from and how it can be accessed.", "items": { "$ref": "#/$defs/RawDataSource" }, @@ -1535,6 +1618,7 @@ ] }, "raw_sources": { + "description": "Raw, unprocessed source data before any preprocessing was applied. 
List of RawData objects from the Preprocessing module describing original data sources and their formats.", "items": { "$ref": "#/$defs/RawData" }, @@ -1551,7 +1635,8 @@ { "type": "null" } - ] + ], + "description": "Regulatory and export control restrictions applicable to the dataset. ExportControlRegulatoryRestrictions object from the Data Governance module describing compliance requirements such as ITAR, EAR, or GDPR." }, "related_datasets": { "description": "Related datasets with typed relationships (e.g., supplements, derives from, is version of). Use DatasetRelationship class to specify relationship types.", @@ -1563,6 +1648,16 @@ "null" ] }, + "relationships": { + "description": "Explicit relationships between individual instances in the dataset. List of Relationships objects from the Composition module describing how instances relate (e.g., graph edges, ratings, social network links).", + "items": { + "$ref": "#/$defs/Relationships" + }, + "type": [ + "array", + "null" + ] + }, "resources": { "description": "Sub-resources or component datasets that are part of this dataset. Note: For file collections, use the file_collections attribute instead.", "items": { @@ -1581,9 +1676,11 @@ { "type": "null" } - ] + ], + "description": "Data retention policies and limits for the dataset. RetentionLimits object from the Maintenance module describing how long the dataset will be available and any deletion schedules." }, "sampling_strategies": { + "description": "Strategies used to select data instances from a larger population. List of SamplingStrategy objects from the Collection module describing sampling methodology, inclusion criteria, and limitations.", "items": { "$ref": "#/$defs/SamplingStrategy" }, @@ -1593,6 +1690,7 @@ ] }, "sensitive_elements": { + "description": "Sensitive data elements requiring special handling or access controls. 
List of SensitiveElement objects identifying sensitive attributes such as personal identifiers, protected health information, or legally sensitive content.", "items": { "$ref": "#/$defs/SensitiveElement" }, @@ -1601,13 +1699,25 @@ "null" ] }, + "splits": { + "description": "Recommended data splits for this dataset. List of Splits objects from the Composition module describing train/validation/test partitions and the rationale for each split strategy.", + "items": { + "$ref": "#/$defs/Splits" + }, + "type": [ + "array", + "null" + ] + }, "status": { + "description": "The status of the resource (e.g., draft, published, deprecated).", "type": [ "string", "null" ] }, "subpopulations": { + "description": "Subpopulations represented within the dataset. List of Subpopulation objects from the Composition module describing demographic or other groups, their representation, and any imbalances.", "items": { "$ref": "#/$defs/Subpopulation" }, @@ -1617,6 +1727,7 @@ ] }, "subsets": { + "description": "Subsets or splits of this dataset. List of DataSubset objects from the Composition module, each representing a logical partition such as training, validation, or test splits, or demographic subgroups.", "items": { "$ref": "#/$defs/DataSubset" }, @@ -1626,6 +1737,7 @@ ] }, "tasks": { + "description": "Tasks the dataset is intended to support. List of Task objects from the Motivation module describing specific machine learning, research, or analytical tasks.", "items": { "$ref": "#/$defs/Task" }, @@ -1634,8 +1746,18 @@ "null" ] }, + "third_party_sharing": { + "description": "Third-party distribution policies for the dataset. 
List of ThirdPartySharing objects from the Distribution module describing whether and how the dataset is shared with entities outside the creating organization.", + "items": { + "$ref": "#/$defs/ThirdPartySharing" + }, + "type": [ + "array", + "null" + ] + }, "title": { - "description": "the official title of the element", + "description": "The official title of the element.", "type": [ "string", "null" @@ -1663,9 +1785,11 @@ { "type": "null" } - ] + ], + "description": "Plans for future updates or versioning of the dataset. UpdatePlan object from the Maintenance module describing update frequency, versioning policy, and planned enhancements." }, "use_repository": { + "description": "Repositories or registries tracking how the dataset has been used. List of UseRepository objects from the Uses module pointing to papers with code, citation indices, or other use-tracking resources.", "items": { "$ref": "#/$defs/UseRepository" }, @@ -1685,6 +1809,7 @@ ] }, "version": { + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "type": [ "string", "null" @@ -1698,9 +1823,11 @@ { "type": "null" } - ] + ], + "description": "Information about access to different versions of the dataset. VersionAccess object from the Maintenance module describing where older versions can be found and how version history is maintained." }, "was_derived_from": { + "description": "A resource from which this resource was derived, in whole or in part.", "type": [ "string", "null" @@ -1747,6 +1874,7 @@ "description": "A single component of related observations and/or information that can be read, manipulated, transformed, and otherwise interpreted.", "properties": { "acquisition_methods": { + "description": "Methods used to acquire or obtain dataset instances. 
List of InstanceAcquisition objects from the Collection module describing how data was sourced, whether directly observed or derived.", "items": { "$ref": "#/$defs/InstanceAcquisition" }, @@ -1756,6 +1884,7 @@ ] }, "addressing_gaps": { + "description": "Research or practical gaps this dataset addresses. List of AddressingGap objects from the Motivation module, each describing a gap in existing datasets or knowledge that this dataset fills.", "items": { "$ref": "#/$defs/AddressingGap" }, @@ -1775,6 +1904,7 @@ ] }, "anomalies": { + "description": "Known data quality issues, errors, or irregularities in the dataset. List of DataAnomaly objects from the Composition module, each documenting a specific anomaly and its potential impact.", "items": { "$ref": "#/$defs/DataAnomaly" }, @@ -1802,6 +1932,7 @@ ] }, "cleaning_strategies": { + "description": "Data cleaning and quality control procedures applied to the dataset. List of CleaningStrategy objects from the Preprocessing module describing outlier removal, deduplication, and error correction steps.", "items": { "$ref": "#/$defs/CleaningStrategy" }, @@ -1810,7 +1941,18 @@ "null" ] }, + "collection_consents": { + "description": "Consent obtained from individuals for data collection and use. List of CollectionConsent objects from the Ethics module describing how consent was requested, provided, and documented.", + "items": { + "$ref": "#/$defs/CollectionConsent" + }, + "type": [ + "array", + "null" + ] + }, "collection_mechanisms": { + "description": "Mechanisms, instruments, or tools used for data collection. List of CollectionMechanism objects from the Collection module describing sensors, surveys, APIs, or other collection instruments.", "items": { "$ref": "#/$defs/CollectionMechanism" }, @@ -1819,7 +1961,18 @@ "null" ] }, + "collection_notifications": { + "description": "Notifications provided to individuals about data collection. 
List of CollectionNotification objects from the Ethics module describing how and when individuals were informed about the data collection.", + "items": { + "$ref": "#/$defs/CollectionNotification" + }, + "type": [ + "array", + "null" + ] + }, "collection_timeframes": { + "description": "Time periods during which data was collected. List of CollectionTimeframe objects from the Collection module describing collection start and end dates, and any gaps in the collection period.", "items": { "$ref": "#/$defs/CollectionTimeframe" }, @@ -1830,9 +1983,10 @@ }, "compression": { "$ref": "#/$defs/CompressionEnum", - "description": "compression format used, if any. e.g., gzip, bzip2, zip" + "description": "Compression format used, if any (e.g., gzip, bzip2, zip)." }, "confidential_elements": { + "description": "Confidential or restricted information within the dataset that requires access controls. List of Confidentiality objects describing what is confidential and why it cannot be released.", "items": { "$ref": "#/$defs/Confidentiality" }, @@ -1842,24 +1996,38 @@ ] }, "conforms_to": { + "description": "An established standard, specification, or schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_class": { + "description": "The specific class or type within a schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_schema": { + "description": "The schema or data model to which the resource conforms.", "type": [ "string", "null" ] }, + "consent_revocations": { + "description": "Mechanisms for individuals to revoke previously given consent. List of ConsentRevocation objects from the Ethics module describing how revocation works and what happens to data after revocation.", + "items": { + "$ref": "#/$defs/ConsentRevocation" + }, + "type": [ + "array", + "null" + ] + }, "content_warnings": { + "description": "Content warnings for potentially harmful, offensive, or disturbing material in the dataset. 
List of ContentWarning objects alerting users to sensitive content categories.", "items": { "$ref": "#/$defs/ContentWarning" }, @@ -1869,12 +2037,14 @@ ] }, "created_by": { + "description": "The person or organization primarily responsible for creating the resource.", "type": [ "string", "null" ] }, "created_on": { + "description": "The date and time when the resource was created.", "format": "date-time", "type": [ "string", @@ -1882,6 +2052,7 @@ ] }, "creators": { + "description": "Individuals or organizations who created the dataset. List of Creator objects describing authorship, roles, and affiliations of dataset creators.", "items": { "$ref": "#/$defs/Creator" }, @@ -1891,6 +2062,7 @@ ] }, "data_collectors": { + "description": "Individuals or organizations responsible for collecting the data. List of DataCollector objects from the Collection module describing who performed data collection and their roles.", "items": { "$ref": "#/$defs/DataCollector" }, @@ -1900,6 +2072,7 @@ ] }, "data_protection_impacts": { + "description": "Data protection impact assessments (DPIAs) conducted for the dataset. List of DataProtectionImpact objects from the Ethics module documenting privacy risk assessments and mitigation measures.", "items": { "$ref": "#/$defs/DataProtectionImpact" }, @@ -1915,7 +2088,18 @@ "null" ] }, + "direct_collection": { + "description": "Whether data was collected directly from individuals or via third parties. List of DirectCollection objects from the Collection module describing direct vs. indirect collection methods and sources.", + "items": { + "$ref": "#/$defs/DirectCollection" + }, + "type": [ + "array", + "null" + ] + }, "discouraged_uses": { + "description": "Uses that are not recommended for this dataset due to limitations, risks, or ethical concerns. 
List of DiscouragedUse objects from the Uses module explaining why certain applications should be avoided.", "items": { "$ref": "#/$defs/DiscouragedUse" }, @@ -1925,6 +2109,7 @@ ] }, "distribution_dates": { + "description": "Dates when the dataset was or will be distributed or released. List of DistributionDate objects from the Distribution module describing initial release dates, version release dates, and planned future releases.", "items": { "$ref": "#/$defs/DistributionDate" }, @@ -1934,6 +2119,7 @@ ] }, "distribution_formats": { + "description": "Formats in which the dataset is distributed or made available. List of DistributionFormat objects from the Distribution module describing file formats, compression, and access methods.", "items": { "$ref": "#/$defs/DistributionFormat" }, @@ -1943,7 +2129,7 @@ ] }, "doi": { - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "pattern": "10\\.\\d{4,}\\/.+", "type": [ "string", @@ -1958,6 +2144,7 @@ ] }, "errata": { + "description": "Known errors or corrections to the dataset since publication. List of Erratum objects from the Maintenance module describing discovered errors, affected records, and correction procedures.", "items": { "$ref": "#/$defs/Erratum" }, @@ -1967,6 +2154,7 @@ ] }, "ethical_reviews": { + "description": "Ethical reviews and institutional oversight for the dataset. List of EthicalReview objects from the Ethics module describing IRB approvals, ethics committee reviews, and compliance certifications.", "items": { "$ref": "#/$defs/EthicalReview" }, @@ -1976,6 +2164,7 @@ ] }, "existing_uses": { + "description": "Known existing uses of the dataset at the time of publication. 
List of ExistingUse objects from the Uses module describing research, commercial, or other applications of the dataset.", "items": { "$ref": "#/$defs/ExistingUse" }, @@ -1992,7 +2181,8 @@ { "type": "null" } - ] + ], + "description": "Mechanisms for extending or contributing to the dataset. ExtensionMechanism object from the Maintenance module describing how others can propose additions, corrections, or expansions." }, "external_resources": { "description": "External resources referenced at the dataset level (e.g., related publications, repositories, documentation). For file-level external resources, use FileCollection.external_resources.", @@ -2015,6 +2205,7 @@ ] }, "funders": { + "description": "Funding mechanisms that supported dataset creation. List of FundingMechanism objects describing grants, contracts, or other funding sources including grantors and grant identifiers.", "items": { "$ref": "#/$defs/FundingMechanism" }, @@ -2024,6 +2215,7 @@ ] }, "future_use_impacts": { + "description": "Anticipated impacts of future uses, including risks and benefits. List of FutureUseImpact objects from the Uses module describing foreseeable consequences of using this dataset in new applications.", "items": { "$ref": "#/$defs/FutureUseImpact" }, @@ -2048,7 +2240,7 @@ "type": "string" }, "imputation_protocols": { - "description": "Data imputation methodology and techniques.", + "description": "Data imputation protocols applied to handle missing values. List of ImputationProtocol objects from the Preprocessing module describing the imputation technique, affected variables, and rationale.", "items": { "$ref": "#/$defs/ImputationProtocol" }, @@ -2068,6 +2260,7 @@ ] }, "instances": { + "description": "Individual data instances or records in the dataset. 
List of Instance objects from the Composition module describing what each data point represents, its type, and associated label information.", "items": { "$ref": "#/$defs/Instance" }, @@ -2094,7 +2287,8 @@ { "type": "null" } - ] + ], + "description": "Intellectual property restrictions on dataset use or redistribution. IPRestrictions object from the Data Governance module describing copyright, trademark, or other IP considerations." }, "is_deidentified": { "anyOf": [ @@ -2104,15 +2298,18 @@ { "type": "null" } - ] + ], + "description": "De-identification status and procedures applied to the dataset. Deidentification object describing whether the dataset contains personal data, what de-identification methods were applied, and any residual re-identification risks." }, "is_tabular": { + "description": "Whether the dataset is in tabular format (rows and columns). True if the data is structured as a table (e.g., CSV, TSV, relational database); false for unstructured formats such as images or free text.", "type": [ "boolean", "null" ] }, "issued": { + "description": "Date of formal issuance or publication of the resource.", "format": "date-time", "type": [ "string", @@ -2120,6 +2317,7 @@ ] }, "keywords": { + "description": "Keywords or tags describing the resource for discovery and classification.", "items": { "type": "string" }, @@ -2149,6 +2347,7 @@ ] }, "labeling_strategies": { + "description": "Labeling or annotation methodologies applied to the data. 
List of LabelingStrategy objects from the Preprocessing module describing annotation procedures, annotator qualifications, and quality controls.", "items": { "$ref": "#/$defs/LabelingStrategy" }, @@ -2158,13 +2357,14 @@ ] }, "language": { - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "type": [ "string", "null" ] }, "last_updated_on": { + "description": "The date and time when the resource was most recently modified or updated.", "format": "date-time", "type": [ "string", @@ -2172,6 +2372,7 @@ ] }, "license": { + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "type": [ "string", "null" @@ -2185,7 +2386,8 @@ { "type": "null" } - ] + ], + "description": "License and usage terms governing dataset access and use. LicenseAndUseTerms object from the Data Governance module describing the applicable license, permitted uses, and any restrictions." }, "machine_annotation_tools": { "description": "Automated annotation tools used in dataset creation.", @@ -2198,6 +2400,7 @@ ] }, "maintainers": { + "description": "Individuals or organizations responsible for maintaining the dataset. List of Maintainer objects from the Maintenance module describing maintenance contacts, roles, and support channels.", "items": { "$ref": "#/$defs/Maintainer" }, @@ -2217,6 +2420,7 @@ ] }, "modified_by": { + "description": "A person or organization that contributed to modifying or updating the resource.", "type": [ "string", "null" @@ -2230,6 +2434,7 @@ ] }, "other_tasks": { + "description": "Additional tasks the dataset may support beyond its original intent. 
List of OtherTask objects from the Uses module describing potential applications not originally planned by the dataset creators.", "items": { "$ref": "#/$defs/OtherTask" }, @@ -2239,6 +2444,7 @@ ] }, "page": { + "description": "A landing page or web page providing access to or information about the resource.", "type": [ "string", "null" @@ -2275,6 +2481,7 @@ ] }, "preprocessing_strategies": { + "description": "Preprocessing steps applied to the raw data. List of PreprocessingStrategy objects from the Preprocessing module describing normalization, transformation, and other preparation steps.", "items": { "$ref": "#/$defs/PreprocessingStrategy" }, @@ -2294,12 +2501,14 @@ ] }, "publisher": { + "description": "The organization or entity responsible for making the resource available.", "type": [ "string", "null" ] }, "purposes": { + "description": "Purposes for which the dataset was created. List of Purpose objects from the Motivation module, each describing a specific creation goal or intended application.", "items": { "$ref": "#/$defs/Purpose" }, @@ -2309,7 +2518,7 @@ ] }, "raw_data_sources": { - "description": "Description of raw data sources before preprocessing.", + "description": "List of raw data sources before preprocessing. Each RawDataSource object describes where the original data came from and how it can be accessed.", "items": { "$ref": "#/$defs/RawDataSource" }, @@ -2319,6 +2528,7 @@ ] }, "raw_sources": { + "description": "Raw, unprocessed source data before any preprocessing was applied. List of RawData objects from the Preprocessing module describing original data sources and their formats.", "items": { "$ref": "#/$defs/RawData" }, @@ -2335,7 +2545,8 @@ { "type": "null" } - ] + ], + "description": "Regulatory and export control restrictions applicable to the dataset. ExportControlRegulatoryRestrictions object from the Data Governance module describing compliance requirements such as ITAR, EAR, or GDPR." 
}, "related_datasets": { "description": "Related datasets with typed relationships (e.g., supplements, derives from, is version of). Use DatasetRelationship class to specify relationship types.", @@ -2347,6 +2558,16 @@ "null" ] }, + "relationships": { + "description": "Explicit relationships between individual instances in the dataset. List of Relationships objects from the Composition module describing how instances relate (e.g., graph edges, ratings, social network links).", + "items": { + "$ref": "#/$defs/Relationships" + }, + "type": [ + "array", + "null" + ] + }, "resources": { "description": "Sub-resources or component datasets that are part of this dataset. Note: For file collections, use the file_collections attribute instead.", "items": { @@ -2365,9 +2586,11 @@ { "type": "null" } - ] + ], + "description": "Data retention policies and limits for the dataset. RetentionLimits object from the Maintenance module describing how long the dataset will be available and any deletion schedules." }, "sampling_strategies": { + "description": "Strategies used to select data instances from a larger population. List of SamplingStrategy objects from the Collection module describing sampling methodology, inclusion criteria, and limitations.", "items": { "$ref": "#/$defs/SamplingStrategy" }, @@ -2377,6 +2600,7 @@ ] }, "sensitive_elements": { + "description": "Sensitive data elements requiring special handling or access controls. List of SensitiveElement objects identifying sensitive attributes such as personal identifiers, protected health information, or legally sensitive content.", "items": { "$ref": "#/$defs/SensitiveElement" }, @@ -2385,13 +2609,25 @@ "null" ] }, + "splits": { + "description": "Recommended data splits for this dataset. 
List of Splits objects from the Composition module describing train/validation/test partitions and the rationale for each split strategy.", + "items": { + "$ref": "#/$defs/Splits" + }, + "type": [ + "array", + "null" + ] + }, "status": { + "description": "The status of the resource (e.g., draft, published, deprecated).", "type": [ "string", "null" ] }, "subpopulations": { + "description": "Subpopulations represented within the dataset. List of Subpopulation objects from the Composition module describing demographic or other groups, their representation, and any imbalances.", "items": { "$ref": "#/$defs/Subpopulation" }, @@ -2401,6 +2637,7 @@ ] }, "subsets": { + "description": "Subsets or splits of this dataset. List of DataSubset objects from the Composition module, each representing a logical partition such as training, validation, or test splits, or demographic subgroups.", "items": { "$ref": "#/$defs/DataSubset" }, @@ -2410,6 +2647,7 @@ ] }, "tasks": { + "description": "Tasks the dataset is intended to support. List of Task objects from the Motivation module describing specific machine learning, research, or analytical tasks.", "items": { "$ref": "#/$defs/Task" }, @@ -2418,8 +2656,18 @@ "null" ] }, + "third_party_sharing": { + "description": "Third-party distribution policies for the dataset. List of ThirdPartySharing objects from the Distribution module describing whether and how the dataset is shared with entities outside the creating organization.", + "items": { + "$ref": "#/$defs/ThirdPartySharing" + }, + "type": [ + "array", + "null" + ] + }, "title": { - "description": "the official title of the element", + "description": "The official title of the element.", "type": [ "string", "null" @@ -2447,9 +2695,11 @@ { "type": "null" } - ] + ], + "description": "Plans for future updates or versioning of the dataset. UpdatePlan object from the Maintenance module describing update frequency, versioning policy, and planned enhancements." 
}, "use_repository": { + "description": "Repositories or registries tracking how the dataset has been used. List of UseRepository objects from the Uses module pointing to papers with code, citation indices, or other use-tracking resources.", "items": { "$ref": "#/$defs/UseRepository" }, @@ -2469,6 +2719,7 @@ ] }, "version": { + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "type": [ "string", "null" @@ -2482,9 +2733,11 @@ { "type": "null" } - ] + ], + "description": "Information about access to different versions of the dataset. VersionAccess object from the Maintenance module describing where older versions can be found and how version history is maintained." }, "was_derived_from": { + "description": "A resource from which this resource was derived, in whole or in part.", "type": [ "string", "null" @@ -2502,7 +2755,7 @@ "description": "Documents known biases present in the dataset. Biases are systematic errors or prejudices that may affect the representativeness or fairness of the data. Distinct from anomalies (data quality issues) and limitations (scope constraints).", "properties": { "affected_subsets": { - "description": "Specific subsets or features of the dataset affected by this bias.\n", + "description": "One or more specific subsets or features of the dataset affected by this bias (e.g., \"female participants\", \"non-English text\", \"images taken at night\").\n", "items": { "type": "string" }, @@ -2570,33 +2823,38 @@ "properties": { "compression": { "$ref": "#/$defs/CompressionEnum", - "description": "compression format used, if any. e.g., gzip, bzip2, zip" + "description": "Compression format used, if any (e.g., gzip, bzip2, zip)." 
}, "conforms_to": { + "description": "An established standard, specification, or schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_class": { + "description": "The specific class or type within a schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_schema": { + "description": "The schema or data model to which the resource conforms.", "type": [ "string", "null" ] }, "created_by": { + "description": "The person or organization primarily responsible for creating the resource.", "type": [ "string", "null" ] }, "created_on": { + "description": "The date and time when the resource was created.", "format": "date-time", "type": [ "string", @@ -2611,7 +2869,7 @@ ] }, "doi": { - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "pattern": "10\\.\\d{4,}\\/.+", "type": [ "string", @@ -2630,6 +2888,7 @@ "type": "string" }, "issued": { + "description": "Date of formal issuance or publication of the resource.", "format": "date-time", "type": [ "string", @@ -2637,6 +2896,7 @@ ] }, "keywords": { + "description": "Keywords or tags describing the resource for discovery and classification.", "items": { "type": "string" }, @@ -2646,13 +2906,14 @@ ] }, "language": { - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "type": [ "string", "null" ] }, "last_updated_on": { + "description": "The date and time when the resource was most recently modified or updated.", "format": "date-time", "type": [ "string", @@ -2660,12 +2921,14 @@ ] }, "license": { + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "type": [ "string", "null" ] }, "modified_by": { + "description": "A person or organization that contributed to 
modifying or updating the resource.", "type": [ "string", "null" @@ -2679,12 +2942,14 @@ ] }, "page": { + "description": "A landing page or web page providing access to or information about the resource.", "type": [ "string", "null" ] }, "publisher": { + "description": "The organization or entity responsible for making the resource available.", "type": [ "string", "null" @@ -2701,25 +2966,28 @@ ] }, "status": { + "description": "The status of the resource (e.g., draft, published, deprecated).", "type": [ "string", "null" ] }, "title": { - "description": "the official title of the element", + "description": "The official title of the element.", "type": [ "string", "null" ] }, "version": { + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "type": [ "string", "null" ] }, "was_derived_from": { + "description": "A resource from which this resource was derived, in whole or in part.", "type": [ "string", "null" @@ -2943,7 +3211,7 @@ ] }, "identifiers_removed": { - "description": "List of identifier types removed during de-identification.", + "description": "List of identifier types removed during de-identification (e.g., 'name', 'date of birth', 'SSN', 'email address', 'geographic subdivision').", "items": { "type": "string" }, @@ -2985,7 +3253,7 @@ "description": "Indicates whether the data was collected directly from the individuals in question or obtained via third parties/other sources.", "properties": { "collection_details": { - "description": "Details on direct vs. 
indirect collection methods and sources.\n", + "description": "Free-text description of whether data was collected directly from individuals or obtained via third parties or other indirect sources, and what those sources are.\n", "items": { "type": "string" }, @@ -3009,7 +3277,7 @@ ] }, "is_direct": { - "description": "Whether collection was direct from individuals", + "description": "Whether collection was direct from individuals.", "type": [ "boolean", "null" @@ -3048,7 +3316,7 @@ ] }, "discouragement_details": { - "description": "Details on tasks for which the dataset should not be used.\n", + "description": "Free-text description of tasks or applications for which the dataset is not recommended, with explanation of why (e.g., out-of-scope, risk of harm, poor coverage).\n", "items": { "type": "string" }, @@ -3111,7 +3379,7 @@ ] }, "release_dates": { - "description": "Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled releases.\n", + "description": "One or more dates or timeframes for dataset release, in ISO 8601 format (e.g., \"2024-03-15\") or as a descriptive string (e.g., \"Q2 2024\"). 
Use multiple values for staged or scheduled releases.\n", "items": { "type": "string" }, @@ -3139,7 +3407,7 @@ "description": "How will the dataset be distributed (e.g., tarball on a website, API, GitHub)?", "properties": { "access_urls": { - "description": "Details of the distribution channel(s) or format(s).", + "description": "One or more URLs providing access to the distribution channel(s) or format(s).", "items": { "type": "string" }, @@ -3184,7 +3452,7 @@ "type": "object" }, "EncodingEnum": { - "description": "", + "description": "Character encoding schemes for text representation in different languages and scripts.", "enum": [ "ASCII", "Big5", @@ -3219,7 +3487,16 @@ "UTF-16", "UTF-32", "UTF-7", - "UTF-8" + "UTF-8", + "Windows-1250", + "Windows-1251", + "Windows-1252", + "Windows-1253", + "Windows-1254", + "Windows-1255", + "Windows-1256", + "Windows-1257", + "Windows-1258" ], "title": "EncodingEnum", "type": "string" @@ -3236,7 +3513,7 @@ ] }, "erratum_details": { - "description": "Details on any errata or corrections to the dataset.\n", + "description": "Free-text description of the error, its scope, the affected data or records, and the correction applied.\n", "items": { "type": "string" }, @@ -3313,7 +3590,7 @@ ] }, "review_details": { - "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "description": "Free-text description of the ethical review process, board decisions, outcomes, and any supporting documentation (e.g., IRB approval number, ethics committee name).\n", "items": { "type": "string" }, @@ -3443,7 +3720,7 @@ ] }, "regulatory_restrictions": { - "description": "Export or regulatory restrictions on the dataset.", + "description": "One or more export controls or regulatory restrictions applicable to the dataset (e.g., HIPAA, ITAR, GDPR).", "items": { "type": "string" }, @@ -3485,7 +3762,7 @@ ] }, "extension_details": { - "description": "Details on extension mechanisms, contribution validation, and 
communication.\n", + "description": "Free-text description of how third parties can contribute to the dataset, how contributions are validated (e.g., peer review, automated tests), and how accepted contributions will be communicated to the community.\n", "items": { "type": "string" }, @@ -3527,12 +3804,9 @@ "description": "Is the dataset self-contained or does it rely on external resources (e.g., websites, other datasets)? If external, are there guarantees that those resources will remain available and unchanged?", "properties": { "archival": { - "description": "Indication whether official archival versions of external resources are included.\n", - "items": { - "type": "boolean" - }, + "description": "Indicates whether official archival versions of external resources are included in the dataset.\n", "type": [ - "array", + "boolean", "null" ] }, @@ -3578,7 +3852,7 @@ ] }, "restrictions": { - "description": "Description of any restrictions or fees associated with external resources.\n", + "description": "One or more descriptions of restrictions or fees associated with accessing these external resources (e.g., paywalls, registration requirements, API limits).\n", "items": { "type": "string" }, @@ -3614,33 +3888,38 @@ }, "compression": { "$ref": "#/$defs/CompressionEnum", - "description": "compression format used, if any. e.g., gzip, bzip2, zip" + "description": "Compression format used, if any (e.g., gzip, bzip2, zip)." 
}, "conforms_to": { + "description": "An established standard, specification, or schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_class": { + "description": "The specific class or type within a schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_schema": { + "description": "The schema or data model to which the resource conforms.", "type": [ "string", "null" ] }, "created_by": { + "description": "The person or organization primarily responsible for creating the resource.", "type": [ "string", "null" ] }, "created_on": { + "description": "The date and time when the resource was created.", "format": "date-time", "type": [ "string", @@ -3662,7 +3941,7 @@ ] }, "doi": { - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "pattern": "10\\.\\d{4,}\\/.+", "type": [ "string", @@ -3678,7 +3957,7 @@ }, "encoding": { "$ref": "#/$defs/EncodingEnum", - "description": "the character encoding of the data" + "description": "The character encoding of the data." }, "file_type": { "$ref": "#/$defs/FileTypeEnum", @@ -3689,7 +3968,7 @@ "description": "The file format, physical medium, or dimensions of a resource. This should be a file extension or MIME type." 
}, "hash": { - "description": "hash of the data", + "description": "Cryptographic hash value of the data for integrity verification (e.g., SHA-256: 'e3b0c44298fc1c149afb...', MD5: 'd41d8cd98f00b204e9800998ecf8427e').", "type": [ "string", "null" @@ -3700,6 +3979,7 @@ "type": "string" }, "issued": { + "description": "Date of formal issuance or publication of the resource.", "format": "date-time", "type": [ "string", @@ -3707,6 +3987,7 @@ ] }, "keywords": { + "description": "Keywords or tags describing the resource for discovery and classification.", "items": { "type": "string" }, @@ -3716,13 +3997,14 @@ ] }, "language": { - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "type": [ "string", "null" ] }, "last_updated_on": { + "description": "The date and time when the resource was most recently modified or updated.", "format": "date-time", "type": [ "string", @@ -3730,13 +4012,14 @@ ] }, "license": { + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "type": [ "string", "null" ] }, "md5": { - "description": "md5 hash of the data", + "description": "MD5 hash value of the data (128-bit cryptographic hash).", "type": [ "string", "null" @@ -3747,6 +4030,7 @@ "description": "The media type of the data. This should be a MIME type." 
}, "modified_by": { + "description": "A person or organization that contributed to modifying or updating the resource.", "type": [ "string", "null" @@ -3760,50 +4044,56 @@ ] }, "page": { + "description": "A landing page or web page providing access to or information about the resource.", "type": [ "string", "null" ] }, "path": { + "description": "The file path or URL where the content is located.", "type": [ "string", "null" ] }, "publisher": { + "description": "The organization or entity responsible for making the resource available.", "type": [ "string", "null" ] }, "sha256": { - "description": "sha256 hash of the data", + "description": "SHA-256 hash value of the data (256-bit cryptographic hash, recommended).", "type": [ "string", "null" ] }, "status": { + "description": "The status of the resource (e.g., draft, published, deprecated).", "type": [ "string", "null" ] }, "title": { - "description": "the official title of the element", + "description": "The official title of the element.", "type": [ "string", "null" ] }, "version": { + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "type": [ "string", "null" ] }, "was_derived_from": { + "description": "A resource from which this resource was derived, in whole or in part.", "type": [ "string", "null" @@ -3835,30 +4125,35 @@ "description": "Compression format if the collection is packaged as a compressed archive (e.g., gzip, zip, bzip2). Omit this field for uncompressed collections or purely logical groupings." 
}, "conforms_to": { + "description": "An established standard, specification, or schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_class": { + "description": "The specific class or type within a schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_schema": { + "description": "The schema or data model to which the resource conforms.", "type": [ "string", "null" ] }, "created_by": { + "description": "The person or organization primarily responsible for creating the resource.", "type": [ "string", "null" ] }, "created_on": { + "description": "The date and time when the resource was created.", "format": "date-time", "type": [ "string", @@ -3873,7 +4168,7 @@ ] }, "doi": { - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "pattern": "10\\.\\d{4,}\\/.+", "type": [ "string", @@ -3909,6 +4204,7 @@ "type": "string" }, "issued": { + "description": "Date of formal issuance or publication of the resource.", "format": "date-time", "type": [ "string", @@ -3916,6 +4212,7 @@ ] }, "keywords": { + "description": "Keywords or tags describing the resource for discovery and classification.", "items": { "type": "string" }, @@ -3925,13 +4222,14 @@ ] }, "language": { - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "type": [ "string", "null" ] }, "last_updated_on": { + "description": "The date and time when the resource was most recently modified or updated.", "format": "date-time", "type": [ "string", @@ -3939,12 +4237,14 @@ ] }, "license": { + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "type": [ "string", "null" ] }, "modified_by": { + "description": "A person or organization that contributed to 
modifying or updating the resource.", "type": [ "string", "null" @@ -3958,6 +4258,7 @@ ] }, "page": { + "description": "A landing page or web page providing access to or information about the resource.", "type": [ "string", "null" @@ -3971,6 +4272,7 @@ ] }, "publisher": { + "description": "The organization or entity responsible for making the resource available.", "type": [ "string", "null" @@ -3995,32 +4297,35 @@ ] }, "status": { + "description": "The status of the resource (e.g., draft, published, deprecated).", "type": [ "string", "null" ] }, "title": { - "description": "the official title of the element", + "description": "The official title of the element.", "type": [ "string", "null" ] }, "total_bytes": { - "description": "Total size of all files in bytes.", + "description": "Total size of all files in this collection, in bytes (integer). Maps to dcat:byteSize.", "type": [ "integer", "null" ] }, "version": { + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "type": [ "string", "null" ] }, "was_derived_from": { + "description": "A resource from which this resource was derived, in whole or in part.", "type": [ "string", "null" @@ -4068,33 +4373,38 @@ }, "FormatDialect": { "additionalProperties": false, - "description": "Additional format information for a file", + "description": "Additional format information for a file.", "properties": { "comment_prefix": { + "description": "Character(s) used to indicate comment lines (e.g., \"#\" for CSV comments).", "type": [ "string", "null" ] }, "delimiter": { + "description": "Field delimiter character (e.g., \",\" for CSV, \"\\t\" for TSV).", "type": [ "string", "null" ] }, "double_quote": { + "description": "Whether quotes within quoted fields are escaped by doubling them. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). 
Follows the W3C CSV-on-the-Web dialect specification.", "type": [ "string", "null" ] }, "header": { + "description": "Whether the first row of the file contains column headers. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). Follows the W3C CSV-on-the-Web dialect specification.", "type": [ "string", "null" ] }, "quote_char": { + "description": "Character used for quoting fields (e.g., '\"' for CSV).", "type": [ "string", "null" @@ -4105,7 +4415,7 @@ "type": "object" }, "FormatEnum": { - "description": "", + "description": "Common file format extensions for data files and documents.", "enum": [ "CSV", "TSV", @@ -4204,7 +4514,7 @@ ] }, "impact_details": { - "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "description": "Free-text description of potential future impacts or risks arising from the dataset's composition or collection (e.g., unfair treatment, privacy violations, legal or financial risks), and any recommended mitigation strategies.\n", "items": { "type": "string" }, @@ -4487,7 +4797,7 @@ ] }, "restrictions": { - "description": "Explanation of third-party IP restrictions.", + "description": "One or more explanations of third-party IP restrictions or associated fees.", "items": { "type": "string" }, @@ -4592,33 +4902,38 @@ "properties": { "compression": { "$ref": "#/$defs/CompressionEnum", - "description": "compression format used, if any. e.g., gzip, bzip2, zip" + "description": "Compression format used, if any (e.g., gzip, bzip2, zip)." 
}, "conforms_to": { + "description": "An established standard, specification, or schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_class": { + "description": "The specific class or type within a schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_schema": { + "description": "The schema or data model to which the resource conforms.", "type": [ "string", "null" ] }, "created_by": { + "description": "The person or organization primarily responsible for creating the resource.", "type": [ "string", "null" ] }, "created_on": { + "description": "The date and time when the resource was created.", "format": "date-time", "type": [ "string", @@ -4633,7 +4948,7 @@ ] }, "doi": { - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "pattern": "10\\.\\d{4,}\\/.+", "type": [ "string", @@ -4652,6 +4967,7 @@ "type": "string" }, "issued": { + "description": "Date of formal issuance or publication of the resource.", "format": "date-time", "type": [ "string", @@ -4659,6 +4975,7 @@ ] }, "keywords": { + "description": "Keywords or tags describing the resource for discovery and classification.", "items": { "type": "string" }, @@ -4668,13 +4985,14 @@ ] }, "language": { - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "type": [ "string", "null" ] }, "last_updated_on": { + "description": "The date and time when the resource was most recently modified or updated.", "format": "date-time", "type": [ "string", @@ -4682,12 +5000,14 @@ ] }, "license": { + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "type": [ "string", "null" ] }, "modified_by": { + "description": "A person or organization that contributed to 
modifying or updating the resource.", "type": [ "string", "null" @@ -4701,37 +5021,42 @@ ] }, "page": { + "description": "A landing page or web page providing access to or information about the resource.", "type": [ "string", "null" ] }, "publisher": { + "description": "The organization or entity responsible for making the resource available.", "type": [ "string", "null" ] }, "status": { + "description": "The status of the resource (e.g., draft, published, deprecated).", "type": [ "string", "null" ] }, "title": { - "description": "the official title of the element", + "description": "The official title of the element.", "type": [ "string", "null" ] }, "version": { + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "type": [ "string", "null" ] }, "was_derived_from": { + "description": "A resource from which this resource was derived, in whole or in part.", "type": [ "string", "null" @@ -4870,7 +5195,7 @@ ] }, "instance_type": { - "description": "Multiple types of instances? (e.g., movies, users, and ratings).\n", + "description": "The type or types of instances in the dataset (e.g., \"movie\", \"user\", \"rating\", \"clinical record\"). 
Use when the dataset contains multiple instance types with different structures.\n", "type": [ "string", "null" @@ -4936,7 +5261,7 @@ "description": "Describes how data associated with each instance was acquired (e.g., directly observed, reported by subjects, inferred).", "properties": { "acquisition_details": { - "description": "Details on how data was acquired for each instance.\n", + "description": "Free-text description of how data was acquired for each instance, including instruments, protocols, and any manual steps involved.\n", "items": { "type": "string" }, @@ -4977,28 +5302,28 @@ ] }, "was_directly_observed": { - "description": "Whether the data was directly observed", + "description": "True if the data was directly observed by a researcher or instrument; false if it was obtained through other means (e.g., reported, inferred).", "type": [ "boolean", "null" ] }, "was_inferred_derived": { - "description": "Whether the data was inferred or derived from other data", + "description": "True if the data was computationally inferred or derived from other data (e.g., model outputs, imputed values); false otherwise.", "type": [ "boolean", "null" ] }, "was_reported_by_subjects": { - "description": "Whether the data was reported directly by the subjects themselves", + "description": "True if the data was self-reported directly by the subjects themselves (e.g., survey responses, questionnaires); false otherwise.", "type": [ "boolean", "null" ] }, "was_validated_verified": { - "description": "Whether the data was validated or verified in any way", + "description": "True if the data underwent a validation or verification process (e.g., expert review, cross-checking with ground truth); false otherwise.", "type": [ "boolean", "null" @@ -5044,14 +5369,14 @@ ] }, "usage_notes": { - "description": "Notes or caveats about using the dataset for intended purposes.", + "description": "A note or caveat about using the dataset for its intended purposes.", "type": [ "string", "null" 
] }, "use_category": { - "description": "Category of intended use (e.g., research, clinical, educational, commercial, policy).", + "description": "One or more categories of intended use (e.g., research, clinical, educational, commercial, policy).", "items": { "type": "string" }, @@ -5086,7 +5411,7 @@ ] }, "annotator_demographics": { - "description": "Demographic information about annotators, if available and relevant (e.g., geographic location, language background, expertise level).", + "description": "One or more demographic characteristics of the annotators, if available and relevant (e.g., geographic location, language background, expertise level, native language).", "items": { "type": "string" }, @@ -5096,9 +5421,12 @@ ] }, "data_annotation_platform": { - "description": "Platform or tool used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool).", + "description": "One or more platforms or tools used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool).", + "items": { + "type": "string" + }, "type": [ - "string", + "array", "null" ] }, @@ -5134,7 +5462,7 @@ ] }, "labeling_details": { - "description": "Details on labeling/annotation procedures and quality metrics.\n", + "description": "Free-text description of the labeling or annotation procedures, including annotation guidelines, task definitions, and quality control metrics.\n", "items": { "type": "string" }, @@ -5176,7 +5504,7 @@ ] }, "data_use_permission": { - "description": "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO", + "description": "Structured data use permissions using the Data Use Ontology (DUO). 
Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO.", "items": { "$ref": "#/$defs/DataUsePermissionEnum" }, @@ -5200,7 +5528,7 @@ ] }, "license_terms": { - "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", + "description": "Description of the dataset's license and terms of use, including links, costs, or usage constraints (e.g., 'CC BY 4.0', 'Apache 2.0', 'MIT', 'CC BY-NC-SA 4.0', 'proprietary - contact data@example.org for access').", "items": { "type": "string" }, @@ -5270,7 +5598,7 @@ ] }, "tool_accuracy": { - "description": "Known accuracy or performance metrics for the automated tools (if available). Include metric name and value (e.g., \"spaCy F1: 0.95\", \"GPT-4 Accuracy: 92%\").\n", + "description": "One or more known accuracy or performance metrics for the automated tools (if available). 
Include metric name and value (e.g., \"spaCy F1: 0.95\", \"GPT-4 Accuracy: 92%\").\n", "items": { "type": "string" }, @@ -5332,7 +5660,7 @@ ] }, "maintainer_details": { - "description": "Details on who will support, host, or maintain the dataset.\n", + "description": "Free-text description of the organization, team, or individual responsible for maintaining the dataset, including contact information and hosting arrangements.\n", "items": { "type": "string" }, @@ -5367,7 +5695,7 @@ "type": "object" }, "MediaTypeEnum": { - "description": "", + "description": "MIME media types (Internet Media Types) for file content identification.", "enum": [ "text/csv", "text/tab-separated-values", @@ -5404,7 +5732,7 @@ ] }, "handling_strategy": { - "description": "Strategy used to handle missing data (e.g., deletion, imputation, flagging, multiple imputation).\n", + "description": "The primary strategy used to handle missing data (e.g., listwise deletion, mean imputation, multiple imputation, flagging with sentinel values).\n", "type": [ "string", "null" @@ -5604,7 +5932,7 @@ ] }, "task_details": { - "description": "Details on other potential tasks the dataset could be used for.\n", + "description": "Free-text description of other potential tasks the dataset could support, including any prerequisites or limitations for those uses.\n", "items": { "type": "string" }, @@ -5786,7 +6114,7 @@ ] }, "preprocessing_details": { - "description": "Details on preprocessing steps applied to the data.\n", + "description": "Free-text description of preprocessing steps applied to the data, including tools used, parameters, order of operations, and rationale for each step.\n", "items": { "type": "string" }, @@ -5835,7 +6163,7 @@ ] }, "prohibition_reason": { - "description": "Reason why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint).", + "description": "One or more reasons why this use is prohibited (e.g., license restriction, ethical concern, 
privacy risk, legal constraint).", "items": { "type": "string" }, @@ -5937,7 +6265,7 @@ ] }, "raw_data_details": { - "description": "Details on raw data availability and access procedures.\n", + "description": "Free-text description of raw data availability, access procedures, and any conditions or restrictions on accessing the raw data.\n", "items": { "type": "string" }, @@ -5993,7 +6321,7 @@ ] }, "raw_data_format": { - "description": "Format of the raw data before any preprocessing.\n", + "description": "One or more formats of the raw data before any preprocessing (e.g., CSV, DICOM, JSON).\n", "items": { "type": "string" }, @@ -6007,7 +6335,7 @@ "type": "string" }, "source_type": { - "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", + "description": "One or more types of raw source (e.g., sensor, database, user input, web scraping).\n", "items": { "type": "string" }, @@ -6059,7 +6387,7 @@ ] }, "relationship_details": { - "description": "Details on relationships between instances (e.g., graph edges, ratings).\n", + "description": "Free-text description of how relationships between instances are represented (e.g., graph edges, ratings matrices, foreign keys), including relationship types and any associated metadata.\n", "items": { "type": "string" }, @@ -6108,7 +6436,7 @@ ] }, "retention_details": { - "description": "Details on data retention limits and enforcement procedures.\n", + "description": "Free-text description of applicable retention limits, legal or ethical basis for those limits, and how they will be enforced (e.g., automated deletion, anonymization after the retention period).\n", "items": { "type": "string" }, @@ -6158,31 +6486,22 @@ }, "is_random": { "description": "Indicates whether the sample is random.", - "items": { - "type": "boolean" - }, "type": [ - "array", + "boolean", "null" ] }, "is_representative": { "description": "Indicates whether the sample is representative of the larger set.\n", - "items": { 
- "type": "boolean" - }, "type": [ - "array", + "boolean", "null" ] }, "is_sample": { "description": "Indicates whether it is a sample of a larger set.", - "items": { - "type": "boolean" - }, "type": [ - "array", + "boolean", "null" ] }, @@ -6194,7 +6513,7 @@ ] }, "representative_verification": { - "description": "Explanation of how representativeness was validated or verified.\n", + "description": "One or more explanations of how representativeness was validated or verified (e.g., statistical tests, domain expert review).\n", "items": { "type": "string" }, @@ -6204,7 +6523,7 @@ ] }, "source_data": { - "description": "Description of the larger set from which the sample was drawn, if any.\n", + "description": "One or more descriptions of the larger sets from which the sample was drawn, if applicable.\n", "items": { "type": "string" }, @@ -6214,7 +6533,7 @@ ] }, "strategies": { - "description": "Description of the sampling strategy (deterministic, probabilistic, etc.).\n", + "description": "One or more sampling strategies used (e.g., deterministic, simple random, stratified, cluster, systematic).\n", "items": { "type": "string" }, @@ -6234,7 +6553,7 @@ ] }, "why_not_representative": { - "description": "Explanation of why the sample is not representative, if applicable.\n", + "description": "One or more explanations of why the sample is not representative of the larger set, if applicable.\n", "items": { "type": "string" }, @@ -6319,6 +6638,7 @@ "type": "string" }, "license": { + "description": "The license under which the software is distributed (e.g., \"MIT\", \"Apache-2.0\", \"GPL-3.0\").", "type": [ "string", "null" @@ -6332,12 +6652,14 @@ ] }, "url": { + "description": "URL where the software can be found (e.g., homepage, repository, or documentation).", "type": [ "string", "null" ] }, "version": { + "description": "The version identifier of the software (e.g., \"1.0.0\", \"2.3.1-beta\").", "type": [ "string", "null" @@ -6376,7 +6698,7 @@ ] }, "split_details": { - 
"description": "Details on recommended data splits and their rationale.\n", + "description": "Free-text description of the recommended data splits (e.g., 80/10/10 train/validation/test), how they are defined, and the rationale for the split strategy.\n", "items": { "type": "string" }, @@ -6411,6 +6733,7 @@ ] }, "distribution": { + "description": "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup.", "items": { "type": "string" }, @@ -6427,6 +6750,7 @@ ] }, "identification": { + "description": "How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disease status, or other demographic/clinical characteristics).", "items": { "type": "string" }, @@ -6588,7 +6912,7 @@ ] }, "update_details": { - "description": "Details on update plans, responsible parties, and communication methods.\n", + "description": "Free-text description of planned update types (e.g., corrections, additions, deletions), responsible parties, and how updates will be communicated to users.\n", "items": { "type": "string" }, @@ -6613,7 +6937,7 @@ }, "UseRepository": { "additionalProperties": false, - "description": "Is there a repository that links to any or all papers or systems that use the dataset? If so, provide a link or other access point.", + "description": "A repository or registry of known uses of this dataset by third parties.
Documents where the dataset has been applied, enabling discoverability of downstream use cases and impact tracking.", "properties": { "description": { "description": "A human-readable description for this property.", @@ -6637,7 +6961,7 @@ ] }, "repository_details": { - "description": "Details on the repository of known dataset uses.\n", + "description": "Free-text description of the repository of known dataset uses, including how it is maintained and how to contribute new use cases.\n", "items": { "type": "string" }, @@ -6672,7 +6996,7 @@ "description": "Metadata describing an individual variable, field, or column in a dataset. Variables may represent measurements, observations, derived values, or categorical attributes.", "properties": { "categories": { - "description": "The permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning.", + "description": "One or more permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning.", "items": { "type": "string" }, @@ -6852,7 +7176,7 @@ ] }, "latest_version_doi": { - "description": "DOI or URL of the latest dataset version.", + "description": "DOI or URL identifying the latest version of this dataset (e.g., '10.5281/zenodo.1234567' for a DOI or 'https://doi.org/10.5281/zenodo.1234567' for a full URL). 
Use CURIE format for DOIs (e.g., 'doi:10.5281/zenodo.1234567').", "type": [ "string", "null" @@ -6876,7 +7200,7 @@ ] }, "version_details": { - "description": "Details on version support policies and obsolescence communication.\n", + "description": "Free-text description of version support policies, how long older versions will be hosted, and how dataset consumers will be notified when versions become obsolete.\n", "items": { "type": "string" }, @@ -6904,16 +7228,7 @@ "enum": [ "MAJOR", "MINOR", - "PATCH", - "Windows-1250", - "Windows-1251", - "Windows-1252", - "Windows-1253", - "Windows-1254", - "Windows-1255", - "Windows-1256", - "Windows-1257", - "Windows-1258" + "PATCH" ], "title": "VersionTypeEnum", "type": "string" @@ -6927,33 +7242,38 @@ "properties": { "compression": { "$ref": "#/$defs/CompressionEnum", - "description": "compression format used, if any. e.g., gzip, bzip2, zip" + "description": "Compression format used, if any (e.g., gzip, bzip2, zip)." }, "conforms_to": { + "description": "An established standard, specification, or schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_class": { + "description": "The specific class or type within a schema to which the resource conforms.", "type": [ "string", "null" ] }, "conforms_to_schema": { + "description": "The schema or data model to which the resource conforms.", "type": [ "string", "null" ] }, "created_by": { + "description": "The person or organization primarily responsible for creating the resource.", "type": [ "string", "null" ] }, "created_on": { + "description": "The date and time when the resource was created.", "format": "date-time", "type": [ "string", @@ -6968,7 +7288,7 @@ ] }, "doi": { - "description": "digital object identifier", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').", "pattern": "10\\.\\d{4,}\\/.+", "type": [ "string", @@ 
-6987,6 +7307,7 @@ "type": "string" }, "issued": { + "description": "Date of formal issuance or publication of the resource.", "format": "date-time", "type": [ "string", @@ -6994,6 +7315,7 @@ ] }, "keywords": { + "description": "Keywords or tags describing the resource for discovery and classification.", "items": { "type": "string" }, @@ -7003,13 +7325,14 @@ ] }, "language": { - "description": "language in which the information is expressed", + "description": "Language in which the information is expressed.", "type": [ "string", "null" ] }, "last_updated_on": { + "description": "The date and time when the resource was most recently modified or updated.", "format": "date-time", "type": [ "string", @@ -7017,12 +7340,14 @@ ] }, "license": { + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", "type": [ "string", "null" ] }, "modified_by": { + "description": "A person or organization that contributed to modifying or updating the resource.", "type": [ "string", "null" @@ -7036,12 +7361,14 @@ ] }, "page": { + "description": "A landing page or web page providing access to or information about the resource.", "type": [ "string", "null" ] }, "publisher": { + "description": "The organization or entity responsible for making the resource available.", "type": [ "string", "null" @@ -7058,25 +7385,28 @@ ] }, "status": { + "description": "The status of the resource (e.g., draft, published, deprecated).", "type": [ "string", "null" ] }, "title": { - "description": "the official title of the element", + "description": "The official title of the element.", "type": [ "string", "null" ] }, "version": { + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", "type": [ "string", "null" ] }, "was_derived_from": { + "description": "A resource from which this resource was derived, in whole or in part.", "type": [ "string", "null" diff --git a/project/owl/data_sheets_schema.owl.ttl 
b/project/owl/data_sheets_schema.owl.ttl index eee1fd30..06b9daa9 100644 --- a/project/owl/data_sheets_schema.owl.ttl +++ b/project/owl/data_sheets_schema.owl.ttl @@ -15,10 +15,10 @@ data_sheets_schema:DatasetCollection a owl:Class, linkml:ClassDefinition ; rdfs:label "DatasetCollection" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Dataset ; + owl:minCardinality 0 ; owl:onProperty data_sheets_schema:resources ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom data_sheets_schema:Dataset ; owl:onProperty data_sheets_schema:resources ], data_sheets_schema:Information ; skos:altLabel "data resource collection", @@ -35,168 +35,59 @@ data_sheets_schema:FormatDialect a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty data_sheets_schema:header ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:delimiter ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:quote_char ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:delimiter ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:comment_prefix ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:quote_char ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:header ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty data_sheets_schema:double_quote ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:double_quote ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:double_quote ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:delimiter ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:comment_prefix ], - [ a owl:Restriction ; - 
owl:allValuesFrom linkml:String ; owl:onProperty data_sheets_schema:quote_char ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:header ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:comment_prefix ] ; - skos:definition "Additional format information for a file" ; - skos:inScheme data_sheets_schema:base . - - a owl:Class, - linkml:ClassDefinition ; - rdfs:label "DirectCollection" ; - rdfs:subClassOf [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:onProperty data_sheets_schema:double_quote ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:delimiter ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; - skos:definition """Indicates whether the data was collected directly from the individuals in question or obtained via third parties/other sources. -""" ; - skos:inScheme data_sheets_schema:collection . - - a owl:Class, - linkml:ClassDefinition ; - rdfs:label "Relationships" ; - rdfs:subClassOf [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty data_sheets_schema:delimiter ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; - skos:definition """Are relationships between individual instances made explicit (e.g., users' movie ratings, social network links)? -""" ; - skos:inScheme data_sheets_schema:composition . 
- - a owl:Class, - linkml:ClassDefinition ; - rdfs:label "Splits" ; - rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:quote_char ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; - skos:definition """Are there recommended data splits (e.g., training, validation, testing)? If so, how are they defined and why? -""" ; - skos:inScheme data_sheets_schema:composition . - - a owl:Class, - linkml:ClassDefinition ; - rdfs:label "ThirdPartySharing" ; - rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:onProperty data_sheets_schema:header ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty data_sheets_schema:delimiter ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; - skos:definition """Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created? -""" ; - skos:inScheme data_sheets_schema:distribution . - - a owl:Class, - linkml:ClassDefinition ; - rdfs:label "CollectionConsent" ; - rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; - skos:definition """Did the individuals in question consent to the collection and use of their data? If so, how was consent requested and provided, and what language did individuals consent to? -""" ; - skos:inScheme data_sheets_schema:ethics . 
- - a owl:Class, - linkml:ClassDefinition ; - rdfs:label "CollectionNotification" ; - rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty data_sheets_schema:comment_prefix ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; - skos:definition """Were the individuals in question notified about the data collection? If so, please describe (or show with screenshots, etc.) how notice was provided, and reproduce the language of the notification itself if possible. -""" ; - skos:inScheme data_sheets_schema:ethics . - - a owl:Class, - linkml:ClassDefinition ; - rdfs:label "ConsentRevocation" ; - rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty data_sheets_schema:comment_prefix ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; - skos:definition """If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses? If so, please describe. -""" ; - skos:inScheme data_sheets_schema:ethics . + owl:onProperty data_sheets_schema:comment_prefix ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:header ] ; + skos:definition "Additional format information for a file." ; + skos:inScheme data_sheets_schema:base . data_sheets_schema:same_as a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "same_as" ; rdfs:range linkml:Uriorcurie ; - skos:definition "URL of a reference web resource that is the same as this dataset. Used to link to canonical or alternative representations of the same dataset on different platforms (e.g., DOI resolver, institutional repository, data catalog)." ; + skos:definition "One or more URLs or URIs identifying equivalent or related representations of this dataset. 
Used to link to canonical or alternative representations of the same dataset on different platforms (e.g., DOI resolver, institutional repository, data catalog)." ; skos:exactMatch schema1:sameAs ; - skos:inScheme . + skos:inScheme ; + data_sheets_schema:docExample "doi:10.XXXXX/example-dataset" . data_sheets_schema:themes a owl:ObjectProperty, linkml:SlotDefinition ; @@ -209,23 +100,23 @@ data_sheets_schema:DataSubset a owl:Class, linkml:ClassDefinition ; rdfs:label "DataSubset" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; + owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:is_data_split ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:Boolean ; owl:onProperty data_sheets_schema:is_subpopulation ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:is_subpopulation ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:is_data_split ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:is_data_split ], + owl:onProperty data_sheets_schema:is_subpopulation ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; - owl:onProperty data_sheets_schema:is_subpopulation ], + owl:onProperty data_sheets_schema:is_data_split ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:is_data_split ], + owl:onProperty data_sheets_schema:is_subpopulation ], data_sheets_schema:Dataset ; skos:definition "A subset of a dataset, likely containing multiple files of multiple potential purposes and properties." ; skos:inScheme . 
@@ -235,103 +126,103 @@ data_sheets_schema:File a owl:Class, rdfs:label "File" ; rdfs:subClassOf [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:md5 ], + owl:onProperty data_sheets_schema:compression ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:EncodingEnum ; - owl:onProperty data_sheets_schema:encoding ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:hash ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:dialect ], + owl:onProperty data_sheets_schema:encoding ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom data_sheets_schema:CompressionEnum ; owl:onProperty data_sheets_schema:compression ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty data_sheets_schema:bytes ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:md5 ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:encoding ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:sha256 ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:bytes ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:path ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:dialect ], + owl:allValuesFrom data_sheets_schema:FileTypeEnum ; + owl:onProperty data_sheets_schema:file_type ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:encoding ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:MediaTypeEnum ; - owl:onProperty data_sheets_schema:media_type ], + owl:onProperty data_sheets_schema:dialect ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:hash ], + owl:onProperty data_sheets_schema:encoding ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:md5 ], + owl:allValuesFrom linkml:String ; + owl:onProperty 
data_sheets_schema:path ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:file_type ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:hash ], + owl:allValuesFrom data_sheets_schema:EncodingEnum ; + owl:onProperty data_sheets_schema:encoding ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:format ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:media_type ], + owl:onProperty data_sheets_schema:sha256 ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom data_sheets_schema:FormatEnum ; owl:onProperty data_sheets_schema:format ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:media_type ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:format ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:compression ], + owl:onProperty data_sheets_schema:bytes ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:path ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:FileTypeEnum ; - owl:onProperty data_sheets_schema:file_type ], + owl:onProperty data_sheets_schema:sha256 ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:format ], + owl:onProperty data_sheets_schema:media_type ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:dialect ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:file_type ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:CompressionEnum ; - owl:onProperty data_sheets_schema:compression ], + owl:allValuesFrom linkml:Integer ; + owl:onProperty data_sheets_schema:bytes ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:path ], + owl:maxCardinality 1 ; + owl:onProperty 
data_sheets_schema:hash ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:sha256 ], + owl:onProperty data_sheets_schema:bytes ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:FormatEnum ; - owl:onProperty data_sheets_schema:format ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:compression ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:path ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:bytes ], + owl:onProperty data_sheets_schema:media_type ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:sha256 ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:dialect ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:hash ], + owl:onProperty data_sheets_schema:md5 ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:md5 ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:sha256 ], + owl:onProperty data_sheets_schema:hash ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:MediaTypeEnum ; + owl:onProperty data_sheets_schema:media_type ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:file_type ], data_sheets_schema:Information ; skos:altLabel "data file", "file", @@ -345,32 +236,32 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "Software" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:license ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:allValuesFrom linkml:Uri ; owl:onProperty data_sheets_schema:url ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:version ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty 
data_sheets_schema:license ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:version ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:url ], + owl:onProperty data_sheets_schema:license ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:url ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:version ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:version ], + owl:onProperty data_sheets_schema:url ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:license ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:url ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:version ], data_sheets_schema:NamedThing ; skos:definition "A software program or library." ; skos:exactMatch schema1:SoftwareApplication ; @@ -395,29 +286,29 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "CollectionTimeframe" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:Date ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Date ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Date ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], 
data_sheets_schema:DatasetProperty ; skos:definition """Over what timeframe was the data collected, and does this timeframe match the creation timeframe of the underlying data? """ ; @@ -430,68 +321,91 @@ data_sheets_schema:Software a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Who was involved in the data collection (e.g., students, crowdworkers, contractors), and how they were compensated. """ ; skos:inScheme data_sheets_schema:collection . + a owl:Class, + linkml:ClassDefinition ; + rdfs:label "DirectCollection" ; + rdfs:subClassOf [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], + data_sheets_schema:DatasetProperty ; + skos:definition """Indicates whether the data was collected directly from the individuals in question or obtained via third parties/other sources. +""" ; + skos:inScheme data_sheets_schema:collection . 
+ a owl:Class, linkml:ClassDefinition ; rdfs:label "InstanceAcquisition" ; rdfs:subClassOf [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:Boolean ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Describes how data associated with each instance was acquired (e.g., directly observed, reported by subjects, inferred). 
""" ; @@ -501,25 +415,25 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "MissingDataDocumentation" ; rdfs:subClassOf [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Documentation of missing data in the dataset, including patterns, causes, and strategies for handling missing values. @@ -533,33 +447,33 @@ data_sheets_schema:Software a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 1 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 1 ; + owl:onProperty ], + [ a 
owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Description of raw data sources before preprocessing, cleaning, or labeling. Documents where the original data comes from and how it can be accessed. """ ; @@ -570,9 +484,6 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "Confidentiality" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; @@ -580,10 +491,13 @@ data_sheets_schema:Software a owl:Class, owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Does the dataset contain data that might be confidential (e.g., protected by legal privilege, patient data, non-public communications)? """ ; @@ -593,20 +507,20 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "ContentWarning" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Does the dataset contain any data that might be offensive, insulting, threatening, or otherwise anxiety-provoking if viewed directly? 
""" ; @@ -616,10 +530,10 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "DataAnomaly" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Are there any errors, sources of noise, or redundancies in the dataset? @@ -630,11 +544,8 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "DatasetBias" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], @@ -642,26 +553,29 @@ data_sheets_schema:Software a owl:Class, owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom data_sheets_schema:BiasTypeEnum ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:BiasTypeEnum ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Documents known biases present in the dataset. 
Biases are systematic errors or prejudices that may affect the representativeness or fairness of the data. Distinct from anomalies (data quality issues) and limitations (scope constraints). """ ; @@ -672,41 +586,41 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "DatasetLimitation" ; rdfs:subClassOf [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:LimitationTypeEnum ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:LimitationTypeEnum ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Documents known limitations of the dataset that may affect its use or interpretation. Distinct from biases (systematic errors) and anomalies (data quality issues). 
""" ; @@ -718,32 +632,32 @@ data_sheets_schema:Software a owl:Class, rdfs:label "DatasetRelationship" ; rdfs:subClassOf [ a owl:Restriction ; owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:DatasetRelationshipTypeEnum ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 1 ; - owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 1 ; owl:onProperty ], - data_sheets_schema:DatasetProperty ; + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:DatasetRelationshipTypeEnum ; + owl:onProperty ], + data_sheets_schema:DatasetProperty ; skos:definition """Typed relationship to another dataset, enabling precise specification of how datasets relate to each other (e.g., supplements, derives from, is version of). Supports RO-Crate-style dataset interlinking. """ ; skos:inScheme data_sheets_schema:composition . 
@@ -753,34 +667,34 @@ data_sheets_schema:Software a owl:Class, rdfs:label "Deidentification" ; rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Is it possible to identify individuals in the dataset, either directly or indirectly (in combination with other data)? 
""" ; @@ -790,71 +704,71 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "Instance" ; rdfs:subClassOf [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty ], + owl:allValuesFrom linkml:Integer ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Uriorcurie ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Uriorcurie ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty ], + owl:allValuesFrom linkml:Uriorcurie ; + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + 
owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? """ ; @@ -866,34 +780,48 @@ data_sheets_schema:Software a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Is any information missing from individual instances? (e.g., unavailable data) """ ; skos:inScheme data_sheets_schema:composition . + a owl:Class, + linkml:ClassDefinition ; + rdfs:label "Relationships" ; + rdfs:subClassOf [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + data_sheets_schema:DatasetProperty ; + skos:definition """Are relationships between individual instances made explicit (e.g., users' movie ratings, social network links)? +""" ; + skos:inScheme data_sheets_schema:composition . 
+ a owl:Class, linkml:ClassDefinition ; rdfs:label "SensitiveElement" ; rdfs:subClassOf [ a owl:Restriction ; + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], + [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; @@ -904,30 +832,44 @@ data_sheets_schema:Software a owl:Class, skos:exactMatch ; skos:inScheme data_sheets_schema:composition . + a owl:Class, + linkml:ClassDefinition ; + rdfs:label "Splits" ; + rdfs:subClassOf [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + data_sheets_schema:DatasetProperty ; + skos:definition """Are there recommended data splits (e.g., training, validation, testing)? If so, how are they defined and why? +""" ; + skos:inScheme data_sheets_schema:composition . + a owl:Class, linkml:ClassDefinition ; rdfs:label "Subpopulation" ; rdfs:subClassOf [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Does the dataset identify any subpopulations (e.g., by age, gender)? 
If so, how are they identified and what are their distributions? """ ; @@ -939,20 +881,26 @@ data_sheets_schema:Software a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:ComplianceStatusEnum ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom data_sheets_schema:Person ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; @@ -962,18 +910,12 @@ data_sheets_schema:Software a owl:Class, owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:ComplianceStatusEnum ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Do any export controls or other regulatory restrictions apply to the dataset or to individual instances? Includes compliance tracking for regulations like HIPAA and other US regulations. If so, please describe these restrictions and provide a link or copy of any supporting documentation. Maps to DUO terms related to ethics approval, geographic restrictions, and institutional requirements. 
@@ -984,10 +926,10 @@ data_sheets_schema:Software a owl:Class,
         linkml:ClassDefinition ;
     rdfs:label "IPRestrictions" ;
     rdfs:subClassOf [ a owl:Restriction ;
-            owl:minCardinality 0 ;
+            owl:allValuesFrom linkml:String ;
             owl:onProperty ],
         [ a owl:Restriction ;
-            owl:allValuesFrom linkml:String ;
+            owl:minCardinality 0 ;
             owl:onProperty ],
         data_sheets_schema:DatasetProperty ;
     skos:definition """Have any third parties imposed IP-based or other restrictions on the data associated with the instances? If so, describe them and note any relevant fees or licensing terms. Maps to DUO terms related to commercial/non-profit use restrictions (NCU, NPU, NPUNCU).
@@ -998,25 +940,25 @@ data_sheets_schema:Software a owl:Class,
         linkml:ClassDefinition ;
     rdfs:label "LicenseAndUseTerms" ;
     rdfs:subClassOf [ a owl:Restriction ;
+            owl:allValuesFrom linkml:String ;
+            owl:onProperty ],
+        [ a owl:Restriction ;
+            owl:minCardinality 0 ;
+            owl:onProperty ],
+        [ a owl:Restriction ;
             owl:allValuesFrom data_sheets_schema:Person ;
             owl:onProperty ],
         [ a owl:Restriction ;
             owl:maxCardinality 1 ;
             owl:onProperty ],
         [ a owl:Restriction ;
-            owl:allValuesFrom linkml:String ;
-            owl:onProperty ],
-        [ a owl:Restriction ;
-            owl:minCardinality 0 ;
+            owl:allValuesFrom data_sheets_schema:DataUsePermissionEnum ;
             owl:onProperty ],
         [ a owl:Restriction ;
             owl:minCardinality 0 ;
             owl:onProperty ],
         [ a owl:Restriction ;
             owl:minCardinality 0 ;
-            owl:onProperty ],
-        [ a owl:Restriction ;
-            owl:allValuesFrom data_sheets_schema:DataUsePermissionEnum ;
             owl:onProperty ],
         data_sheets_schema:DatasetProperty ;
     skos:definition """Will the dataset be distributed under a copyright or other IP license, and/or under applicable terms of use? Provide a link or copy of relevant licensing terms and any fees.
@@ -1041,16 +983,75 @@ data_sheets_schema:Software a owl:Class,
         linkml:ClassDefinition ;
     rdfs:label "DistributionFormat" ;
     rdfs:subClassOf [ a owl:Restriction ;
-            owl:minCardinality 0 ;
+            owl:allValuesFrom linkml:Uri ;
             owl:onProperty ],
         [ a owl:Restriction ;
-            owl:allValuesFrom linkml:String ;
+            owl:minCardinality 0 ;
             owl:onProperty ],
         data_sheets_schema:DatasetProperty ;
     skos:definition """How will the dataset be distributed (e.g., tarball on a website, API, GitHub)?
 """ ;
     skos:inScheme data_sheets_schema:distribution .
 
+ a owl:Class,
+        linkml:ClassDefinition ;
+    rdfs:label "ThirdPartySharing" ;
+    rdfs:subClassOf [ a owl:Restriction ;
+            owl:maxCardinality 1 ;
+            owl:onProperty ],
+        [ a owl:Restriction ;
+            owl:minCardinality 0 ;
+            owl:onProperty ],
+        [ a owl:Restriction ;
+            owl:allValuesFrom linkml:Boolean ;
+            owl:onProperty ],
+        data_sheets_schema:DatasetProperty ;
+    skos:definition """Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?
+""" ;
+    skos:inScheme data_sheets_schema:distribution .
+
+ a owl:Class,
+        linkml:ClassDefinition ;
+    rdfs:label "CollectionConsent" ;
+    rdfs:subClassOf [ a owl:Restriction ;
+            owl:allValuesFrom linkml:String ;
+            owl:onProperty ],
+        [ a owl:Restriction ;
+            owl:minCardinality 0 ;
+            owl:onProperty ],
+        data_sheets_schema:DatasetProperty ;
+    skos:definition """Did the individuals in question consent to the collection and use of their data? If so, how was consent requested and provided, and what language did individuals consent to?
+""" ;
+    skos:inScheme data_sheets_schema:ethics .
+ + a owl:Class, + linkml:ClassDefinition ; + rdfs:label "CollectionNotification" ; + rdfs:subClassOf [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + data_sheets_schema:DatasetProperty ; + skos:definition """Were the individuals in question notified about the data collection? If so, please describe (or show with screenshots, etc.) how notice was provided, and reproduce the language of the notification itself if possible. +""" ; + skos:inScheme data_sheets_schema:ethics . + + a owl:Class, + linkml:ClassDefinition ; + rdfs:label "ConsentRevocation" ; + rdfs:subClassOf [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], + data_sheets_schema:DatasetProperty ; + skos:definition """If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses? If so, please describe. +""" ; + skos:inScheme data_sheets_schema:ethics . 
+ a owl:Class, linkml:ClassDefinition ; rdfs:label "DataProtectionImpact" ; @@ -1069,29 +1070,29 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "EthicalReview" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom data_sheets_schema:Organization ; owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Person ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Organization ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:allValuesFrom data_sheets_schema:Person ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Were any ethical or compliance review processes conducted (e.g., by an institutional review board)? If so, please provide a description of these review processes, including the frequency of review and documentation of outcomes, as well as a link or other access point to any supporting documentation. 
""" ; @@ -1101,32 +1102,32 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "AtRiskPopulations" ; rdfs:subClassOf [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Information about protections for at-risk populations in human subjects research. 
""" ; @@ -1138,27 +1139,27 @@ data_sheets_schema:Software a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], @@ -1171,38 +1172,38 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "HumanSubjectResearch" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a 
owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Information about whether the dataset involves human subjects research and what regulatory or ethical review processes were followed. """ ; @@ -1212,38 +1213,38 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "InformedConsent" ; rdfs:subClassOf [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Details about informed consent procedures used in human subjects research. 
""" ; @@ -1253,29 +1254,29 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "ParticipantPrivacy" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Information about privacy protections and anonymization procedures for human research participants. """ ; @@ -1288,16 +1289,16 @@ data_sheets_schema:Software a owl:Class, owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Uri ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:Uri ; + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Is there an erratum? If so, please provide a link or other access point. 
@@ -1308,20 +1309,20 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "ExtensionMechanism" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:Uri ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Uri ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please describe how those contributions are validated and communicated. """ ; @@ -1331,11 +1332,11 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "Maintainer" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], @@ -1355,10 +1356,10 @@ data_sheets_schema:Software a owl:Class, rdfs:label "RetentionLimits" ; rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], @@ -1366,7 +1367,7 @@ data_sheets_schema:Software a owl:Class, owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """If the dataset relates to people, are 
there applicable limits on the retention of their data (e.g., were individuals told their data would be deleted after a certain time)? If so, please describe these limits and how they will be enforced. @@ -1378,16 +1379,16 @@ data_sheets_schema:Software a owl:Class, rdfs:label "UpdatePlan" ; rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], @@ -1404,23 +1405,23 @@ data_sheets_schema:Software a owl:Class, owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:Uriorcurie ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Will older versions of the dataset continue to be supported/hosted/maintained? If so, how? If not, how will obsolescence be communicated to dataset consumers? 
""" ; @@ -1430,10 +1431,10 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "AddressingGap" ; rdfs:subClassOf [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; @@ -1448,21 +1449,21 @@ data_sheets_schema:Software a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:Person ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:CRediTRoleEnum ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Person ; - owl:onProperty ], + owl:allValuesFrom data_sheets_schema:CRediTRoleEnum ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom data_sheets_schema:Organization ; owl:onProperty ], @@ -1479,16 +1480,16 @@ data_sheets_schema:Software a owl:Class, owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Who funded the creation of the dataset? If there is an associated grant, please provide the name of the grantor and the grant name and number. 
""" ; @@ -1498,14 +1499,14 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "Grant" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], data_sheets_schema:NamedThing ; skos:definition """The name and/or identifier of the specific mechanism providing monetary support or other resources supporting creation of the dataset. """ ; @@ -1523,13 +1524,13 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "Purpose" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition "For what purpose was the dataset created?" ; @@ -1542,10 +1543,10 @@ data_sheets_schema:Software a owl:Class, owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition "Was there a specific task in mind for the dataset's application?" 
; @@ -1555,35 +1556,29 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "AnnotationAnalysis" ; rdfs:subClassOf [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Float ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], @@ -1591,9 +1586,15 @@ data_sheets_schema:Software a owl:Class, owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - data_sheets_schema:DatasetProperty ; + owl:allValuesFrom linkml:Float ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], + data_sheets_schema:DatasetProperty ; skos:definition """Analysis of annotation quality, inter-annotator agreement metrics, and systematic patterns in annotation disagreements. 
""" ; skos:exactMatch ; @@ -1618,31 +1619,31 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "ImputationProtocol" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Description of data imputation methodology, including techniques used to handle missing values and rationale for chosen approaches. 
@@ -1654,50 +1655,47 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "LabelingStrategy" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:Integer ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Was any labeling of the data done (e.g., part-of-speech tagging)? This class documents the annotation process and quality metrics. 
""" ; @@ -1708,22 +1706,22 @@ data_sheets_schema:Software a owl:Class, rdfs:label "MachineAnnotationTools" ; rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Automated or machine-learning-based annotation tools used in dataset creation, including NLP pipelines, computer vision models, or other automated labeling systems. """ ; @@ -1756,10 +1754,10 @@ data_sheets_schema:Software a owl:Class, owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], @@ -1786,10 +1784,10 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "ExistingUse" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Has the dataset been used for any tasks already? 
@@ -1800,10 +1798,10 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "FutureUseImpact" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Is there anything about the dataset's composition or collection that might impact future uses or create risks/harm (e.g., unfair treatment, legal or financial risks)? If so, describe these impacts and any mitigation strategies. @@ -1815,14 +1813,11 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "IntendedUse" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], @@ -1834,7 +1829,10 @@ data_sheets_schema:Software a owl:Class, owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Explicit statement of intended uses for this dataset. Complements FutureUseImpact by focusing on positive, recommended applications rather than risks. Aligns with RO-Crate "Intended Use" field. 
""" ; @@ -1859,10 +1857,10 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "ProhibitedUse" ; rdfs:subClassOf [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Explicit statement of prohibited or forbidden uses for this dataset. Stronger than DiscouragedUse - these are uses that are explicitly not permitted by license, ethics, or policy. Aligns with RO-Crate "Prohibited Uses" field. @@ -1873,23 +1871,22 @@ data_sheets_schema:Software a owl:Class, linkml:ClassDefinition ; rdfs:label "UseRepository" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Uri ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:Uri ; owl:onProperty ], data_sheets_schema:DatasetProperty ; - skos:definition """Is there a repository that links to any or all papers or systems that use the dataset? If so, provide a link or other access point. -""" ; + skos:definition "A repository or registry of known uses of this dataset by third parties. Documents where the dataset has been applied, enabling discoverability of downstream use cases and impact tracking." ; skos:inScheme data_sheets_schema:uses . 
a owl:Class, @@ -1897,118 +1894,118 @@ data_sheets_schema:Software a owl:Class, rdfs:label "VariableMetadata" ; rdfs:subClassOf [ a owl:Restriction ; owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:minCardinality 1 ; owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:Float ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:allValuesFrom linkml:Uriorcurie ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - 
owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:Float ; owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom linkml:Integer ; owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Uriorcurie ; - owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Float ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom data_sheets_schema:VariableTypeEnum ; owl:onProperty ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty ], + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + 
owl:minCardinality 0 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition "Metadata describing an individual variable, field, or column in a dataset. Variables may represent measurements, observations, derived values, or categorical attributes." ; skos:exactMatch schema1:PropertyValue ; @@ -2619,31 +2616,70 @@ data_sheets_schema:ConfigurationFile a owl:Class, rdfs:label "UTF-8" ; rdfs:subClassOf data_sheets_schema:EncodingEnum . -data_sheets_schema:FileCollection a owl:Class, - linkml:ClassDefinition ; - rdfs:label "FileCollection" ; - rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:external_resources ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:path ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty data_sheets_schema:total_bytes ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:collection_type ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty data_sheets_schema:file_count ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:file_count ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:external_resources ], - [ a owl:Restriction ; + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1250" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1251" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1252" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1253" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . 
+ + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1254" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1255" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1256" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1257" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + + a owl:Class, + data_sheets_schema:EncodingEnum ; + rdfs:label "Windows-1258" ; + rdfs:subClassOf data_sheets_schema:EncodingEnum . + +data_sheets_schema:FileCollection a owl:Class, + linkml:ClassDefinition ; + rdfs:label "FileCollection" ; + rdfs:subClassOf [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:file_count ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Integer ; + owl:onProperty data_sheets_schema:total_bytes ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:external_resources ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:CompressionEnum ; + owl:onProperty data_sheets_schema:compression ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:collection_type ], + [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:resources ], [ a owl:Restriction ; @@ -2651,31 +2687,37 @@ data_sheets_schema:FileCollection a owl:Class, owl:onProperty data_sheets_schema:resources ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:path ], + owl:onProperty data_sheets_schema:file_count ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:Integer ; owl:onProperty data_sheets_schema:file_count ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:compression ], + owl:onProperty data_sheets_schema:path ], [ a 
owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:total_bytes ], + owl:onProperty data_sheets_schema:compression ], [ a owl:Restriction ; owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:path ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty data_sheets_schema:total_bytes ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:path ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:external_resources ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:compression ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:total_bytes ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:path ], [ a owl:Restriction ; owl:allValuesFrom data_sheets_schema:FileCollectionTypeEnum ; owl:onProperty data_sheets_schema:collection_type ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:CompressionEnum ; - owl:onProperty data_sheets_schema:compression ], data_sheets_schema:Information ; skos:altLabel "data files", "file collection", @@ -3026,59 +3068,16 @@ data_sheets_schema:ValidationSplit a owl:Class, rdfs:label "PATCH" ; rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1250" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1251" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1252" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1253" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1254" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . 
- - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1255" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1256" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1257" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - - a owl:Class, - data_sheets_schema:VersionTypeEnum ; - rdfs:label "Windows-1258" ; - rdfs:subClassOf data_sheets_schema:VersionTypeEnum . - data_sheets_schema:acquisition_methods a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "acquisition_methods" ; + skos:definition "Methods used to acquire or obtain dataset instances. List of InstanceAcquisition objects from the Collection module describing how data was sourced, whether directly observed or derived." ; skos:inScheme . data_sheets_schema:addressing_gaps a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "addressing_gaps" ; + skos:definition "Research or practical gaps this dataset addresses. List of AddressingGap objects from the Motivation module, each describing a gap in existing datasets or knowledge that this dataset fills." ; skos:inScheme . data_sheets_schema:affiliation a owl:ObjectProperty, @@ -3096,38 +3095,40 @@ data_sheets_schema:annotation_analyses a owl:ObjectProperty, data_sheets_schema:anomalies a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "anomalies" ; + skos:definition "Known data quality issues, errors, or irregularities in the dataset. List of DataAnomaly objects from the Composition module, each documenting a specific anomaly and its potential impact." ; skos:inScheme . data_sheets_schema:cleaning_strategies a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "cleaning_strategies" ; + skos:definition "Data cleaning and quality control procedures applied to the dataset. 
List of CleaningStrategy objects from the Preprocessing module describing outlier removal, deduplication, and error correction steps." ; skos:inScheme . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "acquisition_details" ; - skos:definition """Details on how data was acquired for each instance. + skos:definition """Free-text description of how data was acquired for each instance, including instruments, protocols, and any manual steps involved. """ ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "collection_details" ; - skos:definition """Details on direct vs. indirect collection methods and sources. + skos:definition """Free-text description of whether data was collected directly from individuals or obtained via third parties or other indirect sources, and what those sources are. """ ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "collector_details" ; - skos:definition """Details on who collected the data and their compensation. + skos:definition """Free-text description of who was involved in data collection (e.g., students, crowdworkers, contractors), their training or qualifications, and how they were compensated. """ ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "mechanism_details" ; - skos:definition """Details on mechanisms or procedures used to collect the data. + skos:definition """Free-text description of the specific mechanisms or procedures used to collect the data (e.g., hardware model, software API, manual curation process), including how those mechanisms were validated. """ ; skos:inScheme data_sheets_schema:collection . @@ -3148,32 +3149,47 @@ data_sheets_schema:cleaning_strategies a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "raw_data_format" ; - skos:definition """Format of the raw data before any preprocessing. 
+ skos:definition """One or more formats of the raw data before any preprocessing (e.g., CSV, DICOM, JSON). """ ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "source_type" ; - skos:definition """Type of raw source (sensor, database, user input, web scraping, etc.). + skos:broadMatch dcterms:type ; + skos:definition """One or more types of raw source (e.g., sensor, database, user input, web scraping). """ ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "timeframe_details" ; - skos:definition """Details on the collection timeframe and relationship to data creation dates. + skos:definition """Free-text description of the data collection period and whether this timeframe matches the creation timeframe of the underlying data (e.g., historical records, prospective collection). """ ; skos:inScheme data_sheets_schema:collection . +data_sheets_schema:collection_consents a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "collection_consents" ; + skos:definition "Consent obtained from individuals for data collection and use. List of CollectionConsent objects from the Ethics module describing how consent was requested, provided, and documented." ; + skos:inScheme . + data_sheets_schema:collection_mechanisms a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "collection_mechanisms" ; + skos:definition "Mechanisms, instruments, or tools used for data collection. List of CollectionMechanism objects from the Collection module describing sensors, surveys, APIs, or other collection instruments." ; + skos:inScheme . + +data_sheets_schema:collection_notifications a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "collection_notifications" ; + skos:definition "Notifications provided to individuals about data collection. 
List of CollectionNotification objects from the Ethics module describing how and when individuals were informed about the data collection." ; skos:inScheme . data_sheets_schema:collection_timeframes a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "collection_timeframes" ; + skos:definition "Time periods during which data was collected. List of CollectionTimeframe objects from the Collection module describing collection start and end dates, and any gaps in the collection period." ; skos:inScheme . data_sheets_schema:collection_type a owl:ObjectProperty, @@ -3187,28 +3203,31 @@ data_sheets_schema:collection_type a owl:ObjectProperty, rdfs:label "ExternalResource" ; rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:external_resources ], + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:minCardinality 0 ; owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:onProperty data_sheets_schema:external_resources ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:external_resources ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], data_sheets_schema:DatasetProperty ; skos:definition """Is the dataset self-contained or does it rely on external resources (e.g., websites, other datasets)? If external, are there guarantees that those resources will remain available and unchanged? 
""" ; @@ -3218,14 +3237,23 @@ data_sheets_schema:collection_type a owl:ObjectProperty, linkml:ClassDefinition ; rdfs:label "SamplingStrategy" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Boolean ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty ], @@ -3234,28 +3262,28 @@ data_sheets_schema:collection_type a owl:ObjectProperty, owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty ], + owl:maxCardinality 1 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:allValuesFrom linkml:String ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty ], + owl:onProperty ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty ], + owl:onProperty ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty ], @@ -3267,28 +3295,22 @@ data_sheets_schema:collection_type a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "affected_subsets" ; - skos:definition """Specific subsets or features of the dataset 
affected by this bias. + skos:definition """One or more specific subsets or features of the dataset affected by this bias (e.g., "female participants", "non-English text", "images taken at night"). """ ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "anomaly_details" ; - skos:definition """Details on errors, noise sources, or redundancies in the dataset. -""" ; - skos:inScheme data_sheets_schema:composition . - - a owl:ObjectProperty, - linkml:SlotDefinition ; - rdfs:label "archival" ; - skos:definition """Indication whether official archival versions of external resources are included. + skos:broadMatch dcterms:description ; + skos:definition """Free-text description of errors, noise sources, or redundancies in the dataset, including their known causes and estimated prevalence. """ ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "confidentiality_details" ; - skos:definition """Details on confidential data elements and handling procedures. + skos:definition """Free-text description of which data elements are confidential, the basis for confidentiality (e.g., legal privilege, patient data), and how they are handled or restricted. """ ; skos:inScheme data_sheets_schema:composition . @@ -3302,11 +3324,14 @@ data_sheets_schema:collection_type a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "distribution" ; + skos:broadMatch dcterms:description ; + skos:definition "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup." ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "future_guarantees" ; + skos:broadMatch dcterms:description ; skos:definition """Explanation of any commitments that external resources will remain available and stable over time. """ ; skos:inScheme data_sheets_schema:composition . 
@@ -3314,36 +3339,20 @@ data_sheets_schema:collection_type a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "identification" ; + skos:broadMatch dcterms:description ; + skos:definition "How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disease status, or other demographic/clinical characteristics)." ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "identifiers_removed" ; - skos:definition "List of identifier types removed during de-identification." ; - skos:inScheme data_sheets_schema:composition . - - a owl:ObjectProperty, - linkml:SlotDefinition ; - rdfs:label "is_random" ; - skos:definition "Indicates whether the sample is random." ; - skos:inScheme data_sheets_schema:composition . - - a owl:ObjectProperty, - linkml:SlotDefinition ; - rdfs:label "is_representative" ; - skos:definition """Indicates whether the sample is representative of the larger set. -""" ; - skos:inScheme data_sheets_schema:composition . - - a owl:ObjectProperty, - linkml:SlotDefinition ; - rdfs:label "is_sample" ; - skos:definition "Indicates whether it is a sample of a larger set." ; + skos:definition "List of identifier types removed during de-identification (e.g., 'name', 'date of birth', 'SSN', 'email address', 'geographic subdivision')." ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "missing" ; + skos:broadMatch dcterms:description ; skos:definition """Description of the missing data fields or elements. """ ; skos:inScheme data_sheets_schema:composition . @@ -3358,21 +3367,23 @@ data_sheets_schema:collection_type a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "relationship_details" ; - skos:definition """Details on relationships between instances (e.g., graph edges, ratings). 
+ skos:definition """Free-text description of how relationships between instances are represented (e.g., graph edges, ratings matrices, foreign keys), including relationship types and any associated metadata. """ ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "representative_verification" ; - skos:definition """Explanation of how representativeness was validated or verified. + skos:broadMatch schema1:description ; + skos:definition """One or more explanations of how representativeness was validated or verified (e.g., statistical tests, domain expert review). """ ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "restrictions" ; - skos:definition """Description of any restrictions or fees associated with external resources. + skos:broadMatch dcterms:accessRights ; + skos:definition """One or more descriptions of restrictions or fees associated with accessing these external resources (e.g., paywalls, registration requirements, API limits). """ ; skos:inScheme data_sheets_schema:composition . @@ -3393,32 +3404,34 @@ data_sheets_schema:collection_type a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "source_data" ; - skos:definition """Description of the larger set from which the sample was drawn, if any. + skos:definition """One or more descriptions of the larger sets from which the sample was drawn, if applicable. """ ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "split_details" ; - skos:definition """Details on recommended data splits and their rationale. + skos:definition """Free-text description of the recommended data splits (e.g., 80/10/10 train/ validation/test), how they are defined, and the rationale for the split strategy. """ ; skos:inScheme data_sheets_schema:composition . 
a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "strategies" ; - skos:definition """Description of the sampling strategy (deterministic, probabilistic, etc.). + skos:definition """One or more sampling strategies used (e.g., deterministic, simple random, stratified, cluster, systematic). """ ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "warnings" ; + skos:definition "One or more specific content warnings describing potentially offensive, insulting, threatening, or anxiety-provoking content present in the dataset (e.g., violence, profanity, explicit imagery, hate speech)." ; skos:inScheme data_sheets_schema:composition . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "why_missing" ; + skos:broadMatch dcterms:description ; skos:definition """Explanation of why each piece of data is missing. """ ; skos:inScheme data_sheets_schema:composition . @@ -3426,37 +3439,47 @@ data_sheets_schema:collection_type a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "why_not_representative" ; - skos:definition """Explanation of why the sample is not representative, if applicable. + skos:definition """One or more explanations of why the sample is not representative of the larger set, if applicable. """ ; skos:inScheme data_sheets_schema:composition . data_sheets_schema:confidential_elements a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "confidential_elements" ; + skos:definition "Confidential or restricted information within the dataset that requires access controls. List of Confidentiality objects describing what is confidential and why it cannot be released." ; + skos:inScheme . + +data_sheets_schema:consent_revocations a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "consent_revocations" ; + skos:definition "Mechanisms for individuals to revoke previously given consent. 
List of ConsentRevocation objects from the Ethics module describing how revocation works and what happens to data after revocation." ; skos:inScheme . data_sheets_schema:content_warnings a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "content_warnings" ; + skos:definition "Content warnings for potentially harmful, offensive, or disturbing material in the dataset. List of ContentWarning objects alerting users to sensitive content categories." ; skos:inScheme . data_sheets_schema:creators a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "creators" ; + skos:definition "Individuals or organizations who created the dataset. List of Creator objects describing authorship, roles, and affiliations of dataset creators." ; skos:inScheme . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "data_use_permission" ; - skos:definition "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO" ; + skos:definition "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO." ; skos:exactMatch ; skos:inScheme data_sheets_schema:data-governance . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "license_terms" ; - skos:definition """Description of the dataset's license and terms of use (including links, costs, or usage constraints). 
-""" ; + skos:broadMatch dcterms:license, + dcterms:rights ; + skos:definition "Description of the dataset's license and terms of use, including links, costs, or usage constraints (e.g., 'CC BY 4.0', 'Apache 2.0', 'MIT', 'CC BY-NC-SA 4.0', 'proprietary - contact data@example.org for access')." ; skos:inScheme data_sheets_schema:data-governance . a owl:ObjectProperty, @@ -3471,7 +3494,7 @@ data_sheets_schema:creators a owl:ObjectProperty, skos:broadMatch , , ; - skos:definition "Export or regulatory restrictions on the dataset." ; + skos:definition "One or more export controls or regulatory restrictions applicable to the dataset (e.g., HIPAA, ITAR, GDPR)." ; skos:inScheme data_sheets_schema:data-governance . a owl:ObjectProperty, @@ -3479,95 +3502,110 @@ data_sheets_schema:creators a owl:ObjectProperty, rdfs:label "restrictions" ; skos:broadMatch , ; - skos:definition "Explanation of third-party IP restrictions." ; + skos:definition "One or more explanations of third-party IP restrictions or associated fees." ; skos:inScheme data_sheets_schema:data-governance . data_sheets_schema:data_collectors a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "data_collectors" ; + skos:definition "Individuals or organizations responsible for collecting the data. List of DataCollector objects from the Collection module describing who performed data collection and their roles." ; skos:inScheme . data_sheets_schema:data_protection_impacts a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "data_protection_impacts" ; + skos:definition "Data protection impact assessments (DPIAs) conducted for the dataset. List of DataProtectionImpact objects from the Ethics module documenting privacy risk assessments and mitigation measures." ; + skos:inScheme . + +data_sheets_schema:direct_collection a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "direct_collection" ; + skos:definition "Whether data was collected directly from individuals or via third parties. 
List of DirectCollection objects from the Collection module describing direct vs. indirect collection methods and sources." ; skos:inScheme . data_sheets_schema:discouraged_uses a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "discouraged_uses" ; + skos:definition "Uses that are not recommended for this dataset due to limitations, risks, or ethical concerns. List of DiscouragedUse objects from the Uses module explaining why certain applications should be avoided." ; skos:inScheme . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "access_urls" ; - skos:definition "Details of the distribution channel(s) or format(s)." ; - skos:inScheme data_sheets_schema:distribution . + skos:definition "One or more URLs providing access to the distribution channel(s) or format(s)." ; + skos:inScheme data_sheets_schema:distribution ; + data_sheets_schema:docExample "https://example.org/dataset/download" . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "release_dates" ; - skos:definition """Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled releases. + skos:definition """One or more dates or timeframes for dataset release, in ISO 8601 format (e.g., "2024-03-15") or as a descriptive string (e.g., "Q2 2024"). Use multiple values for staged or scheduled releases. """ ; skos:inScheme data_sheets_schema:distribution . data_sheets_schema:distribution_dates a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "distribution_dates" ; + skos:definition "Dates when the dataset was or will be distributed or released. List of DistributionDate objects from the Distribution module describing initial release dates, version release dates, and planned future releases." ; skos:inScheme . data_sheets_schema:distribution_formats a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "distribution_formats" ; + skos:definition "Formats in which the dataset is distributed or made available. 
List of DistributionFormat objects from the Distribution module describing file formats, compression, and access methods." ; skos:inScheme . data_sheets_schema:errata a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "errata" ; + skos:definition "Known errors or corrections to the dataset since publication. List of Erratum objects from the Maintenance module describing discovered errors, affected records, and correction procedures." ; skos:inScheme . data_sheets_schema:ethical_reviews a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "ethical_reviews" ; + skos:definition "Ethical reviews and institutional oversight for the dataset. List of EthicalReview objects from the Ethics module describing IRB approvals, ethics committee reviews, and compliance certifications." ; skos:inScheme . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "consent_details" ; - skos:definition """Details on how consent was requested, provided, and documented. + skos:definition """Free-text description of how consent was requested (e.g., opt-in form, verbal agreement), provided, and documented, including the language individuals consented to. """ ; skos:inScheme data_sheets_schema:ethics . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "impact_details" ; - skos:definition """Details on data protection impact analysis, outcomes, and documentation. + skos:definition """Free-text description of the data protection impact analysis, including methodology, privacy risks identified, mitigation measures taken, and any regulatory findings. """ ; skos:inScheme data_sheets_schema:ethics . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "notification_details" ; - skos:definition """Details on how individuals were notified about data collection. 
+ skos:definition """Free-text description of how individuals were notified about data collection, including the notification method (e.g., email, poster, in-person), timing, and the language or text of the notification itself if available. """ ; skos:inScheme data_sheets_schema:ethics . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "review_details" ; - skos:definition """Details on ethical review processes, outcomes, and supporting documentation. + skos:definition """Free-text description of the ethical review process, board decisions, outcomes, and any supporting documentation (e.g., IRB approval number, ethics committee name). """ ; skos:inScheme data_sheets_schema:ethics . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "revocation_details" ; - skos:definition """Details on consent revocation mechanisms and procedures. + skos:definition """Free-text description of the mechanism provided for individuals to revoke consent (e.g., opt-out portal, written request), the scope of revocation (full withdrawal or specific uses), and what happens to their data after revocation. """ ; skos:inScheme data_sheets_schema:ethics . data_sheets_schema:existing_uses a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "existing_uses" ; + skos:definition "Known existing uses of the dataset at the time of publication. List of ExistingUse objects from the Uses module describing research, commercial, or other applications of the dataset." ; skos:inScheme . data_sheets_schema:file_collections a owl:ObjectProperty, @@ -3580,11 +3618,13 @@ data_sheets_schema:file_collections a owl:ObjectProperty, data_sheets_schema:funders a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "funders" ; + skos:definition "Funding mechanisms that supported dataset creation. List of FundingMechanism objects describing grants, contracts, or other funding sources including grantors and grant identifiers." ; skos:inScheme . 
data_sheets_schema:future_use_impacts a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "future_use_impacts" ; + skos:definition "Anticipated impacts of future uses, including risks and benefits. List of FutureUseImpact objects from the Uses module describing foreseeable consequences of using this dataset in new applications." ; skos:inScheme . a owl:ObjectProperty, @@ -3716,7 +3756,7 @@ data_sheets_schema:future_use_impacts a owl:ObjectProperty, data_sheets_schema:imputation_protocols a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "imputation_protocols" ; - skos:definition "Data imputation methodology and techniques." ; + skos:definition "Data imputation protocols applied to handle missing values. List of ImputationProtocol objects from the Preprocessing module describing the imputation technique, affected variables, and rationale." ; skos:inScheme . data_sheets_schema:informed_consent a owl:ObjectProperty, @@ -3728,6 +3768,7 @@ data_sheets_schema:informed_consent a owl:ObjectProperty, data_sheets_schema:instances a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "instances" ; + skos:definition "Individual data instances or records in the dataset. List of Instance objects from the Composition module describing what each data point represents, its type, and associated label information." ; skos:inScheme . data_sheets_schema:intended_uses a owl:ObjectProperty, @@ -3739,6 +3780,7 @@ data_sheets_schema:intended_uses a owl:ObjectProperty, data_sheets_schema:keywords a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "keywords" ; + skos:definition "Keywords or tags describing the resource for discovery and classification." ; skos:inScheme data_sheets_schema:base . 
data_sheets_schema:known_biases a owl:ObjectProperty, @@ -3756,6 +3798,7 @@ data_sheets_schema:known_limitations a owl:ObjectProperty, data_sheets_schema:labeling_strategies a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "labeling_strategies" ; + skos:definition "Labeling or annotation methodologies applied to the data. List of LabelingStrategy objects from the Preprocessing module describing annotation procedures, annotator qualifications, and quality controls." ; skos:inScheme . data_sheets_schema:machine_annotation_tools a owl:ObjectProperty, @@ -3767,47 +3810,48 @@ data_sheets_schema:machine_annotation_tools a owl:ObjectProperty, data_sheets_schema:maintainers a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "maintainers" ; + skos:definition "Individuals or organizations responsible for maintaining the dataset. List of Maintainer objects from the Maintenance module describing maintenance contacts, roles, and support channels." ; skos:inScheme . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "erratum_details" ; - skos:definition """Details on any errata or corrections to the dataset. + skos:definition """Free-text description of the error, its scope, the affected data or records, and the correction applied. """ ; skos:inScheme data_sheets_schema:maintenance . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "extension_details" ; - skos:definition """Details on extension mechanisms, contribution validation, and communication. + skos:definition """Free-text description of how third parties can contribute to the dataset, how contributions are validated (e.g., peer review, automated tests), and how accepted contributions will be communicated to the community. """ ; skos:inScheme data_sheets_schema:maintenance . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "maintainer_details" ; - skos:definition """Details on who will support, host, or maintain the dataset. 
+ skos:definition """Free-text description of the organization, team, or individual responsible for maintaining the dataset, including contact information and hosting arrangements. """ ; skos:inScheme data_sheets_schema:maintenance . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "retention_details" ; - skos:definition """Details on data retention limits and enforcement procedures. + skos:definition """Free-text description of applicable retention limits, legal or ethical basis for those limits, and how they will be enforced (e.g., automated deletion, anonymization after the retention period). """ ; skos:inScheme data_sheets_schema:maintenance . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "update_details" ; - skos:definition """Details on update plans, responsible parties, and communication methods. + skos:definition """Free-text description of planned update types (e.g., corrections, additions, deletions), responsible parties, and how updates will be communicated to users. """ ; skos:inScheme data_sheets_schema:maintenance . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "version_details" ; - skos:definition """Details on version support policies and obsolescence communication. + skos:definition """Free-text description of version support policies, how long older versions will be hosted, and how dataset consumers will be notified when versions become obsolete. """ ; skos:inScheme data_sheets_schema:maintenance . @@ -3826,13 +3870,14 @@ data_sheets_schema:missing_data_documentation a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "affiliations" ; + skos:broadMatch schema1:affiliation ; skos:definition "Organizations with which the creator or team is affiliated." ; skos:inScheme data_sheets_schema:motivation . 
a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "credit_roles" ; - skos:definition "Contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or creator team. Specifies the specific contributions made to this dataset (e.g., Conceptualization, Data Curation, Methodology). Note: roles are specified here rather than on Person directly, since the same person may have different roles across different datasets." ; + skos:definition "One or more contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or creator team (e.g., Conceptualization, Data Curation, Methodology)." ; skos:inScheme data_sheets_schema:motivation . a owl:ObjectProperty, @@ -3844,6 +3889,7 @@ data_sheets_schema:missing_data_documentation a owl:ObjectProperty, data_sheets_schema:other_tasks a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "other_tasks" ; + skos:definition "Additional tasks the dataset may support beyond its original intent. List of OtherTask objects from the Uses module describing potential applications not originally planned by the dataset creators." ; skos:inScheme . data_sheets_schema:parent_datasets a owl:ObjectProperty, @@ -3875,17 +3921,23 @@ data_sheets_schema:participant_privacy a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "annotator_demographics" ; - skos:definition "Demographic information about annotators, if available and relevant (e.g., geographic location, language background, expertise level)." ; + skos:definition "One or more demographic characteristics of the annotators, if available and relevant (e.g., geographic location, language background, expertise level, native language)." ; skos:exactMatch ; skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "cleaning_details" ; - skos:definition """Details on data cleaning procedures applied. 
+ skos:definition """Free-text description of data cleaning procedures applied, including criteria for removing or correcting instances, tools used, and how removed instances are accounted for. """ ; skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . + a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "data_annotation_platform" ; + skos:definition "One or more platforms or tools used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool)." ; + skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . + a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "data_annotation_protocol" ; @@ -3924,28 +3976,28 @@ data_sheets_schema:participant_privacy a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "labeling_details" ; - skos:definition """Details on labeling/annotation procedures and quality metrics. + skos:definition """Free-text description of the labeling or annotation procedures, including annotation guidelines, task definitions, and quality control metrics. """ ; skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "preprocessing_details" ; - skos:definition """Details on preprocessing steps applied to the data. + skos:definition """Free-text description of preprocessing steps applied to the data, including tools used, parameters, order of operations, and rationale for each step. """ ; skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "raw_data_details" ; - skos:definition """Details on raw data availability and access procedures. + skos:definition """Free-text description of raw data availability, access procedures, and any conditions or restrictions on accessing the raw data. """ ; skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . 
a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "tool_accuracy" ; - skos:definition """Known accuracy or performance metrics for the automated tools (if available). Include metric name and value (e.g., "spaCy F1: 0.95", "GPT-4 Accuracy: 92%"). + skos:definition """One or more known accuracy or performance metrics for the automated tools (if available). Include metric name and value (e.g., "spaCy F1: 0.95", "GPT-4 Accuracy: 92%"). """ ; skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . @@ -3959,6 +4011,7 @@ data_sheets_schema:participant_privacy a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "tools" ; + skos:broadMatch schema1:name ; skos:definition """List of automated annotation tools with their versions. Format each entry as "ToolName version" (e.g., "spaCy 3.5.0", "NLTK 3.8", "GPT-4 turbo"). Use "unknown" for version if not available (e.g., "Custom NER Model unknown"). """ ; skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . @@ -3966,6 +4019,7 @@ data_sheets_schema:participant_privacy a owl:ObjectProperty, data_sheets_schema:preprocessing_strategies a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "preprocessing_strategies" ; + skos:definition "Preprocessing steps applied to the raw data. List of PreprocessingStrategy objects from the Preprocessing module describing normalization, transformation, and other preparation steps." ; skos:inScheme . data_sheets_schema:prohibited_uses a owl:ObjectProperty, @@ -3977,17 +4031,19 @@ data_sheets_schema:prohibited_uses a owl:ObjectProperty, data_sheets_schema:purposes a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "purposes" ; + skos:definition "Purposes for which the dataset was created. List of Purpose objects from the Motivation module, each describing a specific creation goal or intended application." ; skos:inScheme . 
data_sheets_schema:raw_data_sources a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "raw_data_sources" ; - skos:definition "Description of raw data sources before preprocessing." ; + skos:definition "List of raw data sources before preprocessing. Each RawDataSource object describes where the original data came from and how it can be accessed." ; skos:inScheme . data_sheets_schema:raw_sources a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "raw_sources" ; + skos:definition "Raw, unprocessed source data before any preprocessing was applied. List of RawData objects from the Preprocessing module describing original data sources and their formats." ; skos:inScheme . data_sheets_schema:related_datasets a owl:ObjectProperty, @@ -3996,35 +4052,58 @@ data_sheets_schema:related_datasets a owl:ObjectProperty, skos:definition "Related datasets with typed relationships (e.g., supplements, derives from, is version of). Use DatasetRelationship class to specify relationship types." ; skos:inScheme . +data_sheets_schema:relationships a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "relationships" ; + skos:definition "Explicit relationships between individual instances in the dataset. List of Relationships objects from the Composition module describing how instances relate (e.g., graph edges, ratings, social network links)." ; + skos:inScheme . + data_sheets_schema:sampling_strategies a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "sampling_strategies" ; + skos:definition "Strategies used to select data instances from a larger population. List of SamplingStrategy objects from the Collection module describing sampling methodology, inclusion criteria, and limitations." ; skos:inScheme . data_sheets_schema:sensitive_elements a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "sensitive_elements" ; + skos:definition "Sensitive data elements requiring special handling or access controls. 
List of SensitiveElement objects identifying sensitive attributes such as personal identifiers, protected health information, or legally sensitive content." ; + skos:inScheme . + +data_sheets_schema:splits a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "splits" ; + skos:definition "Recommended data splits for this dataset. List of Splits objects from the Composition module describing train/validation/test partitions and the rationale for each split strategy." ; skos:inScheme . data_sheets_schema:subpopulations a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "subpopulations" ; + skos:definition "Subpopulations represented within the dataset. List of Subpopulation objects from the Composition module describing demographic or other groups, their representation, and any imbalances." ; skos:inScheme . data_sheets_schema:subsets a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "subsets" ; - skos:exactMatch schema1:distribution ; + skos:definition "Subsets or splits of this dataset. List of DataSubset objects from the Composition module, each representing a logical partition such as training, validation, or test splits, or demographic subgroups." ; skos:inScheme . data_sheets_schema:tasks a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "tasks" ; + skos:definition "Tasks the dataset is intended to support. List of Task objects from the Motivation module describing specific machine learning, research, or analytical tasks." ; + skos:inScheme . + +data_sheets_schema:third_party_sharing a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "third_party_sharing" ; + skos:definition "Third-party distribution policies for the dataset. List of ThirdPartySharing objects from the Distribution module describing whether and how the dataset is shared with entities outside the creating organization." ; skos:inScheme . 
data_sheets_schema:use_repository a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "use_repository" ; + skos:definition "Repositories or registries tracking how the dataset has been used. List of UseRepository objects from the Uses module pointing to papers with code, citation indices, or other use-tracking resources." ; skos:inScheme . data_sheets_schema:used_software a owl:ObjectProperty, @@ -4036,47 +4115,47 @@ data_sheets_schema:used_software a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "discouragement_details" ; - skos:definition """Details on tasks for which the dataset should not be used. + skos:definition """Free-text description of tasks or applications for which the dataset is not recommended, with explanation of why (e.g., out-of-scope, risk of harm, poor coverage). """ ; skos:inScheme data_sheets_schema:uses . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "impact_details" ; - skos:definition """Details on potential impacts, risks, and mitigation strategies. + skos:definition """Free-text description of potential future impacts or risks arising from the dataset's composition or collection (e.g., unfair treatment, privacy violations, legal or financial risks), and any recommended mitigation strategies. """ ; skos:inScheme data_sheets_schema:uses . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "prohibition_reason" ; - skos:definition "Reason why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint)." ; + skos:definition "One or more reasons why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint)." ; skos:inScheme data_sheets_schema:uses . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "repository_details" ; - skos:definition """Details on the repository of known dataset uses. 
+ skos:definition """Free-text description of the repository of known dataset uses, including how it is maintained and how to contribute new use cases. """ ; skos:inScheme data_sheets_schema:uses . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "task_details" ; - skos:definition """Details on other potential tasks the dataset could be used for. + skos:definition """Free-text description of other potential tasks the dataset could support, including any prerequisites or limitations for those uses. """ ; skos:inScheme data_sheets_schema:uses . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "use_category" ; - skos:definition "Category of intended use (e.g., research, clinical, educational, commercial, policy)." ; + skos:definition "One or more categories of intended use (e.g., research, clinical, educational, commercial, policy)." ; skos:inScheme data_sheets_schema:uses . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "categories" ; - skos:definition "The permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning." ; + skos:definition "One or more permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning." ; skos:inScheme data_sheets_schema:variables . a owl:ObjectProperty, @@ -4094,6 +4173,7 @@ data_sheets_schema:used_software a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "quality_notes" ; + skos:broadMatch dcterms:description ; skos:definition "Notes about data quality, reliability, or known issues specific to this variable." ; skos:inScheme data_sheets_schema:variables . @@ -4157,26 +4237,26 @@ data_sheets_schema:citation a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "end_date" ; - skos:definition "End date of data collection" ; + skos:definition "End date of data collection." ; skos:inScheme data_sheets_schema:collection . 
a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "handling_strategy" ; - skos:definition """Strategy used to handle missing data (e.g., deletion, imputation, flagging, multiple imputation). + skos:definition """The primary strategy used to handle missing data (e.g., listwise deletion, mean imputation, multiple imputation, flagging with sentinel values). """ ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "is_direct" ; - skos:definition "Whether collection was direct from individuals" ; + skos:definition "Whether collection was direct from individuals." ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "role" ; - skos:definition "Role of the data collector (e.g., researcher, crowdworker)" ; + skos:definition "Role of the data collector (e.g., researcher, crowdworker)." ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, @@ -4189,38 +4269,46 @@ data_sheets_schema:citation a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "start_date" ; - skos:definition "Start date of data collection" ; + skos:definition "Start date of data collection." ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "was_directly_observed" ; - skos:definition "Whether the data was directly observed" ; + skos:definition "True if the data was directly observed by a researcher or instrument; false if it was obtained through other means (e.g., reported, inferred)." ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "was_inferred_derived" ; - skos:definition "Whether the data was inferred or derived from other data" ; + skos:definition "True if the data was computationally inferred or derived from other data (e.g., model outputs, imputed values); false otherwise." ; skos:inScheme data_sheets_schema:collection . 
a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "was_reported_by_subjects" ; - skos:definition "Whether the data was reported directly by the subjects themselves" ; + skos:definition "True if the data was self-reported directly by the subjects themselves (e.g., survey responses, questionnaires); false otherwise." ; skos:inScheme data_sheets_schema:collection . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "was_validated_verified" ; - skos:definition "Whether the data was validated or verified in any way" ; + skos:definition "True if the data underwent a validation or verification process (e.g., expert review, cross-checking with ground truth); false otherwise." ; skos:inScheme data_sheets_schema:collection . data_sheets_schema:comment_prefix a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "comment_prefix" ; + skos:definition "Character(s) used to indicate comment lines (e.g., \"#\" for CSV comments)." ; skos:inScheme data_sheets_schema:base . + a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "archival" ; + skos:definition """Indicates whether official archival versions of external resources are included in the dataset. +""" ; + skos:inScheme data_sheets_schema:composition . + a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "bias_description" ; @@ -4252,7 +4340,8 @@ data_sheets_schema:comment_prefix a owl:ObjectProperty, rdfs:label "counts" ; skos:definition """How many instances are there in total (of each type, if appropriate)? """ ; - skos:inScheme data_sheets_schema:composition . + skos:inScheme data_sheets_schema:composition ; + data_sheets_schema:docExample "42000 (42,000 patient records)" . a owl:ObjectProperty, linkml:SlotDefinition ; @@ -4283,28 +4372,49 @@ data_sheets_schema:comment_prefix a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "instance_type" ; - skos:definition """Multiple types of instances? (e.g., movies, users, and ratings). 
+ skos:broadMatch dcterms:type ; + skos:definition """The type or types of instances in the dataset (e.g., "movie", "user", "rating", "clinical record"). Use when the dataset contains multiple instance types with different structures. """ ; skos:inScheme data_sheets_schema:composition . - a owl:ObjectProperty, + a owl:ObjectProperty, linkml:SlotDefinition ; - rdfs:label "label" ; - skos:definition """Is there a label or target associated with each instance? -""" ; + rdfs:label "is_random" ; + skos:definition "Indicates whether the sample is random." ; skos:inScheme data_sheets_schema:composition . - a owl:ObjectProperty, + a owl:ObjectProperty, linkml:SlotDefinition ; - rdfs:label "label_description" ; - skos:definition """If labeled, what pattern or format do labels follow? + rdfs:label "is_representative" ; + skos:definition """Indicates whether the sample is representative of the larger set. """ ; skos:inScheme data_sheets_schema:composition . - a owl:ObjectProperty, + a owl:ObjectProperty, linkml:SlotDefinition ; - rdfs:label "limitation_description" ; - skos:definition """Detailed description of the limitation and its implications. + rdfs:label "is_sample" ; + skos:definition "Indicates whether it is a sample of a larger set." ; + skos:inScheme data_sheets_schema:composition . + + a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "label" ; + skos:definition """Is there a label or target associated with each instance? +""" ; + skos:inScheme data_sheets_schema:composition . + + a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "label_description" ; + skos:broadMatch schema1:description ; + skos:definition """If labeled, what pattern or format do labels follow? +""" ; + skos:inScheme data_sheets_schema:composition . + + a owl:ObjectProperty, + linkml:SlotDefinition ; + rdfs:label "limitation_description" ; + skos:definition """Detailed description of the limitation and its implications. """ ; skos:inScheme data_sheets_schema:composition . 
@@ -4369,27 +4479,34 @@ data_sheets_schema:comment_prefix a owl:ObjectProperty, data_sheets_schema:conforms_to a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "conforms_to" ; + skos:definition "An established standard, specification, or schema to which the resource conforms." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:conforms_to_class a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "conforms_to_class" ; + skos:broadMatch dcterms:conformsTo ; + skos:definition "The specific class or type within a schema to which the resource conforms." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:conforms_to_schema a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "conforms_to_schema" ; + skos:broadMatch dcterms:conformsTo ; + skos:definition "The schema or data model to which the resource conforms." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:created_by a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "created_by" ; + skos:definition "The person or organization primarily responsible for creating the resource." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:created_on a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "created_on" ; rdfs:range linkml:Datetime ; + skos:definition "The date and time when the resource was created." ; skos:inScheme data_sheets_schema:base . a owl:ObjectProperty, @@ -4401,15 +4518,15 @@ data_sheets_schema:created_on a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "contact_person" ; + skos:broadMatch schema1:contactPoint ; skos:definition "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. This person can answer questions about licensing terms, usage restrictions, fees, and permissions." ; - skos:exactMatch schema1:contactPoint ; skos:inScheme data_sheets_schema:data-governance . 
a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "governance_committee_contact" ; + skos:broadMatch schema1:contactPoint ; skos:definition "Contact person for data governance committee. This person can answer questions about data governance policies, access procedures, and oversight mechanisms." ; - skos:exactMatch schema1:contactPoint ; skos:inScheme data_sheets_schema:data-governance . a owl:ObjectProperty, @@ -4421,6 +4538,7 @@ data_sheets_schema:created_on a owl:ObjectProperty, data_sheets_schema:delimiter a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "delimiter" ; + skos:definition "Field delimiter character (e.g., \",\" for CSV, \"\\t\" for TSV)." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:dialect a owl:ObjectProperty, @@ -4442,12 +4560,15 @@ data_sheets_schema:doi a owl:ObjectProperty, rdfs:range [ a rdfs:Datatype ; owl:onDatatype xsd:string ; owl:withRestrictions ( [ xsd:pattern "10\\.\\d{4,}\\/.+" ] ) ] ; - skos:definition "digital object identifier" ; + skos:broadMatch dcterms:identifier ; + skos:definition "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567')." ; + skos:exactMatch schema1:identifier ; skos:inScheme data_sheets_schema:base . data_sheets_schema:double_quote a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "double_quote" ; + skos:definition "Whether quotes within quoted fields are escaped by doubling them. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). Follows the W3C CSV-on-the-Web dialect specification." ; skos:inScheme data_sheets_schema:base . 
data_sheets_schema:download_url a owl:ObjectProperty, @@ -4468,14 +4589,14 @@ data_sheets_schema:encoding a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "encoding" ; rdfs:range data_sheets_schema:EncodingEnum ; - skos:definition "the character encoding of the data" ; + skos:definition "The character encoding of the data." ; skos:inScheme data_sheets_schema:base . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "contact_person" ; + skos:broadMatch schema1:contactPoint ; skos:definition "Contact person for questions about ethical review. Provides structured contact information including name, email, affiliation, and optional ORCID." ; - skos:exactMatch schema1:contactPoint ; skos:inScheme data_sheets_schema:ethics . a owl:ObjectProperty, @@ -4488,13 +4609,15 @@ data_sheets_schema:encoding a owl:ObjectProperty, data_sheets_schema:extension_mechanism a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "extension_mechanism" ; + skos:definition "Mechanisms for extending or contributing to the dataset. ExtensionMechanism object from the Maintenance module describing how others can propose additions, corrections, or expansions." ; skos:inScheme . data_sheets_schema:file_count a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "file_count" ; skos:definition "Number of files in this collection." ; - skos:inScheme data_sheets_schema:file-collection . + skos:inScheme data_sheets_schema:file-collection ; + data_sheets_schema:docExample "47" . data_sheets_schema:file_type a owl:ObjectProperty, linkml:SlotDefinition ; @@ -4512,12 +4635,14 @@ data_sheets_schema:format a owl:ObjectProperty, data_sheets_schema:hash a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "hash" ; - skos:definition "hash of the data" ; + skos:broadMatch dcterms:identifier ; + skos:definition "Cryptographic hash value of the data for integrity verification (e.g., SHA-256: 'e3b0c44298fc1c149afb...', MD5: 'd41d8cd98f00b204e9800998ecf8427e')." 
; skos:inScheme data_sheets_schema:base . data_sheets_schema:header a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "header" ; + skos:definition "Whether the first row of the file contains column headers. Expected values: \"true\" or \"false\" (as strings per CSV dialect specification). Follows the W3C CSV-on-the-Web dialect specification." ; skos:inScheme data_sheets_schema:base . a owl:ObjectProperty, @@ -4554,6 +4679,7 @@ data_sheets_schema:human_subject_research a owl:ObjectProperty, data_sheets_schema:ip_restrictions a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "ip_restrictions" ; + skos:definition "Intellectual property restrictions on dataset use or redistribution. IPRestrictions object from the Data Governance module describing copyright, trademark, or other IP considerations." ; skos:inScheme . data_sheets_schema:is_data_split a owl:ObjectProperty, @@ -4565,6 +4691,7 @@ data_sheets_schema:is_data_split a owl:ObjectProperty, data_sheets_schema:is_deidentified a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "is_deidentified" ; + skos:definition "De-identification status and procedures applied to the dataset. Deidentification object describing whether the dataset contains personal data, what de-identification methods were applied, and any residual re-identification risks." ; skos:inScheme . data_sheets_schema:is_subpopulation a owl:ObjectProperty, @@ -4576,18 +4703,20 @@ data_sheets_schema:is_subpopulation a owl:ObjectProperty, data_sheets_schema:is_tabular a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "is_tabular" ; + skos:definition "Whether the dataset is in tabular format (rows and columns). True if the data is structured as a table (e.g., CSV, TSV, relational database); false for unstructured formats such as images or free text." ; skos:inScheme . 
data_sheets_schema:issued a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "issued" ; rdfs:range linkml:Datetime ; + skos:definition "Date of formal issuance or publication of the resource." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:language a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "language" ; - skos:definition "language in which the information is expressed" ; + skos:definition "Language in which the information is expressed." ; skos:exactMatch schema1:inLanguage ; skos:inScheme data_sheets_schema:base . @@ -4595,24 +4724,28 @@ data_sheets_schema:last_updated_on a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "last_updated_on" ; rdfs:range linkml:Datetime ; + skos:definition "The date and time when the resource was most recently modified or updated." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:license_and_use_terms a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "license_and_use_terms" ; + skos:definition "License and usage terms governing dataset access and use. LicenseAndUseTerms object from the Data Governance module describing the applicable license, permitted uses, and any restrictions." ; skos:inScheme . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "contribution_url" ; skos:definition "URL for contribution guidelines or process." ; - skos:inScheme data_sheets_schema:maintenance . + skos:inScheme data_sheets_schema:maintenance ; + data_sheets_schema:docExample "https://example.org/dataset/contributing" . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "erratum_url" ; skos:definition "URL or access point for the erratum." ; - skos:inScheme data_sheets_schema:maintenance . + skos:inScheme data_sheets_schema:maintenance ; + data_sheets_schema:docExample "https://example.org/dataset/errata/2024-01-15" . 
a owl:ObjectProperty, linkml:SlotDefinition ; @@ -4623,7 +4756,7 @@ data_sheets_schema:license_and_use_terms a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "latest_version_doi" ; - skos:definition "DOI or URL of the latest dataset version." ; + skos:definition "DOI or URL identifying the latest version of this dataset (e.g., '10.5281/zenodo.1234567' for a DOI or 'https://doi.org/10.5281/zenodo.1234567' for a full URL). Use CURIE format for DOIs (e.g., 'doi:10.5281/zenodo.1234567')." ; skos:inScheme data_sheets_schema:maintenance . a owl:ObjectProperty, @@ -4642,7 +4775,8 @@ data_sheets_schema:license_and_use_terms a owl:ObjectProperty, data_sheets_schema:md5 a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "md5" ; - skos:definition "md5 hash of the data" ; + skos:broadMatch dcterms:identifier ; + skos:definition "MD5 hash value of the data (128-bit cryptographic hash)." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:media_type a owl:ObjectProperty, @@ -4656,11 +4790,13 @@ data_sheets_schema:media_type a owl:ObjectProperty, data_sheets_schema:modified_by a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "modified_by" ; + skos:definition "A person or organization that contributed to modifying or updating the resource." ; skos:inScheme data_sheets_schema:base . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "grant_number" ; + skos:broadMatch schema1:identifier ; skos:definition "The alphanumeric identifier for the grant." ; skos:inScheme data_sheets_schema:motivation . @@ -4673,27 +4809,31 @@ data_sheets_schema:modified_by a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "principal_investigator" ; + skos:broadMatch dcterms:creator, + schema1:creator ; skos:definition "A key individual (Principal Investigator) responsible for or overseeing dataset creation." ; - skos:exactMatch schema1:creator ; skos:inScheme data_sheets_schema:motivation . 
data_sheets_schema:orcid a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "orcid" ; + skos:broadMatch schema1:identifier ; skos:definition "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification." ; - skos:exactMatch schema1:identifier ; - skos:inScheme data_sheets_schema:base . + skos:inScheme data_sheets_schema:base ; + data_sheets_schema:docExample "0000-0001-2345-6789" . data_sheets_schema:page a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "page" ; + skos:definition "A landing page or web page providing access to or information about the resource." ; skos:inScheme data_sheets_schema:base . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "access_url" ; skos:definition "URL or access point for the raw data." ; - skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . + skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling ; + data_sheets_schema:docExample "https://example.org/dataset/raw/raw-data.zip" . a owl:ObjectProperty, linkml:SlotDefinition ; @@ -4714,14 +4854,8 @@ data_sheets_schema:page a owl:ObjectProperty, rdfs:label "annotations_per_item" ; skos:definition "Number of annotations collected per data item. Multiple annotations per item enable calculation of inter-annotator agreement." ; skos:exactMatch ; - skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . - - a owl:ObjectProperty, - linkml:SlotDefinition ; - rdfs:label "data_annotation_platform" ; - skos:definition "Platform or tool used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool)." ; - skos:exactMatch ; - skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling . + skos:inScheme data_sheets_schema:preprocessing-cleaning-labeling ; + data_sheets_schema:docExample "3 (three independent annotators per item)" . 
a owl:ObjectProperty, linkml:SlotDefinition ; @@ -4747,78 +4881,90 @@ data_sheets_schema:publisher a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "publisher" ; rdfs:range linkml:Uriorcurie ; - skos:inScheme data_sheets_schema:base . + skos:definition "The organization or entity responsible for making the resource available." ; + skos:inScheme data_sheets_schema:base ; + data_sheets_schema:docExample "ror:04t3en479 # use a ROR ID, DOI, or URL — not a plain name" . data_sheets_schema:quote_char a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "quote_char" ; + skos:definition "Character used for quoting fields (e.g., '\"' for CSV)." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:regulatory_restrictions a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "regulatory_restrictions" ; + skos:definition "Regulatory and export control restrictions applicable to the dataset. ExportControlRegulatoryRestrictions object from the Data Governance module describing compliance requirements such as ITAR, EAR, or GDPR." ; skos:inScheme . data_sheets_schema:retention_limit a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "retention_limit" ; + skos:definition "Data retention policies and limits for the dataset. RetentionLimits object from the Maintenance module describing how long the dataset will be available and any deletion schedules." ; skos:inScheme . data_sheets_schema:sha256 a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "sha256" ; - skos:definition "sha256 hash of the data" ; + skos:definition "SHA-256 hash value of the data (256-bit cryptographic hash, recommended)." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:status a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "status" ; + skos:definition "The status of the resource (e.g., draft, published, deprecated)." ; skos:inScheme data_sheets_schema:base . 
data_sheets_schema:title a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "title" ; - skos:definition "the official title of the element" ; + skos:definition "The official title of the element." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:total_bytes a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "total_bytes" ; - skos:definition "Total size of all files in bytes." ; - skos:inScheme data_sheets_schema:file-collection . + skos:definition "Total size of all files in this collection, in bytes (integer). Maps to dcat:byteSize." ; + skos:inScheme data_sheets_schema:file-collection ; + data_sheets_schema:docExample "1073741824 (1 GiB = 1024³ bytes)" . data_sheets_schema:total_file_count a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "total_file_count" ; skos:definition "Total number of files across all file collections in this dataset. Can be aggregated from file_collections[].file_count." ; - skos:inScheme . + skos:inScheme ; + data_sheets_schema:docExample "156" . data_sheets_schema:total_size_bytes a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "total_size_bytes" ; skos:definition "Total size of all files in bytes across all file collections. Can be aggregated from file_collections[].total_bytes." ; - skos:inScheme . + skos:inScheme ; + data_sheets_schema:docExample "10737418240 (10 GiB = 10 × 1024³ bytes)" . data_sheets_schema:updates a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "updates" ; + skos:definition "Plans for future updates or versioning of the dataset. UpdatePlan object from the Maintenance module describing update frequency, versioning policy, and planned enhancements." ; skos:inScheme . data_sheets_schema:url a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "url" ; + skos:definition "URL where the software can be found (e.g., homepage, repository, or documentation)." ; skos:inScheme data_sheets_schema:base . 
a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "repository_url" ; skos:definition "URL to a repository of known dataset uses." ; - skos:inScheme data_sheets_schema:uses . + skos:inScheme data_sheets_schema:uses ; + data_sheets_schema:docExample "https://example.org/dataset/known-uses" . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "usage_notes" ; - skos:definition "Notes or caveats about using the dataset for intended purposes." ; + skos:definition "A note or caveat about using the dataset for its intended purposes." ; skos:inScheme data_sheets_schema:uses . a owl:ObjectProperty, @@ -4850,7 +4996,8 @@ data_sheets_schema:url a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "maximum_value" ; skos:definition "The maximum value that the variable can take. Applicable to numeric variables." ; - skos:inScheme data_sheets_schema:variables . + skos:inScheme data_sheets_schema:variables ; + data_sheets_schema:docExample "100.0" . a owl:ObjectProperty, linkml:SlotDefinition ; @@ -4862,13 +5009,15 @@ data_sheets_schema:url a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "minimum_value" ; skos:definition "The minimum value that the variable can take. Applicable to numeric variables." ; - skos:inScheme data_sheets_schema:variables . + skos:inScheme data_sheets_schema:variables ; + data_sheets_schema:docExample "0.0" . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "precision" ; skos:definition "The precision or number of decimal places for numeric variables." ; - skos:inScheme data_sheets_schema:variables . + skos:inScheme data_sheets_schema:variables ; + data_sheets_schema:docExample "2 (two decimal places, e.g., 3.14)" . 
a owl:ObjectProperty, linkml:SlotDefinition ; @@ -4881,18 +5030,21 @@ data_sheets_schema:url a owl:ObjectProperty, a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "variable_name" ; + skos:broadMatch schema1:identifier, + schema1:name ; skos:definition "The name or identifier of the variable as it appears in the data files." ; - skos:exactMatch schema1:name ; skos:inScheme data_sheets_schema:variables . data_sheets_schema:version_access a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "version_access" ; + skos:definition "Information about access to different versions of the dataset. VersionAccess object from the Maintenance module describing where older versions can be found and how version history is maintained." ; skos:inScheme . data_sheets_schema:was_derived_from a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "was_derived_from" ; + skos:definition "A resource from which this resource was derived, in whole or in part." ; skos:exactMatch dcterms:source ; skos:inScheme data_sheets_schema:base . 
@@ -4925,183 +5077,183 @@ data_sheets_schema:Information a owl:Class, rdfs:label "Information" ; rdfs:subClassOf [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:compression ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:language ], + owl:onProperty data_sheets_schema:created_by ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:conforms_to_schema ], + owl:onProperty data_sheets_schema:version ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:created_by ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:doi ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Datetime ; - owl:onProperty data_sheets_schema:last_updated_on ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:modified_by ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:keywords ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:title ], [ a owl:Restriction ; - owl:minCardinality 0 ; + owl:allValuesFrom linkml:Datetime ; owl:onProperty data_sheets_schema:issued ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:created_on ], + owl:onProperty data_sheets_schema:last_updated_on ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Uriorcurie ; + owl:onProperty data_sheets_schema:publisher ], + [ a owl:Restriction ; + owl:allValuesFrom [ a rdfs:Datatype ; + owl:onDatatype xsd:string ; + owl:withRestrictions ( [ xsd:pattern "10\\.\\d{4,}\\/.+" ] ) ] ; + owl:onProperty data_sheets_schema:doi ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:language ], + owl:onProperty data_sheets_schema:status ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:doi ], + owl:onProperty data_sheets_schema:publisher ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Datetime ; + owl:onProperty 
data_sheets_schema:last_updated_on ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:title ], + owl:onProperty data_sheets_schema:was_derived_from ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Uri ; + owl:onProperty data_sheets_schema:download_url ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:conforms_to_class ], + owl:onProperty data_sheets_schema:keywords ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:status ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:compression ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:last_updated_on ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:conforms_to_schema ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:conforms_to ], + owl:allValuesFrom data_sheets_schema:CompressionEnum ; + owl:onProperty data_sheets_schema:compression ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:conforms_to ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:download_url ], + owl:onProperty data_sheets_schema:title ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:doi ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:modified_by ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:license ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:issued ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:page ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:was_derived_from ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Datetime ; - owl:onProperty data_sheets_schema:created_on ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:license ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; 
owl:onProperty data_sheets_schema:version ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:publisher ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; owl:onProperty data_sheets_schema:title ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:compression ], + owl:onProperty data_sheets_schema:language ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:conforms_to_schema ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:page ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:created_on ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:conforms_to_class ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Uri ; - owl:onProperty data_sheets_schema:download_url ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:conforms_to ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Uriorcurie ; - owl:onProperty data_sheets_schema:publisher ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:download_url ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:created_by ], + owl:onProperty data_sheets_schema:status ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:compression ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:title ], + owl:onProperty data_sheets_schema:last_updated_on ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:license ], - [ a owl:Restriction ; - owl:allValuesFrom [ a rdfs:Datatype ; - owl:onDatatype xsd:string ; - owl:withRestrictions ( [ xsd:pattern "10\\.\\d{4,}\\/.+" ] ) ] ; - owl:onProperty data_sheets_schema:doi ], + owl:onProperty data_sheets_schema:language ], [ a owl:Restriction ; owl:allValuesFrom linkml:String 
; - owl:onProperty data_sheets_schema:was_derived_from ], + owl:onProperty data_sheets_schema:modified_by ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:version ], + owl:onProperty data_sheets_schema:license ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:license ], + owl:onProperty data_sheets_schema:download_url ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:publisher ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:page ], + owl:onProperty data_sheets_schema:conforms_to_schema ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:created_on ], + owl:onProperty data_sheets_schema:page ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:modified_by ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:conforms_to_schema ], + owl:onProperty data_sheets_schema:license ], [ a owl:Restriction ; owl:allValuesFrom linkml:Datetime ; - owl:onProperty data_sheets_schema:issued ], + owl:onProperty data_sheets_schema:created_on ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:publisher ], + owl:onProperty data_sheets_schema:created_on ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:issued ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:conforms_to_class ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:was_derived_from ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:page ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:status ], + owl:onProperty data_sheets_schema:was_derived_from ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:modified_by ], + owl:onProperty 
data_sheets_schema:conforms_to_class ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:keywords ], - [ a owl:Restriction ; - owl:minCardinality 0 ; owl:onProperty data_sheets_schema:created_by ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:version ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:download_url ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:doi ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:language ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:was_derived_from ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:status ], + owl:onProperty data_sheets_schema:created_by ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:conforms_to_class ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:conforms_to ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:CompressionEnum ; - owl:onProperty data_sheets_schema:compression ], + owl:onProperty data_sheets_schema:issued ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:page ], + owl:onProperty data_sheets_schema:version ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:last_updated_on ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:conforms_to ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:modified_by ], + owl:onProperty data_sheets_schema:status ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:conforms_to_schema ], + owl:onProperty data_sheets_schema:keywords ], data_sheets_schema:NamedThing ; skos:closeMatch schema1:CreativeWork ; skos:definition "Grouping for datasets and data files" ; 
@@ -5121,30 +5273,30 @@ data_sheets_schema:Person a owl:Class, rdfs:subClassOf [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:orcid ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:affiliation ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:email ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:email ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:orcid ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:email ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Organization ; - owl:onProperty data_sheets_schema:affiliation ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:orcid ], [ a owl:Restriction ; owl:allValuesFrom [ a rdfs:Datatype ; owl:intersectionOf ( linkml:String [ a rdfs:Datatype ; owl:onDatatype xsd:string ; owl:withRestrictions ( [ xsd:pattern "^\\d{4}-\\d{4}-\\d{4}-\\d{3}[0-9X]$" ] ) ] ) ] ; owl:onProperty data_sheets_schema:orcid ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:email ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:Organization ; + owl:onProperty data_sheets_schema:affiliation ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:affiliation ], data_sheets_schema:NamedThing ; skos:definition "An individual human being. This class represents a person in the context of a specific dataset. Attributes like affiliation and email represent the person's current or most relevant contact information for this dataset. For stable cross-dataset identification, use the ORCID field. Note that contributor roles (CRediT) are specified in the usage context (e.g., Creator class) rather than on the Person directly, since roles vary by dataset." 
; skos:exactMatch schema1:Person ; @@ -5161,31 +5313,31 @@ data_sheets_schema:NamedThing a owl:Class, linkml:ClassDefinition ; rdfs:label "NamedThing" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom linkml:Uriorcurie ; - owl:onProperty data_sheets_schema:id ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:id ], - [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty data_sheets_schema:description ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; + owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:name ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:description ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:name ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:name ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:description ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:Uriorcurie ; + owl:onProperty data_sheets_schema:id ], + [ a owl:Restriction ; + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:name ], [ a owl:Restriction ; owl:minCardinality 1 ; + owl:onProperty data_sheets_schema:id ], + [ a owl:Restriction ; + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:name ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:id ] ; skos:definition "A generic grouping for any identifiable entity." 
; skos:exactMatch schema1:Thing ; @@ -5202,431 +5354,473 @@ data_sheets_schema:Dataset a owl:Class, linkml:ClassDefinition ; rdfs:label "Dataset" ; rdfs:subClassOf [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:external_resources ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:at_risk_populations ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:informed_consent ], + owl:onProperty data_sheets_schema:consent_revocations ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:cleaning_strategies ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:instances ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:is_tabular ], + owl:onProperty data_sheets_schema:is_deidentified ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:content_warnings ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:anomalies ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:file_collections ], + owl:onProperty data_sheets_schema:external_resources ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:future_use_impacts ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:external_resources ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:is_deidentified ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:known_limitations ], + owl:onProperty data_sheets_schema:total_size_bytes ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:is_tabular ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:imputation_protocols ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:license_and_use_terms ], [ a owl:Restriction ; owl:minCardinality 0 ; - 
owl:onProperty data_sheets_schema:distribution_formats ], + owl:onProperty data_sheets_schema:confidential_elements ], [ a owl:Restriction ; owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:updates ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:Dataset ; owl:onProperty data_sheets_schema:parent_datasets ], [ a owl:Restriction ; - owl:allValuesFrom ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:extension_mechanism ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty data_sheets_schema:anomalies ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:ethical_reviews ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:retention_limit ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:known_biases ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:funders ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:errata ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:labeling_strategies ], [ a owl:Restriction ; - owl:allValuesFrom ; + owl:minCardinality 0 ; owl:onProperty data_sheets_schema:known_limitations ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:total_file_count ], - [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:distribution_formats ], - [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:version_access ], + owl:onProperty data_sheets_schema:missing_data_documentation ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:existing_uses ], + owl:onProperty data_sheets_schema:maintainers ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:citation ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:total_size_bytes ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty 
data_sheets_schema:future_use_impacts ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:annotation_analyses ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:tasks ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:regulatory_restrictions ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:ethical_reviews ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:human_subject_research ], - [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:sensitive_elements ], - [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:human_subject_research ], - [ a owl:Restriction ; - owl:allValuesFrom ; owl:onProperty data_sheets_schema:ip_restrictions ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:intended_uses ], - [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:anomalies ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:data_collectors ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:addressing_gaps ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:other_tasks ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:labeling_strategies ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:variables ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:version_access ], + owl:onProperty data_sheets_schema:sampling_strategies ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:regulatory_restrictions ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:relationships ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:cleaning_strategies ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:total_file_count ], [ a owl:Restriction ; - 
owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:known_biases ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:collection_notifications ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:collection_mechanisms ], + owl:onProperty data_sheets_schema:existing_uses ], [ a owl:Restriction ; owl:allValuesFrom ; owl:onProperty data_sheets_schema:data_protection_impacts ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:confidential_elements ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:license_and_use_terms ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:license_and_use_terms ], + owl:onProperty data_sheets_schema:collection_timeframes ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:retention_limit ], + owl:onProperty data_sheets_schema:subsets ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:participant_privacy ], + owl:onProperty data_sheets_schema:participant_compensation ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:related_datasets ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:third_party_sharing ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:maintainers ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:use_repository ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:annotation_analyses ], + owl:allValuesFrom data_sheets_schema:Dataset ; + owl:onProperty data_sheets_schema:resources ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:use_repository ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:creators ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:tasks ], + owl:onProperty data_sheets_schema:addressing_gaps ], [ a owl:Restriction ; 
owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:variables ], + owl:onProperty data_sheets_schema:future_use_impacts ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:content_warnings ], + owl:onProperty data_sheets_schema:informed_consent ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:regulatory_restrictions ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:resources ], + owl:onProperty data_sheets_schema:parent_datasets ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:preprocessing_strategies ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:third_party_sharing ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:is_tabular ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:cleaning_strategies ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:subsets ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:sensitive_elements ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:creators ], + owl:allValuesFrom linkml:String ; + owl:onProperty data_sheets_schema:citation ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:at_risk_populations ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:content_warnings ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:errata ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:participant_compensation ], + owl:onProperty data_sheets_schema:collection_mechanisms ], [ a owl:Restriction ; owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:acquisition_methods ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:purposes ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty 
data_sheets_schema:discouraged_uses ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:version_access ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:use_repository ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:addressing_gaps ], + [ a owl:Restriction ; + owl:allValuesFrom ; owl:onProperty data_sheets_schema:ip_restrictions ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:annotation_analyses ], + owl:onProperty data_sheets_schema:content_warnings ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:total_size_bytes ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:missing_data_documentation ], + owl:onProperty data_sheets_schema:total_file_count ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:creators ], + owl:onProperty data_sheets_schema:sensitive_elements ], [ a owl:Restriction ; owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:total_file_count ], + owl:onProperty data_sheets_schema:extension_mechanism ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:total_size_bytes ], + owl:onProperty data_sheets_schema:distribution_formats ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:external_resources ], + owl:onProperty data_sheets_schema:annotation_analyses ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:machine_annotation_tools ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:tasks ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:acquisition_methods ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:other_tasks ], + owl:onProperty data_sheets_schema:participant_privacy ], [ a owl:Restriction ; - 
owl:allValuesFrom data_sheets_schema:DataSubset ; - owl:onProperty data_sheets_schema:subsets ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:maintainers ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:distribution_dates ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:known_limitations ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:subpopulations ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:consent_revocations ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:sensitive_elements ], + owl:allValuesFrom linkml:Integer ; + owl:onProperty data_sheets_schema:total_file_count ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:participant_compensation ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:ethical_reviews ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:at_risk_populations ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:is_deidentified ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:funders ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:human_subject_research ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:purposes ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Boolean ; - owl:onProperty data_sheets_schema:is_tabular ], + owl:onProperty data_sheets_schema:related_datasets ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:instances ], + owl:onProperty data_sheets_schema:data_protection_impacts ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:imputation_protocols ], + owl:onProperty data_sheets_schema:labeling_strategies ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty data_sheets_schema:total_size_bytes ], + owl:minCardinality 0 ; + 
owl:onProperty data_sheets_schema:retention_limit ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:ip_restrictions ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:collection_mechanisms ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:data_collectors ], + owl:onProperty data_sheets_schema:regulatory_restrictions ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:version_access ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:purposes ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:raw_sources ], + owl:onProperty data_sheets_schema:version_access ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:FileCollection ; - owl:onProperty data_sheets_schema:file_collections ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:informed_consent ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:updates ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:data_collectors ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom ; owl:onProperty data_sheets_schema:updates ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:instances ], - [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:extension_mechanism ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:intended_uses ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:participant_privacy ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:raw_sources ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:is_deidentified ], + owl:onProperty data_sheets_schema:license_and_use_terms ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:collection_mechanisms ], + owl:allValuesFrom ; + owl:onProperty 
data_sheets_schema:intended_uses ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:updates ], + owl:onProperty data_sheets_schema:machine_annotation_tools ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:human_subject_research ], + owl:onProperty data_sheets_schema:discouraged_uses ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:future_use_impacts ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:direct_collection ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:retention_limit ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:subpopulations ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:raw_data_sources ], + owl:allValuesFrom linkml:Integer ; + owl:onProperty data_sheets_schema:total_size_bytes ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:imputation_protocols ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:updates ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:related_datasets ], - [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:other_tasks ], + owl:onProperty data_sheets_schema:citation ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:labeling_strategies ], + owl:onProperty data_sheets_schema:collection_consents ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:license_and_use_terms ], + owl:allValuesFrom data_sheets_schema:DataSubset ; + owl:onProperty data_sheets_schema:subsets ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:confidential_elements ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:distribution_dates ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:regulatory_restrictions ], + 
owl:allValuesFrom ; + owl:onProperty data_sheets_schema:direct_collection ], [ a owl:Restriction ; - owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:citation ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:participant_privacy ], [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Dataset ; - owl:onProperty data_sheets_schema:resources ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:collection_timeframes ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:at_risk_populations ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:collection_notifications ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:informed_consent ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:funders ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:discouraged_uses ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:existing_uses ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:regulatory_restrictions ], + owl:onProperty data_sheets_schema:cleaning_strategies ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:purposes ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:known_biases ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:addressing_gaps ], + owl:onProperty data_sheets_schema:human_subject_research ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:machine_annotation_tools ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:at_risk_populations ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:acquisition_methods ], + owl:onProperty data_sheets_schema:imputation_protocols ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:distribution_dates ], + owl:onProperty 
data_sheets_schema:at_risk_populations ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:extension_mechanism ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:related_datasets ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:citation ], + owl:onProperty data_sheets_schema:relationships ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:raw_sources ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:human_subject_research ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:creators ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:prohibited_uses ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:splits ], [ a owl:Restriction ; owl:allValuesFrom ; owl:onProperty data_sheets_schema:variables ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:is_deidentified ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:collection_consents ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:collection_timeframes ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:missing_data_documentation ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:machine_annotation_tools ], + owl:allValuesFrom linkml:Boolean ; + owl:onProperty data_sheets_schema:is_tabular ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:maintainers ], + owl:onProperty data_sheets_schema:distribution_dates ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:prohibited_uses ], + owl:onProperty data_sheets_schema:raw_data_sources ], [ a owl:Restriction ; owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:splits ], + [ a owl:Restriction ; + owl:allValuesFrom ; owl:onProperty data_sheets_schema:sampling_strategies 
], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:errata ], + owl:onProperty data_sheets_schema:file_collections ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:use_repository ], - [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:existing_uses ], + owl:onProperty data_sheets_schema:extension_mechanism ], [ a owl:Restriction ; - owl:maxCardinality 1 ; + owl:allValuesFrom ; owl:onProperty data_sheets_schema:retention_limit ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:collection_timeframes ], + owl:onProperty data_sheets_schema:known_biases ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:discouraged_uses ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:raw_data_sources ], + [ a owl:Restriction ; + owl:allValuesFrom data_sheets_schema:FileCollection ; + owl:onProperty data_sheets_schema:file_collections ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:preprocessing_strategies ], + owl:onProperty data_sheets_schema:other_tasks ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:missing_data_documentation ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:resources ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:raw_data_sources ], + owl:onProperty data_sheets_schema:ip_restrictions ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:acquisition_methods ], + owl:minCardinality 0 ; + owl:onProperty data_sheets_schema:is_tabular ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:data_protection_impacts ], + owl:onProperty data_sheets_schema:prohibited_uses ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:subpopulations ], + owl:onProperty 
data_sheets_schema:preprocessing_strategies ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:is_deidentified ], + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:version_access ], [ a owl:Restriction ; - owl:allValuesFrom linkml:Integer ; - owl:onProperty data_sheets_schema:total_file_count ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:raw_sources ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:extension_mechanism ], - [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:data_collectors ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Dataset ; - owl:onProperty data_sheets_schema:parent_datasets ], + owl:onProperty data_sheets_schema:errata ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:prohibited_uses ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:subpopulations ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:intended_uses ], + owl:onProperty data_sheets_schema:tasks ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:funders ], + owl:onProperty data_sheets_schema:instances ], [ a owl:Restriction ; - owl:allValuesFrom ; - owl:onProperty data_sheets_schema:sampling_strategies ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:distribution_formats ], [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:license_and_use_terms ], + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:preprocessing_strategies ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:confidential_elements ], + [ a owl:Restriction ; + owl:allValuesFrom ; + owl:onProperty data_sheets_schema:participant_compensation ], data_sheets_schema:Information ; skos:altLabel "data file", "data package", @@ -5636,6 +5830,13 @@ data_sheets_schema:Dataset a owl:Class, 
dcat:Distribution ; skos:inScheme . +data_sheets_schema:VersionTypeEnum a owl:Class, + linkml:EnumDefinition ; + owl:unionOf ( ) ; + linkml:permissible_values , + , + . + data_sheets_schema:description a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "description" ; @@ -5654,11 +5855,15 @@ data_sheets_schema:id a owl:ObjectProperty, rdfs:label "id" ; skos:definition "A unique identifier for a thing.", "An optional identifier for this property." ; - skos:inScheme data_sheets_schema:base . + skos:inScheme data_sheets_schema:base ; + data_sheets_schema:docExample "https://example.org/dataset/my-dataset-001", + "https://example.org/dataset/property-001" . data_sheets_schema:license a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "license" ; + skos:definition "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", + "The license under which the software is distributed (e.g., \"MIT\", \"Apache-2.0\", \"GPL-3.0\")." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:name a owl:ObjectProperty, @@ -5671,6 +5876,7 @@ data_sheets_schema:name a owl:ObjectProperty, data_sheets_schema:path a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "path" ; + skos:definition "The file path or URL where the content is located." ; skos:inScheme data_sheets_schema:base . data_sheets_schema:resources a owl:ObjectProperty, @@ -5683,6 +5889,8 @@ data_sheets_schema:resources a owl:ObjectProperty, data_sheets_schema:version a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "version" ; + skos:definition "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", + "The version identifier of the software (e.g., \"1.0.0\", \"2.3.1-beta\")." ; skos:inScheme data_sheets_schema:base . 
data_sheets_schema:ConfidentialityLevelEnum a owl:Class, @@ -5696,12 +5904,13 @@ data_sheets_schema:compression a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "compression" ; rdfs:range data_sheets_schema:CompressionEnum ; - skos:definition "compression format used, if any. e.g., gzip, bzip2, zip" ; + skos:definition "Compression format used, if any (e.g., gzip, bzip2, zip)." ; skos:inScheme data_sheets_schema:base . a owl:ObjectProperty, linkml:SlotDefinition ; rdfs:label "response" ; + skos:broadMatch dcterms:description ; skos:definition "Short explanation describing the primary purpose of creating the dataset.", "Short explanation describing the specific task or tasks for which this dataset was created.", "Short explanation of the knowledge or resource gap that this dataset was intended to address." ; @@ -5800,22 +6009,6 @@ data_sheets_schema:CreatorOrMaintainerEnum a owl:Class, , . -data_sheets_schema:VersionTypeEnum a owl:Class, - linkml:EnumDefinition ; - owl:unionOf ( ) ; - linkml:permissible_values , - , - , - , - , - , - , - , - , - , - , - . 
- data_sheets_schema:VariableTypeEnum a owl:Class, linkml:EnumDefinition ; owl:unionOf ( ) ; @@ -5944,38 +6137,38 @@ data_sheets_schema:DatasetProperty a owl:Class, linkml:ClassDefinition ; rdfs:label "DatasetProperty" ; rdfs:subClassOf [ a owl:Restriction ; - owl:maxCardinality 1 ; - owl:onProperty data_sheets_schema:name ], + owl:allValuesFrom data_sheets_schema:Software ; + owl:onProperty data_sheets_schema:used_software ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:id ], + owl:onProperty data_sheets_schema:used_software ], [ a owl:Restriction ; - owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:description ], + owl:allValuesFrom linkml:Uriorcurie ; + owl:onProperty data_sheets_schema:id ], [ a owl:Restriction ; owl:maxCardinality 1 ; owl:onProperty data_sheets_schema:id ], [ a owl:Restriction ; owl:minCardinality 0 ; - owl:onProperty data_sheets_schema:used_software ], + owl:onProperty data_sheets_schema:id ], [ a owl:Restriction ; owl:minCardinality 0 ; owl:onProperty data_sheets_schema:name ], [ a owl:Restriction ; owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:name ], + [ a owl:Restriction ; + owl:minCardinality 0 ; owl:onProperty data_sheets_schema:description ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; owl:onProperty data_sheets_schema:description ], - [ a owl:Restriction ; - owl:allValuesFrom data_sheets_schema:Software ; - owl:onProperty data_sheets_schema:used_software ], - [ a owl:Restriction ; - owl:allValuesFrom linkml:Uriorcurie ; - owl:onProperty data_sheets_schema:id ], [ a owl:Restriction ; owl:allValuesFrom linkml:String ; - owl:onProperty data_sheets_schema:name ] ; + owl:onProperty data_sheets_schema:name ], + [ a owl:Restriction ; + owl:maxCardinality 1 ; + owl:onProperty data_sheets_schema:description ] ; skos:definition "Represents a single property of a dataset, or a set of related properties." ; skos:inScheme data_sheets_schema:base . 
@@ -5988,7 +6181,7 @@ data_sheets_schema:DatasetProperty a owl:Class, data_sheets_schema:EncodingEnum a owl:Class, linkml:EnumDefinition ; - owl:unionOf ( ) ; + owl:unionOf ( ) ; linkml:permissible_values , , , @@ -6022,5 +6215,14 @@ data_sheets_schema:EncodingEnum a owl:Class, , , , - . + , + , + , + , + , + , + , + , + , + . diff --git a/reports/data_value_analysis.json b/reports/data_value_analysis.json new file mode 100644 index 00000000..ba111e49 --- /dev/null +++ b/reports/data_value_analysis.json @@ -0,0 +1,2299 @@ +{ + "metadata": { + "tool": "data_value_analyzer", + "total_fields": 142, + "total_issues": 43 + }, + "summary": { + "fields_analyzed": 142, + "boolean_fields": 5, + "enum_candidates": 75, + "multivalued_fields": 38 + }, + "issues": [ + { + "field": "id", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "https://chorus4ai.org/", + "https://doi.org/10.13026/37yb-1t42", + "https://doi.org/10.18130/V3/DXWOS5", + "https://fairhub.io/datasets/2" + ] + }, + { + "field": "name", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "AI-READI", + "Bridge2AI-Voice", + "CHoRUS", + "CM4AI" + ] + }, + { + "field": "title", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights (AI-READI)", + "Bridge2AI-Voice - An ethically-sourced, diverse voice dataset linked to health information", + "Cell Maps for Artificial Intelligence (CM4AI)", + "Patient-Focused Collaborative Hospital Repository Uniting Standards (CHoRUS) for Equitable AI" + ] + }, + { + "field": "description", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String 
field with only 4 distinct values (enum candidate)", + "sample_values": [ + "CHoRUS for Equitable AI is a Bridge2AI data generation project developing the most diverse, high-resolution, ethically sourced, AI-ready critical care dataset to answer the grand challenge of improving recovery from acute illness. The project spans 20 academic centers (14 data acquisition centers) and creates a publicly available dataset of over 100,000 critically ill patients with multi-modal data including structured EHR, waveform telemetry, medical imaging, EEG, and clinical notes. All data is standardized to the OMOP Common Data Model with additional formats (DICOM, WFDB, OHNLP tokenization) and includes comprehensive metadata schemas. Patient-focused efforts determine ethical and legal approaches to manage privacy and bias while accounting for Social Determinants of Health. A visualization and annotation environment labels data with targets important for prediction. The project emphasizes skills and workforce development for a next generation of diverse academic and community AI scientists through training programs and partnerships with AIM-AHEAD. As of November 2024, the dataset covers 14 different hospitals with 23,400 unique admissions.\n", + "CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge to Artificial Intelligence (Bridge2AI) program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research. The project delivers machine-readable hierarchical maps of cell architecture as AI-Ready data produced from multimodal interrogation of 100 chromatin modifiers and 100 metabolic enzymes involved in cancer, neuropsychiatric, and cardiac disorders in disease-relevant cell lines under perturbed and unperturbed conditions. 
Data streams include immunofluorescence (IF) subcellular microscopy for spatial proteomics, affinity purification mass spectroscopy (AP-MS) and size exclusion mass spectroscopy (SEC-MS) for protein-protein interaction (PPI) data, and single-cell CRISPR-Cas perturbation screens by cell type. Input data streams are integrated via the Multi-Scale Integrated Cell (MuSIC) software pipeline employing deep learning models and community detection algorithms, and output cell maps are packaged with provenance graphs and rich metadata as AI-Ready datasets in RO-Crate format using the FAIRSCAPE framework.\n", + "The AI-READI is a flagship dataset consisting of multimodal data collected from 4,000 individuals with and without Type 2 Diabetes Mellitus (T2DM), harmonized across 3 data collection sites (Birmingham, Alabama; San Diego, California; Seattle, Washington). The dataset was designed with future AI/Machine Learning studies in mind, including recruitment sampling procedures aimed at achieving approximately equal distribution of participants across diabetes severity (triple-balanced by race/ethnicity, biological sex, and T2DM severity), as well as a multi-domain data acquisition protocol (survey data, physical measurements, clinical data, imaging data, wearable device data, environmental sensors, biospecimens) to enable downstream AI/ML analyses that may not be feasible with existing data sources such as claims or electronic health records data. The goal is to better understand salutogenesis (the pathway from disease to health) in T2DM. The study follows FAIR principles and incorporates ethical and equitable data collection and management practices.\n", + "The Bridge2AI-Voice project seeks to create an ethically sourced flagship dataset to enable future research in artificial intelligence and support critical insights into the use of voice as a biomarker of health. 
The human voice contains complex acoustic markers which have been linked to important health conditions including dementia, mood disorders, and cancer. When viewed as a biomarker, voice is a promising characteristic to measure as it is simple to collect, cost-effective, and has broad clinical utility. This comprehensive collection provides voice recordings with corresponding clinical information from participants selected based on known conditions which manifest within the voice waveform including voice disorders, neurological disorders, mood disorders, and respiratory disorders. The dataset is designed to fuel voice AI research, establish data standards, and promote ethical and trustworthy AI/ML development for voice biomarkers of health. Data collection occurs through a multi-institutional collaborative effort using standardized protocols, custom smartphone applications, and rigorous ethical oversight. The initial release (v1.0) provides 12,523 recordings for 306 participants collected across five sites in North America, with derived features such as spectrograms, MFCCs, acoustic features, and clinical phenotype data. 
Raw audio data is available through controlled access to protect participant privacy.\n" + ] + }, + { + "field": "page", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "https://chorus4ai.org/", + "https://docs.b2ai-voice.org", + "https://fairhub.io/datasets/2", + "https://www.cm4ai.org" + ] + }, + { + "field": "keywords", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "purposes", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "tasks", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "addressing_gaps", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "creators", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "funders", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "instances", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "subsets", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: 
true", + "sample_values": [] + }, + { + "field": "sampling_strategies", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "subpopulations", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "collection_mechanisms", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "acquisition_methods", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "preprocessing_strategies", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "cleaning_strategies", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "intended_uses", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "discouraged_uses", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "license", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Bridge2AI Voice Registered Access License", + "CC BY-NC 4.0", + "CC BY-NC-SA 
4.0", + "Controlled Access with Data Use Agreement" + ] + }, + { + "field": "license_and_use_terms.name", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Bridge2AI Voice Registered Access License", + "CHoRUS Controlled Access License", + "Creative Commons Attribution Non-Commercial", + "Creative Commons Attribution Non-Commercial Share-Alike" + ] + }, + { + "field": "license_and_use_terms.description", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Data licensed for reuse under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (https://creativecommons.org/licenses/by-nc-sa/4.0/). Attribution is required to the copyright holders and the Cell Maps for Artificial Intelligence project. Any publications referencing this data or derived products should cite the bioRxiv article (Clark T, et al. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from Disease-Relevant Cell Lines. BioRXiv, May 2024. doi:10.1101/2024.05.21.589311) and directly cite the data collection. Commercial use requires separate license negotiation with copyright holder (UCSD, Stanford, and/or UCSF depending upon specific data package). A Data Access Committee will supervise ethical matters related to dataset distribution and potential dual licensing for commercial use. Copyright (c) 2025 The Regents of the University of California except where otherwise noted. Spatial proteomics raw image data is copyright (c) 2025 The Board of Trustees of the Leland Stanford Junior University.\n", + "Dataset distributed under controlled access requiring institutional email registration and signed licensing agreement. Access granted after review and approval process. 
Participants must complete registration form with name, email (institutional, not personal), and institution. Once approved, users receive email with access instructions to CHoRUS secure enclave. Contact for access requests: dbold@emory.edu or jared.houghtaling@tuftsmedicine.org.\n", + "Public access data distributed under Creative Commons Attribution Non-Commercial (CC BY-NC 4.0) license. Permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. Controlled access data requires data use agreement. See http://creativecommons.org/licenses/by-nc/4.0/ for full license terms.\n", + "Public access dataset distributed through PhysioNet under Bridge2AI Voice Registered Access License. Only registered users who sign the specified Data Use Agreement (Bridge2AI Voice Registered Access Agreement) can access files. Data covered under Certificate of Confidentiality which must be asserted against compulsory legal demands. Raw audio data available through controlled access only via Data Access Compliance Office (DACO) requiring distinct application. 
Recipient must adhere to PhysioNet requirements managed by MIT Laboratory for Computational Physiology, supported by NIBIB under grant R01EB030362.\n" + ] + }, + { + "field": "license_and_use_terms.license_terms", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "distribution_formats", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "maintainers", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "updates.name", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Ongoing data collection and expansion", + "Periodic data releases and maintenance plan", + "Quarterly Data Releases and Maintenance Plan", + "Versioned releases with ongoing data collection" + ] + }, + { + "field": "updates.description", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Dataset regularly updated and augmented through end of project in November 2026. Beta releases on quarterly basis with periodic data augmentation. Initial alpha release (v0.5) provided as supplemental data. March 2025 Beta (V1.4) includes perturb-seq in KOLF2.1J iPSCs, SEC-MS in iPSCs and derivatives, and IF images in MDA-MB-468 under three conditions. June 2025 Beta (V2.1) revision adds RGB IF images, ro-crate metadata corrections, and naming convention changes. Future releases will include computed cell maps and complete integration of all data streams. 
Long-term preservation in University of Virginia Dataverse with committed institutional support.\n", + "Dataset updated continuously as data collection progresses at 14 acquisition centers. As of November 2024, covers 14 hospitals with 23,400 unique admissions. Target exceeds 100,000 critically ill patients. Project timeline extends through November 30, 2026 (with approved no-cost extension). Regular status updates tracked through GitHub project management system. Sites provide updates via GitHub interface or Google Form submissions.\n", + "Dataset updated periodically as enrollment progresses toward target of 4,000 participants by November 2026. Version-specific documentation maintained for each release. Biorepository maintained at UAB CCTS with long-term storage protocols. Data sharing policies under ongoing development by Data Access Committee. Pilot data released May 2024; all data through July 31, 2024 released November 2024.\n", + "Dataset updated with versioned releases as data collection progresses. Initial release v1.0 published January 17, 2025 with 12,523 recordings from 306 participants. v1.1 released January 17, 2025 adding MFCC features. v2.0.0 released April 16, 2025. v2.0.1 released August 18, 2025. Latest version available at https://doi.org/10.13026/37yb-1t42. Data collection ongoing through November 30, 2026. Version-specific documentation maintained. 
As of v1.1, only adult cohort data available; pediatric cohort data planned for future releases with additional privacy precautions.\n" + ] + }, + { + "field": "updates.frequency", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Continuous updates through November 2026", + "Periodic releases with ongoing enrollment; final release planned for late 2026", + "Periodic versioned releases during data collection period (2022-2026)", + "Quarterly updates through November 2026; long-term preservation thereafter" + ] + }, + { + "field": "updates.update_details", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "retention_limit.name", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Data and biospecimen retention", + "Data retention and disposition", + "Long-Term Preservation Plan", + "Long-term dataset retention" + ] + }, + { + "field": "retention_limit.description", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "Data Transfer and Use Agreement specifies retention requirements. Upon termination or expiration of agreement (two years after start date, project completion, or ethics approval expiration), data shall be destroyed per provider instructions with written certification required within 30 days. Recipient may retain one copy to extent necessary to comply with records retention requirements under law, regulation, institutional policy, and for research integrity and verification purposes. 
Restrictions apply to archival copies as long as recipient holds data.\n", + "Digital data maintained according to NIH data sharing policies and institutional requirements. Controlled access model ensures long-term availability for research while protecting patient privacy.\n", + "Digital data maintained according to NIH data sharing policies with long-term preservation in University of Virginia's LibraData repository supported by committed institutional funds. No planned sunset for data availability. Archived RO-Crates with persistent identifiers (ARK, future DOIs) ensure long-term accessibility and citability.\n", + "Digital data maintained according to NIH data sharing policies. Biospecimen retention subject to institutional policies and consent agreements. Finite number of biospecimen samples available for distribution.\n" + ] + }, + { + "field": "retention_limit.retention_details", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "human_subject_research.name", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "AI-READI Human Subjects Research", + "Bridge2AI-Voice Human Subjects Research", + "CHoRUS Human Subjects Research", + "CM4AI Non-Human Subjects Research" + ] + }, + { + "field": "human_subject_research.description", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 4 distinct values (enum candidate)", + "sample_values": [ + "CM4AI data are distinctive within Bridge2AI in that they are non-clinical data from tissue cultures and are considered to be de-identified as they cannot be matched, with current knowledge, to a human subject. Both cell lines (MDA-MB-468 and KOLF2.1J) are commercially available, ethically sourced, de-identified cell lines. 
MDA-MB-468 available from ATCC. KOLF2.1J available from HipSci resource for non-profit organizations via simple MTA. Ethics team developed comprehensive plan for ethical preparation, licensing, dissemination, and data access supervision balancing openness with IP protection and commercialization monitoring.\n", + "Data collection and sharing approved by University of South Florida Institutional Review Board. Participants provided written informed consent for data collection initiative and data sharing. Consent process includes authorization for voice data collection, access to medical information through EHR platforms for gold standard validation, and permission to share research data. Bioethics guidance integrated throughout study design and conduct. Ethics module develops new guidelines for consenting to voice data collection, voice data sharing, and utilization in context of voice AI technology. Project addresses ethical and trustworthy issues from voice data generation and AI/ML research through clinical adoption and downstream health decisions.\n", + "Retrospective data collection from critically ill patients approved through institutional review processes. Community-facing ethics focus groups conducted to determine what data is appropriate for public sharing. Legal framework established for collecting data at scale. Patient-focused efforts determine ethical and legal approaches to manage privacy and bias while accounting for Social Determinants of Health. Project draws expertise from law, ethics, health services, biomedical science, engineering, and scientific journal publications disciplines.\n", + "Study approved by Institutional Review Board (IRB) of University of Washington (approval number STUDY00016228), with reliance agreements from IRBs of University of Alabama at Birmingham and University of California, San Diego. Written informed consent provided by all participants. Bioethics guidance integrated throughout study design. 
Community Advisory Board of 11 persons with diversity in race and ethnicity contributes to protocol development. Ethical and equitable data collection and management practices implemented.\n" + ] + }, + { + "field": "human_subject_research.involves_human_subjects", + "issue_type": "string_could_be_enum", + "severity": "MEDIUM", + "description": "String field with only 2 distinct values (enum candidate)", + "sample_values": [ + false, + true + ] + }, + { + "field": "human_subject_research.irb_approval", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "human_subject_research.ethics_review_board", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "human_subject_research.special_populations", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "sensitive_elements", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "external_resources", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + }, + { + "field": "labeling_strategies", + "issue_type": "multivalued_in_data", + "severity": "INFO", + "description": "Field contains lists in data - verify schema has multivalued: true", + "sample_values": [] + } + ], + "field_analyses": { + "acquisition_methods": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + 
"acquisition_methods[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 27, + "sample_values": [ + "12-lead ECG data collected using Philips Pagewriter TC30 Cardiograph during study visit with participant sitting in reclining chair or lying supine at recorded position (0\u00b0, 30\u00b0, 60\u00b0, or 90\u00b0 relative to supine).\n", + "Bedside monitor waveform data acquired through gateway/middleware systems. Stored in WFDB format with controlled access and published PhysioNet schema (extended) metadata.\n", + "Clinical notes extracted and tokenized using OHNLP (Open Health Natural Language Processing) toolkit. Stored locally except tokens. Controlled access planned with OHNLP open source schema metadata.\n", + "Complete blood count (CBC) from fresh whole blood at local CLIA-certified labs. Central lab testing at UW Nutrition and Obesity Research Center (NORC) for EDTA plasma tests (NT-proBNP, Troponin-T, C-peptide, insulin), serum tests (CRP-HS, lipid panel, glucose, kidney and liver function markers), whole blood tests (HbA1c), and urine tests (creatinine, albumin, other markers).\n", + "Custom-designed environmental sensor (Karalis Johnson Retina Center, UW) capturing ambient temperature, relative humidity, nitrogen oxides (NO and NO2), volatile organic compounds, particulate matter (PM1.0, PM2.5, PM4, PM10), and multi-spectral light intensity (11 measurements) for 10 days. Data in CSV format.\n", + "Demographics, medication administration, procedures, nursing flowsheets, and diagnoses acquired from electronic health records and standardized to OMOP Common Data Model. Controlled access with published OMOP schema metadata.\n", + "Dexcom G6 Continuous Glucose Monitor capturing blood glucose measurements (mg/dL) every 5 minutes for 10 days. Data exported in CSV format.\n", + "Dosing information time-stamped upon each infusion change or dose administration. 
Stored in OMOP format with controlled access and OMOP schema metadata.\n", + "EEG recordings from hospital databases in EDF+ and Persyst formats. Extraction in process. Planned controlled access with open source EDF+ and Persyst schema metadata.\n", + "Electronic health record (EHR) access for participants who consent, permitting investigators to access medical information through EHR platforms to perform gold standard validation of diagnoses and symptoms. Linkage to multimodal health biomarkers including radiomics and genomics.\n" + ] + }, + "acquisition_methods[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 12, + "sample_values": [ + "acquisition-001", + "acquisition-002", + "acquisition-003", + "acquisition-004", + "acquisition-005", + "acquisition-006", + "acquisition-007", + "acquisition-008", + "acquisition-009", + "acquisition-010" + ] + }, + "acquisition_methods[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 24, + "sample_values": [ + "Biospecimen collection", + "Clinical laboratory testing", + "Clinical notes (tokenized)", + "Cognitive function testing", + "Confocal Microscopy for Subcellular Imaging", + "Continuous glucose monitoring", + "EEG waveforms", + "Electrocardiogram (ECG)", + "Environmental monitoring", + "High-frequency nursing flowsheets" + ] + }, + "addressing_gaps": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "addressing_gaps[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 10, + "sample_values": [ + "Address demographic inequities in T2DM research by recruiting equal proportions across four race/ethnic groups (Asian, Black, Hispanic, White) and both biological sexes, 
improving upon many previous epidemiological studies and clinical trials that lacked diversity.\n", + "Address ethical and legal challenges in AI through patient-focused efforts that determine approaches to manage privacy and bias while accounting for Social Determinants of Health and performing community-facing ethics focus groups.\n", + "Address the absence of large-scale, diverse, high-resolution multi-center datasets for critical care AI/ML by creating a dataset spanning 20 academic centers with 100,000+ critically ill patients and ensuring balanced, diverse cohorts through federated access and sampling methods.\n", + "Address the limitation that machine learning models in genomics and precision medicine are typically difficult-to-interpret \"black boxes\" by providing hierarchical cell maps that enable visible machine learning systems built directly on knowledge maps of cell and tissue architecture.\n", + "Bridge the gap in diverse AI/ML workforce through comprehensive educational approaches, training programs (including AIM-AHEAD partnership), and cultivation of expertise in lay and scientific communities to improve AI literacy and utilization.\n", + "Create a model for future AI-ready medical datasets through comprehensive metadata, standardized data formats, FAIR compliance, and ethical data governance practices that can be replicated for other health conditions.\n", + "Create integrated datasets combining protein localization (spatial proteomics), protein-protein interactions (AP-MS and SEC-MS), and transcriptional states (CRISPR perturbation screens) at multiple scales, enabling complex multi-modal AI analyses not feasible with single data types.\n", + "Overcome the lack of unified standards in critical care data by harmonizing multi-modal EHR, waveform, imaging, and text data to OMOP Common Data Model and other international standards (DICOM, WFDB, OHNLP).\n", + "Provide a large-scale, harmonized, multi-site, multi-domain dataset enabling AI/ML analyses 
not feasible with existing sources (e.g., claims or EHR alone). With 4,000 participants and over 10 variable domains, this is the largest publicly accessible dataset of its kind for T2DM research.\n", + "Provide fully provenanced, ethically validated, and FAIR-compliant AI-ready datasets with machine-readable provenance graphs, complete schemas, validation procedures, and data sheets that can be reliably processed by AI applications with full explainability.\n" + ] + }, + "addressing_gaps[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "gap-001", + "gap-002", + "gap-003", + "gap-004" + ] + }, + "addressing_gaps[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 10, + "sample_values": [ + "AI-readiness of medical datasets", + "Black Box AI Models in Genomic Medicine", + "Demographic underrepresentation", + "Insufficient data standardization", + "Integration of Multimodal Cellular Data", + "Lack of AI-Ready Biomedical Datasets with Provenance", + "Lack of diverse multi-center critical care datasets", + "Lack of multimodal T2DM datasets", + "Limited AI workforce diversity", + "Privacy and bias concerns in AI" + ] + }, + "addressing_gaps[item].response": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Address the lack of large, high quality, multi-institutional and diverse voice databases linked to multimodal health biomarkers (demographics, imaging, genomics, risk factors) necessary to fuel voice AI research and answer tangible clinical questions.\n", + "Establish missing standards for voice data collection, acoustic analysis, and ethical frameworks for consenting to voice data collection, sharing, and utilization in the context of voice AI technology development and clinical 
adoption.\n", + "Fill the gap in pediatric voice and speech analysis research, which is sparser partly due to ethical concerns and challenges in data acquisition for this cohort, particularly for autism and speech delay detection.\n", + "Overcome limitations in existing voice and psychiatric disorder research that has relied on small datasets with limited demographic diversity reporting, lack of standardized data collection protocols precluding meta-analysis, and possible confounders limiting external validity and clinical usability.\n" + ] + }, + "cleaning_strategies": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "cleaning_strategies[item].cleaning_details": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "cleaning_strategies[item].cleaning_details[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 43, + "sample_values": [ + "18 identifier categories removed", + "ARK persistent identifier assignment", + "AlphaFoldDB structure integration", + "Audio waveforms excluded from public release", + "Biometric identifiers removed", + "Clinical validation SOP for mappings", + "Common smartphone application", + "Controlled vocabulary mapping for all keywords", + "Cross-site data quality checks", + "Cross-site harmonization procedures" + ] + }, + "cleaning_strategies[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 9, + "sample_values": [ + "All data modalities validated against published metadata schemas (OMOP, DICOM, WFDB, OHNLP, EDF+, Persyst) to ensure compliance and data quality.\n", + "All datasets packaged using FAIRSCAPE framework which creates RO-Crate packages with datasets, metadata, 
provenance graphs, and software. FAIRSCAPE-CLI validates inputs and creates output RO-Crate packages. FAIRSCAPE server assigns persistent resolvable globally unique identifiers (ARK scheme), decomposes RO-Crates into components, and computes end-to-end provenance entailments using EVI Evidence Graph Ontology.\n", + "Bioinformatics pipeline developed for annotating MuSIC communities with available structural information from PDB, AlphaFoldDB, crosslinking mass spectrometry, and prediction of disordered sequence segments. Communities ranked by structural information amount as proxy for integrative modeling feasibility.\n", + "Data from 14 acquisition centers harmonized through validated semantic mappings and standard operating protocols (SOPs). Ensures consistency and interoperability across diverse institutional EHR systems and clinical practices.\n", + "Data mapped to applicable standards and ontologies including Gene Ontology (GO), Reactome, Protein Data Bank (PDB), AlphaFold Protein Structure Database, Schema.org, and EVI Evidence Graph Ontology. Keywords mapped to controlled vocabularies from NCI Thesaurus, BioAssay Ontology, Cell Ontology, CHEBI, and other ontologies.\n", + "Data standardization across multi-institutional sites through use of standardized protocols, common data collection application, and REDCap data management system. Ensures consistency and quality across five collection sites.\n", + "HIPAA Safe Harbor de-identification applied. 
Identifiers removed include: names, geographic locators (state/province removed, country retained), dates at resolution finer than years, phone/fax numbers, email addresses, IP addresses, Social Security Numbers, medical record numbers, health plan beneficiary numbers, device identifiers, license numbers, account numbers, vehicle identifiers, website URLs, full face photos, biometric identifiers, and any unique identifiers.\n", + "Privacy protection measures for public release - Audio waveforms omitted from public dataset, only derived features (spectrograms, MFCCs, acoustic features) made available. Free speech transcripts removed. Raw audio available only through controlled access with DACO approval.\n", + "Standardized protocols and procedures across all three data collection sites ensure data consistency and quality. Common equipment, training, and REDCap data management system used to maintain FAIR principles compliance.\n" + ] + }, + "cleaning_strategies[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 3, + "sample_values": [ + "cleaning-001", + "cleaning-002", + "cleaning-003" + ] + }, + "cleaning_strategies[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 6, + "sample_values": [ + "Data Standard Mapping", + "FAIRSCAPE AI-Readiness Packaging", + "Integrative Structure Modeling Annotation", + "Metadata schema validation", + "Multi-center data harmonization", + "Multi-site harmonization" + ] + }, + "collection_mechanisms": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "collection_mechanisms[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 17, + "sample_values": [ + "Automated fixation and 
permeabilization protocols using pipetting robot for MDA-MB-468 and KOLF2.1J cell lines. Immunofluorescence-based staining (ICC-IF) with confocal microscopy to capture spatial subcellular organization. Completed spatial proteomics mapping of 100 chromatin regulators in MDA-MB-468 cells under three conditions (untreated, paclitaxel, vorinostat), with 500 additional proteins pending from genetic perturbations and PPI results. Antibodies from Human Protein Atlas resource. Generated by Lundberg Lab at Stanford University.\n", + "Blood (53 mL) and urine collected during study visit. Local processing for plasma, serum, buffy coats at all sites. Centralized biobanking at UAB CCTS. Standardized operating procedures ensure consistent handling.\n", + "Continuous glucose monitoring (Dexcom G6, 5-minute intervals), physical activity monitoring (Garmin VivoSmart 5), and environmental sensor monitoring (temperature, humidity, air quality) conducted at participants' homes over 10-day monitoring periods.\n", + "Continuous waveform telemetry data captured from bedside monitors through gateway/middleware systems. Stored in WFDB (WaveForm DataBase) format following PhysioNet schema (extended).\n", + "Data collection conducted using a custom smartphone application on tablet with headset used when possible. Standardized protocol for data collection adopted across all sites. Single session sufficient for most participants, though subset required multiple sessions resulting in more than one session per participant in dataset.\n", + "Data collection protocol involved: (1) demographic information collection, (2) health questionnaires, (3) targeted questionnaires about known voice confounders, (4) disease- specific information, (5) voice recording tasks such as sustained phonation of vowel sounds, (6) conventional acoustic tasks including respiratory sounds, cough sounds, and free speech prompts. 
Data exported and converted from REDCap using open source b2aiprep library.\n", + "Electroencephalography recordings extracted from hospital EEG databases in EDF+ and Persyst formats with metadata following open source schemas.\n", + "Endogenous tagging of genes in cell lines followed by affinity purification mass spectrometry (AP-MS) to map protein-protein interactions. 17 genes endogenously tagged in MDA-MB-468 with AP-MS data acquired under three conditions (untreated, paclitaxel, vorinostat). 34 additional genes currently in tagging process. Orthogonal approach to SEC-MS for comprehensive PPI mapping.\n", + "Medical imaging data acquired from hospital PACS (Picture Archiving and Communication System) and stored in DICOM format with comprehensive metadata following DICOM schema.\n", + "Multi-center network capabilities to acquire, standardize, tokenize, store, visualize, and label data. All structured clinical data transformed to OMOP Common Data Model. Clinical notes tokenized using OHNLP toolkit. Imaging converted to DICOM. 
Waveforms standardized to WFDB format.\n" + ] + }, + "collection_mechanisms[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 5, + "sample_values": [ + "collection-001", + "collection-002", + "collection-003", + "collection-004", + "collection-005" + ] + }, + "collection_mechanisms[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 14, + "sample_values": [ + "Affinity Purification Mass Spectrometry", + "Biospecimen collection and biobanking", + "CRISPR Perturbation Screens", + "EEG recording extraction", + "Electronic health record screening", + "Home-based wearable monitoring", + "Immunofluorescence Spatial Proteomics Imaging", + "In-person data collection visits", + "Medical imaging acquisition", + "Retrospective EHR extraction" + ] + }, + "creators": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "creators[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 38, + "sample_values": [ + "Co-Investigator", + "Co-Investigator, AI-READI Consortium", + "Co-Investigator, AI-READI Consortium (Bioethics)", + "Co-Investigator, Simon Fraser University, Ethics Module Leader", + "Co-Investigator, Stanford University, Data Acquisition Module (Spatial Proteomics)", + "Co-Investigator, University of Alabama", + "Co-Investigator, University of Alabama at Birmingham, Department of Medicine", + "Co-Investigator, University of Alabama at Birmingham, Departments of Ophthalmology and Epidemiology", + "Co-Investigator, University of Alabama at Birmingham, Teaming Module", + "Co-Investigator, University of California San Diego, Data Acquisition Module (Genetic Perturbations)" + ] + }, + "creators[item].id": { + "inferred_type": 
"str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 20, + "sample_values": [ + "creator-001", + "creator-002", + "creator-003", + "creator-004", + "creator-005", + "creator-006", + "creator-007", + "creator-008", + "creator-009", + "creator-010" + ] + }, + "creators[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 65, + "sample_values": [ + "Aaron Lee", + "Alexandros Sigaras", + "Alistair Johnson", + "Aliyah Geer", + "Alvin Y. Liu", + "Anais Rameau", + "Andrej Sali", + "Andrew Ewing Williams", + "Andrew Williams", + "Ashley Cordes" + ] + }, + "description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "CHoRUS for Equitable AI is a Bridge2AI data generation project developing the most diverse, high-resolution, ethically sourced, AI-ready critical care dataset to answer the grand challenge of improving recovery from acute illness. The project spans 20 academic centers (14 data acquisition centers) and creates a publicly available dataset of over 100,000 critically ill patients with multi-modal data including structured EHR, waveform telemetry, medical imaging, EEG, and clinical notes. All data is standardized to the OMOP Common Data Model with additional formats (DICOM, WFDB, OHNLP tokenization) and includes comprehensive metadata schemas. Patient-focused efforts determine ethical and legal approaches to manage privacy and bias while accounting for Social Determinants of Health. A visualization and annotation environment labels data with targets important for prediction. The project emphasizes skills and workforce development for a next generation of diverse academic and community AI scientists through training programs and partnerships with AIM-AHEAD. 
As of November 2024, the dataset covers 14 different hospitals with 23,400 unique admissions.\n", + "CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge to Artificial Intelligence (Bridge2AI) program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research. The project delivers machine-readable hierarchical maps of cell architecture as AI-Ready data produced from multimodal interrogation of 100 chromatin modifiers and 100 metabolic enzymes involved in cancer, neuropsychiatric, and cardiac disorders in disease-relevant cell lines under perturbed and unperturbed conditions. Data streams include immunofluorescence (IF) subcellular microscopy for spatial proteomics, affinity purification mass spectroscopy (AP-MS) and size exclusion mass spectroscopy (SEC-MS) for protein-protein interaction (PPI) data, and single-cell CRISPR-Cas perturbation screens by cell type. Input data streams are integrated via the Multi-Scale Integrated Cell (MuSIC) software pipeline employing deep learning models and community detection algorithms, and output cell maps are packaged with provenance graphs and rich metadata as AI-Ready datasets in RO-Crate format using the FAIRSCAPE framework.\n", + "The AI-READI is a flagship dataset consisting of multimodal data collected from 4,000 individuals with and without Type 2 Diabetes Mellitus (T2DM), harmonized across 3 data collection sites (Birmingham, Alabama; San Diego, California; Seattle, Washington). 
The dataset was designed with future AI/Machine Learning studies in mind, including recruitment sampling procedures aimed at achieving approximately equal distribution of participants across diabetes severity (triple-balanced by race/ethnicity, biological sex, and T2DM severity), as well as a multi-domain data acquisition protocol (survey data, physical measurements, clinical data, imaging data, wearable device data, environmental sensors, biospecimens) to enable downstream AI/ML analyses that may not be feasible with existing data sources such as claims or electronic health records data. The goal is to better understand salutogenesis (the pathway from disease to health) in T2DM. The study follows FAIR principles and incorporates ethical and equitable data collection and management practices.\n", + "The Bridge2AI-Voice project seeks to create an ethically sourced flagship dataset to enable future research in artificial intelligence and support critical insights into the use of voice as a biomarker of health. The human voice contains complex acoustic markers which have been linked to important health conditions including dementia, mood disorders, and cancer. When viewed as a biomarker, voice is a promising characteristic to measure as it is simple to collect, cost-effective, and has broad clinical utility. This comprehensive collection provides voice recordings with corresponding clinical information from participants selected based on known conditions which manifest within the voice waveform including voice disorders, neurological disorders, mood disorders, and respiratory disorders. The dataset is designed to fuel voice AI research, establish data standards, and promote ethical and trustworthy AI/ML development for voice biomarkers of health. Data collection occurs through a multi-institutional collaborative effort using standardized protocols, custom smartphone applications, and rigorous ethical oversight. 
The initial release (v1.0) provides 12,523 recordings for 306 participants collected across five sites in North America, with derived features such as spectrograms, MFCCs, acoustic features, and clinical phenotype data. Raw audio data is available through controlled access to protect participant privacy.\n" + ] + }, + "discouraged_uses": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "discouraged_uses[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 13, + "sample_values": [ + "As data collection continues through November 2026 and quality assurance processes are ongoing, early dataset versions should be used with awareness of completeness limitations and ongoing expansion.\n", + "As enrollment is ongoing until November 2026, pilot data releases and periodic updates may not have achieved balanced distribution across all groups. Early versions should be used with awareness of this limitation.\n", + "Attempts to re-identify participants from de-identified data violate ethical principles and data use agreements.\n", + "Attempts to re-identify patients from de-identified data violate ethical principles, data use agreements, and legal frameworks established for privacy protection.\n", + "Dataset is for research purposes. Any AI/ML models developed should undergo appropriate clinical validation before use in patient care or clinical decision-making.\n", + "Dataset is for research purposes. Any AI/ML models developed should undergo appropriate clinical validation, regulatory approval, and institutional review before use in patient care or clinical decision-making.\n", + "Datasets require domain expertise for meaningful analysis and interpretation. Not suitable for use without understanding of functional genomics, proteomics, cell biology, and AI/ML methodologies. 
Training resources available through CM4AI Skills and Workforce Development module.\n", + "Development of surveillance technologies or applications that could be used for discrimination, bias amplification, or harm to vulnerable populations. Voice data contains sensitive health information and could encode biases that require careful ethical consideration.\n", + "Direct clinical decision-making without appropriate validation. Dataset is for research purposes. Any AI/ML models developed should undergo appropriate clinical validation, regulatory approval, and testing before use in patient care or clinical decision support.\n", + "Laboratory data from cell lines are not to be used in clinical decision-making or any context involving patient care without appropriate regulatory oversight and approval. Requires domain expertise and clinical validation before any clinical applications.\n" + ] + }, + "discouraged_uses[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "discouraged-001", + "discouraged-002", + "discouraged-003", + "discouraged-004" + ] + }, + "discouraged_uses[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 7, + "sample_values": [ + "Analysis Without Domain Expertise", + "Clinical Decision-Making Without Validation", + "Clinical decision-making without validation", + "Re-identification attempts", + "Use During Incomplete Data Release", + "Uses during ongoing data collection", + "Uses during ongoing enrollment" + ] + }, + "distribution_formats": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "distribution_formats[item].access_urls": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + 
"sample_values": [] + }, + "distribution_formats[item].access_urls[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 15, + "sample_values": [ + "Contact DACO@b2ai-voice.org", + "LibraData University of Virginia", + "MassIVE Repository (human cancer cells)", + "MassIVE Repository (human iPSCs)", + "NCBI BioProject", + "Sequence Read Archive (SRA)", + "https://chorus4ai.org/", + "https://docs.aireadi.org/", + "https://doi.org/10.18130/V3/B35XWX", + "https://doi.org/10.18130/V3/DXWOS5" + ] + }, + "distribution_formats[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 19, + "sample_values": [ + "All CM4AI output data packaged as Research Object Crate (RO-Crate) packages containing datasets, metadata, provenance graphs, and software (or resolvable references). RO-Crates assigned persistent globally unique identifiers (ARK scheme, DOIs planned for publishable work) that resolve to machine- and human-readable landing pages with metadata in JSON-LD using Schema.org and EVI vocabularies.\n", + "Archived RO-Crates available in University of Virginia's LibraData data archive (instance of Harvard's Dataverse, an NIH-approved generalist repository). Long-term preservation supported by committed institutional funds. Quarterly updates through November 2026.\n", + "Cell maps shared via Network Data Exchange (NDEx) for visualization and access. Maps can be visualized in web browser or accessed via tools such as Cytoscape, HiView, and Python ndex2 library.\n", + "Clinical notes distributed as OHNLP-tokenized text following open source schema. Stored locally at sites except tokens, planned for controlled access.\n", + "EEG waveform data distributed in EDF+ (European Data Format) and Persyst formats following open source schemas. 
Extraction in process, planned for controlled access.\n", + "Mass spectrometry data deposited to MassIVE Repository (Proteomics community-supported repository). Separate depositions for human iPSC data and human cancer cell data. Data will be uploaded to Pride when available.\n", + "Medical imaging data distributed in DICOM format with comprehensive metadata. De-identification in process, planned for controlled access in secure enclave.\n", + "Mel-frequency cepstral coefficients stored in Parquet format (mfcc.parquet). Contains 60xN dimension MFCC arrays derived from spectrograms. Compatible with Python datasets library and common data science tools.\n", + "Metadata and data dictionaries provided through REDCap system documentation for survey and study coordination data.\n", + "Original raw audio waveforms available through controlled access only. Interested users contact DACO@b2ai-voice.org for application process. Disseminated through Data Access Compliance Office with formal vetting and approval process.\n" + ] + }, + "distribution_formats[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 5, + "sample_values": [ + "format-001", + "format-002", + "format-003", + "format-004", + "format-005" + ] + }, + "distribution_formats[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 19, + "sample_values": [ + "CSV for tabular and time-series data", + "DICOM for imaging", + "DICOM imaging format", + "EDF+ and Persyst EEG formats", + "Hierarchical Cell Maps in NDEx", + "Mass Spectrometry Data in MassIVE", + "OHNLP tokenized text", + "OMOP Common Data Model", + "Parquet for MFCCs", + "Parquet for spectrograms" + ] + }, + "external_resources": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + 
"external_resources[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 38, + "sample_values": [ + "AI-readiness framework documentation, tutorial, and installation instructions", + "Additional dataset documentation and resources", + "Additional dataset documentation and software releases", + "Alternative data repository platform", + "BMJ Open publication describing study design and protocol", + "Centralized standard operating protocol documentation site with interactive workflow diagrams", + "Clark T, et al. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from Disease-Relevant Cell Lines. BioRXiv, May 2024.\n", + "Collaboration partner for data curation and integration", + "Comprehensive GitHub organization with 28 repositories including software, documentation, and SOPs", + "Comprehensive dataset documentation with version-specific guides" + ] + }, + "external_resources[item].external_resources": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "external_resources[item].external_resources[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 50, + "sample_values": [ + "AIM-AHEAD Connect platform (mentorship)", + "AIM-AHEAD Consortium website", + "CHoRUS developer repository (chorus-developer)", + "MassIVE Repository", + "NCBI BioProject", + "OHDSI website and tool stack", + "OMOP Common Data Model documentation", + "Package status page (maintained by CHoRUS)", + "Sequence Read Archive (SRA)", + "Training program informational materials" + ] + }, + "external_resources[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 12, + "sample_values": [ + "resource-001", + "resource-002", + 
"resource-003", + "resource-004", + "resource-005", + "resource-006", + "resource-007", + "resource-008", + "resource-009", + "resource-010" + ] + }, + "external_resources[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 35, + "sample_values": [ + "AI-READI Dataset Documentation", + "AI-READI Project Website", + "AIM-AHEAD Training Partnership", + "Bridge2AI Program", + "Bridge2AI-Voice GitHub Repository", + "Bridge2AI-Voice Project Documentation", + "CHoRUS Developer Documentation", + "CHoRUS GitHub Organization", + "CHoRUS Project Website", + "CM4AI Project Website" + ] + }, + "funders": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "funders[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 5, + "sample_values": [ + "Funded through National Institutes of Health grant 1OT2OD032701-01, administered by NIH Office of the Director. Opportunity Number: OTA-21-008. Study Section: Data Coordination, Mapping, and Modeling [DCMM]. Fiscal Year 2022. Total funding in 2022: $5,880,300 (all direct costs). Project dates: September 1, 2022 to November 30, 2026 (with no-cost extension approved).\n", + "Funded through National Institutes of Health grant 1OT2OD032742-01 (Bridge2AI Functional Genomics) and 5U54HG012513-02 (Bridge2AI Bridge Center), administered by NIH Office of the Director. Opportunity Number: OTA-21-008. Project dates: September 1, 2022 to August 31, 2026. FY 2025 funding: $5,289,382 (Direct: $4,632,095, Indirect: $657,287). 
Additional funding from the Frederick Thomas Fund of the University of Virginia.\n", + "Funded through National Institutes of Health grant 3OT2OD032720-01S3 (Bridge2AI: Voice as a Biomarker of Health - Building an ethically sourced, bioaccoustic database to understand disease like never before). Opportunity Number: OTA-21-008. Project dates: September 1, 2022 to November 30, 2026. Total funding in 2025: $4,660,942 (Direct Costs: $4,072,321, Indirect Costs: $588,621). Administered by NIH Office of the Director through the Bridge2AI Program. Study Section: Data Coordination, Mapping, and Modeling [DCMM].\n", + "Funded through National Institutes of Health grant OT2OD032644, administered by NIH Office of the Director. Additional support from grants P30DK035816 (Nutrition and Obesity Research Center), UL1TR003096, and Research to Prevent Blindness. Total funding in 2022: $5,026,499. Opportunity Number: OTA-21-008. Project dates: September 1, 2022 to August 31, 2025.\n", + "NIBIB supports PhysioNet managed by MIT Laboratory for Computational Physiology under NIH grant number R01EB030362, which serves as a distribution platform for the Bridge2AI-Voice dataset.\n" + ] + }, + "funders[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 2, + "sample_values": [ + "funder-001", + "funder-002" + ] + }, + "funders[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 3, + "sample_values": [ + "NIH Common Fund Bridge2AI Program", + "NIH Office of the Director", + "National Institute of Biomedical Imaging and Bioengineering" + ] + }, + "human_subject_research.description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "CM4AI data are distinctive within Bridge2AI in that they are non-clinical data from tissue cultures and are 
considered to be de-identified as they cannot be matched, with current knowledge, to a human subject. Both cell lines (MDA-MB-468 and KOLF2.1J) are commercially available, ethically sourced, de-identified cell lines. MDA-MB-468 available from ATCC. KOLF2.1J available from HipSci resource for non-profit organizations via simple MTA. Ethics team developed comprehensive plan for ethical preparation, licensing, dissemination, and data access supervision balancing openness with IP protection and commercialization monitoring.\n", + "Data collection and sharing approved by University of South Florida Institutional Review Board. Participants provided written informed consent for data collection initiative and data sharing. Consent process includes authorization for voice data collection, access to medical information through EHR platforms for gold standard validation, and permission to share research data. Bioethics guidance integrated throughout study design and conduct. Ethics module develops new guidelines for consenting to voice data collection, voice data sharing, and utilization in context of voice AI technology. Project addresses ethical and trustworthy issues from voice data generation and AI/ML research through clinical adoption and downstream health decisions.\n", + "Retrospective data collection from critically ill patients approved through institutional review processes. Community-facing ethics focus groups conducted to determine what data is appropriate for public sharing. Legal framework established for collecting data at scale. Patient-focused efforts determine ethical and legal approaches to manage privacy and bias while accounting for Social Determinants of Health. 
Project draws expertise from law, ethics, health services, biomedical science, engineering, and scientific journal publications disciplines.\n", + "Study approved by Institutional Review Board (IRB) of University of Washington (approval number STUDY00016228), with reliance agreements from IRBs of University of Alabama at Birmingham and University of California, San Diego. Written informed consent provided by all participants. Bioethics guidance integrated throughout study design. Community Advisory Board of 11 persons with diversity in race and ethnicity contributes to protocol development. Ethical and equitable data collection and management practices implemented.\n" + ] + }, + "human_subject_research.ethics_review_board": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "human_subject_research.ethics_review_board[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 12, + "sample_values": [ + "Bridge2AI Ethics Working Group participation", + "CM4AI Ethics Module (Vardit Ravitsky, Jean-Christophe B\u00e9lisle-Pipon)", + "Community Advisory Board with 11 members representing diverse race and ethnicity", + "Community-facing ethics focus groups", + "Data Access Committee (Jillian Parker)", + "Institutional review boards at 14 data acquisition centers", + "Legal and ethical advisory teams", + "Privacy and accountability review processes", + "University of Alabama at Birmingham Institutional Review Board (reliance agreement)", + "University of California San Diego Institutional Review Board (reliance agreement)" + ] + }, + "human_subject_research.id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "hsr-001" + ] + }, + "human_subject_research.involves_human_subjects": { + "inferred_type": 
"bool", + "is_boolean": true, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 2, + "sample_values": [ + false, + true + ] + }, + "human_subject_research.irb_approval": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "human_subject_research.irb_approval[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 5, + "sample_values": [ + "Not applicable - de-identified cell lines from commercial sources", + "University of Alabama at Birmingham IRB reliance agreement", + "University of California San Diego IRB reliance agreement", + "University of South Florida Institutional Review Board approval", + "University of Washington IRB approval number STUDY00016228" + ] + }, + "human_subject_research.name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "AI-READI Human Subjects Research", + "Bridge2AI-Voice Human Subjects Research", + "CHoRUS Human Subjects Research", + "CM4AI Non-Human Subjects Research" + ] + }, + "human_subject_research.special_populations": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "human_subject_research.special_populations[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 7, + "sample_values": [ + "Asian populations", + "Black populations", + "Hispanic populations", + "KOLF2.1J derived from healthy male Northern European donor (de-identified)", + "MDA-MB-468 derived from 51-year-old black female (de-identified)", + "Recruitment targeted to include racial and ethnic minorities disproportionately affected by T2DM", + "Tribal consultation planned for Native American cohort 
participation" + ] + }, + "id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "https://chorus4ai.org/", + "https://doi.org/10.13026/37yb-1t42", + "https://doi.org/10.18130/V3/DXWOS5", + "https://fairhub.io/datasets/2" + ] + }, + "instances": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "instances[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 7, + "sample_values": [ + "Adult participants presenting at specialty clinics and institutions across five sites in North America. Participants were selected based on membership to five predetermined disease cohort groups: Respiratory disorders, Voice disorders, Neurological disorders, Mood disorders, and Pediatric. As of v1.1, only data from the adult cohort is available. The initial release (v1.0) provides 306 participants with 12,523 recordings collected through standardized protocols.\n", + "Human induced pluripotent stem cell (iPSC) line (RRID:CVCL_B5P3) derived from a healthy male Northern European donor, available from the Human Induced Pluripotent Stem Cells Initiative (HipSci) resource. Available for access by non-for-profit organizations via a simple MTA. Analyzed in undifferentiated state and after differentiation into neurons and cardiomyocytes.\n", + "Individual critically ill patients requiring acute or critical care admitted to intensive care units or similar hospital settings across 14 data acquisition centers. As of November 2024, dataset covers 14 different hospitals with 23,400 unique admissions. Target enrollment exceeds 100,000 critically ill patients. 
Retrospective data collection from patients with acute or critical illness.\n", + "Individual participants aged 40 and older with and without Type 2 Diabetes Mellitus (T2DM). Target enrollment is 4,000 people, triple-balanced by self-reported race/ethnicity (Asian, Black, Hispanic, White), T2DM severity (no diabetes, pre-diabetes/lifestyle-controlled diabetes, diabetes treated with oral medications or non-insulin injections, insulin-controlled diabetes), and biological sex (male, female). Participants must speak, read, and understand English. Exclusion criteria include pregnancy and type 1 diabetes.\n", + "Near-comprehensive set of chromatin regulators encoded by the human genome analyzed via AP-MS, SEC-MS, IF imaging, and CRISPR perturbation screens across different cell states and treatment conditions. 17 genes endogenously tagged in MDA-MB-468 with AP-MS data under three conditions, with 34 additional genes in process. SEC-MS identified 72/100 chromatin modifiers, with 52 being integral components of protein complexes.\n", + "Set of metabolic enzymes involved in cancer, neuropsychiatric, and cardiac disorders analyzed via multimodal interrogation including mass spectrometry, imaging, and perturbation screens.\n", + "Triple negative breast cancer cell line (RRID:CVCL_0419) established from a metastatic site pleural effusion of a 51-year-old black female with a metastatic mammary adenocarcinoma, available from ATCC. This cell line has been extensively used to study triple-negative breast cancer and is well characterized with transcriptomic, mutational profile, and whole-genome sequencing data available. 
Cells are analyzed under three conditions: untreated, paclitaxel-treated, and vorinostat-treated.\n" + ] + }, + "instances[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "instance-001", + "instance-002", + "instance-003", + "instance-004" + ] + }, + "instances[item].instance_type": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 6, + "sample_values": [ + "Cultured cell line from de-identified human tissue, ethically sourced from ATCC", + "Cultured cell line from de-identified human tissue, ethically sourced from HipSci", + "Human participants recruited from specialty clinics at multi-institutional sites. Data collection conducted between 2022 and 2026 through IRB-approved protocols with informed consent.\n", + "Human participants recruited from three health system sites (University of Alabama at Birmingham, University of California San Diego, University of Washington) between 2022 and 2026.\n", + "Human subjects - critically ill patients admitted to participating hospitals between data collection timeframe (specific dates vary by site, ongoing collection through November 2026).\n", + "Protein targets for multi-modal analysis" + ] + }, + "instances[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 6, + "sample_values": [ + "100 Chromatin Regulators", + "100 Metabolic Enzymes", + "Critically ill patients", + "Individual participants", + "KOLF2.1J Induced Pluripotent Stem Cells", + "MDA-MB-468 Breast Cancer Cell Line" + ] + }, + "intended_uses": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "intended_uses[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, 
+ "is_multivalued": false, + "value_count": 21, + "sample_values": [ + "Analysis of cellular responses to drug treatments (paclitaxel, vorinostat) to predict drug response and synergy. Cell maps under different treatment conditions enable visible machine learning for drug discovery and personalized medicine applications.\n", + "Development of visible neural networks (VNNs) that use hierarchical cell maps as interpretable model architectures. Unlike black box models, VNNs built on cell maps allow interrogation of how protein assemblies affect cell-level phenotypes, enabling interpretation of genetic variants and mutations in the context of cellular mechanisms.\n", + "Discovery of novel biomarkers for T2DM progression, complications, and salutogenesis using biospecimens from the biorepository.\n", + "Model dataset for ethical AI development in healthcare, demonstrating integration of bioethics guidance, ethical data collection practices, informed consent processes, privacy protection through federated learning, and trustworthy AI/ML development from data generation through clinical adoption.\n", + "Multimodal health research combining voice data with EHR information, radiomics, genomics, and other health biomarkers to understand complex disease relationships and improve diagnostic accuracy.\n", + "Primary intended use is development and training of artificial intelligence and machine learning models to characterize acute and critical care illness, predict complications, and measure treatment response in critically ill patients.\n", + "Primary intended use is development and training of artificial intelligence and machine learning models to study Type 2 Diabetes Mellitus, disease trajectories, and salutogenesis (pathways to health resilience). 
Designed for pseudotime manifold analysis to predict disease progression.\n", + "Primary intended use is development and validation of AI/ML models for voice as a biomarker of health, supporting screening, diagnosis, and treatment of voice disorders, neurological disorders, mood disorders, respiratory disorders, and pediatric speech disorders.\n", + "Primary intended use is training and development of artificial intelligence and machine learning models for functional genomics research. AI-ready datasets with full provenance, metadata, and validation enable immediate use in AI/ML pipelines without reformatting.\n", + "Provision of holdout test set accessible for model external validation to aid marketplace adoption of AI-developed models for implementation in acute and critical care settings.\n" + ] + }, + "intended_uses[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 6, + "sample_values": [ + "use-001", + "use-002", + "use-003", + "use-004", + "use-005", + "use-006" + ] + }, + "intended_uses[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 16, + "sample_values": [ + "AI Model Training for Functional Genomics", + "AI/ML model development for T2DM", + "AI/ML model development for critical care", + "Biomarker discovery", + "Clinical care improvement research", + "Disease Mechanism Research", + "Drug Response and Synergy Prediction", + "Educational and training purposes", + "External validation of AI models", + "Genotype-Phenotype Mapping Research" + ] + }, + "keywords": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "keywords[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 104, + "sample_values": [ + "AI-READI", + 
"AI-Ready Data", + "AI-ready dataset", + "ALS", + "AP-MS", + "Alzheimer's disease", + "Artificial Intelligence", + "Breast Cancer", + "Bridge2AI", + "CARE principles" + ] + }, + "labeling_strategies": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "labeling_strategies[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "Custom visualization and annotation environment developed to label data with targets important for prediction tasks in critical care AI applications.\n" + ] + }, + "labeling_strategies[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "labeling-001" + ] + }, + "labeling_strategies[item].labeling_details": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "labeling_strategies[item].labeling_details[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 5, + "sample_values": [ + "Annotation interface for clinical experts", + "Documentation of labeling protocols", + "Interactive visualization tools", + "Labeling of prediction targets", + "Quality control of annotations" + ] + }, + "labeling_strategies[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "Visualization and annotation environment" + ] + }, + "language": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "en" + ] + }, + "license": { + "inferred_type": "str", + "is_boolean": false, + 
"could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Bridge2AI Voice Registered Access License", + "CC BY-NC 4.0", + "CC BY-NC-SA 4.0", + "Controlled Access with Data Use Agreement" + ] + }, + "license_and_use_terms.description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Data licensed for reuse under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (https://creativecommons.org/licenses/by-nc-sa/4.0/). Attribution is required to the copyright holders and the Cell Maps for Artificial Intelligence project. Any publications referencing this data or derived products should cite the bioRxiv article (Clark T, et al. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from Disease-Relevant Cell Lines. BioRXiv, May 2024. doi:10.1101/2024.05.21.589311) and directly cite the data collection. Commercial use requires separate license negotiation with copyright holder (UCSD, Stanford, and/or UCSF depending upon specific data package). A Data Access Committee will supervise ethical matters related to dataset distribution and potential dual licensing for commercial use. Copyright (c) 2025 The Regents of the University of California except where otherwise noted. Spatial proteomics raw image data is copyright (c) 2025 The Board of Trustees of the Leland Stanford Junior University.\n", + "Dataset distributed under controlled access requiring institutional email registration and signed licensing agreement. Access granted after review and approval process. Participants must complete registration form with name, email (institutional, not personal), and institution. Once approved, users receive email with access instructions to CHoRUS secure enclave. 
Contact for access requests: dbold@emory.edu or jared.houghtaling@tuftsmedicine.org.\n", + "Public access data distributed under Creative Commons Attribution Non-Commercial (CC BY-NC 4.0) license. Permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. Controlled access data requires data use agreement. See http://creativecommons.org/licenses/by-nc/4.0/ for full license terms.\n", + "Public access dataset distributed through PhysioNet under Bridge2AI Voice Registered Access License. Only registered users who sign the specified Data Use Agreement (Bridge2AI Voice Registered Access Agreement) can access files. Data covered under Certificate of Confidentiality which must be asserted against compulsory legal demands. Raw audio data available through controlled access only via Data Access Compliance Office (DACO) requiring distinct application. 
Recipient must adhere to PhysioNet requirements managed by MIT Laboratory for Computational Physiology, supported by NIBIB under grant R01EB030362.\n" + ] + }, + "license_and_use_terms.id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "license-001" + ] + }, + "license_and_use_terms.license_terms": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "license_and_use_terms.license_terms[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 28, + "sample_values": [ + "Appropriate administrative, technical, physical safeguards required", + "Attribution required to copyright holders and authors", + "Certificate of Confidentiality protections apply", + "Changes must be indicated", + "Compliance with applicable laws, rules, regulations, professional standards", + "Compliance with ethical and legal requirements", + "Controlled access data requires separate data use agreement", + "Controlled access through secure enclave", + "Data Access Committee oversight for ethical distribution", + "Data Use Agreement signature mandatory" + ] + }, + "license_and_use_terms.name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Bridge2AI Voice Registered Access License", + "CHoRUS Controlled Access License", + "Creative Commons Attribution Non-Commercial", + "Creative Commons Attribution Non-Commercial Share-Alike" + ] + }, + "maintainers": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "maintainers[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + 
"is_multivalued": false, + "value_count": 6, + "sample_values": [ + "Active GitHub organization (chorus-ai) housing repositories for software, semantic mappings, standard operating protocols, and project management. 28 repositories with comprehensive documentation and community support.\n", + "Multi-institutional consortium managing dataset maintenance including data acquisition centers, coordinating teams, and infrastructure development.\n", + "Multidisciplinary consortium managing dataset maintenance including University of California San Diego (lead), University of California San Francisco, Stanford University, University of Virginia, Yale University, University of Alabama at Birmingham, Simon Fraser University, and The Hastings Center. Data Governance Committee led by Jillian Parker. Ethical Review by Vardit Ravitsky and Jean-Christophe Belisle-Pipon.\n", + "Multidisciplinary consortium managing dataset maintenance including data collection sites, coordinating centers, and data governance committees.\n", + "Multidisciplinary consortium responsible for dataset maintenance including data collection, curation, standards development, ethics oversight, and distribution. Led by University of South Florida with multi-institutional partnerships.\n", + "PhysioNet platform managed by MIT Laboratory for Computational Physiology serves as primary distribution mechanism for public access dataset. 
Supported by NIBIB under NIH grant R01EB030362.\n" + ] + }, + "maintainers[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 2, + "sample_values": [ + "maintainer-001", + "maintainer-002" + ] + }, + "maintainers[item].maintainer_details": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "maintainers[item].maintainer_details[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 37, + "sample_values": [ + "14 data acquisition centers across United States", + "Bioethics and social science teams", + "Chorus_SOP repository (centralized documentation)", + "Community discussions and issue tracking", + "Data Access Committee (access policies)", + "Data Access Compliance Office (DACO) for controlled access", + "Data Acquisition team (extraction and contribution)", + "Data Governance Committee (Jillian Parker)", + "Documentation team (version-specific guides at https://docs.aireadi.org/)", + "GitHub organization: https://github.com/chorus-ai" + ] + }, + "maintainers[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 6, + "sample_values": [ + "AI-READI Consortium", + "Bridge2AI-Voice Consortium", + "CHoRUS Consortium", + "CHoRUS GitHub Organization", + "CM4AI Consortium", + "PhysioNet / MIT Laboratory for Computational Physiology" + ] + }, + "name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "AI-READI", + "Bridge2AI-Voice", + "CHoRUS", + "CM4AI" + ] + }, + "page": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "https://chorus4ai.org/", + 
"https://docs.b2ai-voice.org", + "https://fairhub.io/datasets/2", + "https://www.cm4ai.org" + ] + }, + "preprocessing_strategies": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "preprocessing_strategies[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 22, + "sample_values": [ + "Acoustic feature extraction using OpenSMILE (Speech and Music Interpretation by Large-space Extraction), capturing temporal dynamics and acoustic characteristics. Features provided in static_features.tsv with one row per unique recording.\n", + "All data mapped to applicable data standard formats such as Observational Medical Outcomes Partnership Common Data Model for clinical data and DICOM format for retinal imaging. Data stored and shared as 'AI-ready' enabling immediate AI/ML research without reformatting.\n", + "All structured electronic health record data standardized to the OMOP (Observational Medical Outcomes Partnership) Common Data Model. Ensures interoperability and enables use of OHDSI (Observational Health Data Sciences and Informatics) tool stack for analysis.\n", + "Clinical notes processed using OHNLP (Open Health Natural Language Processing) toolkit for extraction and tokenization. Protects patient privacy while enabling natural language processing and analysis.\n", + "Community detection performed on co-embedding space using multiscale community detection algorithms implemented in Cytoscape. 
Produces hierarchical directed acyclic graphs (DAG) of protein assemblies at multiple resolutions, with 10 layers of depth representing communities from large cell compartments to small protein complexes.\n", + "DICOM imaging data undergoing de-identification process to remove patient identifiable information while preserving clinical utility and metadata.\n", + "Data export and conversion from REDCap using open source b2aiprep library developed by the team. Phenotype data merged into tab-delimited format with data dictionary (phenotype.json) providing column descriptions.\n", + "Data harmonized across three collection sites (Birmingham, San Diego, Seattle) using standardized operating procedures, common protocols, and centralized data management through REDCap. Ensures consistency and comparability across sites.\n", + "Data transformed using approaches that limit re-identification while maintaining analytical utility. Multiple preprocessing strategies employed to protect patient privacy across all data modalities.\n", + "Mel-frequency cepstral coefficients (MFCC) extraction - 60 MFCCs extracted from spectrograms. MFCCs capture perceptually-relevant spectral envelope characteristics important for voice analysis. 
Output dimension 60xN.\n" + ] + }, + "preprocessing_strategies[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 7, + "sample_values": [ + "preproc-001", + "preproc-002", + "preproc-003", + "preproc-004", + "preproc-005", + "preproc-006", + "preproc-007" + ] + }, + "preprocessing_strategies[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 15, + "sample_values": [ + "Biospecimen processing", + "Cell Map Annotation", + "Clinical note tokenization", + "Data mapping to standards", + "Data re-identification limitation", + "Data standardization and harmonization", + "Deep Learning Embedding Generation", + "Hierarchical Community Detection", + "Image format conversion", + "Medical imaging de-identification" + ] + }, + "preprocessing_strategies[item].preprocessing_details": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "preprocessing_strategies[item].preprocessing_details[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 96, + "sample_values": [ + "10-layer depth hierarchy from compartments to complexes", + "10ms hop length", + "512-point FFT", + "60 MFCC coefficients extracted", + "Acoustic characteristics quantified", + "Alignment to Gene Ontology for functional annotation", + "Alignment to Reactome pathways for pathway annotation", + "All-by-all similarity computation in co-embedding space", + "Application of de-identification algorithms", + "Automated protocols for consistency" + ] + }, + "purposes": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "purposes[item].description": { + "inferred_type": "str", + 
"is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 10, + "sample_values": [ + "Address the grand challenge of interpretable genotype-phenotype learning in genomics and precision medicine. Machine learning models are often \"black boxes\" predicting phenotypes from genotypes without understanding the mechanisms. CM4AI enables \"visible\" machine learning systems informed by multi-scale cell and tissue architecture, allowing AI tools to interrogate how protein assemblies in the cell affect cell-level phenotype predictions.\n", + "Address the lack of racial and ethnic diversity in T2DM research by creating a dataset that is triple-balanced across race/ethnicity (Asian, Black, Hispanic, White), biological sex (male, female), and diabetes severity (no diabetes, pre-diabetes/lifestyle-controlled, medication-controlled, insulin-controlled).\n", + "Answer the grand challenge of improving recovery from acute illness by developing high-resolution multi-center datasets as a critical first step towards actionable and trustworthy AI in critical care. Address the urgent need for infrastructure to support artificial intelligence and machine learning (AI/ML) in critical care settings.\n", + "Better understand salutogenesis (the pathway from disease to health) in Type 2 Diabetes Mellitus using a hypothesis-agnostic, harmonized, multi-domain dataset designed specifically for AI/ML research. The dataset aims to provide critical insights into how individuals can transition from diabetes toward health resilience through pseudotime manifold analysis.\n", + "Deliver machine-readable hierarchical maps of cell architecture as AI-Ready data from multimodal interrogation of disease-relevant cell lines to enable transformative biomedical AI research. 
CM4AI produces integrated cell maps from spatial proteomics, protein-protein interactions, and genetic perturbations using state-of-the-art mass spectrometry, cell imaging, and CRISPR technologies.\n", + "Develop a publicly available, AI-ready critical care dataset from more than 100,000 critically ill patients while ensuring methods promote privacy, accountability, and clinical benefit. Generate the most diverse, high-resolution, ethically sourced dataset for AI/ML applications in acute and critical care.\n", + "Ensure comprehensive sets of patient conditions and clinical treatment strategies with appropriate contextual factors such as geographic distance to nearest hospital and Social Determinants of Health. Develop the skills and workforce for a next generation of diverse academic and community AI scientists through comprehensive training and education programs.\n", + "Establish standards, best practices, and guidelines for collection, preparation, and sharing of medical/health data sets targeted for AI/ML applications. This includes guidance from bioethicists on ethical and equitable data collection and management practices, with adherence to FAIR principles.\n", + "Establish standards, best practices, and guidelines for ethical AI-readiness in biomedical data. This includes implementing FAIR principles, computing machine-readable provenance graphs, characterizing and validating all datasets with JSON-Schema mini-data-dictionaries, and mapping data elements to public ontology vocabularies where appropriate.\n", + "Unify standards to harmonize multi-modal EHR, waveform, imaging, and text data. Develop software and tooling to interact with and extract insight from clinical data in diverse formats. 
Create validated semantic mappings for connecting clinical data in various source formats to international standards (OMOP Common Data Model).\n" + ] + }, + "purposes[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "purpose-001", + "purpose-002", + "purpose-003", + "purpose-004" + ] + }, + "purposes[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 10, + "sample_values": [ + "AI-Ready Cell Architecture Maps for Biomedical AI", + "Addressing demographic inequities in T2DM research", + "Create AI-ready critical care dataset", + "Establish data standards and tools", + "Establishing AI-Readiness Standards for Biomedical Data", + "Establishing AI/ML data standards", + "Improve recovery from acute illness", + "Interpretable Genotype-Phenotype Learning", + "Promote diversity and health equity", + "Understanding T2DM salutogenesis" + ] + }, + "purposes[item].response": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 3, + "sample_values": [ + "Create an ethically sourced flagship dataset to enable future research in artificial intelligence and support critical insights into the use of voice as a biomarker of health, addressing the pressing need for large, high quality, multi-institutional and diverse voice databases linked to other health biomarkers.\n", + "Establish standards, best practices, and guidelines for voice data collection and analysis to advance the field of acoustic biomarkers by developing new standards that are AI/ML friendly and enable voice to emerge as a biomarker of health.\n", + "Integrate the use of voice as a biomarker of health in clinical care by generating a substantial multi-institutional, ethically sourced, and diverse voice database linked to multimodal health biomarkers to fuel voice AI 
research and build predictive models to assist in screening, diagnosis, and treatment of a broad range of diseases.\n" + ] + }, + "retention_limit.description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Data Transfer and Use Agreement specifies retention requirements. Upon termination or expiration of agreement (two years after start date, project completion, or ethics approval expiration), data shall be destroyed per provider instructions with written certification required within 30 days. Recipient may retain one copy to extent necessary to comply with records retention requirements under law, regulation, institutional policy, and for research integrity and verification purposes. Restrictions apply to archival copies as long as recipient holds data.\n", + "Digital data maintained according to NIH data sharing policies and institutional requirements. Controlled access model ensures long-term availability for research while protecting patient privacy.\n", + "Digital data maintained according to NIH data sharing policies with long-term preservation in University of Virginia's LibraData repository supported by committed institutional funds. No planned sunset for data availability. Archived RO-Crates with persistent identifiers (ARK, future DOIs) ensure long-term accessibility and citability.\n", + "Digital data maintained according to NIH data sharing policies. Biospecimen retention subject to institutional policies and consent agreements. 
Finite number of biospecimen samples available for distribution.\n" + ] + }, + "retention_limit.id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "retention-001" + ] + }, + "retention_limit.name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Data and biospecimen retention", + "Data retention and disposition", + "Long-Term Preservation Plan", + "Long-term dataset retention" + ] + }, + "retention_limit.retention_details": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "retention_limit.retention_details[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 21, + "sample_values": [ + "Biospecimen retention per institutional policies at UAB CCTS", + "Consent agreements specify retention terms", + "Controlled access model for privacy protection", + "Data destruction required upon termination unless retention justified", + "Finite biospecimen availability", + "Institutional requirements at participating centers", + "Long-term maintenance through CHoRUS Consortium", + "Machine-readable metadata for long-term discoverability", + "NIH data sharing policies govern digital data retention", + "NIH data sharing policies govern retention" + ] + }, + "sampling_strategies": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "sampling_strategies[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Federated access enables sampling methods to ensure a balanced and diverse cohort across 20 
academic centers (14 data acquisition centers). Legal framework established for collecting data at scale with sampling to ensure comprehensive sets of patient conditions and clinical treatment strategies.\n", + "Patients presenting at specialty clinics and institutions were screened for inclusion and exclusion criteria prior to their visit by project investigators. Participants were selected based on membership to five predetermined disease cohort groups to ensure representation across conditions affecting voice: (1) Voice Disorders - laryngeal cancers, vocal fold paralysis, benign laryngeal lesions; (2) Neurological and Neurodegenerative Disorders - Alzheimer's, Parkinson's, stroke, ALS; (3) Mood and Psychiatric Disorders - depression, schizophrenia, bipolar disorders; (4) Respiratory disorders - pneumonia, COPD, heart failure, obstructive sleep apnea; (5) Pediatric diseases - autism, speech delay.\n", + "Purposive selection of two disease-relevant cell lines: MDA-MB-468 triple negative breast cancer cell line for cancer research, and KOLF2.1J iPSCs for neuropsychiatric and cardiac disorder research. Both cell lines ethically sourced and well-characterized in the literature.\n", + "Recruitment sampling procedures aimed at achieving approximately equal distribution of participants across three dimensions: (1) race/ethnicity (Asian, Black, Hispanic, White), (2) T2DM severity (no diabetes, pre-diabetes/lifestyle-controlled, medication-controlled, insulin-controlled), and (3) biological sex (male, female). 
This balanced design is critical for developing unbiased machine learning models.\n" + ] + }, + "sampling_strategies[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "sampling-001" + ] + }, + "sampling_strategies[item].is_random": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "sampling_strategies[item].is_random[item]": { + "inferred_type": "bool", + "is_boolean": true, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + false + ] + }, + "sampling_strategies[item].is_representative": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "sampling_strategies[item].is_representative[item]": { + "inferred_type": "bool", + "is_boolean": true, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 2, + "sample_values": [ + false, + true + ] + }, + "sampling_strategies[item].is_sample": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "sampling_strategies[item].is_sample[item]": { + "inferred_type": "bool", + "is_boolean": true, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 2, + "sample_values": [ + false, + true + ] + }, + "sampling_strategies[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 3, + "sample_values": [ + "Disease-Relevant Cell Line Selection", + "Multi-center federated sampling", + "Triple-balanced recruitment" + ] + }, + "sampling_strategies[item].strategies": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + 
"value_count": 0, + "sample_values": [] + }, + "sampling_strategies[item].strategies[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 17, + "sample_values": [ + "Both cell lines have extensive existing characterization data", + "Community-facing ethics focus groups to determine appropriate data for public sharing", + "Federated multi-center data collection across 14 hospitals", + "Inclusion of contextual factors (geographic distance to hospital, Social Determinants of Health)", + "KOLF2.1J chosen as reference iPSC line for large-scale collaborative studies", + "Legal framework for data collection at scale", + "MDA-MB-468 chosen for triple-negative breast cancer research applications", + "Multi-institutional enrollment across five sites in North America", + "Personalized invitation letters and emails with REDCap recruitment interface", + "Recruitment from electronic health records screening using ICD-10 codes (R73.09 for pre-diabetes, E11.X for T2DM)" + ] + }, + "sensitive_elements": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "sensitive_elements[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 12, + "sample_values": [ + "5-digit zip code, detailed race, ethnicity, and sex information available in controlled access dataset only. Public dataset contains de-identified data.\n", + "Complete electronic health records including demographics, diagnoses, procedures, medications, nursing documentation, and clinical notes for critically ill patients. Contains protected health information subject to HIPAA and institutional privacy requirements.\n", + "Contextual factors including geographic information (distance to hospital) and social determinants of health data. 
Collected to support health equity research while maintaining patient privacy.\n", + "Continuous waveform telemetry and EEG recordings capturing detailed physiological states of critically ill patients. May reveal sensitive health conditions and treatment responses.\n", + "Dataset covered under Certificate of Confidentiality which must be asserted against compulsory legal demands such as court orders and subpoenas for identifying information or characteristics of research participants. Provides additional legal protections beyond standard de-identification.\n", + "Demographic information and geographic data collected but de-identified for public release. State and province removed, only country retained. Protected by HIPAA Safe Harbor de-identification standards.\n", + "Diagnostic imaging studies in DICOM format. Undergoing de-identification to remove embedded patient information while preserving clinical utility.\n", + "Electronic health record (EHR) data accessed with participant consent for gold standard validation of diagnoses and symptoms. Medical information linked to voice data provides sensitive health information requiring protection.\n", + "Genomic DNA extracted from buffy coats, blood derivatives, and urine samples stored with potential for future genetic analyses. 
Available in controlled access dataset only.\n", + "Past health records, medications, traffic and accident reports available in controlled access dataset only.\n" + ] + }, + "sensitive_elements[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "sensitive-001", + "sensitive-002", + "sensitive-003", + "sensitive-004" + ] + }, + "sensitive_elements[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 8, + "sample_values": [ + "Cell Line Origin Metadata", + "Clinical and medical data", + "Genetic and biospecimen data", + "Geographic and demographic identifiers", + "Medical history and records", + "Medical imaging", + "Physiological monitoring data", + "Social Determinants of Health" + ] + }, + "sensitive_elements[item].sensitive_elements_present": { + "inferred_type": "bool", + "is_boolean": true, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 2, + "sample_values": [ + false, + true + ] + }, + "sensitive_elements[item].sensitivity_details": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "sensitive_elements[item].sensitivity_details[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 49, + "sample_values": [ + "5-digit zip code", + "Biological sex", + "Blood derivatives and urine biospecimens", + "CT scans and X-rays", + "Cannot be matched to individuals with current knowledge", + "Certificate of Confidentiality coverage", + "Clinical notes (tokenized for privacy)", + "Contextual factors for equity research", + "Continuous cardiac waveforms", + "Controlled access required for raw audio" + ] + }, + "subpopulations": { + "inferred_type": "unknown", + "is_boolean": false, + 
"could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "subpopulations[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 22, + "sample_values": [ + "KOLF2.1J iPSCs differentiated into cardiomyocytes", + "KOLF2.1J iPSCs differentiated into neural progenitor cells", + "KOLF2.1J iPSCs differentiated into neurons", + "KOLF2.1J induced pluripotent stem cells in undifferentiated/naive state", + "MDA-MB-468 breast cancer cells in control/untreated condition", + "MDA-MB-468 breast cancer cells treated with paclitaxel chemotherapy", + "MDA-MB-468 breast cancer cells treated with vorinostat chemotherapy", + "Participants with conditions such as Alzheimer's disease, Parkinson's disease, stroke, and ALS exhibiting voice and speech changes including slowed speech, low frequency, monotonous speech, vocal tremor, dysarthria, and aphasia.\n", + "Participants with depression, schizophrenia, bipolar disorders, and anxiety disorders showing vocal changes such as decreased fundamental frequency, monotonous speech, and anxiety-related increases in F0.\n", + "Participants with diabetes treated with oral medications or non-insulin injections, target ~1,000 participants (25% of sample)" + ] + }, + "subpopulations[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 8, + "sample_values": [ + "subpop-001", + "subpop-002", + "subpop-003", + "subpop-004", + "subpop-005", + "subpop-006", + "subpop-007", + "subpop-008" + ] + }, + "subpopulations[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 22, + "sample_values": [ + "Asian participants", + "Black participants", + "Critically ill patients by hospital", + "Hispanic participants", + "Insulin-controlled diabetes", + "KOLF2.1J Undifferentiated iPSCs", + 
"KOLF2.1J iPSC-Derived Cardiomyocytes", + "KOLF2.1J iPSC-Derived Neural Progenitor Cells (NPCs)", + "KOLF2.1J iPSC-Derived Neurons", + "MDA-MB-468 Paclitaxel-Treated" + ] + }, + "subsets": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "subsets[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 15, + "sample_values": [ + "Affinity purification mass spectrometry (AP-MS) data on endogenously tagged cell lines mapping protein-protein interactions of chromatin regulators. 17 genes endogenously tagged in MDA-MB-468 with data acquired under three conditions (untreated, paclitaxel, vorinostat). Orthogonal approach to SEC-MS for PPI mapping.\n", + "Biobanked samples stored at UAB Center for Clinical and Translational Science (CCTS), including plasma, serum, buffy coats, peripheral blood mononuclear cells (PBMCs), PAXgene RNA, and urine. Available to researchers for future ancillary studies according to procedures and policies in development. Finite number of samples available.\n", + "Clinical notes extracted and tokenized using OHNLP (Open Health Natural Language Processing) toolkit. Stored locally at contributing sites (except tokens) with planned controlled access. Uses OHNLP open source schema for standardization.\n", + "Contains derived features from voice recordings including spectrograms, MFCCs, acoustic features (OpenSMILE), phonetic and prosodic features (Parselmouth and Praat), and transcriptions (OpenAI Whisper). Also includes phenotype data with demographics, acoustic confounders, and responses to validated questionnaires. Available through PhysioNet with registered access requiring data use agreement. HIPAA Safe Harbor identifiers removed, state/province removed, country retained. Audio waveforms omitted, only derived features available. 
Free speech transcripts removed to protect privacy.\n", + "Electroencephalography (EEG) waveform data from hospital databases. Stored in EDF+ (European Data Format) and Persyst formats. Extraction in process as of November 2024, planned for controlled access. Open source EDF+ and Persyst schema available.\n", + "Genome-scale CRISPRi perturbation cell atlas in undifferentiated KOLF2.1J human induced pluripotent stem cells (hiPSCs) mapping transcriptional and fitness phenotypes associated with 11,739 targeted genes. Single-cell CRISPR screens performed using 10x Genomics 3'HT kit with CRISPR lentiviral library targeting 100 chromatin factors with 6 guide RNAs per gene. Screens conducted in MDA-MB-468 cells under 3 conditions (no treatment, paclitaxel, vorinostat) and KOLF2.1J iPSC in undifferentiated state. Includes raw sequence data and processed cell atlas data.\n", + "Imaging data from PACS (Picture Archiving and Communication System) in DICOM format. De-identification in process as of November 2024, planned for controlled access. DICOM schema metadata available.\n", + "Immunofluorescence-based staining (ICC-IF) and confocal microscopy images displaying spatial localization of 563 proteins of interest in MDA-MB-468 breast cancer cells under three conditions: untreated, paclitaxel-treated, and vorinostat-treated. Nuclei stained with DAPI (blue channel), endoplasmic reticulum with calreticulin antibody (yellow channel), microtubules with tubulin antibody (red channel), and antibody against protein of interest (green channel). Generated by Lundberg Lab at Stanford University using automated fixation and permeabilization protocols.\n", + "Includes data not considered sensitive personal health information, available to the public for download upon agreement with a license. 
Contains survey data, blood and urine lab results, fitness activity levels, clinical measurements (e.g., monofilament and cognitive function testing), retinal images, ECG, blood glucose levels, and environmental variables such as home air quality. Available at https://fairhub.io/datasets/2.\n", + "Includes sensitive data accessible by entering into a data use agreement. Contains 5-digit zip code, sex, race, ethnicity, genetic sequencing data (from buffy coats), past health records, medications, and traffic and accident reports. Access requirements are being developed by the Data Access Committee.\n" + ] + }, + "subsets[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 5, + "sample_values": [ + "subset-001", + "subset-002", + "subset-003", + "subset-004", + "subset-005" + ] + }, + "subsets[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 14, + "sample_values": [ + "Biorepository", + "CRISPR Perturbation Cell Atlas", + "Clinical Notes (Tokenized)", + "Controlled Access Dataset", + "Controlled Access Raw Audio Dataset", + "EEG Waveforms", + "Hierarchical Cell Maps via MuSIC", + "Medical Imaging", + "Protein-Protein Interaction AP-MS Data", + "Protein-Protein Interaction SEC-MS Data" + ] + }, + "tasks": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "tasks[item].description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 12, + "sample_values": [ + "Characterize cell architecture and protein interactions in disease-relevant cell lines including treated and untreated MDA-MB-468 breast cancer cells (with paclitaxel and vorinostat) and differentiated and naive KOLF2.1J induced pluripotent stem cells (iPSCs) differentiated into 
neurons and cardiomyocytes.\n", + "Develop ethical AI frameworks and governance structures for biomedical data, including Value-Sensitive Design methodologies, axiological repositories, CM4AI Life Cycle framework, and guidelines for responsible design of datasets and AI technologies.\n", + "Enable development of visible neural networks (VNNs) and visible machine learning tools that use hierarchical cell maps as interpretable structures for AI model architectures, allowing interrogation of how protein assemblies affect cell-level phenotypes and interpretation of genetic variants and mutations.\n", + "Enable downstream AI/ML analyses across survey, clinical, imaging, wearable device, environmental, and biospecimen domains related to T2DM that may not be feasible with existing data sources such as claims or electronic health records data alone. The multimodal nature of the data supports complex machine learning model development.\n", + "Enable prediction of complications among patients with acute or critical illness using multi-modal data including structured EHR, waveforms, imaging, and clinical notes.\n", + "Generate data for ML/AI applications aimed at characterizing acute and critical care illness patterns, progression, and outcomes across diverse patient populations and hospital settings.\n", + "Integrate multimodal data streams (spatial proteomics via IF imaging, protein-protein interactions via AP-MS and SEC-MS, and genetic perturbations via CRISPR screens) using the Multi-Scale Integrated Cell (MuSIC) software pipeline employing deep learning models and community detection algorithms to produce hierarchical cell maps.\n", + "Provision a holdout test set accessible for model external validation to aid marketplace adoption of AI-developed models for implementation in acute and critical care settings.\n", + "Study disease trajectories and salutogenesis pathways in T2DM through cross-sectional analysis of participants at different disease stages, enabling 
pseudotime manifold analysis to predict disease progression and paths to health resilience.\n", + "Support measurement and analysis of treatment response among critically ill patients through high-frequency documentation, medication administration records, and clinical outcomes data.\n" + ] + }, + "tasks[item].id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 6, + "sample_values": [ + "task-001", + "task-002", + "task-003", + "task-004", + "task-005", + "task-006" + ] + }, + "tasks[item].name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 12, + "sample_values": [ + "Characterize acute and critical care illness", + "Develop unbiased AI/ML models", + "Disease-Relevant Cell Line Characterization", + "Enable multi-domain AI/ML analyses for T2DM", + "Ethical AI Framework Development", + "External validation for marketplace adoption", + "Label data for prediction targets", + "Measure treatment response", + "Multi-Scale Cell Mapping via MuSIC Pipeline", + "Predict complications in critically ill patients" + ] + }, + "tasks[item].response": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 6, + "sample_values": [ + "Build AI models for pediatric voice and speech disorder detection including autism spectrum disorder and speech delays, addressing the relative scarcity of pediatric voice data and associated ethical challenges.\n", + "Create machine learning models for respiratory disorder screening and therapeutic monitoring using respiratory sounds, cough sounds, and voice, applicable to conditions such as pneumonia, COPD, heart failure, and obstructive sleep apnea.\n", + "Develop AI algorithms for mood and psychiatric disorder detection including depression, schizophrenia, and bipolar disorders, identifying vocal markers such as decreased fundamental 
frequency, monotonous speech patterns, and anxiety-related increases in F0.\n", + "Enable development of AI/ML predictive models for screening, diagnosis, and treatment of voice disorders including laryngeal cancers, vocal fold paralysis, and benign laryngeal lesions, leveraging acoustic changes in phonation resulting from changes in vocal fold vibratory function.\n", + "Promote application of AI/ML for voice research through workforce development, curriculum creation, and fostering collaborations especially with researchers from underserved communities, building bridges between medical voice research, acoustic engineers, and the AI/ML community.\n", + "Support machine learning models for neurological and neurodegenerative disorders including Alzheimer's disease, Parkinson's disease, stroke, and ALS, detecting voice and speech changes such as slowed speech, low frequency, monotonous speech, vocal tremor, dysarthria, and aphasia.\n" + ] + }, + "title": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights (AI-READI)", + "Bridge2AI-Voice - An ethically-sourced, diverse voice dataset linked to health information", + "Cell Maps for Artificial Intelligence (CM4AI)", + "Patient-Focused Collaborative Hospital Repository Uniting Standards (CHoRUS) for Equitable AI" + ] + }, + "updates.description": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Dataset regularly updated and augmented through end of project in November 2026. Beta releases on quarterly basis with periodic data augmentation. Initial alpha release (v0.5) provided as supplemental data. March 2025 Beta (V1.4) includes perturb-seq in KOLF2.1J iPSCs, SEC-MS in iPSCs and derivatives, and IF images in MDA-MB-468 under three conditions. 
June 2025 Beta (V2.1) revision adds RGB IF images, ro-crate metadata corrections, and naming convention changes. Future releases will include computed cell maps and complete integration of all data streams. Long-term preservation in University of Virginia Dataverse with committed institutional support.\n", + "Dataset updated continuously as data collection progresses at 14 acquisition centers. As of November 2024, covers 14 hospitals with 23,400 unique admissions. Target exceeds 100,000 critically ill patients. Project timeline extends through November 30, 2026 (with approved no-cost extension). Regular status updates tracked through GitHub project management system. Sites provide updates via GitHub interface or Google Form submissions.\n", + "Dataset updated periodically as enrollment progresses toward target of 4,000 participants by November 2026. Version-specific documentation maintained for each release. Biorepository maintained at UAB CCTS with long-term storage protocols. Data sharing policies under ongoing development by Data Access Committee. Pilot data released May 2024; all data through July 31, 2024 released November 2024.\n", + "Dataset updated with versioned releases as data collection progresses. Initial release v1.0 published January 17, 2025 with 12,523 recordings from 306 participants. v1.1 released January 17, 2025 adding MFCC features. v2.0.0 released April 16, 2025. v2.0.1 released August 18, 2025. Latest version available at https://doi.org/10.13026/37yb-1t42. Data collection ongoing through November 30, 2026. Version-specific documentation maintained. 
As of v1.1, only adult cohort data available; pediatric cohort data planned for future releases with additional privacy precautions.\n" + ] + }, + "updates.frequency": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Continuous updates through November 2026", + "Periodic releases with ongoing enrollment; final release planned for late 2026", + "Periodic versioned releases during data collection period (2022-2026)", + "Quarterly updates through November 2026; long-term preservation thereafter" + ] + }, + "updates.id": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 1, + "sample_values": [ + "updates-001" + ] + }, + "updates.name": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": true, + "is_multivalued": false, + "value_count": 4, + "sample_values": [ + "Ongoing data collection and expansion", + "Periodic data releases and maintenance plan", + "Quarterly Data Releases and Maintenance Plan", + "Versioned releases with ongoing data collection" + ] + }, + "updates.update_details": { + "inferred_type": "unknown", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": true, + "value_count": 0, + "sample_values": [] + }, + "updates.update_details[item]": { + "inferred_type": "str", + "is_boolean": false, + "could_be_enum": false, + "is_multivalued": false, + "value_count": 31, + "sample_values": [ + "Alpha release v0.5 (supplemental data)", + "Current status: 23,400 unique admissions (as of November 2024)", + "DOI for latest version vs version-specific DOIs", + "Dataset versioning implemented", + "Documentation updates in Chorus_SOP repository", + "Final dataset expected after completion of 4,000 participant enrollment by November 2026", + "Final release expected November 2026", + "Future releases planned with additional participants and pediatric cohort", + "Future 
releases to include computed cell maps", + "GitHub project tracking for deliverables" + ] + } + } +} \ No newline at end of file diff --git a/reports/dcterms_description_analysis.md b/reports/dcterms_description_analysis.md new file mode 100644 index 00000000..35b16ef2 --- /dev/null +++ b/reports/dcterms_description_analysis.md @@ -0,0 +1,290 @@ +# dcterms:description Conflict Analysis + +**Issue:** 40 slots map to `dcterms:description`, causing semantic flattening +**Severity:** CRITICAL +**Status:** Architectural decision required + +--- + +## Distribution by Pattern + +### Pattern 1: *_details suffix (30 slots - 75%) + +**Data Collection (6):** +- acquisition_details +- collection_details +- collector_details +- mechanism_details +- timeframe_details +- source_description (close to _details pattern) + +**Data Quality/Issues (3):** +- anomaly_details +- bias_description (should group with quality) +- limitation_description (should group with quality) + +**Privacy/Security (3):** +- confidentiality_details +- deidentification_details +- sensitivity_details + +**Ethics (5):** +- consent_details +- impact_details (ethics context) +- notification_details +- review_details +- revocation_details + +**Maintenance/Versioning (6):** +- erratum_details +- extension_details +- maintainer_details +- retention_details +- update_details +- version_details + +**Preprocessing/Processing (4):** +- cleaning_details +- labeling_details +- preprocessing_details +- raw_data_details + +**Use Cases (4):** +- discouragement_details +- impact_details (use case context) +- repository_details +- task_details + +**Data Structure (2):** +- relationship_details +- split_details + +### Pattern 2: Question-oriented (2 slots - 5%) +- why_missing +- missing + +### Pattern 3: Content-specific (8 slots - 20%) +- distribution (subpopulation distribution) +- future_guarantees (external resource commitments) +- identification (subpopulation identification methods) +- quality_notes (variable quality 
notes) +- response (appears 3× in D4D_Motivation - different question contexts) +- warnings (content warnings) + +--- + +## Semantic Analysis + +### Are these truly "descriptions"? + +**YES - appropriate dcterms:description:** +- bias_description +- limitation_description +- source_description +- quality_notes +- warnings + +**BORDERLINE - could use more specific terms:** +- why_missing → dcterms:description is OK but could be more specific +- missing → describes what's missing (dcterms:description acceptable) +- future_guarantees → commitment/policy statement (not pure description) +- identification → methodology (dcterms:description acceptable) +- distribution → statistical distribution (schema:description more specific?) + +**NO - semantically different from pure description:** +- 30× *_details fields → procedural/technical details, not descriptions +- response → answer to specific question (not description of the dataset) + +--- + +## Options Analysis + +### Option 1: Accept As-Is (Status Quo) + +**Approach:** Keep all 40 slots using `dcterms:description` + +**Pros:** +- ✅ Zero effort +- ✅ No breaking changes +- ✅ Simple schema +- ✅ Uses standard vocabulary + +**Cons:** +- ❌ Semantic flattening - loses distinction between description types +- ❌ RDF consumers can't distinguish collection details from quality notes +- ❌ Misses opportunity for semantic precision +- ❌ Not best practice for domain-specific schemas + +**Use case impact:** +- SPARQL queries: "Get all descriptions" returns mixed semantic content +- RDF reasoning: Cannot distinguish procedural details from quality issues +- Interoperability: Generic mapping provides no domain context + +--- + +### Option 2: Full Differentiation (40 custom terms) + +**Approach:** Create 40 custom d4d: terms (one per slot) + +**Examples:** +- d4d:acquisitionDetails +- d4d:collectorDetails +- d4d:biasDescription +- etc. 
+ +**Pros:** +- ✅ Maximum semantic precision +- ✅ Every field has unique ontology identity +- ✅ Best for machine reasoning + +**Cons:** +- ❌ Significant effort (40 terms to define/document) +- ❌ No reuse of standard vocabulary +- ❌ Maintenance burden +- ❌ Potential over-engineering + +**Use case impact:** +- SPARQL queries: Very specific ("get acquisition details") +- RDF reasoning: Can distinguish every detail type +- Interoperability: Requires d4d ontology understanding + +--- + +### Option 3: Hybrid - Pattern-Based Grouping (RECOMMENDED) + +**Approach:** Group by semantic category, use specific terms where clear standards exist + +#### Phase 1: High-Value Differentiations (immediate) + +**A. Quality/Issues → Specific terms** +- bias_description → KEEP `dcterms:description` (pure description) +- limitation_description → KEEP `dcterms:description` (pure description) +- anomaly_details → `d4d:anomalyDetails` (technical details, not description) + +**B. Question-Answer → Specific terms** +- response (3×) → `d4d:questionResponse` (answer to question, not description) + +**C. Statistical → Schema.org** +- distribution (subpopulation) → Consider `schema:description` or custom + +**D. Content Warnings → Keep** +- warnings → KEEP `dcterms:description` (content description) + +#### Phase 2: *_details Pattern (deferred/optional) + +**Option 3A: Single umbrella term** +- All 30 *_details → `d4d:technicalDetails` +- Broad enough to cover all contexts +- Distinguishes from pure descriptions + +**Option 3B: Category-specific namespaces** +- Collection: `d4d:collectionDetails` +- Ethics: `d4d:ethicsDetails` +- Maintenance: `d4d:maintenanceDetails` +- Preprocessing: `d4d:preprocessingDetails` +- Quality: `d4d:qualityDetails` +- Use: `d4d:useDetails` +- etc. 
+ +**Option 3C: Keep as dcterms:description** +- Accept that *_details is a common pattern +- Semantic overlap acceptable for "details about X" + +--- + +## Recommended Decision: Option 3 - Minimal Hybrid + +### Immediate Actions (4 changes): + +1. **response (3 slots) → d4d:questionResponse** + - File: D4D_Motivation.yaml + - Rationale: Answers to questions, not dataset descriptions + - Impact: Clarifies Q&A pattern vs descriptive text + +2. **anomaly_details → d4d:anomalyDetails** + - File: D4D_Composition.yaml + - Rationale: Technical error/noise details, semantically distinct from quality descriptions + - Impact: Separates technical details from descriptive content + +3. **quality_notes → d4d:qualityNotes** + - File: D4D_Variables.yaml + - Rationale: Variable-level quality notes, specific context + - Impact: Distinguishes variable quality from general descriptions + +4. **future_guarantees → d4d:availabilityGuarantee** + - File: D4D_Composition.yaml + - Rationale: Commitment/policy about external resources, not description + - Impact: Clarifies this is a guarantee statement + +### Accept as dcterms:description (36 slots): + +**Pure descriptions (semantically correct):** +- bias_description +- limitation_description +- source_description +- warnings +- identification +- missing +- why_missing +- distribution + +**Details fields (acceptable semantic overlap):** +- All 30× *_details fields +- Rationale: "Details about X" is a form of description +- Benefit: Reuses standard vocabulary +- Trade-off: Loses some semantic precision but acceptable + +--- + +## Impact Assessment + +### Recommended Approach Impact + +**Conflicts reduced:** 40 → 36 slots (10% reduction) +**Custom terms created:** 4 (minimal) +**Standard vocabulary retained:** 90% of fields still use dcterms:description +**Semantic clarity gained:** +- Question responses separated from descriptions +- Technical anomaly details distinguished +- Variable quality notes contextualized +- Resource guarantees 
clarified + +### Effort vs. Benefit + +| Approach | Effort | Benefit | Ratio | +|----------|--------|---------|-------| +| Option 1 (status quo) | 0 | 0 | - | +| Option 2 (40 terms) | Very High | High | Low | +| **Option 3 (4 terms)** | **Low** | **Medium** | **High** | + +--- + +## Alternative: Deferred Decision + +If an immediate decision is not feasible, the alternative is to: +1. Document dcterms:description semantic overlap as ACCEPTABLE +2. Add a comment explaining that the 40 slots represent different detail types +3. Revisit if real-world usage shows a need for differentiation + +**Recommendation:** Proceed with Option 3 (4 targeted changes) - best balance of precision and pragmatism. + +--- + +## Next Steps (if Option 3 approved) + +1. Apply 4 fixes: + - response → d4d:questionResponse (D4D_Motivation.yaml, 3 instances) + - anomaly_details → d4d:anomalyDetails (D4D_Composition.yaml) + - quality_notes → d4d:qualityNotes (D4D_Variables.yaml) + - future_guarantees → d4d:availabilityGuarantee (D4D_Composition.yaml) + +2. Document in ontology_mapping_guide.md: + - Rationale for 4 custom terms + - Justification for accepting 36 dcterms:description slots + - Explain semantic overlap is intentional and acceptable + +3. Update semantic_fixes_session3.md with decision + +4. Regenerate schema and validate + +5. Mark Task #4 complete diff --git a/reports/fixes_applied.md b/reports/fixes_applied.md new file mode 100644 index 00000000..aed173ac --- /dev/null +++ b/reports/fixes_applied.md @@ -0,0 +1,292 @@ +# D4D Schema Semantic Fixes Applied + +**Date:** 2026-04-08 +**Branch:** add-schema-descriptions + +## Summary + +This report addresses the semantic issues identified in the comprehensive semantic review. Total issues identified: 136 (9 CRITICAL, 54 HIGH, 29 MEDIUM, 1 LOW). + +--- + +## Fixes Applied + +### 1. slot_uri Conflict: dcat:mediaType ✅ FIXED + +**Issue:** Both `encoding` and `media_type` mapped to `dcat:mediaType`, causing a semantic collision. 
+ +**Problem:** +- `encoding` (character encoding like UTF-8, Latin-1) ≠ `media_type` (MIME type like application/json) +- DCAT spec defines `dcat:mediaType` for MIME types only +- RDF serialization ambiguity + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Base_import.yaml` line 301 +- **Change:** `encoding` slot_uri: `dcat:mediaType` → `d4d:characterEncoding` +- **Rationale:** Character encoding is conceptually different from MIME type; no standard DCAT property exists for character encoding, so using custom D4D namespace term + +**Impact:** +- ✅ Resolves CRITICAL semantic conflict +- ✅ No data migration needed (slot_uri changes don't affect YAML data) +- ✅ Improves RDF/DCAT compliance +- ⚠️ Tools relying on dcat:mediaType for encoding will need update (unlikely) + +--- + +## Fixes Still Needed + +### HIGH PRIORITY + +#### 2. slot_uri Conflict: schema:identifier (8 usages) + +**Current usages:** +- `id` (D4D_Base_import) - Generic identifier ✓ OK +- `orcid` (D4D_Base_import) - Should use more specific ORCID property +- `identifiers_removed` (D4D_Composition) - ❌ WRONG: list of removed identifier TYPES, not identifiers +- `target_dataset` (D4D_Composition) - Should use dcterms:relation or similar +- `latest_version_doi` (D4D_Maintenance) - Should use dcterms:hasVersion +- `grant_number` (D4D_Motivation) - Should be more specific (funding identifier) +- `is_identifier` (D4D_Variables) - Boolean about whether variable is identifier (meta) + +**Recommended fixes:** +- `orcid` → Keep `schema:identifier` with added exact_mapping to ORCID ontology +- `identifiers_removed` → Change to `d4d:removedIdentifierTypes` +- `target_dataset` → Change to `dcterms:relation` +- `latest_version_doi` → Change to `dcterms:hasVersion` +- `grant_number` → Change to custom `d4d:grantIdentifier` or keep with note +- `is_identifier` → Change to `d4d:isIdentifier` (meta-property) + +#### 3. 
slot_uri Conflict: dcterms:description (40+ usages) - REQUIRES ARCHITECTURAL DECISION + +**Problem:** Massive semantic flattening - 40+ different slots all map to generic `dcterms:description` + +**Examples:** +- `acquisition_details`, `mechanism_details`, `collector_details` - All different aspects of data collection +- `bias_description`, `limitation_description`, `anomaly_details` - Different quality concerns +- Many `*_details` fields + +**Options:** +1. **Leave as-is** (current state) + - ✅ Simple, no breaking changes + - ❌ Loses semantic precision in RDF + +2. **Create specific D4D terms** for each concept + - ✅ Maximum semantic precision + - ❌ Significant effort, no reuse of standard terms + - Example: `d4d:acquisitionDetails`, `d4d:biasDescription`, etc. + +3. **Hybrid approach**: Use more specific DCTERMS/Schema.org where available, custom for rest + - `bias_description` → Keep `dcterms:description` (generic is OK) + - `source_description` → Change to `dcterms:source` + - `*_details` → Create d4d:details pattern or leave generic + - ✅ Balanced precision vs effort + - ⚠️ Some terms still generic + +**Recommendation:** Option 3 (hybrid) - prioritize fields with clear standard mappings + +#### 4. slot_uri Conflict: dcat:accessURL (3 usages) + +**Current usages:** +- `access_urls` (D4D_Distribution) - ✓ Correct usage +- `erratum_url` (D4D_Maintenance) - Should be custom +- `access_url` (D4D_Preprocessing - raw data) - Could be OK or custom + +**Recommended fixes:** +- `access_urls` → Keep `dcat:accessURL` +- `erratum_url` → Change to `d4d:erratumURL` +- `access_url` → Either keep or change to `d4d:rawDataAccessURL` + +#### 5. slot_uri Conflict: dcterms:creator (2 usages) + +**Current usages:** +- `created_by` (D4D_Base_import) - ✓ Correct usage +- `principal_investigator` (D4D_Motivation) - Semantically different (role-based) + +**Recommended fix:** +- `principal_investigator` → Change to `d4d:principalInvestigator` + +#### 6. 
slot_uri Conflict: dcterms:license (2 usages) + +**Current usages:** +- `license` (Software) - Software license +- `license` (top-level slot) - Dataset license + +**Issue:** Same concept, different entities + +**Recommended fix:** +- Consider acceptable (both are licenses) +- OR differentiate: Software.license → `d4d:softwareLicense` + +### MEDIUM PRIORITY + +#### 7. Range Mismatches (51 HIGH priority) + +**Note:** Many flagged issues are false positives from the automated checker + +**Genuine issues to investigate:** +- Fields with "(e.g., ...)" triggering multivalued false positives ✓ VERIFIED: Most already correct +- Boolean fields - need semantic review to determine if oversimplified + +**Action:** Manual review of each HIGH priority range mismatch for genuine issues + +### LOW PRIORITY + +#### 8. Enum Candidates (75 fields) + +**Data analysis identified** 75 string fields with limited value sets + +**Examples:** +- `id`, `name`, `title`, `description`, `page` - Generic identifiers/text (OK as strings) +- Others may benefit from controlled vocabularies + +**Action:** Review enum candidates, create enums where beneficial for data quality + +--- + +## Validation Status + +### Before Fixes +```bash +make semantic-review +``` +- slot_uri conflicts: 17 +- dcat:mediaType conflict: CRITICAL + +### After Current Fixes +- ✅ dcat:mediaType conflict: RESOLVED +- slot_uri conflicts: 16 remaining +- Next: schema:identifier, dcterms:description, dcat:accessURL + +### Testing +```bash +make test-schema # Verify schema still valid +make gen-project # Regenerate artifacts +make test # Run all tests +``` + +--- + +## Implementation Plan + +### Phase 1: Clear Wins (IN PROGRESS) +- [x] Fix dcat:mediaType conflict (encoding) +- [ ] Fix dcterms:creator conflict (principal_investigator) +- [ ] Fix dcat:accessURL conflict (erratum_url) + +### Phase 2: schema:identifier Differentiation +- [ ] Change identifiers_removed slot_uri +- [ ] Change latest_version_doi slot_uri +- [ ] Change 
is_identifier slot_uri +- [ ] Consider grant_number, target_dataset + +### Phase 3: Architectural Decision on dcterms:description +- [ ] Review all 40+ usages +- [ ] Identify fields with clear standard alternatives +- [ ] Create D4D custom terms for remainder +- [ ] Document rationale + +### Phase 4: Validation & Documentation +- [ ] Run semantic-review after fixes +- [ ] Update semantic_review_report.md +- [ ] Document all slot_uri decisions in ontology_mapping_guide.md +- [ ] Update CLAUDE.md with semantic validation practices + +--- + +## Breaking Change Assessment + +### Non-Breaking (Safe to implement) +- ✅ slot_uri changes (RDF/JSON-LD only, YAML data unaffected) +- ✅ Adding exact_mappings, broad_mappings (additive) +- ✅ Improving descriptions (documentation only) + +### Potentially Breaking +- ⚠️ range changes (boolean → enum, string → class) + - Requires data migration if deployed + - Consider v2.0 schema version +- ⚠️ Adding multivalued where missing + - Data may need wrapping in lists + - Check existing data first + +### Coordination Needed +- 🤝 RDF/DCAT converters +- 🤝 FAIRSCAPE integration tools +- 🤝 Validation tools relying on specific slot_uris + +--- + +### 9. slot_uri Conflict: dcat:landingPage ✅ FIXED + +**Issue:** `contribution_url` (contribution guidelines) mapped to `dcat:landingPage` + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Maintenance.yaml` line 149 +- **Change:** slot_uri: `dcat:landingPage` → `d4d:contributionURL` +- **Rationale:** Contribution guidelines URL is semantically different from dataset landing page + +### 10. 
slot_uri Conflict: dcat:accessURL ✅ PARTIALLY FIXED + +**Issue:** `erratum_url` (erratum access) incorrectly mapped to `dcat:accessURL` + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Maintenance.yaml` line 66 +- **Change:** slot_uri: `dcat:accessURL` → `d4d:erratumURL` +- **Rationale:** Erratum-specific access point is different from general dataset access + +**Remaining:** `access_url` in D4D_Preprocessing still conflicts with `access_urls` in D4D_Distribution + +### 11. schema:identifier Conflict: identifiers_removed ✅ FIXED + +**Issue:** `identifiers_removed` mapped to `schema:identifier` but contains a list of removed identifier TYPES, not identifiers + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Composition.yaml` line 388 +- **Change:** slot_uri: `schema:identifier` → `d4d:removedIdentifierTypes` +- **Rationale:** Semantic inversion - this documents what types of identifiers were removed (e.g., "SSN", "name"), not identifier values themselves + +--- + +## Metrics + +### Conflict Reduction +**Before fixes:** +- Total slot_uri conflicts: 17 +- CRITICAL: 9 + +**After fixes:** +- Total slot_uri conflicts: 15 ✅ (12% reduction) +- CRITICAL: 8 ✅ (11% reduction) + +### Issues Resolved +- CRITICAL: 4/9 (44%) ✅ Fixed: dcat:mediaType, dcat:landingPage, dcat:accessURL (partial), identifiers_removed +- HIGH: 0/54 (0%) +- MEDIUM: 0/29 (0%) +- LOW: 0/1 (0%) + +**Total progress: 3 issues fully resolved, 1 partially resolved** + +### Remaining Critical Issues +1. dcterms:description (40 slots) - Massive semantic flattening +2. dcterms:accessRights (3 slots) +3. dcterms:creator (2 slots) - NOTE: May be acceptable shared usage +4. dcterms:format (2 slots) +5. dcterms:license (2 slots) - Software vs dataset license +6. dcterms:type (3 slots) +7. schema:affiliation (2 slots) +8.
dcat:accessURL (2 slots remaining) + +### Next Session Goals +- Resolve remaining CRITICAL conflicts (7-8) +- Address remaining schema:identifier conflicts (6 usages) +- Make architectural decision on dcterms:description semantic flattening +- Address top 10 HIGH priority range mismatches +- Create ontology_mapping_guide.md with rationale + +--- + +## Notes + +- The automated semantic review tools that were created provide ongoing validation +- Re-run `make semantic-review` after each batch of fixes +- Prioritize fixes by: (1) CRITICAL severity, (2) Clear standard mappings, (3) High usage frequency +- Document rationale for all non-obvious mapping decisions diff --git a/reports/medium_conflicts_analysis.md b/reports/medium_conflicts_analysis.md new file mode 100644 index 00000000..bdae2d02 --- /dev/null +++ b/reports/medium_conflicts_analysis.md @@ -0,0 +1,188 @@ +# MEDIUM Priority Conflicts - Analysis & Decisions + +## 1. dcat:byteSize (2 slots) - ✅ ACCEPTABLE + +**Usages:** +- `bytes` (D4D_Base_import) - Size of a single file/resource in bytes +- `total_bytes` (D4D_FileCollection) - Total size of all files in a collection + +**Analysis:** Both slots correctly use dcat:byteSize to express size in bytes. The semantic distinction is the entity being measured (single file vs collection), not the property itself. DCAT's byteSize property is appropriate for both contexts. + +**Decision:** ACCEPT as valid semantic overlap. No fix needed. + +**Rationale:** The slot_uri describes the property (byte size), not the entity. Different entities can share the same property type. + +--- + +## 2.
dcterms:conformsTo (3 slots) - ⚠️ DIFFERENTIATE for precision + +**Usages:** +- `conforms_to` (generic) - Any established standard/specification +- `conforms_to_schema` (specific) - Schema or data model +- `conforms_to_class` (specific) - Specific class within a schema + +**Analysis:** While all express conformance relationships (appropriate for dcterms:conformsTo), the specific variants add semantic precision that could be valuable for machine interpretation. + +**Decision:** DIFFERENTIATE the specific variants + +**Fix:** +```yaml +conforms_to_schema: + slot_uri: d4d:conformsToSchema + broad_mappings: + - dcterms:conformsTo + +conforms_to_class: + slot_uri: d4d:conformsToClass + broad_mappings: + - dcterms:conformsTo +``` + +**Rationale:** Schema and class conformance are semantically narrower than generic conformance. Custom terms preserve this precision while maintaining broad mapping. + +--- + +## 3. dcterms:identifier (4 slots) - ⚠️ PARTIALLY DIFFERENTIATE + +**Usages:** +- `hash` - Generic cryptographic hash +- `md5` - MD5 hash (128-bit) +- `sha256` - SHA-256 hash (256-bit) +- `doi` - Digital Object Identifier + +**Analysis:** +- **Hashes (hash, md5, sha256)**: All serve same purpose (integrity verification), just different algorithms. Semantic overlap acceptable. +- **DOI**: Fundamentally different - persistent citation identifier, not integrity hash. Should be differentiated. + +**Decision:** Differentiate DOI only, keep hashes as dcterms:identifier + +**Fix:** +```yaml +doi: + slot_uri: d4d:doiIdentifier + broad_mappings: + - dcterms:identifier + exact_mappings: + - schema:identifier # DOI is a schema.org identifier +``` + +**Rationale:** +- Hashes: All identifiers based on content, acceptable overlap +- DOI: Persistent scholarly identifier with different semantics (citation vs verification) + +--- + +## 4. 
schema:description (4 slots) - ✅ MOSTLY ACCEPTABLE + +**Usages:** +- `description` (NamedThing) - Generic description +- `description` (DatasetProperty) - Property description +- `label_description` (D4D_Composition) - Pattern/format of labels +- `representative_verification` (D4D_Composition) - Verification description + +**Analysis:** Most are generic descriptions (appropriate). `label_description` and `representative_verification` are more specific descriptive fields. + +**Decision:** ACCEPT generic descriptions, DIFFERENTIATE specific ones + +**Fix:** +```yaml +label_description: + slot_uri: d4d:labelPattern + broad_mappings: + - schema:description + +representative_verification: + slot_uri: d4d:verificationDescription + broad_mappings: + - schema:description +``` + +**Rationale:** Generic descriptions share semantics; specific descriptive fields benefit from precision. + +--- + +## 5. schema:name (4 slots) - ✅ DIFFERENTIATED + +**Usages:** +- `name` (NamedThing) - Generic name +- `name` (DatasetProperty) - Property name +- `tools` (D4D_Preprocessing) - Tool names list +- `variable_name` (D4D_Variables) - Variable identifier name + +**Analysis:** +- Generic `name` slots: Appropriate overlap +- `tools`: List of tool names (software name list) +- `variable_name`: Specific variable identifier (technical name) + +**Decision:** DIFFERENTIATE tools and variable_name + +**Fix Applied:** +```yaml +tools: + slot_uri: d4d:toolNames + broad_mappings: + - schema:name + +variable_name: + slot_uri: d4d:variableName + broad_mappings: + - schema:name + - schema:identifier # Variables are identified by name +``` + +**Rationale:** Software tool names and variable technical names have specific semantics beyond generic naming. 
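
The differentiation pattern used in the fixes above (assign a custom `d4d:` term as `slot_uri` and keep the previous standard term as a broad mapping) can be sketched as a small transformation. This is an illustrative sketch under stated assumptions: `differentiate_slot` is a hypothetical helper, not part of LinkML or the D4D tooling, and slot definitions are modeled as plain dictionaries rather than loaded schema objects.

```python
import copy


def differentiate_slot(slot, new_uri):
    """Return a copy of `slot` with `new_uri` as slot_uri and the old URI kept as a broad mapping."""
    updated = copy.deepcopy(slot)  # never mutate the caller's definition
    old_uri = updated.get("slot_uri")
    updated["slot_uri"] = new_uri
    if old_uri and old_uri != new_uri:
        broad = updated.setdefault("broad_mappings", [])
        if old_uri not in broad:  # avoid duplicate mappings on repeated runs
            broad.append(old_uri)
    return updated


# Hypothetical starting point for the `tools` slot before the fix.
tools_before = {"slot_uri": "schema:name"}
tools_after = differentiate_slot(tools_before, "d4d:toolNames")
print(tools_after)  # {'slot_uri': 'd4d:toolNames', 'broad_mappings': ['schema:name']}
```

Keeping the original term in `broad_mappings` means consumers that only understand the standard vocabulary can still interpret the slot, while the narrower `d4d:` term carries the added precision.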
+ +--- + +## Summary + +| Conflict | Slots | Decision | Fixes Needed | +|----------|-------|----------|--------------| +| dcat:byteSize | 2 | ACCEPT | 0 | +| dcterms:conformsTo | 3 | DIFFERENTIATE | 2 | +| dcterms:identifier | 4 | PARTIAL | 1 (DOI) | +| schema:description | 4 | PARTIAL | 2 | +| schema:name | 4 | PARTIAL | 2 | +| **TOTAL** | **17** | - | **7 fixes** | + +**Acceptable overlaps:** 5 slots (dcat:byteSize both, dcterms:identifier hashes, generic names/descriptions) + +**Precision improvements:** 7 slots need custom d4d: terms + +--- + +## Implementation Order + +1. ✅ doi → d4d:doiIdentifier (D4D_Base_import.yaml:383) +2. ✅ conforms_to_schema → d4d:conformsToSchema (D4D_Base_import.yaml:334) +3. ✅ conforms_to_class → d4d:conformsToClass (D4D_Base_import.yaml:338) +4. ✅ variable_name → d4d:variableName (D4D_Variables.yaml:47) +5. ✅ label_description → d4d:labelPattern (D4D_Composition.yaml:83) +6. ✅ representative_verification → d4d:verificationDescription (D4D_Composition.yaml:132) +7. 
✅ tools → d4d:toolNames (D4D_Preprocessing.yaml:236) + +**Result:** 5 MEDIUM conflicts → 2 acceptable overlaps (dcat:byteSize, dcterms:identifier hashes) + +--- + +## Final Status + +**Conflicts resolved:** 3 of 5 MEDIUM priority conflicts +- ✅ dcterms:conformsTo (3 slots) → 2 custom terms + 1 generic +- ✅ schema:description (4 slots) → 2 custom terms + 2 generic +- ✅ schema:name (4 slots) → 2 custom terms + 2 generic + +**Acceptable overlaps documented:** +- ✅ dcat:byteSize (2 slots) - Different entities, same property semantics +- ✅ dcterms:identifier (4 slots) - Hash variants acceptable, DOI differentiated + +**Custom D4D terms created:** 7 +- d4d:doiIdentifier +- d4d:conformsToSchema +- d4d:conformsToClass +- d4d:variableName +- d4d:labelPattern +- d4d:verificationDescription +- d4d:toolNames + +**Total impact:** Reduced slot_uri conflicts from 17 to 5 (70% reduction) diff --git a/reports/range_mismatches.json b/reports/range_mismatches.json new file mode 100644 index 00000000..e355e267 --- /dev/null +++ b/reports/range_mismatches.json @@ -0,0 +1,849 @@ +{ + "metadata": { + "tool": "range_description_checker", + "total_issues": 76 + }, + "summary": { + "HIGH": 51, + "MEDIUM": 24, + "LOW": 1 + }, + "issues": [ + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "Software", + "attribute": "version", + "description": "The version identifier of the software (e.g., \"1.0.0\", \"2.3.1-beta\").", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "Software", + "attribute": "license", + "description": "The license under which the software is distributed (e.g., \"MIT\", \"Apache-2.0\", \"GPL-3.0\").", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": 
"D4D_Base_import.yaml", + "class": "Software", + "attribute": "url", + "description": "URL where the software can be found (e.g., homepage, repository, or documentation).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "Person", + "attribute": "orcid", + "description": "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "FormatDialect", + "attribute": "comment_prefix", + "description": "Character(s) used to indicate comment lines (e.g., \"#\" for CSV comments).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "FormatDialect", + "attribute": "delimiter", + "description": "Field delimiter character (e.g., \",\" for CSV, \"\\t\" for TSV).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "FormatDialect", + "attribute": "double_quote", + "description": "String indicator of whether quotes within quoted fields are escaped by doubling them.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "FormatDialect", + "attribute": "header", + 
"description": "String indicator of whether the first row contains column headers.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": "FormatDialect", + "attribute": "quote_char", + "description": "Character used for quoting fields (e.g., '\"' for CSV).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "language", + "description": "Language in which the information is expressed.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "page", + "description": "A landing page or web page providing access to or information about the resource.", + "range": "string", + "multivalued": false, + "issue": "Primitive type 'string' used but description implies structured class", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "dialect", + "description": "Specific format dialect or variation (e.g., CSV dialect, JSON-LD profile).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "compression", + "description": "Compression format used, if any (e.g., gzip, bzip2, zip).", + "range": "CompressionEnum", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": 
"D4D_Base_import.yaml", + "class": null, + "attribute": "hash", + "description": "Cryptographic hash value of the data for integrity verification.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "conforms_to", + "description": "An established standard, specification, or schema to which the resource conforms.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "conforms_to_schema", + "description": "The schema or data model to which the resource conforms.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "conforms_to_class", + "description": "The specific class or type within a schema to which the resource conforms.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "license", + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "version", + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", + "range": "string", + "multivalued": false, + "issue": "Description 
implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "modified_by", + "description": "A person or organization that contributed to modifying or updating the resource.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "status", + "description": "The status of the resource (e.g., draft, published, deprecated).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "was_derived_from", + "description": "A resource from which this resource was derived, in whole or in part.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "class": null, + "attribute": "doi", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "class": "InstanceAcquisition", + "attribute": "was_directly_observed", + "description": "Whether the data was directly observed.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "class": "InstanceAcquisition", + "attribute": "was_reported_by_subjects", + "description": "Whether the data was 
reported directly by the subjects themselves.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "class": "InstanceAcquisition", + "attribute": "was_inferred_derived", + "description": "Whether the data was inferred or derived from other data.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "class": "InstanceAcquisition", + "attribute": "was_validated_verified", + "description": "Whether the data was validated or verified in any way.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies date", + "severity": "HIGH" + }, + { + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "class": "DataCollector", + "attribute": "role", + "description": "Role of the data collector (e.g., researcher, crowdworker).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "class": "DirectCollection", + "attribute": "is_direct", + "description": "Whether collection was direct from individuals.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "class": "MissingDataDocumentation", + "attribute": "handling_strategy", + "description": "Strategy used to handle missing data (e.g., deletion, imputation, flagging, multiple imputation).\n", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Collection", + "file": 
"D4D_Collection.yaml", + "class": "RawDataSource", + "attribute": "source_description", + "description": "Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collection).\n", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Instance", + "attribute": "data_topic", + "description": "General topic of each instance (e.g., from Bridge2AI standards).\n", + "range": "uriorcurie", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Instance", + "attribute": "instance_type", + "description": "Multiple types of instances? (e.g., movies, users, and ratings).\n", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Instance", + "attribute": "data_substrate", + "description": "Type of data (e.g., raw text, images) from Bridge2AI standards.\n", + "range": "uriorcurie", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Instance", + "attribute": "label", + "description": "Is there a label or target associated with each instance?\n", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies string", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "SamplingStrategy", + "attribute": "is_sample", + "description": "Indicates whether it is a sample of a larger set.", + "range": "boolean", + "multivalued": true, + "issue": "Boolean oversimplifies - 
description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "SamplingStrategy", + "attribute": "is_random", + "description": "Indicates whether the sample is random.", + "range": "boolean", + "multivalued": true, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "SamplingStrategy", + "attribute": "is_representative", + "description": "Indicates whether the sample is representative of the larger set.\n", + "range": "boolean", + "multivalued": true, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "DatasetBias", + "attribute": "bias_description", + "description": "Detailed description of how this bias manifests in the dataset, including affected populations, features, or outcomes.\n", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "DatasetLimitation", + "attribute": "limitation_type", + "description": "Category of limitation (e.g., scope, coverage, temporal, methodological).\n", + "range": "LimitationTypeEnum", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "ExternalResource", + "attribute": "archival", + "description": "Indication whether official archival versions of external resources are included.\n", + "range": "boolean", + "multivalued": true, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Confidentiality", + 
"attribute": "confidential_elements_present", + "description": "Indicates whether any confidential data elements are present.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "ContentWarning", + "attribute": "content_warnings_present", + "description": "Indicates whether any content warnings are needed.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Subpopulation", + "attribute": "subpopulation_elements_present", + "description": "Indicates whether any subpopulations are explicitly identified.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Deidentification", + "attribute": "identifiable_elements_present", + "description": "Indicates whether data subjects can be identified.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "Deidentification", + "attribute": "method", + "description": "Method used for de-identification (e.g., HIPAA Safe Harbor).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "SensitiveElement", + "attribute": "sensitive_elements_present", + "description": "Indicates whether sensitive data elements are present.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies 
enum", + "severity": "HIGH" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "DatasetRelationship", + "attribute": "target_dataset", + "description": "The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "class": "DatasetRelationship", + "attribute": "relationship_type", + "description": "The type of relationship (e.g., derives_from, supplements, is_version_of). Uses DatasetRelationshipTypeEnum for standardized relationship types.", + "range": "DatasetRelationshipTypeEnum", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "class": "LicenseAndUseTerms", + "attribute": "data_use_permission", + "description": "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). 
See https://github.com/EBISPOT/DUO.", + "range": "DataUsePermissionEnum", + "multivalued": true, + "issue": "Description implies URI but range is DataUsePermissionEnum", + "severity": "LOW" + }, + { + "module": "D4D_Distribution", + "file": "D4D_Distribution.yaml", + "class": "ThirdPartySharing", + "attribute": "is_shared", + "description": "Boolean indicating whether the dataset is distributed to parties external to the dataset-creating entity.\n", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Distribution", + "file": "D4D_Distribution.yaml", + "class": "DistributionFormat", + "attribute": "access_urls", + "description": "Details of the distribution channel(s) or format(s).", + "range": "string", + "multivalued": true, + "issue": "Primitive type 'string' used but description implies structured class", + "severity": "MEDIUM" + }, + { + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "class": "EthicalReview", + "attribute": "reviewing_organization", + "description": "Organization that conducted the ethical review (e.g., Institutional Review Board, Ethics Committee, Research Ethics Board). 
Provides information about the body responsible for ethical oversight.", + "range": "Organization", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_FileCollection", + "file": "D4D_FileCollection.yaml", + "class": "File", + "attribute": "file_type", + "description": "Semantic type or purpose of this file (e.g., data_file, code_file, documentation_file, metadata_file).", + "range": "FileTypeEnum", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Human", + "file": "D4D_Human.yaml", + "class": "HumanSubjectResearch", + "attribute": "involves_human_subjects", + "description": "Does this dataset involve human subjects research?", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies string", + "severity": "HIGH" + }, + { + "module": "D4D_Human", + "file": "D4D_Human.yaml", + "class": "InformedConsent", + "attribute": "consent_obtained", + "description": "Was informed consent obtained from all participants?", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies string", + "severity": "HIGH" + }, + { + "module": "D4D_Human", + "file": "D4D_Human.yaml", + "class": "ParticipantPrivacy", + "attribute": "anonymization_method", + "description": "What methods were used to anonymize or de-identify participant data? 
Include technical details of privacy-preserving techniques.\n", + "range": "string", + "multivalued": true, + "issue": "Primitive type 'string' used but description implies structured class", + "severity": "MEDIUM" + }, + { + "module": "D4D_Human", + "file": "D4D_Human.yaml", + "class": "HumanSubjectCompensation", + "attribute": "compensation_provided", + "description": "Were participants compensated for their participation?", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies string", + "severity": "HIGH" + }, + { + "module": "D4D_Human", + "file": "D4D_Human.yaml", + "class": "AtRiskPopulations", + "attribute": "at_risk_groups_included", + "description": "Are any at-risk populations included (e.g., children, pregnant women, prisoners, cognitively impaired individuals)?\n", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies string", + "severity": "HIGH" + }, + { + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "class": "Maintainer", + "attribute": "role", + "description": "Role of the maintainer (e.g., researcher, platform, organization).\n", + "range": "CreatorOrMaintainerEnum", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "class": "UpdatePlan", + "attribute": "frequency", + "description": "How often updates are planned (e.g., quarterly, annually).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "class": "VersionAccess", + "attribute": "versions_available", + "description": "List of available versions with metadata.", + "range": "string", + "multivalued": true, + "issue": "Primitive type 'string' used but description implies structured class", + 
"severity": "MEDIUM" + }, + { + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "class": "Task", + "attribute": "response", + "description": "Short explanation describing the specific task or tasks for which this dataset was created.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "class": "Grant", + "attribute": "grant_number", + "description": "The alphanumeric identifier for the grant.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "class": "LabelingStrategy", + "attribute": "data_annotation_platform", + "description": "Platform or tool used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool).", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "class": "LabelingStrategy", + "attribute": "annotations_per_item", + "description": "Number of annotations collected per data item. Multiple annotations per item enable calculation of inter-annotator agreement.", + "range": "integer", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "class": "LabelingStrategy", + "attribute": "inter_annotator_agreement", + "description": "Measure of agreement between annotators (e.g., Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percent agreement). 
Include both the metric name and value.", + "range": "string", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "class": "LabelingStrategy", + "attribute": "annotator_demographics", + "description": "Demographic information about annotators, if available and relevant (e.g., geographic location, language background, expertise level).", + "range": "string", + "multivalued": true, + "issue": "Primitive type 'string' used but description implies structured class", + "severity": "MEDIUM" + }, + { + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "class": "ImputationProtocol", + "attribute": "imputation_rationale", + "description": "Justification for the imputation approach chosen, including assumptions made about missing data mechanisms.\n", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "class": "AnnotationAnalysis", + "attribute": "inter_annotator_agreement_score", + "description": "Measured agreement between annotators (e.g., Cohen's kappa value, Fleiss' kappa, Krippendorff's alpha).\n", + "range": "float", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "class": "AnnotationAnalysis", + "attribute": "agreement_metric", + "description": "Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percentage agreement, etc.).\n", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "class": "VariableMetadata", + "attribute": 
"variable_name", + "description": "The name or identifier of the variable as it appears in the data files.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + }, + { + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "class": "VariableMetadata", + "attribute": "data_type", + "description": "The data type of the variable (e.g., integer, float, string, boolean, date, categorical). Use standard type names when possible.", + "range": "VariableTypeEnum", + "multivalued": false, + "issue": "Description implies list but multivalued=false", + "severity": "HIGH" + }, + { + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "class": "VariableMetadata", + "attribute": "is_identifier", + "description": "Indicates whether this variable serves as a unique identifier or key for records in the dataset.", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies enum", + "severity": "HIGH" + }, + { + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "class": "VariableMetadata", + "attribute": "is_sensitive", + "description": "Indicates whether this variable contains sensitive information (e.g., personal data, protected health information).", + "range": "boolean", + "multivalued": false, + "issue": "Boolean oversimplifies - description implies string", + "severity": "HIGH" + }, + { + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "class": "VariableMetadata", + "attribute": "derivation", + "description": "Description of how this variable was derived or calculated from other variables, if applicable.", + "range": "string", + "multivalued": false, + "issue": "String used but description implies enum (limited choices)", + "severity": "MEDIUM" + } + ] +} \ No newline at end of file diff --git a/reports/semantic_fixes_session2.md b/reports/semantic_fixes_session2.md new file mode 100644 index 
00000000..b30df3e0 --- /dev/null +++ b/reports/semantic_fixes_session2.md @@ -0,0 +1,373 @@ +# D4D Schema Semantic Fixes - Session 2 + +**Date:** 2026-04-08 +**Branch:** add-schema-descriptions +**Continuation of:** Initial semantic review (Session 1) + +## Summary + +Continued systematic resolution of semantic issues identified in the comprehensive schema review. Reduced slot_uri conflicts from 17 to 8 (53% reduction), with CRITICAL conflicts reduced from 9 to 2 (78% reduction). + +--- + +## Session 2 Fixes Applied (14 additional fixes) + +### 5. dcat:accessURL - Raw Data Access ✅ FIXED + +**Issue:** `access_url` (raw data access) conflicted with `access_urls` (distribution channels) + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Preprocessing.yaml` line 138 +- **Change:** slot_uri: `dcat:accessURL` → `d4d:rawDataAccessURL` +- **Rationale:** Raw data access point is semantically distinct from general dataset distribution access + +--- + +### 6. dcterms:creator - Principal Investigator ✅ FIXED + +**Issue:** `principal_investigator` (specific role) conflicted with `created_by` (general creator) + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Motivation.yaml` line 81 +- **Change:** slot_uri: `dcterms:creator` → `d4d:principalInvestigator` +- **Added:** `broad_mappings` to dcterms:creator and schema:creator +- **Rationale:** Principal Investigator is a specific role-based creator designation, semantically narrower than general creator + +--- + +### 7. schema:affiliation - Team Affiliation ✅ FIXED + +**Issue:** `affiliations` (team/creator affiliations) conflicted with `affiliation` (person affiliation) + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Motivation.yaml` line 90 +- **Change:** slot_uri: `schema:affiliation` → `d4d:teamAffiliation` +- **Added:** `broad_mappings` to schema:affiliation +- **Rationale:** Team/creator affiliations are contextually different from individual person affiliations + +--- + +### 8. 
dcterms:accessRights - External Resource Restrictions ✅ FIXED + +**Issue:** `restrictions` (external resource restrictions) conflicted with `regulatory_restrictions` and `is_shared` + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Composition.yaml` line 305 +- **Change:** slot_uri: `dcterms:accessRights` → `d4d:externalResourceRestrictions` +- **Added:** `broad_mappings` to dcterms:accessRights +- **Rationale:** Restrictions on external resources (not the dataset itself) are semantically distinct + +**Kept:** `regulatory_restrictions` with `dcterms:accessRights` (most appropriate for regulatory access controls) + +--- + +### 9. dcterms:accessRights - is_shared Boolean ✅ FIXED + +**Issue:** `is_shared` (boolean) incorrectly mapped to `dcterms:accessRights` (expects text description) + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Distribution.yaml` line 55 +- **Change:** slot_uri: `dcterms:accessRights` → `d4d:isExternallyShared` +- **Rationale:** Boolean indicator of external distribution doesn't match accessRights semantics (which expects textual description of access rights) + +--- + +### 10. schema:identifier - ORCID ✅ FIXED + +**Issue:** `orcid` (ORCID identifier) conflicted with generic `id` slots + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Base_import.yaml` line 191 +- **Change:** slot_uri: `schema:identifier` → `d4d:orcidIdentifier` +- **Added:** `broad_mappings` to schema:identifier +- **Rationale:** ORCID is a specific type of persistent researcher identifier, more precise than generic identifier + +--- + +### 11. 
schema:identifier - Target Dataset Relation ✅ FIXED + +**Issue:** `target_dataset` (dataset relationship) incorrectly mapped to `schema:identifier` + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Composition.yaml` line 430 +- **Change:** slot_uri: `schema:identifier` → `dcterms:relation` +- **Rationale:** Target dataset is a relationship/reference to another dataset, not an identifier property + +--- + +### 12. schema:identifier - Latest Version ✅ FIXED + +**Issue:** `latest_version_doi` (version relationship) incorrectly mapped to `schema:identifier` + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Maintenance.yaml` line 125 +- **Change:** slot_uri: `schema:identifier` → `dcterms:hasVersion` +- **Rationale:** DOI/URL of latest version is a version relationship, not an identifier of the current resource + +**Note:** Created temporary conflict with base `version` slot, resolved in Fix #13 + +--- + +### 13. dcterms:hasVersion - Version String ✅ FIXED + +**Issue:** Base `version` slot (version string like "1.0.0") incorrectly using `dcterms:hasVersion` (for version relationships) + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Base_import.yaml` line 351 +- **Change:** slot_uri: `dcterms:hasVersion` → `schema:version` +- **Rationale:** dcterms:hasVersion is for relating resources to their versions; schema:version is for the version string itself + +--- + +### 14. schema:identifier - Grant Number ✅ FIXED + +**Issue:** `grant_number` (grant identifier) conflicted with generic identifier slots + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Motivation.yaml` line 141 +- **Change:** slot_uri: `schema:identifier` → `d4d:grantIdentifier` +- **Added:** `broad_mappings` to schema:identifier +- **Rationale:** Grant number is a specific type of funding identifier, more precise than generic identifier + +--- + +### 15. 
schema:identifier - is_identifier Meta-property ✅ FIXED + +**Issue:** `is_identifier` (boolean meta-property) incorrectly mapped to `schema:identifier` + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Variables.yaml` line 111 +- **Change:** slot_uri: `schema:identifier` → `d4d:isIdentifier` +- **Rationale:** Meta-property indicating whether a variable IS an identifier, not an identifier value itself + +--- + +### 16. dcterms:format - Data Substrate Type ✅ FIXED + +**Issue:** `data_substrate` (data type: text/images) incorrectly mapped to `dcterms:format` (file format) + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Composition.yaml` line 68 +- **Change:** slot_uri: `dcterms:format` → `dcterms:type` +- **Rationale:** Data content type (text, images) is about the type of data, not file format; dcterms:type more appropriate + +**Note:** Created acceptable semantic overlap with source_type and instance_type (all describing types of things) + +--- + +### 17. dcterms:type - Publication Status ✅ FIXED + +**Issue:** `status` (draft/published/deprecated) incorrectly mapped to `dcterms:type` + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Base_import.yaml` line 373 +- **Change:** slot_uri: `dcterms:type` → `d4d:publicationStatus` +- **Rationale:** Lifecycle/publication status is semantically different from resource type + +--- + +### 18. 
dcterms:license - License Terms Description ✅ FIXED + +**Issue:** `license_terms` (description of license) conflicted with `license` (the license itself) + +**Fix:** +- **File:** `src/data_sheets_schema/schema/D4D_Data_Governance.yaml` line 55 +- **Change:** slot_uri: `dcterms:license` → `d4d:licenseDescription` +- **Added:** `broad_mappings` to dcterms:license and dcterms:rights +- **Rationale:** License terms description is explanatory text about the license, distinct from the license identifier itself + +--- + +## Progress Metrics + +### Conflict Reduction + +| Metric | Before Session 1 | After Session 1 | After Session 2 | Total Reduction | +|--------|-----------------|-----------------|-----------------|-----------------| +| **Total Conflicts** | 17 | 15 | **8** | **53%** ⬇️ | +| **CRITICAL** | 9 | 8 | **2** | **78%** ⬇️ | +| **HIGH** | 3 | 3 | **1** | **67%** ⬇️ | +| **MEDIUM** | 4 | 3 | **5** | 25% ⬆️ | + +**Note:** Some HIGH conflicts were reclassified to MEDIUM after fixes resolved semantic ambiguity + +### Fixes by Category + +| Category | Session 1 | Session 2 | Total | +|----------|-----------|-----------|-------| +| **dcat:** conflicts | 3 | 1 | 4 | +| **dcterms:** conflicts | 0 | 7 | 7 | +| **schema:** conflicts | 1 | 6 | 7 | +| **Custom d4d:** terms created | 5 | 10 | 15 | + +--- + +## Remaining Issues + +### CRITICAL (2 remaining) + +#### 1. dcterms:description (40 slots) - ARCHITECTURAL DECISION NEEDED + +**Issue:** Massive semantic flattening - 40+ different detail/description slots all map to generic `dcterms:description` + +**Examples:** +- acquisition_details, mechanism_details, collector_details (data collection aspects) +- bias_description, limitation_description, anomaly_details (quality issues) +- preprocessing_details, cleaning_details, labeling_details (processing steps) +- And 30+ more... + +**Options:** +1. **Leave as-is** - Accept semantic flattening for simplicity + - ✅ No breaking changes + - ❌ Loses RDF semantic precision + +2. 
**Create specific D4D terms** for each category + - ✅ Maximum semantic precision + - ❌ Significant effort (40+ custom terms) + - ❌ No reuse of standard vocabulary + +3. **Hybrid approach** - Differentiate where clear standard alternatives exist + - Some use more specific DCTERMS/Schema.org properties + - Group related *_details fields under category-specific terms + - ✅ Balanced precision vs effort + - ⚠️ Requires careful semantic analysis + +**Recommendation:** Option 3 with phased approach: +- Phase 1: Identify fields with clear standard alternatives (e.g., source_description → dcterms:source) +- Phase 2: Group *_details by semantic category (collection, processing, quality, etc.) +- Phase 3: Create d4d: category namespaces (d4d:collectionDetails, d4d:qualityDetails, etc.) + +--- + +#### 2. dcterms:type (3 slots) - SEMANTIC OVERLAP + +**Current usages:** +- `source_type` (D4D_Collection) - Type of data source (sensor/database/web) +- `instance_type` (D4D_Composition) - Types of instances (movies/users/ratings) +- `data_substrate` (D4D_Composition) - Type of data content (text/images/etc) + +**Analysis:** All three legitimately describe "types" of things, which is what dcterms:type is for. This may be **acceptable semantic overlap** rather than a conflict. + +**Options:** +1. **Leave as-is** - All are valid dcterms:type usages +2. **Differentiate** - Create specific terms (d4d:sourceType, d4d:instanceType, keep data_substrate as dcterms:type since it uses controlled vocabulary) + +**Recommendation:** Option 2 for maximum precision, but Option 1 is defensible + +--- + +### HIGH (1 remaining) + +#### schema:contactPoint (2 slots) + +**Usages:** Contact information in different contexts - investigate and differentiate if needed + +--- + +### MEDIUM (5 remaining) + +1. **dcat:byteSize** (2 slots) - `bytes` vs `total_bytes` (acceptable - different entities) +2. **dcterms:conformsTo** (3 slots) - Multiple conformance declarations +3. 
**dcterms:identifier** (4 slots) - Remaining generic identifiers (mostly acceptable) +4. **schema:description** (3 slots) - Multiple description fields (similar to dcterms:description) +5. **schema:name** (3 slots) - Name fields in different contexts (likely acceptable) + +--- + +## Validation Status + +```bash +make test-schema # ✅ PASSED +make gen-project # ✅ PASSED +``` + +**All changes non-breaking:** +- slot_uri modifications affect RDF/JSON-LD only +- YAML data structure unchanged +- No data migration required + +--- + +## Next Steps + +### Immediate +1. ✅ Commit Session 2 fixes +2. 📝 Create `docs/ontology_mapping_guide.md` with rationale for all mappings +3. 🎯 Make architectural decision on dcterms:description (40 slots) + +### High Priority +4. 🔍 Review and address dcterms:type semantic overlap (3 slots) +5. 🔍 Investigate schema:contactPoint conflict (2 slots) +6. 🔍 Review remaining MEDIUM priority conflicts + +### Medium Priority +7. 📊 Address 51 HIGH priority range mismatches from data analysis +8. 🔍 Review 75 enum candidates identified in actual data + +### Documentation +9. 📝 Document all custom d4d: namespace terms created +10. 📝 Update CLAUDE.md with semantic validation practices +11. 
📝 Create migration guide for any breaking changes (if needed) + +--- + +## Files Modified + +**Schema modules:** +- `src/data_sheets_schema/schema/D4D_Base_import.yaml` (3 fixes) +- `src/data_sheets_schema/schema/D4D_Collection.yaml` (0 fixes this session) +- `src/data_sheets_schema/schema/D4D_Composition.yaml` (3 fixes) +- `src/data_sheets_schema/schema/D4D_Data_Governance.yaml` (1 fix) +- `src/data_sheets_schema/schema/D4D_Distribution.yaml` (1 fix) +- `src/data_sheets_schema/schema/D4D_Maintenance.yaml` (1 fix) +- `src/data_sheets_schema/schema/D4D_Motivation.yaml` (3 fixes) +- `src/data_sheets_schema/schema/D4D_Preprocessing.yaml` (1 fix) +- `src/data_sheets_schema/schema/D4D_Variables.yaml` (1 fix) + +**Generated files** (auto-regenerated): +- `project/jsonld/data_sheets_schema.jsonld` +- `project/owl/data_sheets_schema.owl.ttl` +- `src/data_sheets_schema/datamodel/data_sheets_schema.py` + +**Reports:** +- `reports/slot_uri_conflicts_final.json` (updated) +- `reports/semantic_fixes_session2.md` (this file) + +--- + +## Custom D4D Terms Created (Session 2) + +All custom terms in d4d: namespace with broad_mappings to standard vocabularies where applicable: + +1. `d4d:rawDataAccessURL` - Access point for raw/unprocessed data +2. `d4d:principalInvestigator` - PI role designation +3. `d4d:teamAffiliation` - Creator/team organizational affiliations +4. `d4d:externalResourceRestrictions` - Restrictions on external resources +5. `d4d:isExternallyShared` - Boolean indicator of external distribution +6. `d4d:orcidIdentifier` - ORCID persistent researcher identifier +7. `d4d:grantIdentifier` - Funding grant number/identifier +8. `d4d:isIdentifier` - Meta-property: whether variable is identifier +9. `d4d:publicationStatus` - Lifecycle status (draft/published/deprecated) +10. 
`d4d:licenseDescription` - Explanatory text about license terms + +**Total custom D4D terms:** 15 (Session 1: 5, Session 2: 10) + +--- + +## Semantic Precision Improvements + +### Before +- Generic schema:identifier for all identifier types +- dcat:accessURL for all access points +- dcterms:type for both types and status +- dcterms:format for both format and data type +- Boolean using text-description slot_uri + +### After +- Specific identifiers: orcidIdentifier, grantIdentifier, plus generic id +- Specific access: rawDataAccessURL, erratumURL, contributionURL, plus generic accessURL +- Clear separation: dcterms:type for types, publicationStatus for status +- Clear separation: dcterms:format for file format, dcterms:type for data content type +- Semantically appropriate slot_uris for all boolean fields + +**Result:** RDF/JSON-LD output now has much higher semantic precision and interoperability diff --git a/reports/semantic_review_report.json b/reports/semantic_review_report.json new file mode 100644 index 00000000..9665409b --- /dev/null +++ b/reports/semantic_review_report.json @@ -0,0 +1,9 @@ +{ + "generated": "2026-04-08T18:00:46.777670", + "total_issues": 136, + "reports_analyzed": [ + "slot_uri_conflicts", + "range_mismatches", + "data_value_analysis" + ] +} \ No newline at end of file diff --git a/reports/semantic_review_report.md b/reports/semantic_review_report.md new file mode 100644 index 00000000..61dafa57 --- /dev/null +++ b/reports/semantic_review_report.md @@ -0,0 +1,593 @@ +# D4D Schema Semantic Review Report + +**Generated:** 2026-04-08 18:00:46 +**Review Scope:** All D4D schema modules + actual data from 4 Bridge2AI projects + +--- + +## Executive Summary + +**Total Issues Found:** 136 + +- **CRITICAL:** 9 (Blocks functionality) +- **HIGH:** 54 (Wrong semantics) +- **MEDIUM:** 29 (Reduces clarity) +- **LOW:** 1 (Documentation quality) + +### Key Findings + +1. 
**slot_uri Conflicts:** 17 conflicts detected + - Most severe: `dcterms:description` used by 40 different slots (semantic flattening) + - Critical: Multiple core mappings (dcat:mediaType, dcterms:license, etc.) + +2. **Range-Description Mismatches:** 76 issues + - HIGH priority: 51 (boolean oversimplification, missing multivalued) + - MEDIUM priority: 24 (primitives vs structured types) + +3. **Data Value Analysis:** 142 fields analyzed across 4 Bridge2AI projects + - Enum candidates: 75 string fields with limited value sets + - Multivalued fields: 38 fields containing lists in actual data + + +--- + +## Critical Issues (Must Fix) + +### C-001: slot_uri Conflict - dcat:accessURL + +**Severity:** CRITICAL +**Conflict Count:** 3 different slots + +**Usages:** +- `access_urls` in D4D_Distribution.yaml + - Description: "Details of the distribution channel(s) or format(s)...." +- `erratum_url` in D4D_Maintenance.yaml + - Description: "URL or access point for the erratum...." +- `access_url` in D4D_Preprocessing.yaml + - Description: "URL or access point for the raw data...." + +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `access_urls` → `dcat:accessURL` (correct usage) +- **CHANGE** `erratum_url` → `d4d:erratum_url` (avoid conflict) +- **CHANGE** `access_url` → `d4d:access_url` (avoid conflict) + +**Rationale:** Multiple semantic concepts mapped to same ontology term creates ambiguity + + +### C-002: slot_uri Conflict - dcat:landingPage + +**Severity:** CRITICAL +**Conflict Count:** 2 different slots + +**Usages:** +- `page` in D4D_Base_import.yaml + - Description: "A landing page or web page providing access to or information about the resource..." +- `contribution_url` in D4D_Maintenance.yaml + - Description: "URL for contribution guidelines or process...." 
+ +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `page` → `dcat:landingPage` (correct usage) +- **CHANGE** `contribution_url` → `d4d:contribution_url` (avoid conflict) + +**Rationale:** Multiple semantic concepts mapped to same ontology term creates ambiguity + + +### C-003: slot_uri Conflict - dcterms:accessRights + +**Severity:** CRITICAL +**Conflict Count:** 3 different slots + +**Usages:** +- `restrictions` in D4D_Composition.yaml + - Description: "Description of any restrictions or fees associated with external resources. +..." +- `regulatory_restrictions` in D4D_Data_Governance.yaml + - Description: "Export or regulatory restrictions on the dataset...." +- `is_shared` in D4D_Distribution.yaml + - Description: "Boolean indicating whether the dataset is distributed to parties external to the..." + +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `restrictions` → `dcterms:accessRights` (correct usage) +- **CHANGE** `regulatory_restrictions` → `d4d:regulatory_restrictions` (avoid conflict) +- **CHANGE** `is_shared` → `d4d:is_shared` (avoid conflict) + +**Rationale:** Multiple semantic concepts mapped to same ontology term creates ambiguity + + +### C-004: slot_uri Conflict - dcterms:creator + +**Severity:** CRITICAL +**Conflict Count:** 2 different slots + +**Usages:** +- `created_by` in D4D_Base_import.yaml + - Description: "The person or organization primarily responsible for creating the resource...." +- `principal_investigator` in D4D_Motivation.yaml + - Description: "A key individual (Principal Investigator) responsible for or overseeing dataset ..." 
+ +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `created_by` → `dcterms:creator` (correct usage) +- **CHANGE** `principal_investigator` → `d4d:principal_investigator` (avoid conflict) + +**Rationale:** Multiple semantic concepts mapped to same ontology term creates ambiguity + + +### C-005: slot_uri Conflict - dcterms:description + +**Severity:** CRITICAL +**Conflict Count:** 40 different slots + +**Usages:** +- `acquisition_details` in D4D_Collection.yaml + - Description: "Details on how data was acquired for each instance. +..." +- `mechanism_details` in D4D_Collection.yaml + - Description: "Details on mechanisms or procedures used to collect the data. +..." +- `collector_details` in D4D_Collection.yaml + - Description: "Details on who collected the data and their compensation. +..." +- `timeframe_details` in D4D_Collection.yaml + - Description: "Details on the collection timeframe and relationship to data creation dates. +..." +- `collection_details` in D4D_Collection.yaml + - Description: "Details on direct vs. indirect collection methods and sources. +..." +- `source_description` in D4D_Collection.yaml + - Description: "Detailed description of where raw data comes from (e.g., sensors, databases, web..." +- `missing` in D4D_Composition.yaml + - Description: "Description of the missing data fields or elements. +..." +- `why_missing` in D4D_Composition.yaml + - Description: "Explanation of why each piece of data is missing. +..." +- `relationship_details` in D4D_Composition.yaml + - Description: "Details on relationships between instances (e.g., graph edges, ratings). +..." +- `split_details` in D4D_Composition.yaml + - Description: "Details on recommended data splits and their rationale. +..." +- `anomaly_details` in D4D_Composition.yaml + - Description: "Details on errors, noise sources, or redundancies in the dataset. +..." 
+- `bias_description` in D4D_Composition.yaml + - Description: "Detailed description of how this bias manifests in the dataset, including affect..." +- `limitation_description` in D4D_Composition.yaml + - Description: "Detailed description of the limitation and its implications. +..." +- `future_guarantees` in D4D_Composition.yaml + - Description: "Explanation of any commitments that external resources will remain available and..." +- `confidentiality_details` in D4D_Composition.yaml + - Description: "Details on confidential data elements and handling procedures. +..." +- `warnings` in D4D_Composition.yaml + - Description: "Specific content warnings describing potentially offensive, insulting, threateni..." +- `identification` in D4D_Composition.yaml + - Description: "How subpopulations are identified and defined (e.g., by age groups, gender, geog..." +- `distribution` in D4D_Composition.yaml + - Description: "The distribution of instances across identified subpopulations, including counts..." +- `deidentification_details` in D4D_Composition.yaml + - Description: "Details on de-identification procedures and residual risks. +..." +- `sensitivity_details` in D4D_Composition.yaml + - Description: "Details on sensitive data elements present and handling procedures. +..." +- `review_details` in D4D_Ethics.yaml + - Description: "Details on ethical review processes, outcomes, and supporting documentation. +..." +- `impact_details` in D4D_Ethics.yaml + - Description: "Details on data protection impact analysis, outcomes, and documentation. +..." +- `notification_details` in D4D_Ethics.yaml + - Description: "Details on how individuals were notified about data collection. +..." +- `consent_details` in D4D_Ethics.yaml + - Description: "Details on how consent was requested, provided, and documented. +..." +- `revocation_details` in D4D_Ethics.yaml + - Description: "Details on consent revocation mechanisms and procedures. +..." 
+- `maintainer_details` in D4D_Maintenance.yaml + - Description: "Details on who will support, host, or maintain the dataset. +..." +- `erratum_details` in D4D_Maintenance.yaml + - Description: "Details on any errata or corrections to the dataset. +..." +- `update_details` in D4D_Maintenance.yaml + - Description: "Details on update plans, responsible parties, and communication methods. +..." +- `retention_details` in D4D_Maintenance.yaml + - Description: "Details on data retention limits and enforcement procedures. +..." +- `version_details` in D4D_Maintenance.yaml + - Description: "Details on version support policies and obsolescence communication. +..." +- `extension_details` in D4D_Maintenance.yaml + - Description: "Details on extension mechanisms, contribution validation, and communication. +..." +- `response` in D4D_Motivation.yaml + - Description: "Short explanation describing the primary purpose of creating the dataset...." +- `response` in D4D_Motivation.yaml + - Description: "Short explanation describing the specific task or tasks for which this dataset w..." +- `response` in D4D_Motivation.yaml + - Description: "Short explanation of the knowledge or resource gap that this dataset was intende..." +- `preprocessing_details` in D4D_Preprocessing.yaml + - Description: "Details on preprocessing steps applied to the data. +..." +- `cleaning_details` in D4D_Preprocessing.yaml + - Description: "Details on data cleaning procedures applied. +..." +- `labeling_details` in D4D_Preprocessing.yaml + - Description: "Details on labeling/annotation procedures and quality metrics. +..." +- `raw_data_details` in D4D_Preprocessing.yaml + - Description: "Details on raw data availability and access procedures. +..." +- `repository_details` in D4D_Uses.yaml + - Description: "Details on the repository of known dataset uses. +..." +- `task_details` in D4D_Uses.yaml + - Description: "Details on other potential tasks the dataset could be used for. +..." 
+- `impact_details` in D4D_Uses.yaml + - Description: "Details on potential impacts, risks, and mitigation strategies. +..." +- `discouragement_details` in D4D_Uses.yaml + - Description: "Details on tasks for which the dataset should not be used. +..." +- `quality_notes` in D4D_Variables.yaml + - Description: "Notes about data quality, reliability, or known issues specific to this variable..." + +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `acquisition_details` → `dcterms:description` (correct usage) +- **CHANGE** `mechanism_details` → `d4d:mechanism_details` (avoid conflict) +- **CHANGE** `collector_details` → `d4d:collector_details` (avoid conflict) +- **CHANGE** `timeframe_details` → `d4d:timeframe_details` (avoid conflict) +- **CHANGE** `collection_details` → `d4d:collection_details` (avoid conflict) +- **CHANGE** `source_description` → `d4d:source_description` (avoid conflict) +- **CHANGE** `missing` → `d4d:missing` (avoid conflict) +- **CHANGE** `why_missing` → `d4d:why_missing` (avoid conflict) +- **CHANGE** `relationship_details` → `d4d:relationship_details` (avoid conflict) +- **CHANGE** `split_details` → `d4d:split_details` (avoid conflict) +- **CHANGE** `anomaly_details` → `d4d:anomaly_details` (avoid conflict) +- **CHANGE** `bias_description` → `d4d:bias_description` (avoid conflict) +- **CHANGE** `limitation_description` → `d4d:limitation_description` (avoid conflict) +- **CHANGE** `future_guarantees` → `d4d:future_guarantees` (avoid conflict) +- **CHANGE** `confidentiality_details` → `d4d:confidentiality_details` (avoid conflict) +- **CHANGE** `warnings` → `d4d:warnings` (avoid conflict) +- **CHANGE** `identification` → `d4d:identification` (avoid conflict) +- **CHANGE** `distribution` → `d4d:distribution` (avoid conflict) +- **CHANGE** `deidentification_details` → `d4d:deidentification_details` (avoid conflict) +- **CHANGE** `sensitivity_details` → `d4d:sensitivity_details` (avoid 
conflict) +- **CHANGE** `review_details` → `d4d:review_details` (avoid conflict) +- **CHANGE** `impact_details` → `d4d:impact_details` (avoid conflict) +- **CHANGE** `notification_details` → `d4d:notification_details` (avoid conflict) +- **CHANGE** `consent_details` → `d4d:consent_details` (avoid conflict) +- **CHANGE** `revocation_details` → `d4d:revocation_details` (avoid conflict) +- **CHANGE** `maintainer_details` → `d4d:maintainer_details` (avoid conflict) +- **CHANGE** `erratum_details` → `d4d:erratum_details` (avoid conflict) +- **CHANGE** `update_details` → `d4d:update_details` (avoid conflict) +- **CHANGE** `retention_details` → `d4d:retention_details` (avoid conflict) +- **CHANGE** `version_details` → `d4d:version_details` (avoid conflict) +- **CHANGE** `extension_details` → `d4d:extension_details` (avoid conflict) +- **CHANGE** `response` → `d4d:response` (avoid conflict) +- **CHANGE** `response` → `d4d:response` (avoid conflict) +- **CHANGE** `response` → `d4d:response` (avoid conflict) +- **CHANGE** `preprocessing_details` → `d4d:preprocessing_details` (avoid conflict) +- **CHANGE** `cleaning_details` → `d4d:cleaning_details` (avoid conflict) +- **CHANGE** `labeling_details` → `d4d:labeling_details` (avoid conflict) +- **CHANGE** `raw_data_details` → `d4d:raw_data_details` (avoid conflict) +- **CHANGE** `repository_details` → `d4d:repository_details` (avoid conflict) +- **CHANGE** `task_details` → `d4d:task_details` (avoid conflict) +- **CHANGE** `impact_details` → `d4d:impact_details` (avoid conflict) +- **CHANGE** `discouragement_details` → `d4d:discouragement_details` (avoid conflict) +- **CHANGE** `quality_notes` → `d4d:quality_notes` (avoid conflict) + +**Rationale:** Overuse of generic description property loses semantic distinction between different types of descriptive text + + +### C-006: slot_uri Conflict - dcterms:format + +**Severity:** CRITICAL +**Conflict Count:** 2 different slots + +**Usages:** +- `format` in D4D_Base_import.yaml + - 
Description: "The file format, physical medium, or dimensions of a resource. This should be a ..." +- `data_substrate` in D4D_Composition.yaml + - Description: "Type of data (e.g., raw text, images) from Bridge2AI standards. +..." + +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `format` → `dcterms:format` (correct usage) +- **CHANGE** `data_substrate` → `d4d:data_substrate` (avoid conflict) + +**Rationale:** Multiple semantic concepts mapped to same ontology term creates ambiguity + + +### C-007: slot_uri Conflict - dcterms:license + +**Severity:** CRITICAL +**Conflict Count:** 2 different slots + +**Usages:** +- `license` in D4D_Base_import.yaml + - Description: "The legal license under which the resource is made available (e.g., "MIT", "CC-B..." +- `license_terms` in D4D_Data_Governance.yaml + - Description: "Description of the dataset's license and terms of use (including links, costs, o..." + +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `license` → `dcterms:license` (correct usage) +- **CHANGE** `license_terms` → `d4d:license_terms` (avoid conflict) + +**Rationale:** License applies to different entities (dataset vs software) and should be differentiated + + +### C-008: slot_uri Conflict - dcterms:type + +**Severity:** CRITICAL +**Conflict Count:** 3 different slots + +**Usages:** +- `status` in D4D_Base_import.yaml + - Description: "The status of the resource (e.g., draft, published, deprecated)...." +- `source_type` in D4D_Collection.yaml + - Description: "Type of raw source (sensor, database, user input, web scraping, etc.). +..." +- `instance_type` in D4D_Composition.yaml + - Description: "Multiple types of instances? (e.g., movies, users, and ratings). +..." 
+ +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `status` → `dcterms:type` (correct usage) +- **CHANGE** `source_type` → `d4d:source_type` (avoid conflict) +- **CHANGE** `instance_type` → `d4d:instance_type` (avoid conflict) + +**Rationale:** Multiple semantic concepts mapped to same ontology term creates ambiguity + + +### C-009: slot_uri Conflict - schema:affiliation + +**Severity:** CRITICAL +**Conflict Count:** 2 different slots + +**Usages:** +- `affiliation` in D4D_Base_import.yaml + - Description: "The organization(s) to which the person belongs in the context of this dataset. ..." +- `affiliations` in D4D_Motivation.yaml + - Description: "Organizations with which the creator or team is affiliated...." + +**Impact:** +- RDF Serialization: critical +- Tool Breakage Risk: high + +**Recommended Fix:** +- **KEEP** `affiliation` → `schema:affiliation` (correct usage) +- **CHANGE** `affiliations` → `d4d:affiliations` (avoid conflict) + +**Rationale:** Multiple semantic concepts mapped to same ontology term creates ambiguity + + + +--- + +## High Priority Issues (Wrong Semantics) + +### H-001: slot_uri Conflict - dcat:mediaType + +**Usages:** encoding, media_type +**Files:** D4D_Base_import.yaml + +### H-002: slot_uri Conflict - schema:contactPoint + +**Usages:** contact_person, governance_committee_contact, contact_person +**Files:** D4D_Data_Governance.yaml, D4D_Ethics.yaml + +### H-003: slot_uri Conflict - schema:identifier + +**Usages:** id, id, orcid, identifiers_removed, target_dataset, latest_version_doi, grant_number, is_identifier +**Files:** D4D_Base_import.yaml, D4D_Composition.yaml, D4D_Motivation.yaml, D4D_Maintenance.yaml, D4D_Variables.yaml + +### H-004: Range Mismatch - D4D_Base_import::Software::version + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "The version identifier of the software (e.g., "1.0.0", 
"2.3.1-beta")...." + +### H-005: Range Mismatch - D4D_Base_import::Software::license + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "The license under which the software is distributed (e.g., "MIT", "Apache-2.0", "GPL-3.0")...." + +### H-006: Range Mismatch - D4D_Base_import::Software::url + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "URL where the software can be found (e.g., homepage, repository, or documentation)...." + +### H-007: Range Mismatch - D4D_Base_import::FormatDialect::comment_prefix + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "Character(s) used to indicate comment lines (e.g., "#" for CSV comments)...." + +### H-008: Range Mismatch - D4D_Base_import::FormatDialect::delimiter + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "Field delimiter character (e.g., "," for CSV, "\t" for TSV)...." + +### H-009: Range Mismatch - D4D_Base_import::FormatDialect::quote_char + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "Character used for quoting fields (e.g., '"' for CSV)...." + +### H-010: Range Mismatch - D4D_Base_import::slots::dialect + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "Specific format dialect or variation (e.g., CSV dialect, JSON-LD profile)...." + +### H-011: Range Mismatch - D4D_Base_import::slots::compression + +**Current Range:** `CompressionEnum` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "Compression format used, if any (e.g., gzip, bzip2, zip)...." 
+ +### H-012: Range Mismatch - D4D_Base_import::slots::license + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "The legal license under which the resource is made available (e.g., "MIT", "CC-BY-4.0")...." + +### H-013: Range Mismatch - D4D_Base_import::slots::version + +**Current Range:** `string` (multivalued: False) +**Issue:** Description implies list but multivalued=false +**Description:** "The version identifier of the resource (e.g., "1.0", "2.3.1")...." + + +--- + +## Data-Driven Insights + +Analysis of actual D4D records for AI_READI, CHORUS, CM4AI, and VOICE projects: + +### Enum Candidates + +Fields with limited value sets that could be enums: + +- `id`: String field with only 4 distinct values (enum candidate) + - Values: "https://chorus4ai.org/", "https://doi.org/10.13026/37yb-1t42", "https://doi.org/10.18130/V3/DXWOS5", "https://fairhub.io/datasets/2" +- `name`: String field with only 4 distinct values (enum candidate) + - Values: "AI-READI", "Bridge2AI-Voice", "CHoRUS", "CM4AI" +- `title`: String field with only 4 distinct values (enum candidate) + - Values: "Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights (AI-READI)", "Bridge2AI-Voice - An ethically-sourced, diverse voice dataset linked to health information", "Cell Maps for Artificial Intelligence (CM4AI)", "Patient-Focused Collaborative Hospital Repository Uniting Standards (CHoRUS) for Equitable AI" +- `description`: String field with only 4 distinct values (enum candidate) + - Values: "CHoRUS for Equitable AI is a Bridge2AI data generation project developing the most diverse, high-resolution, ethically sourced, AI-ready critical care dataset to answer the grand challenge of improving recovery from acute illness. 
The project spans 20 academic centers (14 data acquisition centers) and creates a publicly available dataset of over 100,000 critically ill patients with multi-modal data including structured EHR, waveform telemetry, medical imaging, EEG, and clinical notes. All data is standardized to the OMOP Common Data Model with additional formats (DICOM, WFDB, OHNLP tokenization) and includes comprehensive metadata schemas. Patient-focused efforts determine ethical and legal approaches to manage privacy and bias while accounting for Social Determinants of Health. A visualization and annotation environment labels data with targets important for prediction. The project emphasizes skills and workforce development for a next generation of diverse academic and community AI scientists through training programs and partnerships with AIM-AHEAD. As of November 2024, the dataset covers 14 different hospitals with 23,400 unique admissions. +", "CM4AI is the Functional Genomics Data Generation Project in the U.S. National Institutes of Health's (NIH) Bridge to Artificial Intelligence (Bridge2AI) program. Its overarching mission is to produce ethical, AI-ready datasets of cell architecture, inferred from multimodal data collected for human cell lines, to enable transformative biomedical AI research. The project delivers machine-readable hierarchical maps of cell architecture as AI-Ready data produced from multimodal interrogation of 100 chromatin modifiers and 100 metabolic enzymes involved in cancer, neuropsychiatric, and cardiac disorders in disease-relevant cell lines under perturbed and unperturbed conditions. Data streams include immunofluorescence (IF) subcellular microscopy for spatial proteomics, affinity purification mass spectroscopy (AP-MS) and size exclusion mass spectroscopy (SEC-MS) for protein-protein interaction (PPI) data, and single-cell CRISPR-Cas perturbation screens by cell type. 
Input data streams are integrated via the Multi-Scale Integrated Cell (MuSIC) software pipeline employing deep learning models and community detection algorithms, and output cell maps are packaged with provenance graphs and rich metadata as AI-Ready datasets in RO-Crate format using the FAIRSCAPE framework. +", "The AI-READI is a flagship dataset consisting of multimodal data collected from 4,000 individuals with and without Type 2 Diabetes Mellitus (T2DM), harmonized across 3 data collection sites (Birmingham, Alabama; San Diego, California; Seattle, Washington). The dataset was designed with future AI/Machine Learning studies in mind, including recruitment sampling procedures aimed at achieving approximately equal distribution of participants across diabetes severity (triple-balanced by race/ethnicity, biological sex, and T2DM severity), as well as a multi-domain data acquisition protocol (survey data, physical measurements, clinical data, imaging data, wearable device data, environmental sensors, biospecimens) to enable downstream AI/ML analyses that may not be feasible with existing data sources such as claims or electronic health records data. The goal is to better understand salutogenesis (the pathway from disease to health) in T2DM. The study follows FAIR principles and incorporates ethical and equitable data collection and management practices. +", "The Bridge2AI-Voice project seeks to create an ethically sourced flagship dataset to enable future research in artificial intelligence and support critical insights into the use of voice as a biomarker of health. The human voice contains complex acoustic markers which have been linked to important health conditions including dementia, mood disorders, and cancer. When viewed as a biomarker, voice is a promising characteristic to measure as it is simple to collect, cost-effective, and has broad clinical utility. 
This comprehensive collection provides voice recordings with corresponding clinical information from participants selected based on known conditions which manifest within the voice waveform including voice disorders, neurological disorders, mood disorders, and respiratory disorders. The dataset is designed to fuel voice AI research, establish data standards, and promote ethical and trustworthy AI/ML development for voice biomarkers of health. Data collection occurs through a multi-institutional collaborative effort using standardized protocols, custom smartphone applications, and rigorous ethical oversight. The initial release (v1.0) provides 12,523 recordings for 306 participants collected across five sites in North America, with derived features such as spectrograms, MFCCs, acoustic features, and clinical phenotype data. Raw audio data is available through controlled access to protect participant privacy. +" +- `page`: String field with only 4 distinct values (enum candidate) + - Values: "https://chorus4ai.org/", "https://docs.b2ai-voice.org", "https://fairhub.io/datasets/2", "https://www.cm4ai.org" +- `license`: String field with only 4 distinct values (enum candidate) + - Values: "Bridge2AI Voice Registered Access License", "CC BY-NC 4.0", "CC BY-NC-SA 4.0", "Controlled Access with Data Use Agreement" +- `license_and_use_terms.name`: String field with only 4 distinct values (enum candidate) + - Values: "Bridge2AI Voice Registered Access License", "CHoRUS Controlled Access License", "Creative Commons Attribution Non-Commercial", "Creative Commons Attribution Non-Commercial Share-Alike" +- `license_and_use_terms.description`: String field with only 4 distinct values (enum candidate) + - Values: "Data licensed for reuse under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (https://creativecommons.org/licenses/by-nc-sa/4.0/). Attribution is required to the copyright holders and the Cell Maps for Artificial Intelligence project. 
Any publications referencing this data or derived products should cite the bioRxiv article (Clark T, et al. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from Disease-Relevant Cell Lines. BioRXiv, May 2024. doi:10.1101/2024.05.21.589311) and directly cite the data collection. Commercial use requires separate license negotiation with copyright holder (UCSD, Stanford, and/or UCSF depending upon specific data package). A Data Access Committee will supervise ethical matters related to dataset distribution and potential dual licensing for commercial use. Copyright (c) 2025 The Regents of the University of California except where otherwise noted. Spatial proteomics raw image data is copyright (c) 2025 The Board of Trustees of the Leland Stanford Junior University. +", "Dataset distributed under controlled access requiring institutional email registration and signed licensing agreement. Access granted after review and approval process. Participants must complete registration form with name, email (institutional, not personal), and institution. Once approved, users receive email with access instructions to CHoRUS secure enclave. Contact for access requests: dbold@emory.edu or jared.houghtaling@tuftsmedicine.org. +", "Public access data distributed under Creative Commons Attribution Non-Commercial (CC BY-NC 4.0) license. Permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. Controlled access data requires data use agreement. See http://creativecommons.org/licenses/by-nc/4.0/ for full license terms. +", "Public access dataset distributed through PhysioNet under Bridge2AI Voice Registered Access License. 
Only registered users who sign the specified Data Use Agreement (Bridge2AI Voice Registered Access Agreement) can access files. Data covered under Certificate of Confidentiality which must be asserted against compulsory legal demands. Raw audio data available through controlled access only via Data Access Compliance Office (DACO) requiring distinct application. Recipient must adhere to PhysioNet requirements managed by MIT Laboratory for Computational Physiology, supported by NIBIB under grant R01EB030362. +" +- `updates.name`: String field with only 4 distinct values (enum candidate) + - Values: "Ongoing data collection and expansion", "Periodic data releases and maintenance plan", "Quarterly Data Releases and Maintenance Plan", "Versioned releases with ongoing data collection" +- `updates.description`: String field with only 4 distinct values (enum candidate) + - Values: "Dataset regularly updated and augmented through end of project in November 2026. Beta releases on quarterly basis with periodic data augmentation. Initial alpha release (v0.5) provided as supplemental data. March 2025 Beta (V1.4) includes perturb-seq in KOLF2.1J iPSCs, SEC-MS in iPSCs and derivatives, and IF images in MDA-MB-468 under three conditions. June 2025 Beta (V2.1) revision adds RGB IF images, ro-crate metadata corrections, and naming convention changes. Future releases will include computed cell maps and complete integration of all data streams. Long-term preservation in University of Virginia Dataverse with committed institutional support. +", "Dataset updated continuously as data collection progresses at 14 acquisition centers. As of November 2024, covers 14 hospitals with 23,400 unique admissions. Target exceeds 100,000 critically ill patients. Project timeline extends through November 30, 2026 (with approved no-cost extension). Regular status updates tracked through GitHub project management system. Sites provide updates via GitHub interface or Google Form submissions. 
+", "Dataset updated periodically as enrollment progresses toward target of 4,000 participants by November 2026. Version-specific documentation maintained for each release. Biorepository maintained at UAB CCTS with long-term storage protocols. Data sharing policies under ongoing development by Data Access Committee. Pilot data released May 2024; all data through July 31, 2024 released November 2024. +", "Dataset updated with versioned releases as data collection progresses. Initial release v1.0 published January 17, 2025 with 12,523 recordings from 306 participants. v1.1 released January 17, 2025 adding MFCC features. v2.0.0 released April 16, 2025. v2.0.1 released August 18, 2025. Latest version available at https://doi.org/10.13026/37yb-1t42. Data collection ongoing through November 30, 2026. Version-specific documentation maintained. As of v1.1, only adult cohort data available; pediatric cohort data planned for future releases with additional privacy precautions. +" +- `updates.frequency`: String field with only 4 distinct values (enum candidate) + - Values: "Continuous updates through November 2026", "Periodic releases with ongoing enrollment; final release planned for late 2026", "Periodic versioned releases during data collection period (2022-2026)", "Quarterly updates through November 2026; long-term preservation thereafter" +- `retention_limit.name`: String field with only 4 distinct values (enum candidate) + - Values: "Data and biospecimen retention", "Data retention and disposition", "Long-Term Preservation Plan", "Long-term dataset retention" +- `retention_limit.description`: String field with only 4 distinct values (enum candidate) + - Values: "Data Transfer and Use Agreement specifies retention requirements. Upon termination or expiration of agreement (two years after start date, project completion, or ethics approval expiration), data shall be destroyed per provider instructions with written certification required within 30 days. 
Recipient may retain one copy to extent necessary to comply with records retention requirements under law, regulation, institutional policy, and for research integrity and verification purposes. Restrictions apply to archival copies as long as recipient holds data. +", "Digital data maintained according to NIH data sharing policies and institutional requirements. Controlled access model ensures long-term availability for research while protecting patient privacy. +", "Digital data maintained according to NIH data sharing policies with long-term preservation in University of Virginia's LibraData repository supported by committed institutional funds. No planned sunset for data availability. Archived RO-Crates with persistent identifiers (ARK, future DOIs) ensure long-term accessibility and citability. +", "Digital data maintained according to NIH data sharing policies. Biospecimen retention subject to institutional policies and consent agreements. Finite number of biospecimen samples available for distribution. +" +- `human_subject_research.name`: String field with only 4 distinct values (enum candidate) + - Values: "AI-READI Human Subjects Research", "Bridge2AI-Voice Human Subjects Research", "CHoRUS Human Subjects Research", "CM4AI Non-Human Subjects Research" +- `human_subject_research.description`: String field with only 4 distinct values (enum candidate) + - Values: "CM4AI data are distinctive within Bridge2AI in that they are non-clinical data from tissue cultures and are considered to be de-identified as they cannot be matched, with current knowledge, to a human subject. Both cell lines (MDA-MB-468 and KOLF2.1J) are commercially available, ethically sourced, de-identified cell lines. MDA-MB-468 available from ATCC. KOLF2.1J available from HipSci resource for non-profit organizations via simple MTA. 
Ethics team developed comprehensive plan for ethical preparation, licensing, dissemination, and data access supervision balancing openness with IP protection and commercialization monitoring. +", "Data collection and sharing approved by University of South Florida Institutional Review Board. Participants provided written informed consent for data collection initiative and data sharing. Consent process includes authorization for voice data collection, access to medical information through EHR platforms for gold standard validation, and permission to share research data. Bioethics guidance integrated throughout study design and conduct. Ethics module develops new guidelines for consenting to voice data collection, voice data sharing, and utilization in context of voice AI technology. Project addresses ethical and trustworthy issues from voice data generation and AI/ML research through clinical adoption and downstream health decisions. +", "Retrospective data collection from critically ill patients approved through institutional review processes. Community-facing ethics focus groups conducted to determine what data is appropriate for public sharing. Legal framework established for collecting data at scale. Patient-focused efforts determine ethical and legal approaches to manage privacy and bias while accounting for Social Determinants of Health. Project draws expertise from law, ethics, health services, biomedical science, engineering, and scientific journal publications disciplines. +", "Study approved by Institutional Review Board (IRB) of University of Washington (approval number STUDY00016228), with reliance agreements from IRBs of University of Alabama at Birmingham and University of California, San Diego. Written informed consent provided by all participants. Bioethics guidance integrated throughout study design. Community Advisory Board of 11 persons with diversity in race and ethnicity contributes to protocol development. 
Ethical and equitable data collection and management practices implemented. +" + +### Multivalued Fields + +Fields that contain lists in actual data (verify schema has multivalued: true): + +- `keywords` +- `purposes` +- `tasks` +- `addressing_gaps` +- `creators` +- `funders` +- `instances` +- `subsets` +- `sampling_strategies` +- `subpopulations` +- `collection_mechanisms` +- `acquisition_methods` +- `preprocessing_strategies` +- `cleaning_strategies` +- `intended_uses` + +--- + +## Appendices + +### A: Files Requiring Changes + +Priority-ordered list of schema files needing updates: + +1. `src/data_sheets_schema/schema/D4D_Base_import.yaml` (CRITICAL - foundational) +2. `src/data_sheets_schema/schema/D4D_Composition.yaml` (HIGH) +3. `src/data_sheets_schema/schema/D4D_Data_Governance.yaml` (HIGH) +4. `src/data_sheets_schema/schema/D4D_Distribution.yaml` (MEDIUM) +5. `src/data_sheets_schema/schema/D4D_Maintenance.yaml` (MEDIUM) + +### B: Validation Tools + +Automated validation scripts created: + +- `scripts/slot_uri_conflict_detector.py` - Detects slot_uri conflicts +- `scripts/range_description_checker.py` - Checks range-description alignment +- `scripts/data_value_analyzer.py` - Analyzes actual data values + +### C: Next Steps + +1. Review and prioritize CRITICAL issues +2. Create implementation plan for fixes +3. Coordinate with tool maintainers for breaking changes +4. Develop data migration strategy if needed +5. 
Update documentation with rationale for changes diff --git a/reports/slot_uri_conflicts.json b/reports/slot_uri_conflicts.json new file mode 100644 index 00000000..6422f87e --- /dev/null +++ b/reports/slot_uri_conflicts.json @@ -0,0 +1,1510 @@ +{ + "metadata": { + "tool": "slot_uri_conflict_detector", + "total_conflicts": 17 + }, + "conflicts": [ + { + "slot_uri": "dcat:accessURL", + "severity": "CRITICAL", + "conflict_count": 3, + "usages": [ + { + "slot_name": "access_urls", + "module": "D4D_Distribution", + "file": "D4D_Distribution.yaml", + "description": "Details of the distribution channel(s) or format(s).", + "line": 0 + }, + { + "slot_name": "erratum_url", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "URL or access point for the erratum.", + "line": 0 + }, + { + "slot_name": "access_url", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "URL or access point for the raw data.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "access_urls", + "action": "keep", + "new_slot_uri": "dcat:accessURL", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "erratum_url", + "action": "change", + "new_slot_uri": "d4d:erratum_url", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "access_url", + "action": "change", + "new_slot_uri": "d4d:access_url", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcat:byteSize", + "severity": "MEDIUM", + "conflict_count": 2, + "usages": [ + { + "slot_name": "bytes", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Size of the data in bytes.", + "line": 0 + }, + { + "slot_name": "total_bytes", + "module": 
"D4D_FileCollection", + "file": "D4D_FileCollection.yaml", + "description": "Total size of all files in bytes.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "bytes", + "action": "keep", + "new_slot_uri": "dcat:byteSize", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "total_bytes", + "action": "change", + "new_slot_uri": "d4d:total_bytes", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcat:landingPage", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "page", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A landing page or web page providing access to or information about the resource.", + "line": 0 + }, + { + "slot_name": "contribution_url", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "URL for contribution guidelines or process.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "page", + "action": "keep", + "new_slot_uri": "dcat:landingPage", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "contribution_url", + "action": "change", + "new_slot_uri": "d4d:contribution_url", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcat:mediaType", + "severity": "HIGH", + "conflict_count": 2, + "usages": [ + { + "slot_name": "encoding", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The character encoding of the data.", + "line": 0 + }, + { 
+ "slot_name": "media_type", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The media type of the data. This should be a MIME type.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "encoding", + "action": "keep", + "new_slot_uri": "dcat:mediaType", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "media_type", + "action": "change", + "new_slot_uri": "d4d:media_type", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:accessRights", + "severity": "CRITICAL", + "conflict_count": 3, + "usages": [ + { + "slot_name": "restrictions", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Description of any restrictions or fees associated with external resources.\n", + "line": 0 + }, + { + "slot_name": "regulatory_restrictions", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Export or regulatory restrictions on the dataset.", + "line": 0 + }, + { + "slot_name": "is_shared", + "module": "D4D_Distribution", + "file": "D4D_Distribution.yaml", + "description": "Boolean indicating whether the dataset is distributed to parties external to the dataset-creating entity.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "restrictions", + "action": "keep", + "new_slot_uri": "dcterms:accessRights", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "regulatory_restrictions", + "action": "change", + "new_slot_uri": 
"d4d:regulatory_restrictions", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "is_shared", + "action": "change", + "new_slot_uri": "d4d:is_shared", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:conformsTo", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "conforms_to", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "An established standard, specification, or schema to which the resource conforms.", + "line": 0 + }, + { + "slot_name": "conforms_to_schema", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The schema or data model to which the resource conforms.", + "line": 0 + }, + { + "slot_name": "conforms_to_class", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The specific class or type within a schema to which the resource conforms.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "conforms_to", + "action": "keep", + "new_slot_uri": "dcterms:conformsTo", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "conforms_to_schema", + "action": "change", + "new_slot_uri": "d4d:conforms_to_schema", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "conforms_to_class", + "action": "change", + "new_slot_uri": "d4d:conforms_to_class", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:creator", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "created_by", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The person or organization primarily responsible for 
creating the resource.", + "line": 0 + }, + { + "slot_name": "principal_investigator", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "A key individual (Principal Investigator) responsible for or overseeing dataset creation.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "created_by", + "action": "keep", + "new_slot_uri": "dcterms:creator", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "principal_investigator", + "action": "change", + "new_slot_uri": "d4d:principal_investigator", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:description", + "severity": "CRITICAL", + "conflict_count": 40, + "usages": [ + { + "slot_name": "acquisition_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on how data was acquired for each instance.\n", + "line": 0 + }, + { + "slot_name": "mechanism_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on mechanisms or procedures used to collect the data.\n", + "line": 0 + }, + { + "slot_name": "collector_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on who collected the data and their compensation.\n", + "line": 0 + }, + { + "slot_name": "timeframe_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "line": 0 + }, + { + "slot_name": "collection_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on direct vs. 
indirect collection methods and sources.\n", + "line": 0 + }, + { + "slot_name": "source_description", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collection).\n", + "line": 0 + }, + { + "slot_name": "missing", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Description of the missing data fields or elements.\n", + "line": 0 + }, + { + "slot_name": "why_missing", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of why each piece of data is missing.\n", + "line": 0 + }, + { + "slot_name": "relationship_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on relationships between instances (e.g., graph edges, ratings).\n", + "line": 0 + }, + { + "slot_name": "split_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on recommended data splits and their rationale.\n", + "line": 0 + }, + { + "slot_name": "anomaly_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on errors, noise sources, or redundancies in the dataset.\n", + "line": 0 + }, + { + "slot_name": "bias_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Detailed description of how this bias manifests in the dataset, including affected populations, features, or outcomes.\n", + "line": 0 + }, + { + "slot_name": "limitation_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Detailed description of the limitation and its implications.\n", + "line": 0 + }, + { + "slot_name": "future_guarantees", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of any commitments that external resources will remain available and stable over 
time.\n", + "line": 0 + }, + { + "slot_name": "confidentiality_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on confidential data elements and handling procedures.\n", + "line": 0 + }, + { + "slot_name": "warnings", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Specific content warnings describing potentially offensive, insulting, threatening, or anxiety-provoking content (e.g., violence, profanity, explicit imagery).", + "line": 0 + }, + { + "slot_name": "identification", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disease status, or other demographic/clinical characteristics).", + "line": 0 + }, + { + "slot_name": "distribution", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup.", + "line": 0 + }, + { + "slot_name": "deidentification_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on de-identification procedures and residual risks.\n", + "line": 0 + }, + { + "slot_name": "sensitivity_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on sensitive data elements present and handling procedures.\n", + "line": 0 + }, + { + "slot_name": "review_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "line": 0 + }, + { + "slot_name": "impact_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "line": 0 + }, + { + "slot_name": "notification_details", + "module": 
"D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on how individuals were notified about data collection.\n", + "line": 0 + }, + { + "slot_name": "consent_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on how consent was requested, provided, and documented.\n", + "line": 0 + }, + { + "slot_name": "revocation_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on consent revocation mechanisms and procedures.\n", + "line": 0 + }, + { + "slot_name": "maintainer_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on who will support, host, or maintain the dataset.\n", + "line": 0 + }, + { + "slot_name": "erratum_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on any errata or corrections to the dataset.\n", + "line": 0 + }, + { + "slot_name": "update_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on update plans, responsible parties, and communication methods.\n", + "line": 0 + }, + { + "slot_name": "retention_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on data retention limits and enforcement procedures.\n", + "line": 0 + }, + { + "slot_name": "version_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on version support policies and obsolescence communication.\n", + "line": 0 + }, + { + "slot_name": "extension_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on extension mechanisms, contribution validation, and communication.\n", + "line": 0 + }, + { + "slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation describing the primary purpose of creating the dataset.", + "line": 0 + }, + { + 
"slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation describing the specific task or tasks for which this dataset was created.", + "line": 0 + }, + { + "slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation of the knowledge or resource gap that this dataset was intended to address.", + "line": 0 + }, + { + "slot_name": "preprocessing_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on preprocessing steps applied to the data.\n", + "line": 0 + }, + { + "slot_name": "cleaning_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on data cleaning procedures applied.\n", + "line": 0 + }, + { + "slot_name": "labeling_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on labeling/annotation procedures and quality metrics.\n", + "line": 0 + }, + { + "slot_name": "raw_data_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on raw data availability and access procedures.\n", + "line": 0 + }, + { + "slot_name": "repository_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on the repository of known dataset uses.\n", + "line": 0 + }, + { + "slot_name": "task_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on other potential tasks the dataset could be used for.\n", + "line": 0 + }, + { + "slot_name": "impact_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "line": 0 + }, + { + "slot_name": "discouragement_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on tasks for which the dataset should not be used.\n", + "line": 0 
+ }, + { + "slot_name": "quality_notes", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "Notes about data quality, reliability, or known issues specific to this variable.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "acquisition_details", + "action": "keep", + "new_slot_uri": "dcterms:description", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "mechanism_details", + "action": "change", + "new_slot_uri": "d4d:mechanism_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "collector_details", + "action": "change", + "new_slot_uri": "d4d:collector_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "timeframe_details", + "action": "change", + "new_slot_uri": "d4d:timeframe_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "collection_details", + "action": "change", + "new_slot_uri": "d4d:collection_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "source_description", + "action": "change", + "new_slot_uri": "d4d:source_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "missing", + "action": "change", + "new_slot_uri": "d4d:missing", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "why_missing", + "action": "change", + "new_slot_uri": "d4d:why_missing", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "relationship_details", + "action": "change", + "new_slot_uri": "d4d:relationship_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "split_details", + "action": "change", + "new_slot_uri": "d4d:split_details", + 
"rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "anomaly_details", + "action": "change", + "new_slot_uri": "d4d:anomaly_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "bias_description", + "action": "change", + "new_slot_uri": "d4d:bias_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "limitation_description", + "action": "change", + "new_slot_uri": "d4d:limitation_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "future_guarantees", + "action": "change", + "new_slot_uri": "d4d:future_guarantees", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "confidentiality_details", + "action": "change", + "new_slot_uri": "d4d:confidentiality_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "warnings", + "action": "change", + "new_slot_uri": "d4d:warnings", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "identification", + "action": "change", + "new_slot_uri": "d4d:identification", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "distribution", + "action": "change", + "new_slot_uri": "d4d:distribution", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "deidentification_details", + "action": "change", + "new_slot_uri": "d4d:deidentification_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "sensitivity_details", + "action": "change", + "new_slot_uri": "d4d:sensitivity_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "review_details", + "action": "change", + "new_slot_uri": "d4d:review_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "impact_details", + "action": "change", + "new_slot_uri": "d4d:impact_details", + "rationale": "Create custom D4D term to avoid conflict" 
+ }, + { + "slot": "notification_details", + "action": "change", + "new_slot_uri": "d4d:notification_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "consent_details", + "action": "change", + "new_slot_uri": "d4d:consent_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "revocation_details", + "action": "change", + "new_slot_uri": "d4d:revocation_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "maintainer_details", + "action": "change", + "new_slot_uri": "d4d:maintainer_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "erratum_details", + "action": "change", + "new_slot_uri": "d4d:erratum_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "update_details", + "action": "change", + "new_slot_uri": "d4d:update_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "retention_details", + "action": "change", + "new_slot_uri": "d4d:retention_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "version_details", + "action": "change", + "new_slot_uri": "d4d:version_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "extension_details", + "action": "change", + "new_slot_uri": "d4d:extension_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "preprocessing_details", + "action": "change", + "new_slot_uri": 
"d4d:preprocessing_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "cleaning_details", + "action": "change", + "new_slot_uri": "d4d:cleaning_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "labeling_details", + "action": "change", + "new_slot_uri": "d4d:labeling_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "raw_data_details", + "action": "change", + "new_slot_uri": "d4d:raw_data_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "repository_details", + "action": "change", + "new_slot_uri": "d4d:repository_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "task_details", + "action": "change", + "new_slot_uri": "d4d:task_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "impact_details", + "action": "change", + "new_slot_uri": "d4d:impact_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "discouragement_details", + "action": "change", + "new_slot_uri": "d4d:discouragement_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "quality_notes", + "action": "change", + "new_slot_uri": "d4d:quality_notes", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:format", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "format", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The file format, physical medium, or dimensions of a resource. 
This should be a file extension or MIME type.", + "line": 0 + }, + { + "slot_name": "data_substrate", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Type of data (e.g., raw text, images) from Bridge2AI standards.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "format", + "action": "keep", + "new_slot_uri": "dcterms:format", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "data_substrate", + "action": "change", + "new_slot_uri": "d4d:data_substrate", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:identifier", + "severity": "MEDIUM", + "conflict_count": 4, + "usages": [ + { + "slot_name": "hash", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Cryptographic hash value of the data for integrity verification.", + "line": 0 + }, + { + "slot_name": "md5", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "MD5 hash value of the data (128-bit cryptographic hash).", + "line": 0 + }, + { + "slot_name": "sha256", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "SHA-256 hash value of the data (256-bit cryptographic hash, recommended).", + "line": 0 + }, + { + "slot_name": "doi", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "hash", + 
"action": "keep", + "new_slot_uri": "dcterms:identifier", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "md5", + "action": "change", + "new_slot_uri": "d4d:md5", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "sha256", + "action": "change", + "new_slot_uri": "d4d:sha256", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "doi", + "action": "change", + "new_slot_uri": "d4d:doi", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:license", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "license", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", + "line": 0 + }, + { + "slot_name": "license_terms", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "license", + "action": "keep", + "new_slot_uri": "dcterms:license", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "license_terms", + "action": "change", + "new_slot_uri": "d4d:license_terms", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:type", + "severity": "CRITICAL", + "conflict_count": 3, + "usages": [ + { + "slot_name": "status", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The status of the resource (e.g., draft, published, deprecated).", + "line": 0 + }, + { + "slot_name": "source_type", 
+ "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", + "line": 0 + }, + { + "slot_name": "instance_type", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Multiple types of instances? (e.g., movies, users, and ratings).\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "status", + "action": "keep", + "new_slot_uri": "dcterms:type", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "source_type", + "action": "change", + "new_slot_uri": "d4d:source_type", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "instance_type", + "action": "change", + "new_slot_uri": "d4d:instance_type", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:affiliation", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "affiliation", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The organization(s) to which the person belongs in the context of this dataset. 
May vary across datasets; multivalued to support multiple affiliations.", + "line": 0 + }, + { + "slot_name": "affiliations", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Organizations with which the creator or team is affiliated.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "affiliation", + "action": "keep", + "new_slot_uri": "schema:affiliation", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "affiliations", + "action": "change", + "new_slot_uri": "d4d:affiliations", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:contactPoint", + "severity": "HIGH", + "conflict_count": 2, + "usages": [ + { + "slot_name": "contact_person", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. This person can answer questions about licensing terms, usage restrictions, fees, and permissions.", + "line": 0 + }, + { + "slot_name": "governance_committee_contact", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Contact person for data governance committee. This person can answer questions about data governance policies, access procedures, and oversight mechanisms.", + "line": 0 + }, + { + "slot_name": "contact_person", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Contact person for questions about ethical review. 
Provides structured contact information including name, email, affiliation, and optional ORCID.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "contact_person", + "action": "keep", + "new_slot_uri": "schema:contactPoint", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "governance_committee_contact", + "action": "change", + "new_slot_uri": "d4d:governance_committee_contact", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "contact_person", + "action": "change", + "new_slot_uri": "d4d:contact_person", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:description", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "description", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable description for a thing.", + "line": 0 + }, + { + "slot_name": "description", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable description for this property.", + "line": 0 + }, + { + "slot_name": "label_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "If labeled, what pattern or format do labels follow?\n", + "line": 0 + }, + { + "slot_name": "representative_verification", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of how representativeness was validated or verified.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + 
"slot": "description", + "action": "keep", + "new_slot_uri": "schema:description", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "description", + "action": "change", + "new_slot_uri": "d4d:description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "label_description", + "action": "change", + "new_slot_uri": "d4d:label_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "representative_verification", + "action": "change", + "new_slot_uri": "d4d:representative_verification", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:identifier", + "severity": "HIGH", + "conflict_count": 7, + "usages": [ + { + "slot_name": "id", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A unique identifier for a thing.", + "line": 0 + }, + { + "slot_name": "id", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "An optional identifier for this property.", + "line": 0 + }, + { + "slot_name": "orcid", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification.", + "line": 0 + }, + { + "slot_name": "identifiers_removed", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "List of identifier types removed during de-identification.", + "line": 0 + }, + { + "slot_name": "target_dataset", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "The dataset that this relationship points to. 
Can be specified by identifier, URL, or Dataset object.", + "line": 0 + }, + { + "slot_name": "latest_version_doi", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "DOI or URL of the latest dataset version.", + "line": 0 + }, + { + "slot_name": "grant_number", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "The alphanumeric identifier for the grant.", + "line": 0 + }, + { + "slot_name": "is_identifier", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "Indicates whether this variable serves as a unique identifier or key for records in the dataset.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "id", + "action": "keep", + "new_slot_uri": "schema:identifier", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "id", + "action": "change", + "new_slot_uri": "d4d:id", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "orcid", + "action": "change", + "new_slot_uri": "d4d:orcid", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "identifiers_removed", + "action": "change", + "new_slot_uri": "d4d:identifiers_removed", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "target_dataset", + "action": "change", + "new_slot_uri": "d4d:target_dataset", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "latest_version_doi", + "action": "change", + "new_slot_uri": "d4d:latest_version_doi", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "grant_number", + "action": "change", + "new_slot_uri": "d4d:grant_number", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "is_identifier", + 
"action": "change", + "new_slot_uri": "d4d:is_identifier", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:name", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "name", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable name for a thing.", + "line": 0 + }, + { + "slot_name": "name", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable name for this property.", + "line": 0 + }, + { + "slot_name": "tools", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "List of automated annotation tools with their versions. Format each entry as \"ToolName version\" (e.g., \"spaCy 3.5.0\", \"NLTK 3.8\", \"GPT-4 turbo\"). Use \"unknown\" for version if not available (e.g., \"Custom NER Model unknown\").\n", + "line": 0 + }, + { + "slot_name": "variable_name", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "The name or identifier of the variable as it appears in the data files.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "name", + "action": "keep", + "new_slot_uri": "schema:name", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "name", + "action": "change", + "new_slot_uri": "d4d:name", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "tools", + "action": "change", + "new_slot_uri": "d4d:tools", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "variable_name", + "action": "change", + "new_slot_uri": "d4d:variable_name", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + } + ] +} \ No newline at end of file 
diff --git a/reports/slot_uri_conflicts_after.json b/reports/slot_uri_conflicts_after.json new file mode 100644 index 00000000..6c682e27 --- /dev/null +++ b/reports/slot_uri_conflicts_after.json @@ -0,0 +1,1396 @@ +{ + "metadata": { + "tool": "slot_uri_conflict_detector", + "total_conflicts": 15 + }, + "conflicts": [ + { + "slot_uri": "dcat:accessURL", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "access_urls", + "module": "D4D_Distribution", + "file": "D4D_Distribution.yaml", + "description": "Details of the distribution channel(s) or format(s).", + "line": 0 + }, + { + "slot_name": "access_url", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "URL or access point for the raw data.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "access_urls", + "action": "keep", + "new_slot_uri": "dcat:accessURL", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "access_url", + "action": "change", + "new_slot_uri": "d4d:access_url", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcat:byteSize", + "severity": "MEDIUM", + "conflict_count": 2, + "usages": [ + { + "slot_name": "bytes", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Size of the data in bytes.", + "line": 0 + }, + { + "slot_name": "total_bytes", + "module": "D4D_FileCollection", + "file": "D4D_FileCollection.yaml", + "description": "Total size of all files in bytes.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + 
"recommendations": [ + { + "slot": "bytes", + "action": "keep", + "new_slot_uri": "dcat:byteSize", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "total_bytes", + "action": "change", + "new_slot_uri": "d4d:total_bytes", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:accessRights", + "severity": "CRITICAL", + "conflict_count": 3, + "usages": [ + { + "slot_name": "restrictions", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Description of any restrictions or fees associated with external resources.\n", + "line": 0 + }, + { + "slot_name": "regulatory_restrictions", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Export or regulatory restrictions on the dataset.", + "line": 0 + }, + { + "slot_name": "is_shared", + "module": "D4D_Distribution", + "file": "D4D_Distribution.yaml", + "description": "Boolean indicating whether the dataset is distributed to parties external to the dataset-creating entity.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "restrictions", + "action": "keep", + "new_slot_uri": "dcterms:accessRights", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "regulatory_restrictions", + "action": "change", + "new_slot_uri": "d4d:regulatory_restrictions", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "is_shared", + "action": "change", + "new_slot_uri": "d4d:is_shared", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:conformsTo", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "conforms_to", + "module": "D4D_Base_import", + 
"file": "D4D_Base_import.yaml", + "description": "An established standard, specification, or schema to which the resource conforms.", + "line": 0 + }, + { + "slot_name": "conforms_to_schema", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The schema or data model to which the resource conforms.", + "line": 0 + }, + { + "slot_name": "conforms_to_class", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The specific class or type within a schema to which the resource conforms.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "conforms_to", + "action": "keep", + "new_slot_uri": "dcterms:conformsTo", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "conforms_to_schema", + "action": "change", + "new_slot_uri": "d4d:conforms_to_schema", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "conforms_to_class", + "action": "change", + "new_slot_uri": "d4d:conforms_to_class", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:creator", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "created_by", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The person or organization primarily responsible for creating the resource.", + "line": 0 + }, + { + "slot_name": "principal_investigator", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "A key individual (Principal Investigator) responsible for or overseeing dataset creation.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" 
+ }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "created_by", + "action": "keep", + "new_slot_uri": "dcterms:creator", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "principal_investigator", + "action": "change", + "new_slot_uri": "d4d:principal_investigator", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:description", + "severity": "CRITICAL", + "conflict_count": 40, + "usages": [ + { + "slot_name": "acquisition_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on how data was acquired for each instance.\n", + "line": 0 + }, + { + "slot_name": "mechanism_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on mechanisms or procedures used to collect the data.\n", + "line": 0 + }, + { + "slot_name": "collector_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on who collected the data and their compensation.\n", + "line": 0 + }, + { + "slot_name": "timeframe_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "line": 0 + }, + { + "slot_name": "collection_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on direct vs. 
indirect collection methods and sources.\n", + "line": 0 + }, + { + "slot_name": "source_description", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collection).\n", + "line": 0 + }, + { + "slot_name": "missing", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Description of the missing data fields or elements.\n", + "line": 0 + }, + { + "slot_name": "why_missing", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of why each piece of data is missing.\n", + "line": 0 + }, + { + "slot_name": "relationship_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on relationships between instances (e.g., graph edges, ratings).\n", + "line": 0 + }, + { + "slot_name": "split_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on recommended data splits and their rationale.\n", + "line": 0 + }, + { + "slot_name": "anomaly_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on errors, noise sources, or redundancies in the dataset.\n", + "line": 0 + }, + { + "slot_name": "bias_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Detailed description of how this bias manifests in the dataset, including affected populations, features, or outcomes.\n", + "line": 0 + }, + { + "slot_name": "limitation_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Detailed description of the limitation and its implications.\n", + "line": 0 + }, + { + "slot_name": "future_guarantees", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of any commitments that external resources will remain available and stable over 
time.\n", + "line": 0 + }, + { + "slot_name": "confidentiality_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on confidential data elements and handling procedures.\n", + "line": 0 + }, + { + "slot_name": "warnings", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Specific content warnings describing potentially offensive, insulting, threatening, or anxiety-provoking content (e.g., violence, profanity, explicit imagery).", + "line": 0 + }, + { + "slot_name": "identification", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disease status, or other demographic/clinical characteristics).", + "line": 0 + }, + { + "slot_name": "distribution", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup.", + "line": 0 + }, + { + "slot_name": "deidentification_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on de-identification procedures and residual risks.\n", + "line": 0 + }, + { + "slot_name": "sensitivity_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on sensitive data elements present and handling procedures.\n", + "line": 0 + }, + { + "slot_name": "review_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "line": 0 + }, + { + "slot_name": "impact_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "line": 0 + }, + { + "slot_name": "notification_details", + "module": 
"D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on how individuals were notified about data collection.\n", + "line": 0 + }, + { + "slot_name": "consent_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on how consent was requested, provided, and documented.\n", + "line": 0 + }, + { + "slot_name": "revocation_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on consent revocation mechanisms and procedures.\n", + "line": 0 + }, + { + "slot_name": "maintainer_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on who will support, host, or maintain the dataset.\n", + "line": 0 + }, + { + "slot_name": "erratum_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on any errata or corrections to the dataset.\n", + "line": 0 + }, + { + "slot_name": "update_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on update plans, responsible parties, and communication methods.\n", + "line": 0 + }, + { + "slot_name": "retention_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on data retention limits and enforcement procedures.\n", + "line": 0 + }, + { + "slot_name": "version_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on version support policies and obsolescence communication.\n", + "line": 0 + }, + { + "slot_name": "extension_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on extension mechanisms, contribution validation, and communication.\n", + "line": 0 + }, + { + "slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation describing the primary purpose of creating the dataset.", + "line": 0 + }, + { + 
"slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation describing the specific task or tasks for which this dataset was created.", + "line": 0 + }, + { + "slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation of the knowledge or resource gap that this dataset was intended to address.", + "line": 0 + }, + { + "slot_name": "preprocessing_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on preprocessing steps applied to the data.\n", + "line": 0 + }, + { + "slot_name": "cleaning_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on data cleaning procedures applied.\n", + "line": 0 + }, + { + "slot_name": "labeling_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on labeling/annotation procedures and quality metrics.\n", + "line": 0 + }, + { + "slot_name": "raw_data_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on raw data availability and access procedures.\n", + "line": 0 + }, + { + "slot_name": "repository_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on the repository of known dataset uses.\n", + "line": 0 + }, + { + "slot_name": "task_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on other potential tasks the dataset could be used for.\n", + "line": 0 + }, + { + "slot_name": "impact_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "line": 0 + }, + { + "slot_name": "discouragement_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on tasks for which the dataset should not be used.\n", + "line": 0 
+ }, + { + "slot_name": "quality_notes", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "Notes about data quality, reliability, or known issues specific to this variable.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "acquisition_details", + "action": "keep", + "new_slot_uri": "dcterms:description", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "mechanism_details", + "action": "change", + "new_slot_uri": "d4d:mechanism_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "collector_details", + "action": "change", + "new_slot_uri": "d4d:collector_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "timeframe_details", + "action": "change", + "new_slot_uri": "d4d:timeframe_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "collection_details", + "action": "change", + "new_slot_uri": "d4d:collection_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "source_description", + "action": "change", + "new_slot_uri": "d4d:source_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "missing", + "action": "change", + "new_slot_uri": "d4d:missing", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "why_missing", + "action": "change", + "new_slot_uri": "d4d:why_missing", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "relationship_details", + "action": "change", + "new_slot_uri": "d4d:relationship_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "split_details", + "action": "change", + "new_slot_uri": "d4d:split_details", + 
"rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "anomaly_details", + "action": "change", + "new_slot_uri": "d4d:anomaly_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "bias_description", + "action": "change", + "new_slot_uri": "d4d:bias_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "limitation_description", + "action": "change", + "new_slot_uri": "d4d:limitation_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "future_guarantees", + "action": "change", + "new_slot_uri": "d4d:future_guarantees", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "confidentiality_details", + "action": "change", + "new_slot_uri": "d4d:confidentiality_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "warnings", + "action": "change", + "new_slot_uri": "d4d:warnings", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "identification", + "action": "change", + "new_slot_uri": "d4d:identification", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "distribution", + "action": "change", + "new_slot_uri": "d4d:distribution", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "deidentification_details", + "action": "change", + "new_slot_uri": "d4d:deidentification_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "sensitivity_details", + "action": "change", + "new_slot_uri": "d4d:sensitivity_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "review_details", + "action": "change", + "new_slot_uri": "d4d:review_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "impact_details", + "action": "change", + "new_slot_uri": "d4d:impact_details", + "rationale": "Create custom D4D term to avoid conflict" 
+ }, + { + "slot": "notification_details", + "action": "change", + "new_slot_uri": "d4d:notification_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "consent_details", + "action": "change", + "new_slot_uri": "d4d:consent_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "revocation_details", + "action": "change", + "new_slot_uri": "d4d:revocation_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "maintainer_details", + "action": "change", + "new_slot_uri": "d4d:maintainer_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "erratum_details", + "action": "change", + "new_slot_uri": "d4d:erratum_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "update_details", + "action": "change", + "new_slot_uri": "d4d:update_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "retention_details", + "action": "change", + "new_slot_uri": "d4d:retention_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "version_details", + "action": "change", + "new_slot_uri": "d4d:version_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "extension_details", + "action": "change", + "new_slot_uri": "d4d:extension_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "preprocessing_details", + "action": "change", + "new_slot_uri": 
"d4d:preprocessing_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "cleaning_details", + "action": "change", + "new_slot_uri": "d4d:cleaning_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "labeling_details", + "action": "change", + "new_slot_uri": "d4d:labeling_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "raw_data_details", + "action": "change", + "new_slot_uri": "d4d:raw_data_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "repository_details", + "action": "change", + "new_slot_uri": "d4d:repository_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "task_details", + "action": "change", + "new_slot_uri": "d4d:task_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "impact_details", + "action": "change", + "new_slot_uri": "d4d:impact_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "discouragement_details", + "action": "change", + "new_slot_uri": "d4d:discouragement_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "quality_notes", + "action": "change", + "new_slot_uri": "d4d:quality_notes", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:format", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "format", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The file format, physical medium, or dimensions of a resource. 
This should be a file extension or MIME type.", + "line": 0 + }, + { + "slot_name": "data_substrate", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Type of data (e.g., raw text, images) from Bridge2AI standards.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "format", + "action": "keep", + "new_slot_uri": "dcterms:format", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "data_substrate", + "action": "change", + "new_slot_uri": "d4d:data_substrate", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:identifier", + "severity": "MEDIUM", + "conflict_count": 4, + "usages": [ + { + "slot_name": "hash", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Cryptographic hash value of the data for integrity verification.", + "line": 0 + }, + { + "slot_name": "md5", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "MD5 hash value of the data (128-bit cryptographic hash).", + "line": 0 + }, + { + "slot_name": "sha256", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "SHA-256 hash value of the data (256-bit cryptographic hash, recommended).", + "line": 0 + }, + { + "slot_name": "doi", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "hash", + 
"action": "keep", + "new_slot_uri": "dcterms:identifier", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "md5", + "action": "change", + "new_slot_uri": "d4d:md5", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "sha256", + "action": "change", + "new_slot_uri": "d4d:sha256", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "doi", + "action": "change", + "new_slot_uri": "d4d:doi", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:license", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "license", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", + "line": 0 + }, + { + "slot_name": "license_terms", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "license", + "action": "keep", + "new_slot_uri": "dcterms:license", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "license_terms", + "action": "change", + "new_slot_uri": "d4d:license_terms", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:type", + "severity": "CRITICAL", + "conflict_count": 3, + "usages": [ + { + "slot_name": "status", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The status of the resource (e.g., draft, published, deprecated).", + "line": 0 + }, + { + "slot_name": "source_type", 
+ "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", + "line": 0 + }, + { + "slot_name": "instance_type", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Multiple types of instances? (e.g., movies, users, and ratings).\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "status", + "action": "keep", + "new_slot_uri": "dcterms:type", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "source_type", + "action": "change", + "new_slot_uri": "d4d:source_type", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "instance_type", + "action": "change", + "new_slot_uri": "d4d:instance_type", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:affiliation", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "affiliation", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The organization(s) to which the person belongs in the context of this dataset. 
May vary across datasets; multivalued to support multiple affiliations.", + "line": 0 + }, + { + "slot_name": "affiliations", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Organizations with which the creator or team is affiliated.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "affiliation", + "action": "keep", + "new_slot_uri": "schema:affiliation", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "affiliations", + "action": "change", + "new_slot_uri": "d4d:affiliations", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:contactPoint", + "severity": "HIGH", + "conflict_count": 2, + "usages": [ + { + "slot_name": "contact_person", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. This person can answer questions about licensing terms, usage restrictions, fees, and permissions.", + "line": 0 + }, + { + "slot_name": "governance_committee_contact", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Contact person for data governance committee. This person can answer questions about data governance policies, access procedures, and oversight mechanisms.", + "line": 0 + }, + { + "slot_name": "contact_person", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Contact person for questions about ethical review. 
Provides structured contact information including name, email, affiliation, and optional ORCID.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "contact_person", + "action": "keep", + "new_slot_uri": "schema:contactPoint", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "governance_committee_contact", + "action": "change", + "new_slot_uri": "d4d:governance_committee_contact", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "contact_person", + "action": "change", + "new_slot_uri": "d4d:contact_person", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:description", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "description", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable description for a thing.", + "line": 0 + }, + { + "slot_name": "description", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable description for this property.", + "line": 0 + }, + { + "slot_name": "label_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "If labeled, what pattern or format do labels follow?\n", + "line": 0 + }, + { + "slot_name": "representative_verification", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of how representativeness was validated or verified.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + 
"slot": "description", + "action": "keep", + "new_slot_uri": "schema:description", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "description", + "action": "change", + "new_slot_uri": "d4d:description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "label_description", + "action": "change", + "new_slot_uri": "d4d:label_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "representative_verification", + "action": "change", + "new_slot_uri": "d4d:representative_verification", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:identifier", + "severity": "HIGH", + "conflict_count": 6, + "usages": [ + { + "slot_name": "id", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A unique identifier for a thing.", + "line": 0 + }, + { + "slot_name": "id", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "An optional identifier for this property.", + "line": 0 + }, + { + "slot_name": "orcid", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification.", + "line": 0 + }, + { + "slot_name": "target_dataset", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "The dataset that this relationship points to. 
Can be specified by identifier, URL, or Dataset object.", + "line": 0 + }, + { + "slot_name": "latest_version_doi", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "DOI or URL of the latest dataset version.", + "line": 0 + }, + { + "slot_name": "grant_number", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "The alphanumeric identifier for the grant.", + "line": 0 + }, + { + "slot_name": "is_identifier", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "Indicates whether this variable serves as a unique identifier or key for records in the dataset.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "id", + "action": "keep", + "new_slot_uri": "schema:identifier", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "id", + "action": "change", + "new_slot_uri": "d4d:id", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "orcid", + "action": "change", + "new_slot_uri": "d4d:orcid", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "target_dataset", + "action": "change", + "new_slot_uri": "d4d:target_dataset", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "latest_version_doi", + "action": "change", + "new_slot_uri": "d4d:latest_version_doi", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "grant_number", + "action": "change", + "new_slot_uri": "d4d:grant_number", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "is_identifier", + "action": "change", + "new_slot_uri": "d4d:is_identifier", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:name", 
+ "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "name", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable name for a thing.", + "line": 0 + }, + { + "slot_name": "name", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable name for this property.", + "line": 0 + }, + { + "slot_name": "tools", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "List of automated annotation tools with their versions. Format each entry as \"ToolName version\" (e.g., \"spaCy 3.5.0\", \"NLTK 3.8\", \"GPT-4 turbo\"). Use \"unknown\" for version if not available (e.g., \"Custom NER Model unknown\").\n", + "line": 0 + }, + { + "slot_name": "variable_name", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "The name or identifier of the variable as it appears in the data files.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "name", + "action": "keep", + "new_slot_uri": "schema:name", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "name", + "action": "change", + "new_slot_uri": "d4d:name", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "tools", + "action": "change", + "new_slot_uri": "d4d:tools", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "variable_name", + "action": "change", + "new_slot_uri": "d4d:variable_name", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + } + ] +} \ No newline at end of file diff --git a/reports/slot_uri_conflicts_latest.json b/reports/slot_uri_conflicts_latest.json new file mode 100644 index 00000000..15f10503 --- /dev/null +++ 
b/reports/slot_uri_conflicts_latest.json @@ -0,0 +1,1142 @@ +{ + "metadata": { + "tool": "slot_uri_conflict_detector", + "total_conflicts": 11 + }, + "conflicts": [ + { + "slot_uri": "dcat:byteSize", + "severity": "MEDIUM", + "conflict_count": 2, + "usages": [ + { + "slot_name": "bytes", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Size of the data in bytes.", + "line": 0 + }, + { + "slot_name": "total_bytes", + "module": "D4D_FileCollection", + "file": "D4D_FileCollection.yaml", + "description": "Total size of all files in bytes.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "bytes", + "action": "keep", + "new_slot_uri": "dcat:byteSize", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "total_bytes", + "action": "change", + "new_slot_uri": "d4d:total_bytes", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:conformsTo", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "conforms_to", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "An established standard, specification, or schema to which the resource conforms.", + "line": 0 + }, + { + "slot_name": "conforms_to_schema", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The schema or data model to which the resource conforms.", + "line": 0 + }, + { + "slot_name": "conforms_to_class", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The specific class or type within a schema to which the resource conforms.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + 
"migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "conforms_to", + "action": "keep", + "new_slot_uri": "dcterms:conformsTo", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "conforms_to_schema", + "action": "change", + "new_slot_uri": "d4d:conforms_to_schema", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "conforms_to_class", + "action": "change", + "new_slot_uri": "d4d:conforms_to_class", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:description", + "severity": "CRITICAL", + "conflict_count": 40, + "usages": [ + { + "slot_name": "acquisition_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on how data was acquired for each instance.\n", + "line": 0 + }, + { + "slot_name": "mechanism_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on mechanisms or procedures used to collect the data.\n", + "line": 0 + }, + { + "slot_name": "collector_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on who collected the data and their compensation.\n", + "line": 0 + }, + { + "slot_name": "timeframe_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "line": 0 + }, + { + "slot_name": "collection_details", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Details on direct vs. 
indirect collection methods and sources.\n", + "line": 0 + }, + { + "slot_name": "source_description", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collection).\n", + "line": 0 + }, + { + "slot_name": "missing", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Description of the missing data fields or elements.\n", + "line": 0 + }, + { + "slot_name": "why_missing", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of why each piece of data is missing.\n", + "line": 0 + }, + { + "slot_name": "relationship_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on relationships between instances (e.g., graph edges, ratings).\n", + "line": 0 + }, + { + "slot_name": "split_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on recommended data splits and their rationale.\n", + "line": 0 + }, + { + "slot_name": "anomaly_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on errors, noise sources, or redundancies in the dataset.\n", + "line": 0 + }, + { + "slot_name": "bias_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Detailed description of how this bias manifests in the dataset, including affected populations, features, or outcomes.\n", + "line": 0 + }, + { + "slot_name": "limitation_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Detailed description of the limitation and its implications.\n", + "line": 0 + }, + { + "slot_name": "future_guarantees", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of any commitments that external resources will remain available and stable over 
time.\n", + "line": 0 + }, + { + "slot_name": "confidentiality_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on confidential data elements and handling procedures.\n", + "line": 0 + }, + { + "slot_name": "warnings", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Specific content warnings describing potentially offensive, insulting, threatening, or anxiety-provoking content (e.g., violence, profanity, explicit imagery).", + "line": 0 + }, + { + "slot_name": "identification", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disease status, or other demographic/clinical characteristics).", + "line": 0 + }, + { + "slot_name": "distribution", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup.", + "line": 0 + }, + { + "slot_name": "deidentification_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on de-identification procedures and residual risks.\n", + "line": 0 + }, + { + "slot_name": "sensitivity_details", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Details on sensitive data elements present and handling procedures.\n", + "line": 0 + }, + { + "slot_name": "review_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "line": 0 + }, + { + "slot_name": "impact_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "line": 0 + }, + { + "slot_name": "notification_details", + "module": 
"D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on how individuals were notified about data collection.\n", + "line": 0 + }, + { + "slot_name": "consent_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on how consent was requested, provided, and documented.\n", + "line": 0 + }, + { + "slot_name": "revocation_details", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Details on consent revocation mechanisms and procedures.\n", + "line": 0 + }, + { + "slot_name": "maintainer_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on who will support, host, or maintain the dataset.\n", + "line": 0 + }, + { + "slot_name": "erratum_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on any errata or corrections to the dataset.\n", + "line": 0 + }, + { + "slot_name": "update_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on update plans, responsible parties, and communication methods.\n", + "line": 0 + }, + { + "slot_name": "retention_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on data retention limits and enforcement procedures.\n", + "line": 0 + }, + { + "slot_name": "version_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on version support policies and obsolescence communication.\n", + "line": 0 + }, + { + "slot_name": "extension_details", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "Details on extension mechanisms, contribution validation, and communication.\n", + "line": 0 + }, + { + "slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation describing the primary purpose of creating the dataset.", + "line": 0 + }, + { + 
"slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation describing the specific task or tasks for which this dataset was created.", + "line": 0 + }, + { + "slot_name": "response", + "module": "D4D_Motivation", + "file": "D4D_Motivation.yaml", + "description": "Short explanation of the knowledge or resource gap that this dataset was intended to address.", + "line": 0 + }, + { + "slot_name": "preprocessing_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on preprocessing steps applied to the data.\n", + "line": 0 + }, + { + "slot_name": "cleaning_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on data cleaning procedures applied.\n", + "line": 0 + }, + { + "slot_name": "labeling_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on labeling/annotation procedures and quality metrics.\n", + "line": 0 + }, + { + "slot_name": "raw_data_details", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "Details on raw data availability and access procedures.\n", + "line": 0 + }, + { + "slot_name": "repository_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on the repository of known dataset uses.\n", + "line": 0 + }, + { + "slot_name": "task_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on other potential tasks the dataset could be used for.\n", + "line": 0 + }, + { + "slot_name": "impact_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "line": 0 + }, + { + "slot_name": "discouragement_details", + "module": "D4D_Uses", + "file": "D4D_Uses.yaml", + "description": "Details on tasks for which the dataset should not be used.\n", + "line": 0 
+ }, + { + "slot_name": "quality_notes", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "Notes about data quality, reliability, or known issues specific to this variable.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "acquisition_details", + "action": "keep", + "new_slot_uri": "dcterms:description", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "mechanism_details", + "action": "change", + "new_slot_uri": "d4d:mechanism_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "collector_details", + "action": "change", + "new_slot_uri": "d4d:collector_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "timeframe_details", + "action": "change", + "new_slot_uri": "d4d:timeframe_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "collection_details", + "action": "change", + "new_slot_uri": "d4d:collection_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "source_description", + "action": "change", + "new_slot_uri": "d4d:source_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "missing", + "action": "change", + "new_slot_uri": "d4d:missing", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "why_missing", + "action": "change", + "new_slot_uri": "d4d:why_missing", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "relationship_details", + "action": "change", + "new_slot_uri": "d4d:relationship_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "split_details", + "action": "change", + "new_slot_uri": "d4d:split_details", + 
"rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "anomaly_details", + "action": "change", + "new_slot_uri": "d4d:anomaly_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "bias_description", + "action": "change", + "new_slot_uri": "d4d:bias_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "limitation_description", + "action": "change", + "new_slot_uri": "d4d:limitation_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "future_guarantees", + "action": "change", + "new_slot_uri": "d4d:future_guarantees", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "confidentiality_details", + "action": "change", + "new_slot_uri": "d4d:confidentiality_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "warnings", + "action": "change", + "new_slot_uri": "d4d:warnings", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "identification", + "action": "change", + "new_slot_uri": "d4d:identification", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "distribution", + "action": "change", + "new_slot_uri": "d4d:distribution", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "deidentification_details", + "action": "change", + "new_slot_uri": "d4d:deidentification_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "sensitivity_details", + "action": "change", + "new_slot_uri": "d4d:sensitivity_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "review_details", + "action": "change", + "new_slot_uri": "d4d:review_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "impact_details", + "action": "change", + "new_slot_uri": "d4d:impact_details", + "rationale": "Create custom D4D term to avoid conflict" 
+ }, + { + "slot": "notification_details", + "action": "change", + "new_slot_uri": "d4d:notification_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "consent_details", + "action": "change", + "new_slot_uri": "d4d:consent_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "revocation_details", + "action": "change", + "new_slot_uri": "d4d:revocation_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "maintainer_details", + "action": "change", + "new_slot_uri": "d4d:maintainer_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "erratum_details", + "action": "change", + "new_slot_uri": "d4d:erratum_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "update_details", + "action": "change", + "new_slot_uri": "d4d:update_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "retention_details", + "action": "change", + "new_slot_uri": "d4d:retention_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "version_details", + "action": "change", + "new_slot_uri": "d4d:version_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "extension_details", + "action": "change", + "new_slot_uri": "d4d:extension_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "response", + "action": "change", + "new_slot_uri": "d4d:response", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "preprocessing_details", + "action": "change", + "new_slot_uri": 
"d4d:preprocessing_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "cleaning_details", + "action": "change", + "new_slot_uri": "d4d:cleaning_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "labeling_details", + "action": "change", + "new_slot_uri": "d4d:labeling_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "raw_data_details", + "action": "change", + "new_slot_uri": "d4d:raw_data_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "repository_details", + "action": "change", + "new_slot_uri": "d4d:repository_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "task_details", + "action": "change", + "new_slot_uri": "d4d:task_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "impact_details", + "action": "change", + "new_slot_uri": "d4d:impact_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "discouragement_details", + "action": "change", + "new_slot_uri": "d4d:discouragement_details", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "quality_notes", + "action": "change", + "new_slot_uri": "d4d:quality_notes", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:format", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "format", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The file format, physical medium, or dimensions of a resource. 
This should be a file extension or MIME type.", + "line": 0 + }, + { + "slot_name": "data_substrate", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Type of data (e.g., raw text, images) from Bridge2AI standards.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "format", + "action": "keep", + "new_slot_uri": "dcterms:format", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "data_substrate", + "action": "change", + "new_slot_uri": "d4d:data_substrate", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:hasVersion", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "version", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The version identifier of the resource (e.g., \"1.0\", \"2.3.1\").", + "line": 0 + }, + { + "slot_name": "latest_version_doi", + "module": "D4D_Maintenance", + "file": "D4D_Maintenance.yaml", + "description": "DOI or URL of the latest dataset version.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "version", + "action": "keep", + "new_slot_uri": "dcterms:hasVersion", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "latest_version_doi", + "action": "change", + "new_slot_uri": "d4d:latest_version_doi", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:identifier", + "severity": "MEDIUM", + "conflict_count": 4, + "usages": [ + { + "slot_name": 
"hash", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Cryptographic hash value of the data for integrity verification.", + "line": 0 + }, + { + "slot_name": "md5", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "MD5 hash value of the data (128-bit cryptographic hash).", + "line": 0 + }, + { + "slot_name": "sha256", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "SHA-256 hash value of the data (256-bit cryptographic hash, recommended).", + "line": 0 + }, + { + "slot_name": "doi", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "hash", + "action": "keep", + "new_slot_uri": "dcterms:identifier", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "md5", + "action": "change", + "new_slot_uri": "d4d:md5", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "sha256", + "action": "change", + "new_slot_uri": "d4d:sha256", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "doi", + "action": "change", + "new_slot_uri": "d4d:doi", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:license", + "severity": "CRITICAL", + "conflict_count": 2, + "usages": [ + { + "slot_name": "license", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The legal license under which the resource is made available (e.g., \"MIT\", \"CC-BY-4.0\").", + "line": 0 + }, + { + "slot_name": "license_terms", + "module": "D4D_Data_Governance", + 
"file": "D4D_Data_Governance.yaml", + "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "license", + "action": "keep", + "new_slot_uri": "dcterms:license", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "license_terms", + "action": "change", + "new_slot_uri": "d4d:license_terms", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "dcterms:type", + "severity": "CRITICAL", + "conflict_count": 3, + "usages": [ + { + "slot_name": "status", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "The status of the resource (e.g., draft, published, deprecated).", + "line": 0 + }, + { + "slot_name": "source_type", + "module": "D4D_Collection", + "file": "D4D_Collection.yaml", + "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", + "line": 0 + }, + { + "slot_name": "instance_type", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Multiple types of instances? 
(e.g., movies, users, and ratings).\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "status", + "action": "keep", + "new_slot_uri": "dcterms:type", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "source_type", + "action": "change", + "new_slot_uri": "d4d:source_type", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "instance_type", + "action": "change", + "new_slot_uri": "d4d:instance_type", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:contactPoint", + "severity": "HIGH", + "conflict_count": 2, + "usages": [ + { + "slot_name": "contact_person", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. This person can answer questions about licensing terms, usage restrictions, fees, and permissions.", + "line": 0 + }, + { + "slot_name": "governance_committee_contact", + "module": "D4D_Data_Governance", + "file": "D4D_Data_Governance.yaml", + "description": "Contact person for data governance committee. This person can answer questions about data governance policies, access procedures, and oversight mechanisms.", + "line": 0 + }, + { + "slot_name": "contact_person", + "module": "D4D_Ethics", + "file": "D4D_Ethics.yaml", + "description": "Contact person for questions about ethical review. 
Provides structured contact information including name, email, affiliation, and optional ORCID.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "contact_person", + "action": "keep", + "new_slot_uri": "schema:contactPoint", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "governance_committee_contact", + "action": "change", + "new_slot_uri": "d4d:governance_committee_contact", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "contact_person", + "action": "change", + "new_slot_uri": "d4d:contact_person", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:description", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "description", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable description for a thing.", + "line": 0 + }, + { + "slot_name": "description", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable description for this property.", + "line": 0 + }, + { + "slot_name": "label_description", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "If labeled, what pattern or format do labels follow?\n", + "line": 0 + }, + { + "slot_name": "representative_verification", + "module": "D4D_Composition", + "file": "D4D_Composition.yaml", + "description": "Explanation of how representativeness was validated or verified.\n", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + 
"slot": "description", + "action": "keep", + "new_slot_uri": "schema:description", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "description", + "action": "change", + "new_slot_uri": "d4d:description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "label_description", + "action": "change", + "new_slot_uri": "d4d:label_description", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "representative_verification", + "action": "change", + "new_slot_uri": "d4d:representative_verification", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + }, + { + "slot_uri": "schema:name", + "severity": "MEDIUM", + "conflict_count": 3, + "usages": [ + { + "slot_name": "name", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable name for a thing.", + "line": 0 + }, + { + "slot_name": "name", + "module": "D4D_Base_import", + "file": "D4D_Base_import.yaml", + "description": "A human-readable name for this property.", + "line": 0 + }, + { + "slot_name": "tools", + "module": "D4D_Preprocessing", + "file": "D4D_Preprocessing.yaml", + "description": "List of automated annotation tools with their versions. Format each entry as \"ToolName version\" (e.g., \"spaCy 3.5.0\", \"NLTK 3.8\", \"GPT-4 turbo\"). 
Use \"unknown\" for version if not available (e.g., \"Custom NER Model unknown\").\n", + "line": 0 + }, + { + "slot_name": "variable_name", + "module": "D4D_Variables", + "file": "D4D_Variables.yaml", + "description": "The name or identifier of the variable as it appears in the data files.", + "line": 0 + } + ], + "impact": { + "data_corruption_risk": "low", + "tool_breakage_risk": "high", + "semantic_integrity": "critical", + "migration_complexity": "low" + }, + "recommended_fix": { + "approach": "differentiate_mappings", + "recommendations": [ + { + "slot": "name", + "action": "keep", + "new_slot_uri": "schema:name", + "rationale": "Appears to match ontology term semantics" + }, + { + "slot": "name", + "action": "change", + "new_slot_uri": "d4d:name", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "tools", + "action": "change", + "new_slot_uri": "d4d:tools", + "rationale": "Create custom D4D term to avoid conflict" + }, + { + "slot": "variable_name", + "action": "change", + "new_slot_uri": "d4d:variable_name", + "rationale": "Create custom D4D term to avoid conflict" + } + ] + } + } + ] +} \ No newline at end of file diff --git a/scripts/data_value_analyzer.py b/scripts/data_value_analyzer.py new file mode 100755 index 00000000..43c82de2 --- /dev/null +++ b/scripts/data_value_analyzer.py @@ -0,0 +1,293 @@ +#!/usr/bin/env python3 +""" +Analyze actual values in D4D data files to validate schema range types. + +Examines D4D YAML files for the four Bridge2AI projects to see: +- What values are present in boolean fields (are they truly boolean?) +- What values are in string fields (should they be enums?) 
+- Whether single-valued fields contain list-like data +- What types of values are actually used +""" + +import yaml +import json +import argparse +from pathlib import Path +from collections import defaultdict +from typing import Any, Dict, List, Set + +def load_d4d_file(file_path: Path) -> dict: + """Load a D4D YAML data file.""" + with open(file_path, 'r') as f: + return yaml.safe_load(f) + +def extract_field_values(data: Any, prefix: str = '') -> Dict[str, Set]: + """ + Recursively extract all field values from D4D data. + + Returns dict mapping field_path to set of values seen. + """ + field_values = defaultdict(set) + + if isinstance(data, dict): + for key, value in data.items(): + field_path = f"{prefix}.{key}" if prefix else key + + if value is None: + field_values[field_path].add(None) + elif isinstance(value, (str, int, float, bool)): + field_values[field_path].add(value) + elif isinstance(value, list): + # For lists, record that it's multivalued and extract item values + field_values[field_path].add(f"__LIST__({len(value)} items)") + for item in value: + if isinstance(item, (str, int, float, bool)): + field_values[f"{field_path}[item]"].add(item) + elif isinstance(item, dict): + sub_values = extract_field_values(item, field_path + "[item]") + for sub_key, sub_vals in sub_values.items(): + field_values[sub_key].update(sub_vals) + elif isinstance(value, dict): + sub_values = extract_field_values(value, field_path) + for sub_key, sub_vals in sub_values.items(): + field_values[sub_key].update(sub_vals) + + elif isinstance(data, list): + for item in data: + sub_values = extract_field_values(item, prefix) + for sub_key, sub_vals in sub_values.items(): + field_values[sub_key].update(sub_vals) + + return field_values + +def analyze_field_type(values: Set) -> dict: + """ + Analyze the type of values in a field. 
+ + Returns dict with: + - inferred_type: str + - is_boolean: bool + - could_be_enum: bool (if limited value set) + - is_multivalued: bool + - value_count: int + - sample_values: list + """ + # Filter out None and __LIST__ markers + real_values = {v for v in values if v is not None and not str(v).startswith('__LIST__')} + list_markers = {v for v in values if str(v).startswith('__LIST__')} + + value_count = len(real_values) + is_multivalued = len(list_markers) > 0 + + # Infer type + if not real_values: + inferred_type = 'unknown' + is_boolean = False + could_be_enum = False + else: + types = {type(v).__name__ for v in real_values} + + if len(types) == 1: + inferred_type = list(types)[0] + else: + inferred_type = 'mixed' + + # Check if boolean + is_boolean = (inferred_type == 'bool' or + (inferred_type == 'str' and + real_values.issubset({'true', 'false', 'True', 'False', 'yes', 'no'}))) + + # Check if could be enum (limited value set) + could_be_enum = (inferred_type in ['str', 'bool'] and + 1 < value_count <= 20) # Arbitrary threshold + + return { + 'inferred_type': inferred_type, + 'is_boolean': is_boolean, + 'could_be_enum': could_be_enum, + 'is_multivalued': is_multivalued, + 'value_count': value_count, + 'sample_values': sorted(real_values, key=str)[:10] # key=str avoids TypeError on mixed-type values + } + +def analyze_d4d_files(data_dir: Path, projects: List[str]) -> Dict[str, dict]: + """ + Analyze D4D data files for multiple projects. + + Returns dict mapping field_path to analysis results. 
+ """ + all_field_values = defaultdict(set) + + # Find and analyze D4D files + for project in projects: + # Use claudecode_agent method (recommended) + d4d_file = data_dir / 'claudecode_agent' / f'{project}_d4d.yaml' + + if not d4d_file.exists(): + print(f"Warning: {d4d_file} not found, skipping") + continue + + print(f"Analyzing: {d4d_file}") + data = load_d4d_file(d4d_file) + field_values = extract_field_values(data) + + # Merge with all values + for field, values in field_values.items(): + all_field_values[field].update(values) + + # Analyze each field + field_analyses = {} + for field, values in all_field_values.items(): + field_analyses[field] = analyze_field_type(values) + + return field_analyses + +def identify_issues(field_analyses: Dict[str, dict]) -> List[dict]: + """ + Identify potential schema issues based on actual data values. + + Returns list of issues found. + """ + issues = [] + + for field_path, analysis in field_analyses.items(): + # Skip internal/list item paths + if '[item]' in field_path: + continue + + # Issue: Boolean field but multiple distinct string values + if analysis['inferred_type'] == 'str' and not analysis['is_boolean']: + last_segment = field_path.split('.')[-1] + if last_segment.startswith(('is_', 'has_', 'was_', 'are_')) or 'boolean' in last_segment.lower(): + issues.append({ + 'field': field_path, + 'issue_type': 'boolean_field_has_non_boolean_values', + 'severity': 'HIGH', + 'description': f"Field name suggests boolean but contains {analysis['value_count']} string values", + 'sample_values': analysis['sample_values'] + }) + + # Issue: String field with limited value set (could be enum) + if analysis['could_be_enum'] and not analysis['is_multivalued']: + issues.append({ + 'field': field_path, + 'issue_type': 'string_could_be_enum', + 'severity': 'MEDIUM', + 'description': f"String field with only {analysis['value_count']} distinct values (enum candidate)", + 'sample_values': analysis['sample_values'] + }) + + # Issue: 
Multivalued field in data (check if schema allows) + if analysis['is_multivalued']: + issues.append({ + 'field': field_path, + 'issue_type': 'multivalued_in_data', + 'severity': 'INFO', + 'description': "Field contains lists in data - verify schema has multivalued: true", + 'sample_values': [str(v) for v in list(analysis['sample_values'])[:5]] + }) + + return issues + +def generate_report(field_analyses: Dict[str, dict], issues: List[dict], output_format: str = 'json') -> str: + """Generate data value analysis report.""" + report = { + 'metadata': { + 'tool': 'data_value_analyzer', + 'total_fields': len(field_analyses), + 'total_issues': len(issues) + }, + 'summary': { + 'fields_analyzed': len(field_analyses), + 'boolean_fields': sum(1 for a in field_analyses.values() if a['is_boolean']), + 'enum_candidates': sum(1 for a in field_analyses.values() if a['could_be_enum']), + 'multivalued_fields': sum(1 for a in field_analyses.values() if a['is_multivalued']) + }, + 'issues': issues, + 'field_analyses': { + k: v for k, v in sorted(field_analyses.items()) + } + } + + if output_format == 'json': + return json.dumps(report, indent=2) + else: + # Text format + lines = [] + lines.append("=" * 80) + lines.append("D4D DATA VALUE ANALYSIS REPORT") + lines.append("=" * 80) + lines.append("") + lines.append(f"Fields analyzed: {len(field_analyses)}") + lines.append(f"Issues found: {len(issues)}") + lines.append("") + + # Group issues by type + issues_by_type = defaultdict(list) + for issue in issues: + issues_by_type[issue['issue_type']].append(issue) + + for issue_type, type_issues in sorted(issues_by_type.items()): + lines.append(f"\n{issue_type.upper().replace('_', ' ')} ({len(type_issues)} occurrences):") + lines.append("-" * 80) + for issue in type_issues[:10]: # Limit to 10 per type + lines.append(f"\n {issue['field']}") + lines.append(f" {issue['description']}") + if issue['sample_values']: + vals_preview = ', '.join([str(v)[:30] for v in issue['sample_values'][:5]]) + 
lines.append(f" Sample values: {vals_preview}") + + lines.append("") + lines.append("=" * 80) + return "\n".join(lines) + +def main(): + parser = argparse.ArgumentParser( + description='Analyze actual values in D4D data files' + ) + parser.add_argument( + '--data-dir', + type=Path, + default=Path('data/d4d_concatenated'), + help='Directory containing D4D data files' + ) + parser.add_argument( + '--projects', + nargs='+', + default=['AI_READI', 'CHORUS', 'CM4AI', 'VOICE'], + help='Projects to analyze' + ) + parser.add_argument( + '--output', + type=Path, + help='Output file path (default: stdout)' + ) + parser.add_argument( + '--format', + choices=['json', 'text'], + default='json', + help='Output format' + ) + + args = parser.parse_args() + + # Analyze D4D files + field_analyses = analyze_d4d_files(args.data_dir, args.projects) + + # Identify issues + issues = identify_issues(field_analyses) + + # Generate report + report = generate_report(field_analyses, issues, args.format) + + # Output + if args.output: + args.output.write_text(report) + print(f"\nReport written to: {args.output}") + else: + print(report) + + return 0 + +if __name__ == '__main__': + raise SystemExit(main()) diff --git a/scripts/generate_semantic_review_report.py b/scripts/generate_semantic_review_report.py new file mode 100755 index 00000000..f93d1a4a --- /dev/null +++ b/scripts/generate_semantic_review_report.py @@ -0,0 +1,285 @@ +#!/usr/bin/env python3 +""" +Generate comprehensive semantic review report from all validation outputs. 
+ +Consolidates findings from: +- slot_uri conflict detector +- range-description checker +- data value analyzer +- (future) ontology mapping validator +- (future) semantic consistency analyzer +- (future) logical constraint validator +""" + +import json +import argparse +from pathlib import Path +from collections import defaultdict +from datetime import datetime + +def load_json_report(file_path: Path) -> dict: + """Load a JSON report file.""" + if not file_path.exists(): + return {} + with open(file_path, 'r') as f: + return json.load(f) + +def generate_executive_summary(reports: dict) -> str: + """Generate executive summary section.""" + slot_uri = reports.get('slot_uri_conflicts', {}) + ranges = reports.get('range_mismatches', {}) + data_values = reports.get('data_value_analysis', {}) + + total_issues = ( + slot_uri.get('metadata', {}).get('total_conflicts', 0) + + ranges.get('metadata', {}).get('total_issues', 0) + + data_values.get('metadata', {}).get('total_issues', 0) + ) + + # Count by severity + critical = sum(1 for c in slot_uri.get('conflicts', []) if c.get('severity') == 'CRITICAL') + high = (sum(1 for c in slot_uri.get('conflicts', []) if c.get('severity') == 'HIGH') + + ranges.get('summary', {}).get('HIGH', 0)) + medium = (sum(1 for c in slot_uri.get('conflicts', []) if c.get('severity') == 'MEDIUM') + + ranges.get('summary', {}).get('MEDIUM', 0)) + low = ranges.get('summary', {}).get('LOW', 0) + + lines = [] + lines.append("## Executive Summary\n") + lines.append(f"**Total Issues Found:** {total_issues}\n") + lines.append(f"- **CRITICAL:** {critical} (Blocks functionality)") + lines.append(f"- **HIGH:** {high} (Wrong semantics)") + lines.append(f"- **MEDIUM:** {medium} (Reduces clarity)") + lines.append(f"- **LOW:** {low} (Documentation quality)\n") + + lines.append("### Key Findings\n") + lines.append(f"1. 
**slot_uri Conflicts:** {slot_uri.get('metadata', {}).get('total_conflicts', 0)} conflicts detected") + conflicts_list = slot_uri.get('conflicts', []) + if conflicts_list: + most_severe = max(conflicts_list, key=lambda c: c.get('conflict_count', 0)) + lines.append(f" - Most severe: `{most_severe.get('slot_uri', 'N/A')}` used by {most_severe.get('conflict_count', '?')} slots") + critical_uris = [c.get('slot_uri') for c in conflicts_list if c.get('severity') == 'CRITICAL'] + if critical_uris: + lines.append(f" - Critical URIs: {', '.join(f'`{u}`' for u in critical_uris[:5])}") + lines.append("") + + lines.append(f"2. **Range-Description Mismatches:** {ranges.get('metadata', {}).get('total_issues', 0)} issues") + lines.append(f" - HIGH priority: {ranges.get('summary', {}).get('HIGH', 0)} (boolean oversimplification, missing multivalued)") + lines.append(f" - MEDIUM priority: {ranges.get('summary', {}).get('MEDIUM', 0)} (primitives vs structured types)\n") + + lines.append(f"3. **Data Value Analysis:** {data_values.get('metadata', {}).get('total_fields', 0)} fields analyzed across 4 Bridge2AI projects") + lines.append(f" - Enum candidates: {data_values.get('summary', {}).get('enum_candidates', 0)} string fields with limited value sets") + lines.append(f" - Multivalued fields: {data_values.get('summary', {}).get('multivalued_fields', 0)} fields containing lists in actual data\n") + + return '\n'.join(lines) + +def generate_critical_issues_section(reports: dict) -> str: + """Generate critical issues section.""" + lines = [] + lines.append("## Critical Issues (Must Fix)\n") + + slot_uri = reports.get('slot_uri_conflicts', {}) + critical_conflicts = [c for c in slot_uri.get('conflicts', []) if c.get('severity') == 'CRITICAL'] + + for i, conflict in enumerate(critical_conflicts, 1): + lines.append(f"### C-{i:03d}: slot_uri Conflict - {conflict['slot_uri']}\n") + lines.append(f"**Severity:** CRITICAL") + lines.append(f"**Conflict Count:** {conflict['conflict_count']} 
different slots") + + # Show usages + lines.append(f"\n**Usages:**") + for usage in conflict['usages']: + lines.append(f"- `{usage['slot_name']}` in {usage['file']}") + if usage['description']: + lines.append(f" - Description: \"{usage['description'][:80]}...\"") + + lines.append(f"\n**Impact:**") + impact = conflict.get('impact', {}) + lines.append(f"- RDF Serialization: {impact.get('semantic_integrity', 'unknown')}") + lines.append(f"- Tool Breakage Risk: {impact.get('tool_breakage_risk', 'unknown')}") + + lines.append(f"\n**Recommended Fix:**") + fix = conflict.get('recommended_fix', {}) + if fix.get('recommendations'): + for rec in fix['recommendations']: + action = rec['action'] + if action == 'keep': + lines.append(f"- **KEEP** `{rec['slot']}` → `{rec['new_slot_uri']}` (correct usage)") + else: + lines.append(f"- **CHANGE** `{rec['slot']}` → `{rec['new_slot_uri']}` (avoid conflict)") + + lines.append(f"\n**Rationale:** {get_rationale(conflict['slot_uri'])}\n") + lines.append("") + + return '\n'.join(lines) + +def get_rationale(slot_uri: str) -> str: + """Get rationale for specific slot_uri conflicts.""" + rationales = { + 'dcat:mediaType': 'DCAT spec defines mediaType as MIME type (e.g., application/json), not character encoding', + 'dcterms:description': 'Overuse of generic description property loses semantic distinction between different types of descriptive text', + 'schema:identifier': 'Multiple slots represent different kinds of identifiers (DOI, ORCID, generic ID) - each should have specific mapping', + 'dcterms:license': 'License applies to different entities (dataset vs software) and should be differentiated', + } + for key in rationales: + if key in slot_uri: + return rationales[key] + return 'Multiple semantic concepts mapped to same ontology term creates ambiguity' + +def generate_high_priority_section(reports: dict) -> str: + """Generate high priority issues section.""" + lines = [] + lines.append("## High Priority Issues (Wrong Semantics)\n") + + 
# slot_uri conflicts marked as HIGH + slot_uri = reports.get('slot_uri_conflicts', {}) + high_conflicts = [c for c in slot_uri.get('conflicts', []) if c.get('severity') == 'HIGH'] + + issue_num = 1 + for conflict in high_conflicts[:5]: # Limit to top 5 + lines.append(f"### H-{issue_num:03d}: slot_uri Conflict - {conflict['slot_uri']}\n") + lines.append(f"**Usages:** {', '.join([u['slot_name'] for u in conflict['usages']])}") + lines.append(f"**Files:** {', '.join(set([u['file'] for u in conflict['usages']]))}\n") + issue_num += 1 + + # Range-description mismatches marked as HIGH + ranges = reports.get('range_mismatches', {}) + high_range_issues = [i for i in ranges.get('issues', []) if i.get('severity') == 'HIGH'] + + for issue in high_range_issues[:10]: # Limit to top 10 + location = f"{issue['module']}::{issue['class'] or 'slots'}::{issue['attribute']}" + lines.append(f"### H-{issue_num:03d}: Range Mismatch - {location}\n") + lines.append(f"**Current Range:** `{issue['range']}` (multivalued: {issue['multivalued']})") + lines.append(f"**Issue:** {issue['issue']}") + lines.append(f"**Description:** \"{issue['description'][:100]}...\"\n") + issue_num += 1 + + return '\n'.join(lines) + +def generate_data_insights_section(reports: dict) -> str: + """Generate section based on actual data analysis.""" + lines = [] + lines.append("## Data-Driven Insights\n") + lines.append("Analysis of actual D4D records for AI_READI, CHORUS, CM4AI, and VOICE projects:\n") + + data = reports.get('data_value_analysis', {}) + + lines.append(f"### Enum Candidates\n") + lines.append(f"Fields with limited value sets that could be enums:\n") + + enum_issues = [i for i in data.get('issues', []) if i['issue_type'] == 'string_could_be_enum'] + for issue in enum_issues[:15]: + lines.append(f"- `{issue['field']}`: {issue['description']}") + if issue['sample_values']: + vals = ', '.join([f'"{v}"' for v in issue['sample_values'][:5]]) + lines.append(f" - Values: {vals}") + + lines.append(f"\n### 
Multivalued Fields\n") + lines.append(f"Fields that contain lists in actual data (verify schema has multivalued: true):\n") + + mv_issues = [i for i in data.get('issues', []) if i['issue_type'] == 'multivalued_in_data'] + for issue in mv_issues[:15]: + lines.append(f"- `{issue['field']}`") + + return '\n'.join(lines) + +def generate_markdown_report(reports: dict) -> str: + """Generate complete markdown report.""" + lines = [] + lines.append("# D4D Schema Semantic Review Report\n") + lines.append(f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}") + lines.append(f"**Review Scope:** All D4D schema modules + actual data from 4 Bridge2AI projects\n") + lines.append("---\n") + + lines.append(generate_executive_summary(reports)) + lines.append("\n---\n") + + lines.append(generate_critical_issues_section(reports)) + lines.append("\n---\n") + + lines.append(generate_high_priority_section(reports)) + lines.append("\n---\n") + + lines.append(generate_data_insights_section(reports)) + lines.append("\n---\n") + + # Appendices + lines.append("## Appendices\n") + lines.append("### A: Files Requiring Changes\n") + lines.append("Priority-ordered list of schema files needing updates:\n") + lines.append("1. `src/data_sheets_schema/schema/D4D_Base_import.yaml` (CRITICAL - foundational)") + lines.append("2. `src/data_sheets_schema/schema/D4D_Composition.yaml` (HIGH)") + lines.append("3. `src/data_sheets_schema/schema/D4D_Data_Governance.yaml` (HIGH)") + lines.append("4. `src/data_sheets_schema/schema/D4D_Distribution.yaml` (MEDIUM)") + lines.append("5. 
`src/data_sheets_schema/schema/D4D_Maintenance.yaml` (MEDIUM)\n") + + lines.append("### B: Validation Tools\n") + lines.append("Automated validation scripts created:\n") + lines.append("- `scripts/slot_uri_conflict_detector.py` - Detects slot_uri conflicts") + lines.append("- `scripts/range_description_checker.py` - Checks range-description alignment") + lines.append("- `scripts/data_value_analyzer.py` - Analyzes actual data values\n") + + lines.append("### C: Next Steps\n") + lines.append("1. Review and prioritize CRITICAL issues") + lines.append("2. Create implementation plan for fixes") + lines.append("3. Coordinate with tool maintainers for breaking changes") + lines.append("4. Develop data migration strategy if needed") + lines.append("5. Update documentation with rationale for changes\n") + + return '\n'.join(lines) + +def main(): + parser = argparse.ArgumentParser( + description='Generate comprehensive semantic review report' + ) + parser.add_argument( + 'report_files', + nargs='+', + type=Path, + help='Input JSON report files' + ) + parser.add_argument( + '--output', + type=Path, + default=Path('reports/semantic_review_report.md'), + help='Output markdown file' + ) + + args = parser.parse_args() + + # Load all reports + reports = {} + for file_path in args.report_files: + if 'slot_uri_conflicts' in str(file_path): + reports['slot_uri_conflicts'] = load_json_report(file_path) + elif 'range_mismatches' in str(file_path): + reports['range_mismatches'] = load_json_report(file_path) + elif 'data_value_analysis' in str(file_path): + reports['data_value_analysis'] = load_json_report(file_path) + + # Generate markdown report + markdown = generate_markdown_report(reports) + + # Write output + args.output.write_text(markdown) + print(f"Semantic review report written to: {args.output}") + + # Also generate JSON summary + json_output = args.output.with_suffix('.json') + summary = { + 'generated': datetime.now().isoformat(), + 'total_issues': ( + 
reports.get('slot_uri_conflicts', {}).get('metadata', {}).get('total_conflicts', 0) + + reports.get('range_mismatches', {}).get('metadata', {}).get('total_issues', 0) + + reports.get('data_value_analysis', {}).get('metadata', {}).get('total_issues', 0) + ), + 'reports_analyzed': list(reports.keys()) + } + with open(json_output, 'w') as f: + json.dump(summary, f, indent=2) + print(f"JSON summary written to: {json_output}") + + return 0 + +if __name__ == '__main__': + raise SystemExit(main()) diff --git a/scripts/quality_after.json b/scripts/quality_after.json new file mode 100644 index 00000000..1d1a9525 --- /dev/null +++ b/scripts/quality_after.json @@ -0,0 +1,3130 @@ +{ + "summary": { + "total_issues": 307, + "by_type": { + "CONSIDER_EXAMPLE": 239, + "MISSING_PERIOD": 50, + "TOO_BRIEF": 18 + }, + "by_priority": { + "LOW": 299, + "MEDIUM": 8 + } + }, + "modules": [ + { + "module": "D4D_Base_import", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "NamedThing.id", + "location": "D4D_Base_import::classes::NamedThing::attributes::id", + "description": "A unique identifier for a thing.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "NamedThing.name", + "location": "D4D_Base_import::classes::NamedThing::attributes::name", + "description": "A human-readable name for a thing.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "NamedThing.description", + "location": "D4D_Base_import::classes::NamedThing::attributes::description", + "description": "A human-readable description for a thing.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetProperty.id", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::id", + "description": "An optional identifier for this property.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", +
"element_type": "attribute", + "element_name": "DatasetProperty.name", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::name", + "description": "A human-readable name for this property.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetProperty.description", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::description", + "description": "A human-readable description for this property.", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "DatasetProperty.used_software", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::used_software", + "description": "What software was used as part of this dataset property?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetProperty.used_software", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::used_software", + "description": "What software was used as part of this dataset property?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Person.affiliation", + "location": "D4D_Base_import::classes::Person::attributes::affiliation", + "description": "The organization(s) to which the person belongs in the context of this dataset. May vary across datasets; multivalued to support multiple affiliations.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Person.email", + "location": "D4D_Base_import::classes::Person::attributes::email", + "description": "The email address of the person. 
Represents current/preferred contact information in the context of this dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Person.orcid", + "location": "D4D_Base_import::classes::Person::attributes::orcid", + "description": "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FormatDialect.double_quote", + "location": "D4D_Base_import::classes::FormatDialect::attributes::double_quote", + "description": "Whether quotes within quoted fields are escaped by doubling them.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FormatDialect.header", + "location": "D4D_Base_import::classes::FormatDialect::attributes::header", + "description": "Whether the first row contains column headers (true/false).", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-11", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-11", + "description": "Latin/Thai encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-5", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-5", + "description": "Latin/Cyrillic encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-6", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-6", + "description": "Latin/Arabic encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-7", + 
"location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-7", + "description": "Latin/Greek encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-8", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-8", + "description": "Latin/Hebrew encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-9", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-9", + "description": "Latin-5 (Turkish)", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 5, + "complete_sentences_pct": 71.42857142857143, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 18, + "with_examples": 6, + "with_examples_pct": 33.33333333333333, + "complete_sentences": 18, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "slot": { + "total": 31, + "with_examples": 5, + "with_examples_pct": 16.129032258064516, + "complete_sentences": 31, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 10, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 7, + "complete_sentences_pct": 70.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 134, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 66, + "complete_sentences_pct": 49.25373134328358, + "too_brief": 24, + "too_brief_pct": 17.91044776119403 + } + } + }, + { + "module": "D4D_Collection", + "issues": [ + { + "type": "MISSING_PERIOD", + 
"element_type": "attribute", + "element_name": "InstanceAcquisition.was_directly_observed", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_directly_observed", + "description": "Whether the data was directly observed", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_directly_observed", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_directly_observed", + "description": "Whether the data was directly observed", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_reported_by_subjects", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_reported_by_subjects", + "description": "Whether the data was reported directly by the subjects themselves", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_reported_by_subjects", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_reported_by_subjects", + "description": "Whether the data was reported directly by the subjects themselves", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_inferred_derived", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_inferred_derived", + "description": "Whether the data was inferred or derived from other data", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_inferred_derived", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_inferred_derived", + "description": "Whether the data was inferred or derived from other data", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": 
"InstanceAcquisition.was_validated_verified", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_validated_verified", + "description": "Whether the data was validated or verified in any way", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_validated_verified", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_validated_verified", + "description": "Whether the data was validated or verified in any way", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.acquisition_details", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::acquisition_details", + "description": "Details on how data was acquired for each instance.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionMechanism.mechanism_details", + "location": "D4D_Collection::classes::CollectionMechanism::attributes::mechanism_details", + "description": "Details on mechanisms or procedures used to collect the data.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "DataCollector.role", + "location": "D4D_Collection::classes::DataCollector::attributes::role", + "description": "Role of the data collector (e.g., researcher, crowdworker)", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DataCollector.collector_details", + "location": "D4D_Collection::classes::DataCollector::attributes::collector_details", + "description": "Details on who collected the data and their compensation.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "CollectionTimeframe.start_date", + "location": 
"D4D_Collection::classes::CollectionTimeframe::attributes::start_date", + "description": "Start date of data collection", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionTimeframe.start_date", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::start_date", + "description": "Start date of data collection", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "CollectionTimeframe.end_date", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::end_date", + "description": "End date of data collection", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionTimeframe.end_date", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::end_date", + "description": "End date of data collection", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionTimeframe.timeframe_details", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::timeframe_details", + "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "DirectCollection.is_direct", + "location": "D4D_Collection::classes::DirectCollection::attributes::is_direct", + "description": "Whether collection was direct from individuals", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DirectCollection.is_direct", + "location": "D4D_Collection::classes::DirectCollection::attributes::is_direct", + "description": "Whether collection was direct from individuals", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": 
"DirectCollection.collection_details", + "location": "D4D_Collection::classes::DirectCollection::attributes::collection_details", + "description": "Details on direct vs. indirect collection methods and sources.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawDataSource.source_type", + "location": "D4D_Collection::classes::RawDataSource::attributes::source_type", + "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawDataSource.access_details", + "location": "D4D_Collection::classes::RawDataSource::attributes::access_details", + "description": "Information on how to access or retrieve the raw source data.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawDataSource.raw_data_format", + "location": "D4D_Collection::classes::RawDataSource::attributes::raw_data_format", + "description": "Format of the raw data before any preprocessing.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 3, + "with_examples_pct": 42.857142857142854, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 20, + "with_examples": 5, + "with_examples_pct": 25.0, + "complete_sentences": 14, + "complete_sentences_pct": 70.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Composition", + "issues": [ + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "Instance.counts", + "location": 
"D4D_Composition::classes::Instance::attributes::counts", + "description": "How many instances are there in total (of each type, if appropriate)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.counts", + "location": "D4D_Composition::classes::Instance::attributes::counts", + "description": "How many instances are there in total (of each type, if appropriate)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "Instance.label", + "location": "D4D_Composition::classes::Instance::attributes::label", + "description": "Is there a label or target associated with each instance?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.label", + "location": "D4D_Composition::classes::Instance::attributes::label", + "description": "Is there a label or target associated with each instance?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "Instance.label_description", + "location": "D4D_Composition::classes::Instance::attributes::label_description", + "description": "If labeled, what pattern or format do labels follow?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.label_description", + "location": "D4D_Composition::classes::Instance::attributes::label_description", + "description": "If labeled, what pattern or format do labels follow?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.sampling_strategies", + "location": "D4D_Composition::classes::Instance::attributes::sampling_strategies", + "description": "References to one or more SamplingStrategy objects.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": 
"Instance.missing_information", + "location": "D4D_Composition::classes::Instance::attributes::missing_information", + "description": "References to one or more MissingInfo objects describing missing data.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.is_sample", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::is_sample", + "description": "Indicates whether it is a sample of a larger set.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.is_random", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::is_random", + "description": "Indicates whether the sample is random.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.source_data", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::source_data", + "description": "Description of the larger set from which the sample was drawn, if any.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.is_representative", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::is_representative", + "description": "Indicates whether the sample is representative of the larger set.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.representative_verification", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::representative_verification", + "description": "Explanation of how representativeness was validated or verified.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.why_not_representative", + "location": 
"D4D_Composition::classes::SamplingStrategy::attributes::why_not_representative", + "description": "Explanation of why the sample is not representative, if applicable.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.strategies", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::strategies", + "description": "Description of the sampling strategy (deterministic, probabilistic, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MissingInfo.missing", + "location": "D4D_Composition::classes::MissingInfo::attributes::missing", + "description": "Description of the missing data fields or elements.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MissingInfo.why_missing", + "location": "D4D_Composition::classes::MissingInfo::attributes::why_missing", + "description": "Explanation of why each piece of data is missing.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Splits.split_details", + "location": "D4D_Composition::classes::Splits::attributes::split_details", + "description": "Details on recommended data splits and their rationale.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DataAnomaly.anomaly_details", + "location": "D4D_Composition::classes::DataAnomaly::attributes::anomaly_details", + "description": "Details on errors, noise sources, or redundancies in the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.bias_type", + "location": "D4D_Composition::classes::DatasetBias::attributes::bias_type", + "description": "The type of bias identified, using standardized categories from the Artificial Intelligence Ontology (AIO).\n", 
+ "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.bias_description", + "location": "D4D_Composition::classes::DatasetBias::attributes::bias_description", + "description": "Detailed description of how this bias manifests in the dataset, including affected populations, features, or outcomes.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.mitigation_strategy", + "location": "D4D_Composition::classes::DatasetBias::attributes::mitigation_strategy", + "description": "Steps taken or recommended to mitigate this bias.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.affected_subsets", + "location": "D4D_Composition::classes::DatasetBias::attributes::affected_subsets", + "description": "Specific subsets or features of the dataset affected by this bias.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetLimitation.limitation_description", + "location": "D4D_Composition::classes::DatasetLimitation::attributes::limitation_description", + "description": "Detailed description of the limitation and its implications.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetLimitation.scope_impact", + "location": "D4D_Composition::classes::DatasetLimitation::attributes::scope_impact", + "description": "How this limitation affects the scope or applicability of the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetLimitation.recommended_mitigation", + "location": "D4D_Composition::classes::DatasetLimitation::attributes::recommended_mitigation", + "description": "Recommended approaches for users to address this limitation.\n", + "priority": "LOW" + }, + { + 
"type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExternalResource.future_guarantees", + "location": "D4D_Composition::classes::ExternalResource::attributes::future_guarantees", + "description": "Explanation of any commitments that external resources will remain available and stable over time.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExternalResource.archival", + "location": "D4D_Composition::classes::ExternalResource::attributes::archival", + "description": "Indication whether official archival versions of external resources are included.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExternalResource.restrictions", + "location": "D4D_Composition::classes::ExternalResource::attributes::restrictions", + "description": "Description of any restrictions or fees associated with external resources.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Confidentiality.confidential_elements_present", + "location": "D4D_Composition::classes::Confidentiality::attributes::confidential_elements_present", + "description": "Indicates whether any confidential data elements are present.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Confidentiality.confidentiality_details", + "location": "D4D_Composition::classes::Confidentiality::attributes::confidentiality_details", + "description": "Details on confidential data elements and handling procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ContentWarning.content_warnings_present", + "location": "D4D_Composition::classes::ContentWarning::attributes::content_warnings_present", + "description": "Indicates whether any content warnings are needed.", + "priority": "LOW" + }, + { + 
"type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Subpopulation.subpopulation_elements_present", + "location": "D4D_Composition::classes::Subpopulation::attributes::subpopulation_elements_present", + "description": "Indicates whether any subpopulations are explicitly identified.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Subpopulation.distribution", + "location": "D4D_Composition::classes::Subpopulation::attributes::distribution", + "description": "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Deidentification.identifiable_elements_present", + "location": "D4D_Composition::classes::Deidentification::attributes::identifiable_elements_present", + "description": "Indicates whether data subjects can be identified.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Deidentification.identifiers_removed", + "location": "D4D_Composition::classes::Deidentification::attributes::identifiers_removed", + "description": "List of identifier types removed during de-identification.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Deidentification.deidentification_details", + "location": "D4D_Composition::classes::Deidentification::attributes::deidentification_details", + "description": "Details on de-identification procedures and residual risks.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SensitiveElement.sensitive_elements_present", + "location": "D4D_Composition::classes::SensitiveElement::attributes::sensitive_elements_present", + "description": "Indicates whether sensitive data elements are present.", + 
"priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SensitiveElement.sensitivity_details", + "location": "D4D_Composition::classes::SensitiveElement::attributes::sensitivity_details", + "description": "Details on sensitive data elements present and handling procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetRelationship.target_dataset", + "location": "D4D_Composition::classes::DatasetRelationship::attributes::target_dataset", + "description": "The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetRelationship.description", + "location": "D4D_Composition::classes::DatasetRelationship::attributes::description", + "description": "Free-text description providing additional context about the relationship.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 15, + "with_examples": 9, + "with_examples_pct": 60.0, + "complete_sentences": 14, + "complete_sentences_pct": 93.33333333333333, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 47, + "with_examples": 9, + "with_examples_pct": 19.148936170212767, + "complete_sentences": 47, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 0, + "complete_sentences_pct": 0.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 14, + "with_examples": 3, + "with_examples_pct": 21.428571428571427, + "complete_sentences": 14, + 
"complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Data_Governance", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LicenseAndUseTerms.license_terms", + "location": "D4D_Data_Governance::classes::LicenseAndUseTerms::attributes::license_terms", + "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "LicenseAndUseTerms.data_use_permission", + "location": "D4D_Data_Governance::classes::LicenseAndUseTerms::attributes::data_use_permission", + "description": "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LicenseAndUseTerms.contact_person", + "location": "D4D_Data_Governance::classes::LicenseAndUseTerms::attributes::contact_person", + "description": "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. 
This person can answer questions about licensing terms, usage restrictions, fees, and permissions.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "IPRestrictions.restrictions", + "location": "D4D_Data_Governance::classes::IPRestrictions::attributes::restrictions", + "description": "Explanation of third-party IP restrictions.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.regulatory_restrictions", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::regulatory_restrictions", + "description": "Export or regulatory restrictions on the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.hipaa_compliant", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::hipaa_compliant", + "description": "Indicates compliance with the Health Insurance Portability and Accountability Act (HIPAA). HIPAA applies to protected health information in the United States.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.confidentiality_level", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::confidentiality_level", + "description": "Confidentiality classification of the dataset indicating level of access restrictions and sensitivity.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.governance_committee_contact", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::governance_committee_contact", + "description": "Contact person for data governance committee. 
This person can answer questions about data governance policies, access procedures, and oversight mechanisms.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 3, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 3, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 9, + "with_examples": 2, + "with_examples_pct": 22.22222222222222, + "complete_sentences": 8, + "complete_sentences_pct": 88.88888888888889, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 3, + "with_examples": 1, + "with_examples_pct": 33.33333333333333, + "complete_sentences": 2, + "complete_sentences_pct": 66.66666666666666, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 30, + "with_examples": 1, + "with_examples_pct": 3.3333333333333335, + "complete_sentences": 10, + "complete_sentences_pct": 33.33333333333333, + "too_brief": 1, + "too_brief_pct": 3.3333333333333335 + } + } + }, + { + "module": "D4D_Distribution", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ThirdPartySharing.is_shared", + "location": "D4D_Distribution::classes::ThirdPartySharing::attributes::is_shared", + "description": "Boolean indicating whether the dataset is distributed to parties external to the dataset-creating entity.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DistributionFormat.access_urls", + "location": "D4D_Distribution::classes::DistributionFormat::attributes::access_urls", + "description": "Details of the distribution channel(s) or format(s).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + 
"element_name": "DistributionDate.release_dates", + "location": "D4D_Distribution::classes::DistributionDate::attributes::release_dates", + "description": "Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled releases.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 3, + "with_examples": 2, + "with_examples_pct": 66.66666666666666, + "complete_sentences": 3, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 3, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 3, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Ethics", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EthicalReview.contact_person", + "location": "D4D_Ethics::classes::EthicalReview::attributes::contact_person", + "description": "Contact person for questions about ethical review. 
Provides structured contact information including name, email, affiliation, and optional ORCID.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EthicalReview.review_details", + "location": "D4D_Ethics::classes::EthicalReview::attributes::review_details", + "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DataProtectionImpact.impact_details", + "location": "D4D_Ethics::classes::DataProtectionImpact::attributes::impact_details", + "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionNotification.notification_details", + "location": "D4D_Ethics::classes::CollectionNotification::attributes::notification_details", + "description": "Details on how individuals were notified about data collection.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionConsent.consent_details", + "location": "D4D_Ethics::classes::CollectionConsent::attributes::consent_details", + "description": "Details on how consent was requested, provided, and documented.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ConsentRevocation.revocation_details", + "location": "D4D_Ethics::classes::ConsentRevocation::attributes::revocation_details", + "description": "Details on consent revocation mechanisms and procedures.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 5, + 
"with_examples": 2, + "with_examples_pct": 40.0, + "complete_sentences": 5, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Evaluation_Summary", + "issues": [ + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_type", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_type", + "description": "Type of rubric used (rubric10 or rubric20)", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_type", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_type", + "description": "Type of rubric used (rubric10 or rubric20)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_description", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_description", + "description": "Description of rubric structure and scoring", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_description", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_description", + "description": "Description of rubric structure and scoring", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.total_files_evaluated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::total_files_evaluated", + "description": "Total number of D4D files evaluated", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": 
"attribute", + "element_name": "EvaluationSummary.total_files_evaluated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::total_files_evaluated", + "description": "Total number of D4D files evaluated", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.concatenated_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::concatenated_file_count", + "description": "Number of concatenated D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.concatenated_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::concatenated_file_count", + "description": "Number of concatenated D4D files", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.individual_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::individual_file_count", + "description": "Number of individual D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.individual_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::individual_file_count", + "description": "Number of individual D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.overall_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::overall_performance", + "description": "Summary statistics of evaluation performance across all D4D files.", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.method_comparison", + "location": 
"D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::method_comparison", + "description": "Performance comparison across generation methods", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.method_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::method_comparison", + "description": "Performance comparison across generation methods", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.project_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::project_comparison", + "description": "Performance comparison across Bridge2AI projects", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.project_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::project_comparison", + "description": "Performance comparison across Bridge2AI projects", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.top_performers", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::top_performers", + "description": "List of top performing D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.top_performers", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::top_performers", + "description": "List of top performing D4D files", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.element_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::element_performance", + "description": "Performance by rubric element (rubric10 only)", 
+ "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.element_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::element_performance", + "description": "Performance by rubric element (rubric10 only)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.category_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::category_performance", + "description": "Performance by category (rubric20 only)", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.category_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::category_performance", + "description": "Performance by category (rubric20 only)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_weaknesses", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_weaknesses", + "description": "Common weaknesses across all evaluations", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_weaknesses", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_weaknesses", + "description": "Common weaknesses across all evaluations", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_strengths", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_strengths", + "description": "Common strengths across all evaluations", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_strengths", + 
"location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_strengths", + "description": "Common strengths across all evaluations", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.key_insights", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::key_insights", + "description": "Key analytical insights from evaluation", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.key_insights", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::key_insights", + "description": "Key analytical insights from evaluation", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.input_type_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::input_type_comparison", + "description": "Concatenated vs individual file performance", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.input_type_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::input_type_comparison", + "description": "Concatenated vs individual file performance", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.files_generated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::files_generated", + "description": "Output files generated by evaluation", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.files_generated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::files_generated", + "description": "Output files generated by evaluation", + "priority": "LOW" + 
}, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "OverallPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::average_score", + "description": "Average score across all files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::average_score", + "description": "Average score across all files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::average_percentage", + "description": "Average score expressed as a percentage of maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.best_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::best_score", + "description": "Highest score achieved across all evaluated D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.best_percentage", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::best_percentage", + "description": "Highest score expressed as a percentage of maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.best_performer", + 
"location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::best_performer", + "description": "Identifier of the project, method, or file type that achieved the best score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.worst_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::worst_score", + "description": "Lowest score achieved across all evaluated D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.worst_percentage", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::worst_percentage", + "description": "Lowest score expressed as a percentage of maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.worst_performer", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::worst_performer", + "description": "Identifier of the project, method, or file type that achieved the worst score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.input_type", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::input_type", + "description": "Type of input files used for generation (concatenated or individual source documents).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::file_count", + "description": "Number of D4D files evaluated using this generation method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.average_score", + "location": 
"D4D_Evaluation_Summary::classes::MethodPerformance::attributes::average_score", + "description": "Mean score achieved across all files generated with this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::average_percentage", + "description": "Mean score expressed as a percentage of maximum possible for this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.best_score", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::best_score", + "description": "Highest score achieved by any file using this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.worst_score", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::worst_score", + "description": "Lowest score achieved by any file using this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.rank", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::rank", + "description": "Ranking position of this method among all methods (1 = best performing).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::file_count", + "description": "Number 
of D4D files evaluated for this specific Bridge2AI project.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::average_score", + "description": "Mean score achieved across all D4D files for this project.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::average_percentage", + "description": "Mean score expressed as a percentage of maximum possible for this project.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.rank", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::rank", + "description": "Ranking position of this project among all projects (1 = best performing).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.rank", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::rank", + "description": "Position in the top performers list (1 = highest score).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.project", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::project", + "description": "The Bridge2AI project associated with this top-performing datasheet.", + "priority": "LOW" + }, + { + "type": 
"CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.method", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::method", + "description": "The D4D generation method used to create this top-performing file.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.input_type", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::input_type", + "description": "Type of input used for generation (concatenated or individual source documents).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.score", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::score", + "description": "Raw score achieved by this D4D file on the rubric.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.percentage", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::percentage", + "description": "Score expressed as a percentage of the maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.max_score", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.elements_passing", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::elements_passing", + "description": "Number of rubric elements that met or exceeded the passing threshold (rubric10 only).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.file_name", + "location": 
"D4D_Evaluation_Summary::classes::TopPerformer::attributes::file_name", + "description": "Filename of the top-performing D4D datasheet.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.element_id", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::element_id", + "description": "Numeric identifier of the rubric element being evaluated (1-10 for rubric10).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::max_score", + "description": "Maximum possible score for this specific rubric element.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::average_score", + "description": "Mean score achieved for this element across all evaluated files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::average_percentage", + "description": "Mean score for this element expressed as a percentage of maximum possible.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.strength_level", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::strength_level", + "description": "Qualitative assessment of performance level for this element (strongest, strong, weak, weakest).", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ElementPerformance.description", + "location": 
"D4D_Evaluation_Summary::classes::ElementPerformance::attributes::description", + "description": "Description of what this element measures", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.description", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::description", + "description": "Description of what this element measures", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.category_id", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::category_id", + "description": "Numeric identifier of the rubric category being evaluated (1-4 for rubric20).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::max_score", + "description": "Maximum possible score for all questions within this category.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::average_score", + "description": "Mean score achieved for this category across all evaluated files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::average_percentage", + "description": "Mean score for this category expressed as a percentage of maximum possible.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.rank", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::rank", + 
"description": "Ranking position of this category among all categories (1 = best performing).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.question_count", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::question_count", + "description": "Total number of evaluation questions within this category.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.weakness_type", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::weakness_type", + "description": "Classification of the weakness observed across multiple D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.description", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::description", + "description": "Detailed explanation of the weakness pattern and its manifestation.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.frequency", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::frequency", + "description": "How frequently this weakness appears across the evaluated dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.affected_element_or_question", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::affected_element_or_question", + "description": "Specific rubric element or question where this weakness commonly occurs.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.typical_score", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::typical_score", + "description": "Representative score 
typically achieved in areas affected by this weakness.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.strength_type", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::strength_type", + "description": "Classification of the strength pattern observed across multiple D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.description", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::description", + "description": "Detailed explanation of the strength pattern and its positive impact.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.frequency", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::frequency", + "description": "How frequently this strength appears across the evaluated dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.affected_element_or_question", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::affected_element_or_question", + "description": "Specific rubric element or question where this strength commonly occurs.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.typical_score", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::typical_score", + "description": "Representative score typically achieved in areas demonstrating this strength.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "KeyInsight.title", + "location": "D4D_Evaluation_Summary::classes::KeyInsight::attributes::title", + "description": "Concise summary title capturing the essence of the insight.", + "priority": 
"LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "KeyInsight.description", + "location": "D4D_Evaluation_Summary::classes::KeyInsight::attributes::description", + "description": "Comprehensive explanation of the insight and its implications.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "KeyInsight.supporting_data", + "location": "D4D_Evaluation_Summary::classes::KeyInsight::attributes::supporting_data", + "description": "Quantitative or qualitative evidence supporting this analytical insight.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypeComparison.concatenated_performance", + "location": "D4D_Evaluation_Summary::classes::InputTypeComparison::attributes::concatenated_performance", + "description": "Performance metrics for D4D files generated from concatenated source documents.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypeComparison.individual_performance", + "location": "D4D_Evaluation_Summary::classes::InputTypeComparison::attributes::individual_performance", + "description": "Performance metrics for D4D files generated from individual source documents.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypeComparison.synthesis_advantage", + "location": "D4D_Evaluation_Summary::classes::InputTypeComparison::attributes::synthesis_advantage", + "description": "Explanation of the multi-document synthesis advantage demonstrated by concatenated files.", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.input_type", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::input_type", + "description": "Type of input (concatenated or individual)", + 
"priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.input_type", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::input_type", + "description": "Type of input (concatenated or individual)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::file_count", + "description": "Number of files of this type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::file_count", + "description": "Number of files of this type", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_score", + "description": "Average score for this input type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_score", + "description": "Average score for this input type", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_percentage", + "description": "Average percentage for this input type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_percentage", + "location": 
"D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_percentage", + "description": "Average percentage for this input type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.score_range", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::score_range", + "description": "Range of scores observed for this input type (minimum to maximum values).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.best_method", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::best_method", + "description": "Generation method that achieved the highest performance for this input type.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.file_path", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::file_path", + "description": "Filesystem path to the generated evaluation output file.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.file_type", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::file_type", + "description": "File format type of the generated output (CSV, JSON, or Markdown).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.description", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::description", + "description": "Explanation of the file contents and what data it contains.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.row_count", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::row_count", + "description": "Number of data rows or 
entries in the file (applicable to CSV and JSON formats).", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "RubricTypeEnum", + "location": "D4D_Evaluation_Summary::enums::RubricTypeEnum", + "description": "Types of evaluation rubrics", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "GenerationMethodEnum", + "location": "D4D_Evaluation_Summary::enums::GenerationMethodEnum", + "description": "D4D generation methods", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "Bridge2AIProjectEnum", + "location": "D4D_Evaluation_Summary::enums::Bridge2AIProjectEnum", + "description": "Bridge2AI Grand Challenge projects", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "StrengthLevelEnum", + "location": "D4D_Evaluation_Summary::enums::StrengthLevelEnum", + "description": "Assessment of element/category strength", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "WeaknessTypeEnum", + "location": "D4D_Evaluation_Summary::enums::WeaknessTypeEnum", + "description": "Types of common weaknesses", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "StrengthTypeEnum", + "location": "D4D_Evaluation_Summary::enums::StrengthTypeEnum", + "description": "Types of common strengths", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "FrequencyEnum", + "location": "D4D_Evaluation_Summary::enums::FrequencyEnum", + "description": "Frequency of occurrence", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "InsightTypeEnum", + "location": "D4D_Evaluation_Summary::enums::InsightTypeEnum", + "description": "Types of analytical insights", + "priority": "MEDIUM" + } + ], + "stats": {}, + "quality_metrics": { + 
"module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 13, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 4, + "complete_sentences_pct": 30.76923076923077, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 91, + "with_examples": 6, + "with_examples_pct": 6.593406593406594, + "complete_sentences": 81, + "complete_sentences_pct": 89.01098901098901, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 10, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 9, + "complete_sentences_pct": 90.0, + "too_brief": 8, + "too_brief_pct": 80.0 + }, + "enum_value": { + "total": 38, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 26, + "complete_sentences_pct": 68.42105263157895, + "too_brief": 18, + "too_brief_pct": 47.368421052631575 + } + } + }, + { + "module": "D4D_FileCollection", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FileCollection.file_count", + "location": "D4D_FileCollection::classes::FileCollection::attributes::file_count", + "description": "Number of files in this collection.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FileCollection.total_bytes", + "location": "D4D_FileCollection::classes::FileCollection::attributes::total_bytes", + "description": "Total size of all files in bytes.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "FileCollectionTypeEnum.supplementary", + "location": "D4D_FileCollection::enums::FileCollectionTypeEnum::permissible_values::supplementary", + "description": "Supplementary materials", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, 
+ "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 2, + "with_examples": 1, + "with_examples_pct": 50.0, + "complete_sentences": 2, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 4, + "with_examples": 2, + "with_examples_pct": 50.0, + "complete_sentences": 4, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 2, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 2, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 19, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 14, + "complete_sentences_pct": 73.68421052631578, + "too_brief": 6, + "too_brief_pct": 31.57894736842105 + } + } + }, + { + "module": "D4D_Human", + "issues": [ + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.involves_human_subjects", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::involves_human_subjects", + "description": "Does this dataset involve human subjects research?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.involves_human_subjects", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::involves_human_subjects", + "description": "Does this dataset involve human subjects research?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.irb_approval", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::irb_approval", + "description": "Was Institutional Review Board (IRB) approval obtained? 
Include approval number and institution if applicable.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.ethics_review_board", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::ethics_review_board", + "description": "What ethics review board(s) reviewed this research? Include institution names and approval details.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.special_populations", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::special_populations", + "description": "Does the research involve any special populations that require additional protections (e.g., minors, pregnant women, prisoners)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.regulatory_compliance", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::regulatory_compliance", + "description": "What regulatory frameworks govern this human subjects research (e.g., 45 CFR 46, HIPAA)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.consent_obtained", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_obtained", + "description": "Was informed consent obtained from all participants?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.consent_obtained", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_obtained", + "description": "Was informed consent obtained from all participants?", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.consent_type", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_type", + 
"description": "What type of consent was obtained (e.g., written, verbal, electronic, implied through participation)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.consent_documentation", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_documentation", + "description": "How is consent documented? Include references to consent forms or procedures used.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.withdrawal_mechanism", + "location": "D4D_Human::classes::InformedConsent::attributes::withdrawal_mechanism", + "description": "How can participants withdraw their consent? What procedures are in place for data deletion upon withdrawal?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.withdrawal_mechanism", + "location": "D4D_Human::classes::InformedConsent::attributes::withdrawal_mechanism", + "description": "How can participants withdraw their consent? What procedures are in place for data deletion upon withdrawal?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.consent_scope", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_scope", + "description": "What specific uses did participants consent to? Are there limitations on data use based on consent?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.consent_scope", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_scope", + "description": "What specific uses did participants consent to? 
Are there limitations on data use based on consent?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.anonymization_method", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::anonymization_method", + "description": "What methods were used to anonymize or de-identify participant data? Include technical details of privacy-preserving techniques.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.reidentification_risk", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::reidentification_risk", + "description": "What is the assessed risk of re-identification? What measures were taken to minimize this risk?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.reidentification_risk", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::reidentification_risk", + "description": "What is the assessed risk of re-identification? 
What measures were taken to minimize this risk?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.privacy_techniques", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::privacy_techniques", + "description": "What privacy-preserving techniques were applied (e.g., differential privacy, k-anonymity, data masking)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.data_linkage", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::data_linkage", + "description": "Can this dataset be linked to other datasets in ways that might compromise participant privacy?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.data_linkage", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::data_linkage", + "description": "Can this dataset be linked to other datasets in ways that might compromise participant privacy?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_provided", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_provided", + "description": "Were participants compensated for their participation?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_provided", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_provided", + "description": "Were participants compensated for their participation?", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_type", + "location": 
"D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_type", + "description": "What type of compensation was provided (e.g., monetary payment, gift cards, course credit, other incentives)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_amount", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_amount", + "description": "What was the amount or value of compensation provided? Include currency or equivalent value.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_rationale", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_rationale", + "description": "What was the rationale for the compensation structure? How was the amount determined to be appropriate?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_rationale", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_rationale", + "description": "What was the rationale for the compensation structure? 
How was the amount determined to be appropriate?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "AtRiskPopulations.at_risk_groups_included", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::at_risk_groups_included", + "description": "Are any at-risk populations included (e.g., children, pregnant women, prisoners, cognitively impaired individuals)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AtRiskPopulations.special_protections", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::special_protections", + "description": "What additional protections were implemented for at-risk populations? Include safeguards, modified procedures, or additional oversight.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "AtRiskPopulations.assent_procedures", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::assent_procedures", + "description": "For research involving minors, what assent procedures were used? How was developmentally appropriate assent obtained?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AtRiskPopulations.assent_procedures", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::assent_procedures", + "description": "For research involving minors, what assent procedures were used? 
How was developmentally appropriate assent obtained?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "AtRiskPopulations.guardian_consent", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::guardian_consent", + "description": "For participants unable to provide their own consent, how was guardian or surrogate consent obtained?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AtRiskPopulations.guardian_consent", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::guardian_consent", + "description": "For participants unable to provide their own consent, how was guardian or surrogate consent obtained?\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 5, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 5, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 22, + "with_examples": 6, + "with_examples_pct": 27.27272727272727, + "complete_sentences": 22, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Maintenance", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Maintainer.maintainer_details", + "location": "D4D_Maintenance::classes::Maintainer::attributes::maintainer_details", + "description": "Details on who will support, host, or maintain the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Erratum.erratum_url", + "location": "D4D_Maintenance::classes::Erratum::attributes::erratum_url", + "description": "URL or access point for 
the erratum.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Erratum.erratum_details", + "location": "D4D_Maintenance::classes::Erratum::attributes::erratum_details", + "description": "Details on any errata or corrections to the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "UpdatePlan.update_details", + "location": "D4D_Maintenance::classes::UpdatePlan::attributes::update_details", + "description": "Details on update plans, responsible parties, and communication methods.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RetentionLimits.retention_period", + "location": "D4D_Maintenance::classes::RetentionLimits::attributes::retention_period", + "description": "Time period for data retention.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RetentionLimits.retention_details", + "location": "D4D_Maintenance::classes::RetentionLimits::attributes::retention_details", + "description": "Details on data retention limits and enforcement procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VersionAccess.latest_version_doi", + "location": "D4D_Maintenance::classes::VersionAccess::attributes::latest_version_doi", + "description": "DOI or URL of the latest dataset version.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VersionAccess.versions_available", + "location": "D4D_Maintenance::classes::VersionAccess::attributes::versions_available", + "description": "List of available versions with metadata.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VersionAccess.version_details", + "location": 
"D4D_Maintenance::classes::VersionAccess::attributes::version_details", + "description": "Details on version support policies and obsolescence communication.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExtensionMechanism.contribution_url", + "location": "D4D_Maintenance::classes::ExtensionMechanism::attributes::contribution_url", + "description": "URL for contribution guidelines or process.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExtensionMechanism.extension_details", + "location": "D4D_Maintenance::classes::ExtensionMechanism::attributes::extension_details", + "description": "Details on extension mechanisms, contribution validation, and communication.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 6, + "with_examples": 2, + "with_examples_pct": 33.33333333333333, + "complete_sentences": 6, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 13, + "with_examples": 2, + "with_examples_pct": 15.384615384615385, + "complete_sentences": 13, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Metadata", + "issues": [], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Minimal", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MinimalDatasetCollection.resources", + "location": 
"D4D_Minimal::classes::MinimalDatasetCollection::attributes::resources", + "description": "The datasets in this collection.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 2, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 2, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Motivation", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Purpose.response", + "location": "D4D_Motivation::classes::Purpose::attributes::response", + "description": "Short explanation describing the primary purpose of creating the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Task.response", + "location": "D4D_Motivation::classes::Task::attributes::response", + "description": "Short explanation describing the specific task or tasks for which this dataset was created.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AddressingGap.response", + "location": "D4D_Motivation::classes::AddressingGap::attributes::response", + "description": "Short explanation of the knowledge or resource gap that this dataset was intended to address.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Creator.principal_investigator", + "location": "D4D_Motivation::classes::Creator::attributes::principal_investigator", + "description": "A key individual (Principal Investigator) 
responsible for or overseeing dataset creation.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Creator.affiliations", + "location": "D4D_Motivation::classes::Creator::attributes::affiliations", + "description": "Organizations with which the creator or team is affiliated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FundingMechanism.grantor", + "location": "D4D_Motivation::classes::FundingMechanism::attributes::grantor", + "description": "Name/identifier of the organization providing monetary or resource support.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FundingMechanism.grants", + "location": "D4D_Motivation::classes::FundingMechanism::attributes::grants", + "description": "Grant mechanisms supporting dataset creation. Multiple grants may fund a single dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Grant.grant_number", + "location": "D4D_Motivation::classes::Grant::attributes::grant_number", + "description": "The alphanumeric identifier for the grant.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 9, + "with_examples": 1, + "with_examples_pct": 11.11111111111111, + "complete_sentences": 9, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Preprocessing", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + 
"element_type": "attribute", + "element_name": "PreprocessingStrategy.preprocessing_details", + "location": "D4D_Preprocessing::classes::PreprocessingStrategy::attributes::preprocessing_details", + "description": "Details on preprocessing steps applied to the data.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CleaningStrategy.cleaning_details", + "location": "D4D_Preprocessing::classes::CleaningStrategy::attributes::cleaning_details", + "description": "Details on data cleaning procedures applied.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LabelingStrategy.data_annotation_protocol", + "location": "D4D_Preprocessing::classes::LabelingStrategy::attributes::data_annotation_protocol", + "description": "Annotation methodology, tasks, and protocols followed during labeling. Includes annotation guidelines, quality control procedures, and task definitions.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LabelingStrategy.annotations_per_item", + "location": "D4D_Preprocessing::classes::LabelingStrategy::attributes::annotations_per_item", + "description": "Number of annotations collected per data item. 
Multiple annotations per item enable calculation of inter-annotator agreement.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LabelingStrategy.labeling_details", + "location": "D4D_Preprocessing::classes::LabelingStrategy::attributes::labeling_details", + "description": "Details on labeling/annotation procedures and quality metrics.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawData.access_url", + "location": "D4D_Preprocessing::classes::RawData::attributes::access_url", + "description": "URL or access point for the raw data.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawData.raw_data_details", + "location": "D4D_Preprocessing::classes::RawData::attributes::raw_data_details", + "description": "Details on raw data availability and access procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputation_method", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputation_method", + "description": "Specific imputation technique used (mean, median, mode, forward fill, backward fill, interpolation, model-based imputation, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputed_fields", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputed_fields", + "description": "Fields or columns where imputation was applied.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputation_rationale", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputation_rationale", + "description": "Justification for the imputation approach chosen, including 
assumptions made about missing data mechanisms.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputation_validation", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputation_validation", + "description": "Methods used to validate imputation quality (if any).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AnnotationAnalysis.agreement_metric", + "location": "D4D_Preprocessing::classes::AnnotationAnalysis::attributes::agreement_metric", + "description": "Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percentage agreement, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AnnotationAnalysis.analysis_method", + "location": "D4D_Preprocessing::classes::AnnotationAnalysis::attributes::analysis_method", + "description": "Methodology used to assess annotation quality and resolve disagreements.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AnnotationAnalysis.annotation_quality_details", + "location": "D4D_Preprocessing::classes::AnnotationAnalysis::attributes::annotation_quality_details", + "description": "Additional details on annotation quality assessment and findings.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MachineAnnotationTools.tool_descriptions", + "location": "D4D_Preprocessing::classes::MachineAnnotationTools::attributes::tool_descriptions", + "description": "Descriptions of what each tool does in the annotation process and what types of annotations it produces. 
Should correspond to the tools list.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 3, + "with_examples_pct": 42.857142857142854, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 22, + "with_examples": 7, + "with_examples_pct": 31.818181818181817, + "complete_sentences": 22, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Uses", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExistingUse.examples", + "location": "D4D_Uses::classes::ExistingUse::attributes::examples", + "description": "List of examples of known/previous uses of the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "UseRepository.repository_url", + "location": "D4D_Uses::classes::UseRepository::attributes::repository_url", + "description": "URL to a repository of known dataset uses.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "UseRepository.repository_details", + "location": "D4D_Uses::classes::UseRepository::attributes::repository_details", + "description": "Details on the repository of known dataset uses.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OtherTask.task_details", + "location": "D4D_Uses::classes::OtherTask::attributes::task_details", + "description": "Details on other potential tasks the dataset could be used for.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": 
"FutureUseImpact.impact_details", + "location": "D4D_Uses::classes::FutureUseImpact::attributes::impact_details", + "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DiscouragedUse.discouragement_details", + "location": "D4D_Uses::classes::DiscouragedUse::attributes::discouragement_details", + "description": "Details on tasks for which the dataset should not be used.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "IntendedUse.examples", + "location": "D4D_Uses::classes::IntendedUse::attributes::examples", + "description": "List of example intended uses for this dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "IntendedUse.usage_notes", + "location": "D4D_Uses::classes::IntendedUse::attributes::usage_notes", + "description": "Notes or caveats about using the dataset for intended purposes.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 10, + "with_examples": 2, + "with_examples_pct": 20.0, + "complete_sentences": 10, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Variables", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.variable_name", + "location": "D4D_Variables::classes::VariableMetadata::attributes::variable_name", + "description": 
"The name or identifier of the variable as it appears in the data files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.unit", + "location": "D4D_Variables::classes::VariableMetadata::attributes::unit", + "description": "The unit of measurement for the variable, preferably using QUDT units (http://qudt.org/vocab/unit/). Examples: qudt:Kilogram, qudt:Meter, qudt:DegreeCelsius.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.missing_value_code", + "location": "D4D_Variables::classes::VariableMetadata::attributes::missing_value_code", + "description": "Code(s) used to represent missing values for this variable. Examples: \"NA\", \"-999\", \"null\", \"\". Multiple codes may be specified.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.minimum_value", + "location": "D4D_Variables::classes::VariableMetadata::attributes::minimum_value", + "description": "The minimum value that the variable can take. Applicable to numeric variables.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.maximum_value", + "location": "D4D_Variables::classes::VariableMetadata::attributes::maximum_value", + "description": "The maximum value that the variable can take. Applicable to numeric variables.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.categories", + "location": "D4D_Variables::classes::VariableMetadata::attributes::categories", + "description": "The permitted categories or values for a categorical variable. 
Each entry should describe a possible value and its meaning.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.examples", + "location": "D4D_Variables::classes::VariableMetadata::attributes::examples", + "description": "Example values for this variable to illustrate typical data.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.is_identifier", + "location": "D4D_Variables::classes::VariableMetadata::attributes::is_identifier", + "description": "Indicates whether this variable serves as a unique identifier or key for records in the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.precision", + "location": "D4D_Variables::classes::VariableMetadata::attributes::precision", + "description": "The precision or number of decimal places for numeric variables.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.measurement_technique", + "location": "D4D_Variables::classes::VariableMetadata::attributes::measurement_technique", + "description": "The technique or method used to measure this variable. 
Examples: \"mass spectrometry\", \"self-report survey\", \"GPS coordinates\".", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.derivation", + "location": "D4D_Variables::classes::VariableMetadata::attributes::derivation", + "description": "Description of how this variable was derived or calculated from other variables, if applicable.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.quality_notes", + "location": "D4D_Variables::classes::VariableMetadata::attributes::quality_notes", + "description": "Notes about data quality, reliability, or known issues specific to this variable.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "VariableTypeEnum.integer", + "location": "D4D_Variables::enums::VariableTypeEnum::permissible_values::integer", + "description": "Whole numbers.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "VariableTypeEnum.string", + "location": "D4D_Variables::enums::VariableTypeEnum::permissible_values::string", + "description": "Text strings.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "VariableTypeEnum.boolean", + "location": "D4D_Variables::enums::VariableTypeEnum::permissible_values::boolean", + "description": "True/false values.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 14, + "with_examples": 2, + 
"with_examples_pct": 14.285714285714285, + "complete_sentences": 14, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 0, + "complete_sentences_pct": 0.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 13, + "with_examples": 1, + "with_examples_pct": 7.6923076923076925, + "complete_sentences": 13, + "complete_sentences_pct": 100.0, + "too_brief": 8, + "too_brief_pct": 61.53846153846154 + } + } + } + ] +} \ No newline at end of file diff --git a/scripts/quality_before.json b/scripts/quality_before.json new file mode 100644 index 00000000..1d1a9525 --- /dev/null +++ b/scripts/quality_before.json @@ -0,0 +1,3130 @@ +{ + "summary": { + "total_issues": 307, + "by_type": { + "CONSIDER_EXAMPLE": 239, + "MISSING_PERIOD": 50, + "TOO_BRIEF": 18 + }, + "by_priority": { + "LOW": 299, + "MEDIUM": 8 + } + }, + "modules": [ + { + "module": "D4D_Base_import", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "NamedThing.id", + "location": "D4D_Base_import::classes::NamedThing::attributes::id", + "description": "A unique identifier for a thing.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "NamedThing.name", + "location": "D4D_Base_import::classes::NamedThing::attributes::name", + "description": "A human-readable name for a thing.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "NamedThing.description", + "location": "D4D_Base_import::classes::NamedThing::attributes::description", + "description": "A human-readable description for a thing.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetProperty.id", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::id", + 
"description": "An optional identifier for this property.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetProperty.name", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::name", + "description": "A human-readable name for this property.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetProperty.description", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::description", + "description": "A human-readable description for this property.", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "DatasetProperty.used_software", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::used_software", + "description": "What software was used as part of this dataset property?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetProperty.used_software", + "location": "D4D_Base_import::classes::DatasetProperty::attributes::used_software", + "description": "What software was used as part of this dataset property?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Person.affiliation", + "location": "D4D_Base_import::classes::Person::attributes::affiliation", + "description": "The organization(s) to which the person belongs in the context of this dataset. May vary across datasets; multivalued to support multiple affiliations.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Person.email", + "location": "D4D_Base_import::classes::Person::attributes::email", + "description": "The email address of the person. 
Represents current/preferred contact information in the context of this dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Person.orcid", + "location": "D4D_Base_import::classes::Person::attributes::orcid", + "description": "ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FormatDialect.double_quote", + "location": "D4D_Base_import::classes::FormatDialect::attributes::double_quote", + "description": "Whether quotes within quoted fields are escaped by doubling them.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FormatDialect.header", + "location": "D4D_Base_import::classes::FormatDialect::attributes::header", + "description": "Whether the first row contains column headers (true/false).", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-11", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-11", + "description": "Latin/Thai encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-5", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-5", + "description": "Latin/Cyrillic encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-6", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-6", + "description": "Latin/Arabic encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-7", + 
"location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-7", + "description": "Latin/Greek encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-8", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-8", + "description": "Latin/Hebrew encoding", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "EncodingEnum.ISO-8859-9", + "location": "D4D_Base_import::enums::EncodingEnum::permissible_values::ISO-8859-9", + "description": "Latin-5 (Turkish)", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 5, + "complete_sentences_pct": 71.42857142857143, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 18, + "with_examples": 6, + "with_examples_pct": 33.33333333333333, + "complete_sentences": 18, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "slot": { + "total": 31, + "with_examples": 5, + "with_examples_pct": 16.129032258064516, + "complete_sentences": 31, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 10, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 7, + "complete_sentences_pct": 70.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 134, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 66, + "complete_sentences_pct": 49.25373134328358, + "too_brief": 24, + "too_brief_pct": 17.91044776119403 + } + } + }, + { + "module": "D4D_Collection", + "issues": [ + { + "type": "MISSING_PERIOD", + 
"element_type": "attribute", + "element_name": "InstanceAcquisition.was_directly_observed", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_directly_observed", + "description": "Whether the data was directly observed", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_directly_observed", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_directly_observed", + "description": "Whether the data was directly observed", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_reported_by_subjects", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_reported_by_subjects", + "description": "Whether the data was reported directly by the subjects themselves", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_reported_by_subjects", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_reported_by_subjects", + "description": "Whether the data was reported directly by the subjects themselves", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_inferred_derived", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_inferred_derived", + "description": "Whether the data was inferred or derived from other data", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_inferred_derived", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_inferred_derived", + "description": "Whether the data was inferred or derived from other data", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": 
"InstanceAcquisition.was_validated_verified", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_validated_verified", + "description": "Whether the data was validated or verified in any way", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.was_validated_verified", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::was_validated_verified", + "description": "Whether the data was validated or verified in any way", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InstanceAcquisition.acquisition_details", + "location": "D4D_Collection::classes::InstanceAcquisition::attributes::acquisition_details", + "description": "Details on how data was acquired for each instance.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionMechanism.mechanism_details", + "location": "D4D_Collection::classes::CollectionMechanism::attributes::mechanism_details", + "description": "Details on mechanisms or procedures used to collect the data.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "DataCollector.role", + "location": "D4D_Collection::classes::DataCollector::attributes::role", + "description": "Role of the data collector (e.g., researcher, crowdworker)", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DataCollector.collector_details", + "location": "D4D_Collection::classes::DataCollector::attributes::collector_details", + "description": "Details on who collected the data and their compensation.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "CollectionTimeframe.start_date", + "location": 
"D4D_Collection::classes::CollectionTimeframe::attributes::start_date", + "description": "Start date of data collection", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionTimeframe.start_date", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::start_date", + "description": "Start date of data collection", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "CollectionTimeframe.end_date", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::end_date", + "description": "End date of data collection", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionTimeframe.end_date", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::end_date", + "description": "End date of data collection", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionTimeframe.timeframe_details", + "location": "D4D_Collection::classes::CollectionTimeframe::attributes::timeframe_details", + "description": "Details on the collection timeframe and relationship to data creation dates.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "DirectCollection.is_direct", + "location": "D4D_Collection::classes::DirectCollection::attributes::is_direct", + "description": "Whether collection was direct from individuals", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DirectCollection.is_direct", + "location": "D4D_Collection::classes::DirectCollection::attributes::is_direct", + "description": "Whether collection was direct from individuals", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": 
"DirectCollection.collection_details", + "location": "D4D_Collection::classes::DirectCollection::attributes::collection_details", + "description": "Details on direct vs. indirect collection methods and sources.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawDataSource.source_type", + "location": "D4D_Collection::classes::RawDataSource::attributes::source_type", + "description": "Type of raw source (sensor, database, user input, web scraping, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawDataSource.access_details", + "location": "D4D_Collection::classes::RawDataSource::attributes::access_details", + "description": "Information on how to access or retrieve the raw source data.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawDataSource.raw_data_format", + "location": "D4D_Collection::classes::RawDataSource::attributes::raw_data_format", + "description": "Format of the raw data before any preprocessing.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 3, + "with_examples_pct": 42.857142857142854, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 20, + "with_examples": 5, + "with_examples_pct": 25.0, + "complete_sentences": 14, + "complete_sentences_pct": 70.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Composition", + "issues": [ + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "Instance.counts", + "location": 
"D4D_Composition::classes::Instance::attributes::counts", + "description": "How many instances are there in total (of each type, if appropriate)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.counts", + "location": "D4D_Composition::classes::Instance::attributes::counts", + "description": "How many instances are there in total (of each type, if appropriate)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "Instance.label", + "location": "D4D_Composition::classes::Instance::attributes::label", + "description": "Is there a label or target associated with each instance?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.label", + "location": "D4D_Composition::classes::Instance::attributes::label", + "description": "Is there a label or target associated with each instance?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "Instance.label_description", + "location": "D4D_Composition::classes::Instance::attributes::label_description", + "description": "If labeled, what pattern or format do labels follow?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.label_description", + "location": "D4D_Composition::classes::Instance::attributes::label_description", + "description": "If labeled, what pattern or format do labels follow?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Instance.sampling_strategies", + "location": "D4D_Composition::classes::Instance::attributes::sampling_strategies", + "description": "References to one or more SamplingStrategy objects.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": 
"Instance.missing_information", + "location": "D4D_Composition::classes::Instance::attributes::missing_information", + "description": "References to one or more MissingInfo objects describing missing data.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.is_sample", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::is_sample", + "description": "Indicates whether it is a sample of a larger set.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.is_random", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::is_random", + "description": "Indicates whether the sample is random.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.source_data", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::source_data", + "description": "Description of the larger set from which the sample was drawn, if any.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.is_representative", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::is_representative", + "description": "Indicates whether the sample is representative of the larger set.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.representative_verification", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::representative_verification", + "description": "Explanation of how representativeness was validated or verified.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.why_not_representative", + "location": 
"D4D_Composition::classes::SamplingStrategy::attributes::why_not_representative", + "description": "Explanation of why the sample is not representative, if applicable.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SamplingStrategy.strategies", + "location": "D4D_Composition::classes::SamplingStrategy::attributes::strategies", + "description": "Description of the sampling strategy (deterministic, probabilistic, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MissingInfo.missing", + "location": "D4D_Composition::classes::MissingInfo::attributes::missing", + "description": "Description of the missing data fields or elements.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MissingInfo.why_missing", + "location": "D4D_Composition::classes::MissingInfo::attributes::why_missing", + "description": "Explanation of why each piece of data is missing.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Splits.split_details", + "location": "D4D_Composition::classes::Splits::attributes::split_details", + "description": "Details on recommended data splits and their rationale.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DataAnomaly.anomaly_details", + "location": "D4D_Composition::classes::DataAnomaly::attributes::anomaly_details", + "description": "Details on errors, noise sources, or redundancies in the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.bias_type", + "location": "D4D_Composition::classes::DatasetBias::attributes::bias_type", + "description": "The type of bias identified, using standardized categories from the Artificial Intelligence Ontology (AIO).\n", 
+ "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.bias_description", + "location": "D4D_Composition::classes::DatasetBias::attributes::bias_description", + "description": "Detailed description of how this bias manifests in the dataset, including affected populations, features, or outcomes.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.mitigation_strategy", + "location": "D4D_Composition::classes::DatasetBias::attributes::mitigation_strategy", + "description": "Steps taken or recommended to mitigate this bias.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetBias.affected_subsets", + "location": "D4D_Composition::classes::DatasetBias::attributes::affected_subsets", + "description": "Specific subsets or features of the dataset affected by this bias.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetLimitation.limitation_description", + "location": "D4D_Composition::classes::DatasetLimitation::attributes::limitation_description", + "description": "Detailed description of the limitation and its implications.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetLimitation.scope_impact", + "location": "D4D_Composition::classes::DatasetLimitation::attributes::scope_impact", + "description": "How this limitation affects the scope or applicability of the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetLimitation.recommended_mitigation", + "location": "D4D_Composition::classes::DatasetLimitation::attributes::recommended_mitigation", + "description": "Recommended approaches for users to address this limitation.\n", + "priority": "LOW" + }, + { + 
"type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExternalResource.future_guarantees", + "location": "D4D_Composition::classes::ExternalResource::attributes::future_guarantees", + "description": "Explanation of any commitments that external resources will remain available and stable over time.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExternalResource.archival", + "location": "D4D_Composition::classes::ExternalResource::attributes::archival", + "description": "Indication whether official archival versions of external resources are included.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExternalResource.restrictions", + "location": "D4D_Composition::classes::ExternalResource::attributes::restrictions", + "description": "Description of any restrictions or fees associated with external resources.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Confidentiality.confidential_elements_present", + "location": "D4D_Composition::classes::Confidentiality::attributes::confidential_elements_present", + "description": "Indicates whether any confidential data elements are present.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Confidentiality.confidentiality_details", + "location": "D4D_Composition::classes::Confidentiality::attributes::confidentiality_details", + "description": "Details on confidential data elements and handling procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ContentWarning.content_warnings_present", + "location": "D4D_Composition::classes::ContentWarning::attributes::content_warnings_present", + "description": "Indicates whether any content warnings are needed.", + "priority": "LOW" + }, + { + 
"type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Subpopulation.subpopulation_elements_present", + "location": "D4D_Composition::classes::Subpopulation::attributes::subpopulation_elements_present", + "description": "Indicates whether any subpopulations are explicitly identified.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Subpopulation.distribution", + "location": "D4D_Composition::classes::Subpopulation::attributes::distribution", + "description": "The distribution of instances across identified subpopulations, including counts, percentages, or proportions for each subgroup.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Deidentification.identifiable_elements_present", + "location": "D4D_Composition::classes::Deidentification::attributes::identifiable_elements_present", + "description": "Indicates whether data subjects can be identified.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Deidentification.identifiers_removed", + "location": "D4D_Composition::classes::Deidentification::attributes::identifiers_removed", + "description": "List of identifier types removed during de-identification.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Deidentification.deidentification_details", + "location": "D4D_Composition::classes::Deidentification::attributes::deidentification_details", + "description": "Details on de-identification procedures and residual risks.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SensitiveElement.sensitive_elements_present", + "location": "D4D_Composition::classes::SensitiveElement::attributes::sensitive_elements_present", + "description": "Indicates whether sensitive data elements are present.", + 
"priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "SensitiveElement.sensitivity_details", + "location": "D4D_Composition::classes::SensitiveElement::attributes::sensitivity_details", + "description": "Details on sensitive data elements present and handling procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetRelationship.target_dataset", + "location": "D4D_Composition::classes::DatasetRelationship::attributes::target_dataset", + "description": "The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DatasetRelationship.description", + "location": "D4D_Composition::classes::DatasetRelationship::attributes::description", + "description": "Free-text description providing additional context about the relationship.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 15, + "with_examples": 9, + "with_examples_pct": 60.0, + "complete_sentences": 14, + "complete_sentences_pct": 93.33333333333333, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 47, + "with_examples": 9, + "with_examples_pct": 19.148936170212767, + "complete_sentences": 47, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 0, + "complete_sentences_pct": 0.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 14, + "with_examples": 3, + "with_examples_pct": 21.428571428571427, + "complete_sentences": 14, + 
"complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Data_Governance", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LicenseAndUseTerms.license_terms", + "location": "D4D_Data_Governance::classes::LicenseAndUseTerms::attributes::license_terms", + "description": "Description of the dataset's license and terms of use (including links, costs, or usage constraints).\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "LicenseAndUseTerms.data_use_permission", + "location": "D4D_Data_Governance::classes::LicenseAndUseTerms::attributes::data_use_permission", + "description": "Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LicenseAndUseTerms.contact_person", + "location": "D4D_Data_Governance::classes::LicenseAndUseTerms::attributes::contact_person", + "description": "Contact person for licensing questions. Provides structured contact information including name, email, affiliation, and optional ORCID. 
This person can answer questions about licensing terms, usage restrictions, fees, and permissions.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "IPRestrictions.restrictions", + "location": "D4D_Data_Governance::classes::IPRestrictions::attributes::restrictions", + "description": "Explanation of third-party IP restrictions.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.regulatory_restrictions", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::regulatory_restrictions", + "description": "Export or regulatory restrictions on the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.hipaa_compliant", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::hipaa_compliant", + "description": "Indicates compliance with the Health Insurance Portability and Accountability Act (HIPAA). HIPAA applies to protected health information in the United States.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.confidentiality_level", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::confidentiality_level", + "description": "Confidentiality classification of the dataset indicating level of access restrictions and sensitivity.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExportControlRegulatoryRestrictions.governance_committee_contact", + "location": "D4D_Data_Governance::classes::ExportControlRegulatoryRestrictions::attributes::governance_committee_contact", + "description": "Contact person for data governance committee. 
This person can answer questions about data governance policies, access procedures, and oversight mechanisms.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 3, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 3, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 9, + "with_examples": 2, + "with_examples_pct": 22.22222222222222, + "complete_sentences": 8, + "complete_sentences_pct": 88.88888888888889, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 3, + "with_examples": 1, + "with_examples_pct": 33.33333333333333, + "complete_sentences": 2, + "complete_sentences_pct": 66.66666666666666, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 30, + "with_examples": 1, + "with_examples_pct": 3.3333333333333335, + "complete_sentences": 10, + "complete_sentences_pct": 33.33333333333333, + "too_brief": 1, + "too_brief_pct": 3.3333333333333335 + } + } + }, + { + "module": "D4D_Distribution", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ThirdPartySharing.is_shared", + "location": "D4D_Distribution::classes::ThirdPartySharing::attributes::is_shared", + "description": "Boolean indicating whether the dataset is distributed to parties external to the dataset-creating entity.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DistributionFormat.access_urls", + "location": "D4D_Distribution::classes::DistributionFormat::attributes::access_urls", + "description": "Details of the distribution channel(s) or format(s).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + 
"element_name": "DistributionDate.release_dates", + "location": "D4D_Distribution::classes::DistributionDate::attributes::release_dates", + "description": "Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled releases.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 3, + "with_examples": 2, + "with_examples_pct": 66.66666666666666, + "complete_sentences": 3, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 3, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 3, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Ethics", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EthicalReview.contact_person", + "location": "D4D_Ethics::classes::EthicalReview::attributes::contact_person", + "description": "Contact person for questions about ethical review. 
Provides structured contact information including name, email, affiliation, and optional ORCID.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EthicalReview.review_details", + "location": "D4D_Ethics::classes::EthicalReview::attributes::review_details", + "description": "Details on ethical review processes, outcomes, and supporting documentation.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DataProtectionImpact.impact_details", + "location": "D4D_Ethics::classes::DataProtectionImpact::attributes::impact_details", + "description": "Details on data protection impact analysis, outcomes, and documentation.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionNotification.notification_details", + "location": "D4D_Ethics::classes::CollectionNotification::attributes::notification_details", + "description": "Details on how individuals were notified about data collection.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CollectionConsent.consent_details", + "location": "D4D_Ethics::classes::CollectionConsent::attributes::consent_details", + "description": "Details on how consent was requested, provided, and documented.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ConsentRevocation.revocation_details", + "location": "D4D_Ethics::classes::ConsentRevocation::attributes::revocation_details", + "description": "Details on consent revocation mechanisms and procedures.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 5, + 
"with_examples": 2, + "with_examples_pct": 40.0, + "complete_sentences": 5, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Evaluation_Summary", + "issues": [ + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_type", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_type", + "description": "Type of rubric used (rubric10 or rubric20)", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_type", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_type", + "description": "Type of rubric used (rubric10 or rubric20)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_description", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_description", + "description": "Description of rubric structure and scoring", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.rubric_description", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::rubric_description", + "description": "Description of rubric structure and scoring", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.total_files_evaluated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::total_files_evaluated", + "description": "Total number of D4D files evaluated", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": 
"attribute", + "element_name": "EvaluationSummary.total_files_evaluated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::total_files_evaluated", + "description": "Total number of D4D files evaluated", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.concatenated_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::concatenated_file_count", + "description": "Number of concatenated D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.concatenated_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::concatenated_file_count", + "description": "Number of concatenated D4D files", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.individual_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::individual_file_count", + "description": "Number of individual D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.individual_file_count", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::individual_file_count", + "description": "Number of individual D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.overall_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::overall_performance", + "description": "Summary statistics of evaluation performance across all D4D files.", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.method_comparison", + "location": 
"D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::method_comparison", + "description": "Performance comparison across generation methods", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.method_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::method_comparison", + "description": "Performance comparison across generation methods", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.project_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::project_comparison", + "description": "Performance comparison across Bridge2AI projects", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.project_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::project_comparison", + "description": "Performance comparison across Bridge2AI projects", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.top_performers", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::top_performers", + "description": "List of top performing D4D files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.top_performers", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::top_performers", + "description": "List of top performing D4D files", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.element_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::element_performance", + "description": "Performance by rubric element (rubric10 only)", 
+ "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.element_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::element_performance", + "description": "Performance by rubric element (rubric10 only)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.category_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::category_performance", + "description": "Performance by category (rubric20 only)", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.category_performance", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::category_performance", + "description": "Performance by category (rubric20 only)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_weaknesses", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_weaknesses", + "description": "Common weaknesses across all evaluations", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_weaknesses", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_weaknesses", + "description": "Common weaknesses across all evaluations", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_strengths", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_strengths", + "description": "Common strengths across all evaluations", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.common_strengths", + 
"location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::common_strengths", + "description": "Common strengths across all evaluations", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.key_insights", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::key_insights", + "description": "Key analytical insights from evaluation", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.key_insights", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::key_insights", + "description": "Key analytical insights from evaluation", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.input_type_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::input_type_comparison", + "description": "Concatenated vs individual file performance", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.input_type_comparison", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::input_type_comparison", + "description": "Concatenated vs individual file performance", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "EvaluationSummary.files_generated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::files_generated", + "description": "Output files generated by evaluation", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "EvaluationSummary.files_generated", + "location": "D4D_Evaluation_Summary::classes::EvaluationSummary::attributes::files_generated", + "description": "Output files generated by evaluation", + "priority": "LOW" + 
}, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "OverallPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::average_score", + "description": "Average score across all files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::average_score", + "description": "Average score across all files", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::average_percentage", + "description": "Average score expressed as a percentage of maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.best_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::best_score", + "description": "Highest score achieved across all evaluated D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.best_percentage", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::best_percentage", + "description": "Highest score expressed as a percentage of maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.best_performer", + 
"location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::best_performer", + "description": "Identifier of the project, method, or file type that achieved the best score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.worst_score", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::worst_score", + "description": "Lowest score achieved across all evaluated D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.worst_percentage", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::worst_percentage", + "description": "Lowest score expressed as a percentage of maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OverallPerformance.worst_performer", + "location": "D4D_Evaluation_Summary::classes::OverallPerformance::attributes::worst_performer", + "description": "Identifier of the project, method, or file type that achieved the worst score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.input_type", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::input_type", + "description": "Type of input files used for generation (concatenated or individual source documents).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::file_count", + "description": "Number of D4D files evaluated using this generation method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.average_score", + "location": 
"D4D_Evaluation_Summary::classes::MethodPerformance::attributes::average_score", + "description": "Mean score achieved across all files generated with this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::average_percentage", + "description": "Mean score expressed as a percentage of maximum possible for this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.best_score", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::best_score", + "description": "Highest score achieved by any file using this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.worst_score", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::worst_score", + "description": "Lowest score achieved by any file using this method.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MethodPerformance.rank", + "location": "D4D_Evaluation_Summary::classes::MethodPerformance::attributes::rank", + "description": "Ranking position of this method among all methods (1 = best performing).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::file_count", + "description": "Number 
of D4D files evaluated for this specific Bridge2AI project.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::average_score", + "description": "Mean score achieved across all D4D files for this project.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::average_percentage", + "description": "Mean score expressed as a percentage of maximum possible for this project.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ProjectPerformance.rank", + "location": "D4D_Evaluation_Summary::classes::ProjectPerformance::attributes::rank", + "description": "Ranking position of this project among all projects (1 = best performing).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.rank", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::rank", + "description": "Position in the top performers list (1 = highest score).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.project", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::project", + "description": "The Bridge2AI project associated with this top-performing datasheet.", + "priority": "LOW" + }, + { + "type": 
"CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.method", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::method", + "description": "The D4D generation method used to create this top-performing file.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.input_type", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::input_type", + "description": "Type of input used for generation (concatenated or individual source documents).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.score", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::score", + "description": "Raw score achieved by this D4D file on the rubric.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.percentage", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::percentage", + "description": "Score expressed as a percentage of the maximum possible score.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.max_score", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::max_score", + "description": "Maximum possible score for the rubric being evaluated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.elements_passing", + "location": "D4D_Evaluation_Summary::classes::TopPerformer::attributes::elements_passing", + "description": "Number of rubric elements that met or exceeded the passing threshold (rubric10 only).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "TopPerformer.file_name", + "location": 
"D4D_Evaluation_Summary::classes::TopPerformer::attributes::file_name", + "description": "Filename of the top-performing D4D datasheet.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.element_id", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::element_id", + "description": "Numeric identifier of the rubric element being evaluated (1-10 for rubric10).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::max_score", + "description": "Maximum possible score for this specific rubric element.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::average_score", + "description": "Mean score achieved for this element across all evaluated files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::average_percentage", + "description": "Mean score for this element expressed as a percentage of maximum possible.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.strength_level", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::strength_level", + "description": "Qualitative assessment of performance level for this element (strongest, strong, weak, weakest).", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ElementPerformance.description", + "location": 
"D4D_Evaluation_Summary::classes::ElementPerformance::attributes::description", + "description": "Description of what this element measures", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ElementPerformance.description", + "location": "D4D_Evaluation_Summary::classes::ElementPerformance::attributes::description", + "description": "Description of what this element measures", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.category_id", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::category_id", + "description": "Numeric identifier of the rubric category being evaluated (1-4 for rubric20).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.max_score", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::max_score", + "description": "Maximum possible score for all questions within this category.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::average_score", + "description": "Mean score achieved for this category across all evaluated files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::average_percentage", + "description": "Mean score for this category expressed as a percentage of maximum possible.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.rank", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::rank", + 
"description": "Ranking position of this category among all categories (1 = best performing).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CategoryPerformance.question_count", + "location": "D4D_Evaluation_Summary::classes::CategoryPerformance::attributes::question_count", + "description": "Total number of evaluation questions within this category.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.weakness_type", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::weakness_type", + "description": "Classification of the weakness observed across multiple D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.description", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::description", + "description": "Detailed explanation of the weakness pattern and its manifestation.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.frequency", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::frequency", + "description": "How frequently this weakness appears across the evaluated dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.affected_element_or_question", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::affected_element_or_question", + "description": "Specific rubric element or question where this weakness commonly occurs.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonWeakness.typical_score", + "location": "D4D_Evaluation_Summary::classes::CommonWeakness::attributes::typical_score", + "description": "Representative score 
typically achieved in areas affected by this weakness.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.strength_type", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::strength_type", + "description": "Classification of the strength pattern observed across multiple D4D files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.description", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::description", + "description": "Detailed explanation of the strength pattern and its positive impact.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.frequency", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::frequency", + "description": "How frequently this strength appears across the evaluated dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.affected_element_or_question", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::affected_element_or_question", + "description": "Specific rubric element or question where this strength commonly occurs.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CommonStrength.typical_score", + "location": "D4D_Evaluation_Summary::classes::CommonStrength::attributes::typical_score", + "description": "Representative score typically achieved in areas demonstrating this strength.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "KeyInsight.title", + "location": "D4D_Evaluation_Summary::classes::KeyInsight::attributes::title", + "description": "Concise summary title capturing the essence of the insight.", + "priority": 
"LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "KeyInsight.description", + "location": "D4D_Evaluation_Summary::classes::KeyInsight::attributes::description", + "description": "Comprehensive explanation of the insight and its implications.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "KeyInsight.supporting_data", + "location": "D4D_Evaluation_Summary::classes::KeyInsight::attributes::supporting_data", + "description": "Quantitative or qualitative evidence supporting this analytical insight.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypeComparison.concatenated_performance", + "location": "D4D_Evaluation_Summary::classes::InputTypeComparison::attributes::concatenated_performance", + "description": "Performance metrics for D4D files generated from concatenated source documents.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypeComparison.individual_performance", + "location": "D4D_Evaluation_Summary::classes::InputTypeComparison::attributes::individual_performance", + "description": "Performance metrics for D4D files generated from individual source documents.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypeComparison.synthesis_advantage", + "location": "D4D_Evaluation_Summary::classes::InputTypeComparison::attributes::synthesis_advantage", + "description": "Explanation of the multi-document synthesis advantage demonstrated by concatenated files.", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.input_type", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::input_type", + "description": "Type of input (concatenated or individual)", + 
"priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.input_type", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::input_type", + "description": "Type of input (concatenated or individual)", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::file_count", + "description": "Number of files of this type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.file_count", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::file_count", + "description": "Number of files of this type", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_score", + "description": "Average score for this input type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_score", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_score", + "description": "Average score for this input type", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_percentage", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_percentage", + "description": "Average percentage for this input type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.average_percentage", + "location": 
"D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::average_percentage", + "description": "Average percentage for this input type", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.score_range", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::score_range", + "description": "Range of scores observed for this input type (minimum to maximum values).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InputTypePerformance.best_method", + "location": "D4D_Evaluation_Summary::classes::InputTypePerformance::attributes::best_method", + "description": "Generation method that achieved the highest performance for this input type.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.file_path", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::file_path", + "description": "Filesystem path to the generated evaluation output file.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.file_type", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::file_type", + "description": "File format type of the generated output (CSV, JSON, or Markdown).", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.description", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::description", + "description": "Explanation of the file contents and what data it contains.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "GeneratedFile.row_count", + "location": "D4D_Evaluation_Summary::classes::GeneratedFile::attributes::row_count", + "description": "Number of data rows or 
entries in the file (applicable to CSV and JSON formats).", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "RubricTypeEnum", + "location": "D4D_Evaluation_Summary::enums::RubricTypeEnum", + "description": "Types of evaluation rubrics", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "GenerationMethodEnum", + "location": "D4D_Evaluation_Summary::enums::GenerationMethodEnum", + "description": "D4D generation methods", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "Bridge2AIProjectEnum", + "location": "D4D_Evaluation_Summary::enums::Bridge2AIProjectEnum", + "description": "Bridge2AI Grand Challenge projects", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "StrengthLevelEnum", + "location": "D4D_Evaluation_Summary::enums::StrengthLevelEnum", + "description": "Assessment of element/category strength", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "WeaknessTypeEnum", + "location": "D4D_Evaluation_Summary::enums::WeaknessTypeEnum", + "description": "Types of common weaknesses", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "StrengthTypeEnum", + "location": "D4D_Evaluation_Summary::enums::StrengthTypeEnum", + "description": "Types of common strengths", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "FrequencyEnum", + "location": "D4D_Evaluation_Summary::enums::FrequencyEnum", + "description": "Frequency of occurrence", + "priority": "MEDIUM" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum", + "element_name": "InsightTypeEnum", + "location": "D4D_Evaluation_Summary::enums::InsightTypeEnum", + "description": "Types of analytical insights", + "priority": "MEDIUM" + } + ], + "stats": {}, + "quality_metrics": { + 
"module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 13, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 4, + "complete_sentences_pct": 30.76923076923077, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 91, + "with_examples": 6, + "with_examples_pct": 6.593406593406594, + "complete_sentences": 81, + "complete_sentences_pct": 89.01098901098901, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 10, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 9, + "complete_sentences_pct": 90.0, + "too_brief": 8, + "too_brief_pct": 80.0 + }, + "enum_value": { + "total": 38, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 26, + "complete_sentences_pct": 68.42105263157895, + "too_brief": 18, + "too_brief_pct": 47.368421052631575 + } + } + }, + { + "module": "D4D_FileCollection", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FileCollection.file_count", + "location": "D4D_FileCollection::classes::FileCollection::attributes::file_count", + "description": "Number of files in this collection.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FileCollection.total_bytes", + "location": "D4D_FileCollection::classes::FileCollection::attributes::total_bytes", + "description": "Total size of all files in bytes.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "FileCollectionTypeEnum.supplementary", + "location": "D4D_FileCollection::enums::FileCollectionTypeEnum::permissible_values::supplementary", + "description": "Supplementary materials", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, 
+ "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 2, + "with_examples": 1, + "with_examples_pct": 50.0, + "complete_sentences": 2, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 4, + "with_examples": 2, + "with_examples_pct": 50.0, + "complete_sentences": 4, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 2, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 2, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 19, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 14, + "complete_sentences_pct": 73.68421052631578, + "too_brief": 6, + "too_brief_pct": 31.57894736842105 + } + } + }, + { + "module": "D4D_Human", + "issues": [ + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.involves_human_subjects", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::involves_human_subjects", + "description": "Does this dataset involve human subjects research?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.involves_human_subjects", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::involves_human_subjects", + "description": "Does this dataset involve human subjects research?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.irb_approval", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::irb_approval", + "description": "Was Institutional Review Board (IRB) approval obtained? 
Include approval number and institution if applicable.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.ethics_review_board", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::ethics_review_board", + "description": "What ethics review board(s) reviewed this research? Include institution names and approval details.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.special_populations", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::special_populations", + "description": "Does the research involve any special populations that require additional protections (e.g., minors, pregnant women, prisoners)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectResearch.regulatory_compliance", + "location": "D4D_Human::classes::HumanSubjectResearch::attributes::regulatory_compliance", + "description": "What regulatory frameworks govern this human subjects research (e.g., 45 CFR 46, HIPAA)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.consent_obtained", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_obtained", + "description": "Was informed consent obtained from all participants?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.consent_obtained", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_obtained", + "description": "Was informed consent obtained from all participants?", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.consent_type", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_type", + 
"description": "What type of consent was obtained (e.g., written, verbal, electronic, implied through participation)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.consent_documentation", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_documentation", + "description": "How is consent documented? Include references to consent forms or procedures used.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.withdrawal_mechanism", + "location": "D4D_Human::classes::InformedConsent::attributes::withdrawal_mechanism", + "description": "How can participants withdraw their consent? What procedures are in place for data deletion upon withdrawal?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.withdrawal_mechanism", + "location": "D4D_Human::classes::InformedConsent::attributes::withdrawal_mechanism", + "description": "How can participants withdraw their consent? What procedures are in place for data deletion upon withdrawal?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "InformedConsent.consent_scope", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_scope", + "description": "What specific uses did participants consent to? Are there limitations on data use based on consent?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "InformedConsent.consent_scope", + "location": "D4D_Human::classes::InformedConsent::attributes::consent_scope", + "description": "What specific uses did participants consent to? 
Are there limitations on data use based on consent?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.anonymization_method", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::anonymization_method", + "description": "What methods were used to anonymize or de-identify participant data? Include technical details of privacy-preserving techniques.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.reidentification_risk", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::reidentification_risk", + "description": "What is the assessed risk of re-identification? What measures were taken to minimize this risk?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.reidentification_risk", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::reidentification_risk", + "description": "What is the assessed risk of re-identification? 
What measures were taken to minimize this risk?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.privacy_techniques", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::privacy_techniques", + "description": "What privacy-preserving techniques were applied (e.g., differential privacy, k-anonymity, data masking)?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.data_linkage", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::data_linkage", + "description": "Can this dataset be linked to other datasets in ways that might compromise participant privacy?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ParticipantPrivacy.data_linkage", + "location": "D4D_Human::classes::ParticipantPrivacy::attributes::data_linkage", + "description": "Can this dataset be linked to other datasets in ways that might compromise participant privacy?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_provided", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_provided", + "description": "Were participants compensated for their participation?", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_provided", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_provided", + "description": "Were participants compensated for their participation?", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_type", + "location": 
"D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_type", + "description": "What type of compensation was provided (e.g., monetary payment, gift cards, course credit, other incentives)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_amount", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_amount", + "description": "What was the amount or value of compensation provided? Include currency or equivalent value.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_rationale", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_rationale", + "description": "What was the rationale for the compensation structure? How was the amount determined to be appropriate?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "HumanSubjectCompensation.compensation_rationale", + "location": "D4D_Human::classes::HumanSubjectCompensation::attributes::compensation_rationale", + "description": "What was the rationale for the compensation structure? 
How was the amount determined to be appropriate?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "AtRiskPopulations.at_risk_groups_included", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::at_risk_groups_included", + "description": "Are any at-risk populations included (e.g., children, pregnant women, prisoners, cognitively impaired individuals)?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AtRiskPopulations.special_protections", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::special_protections", + "description": "What additional protections were implemented for at-risk populations? Include safeguards, modified procedures, or additional oversight.\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "AtRiskPopulations.assent_procedures", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::assent_procedures", + "description": "For research involving minors, what assent procedures were used? How was developmentally appropriate assent obtained?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AtRiskPopulations.assent_procedures", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::assent_procedures", + "description": "For research involving minors, what assent procedures were used? 
How was developmentally appropriate assent obtained?\n", + "priority": "LOW" + }, + { + "type": "MISSING_PERIOD", + "element_type": "attribute", + "element_name": "AtRiskPopulations.guardian_consent", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::guardian_consent", + "description": "For participants unable to provide their own consent, how was guardian or surrogate consent obtained?\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AtRiskPopulations.guardian_consent", + "location": "D4D_Human::classes::AtRiskPopulations::attributes::guardian_consent", + "description": "For participants unable to provide their own consent, how was guardian or surrogate consent obtained?\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 5, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 5, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 22, + "with_examples": 6, + "with_examples_pct": 27.27272727272727, + "complete_sentences": 22, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Maintenance", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Maintainer.maintainer_details", + "location": "D4D_Maintenance::classes::Maintainer::attributes::maintainer_details", + "description": "Details on who will support, host, or maintain the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Erratum.erratum_url", + "location": "D4D_Maintenance::classes::Erratum::attributes::erratum_url", + "description": "URL or access point for 
the erratum.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Erratum.erratum_details", + "location": "D4D_Maintenance::classes::Erratum::attributes::erratum_details", + "description": "Details on any errata or corrections to the dataset.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "UpdatePlan.update_details", + "location": "D4D_Maintenance::classes::UpdatePlan::attributes::update_details", + "description": "Details on update plans, responsible parties, and communication methods.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RetentionLimits.retention_period", + "location": "D4D_Maintenance::classes::RetentionLimits::attributes::retention_period", + "description": "Time period for data retention.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RetentionLimits.retention_details", + "location": "D4D_Maintenance::classes::RetentionLimits::attributes::retention_details", + "description": "Details on data retention limits and enforcement procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VersionAccess.latest_version_doi", + "location": "D4D_Maintenance::classes::VersionAccess::attributes::latest_version_doi", + "description": "DOI or URL of the latest dataset version.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VersionAccess.versions_available", + "location": "D4D_Maintenance::classes::VersionAccess::attributes::versions_available", + "description": "List of available versions with metadata.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VersionAccess.version_details", + "location": 
"D4D_Maintenance::classes::VersionAccess::attributes::version_details", + "description": "Details on version support policies and obsolescence communication.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExtensionMechanism.contribution_url", + "location": "D4D_Maintenance::classes::ExtensionMechanism::attributes::contribution_url", + "description": "URL for contribution guidelines or process.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExtensionMechanism.extension_details", + "location": "D4D_Maintenance::classes::ExtensionMechanism::attributes::extension_details", + "description": "Details on extension mechanisms, contribution validation, and communication.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 6, + "with_examples": 2, + "with_examples_pct": 33.33333333333333, + "complete_sentences": 6, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 13, + "with_examples": 2, + "with_examples_pct": 15.384615384615385, + "complete_sentences": 13, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Metadata", + "issues": [], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Minimal", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MinimalDatasetCollection.resources", + "location": 
"D4D_Minimal::classes::MinimalDatasetCollection::attributes::resources", + "description": "The datasets in this collection.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 2, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 2, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Motivation", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Purpose.response", + "location": "D4D_Motivation::classes::Purpose::attributes::response", + "description": "Short explanation describing the primary purpose of creating the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Task.response", + "location": "D4D_Motivation::classes::Task::attributes::response", + "description": "Short explanation describing the specific task or tasks for which this dataset was created.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AddressingGap.response", + "location": "D4D_Motivation::classes::AddressingGap::attributes::response", + "description": "Short explanation of the knowledge or resource gap that this dataset was intended to address.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Creator.principal_investigator", + "location": "D4D_Motivation::classes::Creator::attributes::principal_investigator", + "description": "A key individual (Principal Investigator) 
responsible for or overseeing dataset creation.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Creator.affiliations", + "location": "D4D_Motivation::classes::Creator::attributes::affiliations", + "description": "Organizations with which the creator or team is affiliated.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FundingMechanism.grantor", + "location": "D4D_Motivation::classes::FundingMechanism::attributes::grantor", + "description": "Name/identifier of the organization providing monetary or resource support.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "FundingMechanism.grants", + "location": "D4D_Motivation::classes::FundingMechanism::attributes::grants", + "description": "Grant mechanisms supporting dataset creation. Multiple grants may fund a single dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "Grant.grant_number", + "location": "D4D_Motivation::classes::Grant::attributes::grant_number", + "description": "The alphanumeric identifier for the grant.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 9, + "with_examples": 1, + "with_examples_pct": 11.11111111111111, + "complete_sentences": 9, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Preprocessing", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + 
"element_type": "attribute", + "element_name": "PreprocessingStrategy.preprocessing_details", + "location": "D4D_Preprocessing::classes::PreprocessingStrategy::attributes::preprocessing_details", + "description": "Details on preprocessing steps applied to the data.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "CleaningStrategy.cleaning_details", + "location": "D4D_Preprocessing::classes::CleaningStrategy::attributes::cleaning_details", + "description": "Details on data cleaning procedures applied.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LabelingStrategy.data_annotation_protocol", + "location": "D4D_Preprocessing::classes::LabelingStrategy::attributes::data_annotation_protocol", + "description": "Annotation methodology, tasks, and protocols followed during labeling. Includes annotation guidelines, quality control procedures, and task definitions.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LabelingStrategy.annotations_per_item", + "location": "D4D_Preprocessing::classes::LabelingStrategy::attributes::annotations_per_item", + "description": "Number of annotations collected per data item. 
Multiple annotations per item enable calculation of inter-annotator agreement.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "LabelingStrategy.labeling_details", + "location": "D4D_Preprocessing::classes::LabelingStrategy::attributes::labeling_details", + "description": "Details on labeling/annotation procedures and quality metrics.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawData.access_url", + "location": "D4D_Preprocessing::classes::RawData::attributes::access_url", + "description": "URL or access point for the raw data.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "RawData.raw_data_details", + "location": "D4D_Preprocessing::classes::RawData::attributes::raw_data_details", + "description": "Details on raw data availability and access procedures.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputation_method", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputation_method", + "description": "Specific imputation technique used (mean, median, mode, forward fill, backward fill, interpolation, model-based imputation, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputed_fields", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputed_fields", + "description": "Fields or columns where imputation was applied.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputation_rationale", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputation_rationale", + "description": "Justification for the imputation approach chosen, including 
assumptions made about missing data mechanisms.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ImputationProtocol.imputation_validation", + "location": "D4D_Preprocessing::classes::ImputationProtocol::attributes::imputation_validation", + "description": "Methods used to validate imputation quality (if any).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AnnotationAnalysis.agreement_metric", + "location": "D4D_Preprocessing::classes::AnnotationAnalysis::attributes::agreement_metric", + "description": "Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percentage agreement, etc.).\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AnnotationAnalysis.analysis_method", + "location": "D4D_Preprocessing::classes::AnnotationAnalysis::attributes::analysis_method", + "description": "Methodology used to assess annotation quality and resolve disagreements.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "AnnotationAnalysis.annotation_quality_details", + "location": "D4D_Preprocessing::classes::AnnotationAnalysis::attributes::annotation_quality_details", + "description": "Additional details on annotation quality assessment and findings.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "MachineAnnotationTools.tool_descriptions", + "location": "D4D_Preprocessing::classes::MachineAnnotationTools::attributes::tool_descriptions", + "description": "Descriptions of what each tool does in the annotation process and what types of annotations it produces. 
Should correspond to the tools list.\n", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 3, + "with_examples_pct": 42.857142857142854, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 22, + "with_examples": 7, + "with_examples_pct": 31.818181818181817, + "complete_sentences": 22, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Uses", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "ExistingUse.examples", + "location": "D4D_Uses::classes::ExistingUse::attributes::examples", + "description": "List of examples of known/previous uses of the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "UseRepository.repository_url", + "location": "D4D_Uses::classes::UseRepository::attributes::repository_url", + "description": "URL to a repository of known dataset uses.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "UseRepository.repository_details", + "location": "D4D_Uses::classes::UseRepository::attributes::repository_details", + "description": "Details on the repository of known dataset uses.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "OtherTask.task_details", + "location": "D4D_Uses::classes::OtherTask::attributes::task_details", + "description": "Details on other potential tasks the dataset could be used for.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": 
"FutureUseImpact.impact_details", + "location": "D4D_Uses::classes::FutureUseImpact::attributes::impact_details", + "description": "Details on potential impacts, risks, and mitigation strategies.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "DiscouragedUse.discouragement_details", + "location": "D4D_Uses::classes::DiscouragedUse::attributes::discouragement_details", + "description": "Details on tasks for which the dataset should not be used.\n", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "IntendedUse.examples", + "location": "D4D_Uses::classes::IntendedUse::attributes::examples", + "description": "List of example intended uses for this dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "IntendedUse.usage_notes", + "location": "D4D_Uses::classes::IntendedUse::attributes::usage_notes", + "description": "Notes or caveats about using the dataset for intended purposes.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 7, + "with_examples": 1, + "with_examples_pct": 14.285714285714285, + "complete_sentences": 7, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 10, + "with_examples": 2, + "with_examples_pct": 20.0, + "complete_sentences": 10, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + } + } + }, + { + "module": "D4D_Variables", + "issues": [ + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.variable_name", + "location": "D4D_Variables::classes::VariableMetadata::attributes::variable_name", + "description": 
"The name or identifier of the variable as it appears in the data files.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.unit", + "location": "D4D_Variables::classes::VariableMetadata::attributes::unit", + "description": "The unit of measurement for the variable, preferably using QUDT units (http://qudt.org/vocab/unit/). Examples: qudt:Kilogram, qudt:Meter, qudt:DegreeCelsius.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.missing_value_code", + "location": "D4D_Variables::classes::VariableMetadata::attributes::missing_value_code", + "description": "Code(s) used to represent missing values for this variable. Examples: \"NA\", \"-999\", \"null\", \"\". Multiple codes may be specified.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.minimum_value", + "location": "D4D_Variables::classes::VariableMetadata::attributes::minimum_value", + "description": "The minimum value that the variable can take. Applicable to numeric variables.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.maximum_value", + "location": "D4D_Variables::classes::VariableMetadata::attributes::maximum_value", + "description": "The maximum value that the variable can take. Applicable to numeric variables.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.categories", + "location": "D4D_Variables::classes::VariableMetadata::attributes::categories", + "description": "The permitted categories or values for a categorical variable. 
Each entry should describe a possible value and its meaning.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.examples", + "location": "D4D_Variables::classes::VariableMetadata::attributes::examples", + "description": "Example values for this variable to illustrate typical data.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.is_identifier", + "location": "D4D_Variables::classes::VariableMetadata::attributes::is_identifier", + "description": "Indicates whether this variable serves as a unique identifier or key for records in the dataset.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.precision", + "location": "D4D_Variables::classes::VariableMetadata::attributes::precision", + "description": "The precision or number of decimal places for numeric variables.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.measurement_technique", + "location": "D4D_Variables::classes::VariableMetadata::attributes::measurement_technique", + "description": "The technique or method used to measure this variable. 
Examples: \"mass spectrometry\", \"self-report survey\", \"GPS coordinates\".", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.derivation", + "location": "D4D_Variables::classes::VariableMetadata::attributes::derivation", + "description": "Description of how this variable was derived or calculated from other variables, if applicable.", + "priority": "LOW" + }, + { + "type": "CONSIDER_EXAMPLE", + "element_type": "attribute", + "element_name": "VariableMetadata.quality_notes", + "location": "D4D_Variables::classes::VariableMetadata::attributes::quality_notes", + "description": "Notes about data quality, reliability, or known issues specific to this variable.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "VariableTypeEnum.integer", + "location": "D4D_Variables::enums::VariableTypeEnum::permissible_values::integer", + "description": "Whole numbers.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "VariableTypeEnum.string", + "location": "D4D_Variables::enums::VariableTypeEnum::permissible_values::string", + "description": "Text strings.", + "priority": "LOW" + }, + { + "type": "TOO_BRIEF", + "element_type": "enum_value", + "element_name": "VariableTypeEnum.boolean", + "location": "D4D_Variables::enums::VariableTypeEnum::permissible_values::boolean", + "description": "True/false values.", + "priority": "LOW" + } + ], + "stats": {}, + "quality_metrics": { + "module": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "class": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 1, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "attribute": { + "total": 14, + "with_examples": 2, + 
"with_examples_pct": 14.285714285714285, + "complete_sentences": 14, + "complete_sentences_pct": 100.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum": { + "total": 1, + "with_examples": 0, + "with_examples_pct": 0.0, + "complete_sentences": 0, + "complete_sentences_pct": 0.0, + "too_brief": 0, + "too_brief_pct": 0.0 + }, + "enum_value": { + "total": 13, + "with_examples": 1, + "with_examples_pct": 7.6923076923076925, + "complete_sentences": 13, + "complete_sentences_pct": 100.0, + "too_brief": 8, + "too_brief_pct": 61.53846153846154 + } + } + } + ] +} \ No newline at end of file diff --git a/scripts/range_description_checker.py b/scripts/range_description_checker.py new file mode 100755 index 00000000..159c31a6 --- /dev/null +++ b/scripts/range_description_checker.py @@ -0,0 +1,290 @@ +#!/usr/bin/env python3 +""" +Check alignment between description semantics and range types in D4D schema. + +Detects cases where: +- Boolean is used where richer types (enum, string, structured) are needed +- Description implies list but multivalued is false +- Description implies structured data but uses primitive type +""" + +import yaml +import json +import argparse +import re +from pathlib import Path +from typing import Dict, List, Tuple + +def load_schema_module(file_path: Path) -> dict: + """Load a YAML schema module.""" + with open(file_path, 'r') as f: + return yaml.safe_load(f) + +def analyze_description(description: str) -> dict: + """ + Analyze description to infer semantic intent. 
+ + Returns dict with: + - implies_list: bool + - implies_structured: bool + - implies_choices: bool + - implied_type: str + """ + if not description: + return { + 'implies_list': False, + 'implies_structured': False, + 'implies_choices': False, + 'implied_type': 'unknown' + } + + desc_lower = description.lower() + + # Check for list indicators + list_indicators = [ + 'list of', 'multiple', 'array of', 'collection of', + 'one or more', 'set of', '(e.g.,', 'and/or' + ] + implies_list = any(ind in desc_lower for ind in list_indicators) + + # Check for structured data indicators + structured_indicators = [ + 'details of', 'information about', 'structured', + 'metadata', 'properties of' + ] + implies_structured = any(ind in desc_lower for ind in structured_indicators) + + # Check for choice indicators (should use enum) + choice_indicators = [ + 'whether', 'if', 'type of', 'which', 'one of', + '(e.g.,', 'such as' + ] + implies_choices = any(ind in desc_lower for ind in choice_indicators) + + # Infer implied type + if 'url' in desc_lower or 'link' in desc_lower or 'http' in desc_lower: + implied_type = 'uri' + elif 'date' in desc_lower and 'time' not in desc_lower: + implied_type = 'date' + elif 'date' in desc_lower and 'time' in desc_lower: + implied_type = 'datetime' + elif implies_choices and not implies_list: + implied_type = 'enum' + elif implies_structured: + implied_type = 'class' + else: + implied_type = 'string' + + return { + 'implies_list': implies_list, + 'implies_structured': implies_structured, + 'implies_choices': implies_choices, + 'implied_type': implied_type + } + +def check_range_mismatch(attr_def: dict, attr_name: str) -> Tuple[bool, str]: + """ + Check if range type matches description semantics. 
+ + Returns (is_mismatch, reason) + """ + description = attr_def.get('description', '') + range_type = attr_def.get('range', 'string') + multivalued = attr_def.get('multivalued', False) + + if not description: + return False, '' + + analysis = analyze_description(description) + + # Check for boolean reductionism + if range_type == 'boolean' and (analysis['implies_structured'] or + analysis['implied_type'] not in ['boolean', 'unknown', 'string']): # 'string' is only the fallback type, not a signal + return True, f"Boolean oversimplifies - description implies {analysis['implied_type']}" + + # Check for missing multivalued + if analysis['implies_list'] and not multivalued: + return True, "Description implies list but multivalued=false" + + # Check for primitive type when structured needed + if analysis['implies_structured'] and range_type in ['string', 'boolean', 'integer', 'float']: + return True, f"Primitive type '{range_type}' used but description implies structured class" + + # Check for string when enum appropriate + if (analysis['implies_choices'] and + range_type == 'string' and + not multivalued and + '?' 
not in description): # Questions are okay as strings + return True, "String used but description implies enum (limited choices)" + + # Check for URI field without uri range + if analysis['implied_type'] == 'uri' and range_type not in ['uri', 'uriorcurie', 'string']: + return True, f"Description implies URI but range is {range_type}" + + return False, '' + +def check_module(module_path: Path) -> List[dict]: + """Check all attributes in a module for range mismatches.""" + issues = [] + module_name = module_path.stem + module_data = load_schema_module(module_path) + + # Check classes and their attributes + classes = module_data.get('classes', {}) + for class_name, class_def in classes.items(): + if not class_def: + continue + + attributes = class_def.get('attributes', {}) + for attr_name, attr_def in attributes.items(): + if not attr_def: + continue + + is_mismatch, reason = check_range_mismatch(attr_def, attr_name) + + if is_mismatch: + issues.append({ + 'module': module_name, + 'file': module_path.name, + 'class': class_name, + 'attribute': attr_name, + 'description': attr_def.get('description', ''), + 'range': attr_def.get('range', 'string'), + 'multivalued': attr_def.get('multivalued', False), + 'issue': reason, + 'severity': categorize_severity(reason) + }) + + # Check top-level slots + slots = module_data.get('slots', {}) + for slot_name, slot_def in slots.items(): + if not slot_def: + continue + + is_mismatch, reason = check_range_mismatch(slot_def, slot_name) + + if is_mismatch: + issues.append({ + 'module': module_name, + 'file': module_path.name, + 'class': None, + 'attribute': slot_name, + 'description': slot_def.get('description', ''), + 'range': slot_def.get('range', 'string'), + 'multivalued': slot_def.get('multivalued', False), + 'issue': reason, + 'severity': categorize_severity(reason) + }) + + return issues + +def categorize_severity(issue_reason: str) -> str: + """Categorize issue severity based on reason.""" + if 'boolean oversimplifies' in 
issue_reason.lower(): + return 'HIGH' + elif 'implies list but' in issue_reason.lower(): + return 'HIGH' + elif 'implies structured' in issue_reason.lower(): + return 'MEDIUM' + elif 'implies enum' in issue_reason.lower(): + return 'MEDIUM' + else: + return 'LOW' + +def generate_report(all_issues: List[dict], output_format: str = 'json') -> str: + """Generate report of range mismatches.""" + # Group by severity + by_severity = {} + for issue in all_issues: + severity = issue['severity'] + if severity not in by_severity: + by_severity[severity] = [] + by_severity[severity].append(issue) + + report = { + 'metadata': { + 'tool': 'range_description_checker', + 'total_issues': len(all_issues) + }, + 'summary': { + severity: len(issues) + for severity, issues in by_severity.items() + }, + 'issues': all_issues + } + + if output_format == 'json': + return json.dumps(report, indent=2) + else: + # Text format + lines = [] + lines.append("=" * 80) + lines.append("RANGE-DESCRIPTION MISMATCH REPORT") + lines.append("=" * 80) + lines.append("") + lines.append(f"Total issues found: {len(all_issues)}") + lines.append("") + for severity in ['HIGH', 'MEDIUM', 'LOW']: + if severity in by_severity: + lines.append(f"\n{severity} PRIORITY ({len(by_severity[severity])} issues):") + lines.append("-" * 80) + for issue in by_severity[severity]: + location = f"{issue['module']}::{issue['class'] or 'slots'}::{issue['attribute']}" + lines.append(f"\n {location}") + lines.append(f" Range: {issue['range']}, Multivalued: {issue['multivalued']}") + lines.append(f" Issue: {issue['issue']}") + desc_preview = issue['description'][:70] + lines.append(f" Description: \"{desc_preview}...\"") + + lines.append("") + lines.append("=" * 80) + return "\n".join(lines) + +def main(): + parser = argparse.ArgumentParser( + description='Check range-description alignment in D4D schema' + ) + parser.add_argument( + '--schema-dir', + type=Path, + default=Path('src/data_sheets_schema/schema'), + help='Directory 
containing schema modules' + ) + parser.add_argument( + '--output', + type=Path, + help='Output file path (default: stdout)' + ) + parser.add_argument( + '--format', + choices=['json', 'text'], + default='json', + help='Output format' + ) + + args = parser.parse_args() + + # Check all modules + all_issues = [] + for module_path in sorted(args.schema_dir.glob('D4D_*.yaml')): + if module_path.name == 'D4D_Evaluation_Summary.yaml': + continue + + issues = check_module(module_path) + all_issues.extend(issues) + + # Generate report + report = generate_report(all_issues, args.format) + + # Output + if args.output: + args.output.write_text(report) + print(f"Report written to: {args.output}") + else: + print(report) + + # Exit with error code if issues found + return 1 if all_issues else 0 + +if __name__ == '__main__': + raise SystemExit(main()) diff --git a/scripts/slot_uri_conflict_detector.py b/scripts/slot_uri_conflict_detector.py new file mode 100755 index 00000000..a5748990 --- /dev/null +++ b/scripts/slot_uri_conflict_detector.py @@ -0,0 +1,258 @@ +#!/usr/bin/env python3 +""" +Detect slot_uri conflicts in D4D schema modules. + +Identifies when multiple slots map to the same ontology term with different semantics, +which can cause RDF serialization issues and semantic ambiguity. +""" + +import yaml +import json +import argparse +from pathlib import Path +from collections import defaultdict +from typing import Dict, List, Tuple + +def load_schema_module(file_path: Path) -> dict: + """Load a YAML schema module (an empty file yields an empty dict).""" + with open(file_path, 'r') as f: + return yaml.safe_load(f) or {} + +def extract_slot_uris(module_data: dict, module_name: str) -> List[Tuple[str, str, str, int]]: + """ + Extract all slot_uri declarations from a module. + + Returns list of (slot_uri, slot_name, description, line_number) tuples. 
+ """ + slot_uris = [] + + # Check classes and their attributes + classes = module_data.get('classes', {}) + for class_name, class_def in classes.items(): + if not class_def: + continue + + attributes = class_def.get('attributes', {}) + for attr_name, attr_def in attributes.items(): + if not attr_def: + continue + + slot_uri = attr_def.get('slot_uri') + if slot_uri: + description = attr_def.get('description', '') + # Line number would require parsing with ruamel.yaml, so we use 0 for now + slot_uris.append((slot_uri, attr_name, description, 0)) + + # Check top-level slots + slots = module_data.get('slots', {}) + for slot_name, slot_def in slots.items(): + if not slot_def: + continue + + slot_uri = slot_def.get('slot_uri') + if slot_uri: + description = slot_def.get('description', '') + slot_uris.append((slot_uri, slot_name, description, 0)) + + return slot_uris + +def find_conflicts(modules_dir: Path) -> Dict[str, List[dict]]: + """ + Find all slot_uri conflicts across D4D modules. + + Returns dict mapping slot_uri to list of conflicting usages. 
+ """ + # Map slot_uri to list of (slot_name, module, description) tuples + uri_map = defaultdict(list) + + # Process all D4D modules + for module_path in sorted(modules_dir.glob('D4D_*.yaml')): + if module_path.name == 'D4D_Evaluation_Summary.yaml': + continue # Skip evaluation summary schema + + module_name = module_path.stem + module_data = load_schema_module(module_path) + + slot_uris = extract_slot_uris(module_data, module_name) + + for slot_uri, slot_name, description, line_num in slot_uris: + uri_map[slot_uri].append({ + 'slot_name': slot_name, + 'module': module_name, + 'file': str(module_path.name), + 'description': description, + 'line': line_num + }) + + # Find conflicts (same slot_uri with different slot names) + conflicts = {} + for slot_uri, usages in uri_map.items(): + # Group by slot name + unique_names = set(u['slot_name'] for u in usages) + + if len(unique_names) > 1: + conflicts[slot_uri] = usages + + return conflicts + +def analyze_conflict_severity(slot_uri: str, usages: List[dict]) -> str: + """ + Analyze conflict severity based on semantic similarity of descriptions. 
+ + Returns: CRITICAL, HIGH, MEDIUM, or LOW + """ + descriptions = [u['description'].lower() for u in usages if u['description']] + + # If descriptions are very different, it's critical + # Simple heuristic: check for overlapping keywords + if len(descriptions) < 2: + return "HIGH" # Missing descriptions make it hard to assess + + # Check for semantic similarity (simple keyword overlap) + words1 = set(descriptions[0].split()) + words2 = set(descriptions[1].split()) + overlap = len(words1 & words2) / len(words1 | words2) if (words1 | words2) else 0 + + if overlap < 0.2: + return "CRITICAL" # Very different semantics + elif overlap < 0.4: + return "HIGH" # Somewhat different semantics + elif overlap < 0.6: + return "MEDIUM" # Some overlap + else: + return "LOW" # Very similar (might be intentional) + +def assess_impact(slot_uri: str, usages: List[dict]) -> dict: + """Assess the impact of a slot_uri conflict.""" + return { + 'data_corruption_risk': 'low', # slot_uri changes don't affect data + 'tool_breakage_risk': 'high', # RDF converters may fail + 'semantic_integrity': 'critical', # Ambiguous meaning + 'migration_complexity': 'low' # Just schema changes + } + +def recommend_fix(slot_uri: str, usages: List[dict]) -> dict: + """Recommend how to fix the conflict.""" + # Analyze which usage seems most aligned with the ontology term + # For now, provide generic recommendation + + recommendations = [] + for i, usage in enumerate(usages): + if i == 0: + # Suggest keeping the first one + recommendations.append({ + 'slot': usage['slot_name'], + 'action': 'keep', + 'new_slot_uri': slot_uri, + 'rationale': 'Appears to match ontology term semantics' + }) + else: + # Suggest changing others + recommendations.append({ + 'slot': usage['slot_name'], + 'action': 'change', + 'new_slot_uri': f"d4d:{usage['slot_name']}", + 'rationale': 'Create custom D4D term to avoid conflict' + }) + + return { + 'approach': 'differentiate_mappings', + 'recommendations': recommendations + } + +def 
generate_report(conflicts: Dict[str, List[dict]], output_format: str = 'json') -> str: + """Generate conflict report in specified format.""" + + report = { + 'metadata': { + 'tool': 'slot_uri_conflict_detector', + 'total_conflicts': len(conflicts) + }, + 'conflicts': [] + } + + for slot_uri, usages in sorted(conflicts.items()): + severity = analyze_conflict_severity(slot_uri, usages) + impact = assess_impact(slot_uri, usages) + fix = recommend_fix(slot_uri, usages) + + conflict_entry = { + 'slot_uri': slot_uri, + 'severity': severity, + 'conflict_count': len(set(u['slot_name'] for u in usages)), + 'usages': usages, + 'impact': impact, + 'recommended_fix': fix + } + + report['conflicts'].append(conflict_entry) + + if output_format == 'json': + return json.dumps(report, indent=2) + else: + # Text format, most severe conflicts first + lines = [] + lines.append("=" * 80) + lines.append("SLOT_URI CONFLICT DETECTION REPORT") + lines.append("=" * 80) + lines.append("") + lines.append(f"Total conflicts found: {len(conflicts)}") + lines.append("") + + for conflict in sorted(report['conflicts'], key=lambda c: ('CRITICAL', 'HIGH', 'MEDIUM', 'LOW').index(c['severity'])): + lines.append(f"\n[{conflict['severity']}] slot_uri: {conflict['slot_uri']}") + lines.append(f" Conflict: {conflict['conflict_count']} different slot names use this URI") + lines.append(" Usages:") + for usage in conflict['usages']: + lines.append(f" - {usage['slot_name']} ({usage['module']})") + if usage['description']: + desc_preview = usage['description'][:60] + lines.append(f" '{desc_preview}...'") + lines.append(f" Recommended fix: {conflict['recommended_fix']['approach']}") + + lines.append("") + lines.append("=" * 80) + return "\n".join(lines) + +def main(): + parser = argparse.ArgumentParser( + description='Detect slot_uri conflicts in D4D schema modules' + ) + parser.add_argument( + '--schema-dir', + type=Path, + default=Path('src/data_sheets_schema/schema'), + help='Directory containing schema modules' + ) + parser.add_argument( + '--output', + type=Path, + help='Output 
file path (default: stdout)' + ) + parser.add_argument( + '--format', + choices=['json', 'text'], + default='json', + help='Output format' + ) + + args = parser.parse_args() + + # Find conflicts + conflicts = find_conflicts(args.schema_dir) + + # Generate report + report = generate_report(conflicts, args.format) + + # Output + if args.output: + args.output.write_text(report) + print(f"Report written to: {args.output}") + else: + print(report) + + # Exit with error code if conflicts found + return 1 if conflicts else 0 + +if __name__ == '__main__': + exit(main()) diff --git a/src/data_sheets_schema/alignment/d4d_rocrate_sssom_comprehensive.tsv b/src/data_sheets_schema/alignment/d4d_rocrate_sssom_comprehensive.tsv index 846dbeab..dc041d0c 100644 --- a/src/data_sheets_schema/alignment/d4d_rocrate_sssom_comprehensive.tsv +++ b/src/data_sheets_schema/alignment/d4d_rocrate_sssom_comprehensive.tsv @@ -1,350 +1,331 @@ # Comprehensive SSSOM Mapping - ALL D4D Attributes # Includes mapped, recommended, novel, free text, and unmapped attributes -# Date: 2026-03-25T22:41:15.618808 -# Total attributes: 270 +# Date: 2026-04-09T10:17:24.818193 +# Total attributes: 284 # # Status breakdown: -# free_text: 54 -# mapped: 68 -# novel_d4d: 39 +# free_text: 55 +# mapped: 63 +# novel_d4d: 45 # recommended: 69 -# unmapped: 40 +# unmapped: 52 # d4d_schema_path subject_id subject_label predicate_id rocrate_json_path object_id object_label mapping_justification confidence comment author_id mapping_date subject_source object_source mapping_set_id mapping_set_version mapping_status d4d_description -Dataset.access_details d4d:access_details Access Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Information on how to access or retrieve the raw source data. 
-" -Dataset.access_url d4d:access_url Access Url skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the raw data. -Dataset.access_urls d4d:access_urls Access Urls semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Details of the distribution channel(s) or format(s). -Dataset.acquisition_details d4d:acquisition_details Acquisition Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on how data was acquired for each instance. 
-" -Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps skos:exactMatch @graph[?@type='Dataset']['d4d:addressing_gaps'] d4d:addressing_gaps addressing_gaps semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.affected_subsets d4d:affected_subsets Affected Subsets semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Specific subsets or features of the dataset affected by this bias. -" -Dataset.affiliation d4d:affiliation Affiliation semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The organization(s) to which the person belongs in the context of this dataset. May vary across data... -Dataset.affiliations d4d:affiliations Affiliations semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Organizations with which the creator or team is affiliated. 
-Dataset.agreement_metric d4d:agreement_metric Agreement Metric semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percentage agreem... -Dataset.analysis_method d4d:analysis_method Analysis Method semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Methodology used to assess annotation quality and resolve disagreements. -" -Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses skos:exactMatch @graph[?@type='Dataset']['d4d:annotation_analyses'] d4d:annotation_analyses annotation_analyses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Analysis of annotation quality and inter-annotator agreement. -Dataset.annotation_quality_details d4d:annotation_quality_details Annotation Quality Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Additional details on annotation quality assessment and findings. 
-" -Dataset.annotations_per_item d4d:annotations_per_item Annotations Per Item semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Number of annotations collected per data item. Multiple annotations per item enable calculation of i... -Dataset.annotator_demographics d4d:annotator_demographics Annotator Demographics semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Demographic information about annotators, if available and relevant (e.g., geographic location, lang... -Dataset.anomalies d4d:anomalies Anomalies skos:exactMatch @graph[?@type='Dataset']['d4d:dataAnomalies'] d4d:dataAnomalies dataAnomalies semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.anomaly_details d4d:anomaly_details Anomaly Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on errors, noise sources, or redundancies in the dataset. -" -Dataset.anonymization_method d4d:anonymization_method Anonymization Method semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text What methods were used to anonymize or de-identify participant data? Include technical details of pr... 
-Dataset.archival d4d:archival Archival semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indication whether official archival versions of external resources are included. -" -Dataset.assent_procedures d4d:assent_procedures Assent Procedures semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For research involving minors, what assent procedures were used? How was developmentally appropriate... -Dataset.at_risk_groups_included d4d:at_risk_groups_included At Risk Groups Included semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Are any at-risk populations included (e.g., children, pregnant women, prisoners, cognitively impaire... -Dataset.at_risk_populations d4d:at_risk_populations At Risk Populations skos:exactMatch @graph[?@type='Dataset']['d4d:atRiskPopulations'] d4d:atRiskPopulations atRiskPopulations semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about protections for at-risk populations (e.g., minors, pregnant women, prisoners) incl... 
-Dataset.bias_description d4d:bias_description Bias Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of how this bias manifests in the dataset, including affected populations, feat... -Dataset.bias_type d4d:bias_type Bias Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of bias identified, using standardized categories from the Artificial Intelligence Ontology... -Dataset.bytes d4d:bytes Bytes skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Size of the data in bytes. -Dataset.categories d4d:categories Categories semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The permitted categories or values for a categorical variable. Each entry should describe a possible... -Dataset.citation d4d:citation Citation skos:exactMatch @graph[?@type='Dataset']['citation'] schema:citation citation semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Recommended citation for this dataset in DataCite or BibTeX format. Provides a standard way to cite ... 
-Dataset.cleaning_details d4d:cleaning_details Cleaning Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on data cleaning procedures applied. -" -Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:cleaning_strategies'] d4d:cleaning_strategies cleaning_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.collection_details d4d:collection_details Collection Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on direct vs. indirect collection methods and sources. 
-" -Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionTimeframe'] rai:dataCollectionTimeframe dataCollectionTimeframe semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.collector_details d4d:collector_details Collector Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on who collected the data and their compensation. -" -Dataset.comment_prefix d4d:comment_prefix Comment Prefix semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text -Dataset.compensation_amount d4d:compensation_amount Compensation Amount skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_amount'] d4d:compensation_amount compensation_amount semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "What was the amount or value of compensation provided? 
Include currency or equivalent value. -" -Dataset.compensation_provided d4d:compensation_provided Compensation Provided skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_provided'] d4d:compensation_provided compensation_provided semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Were participants compensated for their participation? -Dataset.compensation_rationale d4d:compensation_rationale Compensation Rationale skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_rationale'] d4d:compensation_rationale compensation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What was the rationale for the compensation structure? How was the amount determined to be appropria... -Dataset.compensation_type d4d:compensation_type Compensation Type skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_type'] d4d:compensation_type compensation_type semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What type of compensation was provided (e.g., monetary payment, gift cards, course credit, other inc... 
-Dataset.compression d4d:compression Compression skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped compression format used, if any. e.g., gzip, bzip2, zip -Dataset.confidential_elements d4d:confidential_elements Confidential Elements skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements'] d4d:confidential_elements confidential_elements semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.confidential_elements_present d4d:confidential_elements_present Confidential Elements Present skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements_present'] d4d:confidential_elements_present confidential_elements_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any confidential data elements are present. -Dataset.confidentiality_details d4d:confidentiality_details Confidentiality Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on confidential data elements and handling procedures. 
-" -Dataset.confidentiality_level d4d:confidentiality_level Confidentiality Level skos:exactMatch @graph[?@type='Dataset']['d4d:confidentiality_level'] d4d:confidentiality_level confidentiality_level semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Confidentiality classification of the dataset indicating level of access restrictions and sensitivit... -Dataset.conforms_to d4d:conforms_to Conforms To skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.conforms_to_class d4d:conforms_to_class Conforms To Class skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.conforms_to_schema d4d:conforms_to_schema Conforms To Schema skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.consent_details d4d:consent_details Consent Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on how consent was requested, provided, and 
documented. -" -Dataset.consent_documentation d4d:consent_documentation Consent Documentation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "How is consent documented? Include references to consent forms or procedures used. -" -Dataset.consent_obtained d4d:consent_obtained Consent Obtained semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was informed consent obtained from all participants? -Dataset.consent_scope d4d:consent_scope Consent Scope semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "What specific uses did participants consent to? Are there limitations on data use based on consent? -" -Dataset.consent_type d4d:consent_type Consent Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What type of consent was obtained (e.g., written, verbal, electronic, implied through participation)... 
-Dataset.contact_person d4d:contact_person Contact Person skos:exactMatch @graph[?@type='Dataset']['d4d:contact_person'] d4d:contact_person contact_person semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for questions about ethical review. Provides structured contact information including... -Dataset.content_warnings d4d:content_warnings Content Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings'] d4d:content_warnings content_warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.content_warnings_present d4d:content_warnings_present Content Warnings Present skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings_present'] d4d:content_warnings_present content_warnings_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any content warnings are needed. -Dataset.contribution_url d4d:contribution_url Contribution Url semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL for contribution guidelines or process. 
-Dataset.counts d4d:counts Counts semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "How many instances are there in total (of each type, if appropriate)? -" -Dataset.created_by d4d:created_by Created By skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.created_on d4d:created_on Created On skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.creators d4d:creators Creators skos:closeMatch @graph[?@type='Dataset']['author'] schema:author author semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.credit_roles d4d:credit_roles Credit Roles skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal investigator or cr... 
-Dataset.data_annotation_platform d4d:data_annotation_platform Data Annotation Platform semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Platform or tool used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom an... -Dataset.data_annotation_protocol d4d:data_annotation_protocol Data Annotation Protocol skos:exactMatch @graph[?@type='Dataset']['d4d:data_annotation_protocol'] d4d:data_annotation_protocol data_annotation_protocol semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Annotation methodology, tasks, and protocols followed during labeling. Includes annotation guideline... -Dataset.data_collectors d4d:data_collectors Data Collectors skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.data_linkage d4d:data_linkage Data Linkage semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Can this dataset be linked to other datasets in ways that might compromise participant privacy? 
-" -Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts skos:exactMatch @graph[?@type='Dataset']['d4d:data_protection_impacts'] d4d:data_protection_impacts data_protection_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.data_substrate d4d:data_substrate Data Substrate semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Type of data (e.g., raw text, images) from Bridge2AI standards. -" -Dataset.data_topic d4d:data_topic Data Topic semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "General topic of each instance (e.g., from Bridge2AI standards). -" -Dataset.data_type d4d:data_type Data Type semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The data type of the variable (e.g., integer, float, string, boolean, date, categorical). Use standa... -Dataset.data_use_permission d4d:data_use_permission Data Use Permission semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., g... 
-Dataset.deidentification_details d4d:deidentification_details Deidentification Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on de-identification procedures and residual risks. -" -Dataset.delimiter d4d:delimiter Delimiter semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended -Dataset.derivation d4d:derivation Derivation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of how this variable was derived or calculated from other variables, if applicable. -Dataset.description d4d:description Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text A human-readable description for a thing. -Dataset.dialect d4d:dialect Dialect skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Specific format dialect or variation (e.g., CSV dialect, JSON-LD profile). 
-Dataset.disagreement_patterns d4d:disagreement_patterns Disagreement Patterns semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Systematic patterns in annotator disagreements (e.g., by demographic group, annotation difficulty, t... -Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses skos:exactMatch @graph[?@type='Dataset']['rai:prohibitedUses'] rai:prohibitedUses prohibitedUses semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.discouragement_details d4d:discouragement_details Discouragement Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on tasks for which the dataset should not be used. 
-" -Dataset.distribution d4d:distribution Distribution semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped -Dataset.distribution_dates d4d:distribution_dates Distribution Dates skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.distribution_formats d4d:distribution_formats Distribution Formats skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.doi d4d:doi Doi skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped digital object identifier -Dataset.double_quote d4d:double_quote Double Quote semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended -Dataset.download_url d4d:download_url Download Url skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 
mapped URL from which the data can be downloaded. This is not the same as the landing page, which is a page... -Dataset.email d4d:email Email semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The email address of the person. Represents current/preferred contact information in the context of ... -Dataset.encoding d4d:encoding Encoding skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped the character encoding of the data -Dataset.end_date d4d:end_date End Date skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended End date of data collection -Dataset.errata d4d:errata Errata skos:exactMatch @graph[?@type='Dataset']['d4d:errata'] d4d:errata errata semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.erratum_details d4d:erratum_details Erratum Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on any errata or corrections to the dataset. 
-" -Dataset.erratum_url d4d:erratum_url Erratum Url skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the erratum. -Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews skos:exactMatch @graph[?@type='Dataset']['d4d:ethical_reviews'] d4d:ethical_reviews ethical_reviews semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.ethics_review_board d4d:ethics_review_board Ethics Review Board semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "What ethics review board(s) reviewed this research? Include institution names and approval details. -" -Dataset.examples d4d:examples Examples semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of examples of known/previous uses of the dataset. 
-Dataset.existing_uses d4d:existing_uses Existing Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.extension_details d4d:extension_details Extension Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on extension mechanisms, contribution validation, and communication. -" -Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.external_resources d4d:external_resources External Resources skos:closeMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Links or identifiers for external resources. Can be used either as a list of ExternalResource object... 
-Dataset.format d4d:format Format skos:exactMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The file format, physical medium, or dimensions of a resource. This should be a file extension or MI... -Dataset.frequency d4d:frequency Frequency semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped How often updates are planned (e.g., quarterly, annually). -Dataset.funders d4d:funders Funders skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.future_guarantees d4d:future_guarantees Future Guarantees semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of any commitments that external resources will remain available and stable over time. 
-" -Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts skos:exactMatch @graph[?@type='Dataset']['d4d:future_use_impacts'] d4d:future_use_impacts future_use_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.governance_committee_contact d4d:governance_committee_contact Governance Committee Contact skos:exactMatch @graph[?@type='Dataset']['d4d:governance_committee_contact'] d4d:governance_committee_contact governance_committee_contact semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for data governance committee. This person can answer questions about data governance... -Dataset.grant_number d4d:grant_number Grant Number semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The alphanumeric identifier for the grant. -Dataset.grantor d4d:grantor Grantor semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Name/identifier of the organization providing monetary or resource support. 
-Dataset.grants d4d:grants Grants semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Grant mechanisms supporting dataset creation. Multiple grants may fund a single dataset. -Dataset.guardian_consent d4d:guardian_consent Guardian Consent semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For participants unable to provide their own consent, how was guardian or surrogate consent obtained... -Dataset.handling_strategy d4d:handling_strategy Handling Strategy skos:exactMatch @graph[?@type='Dataset']['d4d:handling_strategy'] d4d:handling_strategy handling_strategy semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Strategy used to handle missing data (e.g., deletion, imputation, flagging, multiple imputation). 
-" -Dataset.hash d4d:hash Hash skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped hash of the data -Dataset.header d4d:header Header semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended -Dataset.hipaa_compliant d4d:hipaa_compliant Hipaa Compliant semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates compliance with the Health Insurance Portability and Accountability Act (HIPAA). HIPAA app... -Dataset.human_subject_research d4d:human_subject_research Human Subject Research skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about whether dataset involves human subjects research, including IRB approval, ethics r... -Dataset.id d4d:id Id skos:exactMatch @graph[?@type='Dataset']['ID'] rdf:ID ID semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ unknown d4d-rocrate-comprehensive-v1 1.0 mapped A unique identifier for a thing. 
-Dataset.identifiable_elements_present d4d:identifiable_elements_present Identifiable Elements Present semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether data subjects can be identified. -Dataset.identification d4d:identification Identification semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped -Dataset.identifiers_removed d4d:identifiers_removed Identifiers Removed skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended List of identifier types removed during de-identification. -Dataset.impact_details d4d:impact_details Impact Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on potential impacts, risks, and mitigation strategies. 
-" -Dataset.imputation_method d4d:imputation_method Imputation Method skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_method'] d4d:imputation_method imputation_method semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Specific imputation technique used (mean, median, mode, forward fill, backward fill, interpolation, ... -Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_protocols'] d4d:imputation_protocols imputation_protocols semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data imputation methodology and techniques. -Dataset.imputation_rationale d4d:imputation_rationale Imputation Rationale skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_rationale'] d4d:imputation_rationale imputation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Justification for the imputation approach chosen, including assumptions made about missing data mech... 
-Dataset.imputation_validation d4d:imputation_validation Imputation Validation skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_validation'] d4d:imputation_validation imputation_validation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Methods used to validate imputation quality (if any). -" -Dataset.imputed_fields d4d:imputed_fields Imputed Fields skos:exactMatch @graph[?@type='Dataset']['d4d:imputed_fields'] d4d:imputed_fields imputed_fields semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Fields or columns where imputation was applied. -" -Dataset.informed_consent d4d:informed_consent Informed Consent semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Details about informed consent procedures, including consent type, documentation, and withdrawal mec... -Dataset.instance_type d4d:instance_type Instance Type semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Multiple types of instances? (e.g., movies, users, and ratings). 
-" -Dataset.instances d4d:instances Instances skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.intended_uses d4d:intended_uses Intended Uses skos:exactMatch @graph[?@type='Dataset']['d4d:intended_uses'] d4d:intended_uses intended_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicit intended and recommended uses for this dataset. Complements future_use_impacts by focusing ... -Dataset.inter_annotator_agreement d4d:inter_annotator_agreement Inter Annotator Agreement semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Measure of agreement between annotators (e.g., Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, p... -Dataset.inter_annotator_agreement_score d4d:inter_annotator_agreement_score Inter Annotator Agreement Score semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Measured agreement between annotators (e.g., Cohen's kappa value, Fleiss' kappa, Krippendorff's alph... 
-Dataset.involves_human_subjects d4d:involves_human_subjects Involves Human Subjects semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does this dataset involve human subjects research? -Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.irb_approval d4d:irb_approval Irb Approval semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was Institutional Review Board (IRB) approval obtained? Include approval number and institution if a... -Dataset.is_data_split d4d:is_data_split Is Data Split semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a split of the larger dataset, e.g., is it a set for model training, testing, or vali... 
-Dataset.is_deidentified d4d:is_deidentified Is Deidentified skos:exactMatch @graph[?@type='Dataset']['d4d:is_deidentified'] d4d:is_deidentified is_deidentified semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.is_direct d4d:is_direct Is Direct semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether collection was direct from individuals -Dataset.is_identifier d4d:is_identifier Is Identifier semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable serves as a unique identifier or key for records in the dataset. -Dataset.is_random d4d:is_random Is Random semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether the sample is random. -Dataset.is_representative d4d:is_representative Is Representative semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indicates whether the sample is representative of the larger set. 
-" -Dataset.is_sample d4d:is_sample Is Sample semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether it is a sample of a larger set. -Dataset.is_sensitive d4d:is_sensitive Is Sensitive semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable contains sensitive information (e.g., personal data, protected healt... -Dataset.is_shared d4d:is_shared Is Shared semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Boolean indicating whether the dataset is distributed to parties external to the dataset-creating en... -Dataset.is_subpopulation d4d:is_subpopulation Is Subpopulation semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a subpopulation of the larger dataset, e.g., is it a set of data for a specific demog... 
-Dataset.is_tabular d4d:is_tabular Is Tabular skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.issued d4d:issued Issued skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.keywords d4d:keywords Keywords skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.known_biases d4d:known_biases Known Biases skos:exactMatch @graph[?@type='Dataset']['d4d:known_biases'] d4d:known_biases known_biases semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known biases present in the dataset that may affect fairness, representativeness, or model performan... 
-Dataset.known_limitations d4d:known_limitations Known Limitations skos:exactMatch @graph[?@type='Dataset']['d4d:known_limitations'] d4d:known_limitations known_limitations semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known limitations of the dataset that may affect its use or interpretation. Distinct from biases (sy... -Dataset.label d4d:label Label semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Is there a label or target associated with each instance? -" -Dataset.label_description d4d:label_description Label Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "If labeled, what pattern or format do labels follow? -" -Dataset.labeling_details d4d:labeling_details Labeling Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on labeling/annotation procedures and quality metrics. 
-" -Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:labeling_strategies'] d4d:labeling_strategies labeling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.language d4d:language Language skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped language in which the information is expressed -Dataset.last_updated_on d4d:last_updated_on Last Updated On skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.latest_version_doi d4d:latest_version_doi Latest Version Doi semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended DOI or URL of the latest dataset version. 
-Dataset.license d4d:license License skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.license_terms d4d:license_terms License Terms semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of the dataset's license and terms of use (including links, costs, or usage constraints)... -Dataset.limitation_description d4d:limitation_description Limitation Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Detailed description of the limitation and its implications. -" -Dataset.limitation_type d4d:limitation_type Limitation Type skos:closeMatch @graph[?@type='Dataset']['temporalCoverage'] schema:temporalCoverage temporalCoverage semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Category of limitation (e.g., scope, coverage, temporal, methodological). 
-" -Dataset.machine_annotation_tools d4d:machine_annotation_tools Machine Annotation Tools skos:closeMatch @graph[?@type='Dataset']['rai:machineAnnotationTools'] rai:machineAnnotationTools machineAnnotationTools semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Automated annotation tools used in dataset creation. -Dataset.maintainer_details d4d:maintainer_details Maintainer Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on who will support, host, or maintain the dataset. -" -Dataset.maintainers d4d:maintainers Maintainers skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.maximum_value d4d:maximum_value Maximum Value semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The maximum value that the variable can take. Applicable to numeric variables. 
-Dataset.md5 d4d:md5 Md5 skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped md5 hash of the data -Dataset.measurement_technique d4d:measurement_technique Measurement Technique semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "The technique or method used to measure this variable. Examples: ""mass spectrometry"", ""self-report s..." -Dataset.mechanism_details d4d:mechanism_details Mechanism Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on mechanisms or procedures used to collect the data. -" -Dataset.media_type d4d:media_type Media Type skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The media type of the data. This should be a MIME type. -Dataset.method d4d:method Method semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Method used for de-identification (e.g., HIPAA Safe Harbor). 
-Dataset.minimum_value d4d:minimum_value Minimum Value semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The minimum value that the variable can take. Applicable to numeric variables. -Dataset.missing d4d:missing Missing semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the missing data fields or elements. -" -Dataset.missing_data_causes d4d:missing_data_causes Missing Data Causes semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Known or suspected causes of missing data (e.g., sensor failures, participant dropout, privacy const... -Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Documentation of missing data patterns and handling strategies. -Dataset.missing_data_patterns d4d:missing_data_patterns Missing Data Patterns semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of patterns in missing data (e.g., missing completely at random, missing at random, miss... 
-Dataset.missing_information d4d:missing_information Missing Information semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "References to one or more MissingInfo objects describing missing data. -" -Dataset.missing_value_code d4d:missing_value_code Missing Value Code skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Code(s) used to represent missing values for this variable. Examples: ""NA"", ""-999"", ""null"", """". Mult..." -Dataset.mitigation_strategy d4d:mitigation_strategy Mitigation Strategy semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Steps taken or recommended to mitigate this bias. -" -Dataset.modified_by d4d:modified_by Modified By skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.name d4d:name Name semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A human-readable name for a thing. 
-Dataset.notification_details d4d:notification_details Notification Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on how individuals were notified about data collection. -" -Dataset.orcid d4d:orcid Orcid semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format... -Dataset.other_compliance d4d:other_compliance Other Compliance semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Other regulatory compliance frameworks applicable to this dataset (e.g., CCPA, PIPEDA, industry-spec... 
-Dataset.other_tasks d4d:other_tasks Other Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.page d4d:page Page skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.parent_datasets d4d:parent_datasets Parent Datasets skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Parent datasets that this dataset is part of or derived from. Enables hierarchical dataset compositi... -Dataset.participant_compensation d4d:participant_compensation Participant Compensation skos:exactMatch @graph[?@type='Dataset']['d4d:participant_compensation'] d4d:participant_compensation participant_compensation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Information about compensation or incentives provided to human research participants. 
-Dataset.participant_privacy d4d:participant_privacy Participant Privacy skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about privacy protections and anonymization procedures for human research participants. -Dataset.path d4d:path Path skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.precision d4d:precision Precision skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The precision or number of decimal places for numeric variables. -Dataset.preprocessing_details d4d:preprocessing_details Preprocessing Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on preprocessing steps applied to the data. 
-" -Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:preprocessing_strategies'] d4d:preprocessing_strategies preprocessing_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.principal_investigator d4d:principal_investigator Principal Investigator semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A key individual (Principal Investigator) responsible for or overseeing dataset creation. -Dataset.privacy_techniques d4d:privacy_techniques Privacy Techniques semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What privacy-preserving techniques were applied (e.g., differential privacy, k-anonymity, data maski... -Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses skos:exactMatch @graph[?@type='Dataset']['d4d:prohibited_uses'] d4d:prohibited_uses prohibited_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicitly prohibited or forbidden uses for this dataset. Stronger than discouraged_uses - these are... 
-Dataset.prohibition_reason d4d:prohibition_reason Prohibition Reason skos:exactMatch @graph[?@type='Dataset']['d4d:prohibition_reason'] d4d:prohibition_reason prohibition_reason semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Reason why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal c... -Dataset.publisher d4d:publisher Publisher skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.purposes d4d:purposes Purposes skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.quality_notes d4d:quality_notes Quality Notes semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Notes about data quality, reliability, or known issues specific to this variable. 
-Dataset.quote_char d4d:quote_char Quote Char semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended -Dataset.raw_data_details d4d:raw_data_details Raw Data Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on raw data availability and access procedures. -" -Dataset.raw_data_format d4d:raw_data_format Raw Data Format semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Format of the raw data before any preprocessing. -" -Dataset.raw_data_sources d4d:raw_data_sources Raw Data Sources semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of raw data sources before preprocessing. 
-Dataset.raw_sources d4d:raw_sources Raw Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.recommended_mitigation d4d:recommended_mitigation Recommended Mitigation semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Recommended approaches for users to address this limitation. -" -Dataset.regulatory_compliance d4d:regulatory_compliance Regulatory Compliance semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What regulatory frameworks govern this human subjects research (e.g., 45 CFR 46, HIPAA)? -" -Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions skos:closeMatch @graph[?@type='Dataset']['conditionsOfAccess'] schema:conditionsOfAccess conditionsOfAccess semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.reidentification_risk d4d:reidentification_risk Reidentification Risk semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What is the assessed risk of re-identification? What measures were taken to minimize this risk? 
-" -Dataset.related_datasets d4d:related_datasets Related Datasets skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Related datasets with typed relationships (e.g., supplements, derives from, is version of). Use Data... -Dataset.relationship_details d4d:relationship_details Relationship Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on relationships between instances (e.g., graph edges, ratings). -" -Dataset.relationship_type d4d:relationship_type Relationship Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of relationship (e.g., derives_from, supplements, is_version_of). Uses DatasetRelationshipT... -Dataset.release_dates d4d:release_dates Release Dates semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Dates or timeframe for dataset release. Could be a one-time release date or multiple scheduled relea... 
-Dataset.repository_details d4d:repository_details Repository Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on the repository of known dataset uses. -" -Dataset.repository_url d4d:repository_url Repository Url semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL to a repository of known dataset uses. -Dataset.representative_verification d4d:representative_verification Representative Verification skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Explanation of how representativeness was validated or verified. -" -Dataset.resources d4d:resources Resources skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Sub-resources or component datasets. Used in DatasetCollection to contain Dataset objects, and in Da... -Dataset.response d4d:response Response semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Short explanation describing the primary purpose of creating the dataset. 
-Dataset.restrictions d4d:restrictions Restrictions semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of any restrictions or fees associated with external resources. -" -Dataset.retention_details d4d:retention_details Retention Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on data retention limits and enforcement procedures. -" -Dataset.retention_limit d4d:retention_limit Retention Limit skos:exactMatch @graph[?@type='Dataset']['d4d:retention_limit'] d4d:retention_limit retention_limit semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.retention_period d4d:retention_period Retention Period skos:exactMatch @graph[?@type='Dataset']['d4d:retention_period'] d4d:retention_period retention_period semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Time period for data retention. -Dataset.review_details d4d:review_details Review Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on ethical review processes, outcomes, and supporting documentation. 
-" -Dataset.reviewing_organization d4d:reviewing_organization Reviewing Organization skos:exactMatch @graph[?@type='Dataset']['d4d:reviewing_organization'] d4d:reviewing_organization reviewing_organization semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Organization that conducted the ethical review (e.g., Institutional Review Board, Ethics Committee, ... -Dataset.revocation_details d4d:revocation_details Revocation Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on consent revocation mechanisms and procedures. -" -Dataset.role d4d:role Role semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Role of the data collector (e.g., researcher, crowdworker) -Dataset.sampling_strategies d4d:sampling_strategies Sampling Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:sampling_strategies'] d4d:sampling_strategies sampling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.scope_impact d4d:scope_impact Scope Impact semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "How this limitation 
affects the scope or applicability of the dataset. -" -Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.sensitive_elements_present d4d:sensitive_elements_present Sensitive Elements Present semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether sensitive data elements are present. -Dataset.sensitivity_details d4d:sensitivity_details Sensitivity Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on sensitive data elements present and handling procedures. -" -Dataset.sha256 d4d:sha256 Sha256 skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped sha256 hash of the data -Dataset.source_data d4d:source_data Source Data semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the larger set from which the sample was drawn, if any. 
-" -Dataset.source_description d4d:source_description Source Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collec... -Dataset.source_type d4d:source_type Source Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Type of raw source (sensor, database, user input, web scraping, etc.). -" -Dataset.special_populations d4d:special_populations Special Populations semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does the research involve any special populations that require additional protections (e.g., minors,... -Dataset.special_protections d4d:special_protections Special Protections semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped What additional protections were implemented for at-risk populations? Include safeguards, modified p... -Dataset.split_details d4d:split_details Split Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on recommended data splits and their rationale. 
-" -Dataset.start_date d4d:start_date Start Date skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Start date of data collection -Dataset.status d4d:status Status skos:exactMatch @graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.strategies d4d:strategies Strategies semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the sampling strategy (deterministic, probabilistic, etc.). -" -Dataset.subpopulation_elements_present d4d:subpopulation_elements_present Subpopulation Elements Present semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether any subpopulations are explicitly identified. 
-Dataset.subpopulations d4d:subpopulations Subpopulations skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.subsets d4d:subsets Subsets skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.target_dataset d4d:target_dataset Target Dataset skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object... -Dataset.task_details d4d:task_details Task Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on other potential tasks the dataset could be used for. 
-" -Dataset.tasks d4d:tasks Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.timeframe_details d4d:timeframe_details Timeframe Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on the collection timeframe and relationship to data creation dates. -" -Dataset.title d4d:title Title skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped the official title of the element -Dataset.tool_accuracy d4d:tool_accuracy Tool Accuracy skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Known accuracy or performance metrics for the automated tools (if available). Include metric name an... -Dataset.tool_descriptions d4d:tool_descriptions Tool Descriptions semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Descriptions of what each tool does in the annotation process and what types of annotations it produ... 
-Dataset.tools d4d:tools Tools skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "List of automated annotation tools with their versions. Format each entry as ""ToolName version"" (e.g..." -Dataset.unit d4d:unit Unit semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The unit of measurement for the variable, preferably using QUDT units (http://qudt.org/vocab/unit/).... -Dataset.update_details d4d:update_details Update Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on update plans, responsible parties, and communication methods. 
-" -Dataset.updates d4d:updates Updates skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.url d4d:url Url semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped -Dataset.usage_notes d4d:usage_notes Usage Notes semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Notes or caveats about using the dataset for intended purposes. -Dataset.use_category d4d:use_category Use Category semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Category of intended use (e.g., research, clinical, educational, commercial, policy). 
-Dataset.use_repository d4d:use_repository Use Repository skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.used_software d4d:used_software Used Software semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What software was used as part of this dataset property? -Dataset.variable_name d4d:variable_name Variable Name semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The name or identifier of the variable as it appears in the data files. -Dataset.variables d4d:variables Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Metadata describing individual variables, fields, or columns in the dataset. 
-Dataset.version d4d:version Version skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.version_access d4d:version_access Version Access skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.version_details d4d:version_details Version Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on version support policies and obsolescence communication. -" -Dataset.versions_available d4d:versions_available Versions Available semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of available versions with metadata. 
-Dataset.warnings d4d:warnings Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:warnings'] d4d:warnings warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d -Dataset.was_derived_from d4d:was_derived_from Was Derived From skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped -Dataset.was_directly_observed d4d:was_directly_observed Was Directly Observed semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was directly observed -Dataset.was_inferred_derived d4d:was_inferred_derived Was Inferred Derived skos:closeMatch @graph[?@type='Dataset']['wasDerivedFrom'] prov:wasDerivedFrom wasDerivedFrom semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ http://www.w3.org/ns/prov# d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was inferred or derived from other data -Dataset.was_reported_by_subjects d4d:was_reported_by_subjects Was Reported By Subjects semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was reported directly by the subjects themselves -Dataset.was_validated_verified d4d:was_validated_verified Was 
Validated Verified skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether the data was validated or verified in any way -Dataset.why_missing d4d:why_missing Why Missing semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of why each piece of data is missing. -" -Dataset.why_not_representative d4d:why_not_representative Why Not Representative semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Explanation of why the sample is not representative, if applicable. -" -Dataset.withdrawal_mechanism d4d:withdrawal_mechanism Withdrawal Mechanism semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-03-25 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended How can participants withdraw their consent? What procedures are in place for data deletion upon wit... +Dataset.access_details d4d:access_details Access Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Information on how to access or retrieve the raw source data. 
+" +Dataset.access_url d4d:access_url Access Url skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the raw data. +Dataset.access_urls d4d:access_urls Access Urls semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped One or more URLs providing access to the distribution channel(s) or format(s). +Dataset.acquisition_details d4d:acquisition_details Acquisition Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how data was acquired for each instance, including instruments, protocols, ... +Dataset.acquisition_methods d4d:acquisition_methods Acquisition Methods skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Methods used to acquire or obtain dataset instances. List of InstanceAcquisition objects from the Co... 
+Dataset.addressing_gaps d4d:addressing_gaps Addressing Gaps skos:exactMatch @graph[?@type='Dataset']['d4d:addressing_gaps'] d4d:addressing_gaps addressing_gaps semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Research or practical gaps this dataset addresses. List of AddressingGap objects from the Motivation... +Dataset.affected_subsets d4d:affected_subsets Affected Subsets semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more specific subsets or features of the dataset affected by this bias (e.g., ""female partici..." +Dataset.affiliation d4d:affiliation Affiliation semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The organization(s) to which the person belongs in the context of this dataset. May vary across data... +Dataset.affiliations d4d:affiliations Affiliations semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Organizations with which the creator or team is affiliated. 
+Dataset.agreement_metric d4d:agreement_metric Agreement Metric semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Type of agreement metric used (Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, percentage agreem... +Dataset.analysis_method d4d:analysis_method Analysis Method semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Methodology used to assess annotation quality and resolve disagreements. +" +Dataset.annotation_analyses d4d:annotation_analyses Annotation Analyses skos:exactMatch @graph[?@type='Dataset']['d4d:annotation_analyses'] d4d:annotation_analyses annotation_analyses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Analysis of annotation quality and inter-annotator agreement. +Dataset.annotation_quality_details d4d:annotation_quality_details Annotation Quality Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Additional details on annotation quality assessment and findings. 
+" +Dataset.annotations_per_item d4d:annotations_per_item Annotations Per Item semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Number of annotations collected per data item. Multiple annotations per item enable calculation of i... +Dataset.annotator_demographics d4d:annotator_demographics Annotator Demographics semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more demographic characteristics of the annotators, if available and relevant (e.g., geograph... +Dataset.anomalies d4d:anomalies Anomalies skos:exactMatch @graph[?@type='Dataset']['d4d:dataAnomalies'] d4d:dataAnomalies dataAnomalies semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Known data quality issues, errors, or irregularities in the dataset. List of DataAnomaly objects fro... +Dataset.anomaly_details d4d:anomaly_details Anomaly Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of errors, noise sources, or redundancies in the dataset, including their know... 
+Dataset.anonymization_method d4d:anonymization_method Anonymization Method semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text What methods were used to anonymize or de-identify participant data? Include technical details of pr... +Dataset.archival d4d:archival Archival semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indicates whether official archival versions of external resources are included in the dataset. +" +Dataset.assent_procedures d4d:assent_procedures Assent Procedures semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For research involving minors, what assent procedures were used? How was developmentally appropriate... +Dataset.at_risk_groups_included d4d:at_risk_groups_included At Risk Groups Included semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Are any at-risk populations included (e.g., children, pregnant women, prisoners, cognitively impaire... 
+Dataset.at_risk_populations d4d:at_risk_populations At Risk Populations skos:exactMatch @graph[?@type='Dataset']['d4d:atRiskPopulations'] d4d:atRiskPopulations atRiskPopulations semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about protections for at-risk populations (e.g., minors, pregnant women, prisoners) incl... +Dataset.bias_description d4d:bias_description Bias Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of how this bias manifests in the dataset, including affected populations, feat... +Dataset.bias_type d4d:bias_type Bias Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of bias identified, using standardized categories from the Artificial Intelligence Ontology... +Dataset.bytes d4d:bytes Bytes skos:exactMatch @graph[?@type='Dataset']['contentSize'] schema:contentSize contentSize semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Size of the data in bytes. 
+Dataset.categories d4d:categories Categories semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped One or more permitted categories or values for a categorical variable. Each entry should describe a ... +Dataset.citation d4d:citation Citation skos:exactMatch @graph[?@type='Dataset']['citation'] schema:citation citation semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Recommended citation for this dataset in DataCite or BibTeX format. Provides a standard way to cite ... +Dataset.cleaning_details d4d:cleaning_details Cleaning Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of data cleaning procedures applied, including criteria for removing or correc... +Dataset.cleaning_strategies d4d:cleaning_strategies Cleaning Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:cleaning_strategies'] d4d:cleaning_strategies cleaning_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data cleaning and quality control procedures applied to the dataset. List of CleaningStrategy object... 
+Dataset.collection_consents d4d:collection_consents Collection Consents semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Consent obtained from individuals for data collection and use. List of CollectionConsent objects fro... +Dataset.collection_details d4d:collection_details Collection Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of whether data was collected directly from individuals or obtained via third ... +Dataset.collection_mechanisms d4d:collection_mechanisms Collection Mechanisms skos:exactMatch @graph[?@type='Dataset']['rai:dataCollection'] rai:dataCollection dataCollection semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Mechanisms, instruments, or tools used for data collection. List of CollectionMechanism objects from... +Dataset.collection_notifications d4d:collection_notifications Collection Notifications semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Notifications provided to individuals about data collection. List of CollectionNotification objects ... 
+Dataset.collection_timeframes d4d:collection_timeframes Collection Timeframes skos:exactMatch @graph[?@type='Dataset']['d4d:collection_timeframes'] d4d:collection_timeframes collection_timeframes semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Time periods during which data was collected. List of CollectionTimeframe objects from the Collectio... +Dataset.collection_type d4d:collection_type Collection Type semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Type(s) of content in this file collection. A collection may have multiple types, for example a coll... +Dataset.collector_details d4d:collector_details Collector Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of who was involved in data collection (e.g., students, crowdworkers, contract... +Dataset.comment_prefix d4d:comment_prefix Comment Prefix semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Character(s) used to indicate comment lines (e.g., ""#"" for CSV comments)." 
+Dataset.compensation_amount d4d:compensation_amount Compensation Amount skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_amount'] d4d:compensation_amount compensation_amount semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "What was the amount or value of compensation provided? Include currency or equivalent value. +" +Dataset.compensation_provided d4d:compensation_provided Compensation Provided skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_provided'] d4d:compensation_provided compensation_provided semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Were participants compensated for their participation? +Dataset.compensation_rationale d4d:compensation_rationale Compensation Rationale skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_rationale'] d4d:compensation_rationale compensation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What was the rationale for the compensation structure? How was the amount determined to be appropria... 
+Dataset.compensation_type d4d:compensation_type Compensation Type skos:exactMatch @graph[?@type='Dataset']['d4d:compensation_type'] d4d:compensation_type compensation_type semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d What type of compensation was provided (e.g., monetary payment, gift cards, course credit, other inc... +Dataset.compression d4d:compression Compression skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped Compression format used, if any (e.g., gzip, bzip2, zip). +Dataset.confidential_elements d4d:confidential_elements Confidential Elements skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements'] d4d:confidential_elements confidential_elements semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Confidential or restricted information within the dataset that requires access controls. List of Con... 
+Dataset.confidential_elements_present d4d:confidential_elements_present Confidential Elements Present skos:exactMatch @graph[?@type='Dataset']['d4d:confidential_elements_present'] d4d:confidential_elements_present confidential_elements_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any confidential data elements are present. +Dataset.confidentiality_details d4d:confidentiality_details Confidentiality Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of which data elements are confidential, the basis for confidentiality (e.g., ... +Dataset.confidentiality_level d4d:confidentiality_level Confidentiality Level skos:exactMatch @graph[?@type='Dataset']['d4d:confidentiality_level'] d4d:confidentiality_level confidentiality_level semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Confidentiality classification of the dataset indicating level of access restrictions and sensitivit... +Dataset.conforms_to d4d:conforms_to Conforms To skos:exactMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped An established standard, specification, or schema to which the resource conforms. 
+Dataset.conforms_to_class d4d:conforms_to_class Conforms To Class skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The specific class or type within a schema to which the resource conforms. +Dataset.conforms_to_schema d4d:conforms_to_schema Conforms To Schema skos:narrowMatch @graph[?@type='Dataset']['conformsTo'] schema:conformsTo conformsTo semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The schema or data model to which the resource conforms. +Dataset.consent_details d4d:consent_details Consent Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how consent was requested (e.g., opt-in form, verbal agreement), provided, ... +Dataset.consent_documentation d4d:consent_documentation Consent Documentation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "How is consent documented? Include references to consent forms or procedures used. 
+" +Dataset.consent_obtained d4d:consent_obtained Consent Obtained semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was informed consent obtained from all participants? +Dataset.consent_revocations d4d:consent_revocations Consent Revocations semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Mechanisms for individuals to revoke previously given consent. List of ConsentRevocation objects fro... +Dataset.consent_scope d4d:consent_scope Consent Scope semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "What specific uses did participants consent to? Are there limitations on data use based on consent? +" +Dataset.consent_type d4d:consent_type Consent Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What type of consent was obtained (e.g., written, verbal, electronic, implied through participation)... +Dataset.contact_person d4d:contact_person Contact Person skos:exactMatch @graph[?@type='Dataset']['d4d:contact_person'] d4d:contact_person contact_person semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for questions about ethical review. 
Provides structured contact information including... +Dataset.content_warnings d4d:content_warnings Content Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings'] d4d:content_warnings content_warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Content warnings for potentially harmful, offensive, or disturbing material in the dataset. List of ... +Dataset.content_warnings_present d4d:content_warnings_present Content Warnings Present skos:exactMatch @graph[?@type='Dataset']['d4d:content_warnings_present'] d4d:content_warnings_present content_warnings_present semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Indicates whether any content warnings are needed. +Dataset.contribution_url d4d:contribution_url Contribution Url semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL for contribution guidelines or process. +Dataset.counts d4d:counts Counts semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "How many instances are there in total (of each type, if appropriate)? 
+" +Dataset.created_by d4d:created_by Created By skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The person or organization primarily responsible for creating the resource. +Dataset.created_on d4d:created_on Created On skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The date and time when the resource was created. +Dataset.creators d4d:creators Creators skos:closeMatch @graph[?@type='Dataset']['author'] schema:author author semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individuals or organizations who created the dataset. List of Creator objects describing authorship,... +Dataset.credit_roles d4d:credit_roles Credit Roles skos:closeMatch @graph[?@type='Dataset']['creator'] schema:creator creator semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more contributor roles using the CRediT (Contributor Roles Taxonomy) for the principal invest... 
+Dataset.data_annotation_platform d4d:data_annotation_platform Data Annotation Platform semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped One or more platforms or tools used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical T... +Dataset.data_annotation_protocol d4d:data_annotation_protocol Data Annotation Protocol skos:exactMatch @graph[?@type='Dataset']['d4d:data_annotation_protocol'] d4d:data_annotation_protocol data_annotation_protocol semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Annotation methodology, tasks, and protocols followed during labeling. Includes annotation guideline... +Dataset.data_collectors d4d:data_collectors Data Collectors skos:relatedMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individuals or organizations responsible for collecting the data. List of DataCollector objects from... +Dataset.data_linkage d4d:data_linkage Data Linkage semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Can this dataset be linked to other datasets in ways that might compromise participant privacy? 
+" +Dataset.data_protection_impacts d4d:data_protection_impacts Data Protection Impacts skos:exactMatch @graph[?@type='Dataset']['d4d:data_protection_impacts'] d4d:data_protection_impacts data_protection_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data protection impact assessments (DPIAs) conducted for the dataset. List of DataProtectionImpact o... +Dataset.data_substrate d4d:data_substrate Data Substrate semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Type of data (e.g., raw text, images) from Bridge2AI standards. +" +Dataset.data_topic d4d:data_topic Data Topic semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "General topic of each instance (e.g., from Bridge2AI standards). +" +Dataset.data_type d4d:data_type Data Type semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The data type of the variable (e.g., integer, float, string, boolean, date, categorical). Use standa... 
+Dataset.data_use_permission d4d:data_use_permission Data Use Permission semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., g... +Dataset.deidentification_details d4d:deidentification_details Deidentification Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on de-identification procedures and residual risks. +" +Dataset.delimiter d4d:delimiter Delimiter semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Field delimiter character (e.g., "","" for CSV, ""\t"" for TSV)." +Dataset.derivation d4d:derivation Derivation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of how this variable was derived or calculated from other variables, if applicable. +Dataset.description d4d:description Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text A human-readable description for a thing. 
+Dataset.dialect d4d:dialect Dialect skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Specific format dialect or variation (e.g., CSV dialect, JSON-LD profile). +Dataset.direct_collection d4d:direct_collection Direct Collection semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Whether data was collected directly from individuals or via third parties. List of DirectCollection ... +Dataset.disagreement_patterns d4d:disagreement_patterns Disagreement Patterns semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Systematic patterns in annotator disagreements (e.g., by demographic group, annotation difficulty, t... +Dataset.discouraged_uses d4d:discouraged_uses Discouraged Uses skos:exactMatch @graph[?@type='Dataset']['d4d:discouraged_uses'] d4d:discouraged_uses discouraged_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Uses that are not recommended for this dataset due to limitations, risks, or ethical concerns. List ... 
+Dataset.discouragement_details d4d:discouragement_details Discouragement Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of tasks or applications for which the dataset is not recommended, with explan... +Dataset.distribution d4d:distribution Distribution semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The distribution of instances across identified subpopulations, including counts, percentages, or pr... +Dataset.distribution_dates d4d:distribution_dates Distribution Dates skos:exactMatch @graph[?@type='Dataset']['dateCreated'] schema:dateCreated dateCreated semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Dates when the dataset was or will be distributed or released. List of DistributionDate objects from... +Dataset.distribution_formats d4d:distribution_formats Distribution Formats skos:exactMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped Formats in which the dataset is distributed or made available. List of DistributionFormat objects fr... 
+Dataset.doi d4d:doi Doi skos:exactMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification (e.g., '... +Dataset.double_quote d4d:double_quote Double Quote semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Whether quotes within quoted fields are escaped by doubling them. Expected values: ""true"" or ""false""..." +Dataset.download_url d4d:download_url Download Url skos:exactMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped URL from which the data can be downloaded. This is not the same as the landing page, which is a page... +Dataset.email d4d:email Email semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The email address of the person. Represents current/preferred contact information in the context of ... 
+Dataset.encoding d4d:encoding Encoding skos:closeMatch @graph[?@type='Dataset']['evi:formats'] evi:formats formats semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped The character encoding of the data. +Dataset.end_date d4d:end_date End Date skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended End date of data collection. +Dataset.errata d4d:errata Errata skos:exactMatch @graph[?@type='Dataset']['d4d:errata'] d4d:errata errata semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known errors or corrections to the dataset since publication. List of Erratum objects from the Maint... +Dataset.erratum_details d4d:erratum_details Erratum Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the error, its scope, the affected data or records, and the correction appl... 
+Dataset.erratum_url d4d:erratum_url Erratum Url skos:closeMatch @graph[?@type='Dataset']['accessURL'] dcat:accessURL accessURL semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://www.w3.org/ns/dcat# d4d-rocrate-comprehensive-v1 1.0 recommended URL or access point for the erratum. +Dataset.ethical_reviews d4d:ethical_reviews Ethical Reviews skos:exactMatch @graph[?@type='Dataset']['d4d:ethical_reviews'] d4d:ethical_reviews ethical_reviews semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Ethical reviews and institutional oversight for the dataset. List of EthicalReview objects from the ... +Dataset.ethics_review_board d4d:ethics_review_board Ethics Review Board semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "What ethics review board(s) reviewed this research? Include institution names and approval details. +" +Dataset.examples d4d:examples Examples semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of examples of known/previous uses of the dataset. 
+Dataset.existing_uses d4d:existing_uses Existing Uses skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Known existing uses of the dataset at the time of publication. List of ExistingUse objects from the ... +Dataset.extension_details d4d:extension_details Extension Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how third parties can contribute to the dataset, how contributions are vali... +Dataset.extension_mechanism d4d:extension_mechanism Extension Mechanism skos:closeMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Mechanisms for extending or contributing to the dataset. ExtensionMechanism object from the Maintena... +Dataset.external_resources d4d:external_resources External Resources semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text External resources referenced at the dataset level (e.g., related publications, repositories, docume... 
+Dataset.file_collections d4d:file_collections File Collections semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Collections of files within this dataset. Each collection represents a logical grouping of files wit... +Dataset.file_count d4d:file_count File Count semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Number of files in this collection. +Dataset.file_type d4d:file_type File Type semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Semantic type or purpose of this file (e.g., data_file, code_file, documentation_file, metadata_file... +Dataset.format d4d:format Format skos:exactMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The file format, physical medium, or dimensions of a resource. This should be a file extension or MI... +Dataset.frequency d4d:frequency Frequency semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped How often updates are planned (e.g., quarterly, annually). 
+Dataset.funders d4d:funders Funders skos:exactMatch @graph[?@type='Dataset']['funder'] schema:funder funder semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Funding mechanisms that supported dataset creation. List of FundingMechanism objects describing gran... +Dataset.future_guarantees d4d:future_guarantees Future Guarantees semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of any commitments that external resources will remain available and stable over time. +" +Dataset.future_use_impacts d4d:future_use_impacts Future Use Impacts skos:exactMatch @graph[?@type='Dataset']['d4d:future_use_impacts'] d4d:future_use_impacts future_use_impacts semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Anticipated impacts of future uses, including risks and benefits. List of FutureUseImpact objects fr... +Dataset.governance_committee_contact d4d:governance_committee_contact Governance Committee Contact skos:exactMatch @graph[?@type='Dataset']['d4d:governance_committee_contact'] d4d:governance_committee_contact governance_committee_contact semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Contact person for data governance committee. This person can answer questions about data governance... 
+Dataset.grant_number d4d:grant_number Grant Number semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The alphanumeric identifier for the grant. +Dataset.grantor d4d:grantor Grantor semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Name/identifier of the organization providing monetary or resource support. +Dataset.grants d4d:grants Grants semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Grant mechanisms supporting dataset creation. Multiple grants may fund a single dataset. +Dataset.guardian_consent d4d:guardian_consent Guardian Consent semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended For participants unable to provide their own consent, how was guardian or surrogate consent obtained... +Dataset.handling_strategy d4d:handling_strategy Handling Strategy skos:exactMatch @graph[?@type='Dataset']['d4d:handling_strategy'] d4d:handling_strategy handling_strategy semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d The primary strategy used to handle missing data (e.g., listwise deletion, mean imputation, multiple... 
+Dataset.hash d4d:hash Hash skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped Cryptographic hash value of the data for integrity verification (e.g., SHA-256: 'e3b0c44298fc1c149af... +Dataset.header d4d:header Header semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Whether the first row of the file contains column headers. Expected values: ""true"" or ""false"" (as st..." +Dataset.hipaa_compliant d4d:hipaa_compliant Hipaa Compliant semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates compliance with the Health Insurance Portability and Accountability Act (HIPAA). HIPAA app... +Dataset.human_subject_research d4d:human_subject_research Human Subject Research skos:exactMatch @graph[?@type='Dataset']['d4d:humanSubject'] d4d:humanSubject humanSubject semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about whether dataset involves human subjects research, including IRB approval, ethics r... 
+Dataset.id d4d:id Id skos:exactMatch @graph[?@type='Dataset']['ID'] rdf:ID ID semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ unknown d4d-rocrate-comprehensive-v1 1.0 mapped A unique identifier for a thing. +Dataset.identifiable_elements_present d4d:identifiable_elements_present Identifiable Elements Present semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether data subjects can be identified. +Dataset.identification d4d:identification Identification semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped How subpopulations are identified and defined (e.g., by age groups, gender, geographic region, disea... +Dataset.identifiers_removed d4d:identifiers_removed Identifiers Removed skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended List of identifier types removed during de-identification (e.g., 'name', 'date of birth', 'SSN', 'em... +Dataset.impact_details d4d:impact_details Impact Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of potential future impacts or risks arising from the dataset's composition or... 
+Dataset.imputation_method d4d:imputation_method Imputation Method skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_method'] d4d:imputation_method imputation_method semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Specific imputation technique used (mean, median, mode, forward fill, backward fill, interpolation, ... +Dataset.imputation_protocols d4d:imputation_protocols Imputation Protocols skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_protocols'] d4d:imputation_protocols imputation_protocols semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data imputation protocols applied to handle missing values. List of ImputationProtocol objects from ... +Dataset.imputation_rationale d4d:imputation_rationale Imputation Rationale skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_rationale'] d4d:imputation_rationale imputation_rationale semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Justification for the imputation approach chosen, including assumptions made about missing data mech... 
+Dataset.imputation_validation d4d:imputation_validation Imputation Validation skos:exactMatch @graph[?@type='Dataset']['d4d:imputation_validation'] d4d:imputation_validation imputation_validation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Methods used to validate imputation quality (if any). +" +Dataset.imputed_fields d4d:imputed_fields Imputed Fields skos:exactMatch @graph[?@type='Dataset']['d4d:imputed_fields'] d4d:imputed_fields imputed_fields semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d "Fields or columns where imputation was applied. +" +Dataset.informed_consent d4d:informed_consent Informed Consent semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Details about informed consent procedures, including consent type, documentation, and withdrawal mec... +Dataset.instance_type d4d:instance_type Instance Type semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "The type or types of instances in the dataset (e.g., ""movie"", ""user"", ""rating"", ""clinical record""). ..." 
+Dataset.instances d4d:instances Instances skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individual data instances or records in the dataset. List of Instance objects from the Composition m... +Dataset.intended_uses d4d:intended_uses Intended Uses skos:exactMatch @graph[?@type='Dataset']['d4d:intended_uses'] d4d:intended_uses intended_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicit intended and recommended uses for this dataset. Complements future_use_impacts by focusing ... +Dataset.inter_annotator_agreement d4d:inter_annotator_agreement Inter Annotator Agreement semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Measure of agreement between annotators (e.g., Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, p... +Dataset.inter_annotator_agreement_score d4d:inter_annotator_agreement_score Inter Annotator Agreement Score semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Measured agreement between annotators (e.g., Cohen's kappa value, Fleiss' kappa, Krippendorff's alph... 
+Dataset.involves_human_subjects d4d:involves_human_subjects Involves Human Subjects semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does this dataset involve human subjects research? +Dataset.ip_restrictions d4d:ip_restrictions Ip Restrictions skos:exactMatch @graph[?@type='Dataset']['d4d:ip_restrictions'] d4d:ip_restrictions ip_restrictions semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Intellectual property restrictions on dataset use or redistribution. IPRestrictions object from the ... +Dataset.irb_approval d4d:irb_approval Irb Approval semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Was Institutional Review Board (IRB) approval obtained? Include approval number and institution if a... +Dataset.is_data_split d4d:is_data_split Is Data Split semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a split of the larger dataset, e.g., is it a set for model training, testing, or vali... 
+Dataset.is_deidentified d4d:is_deidentified Is Deidentified skos:exactMatch @graph[?@type='Dataset']['d4d:is_deidentified'] d4d:is_deidentified is_deidentified semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d De-identification status and procedures applied to the dataset. Deidentification object describing w... +Dataset.is_direct d4d:is_direct Is Direct semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Whether collection was direct from individuals. +Dataset.is_identifier d4d:is_identifier Is Identifier semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable serves as a unique identifier or key for records in the dataset. +Dataset.is_random d4d:is_random Is Random semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether the sample is random. +Dataset.is_representative d4d:is_representative Is Representative semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Indicates whether the sample is representative of the larger set. 
+" +Dataset.is_sample d4d:is_sample Is Sample semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether it is a sample of a larger set. +Dataset.is_sensitive d4d:is_sensitive Is Sensitive semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether this variable contains sensitive information (e.g., personal data, protected healt... +Dataset.is_shared d4d:is_shared Is Shared semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Boolean indicating whether the dataset is distributed to parties external to the dataset-creating en... +Dataset.is_subpopulation d4d:is_subpopulation Is Subpopulation semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Is this subset a subpopulation of the larger dataset, e.g., is it a set of data for a specific demog... +Dataset.is_tabular d4d:is_tabular Is Tabular skos:narrowMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Whether the dataset is in tabular format (rows and columns). True if the data is structured as a tab... 
+Dataset.issued d4d:issued Issued skos:exactMatch @graph[?@type='Dataset']['datePublished'] schema:datePublished datePublished semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Date of formal issuance or publication of the resource. +Dataset.keywords d4d:keywords Keywords skos:exactMatch @graph[?@type='Dataset']['keywords'] schema:keywords keywords semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Keywords or tags describing the resource for discovery and classification. +Dataset.known_biases d4d:known_biases Known Biases skos:exactMatch @graph[?@type='Dataset']['d4d:known_biases'] d4d:known_biases known_biases semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known biases present in the dataset that may affect fairness, representativeness, or model performan... +Dataset.known_limitations d4d:known_limitations Known Limitations skos:exactMatch @graph[?@type='Dataset']['d4d:known_limitations'] d4d:known_limitations known_limitations semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Known limitations of the dataset that may affect its use or interpretation. Distinct from biases (sy... 
+Dataset.label d4d:label Label semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Is there a label or target associated with each instance? +" +Dataset.label_description d4d:label_description Label Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "If labeled, what pattern or format do labels follow? +" +Dataset.labeling_details d4d:labeling_details Labeling Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the labeling or annotation procedures, including annotation guidelines, tas... +Dataset.labeling_strategies d4d:labeling_strategies Labeling Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:labeling_strategies'] d4d:labeling_strategies labeling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Labeling or annotation methodologies applied to the data. List of LabelingStrategy objects from the ... 
+Dataset.language d4d:language Language skos:exactMatch @graph[?@type='Dataset']['inLanguage'] schema:inLanguage inLanguage semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Language in which the information is expressed. +Dataset.last_updated_on d4d:last_updated_on Last Updated On skos:exactMatch @graph[?@type='Dataset']['dateModified'] schema:dateModified dateModified semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The date and time when the resource was most recently modified or updated. +Dataset.latest_version_doi d4d:latest_version_doi Latest Version Doi semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended DOI or URL identifying the latest version of this dataset (e.g., '10.5281/zenodo.1234567' for a DOI ... +Dataset.license d4d:license License skos:exactMatch @graph[?@type='Dataset']['license'] schema:license license semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped "The legal license under which the resource is made available (e.g., ""MIT"", ""CC-BY-4.0"")." 
+Dataset.license_and_use_terms d4d:license_and_use_terms License And Use Terms skos:exactMatch @graph[?@type='Dataset']['d4d:license_and_use_terms'] d4d:license_and_use_terms license_and_use_terms semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d License and usage terms governing dataset access and use. LicenseAndUseTerms object from the Data Go... +Dataset.license_terms d4d:license_terms License Terms semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of the dataset's license and terms of use, including links, costs, or usage constraints ... +Dataset.limitation_description d4d:limitation_description Limitation Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Detailed description of the limitation and its implications. +" +Dataset.limitation_type d4d:limitation_type Limitation Type skos:closeMatch @graph[?@type='Dataset']['temporalCoverage'] schema:temporalCoverage temporalCoverage semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Category of limitation (e.g., scope, coverage, temporal, methodological). 
+" +Dataset.machine_annotation_tools d4d:machine_annotation_tools Machine Annotation Tools skos:closeMatch @graph[?@type='Dataset']['rai:machineAnnotationTools'] rai:machineAnnotationTools machineAnnotationTools semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Automated annotation tools used in dataset creation. +Dataset.maintainer_details d4d:maintainer_details Maintainer Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the organization, team, or individual responsible for maintaining the datas... +Dataset.maintainers d4d:maintainers Maintainers skos:relatedMatch @graph[?@type='Dataset']['maintainer'] schema:maintainer maintainer semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Individuals or organizations responsible for maintaining the dataset. List of Maintainer objects fro... +Dataset.maximum_value d4d:maximum_value Maximum Value semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The maximum value that the variable can take. Applicable to numeric variables. 
+Dataset.md5 d4d:md5 Md5 skos:exactMatch @graph[?@type='Dataset']['evi:md5'] evi:md5 md5 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped MD5 hash value of the data (128-bit cryptographic hash). +Dataset.measurement_technique d4d:measurement_technique Measurement Technique semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "The technique or method used to measure this variable. Examples: ""mass spectrometry"", ""self-report s..." +Dataset.mechanism_details d4d:mechanism_details Mechanism Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the specific mechanisms or procedures used to collect the data (e.g., hardw... +Dataset.media_type d4d:media_type Media Type skos:closeMatch @graph[?@type='Dataset']['encodingFormat'] schema:encodingFormat encodingFormat semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The media type of the data. This should be a MIME type. +Dataset.method d4d:method Method semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Method used for de-identification (e.g., HIPAA Safe Harbor). 
+Dataset.minimum_value d4d:minimum_value Minimum Value semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The minimum value that the variable can take. Applicable to numeric variables. +Dataset.missing d4d:missing Missing semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Description of the missing data fields or elements. +" +Dataset.missing_data_causes d4d:missing_data_causes Missing Data Causes semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Known or suspected causes of missing data (e.g., sensor failures, participant dropout, privacy const... +Dataset.missing_data_documentation d4d:missing_data_documentation Missing Data Documentation semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Documentation of missing data patterns and handling strategies. +Dataset.missing_data_patterns d4d:missing_data_patterns Missing Data Patterns semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Description of patterns in missing data (e.g., missing completely at random, missing at random, miss... 
+Dataset.missing_information d4d:missing_information Missing Information semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "References to one or more MissingInfo objects describing missing data. +" +Dataset.missing_value_code d4d:missing_value_code Missing Value Code skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "Code(s) used to represent missing values for this variable. Examples: ""NA"", ""-999"", ""null"", """". Mult..." +Dataset.mitigation_strategy d4d:mitigation_strategy Mitigation Strategy semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Steps taken or recommended to mitigate this bias. +" +Dataset.modified_by d4d:modified_by Modified By skos:closeMatch @graph[?@type='Dataset']['contributor'] schema:contributor contributor semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped A person or organization that contributed to modifying or updating the resource. +Dataset.name d4d:name Name semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A human-readable name for a thing. 
+Dataset.notification_details d4d:notification_details Notification Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how individuals were notified about data collection, including the notifica... +Dataset.orcid d4d:orcid Orcid semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped ORCID (Open Researcher and Contributor ID) - a persistent digital identifier for researchers. Format... +Dataset.other_compliance d4d:other_compliance Other Compliance semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Other regulatory compliance frameworks applicable to this dataset (e.g., CCPA, PIPEDA, industry-spec... +Dataset.other_tasks d4d:other_tasks Other Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Additional tasks the dataset may support beyond its original intent. List of OtherTask objects from ... 
+Dataset.page d4d:page Page skos:exactMatch @graph[?@type='Dataset']['url'] schema:url url semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped A landing page or web page providing access to or information about the resource. +Dataset.parent_datasets d4d:parent_datasets Parent Datasets skos:exactMatch @graph[?@type='Dataset']['isPartOf'] schema:isPartOf isPartOf semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Parent datasets that this dataset is part of or derived from. Enables hierarchical dataset compositi... +Dataset.participant_compensation d4d:participant_compensation Participant Compensation skos:exactMatch @graph[?@type='Dataset']['d4d:participant_compensation'] d4d:participant_compensation participant_compensation semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Information about compensation or incentives provided to human research participants. +Dataset.participant_privacy d4d:participant_privacy Participant Privacy skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about privacy protections and anonymization procedures for human research participants. 
+Dataset.path d4d:path Path skos:narrowMatch @graph[?@type='Dataset']['contentUrl'] schema:contentUrl contentUrl semapv:ManualMappingCuration 0.8 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The file path or URL where the content is located. +Dataset.precision d4d:precision Precision skos:closeMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The precision or number of decimal places for numeric variables. +Dataset.preprocessing_details d4d:preprocessing_details Preprocessing Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of preprocessing steps applied to the data, including tools used, parameters, ... +Dataset.preprocessing_strategies d4d:preprocessing_strategies Preprocessing Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:preprocessing_strategies'] d4d:preprocessing_strategies preprocessing_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Preprocessing steps applied to the raw data. List of PreprocessingStrategy objects from the Preproce... 
+Dataset.principal_investigator d4d:principal_investigator Principal Investigator semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped A key individual (Principal Investigator) responsible for or overseeing dataset creation. +Dataset.privacy_techniques d4d:privacy_techniques Privacy Techniques semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What privacy-preserving techniques were applied (e.g., differential privacy, k-anonymity, data maski... +Dataset.prohibited_uses d4d:prohibited_uses Prohibited Uses skos:exactMatch @graph[?@type='Dataset']['d4d:prohibited_uses'] d4d:prohibited_uses prohibited_uses semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Explicitly prohibited or forbidden uses for this dataset. Stronger than discouraged_uses - these are... +Dataset.prohibition_reason d4d:prohibition_reason Prohibition Reason skos:exactMatch @graph[?@type='Dataset']['d4d:prohibition_reason'] d4d:prohibition_reason prohibition_reason semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d One or more reasons why this use is prohibited (e.g., license restriction, ethical concern, privacy ... 
+Dataset.publisher d4d:publisher Publisher skos:exactMatch @graph[?@type='Dataset']['publisher'] schema:publisher publisher semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The organization or entity responsible for making the resource available. +Dataset.purposes d4d:purposes Purposes skos:closeMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Purposes for which the dataset was created. List of Purpose objects from the Motivation module, each... +Dataset.quality_notes d4d:quality_notes Quality Notes semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Notes about data quality, reliability, or known issues specific to this variable. +Dataset.quote_char d4d:quote_char Quote Char semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Character used for quoting fields (e.g., '""' for CSV)." +Dataset.raw_data_details d4d:raw_data_details Raw Data Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of raw data availability, access procedures, and any conditions or restriction... 
+Dataset.raw_data_format d4d:raw_data_format Raw Data Format semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more formats of the raw data before any preprocessing (e.g., CSV, DICOM, JSON). +" +Dataset.raw_data_sources d4d:raw_data_sources Raw Data Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped List of raw data sources before preprocessing. Each RawDataSource object describes where the origina... +Dataset.raw_sources d4d:raw_sources Raw Sources skos:exactMatch @graph[?@type='Dataset']['rai:dataCollectionRawData'] rai:dataCollectionRawData dataCollectionRawData semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Raw, unprocessed source data before any preprocessing was applied. List of RawData objects from the ... +Dataset.recommended_mitigation d4d:recommended_mitigation Recommended Mitigation semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "Recommended approaches for users to address this limitation. 
+" +Dataset.regulatory_compliance d4d:regulatory_compliance Regulatory Compliance semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What regulatory frameworks govern this human subjects research (e.g., 45 CFR 46, HIPAA)? +" +Dataset.regulatory_restrictions d4d:regulatory_restrictions Regulatory Restrictions skos:exactMatch @graph[?@type='Dataset']['d4d:regulatory_restrictions'] d4d:regulatory_restrictions regulatory_restrictions semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Regulatory and export control restrictions applicable to the dataset. ExportControlRegulatoryRestric... +Dataset.reidentification_risk d4d:reidentification_risk Reidentification Risk semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "What is the assessed risk of re-identification? What measures were taken to minimize this risk? +" +Dataset.related_datasets d4d:related_datasets Related Datasets skos:exactMatch @graph[?@type='Dataset']['isRelatedTo'] schema:isRelatedTo isRelatedTo semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Related datasets with typed relationships (e.g., supplements, derives from, is version of). Use Data... 
+Dataset.relationship_details d4d:relationship_details Relationship Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of how relationships between instances are represented (e.g., graph edges, rat... +Dataset.relationship_type d4d:relationship_type Relationship Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended The type of relationship (e.g., derives_from, supplements, is_version_of). Uses DatasetRelationshipT... +Dataset.relationships d4d:relationships Relationships semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Explicit relationships between individual instances in the dataset. List of Relationships objects fr... +Dataset.release_dates d4d:release_dates Release Dates semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "One or more dates or timeframes for dataset release, in ISO 8601 format (e.g., ""2024-03-15"") or as a..." 
+Dataset.repository_details d4d:repository_details Repository Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the repository of known dataset uses, including how it is maintained and ho... +Dataset.repository_url d4d:repository_url Repository Url semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended URL to a repository of known dataset uses. +Dataset.representative_verification d4d:representative_verification Representative Verification skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more explanations of how representativeness was validated or verified (e.g., statistical test... +Dataset.resources d4d:resources Resources skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Sub-resources or component items. In DatasetCollection, contains Dataset objects. In Dataset, contai... 
+Dataset.response d4d:response Response semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Short explanation describing the primary purpose of creating the dataset. +Dataset.restrictions d4d:restrictions Restrictions semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text One or more descriptions of restrictions or fees associated with accessing these external resources ... +Dataset.retention_details d4d:retention_details Retention Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of applicable retention limits, legal or ethical basis for those limits, and h... +Dataset.retention_limit d4d:retention_limit Retention Limit skos:exactMatch @graph[?@type='Dataset']['d4d:retention_limit'] d4d:retention_limit retention_limit semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Data retention policies and limits for the dataset. RetentionLimits object from the Maintenance modu... 
+Dataset.retention_period d4d:retention_period Retention Period skos:exactMatch @graph[?@type='Dataset']['d4d:retention_period'] d4d:retention_period retention_period semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Time period for data retention. +Dataset.review_details d4d:review_details Review Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the ethical review process, board decisions, outcomes, and any supporting d... +Dataset.reviewing_organization d4d:reviewing_organization Reviewing Organization skos:exactMatch @graph[?@type='Dataset']['d4d:reviewing_organization'] d4d:reviewing_organization reviewing_organization semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Organization that conducted the ethical review (e.g., Institutional Review Board, Ethics Committee, ... +Dataset.revocation_details d4d:revocation_details Revocation Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the mechanism provided for individuals to revoke consent (e.g., opt-out por... 
+Dataset.role d4d:role Role semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Role of the data collector (e.g., researcher, crowdworker). +Dataset.sampling_strategies d4d:sampling_strategies Sampling Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:sampling_strategies'] d4d:sampling_strategies sampling_strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d Strategies used to select data instances from a larger population. List of SamplingStrategy objects ... +Dataset.scope_impact d4d:scope_impact Scope Impact semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "How this limitation affects the scope or applicability of the dataset. +" +Dataset.sensitive_elements d4d:sensitive_elements Sensitive Elements skos:closeMatch @graph[?@type='Dataset']['rai:personalSensitiveInformation'] rai:personalSensitiveInformation personalSensitiveInformation semapv:ManualMappingCuration 0.9 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Sensitive data elements requiring special handling or access controls. List of SensitiveElement obje... 
+Dataset.sensitive_elements_present d4d:sensitive_elements_present Sensitive Elements Present semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Indicates whether sensitive data elements are present. +Dataset.sensitivity_details d4d:sensitivity_details Sensitivity Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "Details on sensitive data elements present and handling procedures. +" +Dataset.sha256 d4d:sha256 Sha256 skos:exactMatch @graph[?@type='Dataset']['evi:sha256'] evi:sha256 sha256 semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/EVI# d4d-rocrate-comprehensive-v1 1.0 mapped SHA-256 hash value of the data (256-bit cryptographic hash, recommended). +Dataset.source_data d4d:source_data Source Data semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text "One or more descriptions of the larger sets from which the sample was drawn, if applicable. +" +Dataset.source_description d4d:source_description Source Description semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Detailed description of where raw data comes from (e.g., sensors, databases, web APIs, manual collec... 
+Dataset.source_type d4d:source_type Source Type semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more types of raw source (e.g., sensor, database, user input, web scraping). +" +Dataset.special_populations d4d:special_populations Special Populations semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Does the research involve any special populations that require additional protections (e.g., minors,... +Dataset.special_protections d4d:special_protections Special Protections semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped What additional protections were implemented for at-risk populations? Include safeguards, modified p... +Dataset.split_details d4d:split_details Split Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the recommended data splits (e.g., 80/10/10 train/ validation/test), how th... +Dataset.splits d4d:splits Splits semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Recommended data splits for this dataset. List of Splits objects from the Composition module describ... 
+Dataset.start_date d4d:start_date Start Date skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended Start date of data collection. +Dataset.status d4d:status Status skos:exactMatch @graph[?@type='Dataset']['creativeWorkStatus'] schema:creativeWorkStatus creativeWorkStatus semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The status of the resource (e.g., draft, published, deprecated). +Dataset.strategies d4d:strategies Strategies skos:exactMatch @graph[?@type='Dataset']['d4d:strategies'] d4d:strategies strategies semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d One or more sampling strategies used (e.g., deterministic, simple random, stratified, cluster, syste... +Dataset.subpopulation_elements_present d4d:subpopulation_elements_present Subpopulation Elements Present semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended Indicates whether any subpopulations are explicitly identified. 
+Dataset.subpopulations d4d:subpopulations Subpopulations skos:relatedMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Subpopulations represented within the dataset. List of Subpopulation objects from the Composition mo... +Dataset.subsets d4d:subsets Subsets skos:relatedMatch @graph[?@type='Dataset']['hasPart'] schema:hasPart hasPart semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Subsets or splits of this dataset. List of DataSubset objects from the Composition module, each repr... +Dataset.target_dataset d4d:target_dataset Target Dataset skos:closeMatch @graph[?@type='Dataset']['identifier'] schema:identifier identifier semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object... +Dataset.task_details d4d:task_details Task Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of other potential tasks the dataset could support, including any prerequisite... 
+Dataset.tasks d4d:tasks Tasks skos:exactMatch @graph[?@type='Dataset']['rai:dataUseCases'] rai:dataUseCases dataUseCases semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Tasks the dataset is intended to support. List of Task objects from the Motivation module describing... +Dataset.third_party_sharing d4d:third_party_sharing Third Party Sharing semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Third-party distribution policies for the dataset. List of ThirdPartySharing objects from the Distri... +Dataset.timeframe_details d4d:timeframe_details Timeframe Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of the data collection period and whether this timeframe matches the creation ... +Dataset.title d4d:title Title skos:exactMatch @graph[?@type='Dataset']['name'] schema:name name semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped The official title of the element. 
+Dataset.tool_accuracy d4d:tool_accuracy Tool Accuracy skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more known accuracy or performance metrics for the automated tools (if available). Include me... +Dataset.tool_descriptions d4d:tool_descriptions Tool Descriptions semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Descriptions of what each tool does in the annotation process and what types of annotations it produ... +Dataset.tools d4d:tools Tools skos:closeMatch @graph[?@type='Dataset']['name'] schema:name name semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended "List of automated annotation tools with their versions. Format each entry as ""ToolName version"" (e.g..." +Dataset.total_bytes d4d:total_bytes Total Bytes semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Total size of all files in this collection, in bytes (integer). Maps to dcat:byteSize. 
+Dataset.total_file_count d4d:total_file_count Total File Count semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Total number of files across all file collections in this dataset. Can be aggregated from file_colle... +Dataset.total_size_bytes d4d:total_size_bytes Total Size Bytes semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped Total size of all files in bytes across all file collections. Can be aggregated from file_collection... +Dataset.unit d4d:unit Unit semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The unit of measurement for the variable, preferably using QUDT units (http://qudt.org/vocab/unit/).... +Dataset.update_details d4d:update_details Update Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of planned update types (e.g., corrections, additions, deletions), responsible... 
+Dataset.updates d4d:updates Updates skos:exactMatch @graph[?@type='Dataset']['rai:dataReleaseMaintenancePlan'] rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://mlcommons.org/croissant/RAI/ d4d-rocrate-comprehensive-v1 1.0 mapped Plans for future updates or versioning of the dataset. UpdatePlan object from the Maintenance module... +Dataset.url d4d:url Url semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text URL where the software can be found (e.g., homepage, repository, or documentation). +Dataset.usage_notes d4d:usage_notes Usage Notes semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text A note or caveat about using the dataset for its intended purposes. +Dataset.use_category d4d:use_category Use Category semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended One or more categories of intended use (e.g., research, clinical, educational, commercial, policy). 
+Dataset.use_repository d4d:use_repository Use Repository skos:relatedMatch @graph[?@type='Dataset']['relatedLink'] schema:relatedLink relatedLink semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Repositories or registries tracking how the dataset has been used. List of UseRepository objects fro...
+Dataset.used_software d4d:used_software Used Software semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended What software was used as part of this dataset property?
+Dataset.variable_name d4d:variable_name Variable Name semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped The name or identifier of the variable as it appears in the data files.
+Dataset.variables d4d:variables Variables skos:exactMatch @graph[?@type='Dataset']['variableMeasured'] schema:variableMeasured variableMeasured semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Metadata describing individual variables, fields, or columns in the dataset.
+Dataset.version d4d:version Version skos:exactMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped "The version identifier of the resource (e.g., ""1.0"", ""2.3.1"")."
+Dataset.version_access d4d:version_access Version Access skos:relatedMatch @graph[?@type='Dataset']['version'] schema:version version semapv:ManualMappingCuration 0.7 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped Information about access to different versions of the dataset. VersionAccess object from the Mainten...
+Dataset.version_details d4d:version_details Version Details semapv:UnmappableProperty semapv:FreeTextProperty 0.0 Free text/narrative field - no URI needed https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 free_text Free-text description of version support policies, how long older versions will be hosted, and how d...
+Dataset.versions_available d4d:versions_available Versions Available semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended List of available versions with metadata.
+Dataset.warnings d4d:warnings Warnings skos:exactMatch @graph[?@type='Dataset']['d4d:warnings'] d4d:warnings warnings semapv:ManualMappingCuration 1.0 Novel D4D concept - uses D4D namespace https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 novel_d4d One or more specific content warnings describing potentially offensive, insulting, threatening, or a...
+Dataset.was_derived_from d4d:was_derived_from Was Derived From skos:exactMatch @graph[?@type='Dataset']['isBasedOn'] schema:isBasedOn isBasedOn semapv:ManualMappingCuration 1.0 Mapped via SKOS alignment https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 mapped A resource from which this resource was derived, in whole or in part.
+Dataset.was_directly_observed d4d:was_directly_observed Was Directly Observed semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended True if the data was directly observed by a researcher or instrument; false if it was obtained throu...
+Dataset.was_inferred_derived d4d:was_inferred_derived Was Inferred Derived skos:closeMatch @graph[?@type='Dataset']['wasDerivedFrom'] prov:wasDerivedFrom wasDerivedFrom semapv:SuggestedMapping 0.5 Recommended mapping (confidence: medium) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ http://www.w3.org/ns/prov# d4d-rocrate-comprehensive-v1 1.0 recommended True if the data was computationally inferred or derived from other data (e.g., model outputs, imput...
+Dataset.was_reported_by_subjects d4d:was_reported_by_subjects Was Reported By Subjects semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended True if the data was self-reported directly by the subjects themselves (e.g., survey responses, ques...
+Dataset.was_validated_verified d4d:was_validated_verified Was Validated Verified skos:closeMatch @graph[?@type='Dataset']['date'] schema:date date semapv:SuggestedMapping 0.7 Recommended mapping (confidence: high) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ https://schema.org/ d4d-rocrate-comprehensive-v1 1.0 recommended True if the data underwent a validation or verification process (e.g., expert review, cross-checking...
+Dataset.why_missing d4d:why_missing Why Missing semapv:UnmappedProperty semapv:RequiresResearch 0.0 Unmapped - needs vocabulary research https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 unmapped "Explanation of why each piece of data is missing.
+"
+Dataset.why_not_representative d4d:why_not_representative Why Not Representative semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended "One or more explanations of why the sample is not representative of the larger set, if applicable.
+"
+Dataset.withdrawal_mechanism d4d:withdrawal_mechanism Withdrawal Mechanism semapv:UnmappedProperty semapv:SuggestedMapping 0.5 Recommended mapping (confidence: low) https://orcid.org/0000-0000-0000-0000 2026-04-09 https://w3id.org/bridge2ai/data-sheets-schema/ d4d-rocrate-comprehensive-v1 1.0 recommended How can participants withdraw their consent? What procedures are in place for data deletion upon wit...
diff --git a/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_comprehensive.tsv b/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_comprehensive.tsv
index bde31d5d..353dd4b0 100644
--- a/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_comprehensive.tsv
+++ b/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_comprehensive.tsv
@@ -1,286 +1,300 @@
 # Comprehensive URI-level SSSOM - ALL D4D Attributes
 # Shows current and recommended slot_uri for every attribute
-# Date: 2026-03-25T22:41:09.320023
-# Total attributes: 270
+# Date: 2026-04-09T10:17:21.922299
+# Total attributes: 284
 #
 # Status breakdown:
-# free_text: 54
-# mapped: 68
-# novel_d4d: 39
+# free_text: 55
+# mapped: 63
+# novel_d4d: 45
 # recommended: 69
-# unmapped: 40
+# unmapped: 52
 #
-# Current slot_uri coverage: 31/270 (11.5%)
-# Attributes needing slot_uri: 108/270 (40.0%)
+# Current slot_uri coverage: 31/284 (10.9%)
+# Attributes needing slot_uri: 114/284 (40.1%)
 #
 d4d_slot_name d4d_slot_uri_current subject_source predicate_id d4d_slot_uri_recommended object_id object_label object_source confidence mapping_justification comment mapping_status needs_slot_uri vocab_crosswalk author_id mapping_date mapping_set_id mapping_set_version
-access_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0
-access_url skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0
-access_urls semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0
-acquisition_details semapv:UnmappableProperty 0.0
semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -acquisition_methods skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -addressing_gaps skos:exactMatch d4d:addressing_gaps d4d:addressing_gaps addressing_gaps https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -affected_subsets semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -affiliation semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -affiliations semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -agreement_metric semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -analysis_method semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -annotation_analyses skos:exactMatch d4d:annotation_analyses d4d:annotation_analyses annotation_analyses 
https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -annotation_quality_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -annotations_per_item semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -annotator_demographics semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -anomalies skos:exactMatch d4d:dataAnomalies d4d:dataAnomalies dataAnomalies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -anomaly_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -anonymization_method semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -archival semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -assent_procedures semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A 
https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -at_risk_groups_included semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -at_risk_populations skos:exactMatch d4d:atRiskPopulations d4d:atRiskPopulations atRiskPopulations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -bias_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -bias_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -bytes dcat:byteSize https://www.w3.org/ns/dcat# skos:exactMatch schema:contentSize schema:contentSize contentSize https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -categories semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -citation skos:exactMatch schema:citation schema:citation citation https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -cleaning_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A 
https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -cleaning_strategies skos:exactMatch d4d:cleaning_strategies d4d:cleaning_strategies cleaning_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -collection_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -collection_mechanisms skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -collection_timeframes skos:exactMatch rai:dataCollectionTimeframe rai:dataCollectionTimeframe dataCollectionTimeframe http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -collector_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -comment_prefix semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -compensation_amount skos:exactMatch d4d:compensation_amount d4d:compensation_amount compensation_amount https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A 
https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -compensation_provided skos:exactMatch d4d:compensation_provided d4d:compensation_provided compensation_provided https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -compensation_rationale skos:exactMatch d4d:compensation_rationale d4d:compensation_rationale compensation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -compensation_type skos:exactMatch d4d:compensation_type d4d:compensation_type compensation_type https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -compression dcat:compressFormat https://www.w3.org/ns/dcat# skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -confidential_elements skos:exactMatch d4d:confidential_elements d4d:confidential_elements confidential_elements https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -confidential_elements_present skos:exactMatch d4d:confidential_elements_present d4d:confidential_elements_present confidential_elements_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - 
should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -confidentiality_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -confidentiality_level skos:exactMatch d4d:confidentiality_level d4d:confidentiality_level confidentiality_level https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -conforms_to dcterms:conformsTo http://purl.org/dc/terms/ skos:exactMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -conforms_to_class dcterms:conformsTo http://purl.org/dc/terms/ skos:narrowMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -conforms_to_schema dcterms:conformsTo http://purl.org/dc/terms/ skos:narrowMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -consent_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -consent_documentation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no 
slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -consent_obtained semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -consent_scope semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -consent_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -contact_person skos:exactMatch d4d:contact_person d4d:contact_person contact_person https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -content_warnings skos:exactMatch d4d:content_warnings d4d:content_warnings content_warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -content_warnings_present skos:exactMatch d4d:content_warnings_present d4d:content_warnings_present content_warnings_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -contribution_url semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -counts 
semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -created_by dcterms:creator http://purl.org/dc/terms/ skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -created_on dcterms:created http://purl.org/dc/terms/ skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -creators skos:closeMatch schema:author schema:author author https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -credit_roles skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_annotation_platform semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_annotation_protocol skos:exactMatch d4d:data_annotation_protocol d4d:data_annotation_protocol data_annotation_protocol https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_collectors skos:relatedMatch schema:contributor schema:contributor 
contributor https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_linkage semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_protection_impacts skos:exactMatch d4d:data_protection_impacts d4d:data_protection_impacts data_protection_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_substrate semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_topic semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_type semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -data_use_permission semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -deidentification_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -delimiter semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A 
https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -derivation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -dialect schema:encodingFormat https://schema.org/ skos:closeMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -disagreement_patterns semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -discouraged_uses skos:exactMatch rai:prohibitedUses rai:prohibitedUses prohibitedUses http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -discouragement_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -distribution semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -distribution_dates skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no 
N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -distribution_formats skos:exactMatch evi:formats evi:formats formats https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -doi dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch schema:identifier schema:identifier identifier https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -double_quote semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -download_url dcat:downloadURL https://www.w3.org/ns/dcat# skos:exactMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -email semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -encoding dcat:mediaType https://www.w3.org/ns/dcat# skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -end_date skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -errata skos:exactMatch d4d:errata d4d:errata 
errata https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -erratum_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -erratum_url skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -ethical_reviews skos:exactMatch d4d:ethical_reviews d4d:ethical_reviews ethical_reviews https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -ethics_review_board semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -examples semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -existing_uses skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -extension_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 
-extension_mechanism skos:closeMatch schema:license schema:license license https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -external_resources dcterms:references http://purl.org/dc/terms/ skos:closeMatch schema:relatedLink schema:relatedLink relatedLink https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -format dcterms:format http://purl.org/dc/terms/ skos:exactMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -frequency semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -funders skos:exactMatch schema:funder schema:funder funder https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -future_guarantees semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -future_use_impacts skos:exactMatch d4d:future_use_impacts d4d:future_use_impacts future_use_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -governance_committee_contact skos:exactMatch 
d4d:governance_committee_contact d4d:governance_committee_contact governance_committee_contact https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -grant_number semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -grantor semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -grants semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -guardian_consent semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -handling_strategy skos:exactMatch d4d:handling_strategy d4d:handling_strategy handling_strategy https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -hash dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -header semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 
-hipaa_compliant semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -human_subject_research skos:exactMatch d4d:humanSubject d4d:humanSubject humanSubject https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -id skos:exactMatch rdf:ID rdf:ID ID unknown 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -identifiable_elements_present semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -identification semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -identifiers_removed skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -impact_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -imputation_method skos:exactMatch d4d:imputation_method d4d:imputation_method imputation_method https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 
d4d-rocrate-uri-comprehensive-v1 1.0 -imputation_protocols skos:exactMatch d4d:imputation_protocols d4d:imputation_protocols imputation_protocols https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -imputation_rationale skos:exactMatch d4d:imputation_rationale d4d:imputation_rationale imputation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -imputation_validation skos:exactMatch d4d:imputation_validation d4d:imputation_validation imputation_validation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -imputed_fields skos:exactMatch d4d:imputed_fields d4d:imputed_fields imputed_fields https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -informed_consent semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -instance_type semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -instances skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate 
vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -intended_uses skos:exactMatch d4d:intended_uses d4d:intended_uses intended_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -inter_annotator_agreement semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -inter_annotator_agreement_score semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -involves_human_subjects semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -ip_restrictions skos:closeMatch schema:conditionsOfAccess schema:conditionsOfAccess conditionsOfAccess https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -irb_approval semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_data_split semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_deidentified skos:exactMatch d4d:is_deidentified d4d:is_deidentified is_deidentified https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration 
Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_direct semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_identifier semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_random semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_representative semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_sample semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_sensitive semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_shared semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_subpopulation semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -is_tabular skos:narrowMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.8 semapv:ManualMappingCuration 
Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -issued dcterms:issued http://purl.org/dc/terms/ skos:exactMatch schema:datePublished schema:datePublished datePublished https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -keywords dcat:keyword https://www.w3.org/ns/dcat# skos:exactMatch schema:keywords schema:keywords keywords https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -known_biases skos:exactMatch d4d:known_biases d4d:known_biases known_biases https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -known_limitations skos:exactMatch d4d:known_limitations d4d:known_limitations known_limitations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -label semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -label_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -labeling_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 
2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -labeling_strategies skos:exactMatch d4d:labeling_strategies d4d:labeling_strategies labeling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -language dcterms:language http://purl.org/dc/terms/ skos:exactMatch schema:inLanguage schema:inLanguage inLanguage https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -last_updated_on dcterms:modified http://purl.org/dc/terms/ skos:exactMatch schema:dateModified schema:dateModified dateModified https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -latest_version_doi semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -license dcterms:license http://purl.org/dc/terms/ skos:exactMatch schema:license schema:license license https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -license_and_use_terms skos:closeMatch schema:license schema:license license https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -license_terms semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 
d4d-rocrate-uri-comprehensive-v1 1.0 -limitation_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -limitation_type skos:closeMatch schema:temporalCoverage schema:temporalCoverage temporalCoverage https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -machine_annotation_tools skos:closeMatch rai:machineAnnotationTools rai:machineAnnotationTools machineAnnotationTools http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -maintainer_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -maintainers skos:relatedMatch schema:maintainer schema:maintainer maintainer https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -maximum_value semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -md5 dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -measurement_technique semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research 
for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -mechanism_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -media_type dcat:mediaType https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -method semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -minimum_value semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -missing semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -missing_data_causes semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -missing_data_documentation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -missing_data_patterns semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -missing_information 
semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -missing_value_code skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -mitigation_strategy semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -modified_by dcterms:contributor http://purl.org/dc/terms/ skos:closeMatch schema:contributor schema:contributor contributor https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -name semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -notification_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -orcid semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -other_compliance semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -other_tasks skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases 
http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -page dcat:landingPage https://www.w3.org/ns/dcat# skos:exactMatch schema:url schema:url url https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -parent_datasets skos:exactMatch schema:isPartOf schema:isPartOf isPartOf https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -participant_compensation skos:exactMatch d4d:participant_compensation d4d:participant_compensation participant_compensation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -participant_privacy skos:closeMatch rai:personalSensitiveInformation rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -path schema:contentUrl https://schema.org/ skos:narrowMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -precision skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A 
https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -preprocessing_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -preprocessing_strategies skos:exactMatch d4d:preprocessing_strategies d4d:preprocessing_strategies preprocessing_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -principal_investigator semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -privacy_techniques semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -prohibited_uses skos:exactMatch d4d:prohibited_uses d4d:prohibited_uses prohibited_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -prohibition_reason skos:exactMatch d4d:prohibition_reason d4d:prohibition_reason prohibition_reason https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -publisher dcterms:publisher http://purl.org/dc/terms/ skos:exactMatch schema:publisher schema:publisher publisher https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true 
https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -purposes skos:closeMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -quality_notes semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -quote_char semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -raw_data_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -raw_data_format semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -raw_data_sources semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -raw_sources skos:exactMatch rai:dataCollectionRawData rai:dataCollectionRawData dataCollectionRawData http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -recommended_mitigation semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 
-regulatory_compliance semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -regulatory_restrictions skos:closeMatch schema:conditionsOfAccess schema:conditionsOfAccess conditionsOfAccess https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -reidentification_risk semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -related_datasets skos:exactMatch schema:isRelatedTo schema:isRelatedTo isRelatedTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -relationship_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -relationship_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -release_dates semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -repository_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -repository_url semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri 
(confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -representative_verification skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -resources schema:hasPart https://schema.org/ skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -response semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -restrictions semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -retention_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -retention_limit skos:exactMatch d4d:retention_limit d4d:retention_limit retention_limit https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -retention_period skos:exactMatch d4d:retention_period d4d:retention_period retention_period https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 
-review_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -reviewing_organization skos:exactMatch d4d:reviewing_organization d4d:reviewing_organization reviewing_organization https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -revocation_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -role semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -sampling_strategies skos:exactMatch d4d:sampling_strategies d4d:sampling_strategies sampling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -scope_impact semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -sensitive_elements skos:closeMatch rai:personalSensitiveInformation rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -sensitive_elements_present semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research 
for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -sensitivity_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -sha256 dcterms:identifier http://purl.org/dc/terms/ skos:exactMatch evi:sha256 evi:sha256 sha256 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -source_data semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -source_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -source_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -special_populations semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -special_protections semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -split_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -start_date skos:closeMatch schema:date 
schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -status dcterms:type http://purl.org/dc/terms/ skos:exactMatch schema:creativeWorkStatus schema:creativeWorkStatus creativeWorkStatus https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -strategies semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -subpopulation_elements_present semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -subpopulations skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -subsets skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -target_dataset skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -task_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 
d4d-rocrate-uri-comprehensive-v1 1.0 -tasks skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -timeframe_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -title dcterms:title http://purl.org/dc/terms/ skos:exactMatch schema:name schema:name name https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -tool_accuracy skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -tool_descriptions semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -tools skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -unit semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -update_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -updates 
skos:exactMatch rai:dataReleaseMaintenancePlan rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -url semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -usage_notes semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -use_category semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -use_repository skos:relatedMatch schema:relatedLink schema:relatedLink relatedLink https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -used_software semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -variable_name semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -variables skos:exactMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -version dcterms:hasVersion 
http://purl.org/dc/terms/ skos:exactMatch schema:version schema:version version https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -version_access skos:relatedMatch schema:version schema:version version https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -version_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -versions_available semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -warnings skos:exactMatch d4d:warnings d4d:warnings warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -was_derived_from prov:wasDerivedFrom http://www.w3.org/ns/prov# skos:exactMatch schema:isBasedOn schema:isBasedOn isBasedOn https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -was_directly_observed semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -was_inferred_derived skos:closeMatch prov:wasDerivedFrom prov:wasDerivedFrom wasDerivedFrom http://www.w3.org/ns/prov# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: 
medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -was_reported_by_subjects semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -was_validated_verified skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -why_missing semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -why_not_representative semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 -withdrawal_mechanism semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-comprehensive-v1 1.0 +access_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +access_url skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +access_urls semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +acquisition_details 
semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +acquisition_methods skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +addressing_gaps skos:exactMatch d4d:addressing_gaps d4d:addressing_gaps addressing_gaps https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +affected_subsets semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +affiliation semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +affiliations semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +agreement_metric semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +analysis_method semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotation_analyses skos:exactMatch d4d:annotation_analyses d4d:annotation_analyses 
annotation_analyses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotation_quality_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotations_per_item semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +annotator_demographics semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +anomalies skos:exactMatch d4d:dataAnomalies d4d:dataAnomalies dataAnomalies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +anomaly_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +anonymization_method semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +archival semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +assent_procedures semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended 
yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +at_risk_groups_included semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +at_risk_populations skos:exactMatch d4d:atRiskPopulations d4d:atRiskPopulations atRiskPopulations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +bias_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +bias_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +bytes dcat:byteSize https://www.w3.org/ns/dcat# skos:exactMatch schema:contentSize schema:contentSize contentSize https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +categories semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +citation skos:exactMatch schema:citation schema:citation citation https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +cleaning_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A 
https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +cleaning_strategies skos:exactMatch d4d:cleaning_strategies d4d:cleaning_strategies cleaning_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_consents semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_mechanisms skos:exactMatch rai:dataCollection rai:dataCollection dataCollection http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_notifications semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_timeframes skos:exactMatch d4d:collection_timeframes d4d:collection_timeframes collection_timeframes https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collection_type semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +collector_details 
semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +comment_prefix semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_amount skos:exactMatch d4d:compensation_amount d4d:compensation_amount compensation_amount https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_provided skos:exactMatch d4d:compensation_provided d4d:compensation_provided compensation_provided https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_rationale skos:exactMatch d4d:compensation_rationale d4d:compensation_rationale compensation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compensation_type skos:exactMatch d4d:compensation_type d4d:compensation_type compensation_type https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +compression dcat:compressFormat https://www.w3.org/ns/dcat# skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true 
https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidential_elements skos:exactMatch d4d:confidential_elements d4d:confidential_elements confidential_elements https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidential_elements_present skos:exactMatch d4d:confidential_elements_present d4d:confidential_elements_present confidential_elements_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidentiality_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +confidentiality_level skos:exactMatch d4d:confidentiality_level d4d:confidentiality_level confidentiality_level https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +conforms_to dcterms:conformsTo http://purl.org/dc/terms/ skos:exactMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +conforms_to_class d4d:conformsToClass https://w3id.org/bridge2ai/data-sheets-schema/ skos:narrowMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 
2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +conforms_to_schema d4d:conformsToSchema https://w3id.org/bridge2ai/data-sheets-schema/ skos:narrowMatch schema:conformsTo schema:conformsTo conformsTo https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_documentation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_obtained semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_revocations semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_scope semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +consent_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +contact_person skos:exactMatch d4d:contact_person d4d:contact_person contact_person https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 
d4d-rocrate-uri-comprehensive-v1 1.0 +content_warnings skos:exactMatch d4d:content_warnings d4d:content_warnings content_warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +content_warnings_present skos:exactMatch d4d:content_warnings_present d4d:content_warnings_present content_warnings_present https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +contribution_url semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +counts semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +created_by dcterms:creator http://purl.org/dc/terms/ skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +created_on dcterms:created http://purl.org/dc/terms/ skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +creators skos:closeMatch schema:author schema:author author https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 
d4d-rocrate-uri-comprehensive-v1 1.0 +credit_roles skos:closeMatch schema:creator schema:creator creator https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_annotation_platform semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_annotation_protocol skos:exactMatch d4d:data_annotation_protocol d4d:data_annotation_protocol data_annotation_protocol https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_collectors skos:relatedMatch schema:contributor schema:contributor contributor https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_linkage semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_protection_impacts skos:exactMatch d4d:data_protection_impacts d4d:data_protection_impacts data_protection_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_substrate semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +data_topic semapv:UnmappedProperty 0.0 
semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+data_type semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+data_use_permission semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+deidentification_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+delimiter semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+derivation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+dialect schema:encodingFormat https://schema.org/ skos:closeMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+direct_collection semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+disagreement_patterns semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+discouraged_uses skos:exactMatch d4d:discouraged_uses d4d:discouraged_uses discouraged_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+discouragement_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+distribution semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+distribution_dates skos:exactMatch schema:dateCreated schema:dateCreated dateCreated https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+distribution_formats skos:exactMatch evi:formats evi:formats formats https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+doi d4d:doiIdentifier https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch schema:identifier schema:identifier identifier https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+double_quote semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+download_url dcat:downloadURL https://www.w3.org/ns/dcat# skos:exactMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+email semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+encoding d4d:characterEncoding https://w3id.org/bridge2ai/data-sheets-schema/ skos:closeMatch evi:formats evi:formats formats https://w3id.org/EVI# 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+end_date skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+errata skos:exactMatch d4d:errata d4d:errata errata https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+erratum_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+erratum_url skos:closeMatch dcat:accessURL dcat:accessURL accessURL https://www.w3.org/ns/dcat# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+ethical_reviews skos:exactMatch d4d:ethical_reviews d4d:ethical_reviews ethical_reviews https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+ethics_review_board semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+examples semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+existing_uses skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+extension_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+extension_mechanism skos:closeMatch schema:license schema:license license https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+external_resources dcterms:references http://purl.org/dc/terms/ semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+file_collections semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+file_count semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+file_type semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+format dcterms:format http://purl.org/dc/terms/ skos:exactMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+frequency semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+funders skos:exactMatch schema:funder schema:funder funder https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+future_guarantees semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+future_use_impacts skos:exactMatch d4d:future_use_impacts d4d:future_use_impacts future_use_impacts https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+governance_committee_contact skos:exactMatch d4d:governance_committee_contact d4d:governance_committee_contact governance_committee_contact https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+grant_number semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+grantor semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+grants semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+guardian_consent semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+handling_strategy skos:exactMatch d4d:handling_strategy d4d:handling_strategy handling_strategy https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+hash d4d:hashValue https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+header semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+hipaa_compliant semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+human_subject_research skos:exactMatch d4d:humanSubject d4d:humanSubject humanSubject https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+id skos:exactMatch rdf:ID rdf:ID ID unknown 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+identifiable_elements_present semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+identification semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+identifiers_removed skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+impact_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_method skos:exactMatch d4d:imputation_method d4d:imputation_method imputation_method https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_protocols skos:exactMatch d4d:imputation_protocols d4d:imputation_protocols imputation_protocols https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_rationale skos:exactMatch d4d:imputation_rationale d4d:imputation_rationale imputation_rationale https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputation_validation skos:exactMatch d4d:imputation_validation d4d:imputation_validation imputation_validation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+imputed_fields skos:exactMatch d4d:imputed_fields d4d:imputed_fields imputed_fields https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+informed_consent semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+instance_type semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+instances skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+intended_uses skos:exactMatch d4d:intended_uses d4d:intended_uses intended_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+inter_annotator_agreement semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+inter_annotator_agreement_score semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+involves_human_subjects semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+ip_restrictions skos:exactMatch d4d:ip_restrictions d4d:ip_restrictions ip_restrictions https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+irb_approval semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_data_split semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_deidentified skos:exactMatch d4d:is_deidentified d4d:is_deidentified is_deidentified https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_direct semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_identifier semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_random semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_representative semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_sample semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_sensitive semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_shared semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_subpopulation semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+is_tabular skos:narrowMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+issued dcterms:issued http://purl.org/dc/terms/ skos:exactMatch schema:datePublished schema:datePublished datePublished https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+keywords dcat:keyword https://www.w3.org/ns/dcat# skos:exactMatch schema:keywords schema:keywords keywords https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+known_biases skos:exactMatch d4d:known_biases d4d:known_biases known_biases https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+known_limitations skos:exactMatch d4d:known_limitations d4d:known_limitations known_limitations https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+label semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+label_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+labeling_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+labeling_strategies skos:exactMatch d4d:labeling_strategies d4d:labeling_strategies labeling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+language dcterms:language http://purl.org/dc/terms/ skos:exactMatch schema:inLanguage schema:inLanguage inLanguage https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+last_updated_on dcterms:modified http://purl.org/dc/terms/ skos:exactMatch schema:dateModified schema:dateModified dateModified https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+latest_version_doi semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+license dcterms:license http://purl.org/dc/terms/ skos:exactMatch schema:license schema:license license https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+license_and_use_terms skos:exactMatch d4d:license_and_use_terms d4d:license_and_use_terms license_and_use_terms https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+license_terms semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+limitation_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+limitation_type skos:closeMatch schema:temporalCoverage schema:temporalCoverage temporalCoverage https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+machine_annotation_tools skos:closeMatch rai:machineAnnotationTools rai:machineAnnotationTools machineAnnotationTools http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+maintainer_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+maintainers skos:relatedMatch schema:maintainer schema:maintainer maintainer https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+maximum_value semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+md5 d4d:md5Checksum https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch evi:md5 evi:md5 md5 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+measurement_technique semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+mechanism_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+media_type dcat:mediaType https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat schema:encodingFormat encodingFormat https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+method semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+minimum_value semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_data_causes semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_data_documentation semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_data_patterns semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_information semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+missing_value_code skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+mitigation_strategy semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+modified_by dcterms:contributor http://purl.org/dc/terms/ skos:closeMatch schema:contributor schema:contributor contributor https://schema.org/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+name semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+notification_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+orcid semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+other_compliance semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+other_tasks skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+page dcat:landingPage https://www.w3.org/ns/dcat# skos:exactMatch schema:url schema:url url https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+parent_datasets skos:exactMatch schema:isPartOf schema:isPartOf isPartOf https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+participant_compensation skos:exactMatch d4d:participant_compensation d4d:participant_compensation participant_compensation https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+participant_privacy skos:closeMatch rai:personalSensitiveInformation rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+path schema:contentUrl https://schema.org/ skos:narrowMatch schema:contentUrl schema:contentUrl contentUrl https://schema.org/ 0.8 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+precision skos:closeMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+preprocessing_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+preprocessing_strategies skos:exactMatch d4d:preprocessing_strategies d4d:preprocessing_strategies preprocessing_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+principal_investigator semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+privacy_techniques semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+prohibited_uses skos:exactMatch d4d:prohibited_uses d4d:prohibited_uses prohibited_uses https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+prohibition_reason skos:exactMatch d4d:prohibition_reason d4d:prohibition_reason prohibition_reason https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+publisher dcterms:publisher http://purl.org/dc/terms/ skos:exactMatch schema:publisher schema:publisher publisher https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+purposes skos:closeMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+quality_notes semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+quote_char semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_data_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_data_format semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_data_sources skos:exactMatch rai:dataCollectionRawData rai:dataCollectionRawData dataCollectionRawData http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+raw_sources skos:exactMatch rai:dataCollectionRawData rai:dataCollectionRawData dataCollectionRawData http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+recommended_mitigation semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+regulatory_compliance semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+regulatory_restrictions skos:exactMatch d4d:regulatory_restrictions d4d:regulatory_restrictions regulatory_restrictions https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+reidentification_risk semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+related_datasets skos:exactMatch schema:isRelatedTo schema:isRelatedTo isRelatedTo https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+relationship_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+relationship_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+relationships semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+release_dates semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+repository_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+repository_url semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+representative_verification skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+resources schema:hasPart https://schema.org/ skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+response semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+restrictions semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+retention_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+retention_limit skos:exactMatch d4d:retention_limit d4d:retention_limit retention_limit https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+retention_period skos:exactMatch d4d:retention_period d4d:retention_period retention_period https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+review_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+reviewing_organization skos:exactMatch d4d:reviewing_organization d4d:reviewing_organization reviewing_organization https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+revocation_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+role semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+sampling_strategies skos:exactMatch d4d:sampling_strategies d4d:sampling_strategies sampling_strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+scope_impact semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0
+sensitive_elements skos:closeMatch rai:personalSensitiveInformation
rai:personalSensitiveInformation personalSensitiveInformation http://mlcommons.org/croissant/RAI/ 0.9 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +sensitive_elements_present semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +sensitivity_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +sha256 schema:sha256 https://schema.org/ skos:exactMatch evi:sha256 evi:sha256 sha256 https://w3id.org/EVI# 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +source_data semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +source_description semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +source_type semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +special_populations semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +special_protections semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for 
slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +split_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +splits semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +start_date skos:closeMatch schema:date schema:date date https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +status d4d:publicationStatus https://w3id.org/bridge2ai/data-sheets-schema/ skos:exactMatch schema:creativeWorkStatus schema:creativeWorkStatus creativeWorkStatus https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +strategies skos:exactMatch d4d:strategies d4d:strategies strategies https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +subpopulation_elements_present semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +subpopulations skos:relatedMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +subsets 
skos:relatedMatch schema:hasPart schema:hasPart hasPart https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +target_dataset skos:closeMatch schema:identifier schema:identifier identifier https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +task_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +tasks skos:exactMatch rai:dataUseCases rai:dataUseCases dataUseCases http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +third_party_sharing semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +timeframe_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +title dcterms:title http://purl.org/dc/terms/ skos:exactMatch schema:name schema:name name https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +tool_accuracy skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 
d4d-rocrate-uri-comprehensive-v1 1.0 +tool_descriptions semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +tools skos:closeMatch schema:name schema:name name https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +total_bytes semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +total_file_count semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +total_size_bytes semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +unit semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +update_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +updates skos:exactMatch rai:dataReleaseMaintenancePlan rai:dataReleaseMaintenancePlan dataReleaseMaintenancePlan http://mlcommons.org/croissant/RAI/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +url semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free 
text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +usage_notes semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +use_category semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +use_repository skos:relatedMatch schema:relatedLink schema:relatedLink relatedLink https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +used_software semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +variable_name semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +variables skos:exactMatch schema:variableMeasured schema:variableMeasured variableMeasured https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +version schema:version https://schema.org/ skos:exactMatch schema:version schema:version version https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no false https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +version_access skos:relatedMatch schema:version schema:version version https://schema.org/ 0.7 semapv:ManualMappingCuration Has SKOS 
alignment to RO-Crate vocabulary mapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +version_details semapv:UnmappableProperty 0.0 semapv:FreeTextProperty Free text/narrative field - no slot_uri needed free_text no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +versions_available semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +warnings skos:exactMatch d4d:warnings d4d:warnings warnings https://w3id.org/bridge2ai/data-sheets-schema/ 1.0 semapv:ManualMappingCuration Novel D4D concept - should use d4d: namespace novel_d4d yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_derived_from prov:wasDerivedFrom http://www.w3.org/ns/prov# skos:exactMatch schema:isBasedOn schema:isBasedOn isBasedOn https://schema.org/ 1.0 semapv:ManualMappingCuration Has SKOS alignment to RO-Crate vocabulary mapped no true https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_directly_observed semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_inferred_derived skos:closeMatch prov:wasDerivedFrom prov:wasDerivedFrom wasDerivedFrom http://www.w3.org/ns/prov# 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: medium) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_reported_by_subjects semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +was_validated_verified skos:closeMatch schema:date schema:date date 
https://schema.org/ 0.7 semapv:SuggestedMapping Recommended slot_uri (confidence: high) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +why_missing semapv:UnmappedProperty 0.0 semapv:RequiresResearch Unmapped - needs vocabulary research for slot_uri unmapped no N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +why_not_representative semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 +withdrawal_mechanism semapv:UnmappedProperty 0.5 semapv:SuggestedMapping Recommended slot_uri (confidence: low) recommended yes N/A https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-comprehensive-v1 1.0 diff --git a/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_mapping.tsv b/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_mapping.tsv index fabebfd0..3596eaf1 100644 --- a/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_mapping.tsv +++ b/src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_mapping.tsv @@ -1,6 +1,6 @@ # SSSOM URI-level Mapping (D4D slot URIs ↔ RO-Crate property URIs) # Generated from D4D LinkML schema slot_uri definitions -# Date: 2026-03-25T22:41:06.728180 +# Date: 2026-04-09T10:17:17.452134 # Total mappings: 33 # # Maps at the vocabulary/semantic level using: @@ -8,36 +8,36 @@ # - RO-Crate: JSON-LD property URIs (schema.org, EVI, RAI, D4D) # subject_id subject_label subject_source predicate_id object_id object_label object_source mapping_justification confidence comment author_id mapping_date mapping_set_id mapping_set_version d4d_slot_name vocab_crosswalk -schema:sameAs sameAs https://schema.org/ skos:exactMatch schema:sameAs sameAs https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'same_as' (slot_uri: schema:sameAs) → RO-Crate 'schema:sameAs' 
https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 same_as false -dcat:theme theme https://www.w3.org/ns/dcat# skos:closeMatch schema:about about https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'themes' (slot_uri: dcat:theme) → RO-Crate 'schema:about' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 themes true -dcterms:title title http://purl.org/dc/terms/ skos:closeMatch schema:name name https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'title' (slot_uri: dcterms:title) → RO-Crate 'schema:name' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 title true -dcterms:language language http://purl.org/dc/terms/ skos:closeMatch schema:inLanguage inLanguage https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'language' (slot_uri: dcterms:language) → RO-Crate 'schema:inLanguage' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 language true -dcterms:publisher publisher http://purl.org/dc/terms/ skos:closeMatch schema:publisher publisher https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'publisher' (slot_uri: dcterms:publisher) → RO-Crate 'schema:publisher' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 publisher true -dcterms:issued issued http://purl.org/dc/terms/ skos:closeMatch schema:datePublished datePublished https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'issued' (slot_uri: dcterms:issued) → RO-Crate 'schema:datePublished' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 issued true -dcat:landingPage landingPage https://www.w3.org/ns/dcat# skos:closeMatch schema:url url https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'page' (slot_uri: dcat:landingPage) → RO-Crate 'schema:url' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 page true -schema:encodingFormat 
encodingFormat https://schema.org/ skos:exactMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'dialect' (slot_uri: schema:encodingFormat) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 dialect false -dcat:byteSize byteSize https://www.w3.org/ns/dcat# skos:closeMatch schema:contentSize contentSize https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'bytes' (slot_uri: dcat:byteSize) → RO-Crate 'schema:contentSize' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 bytes true -schema:contentUrl contentUrl https://schema.org/ skos:exactMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'path' (slot_uri: schema:contentUrl) → RO-Crate 'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 path false -dcat:downloadURL downloadURL https://www.w3.org/ns/dcat# skos:closeMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'download_url' (slot_uri: dcat:downloadURL) → RO-Crate 'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 download_url true -dcterms:format format http://purl.org/dc/terms/ skos:closeMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'format' (slot_uri: dcterms:format) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 format true -dcat:mediaType mediaType https://www.w3.org/ns/dcat# skos:closeMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.9 D4D slot 'encoding' (slot_uri: dcat:mediaType) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 encoding true -dcat:compressFormat compressFormat 
https://www.w3.org/ns/dcat# skos:closeMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.9 D4D slot 'compression' (slot_uri: dcat:compressFormat) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 compression true -dcat:mediaType mediaType https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'media_type' (slot_uri: dcat:mediaType) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 media_type true -dcterms:identifier identifier http://purl.org/dc/terms/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'hash' (slot_uri: dcterms:identifier) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 hash true -dcterms:identifier identifier http://purl.org/dc/terms/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'md5' (slot_uri: dcterms:identifier) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 md5 true -dcterms:identifier identifier http://purl.org/dc/terms/ skos:relatedMatch evi:sha256 sha256 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'sha256' (slot_uri: dcterms:identifier) → RO-Crate 'evi:sha256' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 sha256 true -dcterms:conformsTo conformsTo http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 conforms_to true -dcterms:conformsTo conformsTo http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo 
https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to_schema' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_schema true -dcterms:conformsTo conformsTo http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to_class' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_class true -dcterms:license license http://purl.org/dc/terms/ skos:closeMatch schema:license license https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'license' (slot_uri: dcterms:license) → RO-Crate 'schema:license' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 license true -dcat:keyword keyword https://www.w3.org/ns/dcat# skos:closeMatch schema:keywords keywords https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'keywords' (slot_uri: dcat:keyword) → RO-Crate 'schema:keywords' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 keywords true -dcterms:hasVersion hasVersion http://purl.org/dc/terms/ skos:closeMatch schema:version version https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'version' (slot_uri: dcterms:hasVersion) → RO-Crate 'schema:version' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 version true -dcterms:creator creator http://purl.org/dc/terms/ skos:closeMatch schema:creator creator https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'created_by' (slot_uri: dcterms:creator) → RO-Crate 'schema:creator' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 created_by true -dcterms:created created http://purl.org/dc/terms/ skos:closeMatch schema:dateCreated dateCreated https://schema.org/ 
semapv:ManualMappingCuration 0.9 D4D slot 'created_on' (slot_uri: dcterms:created) → RO-Crate 'schema:dateCreated' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 created_on true -dcterms:modified modified http://purl.org/dc/terms/ skos:closeMatch schema:dateModified dateModified https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'last_updated_on' (slot_uri: dcterms:modified) → RO-Crate 'schema:dateModified' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 last_updated_on true -dcterms:contributor contributor http://purl.org/dc/terms/ skos:closeMatch schema:contributor contributor https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'modified_by' (slot_uri: dcterms:contributor) → RO-Crate 'schema:contributor' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 modified_by true -dcterms:type type http://purl.org/dc/terms/ skos:closeMatch schema:creativeWorkStatus creativeWorkStatus https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'status' (slot_uri: dcterms:type) → RO-Crate 'schema:creativeWorkStatus' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 status true -prov:wasDerivedFrom wasDerivedFrom http://www.w3.org/ns/prov# skos:relatedMatch schema:isBasedOn isBasedOn https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'was_derived_from' (slot_uri: prov:wasDerivedFrom) → RO-Crate 'schema:isBasedOn' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 was_derived_from true -dcterms:identifier identifier http://purl.org/dc/terms/ skos:closeMatch schema:identifier identifier https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'doi' (slot_uri: dcterms:identifier) → RO-Crate 'schema:identifier' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 doi true -dcterms:references references http://purl.org/dc/terms/ skos:closeMatch 
schema:relatedLink relatedLink https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'external_resources' (slot_uri: dcterms:references) → RO-Crate 'schema:relatedLink' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 external_resources true -schema:hasPart hasPart https://schema.org/ skos:exactMatch schema:hasPart hasPart https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'resources' (slot_uri: schema:hasPart) → RO-Crate 'schema:hasPart' https://orcid.org/0000-0000-0000-0000 2026-03-25 d4d-rocrate-uri-alignment-v1 1.0 resources false +schema:sameAs sameAs https://schema.org/ skos:exactMatch schema:sameAs sameAs https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'same_as' (slot_uri: schema:sameAs) → RO-Crate 'schema:sameAs' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 same_as false +dcat:theme theme https://www.w3.org/ns/dcat# skos:closeMatch schema:about about https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'themes' (slot_uri: dcat:theme) → RO-Crate 'schema:about' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 themes true +dcterms:title title http://purl.org/dc/terms/ skos:closeMatch schema:name name https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'title' (slot_uri: dcterms:title) → RO-Crate 'schema:name' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 title true +dcterms:language language http://purl.org/dc/terms/ skos:closeMatch schema:inLanguage inLanguage https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'language' (slot_uri: dcterms:language) → RO-Crate 'schema:inLanguage' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 language true +dcterms:publisher publisher http://purl.org/dc/terms/ skos:closeMatch schema:publisher publisher https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'publisher' (slot_uri: 
dcterms:publisher) → RO-Crate 'schema:publisher' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 publisher true +dcterms:issued issued http://purl.org/dc/terms/ skos:closeMatch schema:datePublished datePublished https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'issued' (slot_uri: dcterms:issued) → RO-Crate 'schema:datePublished' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 issued true +dcat:landingPage landingPage https://www.w3.org/ns/dcat# skos:closeMatch schema:url url https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'page' (slot_uri: dcat:landingPage) → RO-Crate 'schema:url' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 page true +schema:encodingFormat encodingFormat https://schema.org/ skos:exactMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'dialect' (slot_uri: schema:encodingFormat) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 dialect false +dcat:byteSize byteSize https://www.w3.org/ns/dcat# skos:closeMatch schema:contentSize contentSize https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'bytes' (slot_uri: dcat:byteSize) → RO-Crate 'schema:contentSize' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 bytes true +schema:contentUrl contentUrl https://schema.org/ skos:exactMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'path' (slot_uri: schema:contentUrl) → RO-Crate 'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 path false +dcat:downloadURL downloadURL https://www.w3.org/ns/dcat# skos:closeMatch schema:contentUrl contentUrl https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'download_url' (slot_uri: dcat:downloadURL) → RO-Crate 
'schema:contentUrl' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 download_url true
+dcterms:format format http://purl.org/dc/terms/ skos:closeMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'format' (slot_uri: dcterms:format) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 format true
+d4d:characterEncoding characterEncoding https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'encoding' (slot_uri: d4d:characterEncoding) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 encoding true
+dcat:compressFormat compressFormat https://www.w3.org/ns/dcat# skos:closeMatch evi:formats formats https://w3id.org/EVI# semapv:ManualMappingCuration 0.9 D4D slot 'compression' (slot_uri: dcat:compressFormat) → RO-Crate 'evi:formats' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 compression true
+dcat:mediaType mediaType https://www.w3.org/ns/dcat# skos:closeMatch schema:encodingFormat encodingFormat https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'media_type' (slot_uri: dcat:mediaType) → RO-Crate 'schema:encodingFormat' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 media_type true
+d4d:hashValue hashValue https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'hash' (slot_uri: d4d:hashValue) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 hash true
+d4d:md5Checksum md5Checksum https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch evi:md5 md5 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'md5' (slot_uri: d4d:md5Checksum) → RO-Crate 'evi:md5' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 md5 true
+schema:sha256 sha256 https://schema.org/ skos:relatedMatch evi:sha256 sha256 https://w3id.org/EVI# semapv:ManualMappingCuration 0.7 D4D slot 'sha256' (slot_uri: schema:sha256) → RO-Crate 'evi:sha256' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 sha256 true
+dcterms:conformsTo conformsTo http://purl.org/dc/terms/ skos:closeMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'conforms_to' (slot_uri: dcterms:conformsTo) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 conforms_to true
+d4d:conformsToSchema conformsToSchema https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'conforms_to_schema' (slot_uri: d4d:conformsToSchema) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_schema true
+d4d:conformsToClass conformsToClass https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch schema:conformsTo conformsTo https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'conforms_to_class' (slot_uri: d4d:conformsToClass) → RO-Crate 'schema:conformsTo' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 conforms_to_class true
+dcterms:license license http://purl.org/dc/terms/ skos:closeMatch schema:license license https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'license' (slot_uri: dcterms:license) → RO-Crate 'schema:license' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 license true
+dcat:keyword keyword https://www.w3.org/ns/dcat# skos:closeMatch schema:keywords keywords https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'keywords' (slot_uri: dcat:keyword) → RO-Crate 'schema:keywords' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 keywords true
+schema:version version https://schema.org/ skos:exactMatch schema:version version https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'version' (slot_uri: schema:version) → RO-Crate 'schema:version' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 version false
+dcterms:creator creator http://purl.org/dc/terms/ skos:closeMatch schema:creator creator https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'created_by' (slot_uri: dcterms:creator) → RO-Crate 'schema:creator' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 created_by true
+dcterms:created created http://purl.org/dc/terms/ skos:closeMatch schema:dateCreated dateCreated https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'created_on' (slot_uri: dcterms:created) → RO-Crate 'schema:dateCreated' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 created_on true
+dcterms:modified modified http://purl.org/dc/terms/ skos:closeMatch schema:dateModified dateModified https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'last_updated_on' (slot_uri: dcterms:modified) → RO-Crate 'schema:dateModified' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 last_updated_on true
+dcterms:contributor contributor http://purl.org/dc/terms/ skos:closeMatch schema:contributor contributor https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'modified_by' (slot_uri: dcterms:contributor) → RO-Crate 'schema:contributor' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 modified_by true
+d4d:publicationStatus publicationStatus https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch schema:creativeWorkStatus creativeWorkStatus https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'status' (slot_uri: d4d:publicationStatus) → RO-Crate 'schema:creativeWorkStatus' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 status true
+prov:wasDerivedFrom wasDerivedFrom http://www.w3.org/ns/prov# skos:relatedMatch schema:isBasedOn isBasedOn https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'was_derived_from' (slot_uri: prov:wasDerivedFrom) → RO-Crate 'schema:isBasedOn' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 was_derived_from true
+d4d:doiIdentifier doiIdentifier https://w3id.org/bridge2ai/data-sheets-schema/ skos:relatedMatch schema:identifier identifier https://schema.org/ semapv:ManualMappingCuration 0.7 D4D slot 'doi' (slot_uri: d4d:doiIdentifier) → RO-Crate 'schema:identifier' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 doi true
+dcterms:references references http://purl.org/dc/terms/ skos:closeMatch schema:relatedLink relatedLink https://schema.org/ semapv:ManualMappingCuration 0.9 D4D slot 'external_resources' (slot_uri: dcterms:references) → RO-Crate 'schema:relatedLink' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 external_resources true
+schema:hasPart hasPart https://schema.org/ skos:exactMatch schema:hasPart hasPart https://schema.org/ semapv:ManualMappingCuration 1.0 D4D slot 'resources' (slot_uri: schema:hasPart) → RO-Crate 'schema:hasPart' https://orcid.org/0000-0000-0000-0000 2026-04-09 d4d-rocrate-uri-alignment-v1 1.0 resources false
diff --git a/src/data_sheets_schema/datamodel/data_sheets_schema.py b/src/data_sheets_schema/datamodel/data_sheets_schema.py
index c92d2412..993fe03f 100644
--- a/src/data_sheets_schema/datamodel/data_sheets_schema.py
+++ b/src/data_sheets_schema/datamodel/data_sheets_schema.py
@@ -1,5 +1,5 @@
 # Auto generated from data_sheets_schema.yaml by pythongen.py version: 0.0.1
-# Generation date: 2026-04-07T13:03:28
+# Generation date: 2026-04-09T10:17:09
 # Schema: data-sheets-schema
 #
 # id: https://w3id.org/bridge2ai/data-sheets-schema
@@ -247,7 +247,7 @@ class Software(NamedThing):
     id: Union[str, SoftwareId] = None
     version: Optional[str] = None
     license: Optional[str] = None
-    url: Optional[str] = None
+    url: Optional[Union[str, URI]] = None

     def __post_init__(self, *_: str, **kwargs: Any):
         if self._is_empty(self.id):
@@ -261,8 +261,8 @@ def __post_init__(self, *_: str, **kwargs: Any):
         if self.license is not None and not isinstance(self.license, str):
             self.license = str(self.license)

-        if self.url is not None and not isinstance(self.url, str):
-            self.url = str(self.url)
+        if self.url is not None and not isinstance(self.url, URI):
+            self.url = URI(self.url)

         super().__post_init__(**kwargs)

@@ -469,11 +469,17 @@ class Dataset(Information):
     content_warnings: Optional[Union[Union[dict, "ContentWarning"], list[Union[dict, "ContentWarning"]]]] = empty_list()
     subpopulations: Optional[Union[Union[dict, "Subpopulation"], list[Union[dict, "Subpopulation"]]]] = empty_list()
     sensitive_elements: Optional[Union[Union[dict, "SensitiveElement"], list[Union[dict, "SensitiveElement"]]]] = empty_list()
+    relationships: Optional[Union[Union[dict, "Relationships"], list[Union[dict, "Relationships"]]]] = empty_list()
+    splits: Optional[Union[Union[dict, "Splits"], list[Union[dict, "Splits"]]]] = empty_list()
     acquisition_methods: Optional[Union[Union[dict, "InstanceAcquisition"], list[Union[dict, "InstanceAcquisition"]]]] = empty_list()
     collection_mechanisms: Optional[Union[Union[dict, "CollectionMechanism"], list[Union[dict, "CollectionMechanism"]]]] = empty_list()
     sampling_strategies: Optional[Union[Union[dict, "SamplingStrategy"], list[Union[dict, "SamplingStrategy"]]]] = empty_list()
     data_collectors: Optional[Union[Union[dict, "DataCollector"], list[Union[dict, "DataCollector"]]]] = empty_list()
     collection_timeframes: Optional[Union[Union[dict, "CollectionTimeframe"], list[Union[dict, "CollectionTimeframe"]]]] = empty_list()
+    direct_collection: Optional[Union[Union[dict, "DirectCollection"], list[Union[dict, "DirectCollection"]]]] = empty_list()
+    collection_notifications: Optional[Union[Union[dict, "CollectionNotification"], list[Union[dict, "CollectionNotification"]]]] = empty_list()
+    collection_consents: Optional[Union[Union[dict, "CollectionConsent"], list[Union[dict, "CollectionConsent"]]]] = empty_list()
+    consent_revocations: Optional[Union[Union[dict, "ConsentRevocation"], list[Union[dict, "ConsentRevocation"]]]] = empty_list()
     missing_data_documentation: Optional[Union[Union[dict, "MissingDataDocumentation"], list[Union[dict, "MissingDataDocumentation"]]]] = empty_list()
     raw_data_sources: Optional[Union[Union[dict, "RawDataSource"], list[Union[dict, "RawDataSource"]]]] = empty_list()
     ethical_reviews: Optional[Union[Union[dict, "EthicalReview"], list[Union[dict, "EthicalReview"]]]] = empty_list()
@@ -499,6 +505,7 @@ class Dataset(Information):
     prohibited_uses: Optional[Union[Union[dict, "ProhibitedUse"], list[Union[dict, "ProhibitedUse"]]]] = empty_list()
     distribution_formats: Optional[Union[Union[dict, "DistributionFormat"], list[Union[dict, "DistributionFormat"]]]] = empty_list()
     distribution_dates: Optional[Union[Union[dict, "DistributionDate"], list[Union[dict, "DistributionDate"]]]] = empty_list()
+    third_party_sharing: Optional[Union[Union[dict, "ThirdPartySharing"], list[Union[dict, "ThirdPartySharing"]]]] = empty_list()
     license_and_use_terms: Optional[Union[dict, "LicenseAndUseTerms"]] = None
     ip_restrictions: Optional[Union[dict, "IPRestrictions"]] = None
     regulatory_restrictions: Optional[Union[dict, "ExportControlRegulatoryRestrictions"]] = None
@@ -589,6 +596,14 @@ def __post_init__(self, *_: str, **kwargs: Any):
             self.sensitive_elements = [self.sensitive_elements] if self.sensitive_elements is not None else []
         self.sensitive_elements = [v if isinstance(v, SensitiveElement) else SensitiveElement(**as_dict(v)) for v in self.sensitive_elements]

+        if not isinstance(self.relationships, list):
+            self.relationships = [self.relationships] if self.relationships is not None else []
+        self.relationships = [v if isinstance(v, Relationships) else Relationships(**as_dict(v)) for v in self.relationships]
+
+        if not isinstance(self.splits, list):
+            self.splits = [self.splits] if self.splits is not None else []
+        self.splits = [v if isinstance(v, Splits) else Splits(**as_dict(v)) for v in self.splits]
+
         if not isinstance(self.acquisition_methods, list):
             self.acquisition_methods = [self.acquisition_methods] if self.acquisition_methods is not None else []
         self.acquisition_methods = [v if isinstance(v, InstanceAcquisition) else InstanceAcquisition(**as_dict(v)) for v in self.acquisition_methods]
@@ -609,6 +624,22 @@ def __post_init__(self, *_: str, **kwargs: Any):
             self.collection_timeframes = [self.collection_timeframes] if self.collection_timeframes is not None else []
         self.collection_timeframes = [v if isinstance(v, CollectionTimeframe) else CollectionTimeframe(**as_dict(v)) for v in self.collection_timeframes]

+        if not isinstance(self.direct_collection, list):
+            self.direct_collection = [self.direct_collection] if self.direct_collection is not None else []
+        self.direct_collection = [v if isinstance(v, DirectCollection) else DirectCollection(**as_dict(v)) for v in self.direct_collection]
+
+        if not isinstance(self.collection_notifications, list):
+            self.collection_notifications = [self.collection_notifications] if self.collection_notifications is not None else []
+        self.collection_notifications = [v if isinstance(v, CollectionNotification) else CollectionNotification(**as_dict(v)) for v in self.collection_notifications]
+
+        if not isinstance(self.collection_consents, list):
+            self.collection_consents = [self.collection_consents] if self.collection_consents is not None else []
+        self.collection_consents = [v if isinstance(v, CollectionConsent) else CollectionConsent(**as_dict(v)) for v in self.collection_consents]
+
+        if not isinstance(self.consent_revocations, list):
+            self.consent_revocations = [self.consent_revocations] if self.consent_revocations is not None else []
+        self.consent_revocations = [v if isinstance(v, ConsentRevocation) else ConsentRevocation(**as_dict(v)) for v in self.consent_revocations]
+
         if not isinstance(self.missing_data_documentation, list):
             self.missing_data_documentation = [self.missing_data_documentation] if self.missing_data_documentation is not None else []
         self.missing_data_documentation = [v if isinstance(v, MissingDataDocumentation) else MissingDataDocumentation(**as_dict(v)) for v in self.missing_data_documentation]
@@ -707,6 +738,10 @@ def __post_init__(self, *_: str, **kwargs: Any):
             self.distribution_dates = [self.distribution_dates] if self.distribution_dates is not None else []
         self.distribution_dates = [v if isinstance(v, DistributionDate) else DistributionDate(**as_dict(v)) for v in self.distribution_dates]

+        if not isinstance(self.third_party_sharing, list):
+            self.third_party_sharing = [self.third_party_sharing] if self.third_party_sharing is not None else []
+        self.third_party_sharing = [v if isinstance(v, ThirdPartySharing) else ThirdPartySharing(**as_dict(v)) for v in self.third_party_sharing]
+
         if self.license_and_use_terms is not None and not isinstance(self.license_and_use_terms, LicenseAndUseTerms):
             self.license_and_use_terms = LicenseAndUseTerms(**as_dict(self.license_and_use_terms))

@@ -792,7 +827,7 @@ def __post_init__(self, *_: str, **kwargs: Any):
 @dataclass(repr=False)
 class FormatDialect(YAMLRoot):
     """
-    Additional format information for a file
+    Additional format information for a file.
     """
     _inherited_slots: ClassVar[list[str]] = []

@@ -1060,30 +1095,27 @@ class SamplingStrategy(DatasetProperty):
     class_name: ClassVar[str] = "SamplingStrategy"
     class_model_uri: ClassVar[URIRef] = DATA_SHEETS_SCHEMA.SamplingStrategy

-    is_sample: Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]] = empty_list()
-    is_random: Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]] = empty_list()
+    is_sample: Optional[Union[bool, Bool]] = None
+    is_random: Optional[Union[bool, Bool]] = None
     source_data: Optional[Union[str, list[str]]] = empty_list()
-    is_representative: Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]] = empty_list()
+    is_representative: Optional[Union[bool, Bool]] = None
     representative_verification: Optional[Union[str, list[str]]] = empty_list()
     why_not_representative: Optional[Union[str, list[str]]] = empty_list()
     strategies: Optional[Union[str, list[str]]] = empty_list()

     def __post_init__(self, *_: str, **kwargs: Any):
-        if not isinstance(self.is_sample, list):
-            self.is_sample = [self.is_sample] if self.is_sample is not None else []
-        self.is_sample = [v if isinstance(v, Bool) else Bool(v) for v in self.is_sample]
+        if self.is_sample is not None and not isinstance(self.is_sample, Bool):
+            self.is_sample = Bool(self.is_sample)

-        if not isinstance(self.is_random, list):
-            self.is_random = [self.is_random] if self.is_random is not None else []
-        self.is_random = [v if isinstance(v, Bool) else Bool(v) for v in self.is_random]
+        if self.is_random is not None and not isinstance(self.is_random, Bool):
+            self.is_random = Bool(self.is_random)

         if not isinstance(self.source_data, list):
             self.source_data = [self.source_data] if self.source_data is not None else []
         self.source_data = [v if isinstance(v, str) else str(v) for v in self.source_data]

-        if not isinstance(self.is_representative, list):
-            self.is_representative = [self.is_representative] if self.is_representative is not None else []
-        self.is_representative = [v if isinstance(v, Bool) else Bool(v) for v in self.is_representative]
+        if self.is_representative is not None and not isinstance(self.is_representative, Bool):
+            self.is_representative = Bool(self.is_representative)

         if not isinstance(self.representative_verification, list):
             self.representative_verification = [self.representative_verification] if self.representative_verification is not None else []
@@ -1278,7 +1310,7 @@ class ExternalResource(DatasetProperty):

     external_resources: Optional[Union[str, list[str]]] = empty_list()
     future_guarantees: Optional[Union[str, list[str]]] = empty_list()
-    archival: Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]] = empty_list()
+    archival: Optional[Union[bool, Bool]] = None
     restrictions: Optional[Union[str, list[str]]] = empty_list()

     def __post_init__(self, *_: str, **kwargs: Any):
@@ -1290,9 +1322,8 @@ def __post_init__(self, *_: str, **kwargs: Any):
             self.future_guarantees = [self.future_guarantees] if self.future_guarantees is not None else []
         self.future_guarantees = [v if isinstance(v, str) else str(v) for v in self.future_guarantees]

-        if not isinstance(self.archival, list):
-            self.archival = [self.archival] if self.archival is not None else []
-        self.archival = [v if isinstance(v, Bool) else Bool(v) for v in self.archival]
+        if self.archival is not None and not isinstance(self.archival, Bool):
+            self.archival = Bool(self.archival)

         if not isinstance(self.restrictions, list):
             self.restrictions = [self.restrictions] if self.restrictions is not None else []
@@ -1757,7 +1788,7 @@ class LabelingStrategy(DatasetProperty):
     class_name: ClassVar[str] = "LabelingStrategy"
     class_model_uri: ClassVar[URIRef] = DATA_SHEETS_SCHEMA.LabelingStrategy

-    data_annotation_platform: Optional[str] = None
+    data_annotation_platform: Optional[Union[str, list[str]]] = empty_list()
     data_annotation_protocol: Optional[Union[str, list[str]]] = empty_list()
     annotations_per_item: Optional[int] = None
     inter_annotator_agreement: Optional[str] = None
@@ -1765,8 +1796,9 @@ class LabelingStrategy(DatasetProperty):
     labeling_details: Optional[Union[str, list[str]]] = empty_list()

     def __post_init__(self, *_: str, **kwargs: Any):
-        if self.data_annotation_platform is not None and not isinstance(self.data_annotation_platform, str):
-            self.data_annotation_platform = str(self.data_annotation_platform)
+        if not isinstance(self.data_annotation_platform, list):
+            self.data_annotation_platform = [self.data_annotation_platform] if self.data_annotation_platform is not None else []
+        self.data_annotation_platform = [v if isinstance(v, str) else str(v) for v in self.data_annotation_platform]

         if not isinstance(self.data_annotation_protocol, list):
             self.data_annotation_protocol = [self.data_annotation_protocol] if self.data_annotation_protocol is not None else []
@@ -1951,8 +1983,8 @@ def __post_init__(self, *_: str, **kwargs: Any):
 @dataclass(repr=False)
 class UseRepository(DatasetProperty):
     """
-    Is there a repository that links to any or all papers or systems that use the dataset? If so, provide a link or
-    other access point.
+    A repository or registry of known uses of this dataset by third parties. Documents where the dataset has been
+    applied, enabling discoverability of downstream use cases and impact tracking.
     """
     _inherited_slots: ClassVar[list[str]] = []

@@ -2131,12 +2163,12 @@ class DistributionFormat(DatasetProperty):
     class_name: ClassVar[str] = "DistributionFormat"
     class_model_uri: ClassVar[URIRef] = DATA_SHEETS_SCHEMA.DistributionFormat

-    access_urls: Optional[Union[str, list[str]]] = empty_list()
+    access_urls: Optional[Union[Union[str, URI], list[Union[str, URI]]]] = empty_list()

     def __post_init__(self, *_: str, **kwargs: Any):
         if not isinstance(self.access_urls, list):
             self.access_urls = [self.access_urls] if self.access_urls is not None else []
-        self.access_urls = [v if isinstance(v, str) else str(v) for v in self.access_urls]
+        self.access_urls = [v if isinstance(v, URI) else URI(v) for v in self.access_urls]

         super().__post_init__(**kwargs)

@@ -2283,13 +2315,13 @@ class VersionAccess(DatasetProperty):
     class_name: ClassVar[str] = "VersionAccess"
     class_model_uri: ClassVar[URIRef] = DATA_SHEETS_SCHEMA.VersionAccess

-    latest_version_doi: Optional[str] = None
+    latest_version_doi: Optional[Union[str, URIorCURIE]] = None
     versions_available: Optional[Union[str, list[str]]] = empty_list()
     version_details: Optional[Union[str, list[str]]] = empty_list()

     def __post_init__(self, *_: str, **kwargs: Any):
-        if self.latest_version_doi is not None and not isinstance(self.latest_version_doi, str):
-            self.latest_version_doi = str(self.latest_version_doi)
+        if self.latest_version_doi is not None and not isinstance(self.latest_version_doi, URIorCURIE):
+            self.latest_version_doi = URIorCURIE(self.latest_version_doi)

         if not isinstance(self.versions_available, list):
             self.versions_available = [self.versions_available] if self.versions_available is not None else []
@@ -2948,164 +2980,368 @@ def __post_init__(self, *_: str, **kwargs: Any):
 # Enumerations
 class FormatEnum(EnumDefinitionImpl):
-
-    CSV = PermissibleValue(text="CSV")
-    TSV = PermissibleValue(text="TSV")
-    XML = PermissibleValue(text="XML")
-    JSON = PermissibleValue(text="JSON")
-    JSONL = PermissibleValue(text="JSONL")
-    YAML = PermissibleValue(text="YAML")
-    HTML = PermissibleValue(text="HTML")
-    PDF = PermissibleValue(text="PDF")
-    DOCX = PermissibleValue(text="DOCX")
-    XLSX = PermissibleValue(text="XLSX")
-    PPTX = PermissibleValue(text="PPTX")
-    TXT = PermissibleValue(text="TXT")
-    MD = PermissibleValue(text="MD")
-    ZIP = PermissibleValue(text="ZIP")
-    TAR = PermissibleValue(text="TAR")
-    GZ = PermissibleValue(text="GZ")
-    BZ2 = PermissibleValue(text="BZ2")
-    XZ = PermissibleValue(text="XZ")
+    """
+    Common file format extensions for data files and documents.
+    """
+    CSV = PermissibleValue(
+        text="CSV",
+        description="Comma-Separated Values - tabular data format.")
+    TSV = PermissibleValue(
+        text="TSV",
+        description="Tab-Separated Values - tabular data format with tab delimiters.")
+    XML = PermissibleValue(
+        text="XML",
+        description="Extensible Markup Language - structured markup format.")
+    JSON = PermissibleValue(
+        text="JSON",
+        description="JavaScript Object Notation - structured data interchange format.")
+    JSONL = PermissibleValue(
+        text="JSONL",
+        description="JSON Lines - newline-delimited JSON format.")
+    YAML = PermissibleValue(
+        text="YAML",
+        description="YAML Ain't Markup Language - human-readable data serialization format.")
+    HTML = PermissibleValue(
+        text="HTML",
+        description="HyperText Markup Language - web page markup format.")
+    PDF = PermissibleValue(
+        text="PDF",
+        description="Portable Document Format - fixed-layout document format.")
+    DOCX = PermissibleValue(
+        text="DOCX",
+        description="Microsoft Word Open XML Document - word processing document.")
+    XLSX = PermissibleValue(
+        text="XLSX",
+        description="Microsoft Excel Open XML Spreadsheet - spreadsheet format.")
+    PPTX = PermissibleValue(
+        text="PPTX",
+        description="Microsoft PowerPoint Open XML Presentation - presentation format.")
+    TXT = PermissibleValue(
+        text="TXT",
+        description="Plain text file.")
+    MD = PermissibleValue(
+        text="MD",
+        description="Markdown - lightweight markup language.")
+    ZIP = PermissibleValue(
+        text="ZIP",
+        description="ZIP archive - compressed file container.")
+    TAR = PermissibleValue(
+        text="TAR",
+        description="Tape Archive - file archive format.")
+    GZ = PermissibleValue(
+        text="GZ",
+        description="Gzip compressed file.")
+    BZ2 = PermissibleValue(
+        text="BZ2",
+        description="Bzip2 compressed file.")
+    XZ = PermissibleValue(
+        text="XZ",
+        description="XZ compressed file.")

     _defn = EnumDefinition(
         name="FormatEnum",
+        description="Common file format extensions for data files and documents.",
     )

 class MediaTypeEnum(EnumDefinitionImpl):
-
+    """
+    MIME media types (Internet Media Types) for file content identification.
+    """
     _defn = EnumDefinition(
         name="MediaTypeEnum",
+        description="MIME media types (Internet Media Types) for file content identification.",
     )

     @classmethod
     def _addvals(cls):
         setattr(cls, "text/csv",
-            PermissibleValue(text="text/csv"))
+            PermissibleValue(
+                text="text/csv",
+                description="MIME type for CSV (Comma-Separated Values) files."))
         setattr(cls, "text/tab-separated-values",
-            PermissibleValue(text="text/tab-separated-values"))
+            PermissibleValue(
+                text="text/tab-separated-values",
+                description="MIME type for TSV (Tab-Separated Values) files."))
         setattr(cls, "application/json",
-            PermissibleValue(text="application/json"))
+            PermissibleValue(
+                text="application/json",
+                description="MIME type for JSON (JavaScript Object Notation) files."))
         setattr(cls, "application/xml",
-            PermissibleValue(text="application/xml"))
+            PermissibleValue(
+                text="application/xml",
+                description="MIME type for XML (Extensible Markup Language) files."))
         setattr(cls, "text/xml",
-            PermissibleValue(text="text/xml"))
+            PermissibleValue(
+                text="text/xml",
+                description="Alternative MIME type for XML files (text variant)."))
         setattr(cls, "application/yaml",
-            PermissibleValue(text="application/yaml"))
+            PermissibleValue(
+                text="application/yaml",
+                description="MIME type for YAML files."))
         setattr(cls, "text/yaml",
-            PermissibleValue(text="text/yaml"))
+            PermissibleValue(
+                text="text/yaml",
+                description="Alternative MIME type for YAML files (text variant)."))
         setattr(cls, "text/html",
-            PermissibleValue(text="text/html"))
+            PermissibleValue(
+                text="text/html",
+                description="MIME type for HTML (HyperText Markup Language) files."))
         setattr(cls, "application/pdf",
-            PermissibleValue(text="application/pdf"))
+            PermissibleValue(
+                text="application/pdf",
+                description="MIME type for PDF (Portable Document Format) files."))
         setattr(cls, "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
-            PermissibleValue(text="application/vnd.openxmlformats-officedocument.wordprocessingml.document"))
+            PermissibleValue(
+                text="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+                description="MIME type for Microsoft Word DOCX files."))
         setattr(cls, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
-            PermissibleValue(text="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"))
+            PermissibleValue(
+                text="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+                description="MIME type for Microsoft Excel XLSX files."))
         setattr(cls, "application/vnd.openxmlformats-officedocument.presentationml.presentation",
-            PermissibleValue(text="application/vnd.openxmlformats-officedocument.presentationml.presentation"))
+            PermissibleValue(
+                text="application/vnd.openxmlformats-officedocument.presentationml.presentation",
+                description="MIME type for Microsoft PowerPoint PPTX files."))
         setattr(cls, "text/plain",
-            PermissibleValue(text="text/plain"))
+            PermissibleValue(
+                text="text/plain",
+                description="MIME type for plain text files."))
         setattr(cls, "text/markdown",
-            PermissibleValue(text="text/markdown"))
+            PermissibleValue(
+                text="text/markdown",
+                description="MIME type for Markdown files."))
         setattr(cls, "application/zip",
-            PermissibleValue(text="application/zip"))
+            PermissibleValue(
+                text="application/zip",
+                description="MIME type for ZIP archive files."))
         setattr(cls, "application/x-tar",
-            PermissibleValue(text="application/x-tar"))
+            PermissibleValue(
+                text="application/x-tar",
+                description="MIME type for TAR archive files."))
         setattr(cls, "application/gzip",
-            PermissibleValue(text="application/gzip"))
+            PermissibleValue(
+                text="application/gzip",
+                description="MIME type for Gzip compressed files."))
         setattr(cls, "application/x-bzip2",
-            PermissibleValue(text="application/x-bzip2"))
+            PermissibleValue(
+                text="application/x-bzip2",
+                description="MIME type for Bzip2 compressed files."))
         setattr(cls, "application/x-xz",
-            PermissibleValue(text="application/x-xz"))
+            PermissibleValue(
+                text="application/x-xz",
+                description="MIME type for XZ compressed files."))

 class CompressionEnum(EnumDefinitionImpl):
-
-    gzip = PermissibleValue(text="gzip")
-    bzip2 = PermissibleValue(text="bzip2")
-    zip = PermissibleValue(text="zip")
-    tar = PermissibleValue(text="tar")
-    xz = PermissibleValue(text="xz")
-    lzma = PermissibleValue(text="lzma")
-    compress = PermissibleValue(text="compress")
+    """
+    Compression algorithms and formats for file compression.
+    """
+    gzip = PermissibleValue(
+        text="gzip",
+        description="GNU zip compression (commonly used with .gz extension).")
+    bzip2 = PermissibleValue(
+        text="bzip2",
+        description="Burrows-Wheeler block-sorting compression (commonly used with .bz2 extension).")
+    zip = PermissibleValue(
+        text="zip",
+        description="ZIP archive compression format.")
+    tar = PermissibleValue(
+        text="tar",
+        description="Tape Archive format (typically combined with gzip or bzip2).")
+    xz = PermissibleValue(
+        text="xz",
+        description="XZ Utils compression using LZMA2 algorithm.")
+    lzma = PermissibleValue(
+        text="lzma",
+        description="Lempel-Ziv-Markov chain algorithm compression.")
+    compress = PermissibleValue(
+        text="compress",
+        description="Unix compress utility (LZW compression).")

     _defn = EnumDefinition(
         name="CompressionEnum",
+        description="Compression algorithms and formats for file compression.",
     )

 class EncodingEnum(EnumDefinitionImpl):
-
-    ASCII = PermissibleValue(text="ASCII")
-    Big5 = PermissibleValue(text="Big5")
-    GB2312 = PermissibleValue(text="GB2312")
-    Shift_JIS = PermissibleValue(text="Shift_JIS")
+    """
+    Character encoding schemes for text representation in different languages and scripts.
+    """
+    ASCII = PermissibleValue(
+        text="ASCII",
+        description="American Standard Code for Information Interchange (7-bit, English characters only).")
+    Big5 = PermissibleValue(
+        text="Big5",
+        description="Traditional Chinese character encoding (primarily Taiwan and Hong Kong).")
+    GB2312 = PermissibleValue(
+        text="GB2312",
+        description="Simplified Chinese character encoding standard.")
+    Shift_JIS = PermissibleValue(
+        text="Shift_JIS",
+        description="Japanese character encoding (Microsoft and other systems).")

     _defn = EnumDefinition(
         name="EncodingEnum",
+        description="Character encoding schemes for text representation in different languages and scripts.",
     )

     @classmethod
     def _addvals(cls):
         setattr(cls, "EUC-JP",
-            PermissibleValue(text="EUC-JP"))
+            PermissibleValue(
+                text="EUC-JP",
+                description="Extended Unix Code for Japanese."))
         setattr(cls, "EUC-KR",
-            PermissibleValue(text="EUC-KR"))
+            PermissibleValue(
+                text="EUC-KR",
+                description="Extended Unix Code for Korean."))
         setattr(cls, "EUC-TW",
-            PermissibleValue(text="EUC-TW"))
+            PermissibleValue(
+                text="EUC-TW",
+                description="Extended Unix Code for Traditional Chinese."))
         setattr(cls, "HZ-GB-2312",
-            PermissibleValue(text="HZ-GB-2312"))
+            PermissibleValue(
+                text="HZ-GB-2312",
+                description="7-bit encoding for Simplified Chinese (GB2312)."))
         setattr(cls, "ISO-2022-CN-EXT",
-            PermissibleValue(text="ISO-2022-CN-EXT"))
+            PermissibleValue(
+                text="ISO-2022-CN-EXT",
+                description="Extended ISO-2022 encoding for Chinese (includes both Simplified and Traditional)."))
         setattr(cls, "ISO-2022-CN",
-            PermissibleValue(text="ISO-2022-CN"))
+            PermissibleValue(
+                text="ISO-2022-CN",
+                description="ISO-2022 encoding for Chinese."))
         setattr(cls, "ISO-2022-JP-2",
-            PermissibleValue(text="ISO-2022-JP-2"))
+            PermissibleValue(
+                text="ISO-2022-JP-2",
+                description="Extended ISO-2022 encoding for Japanese (includes additional character sets)."))
         setattr(cls, "ISO-2022-JP",
-            PermissibleValue(text="ISO-2022-JP"))
+            PermissibleValue(
+                text="ISO-2022-JP",
+                description="ISO-2022 encoding for Japanese."))
         setattr(cls, "ISO-2022-KR",
-            PermissibleValue(text="ISO-2022-KR"))
+            PermissibleValue(
+                text="ISO-2022-KR",
+                description="ISO-2022 encoding for Korean."))
         setattr(cls, "ISO-8859-10",
-            PermissibleValue(text="ISO-8859-10"))
+            PermissibleValue(
+                text="ISO-8859-10",
+                description="Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic)."))
         setattr(cls, "ISO-8859-11",
-            PermissibleValue(text="ISO-8859-11"))
+            PermissibleValue(
+                text="ISO-8859-11",
+                description="Latin/Thai encoding."))
         setattr(cls, "ISO-8859-13",
-            PermissibleValue(text="ISO-8859-13"))
+            PermissibleValue(
+                text="ISO-8859-13",
+                description="Latin-7 (Baltic Rim languages)."))
         setattr(cls, "ISO-8859-14",
-            PermissibleValue(text="ISO-8859-14"))
+            PermissibleValue(
+                text="ISO-8859-14",
+                description="Latin-8 (Celtic languages)."))
         setattr(cls, "ISO-8859-15",
-            PermissibleValue(text="ISO-8859-15"))
+            PermissibleValue(
+                text="ISO-8859-15",
+                description="Latin-9 (Western European with Euro sign)."))
         setattr(cls, "ISO-8859-16",
-            PermissibleValue(text="ISO-8859-16"))
+            PermissibleValue(
+                text="ISO-8859-16",
+                description="Latin-10 (South-Eastern European languages)."))
         setattr(cls, "ISO-8859-1",
-            PermissibleValue(text="ISO-8859-1"))
+            PermissibleValue(
+                text="ISO-8859-1",
+                description="Latin-1 (Western European languages)."))
         setattr(cls, "ISO-8859-2",
-            PermissibleValue(text="ISO-8859-2"))
+            PermissibleValue(
+                text="ISO-8859-2",
+                description="Latin-2 (Central European languages)."))
         setattr(cls, "ISO-8859-3",
-            PermissibleValue(text="ISO-8859-3"))
+            PermissibleValue(
+                text="ISO-8859-3",
+                description="Latin-3 (South European languages - Turkish, Maltese, Esperanto)."))
         setattr(cls, "ISO-8859-4",
-            PermissibleValue(text="ISO-8859-4"))
+            PermissibleValue(
+                text="ISO-8859-4",
+                description="Latin-4 (North European languages)."))
         setattr(cls, "ISO-8859-5",
-            PermissibleValue(text="ISO-8859-5"))
+            PermissibleValue(
+                text="ISO-8859-5",
+                description="Latin/Cyrillic encoding."))
         setattr(cls, "ISO-8859-6",
-            PermissibleValue(text="ISO-8859-6"))
+            PermissibleValue(
+                text="ISO-8859-6",
+                description="Latin/Arabic encoding."))
         setattr(cls, "ISO-8859-7",
-            PermissibleValue(text="ISO-8859-7"))
+            PermissibleValue(
+                text="ISO-8859-7",
+                description="Latin/Greek encoding."))
         setattr(cls, "ISO-8859-8",
-            PermissibleValue(text="ISO-8859-8"))
+            PermissibleValue(
+                text="ISO-8859-8",
+                description="Latin/Hebrew encoding."))
         setattr(cls, "ISO-8859-9",
-            PermissibleValue(text="ISO-8859-9"))
+            PermissibleValue(
+                text="ISO-8859-9",
+                description="Latin-5 (Turkish)."))
         setattr(cls, "KOI8-R",
-            PermissibleValue(text="KOI8-R"))
+            PermissibleValue(
+                text="KOI8-R",
+                description="Russian character encoding (Kod Obmena Informatsiey)."))
         setattr(cls, "KOI8-U",
-            PermissibleValue(text="KOI8-U"))
+            PermissibleValue(
+                text="KOI8-U",
+                description="Ukrainian character encoding."))
         setattr(cls, "UTF-16",
-            PermissibleValue(text="UTF-16"))
+            PermissibleValue(
+                text="UTF-16",
+                description="Unicode Transformation Format 16-bit (variable-width encoding)."))
         setattr(cls, "UTF-32",
-            PermissibleValue(text="UTF-32"))
+            PermissibleValue(
+                text="UTF-32",
+                description="Unicode Transformation Format 32-bit (fixed-width encoding)."))
         setattr(cls, "UTF-7",
-            PermissibleValue(text="UTF-7"))
+            PermissibleValue(
+                text="UTF-7",
+                description="Unicode Transformation Format 7-bit (for 7-bit channels)."))
         setattr(cls, "UTF-8",
-            PermissibleValue(text="UTF-8"))
+            PermissibleValue(
+                text="UTF-8",
+                description="Unicode Transformation Format 8-bit (variable-width, most common Unicode encoding)."))
+        setattr(cls, "Windows-1250",
+            PermissibleValue(
+                text="Windows-1250",
+                description="Windows code page for Central European languages."))
+        setattr(cls, "Windows-1251",
+            PermissibleValue(
+                text="Windows-1251",
+                description="Windows code page for Cyrillic script."))
+        setattr(cls, "Windows-1252",
+            PermissibleValue(
+                text="Windows-1252",
+                description="Windows code page for Western European languages."))
+        setattr(cls, "Windows-1253",
+            PermissibleValue(
+                text="Windows-1253",
+                description="Windows code page for Greek."))
+        setattr(cls, "Windows-1254",
+            PermissibleValue(
+                text="Windows-1254",
+                description="Windows code page for Turkish."))
+        setattr(cls, "Windows-1255",
+            PermissibleValue(
+                text="Windows-1255",
+                description="Windows code page for Hebrew."))
+        setattr(cls, "Windows-1256",
+            PermissibleValue(
+                text="Windows-1256",
+                description="Windows code page for Arabic."))
+        setattr(cls, "Windows-1257",
+            PermissibleValue(
+                text="Windows-1257",
+                description="Windows code page for Baltic languages."))
+        setattr(cls, "Windows-1258",
+            PermissibleValue(
+                text="Windows-1258",
+                description="Windows code page for Vietnamese."))

 class CRediTRoleEnum(EnumDefinitionImpl):
     """
@@ -3113,46 +3349,46 @@ class CRediTRoleEnum(EnumDefinitionImpl):
     """
     conceptualization = PermissibleValue(
         text="conceptualization",
-        description="Ideas; formulation or evolution of overarching research goals and aims")
+        description="Ideas; formulation or evolution of overarching research goals and aims.")
     methodology = PermissibleValue(
         text="methodology",
-        description="Development or design of methodology; creation of models")
+        description="Development or design of methodology; creation of models.")
     software = PermissibleValue(
         text="software",
-        description="Programming, software development; designing computer programs")
+        description="Programming, software development; designing computer programs.")
     validation = PermissibleValue(
         text="validation",
-        description="Verification of the overall replication/reproducibility of results")
+        description="Verification of the overall replication/reproducibility of results.")
     formal_analysis = PermissibleValue(
         text="formal_analysis",
-        description="Application of statistical, mathematical, or other formal techniques")
+
description="Application of statistical, mathematical, or other formal techniques.") investigation = PermissibleValue( text="investigation", - description="Conducting the research and investigation process") + description="Conducting the research and investigation process.") resources = PermissibleValue( text="resources", description="Provision of study materials, reagents, patients, laboratory samples, etc.") data_curation = PermissibleValue( text="data_curation", - description="Management activities to annotate, scrub data and maintain research data") + description="Management activities to annotate, scrub data and maintain research data.") writing_original_draft = PermissibleValue( text="writing_original_draft", - description="Preparation, creation and/or presentation of the published work") + description="Preparation, creation and/or presentation of the published work.") writing_review_editing = PermissibleValue( text="writing_review_editing", - description="Critical review, commentary or revision of the work") + description="Critical review, commentary or revision of the work.") visualization = PermissibleValue( text="visualization", - description="Preparation, creation and/or presentation of visualizations/data presentation") + description="Preparation, creation and/or presentation of visualizations/data presentation.") supervision = PermissibleValue( text="supervision", - description="Oversight and leadership responsibility for the research activity") + description="Oversight and leadership responsibility for the research activity.") project_administration = PermissibleValue( text="project_administration", - description="Management and coordination responsibility for the research activity") + description="Management and coordination responsibility for the research activity.") funding_acquisition = PermissibleValue( text="funding_acquisition", - description="Acquisition of the financial support for the project") + description="Acquisition of the financial 
support for the project.") _defn = EnumDefinition( name="CRediTRoleEnum", @@ -3239,40 +3475,19 @@ class VersionTypeEnum(EnumDefinitionImpl): """ MAJOR = PermissibleValue( text="MAJOR", - description="Incompatible changes, breaking backward compatibility") + description="Incompatible changes, breaking backward compatibility.") MINOR = PermissibleValue( text="MINOR", - description="Backward-compatible new functionality or enhancements") + description="Backward-compatible new functionality or enhancements.") PATCH = PermissibleValue( text="PATCH", - description="Backward-compatible bug fixes or minor corrections") + description="Backward-compatible bug fixes or minor corrections.") _defn = EnumDefinition( name="VersionTypeEnum", description="Type of version change using semantic versioning principles. See https://semver.org/", ) - @classmethod - def _addvals(cls): - setattr(cls, "Windows-1250", - PermissibleValue(text="Windows-1250")) - setattr(cls, "Windows-1251", - PermissibleValue(text="Windows-1251")) - setattr(cls, "Windows-1252", - PermissibleValue(text="Windows-1252")) - setattr(cls, "Windows-1253", - PermissibleValue(text="Windows-1253")) - setattr(cls, "Windows-1254", - PermissibleValue(text="Windows-1254")) - setattr(cls, "Windows-1255", - PermissibleValue(text="Windows-1255")) - setattr(cls, "Windows-1256", - PermissibleValue(text="Windows-1256")) - setattr(cls, "Windows-1257", - PermissibleValue(text="Windows-1257")) - setattr(cls, "Windows-1258", - PermissibleValue(text="Windows-1258")) - class CreatorOrMaintainerEnum(EnumDefinitionImpl): """ Types of agents (persons or organizations) involved in dataset creation or maintenance. Mapped to schema.org @@ -3318,19 +3533,25 @@ class CreatorOrMaintainerEnum(EnumDefinitionImpl): ) class Boolean(EnumDefinitionImpl): - + """ + Three-valued boolean logic supporting true, false, and unknown states. 
+ """ true = PermissibleValue( text="true", - title="True") + title="True", + description="Affirmative or positive value.") false = PermissibleValue( text="false", - title="False") + title="False", + description="Negative or false value.") unknown = PermissibleValue( text="unknown", - title="Unknown") + title="Unknown", + description="Unknown, uncertain, or not applicable value.") _defn = EnumDefinition( name="Boolean", + description="Three-valued boolean logic supporting true, false, and unknown states.", ) class DatasetRelationshipTypeEnum(EnumDefinitionImpl): @@ -3446,91 +3667,91 @@ class DataUsePermissionEnum(EnumDefinitionImpl): """ no_restriction = PermissibleValue( text="no_restriction", - description="No restriction on data use", + description="No restriction on data use.", meaning=DUO["0000004"]) general_research_use = PermissibleValue( text="general_research_use", - description="Data available for any research purpose (GRU)", + description="Data available for any research purpose (GRU).", meaning=DUO["0000042"]) health_medical_biomedical_research = PermissibleValue( text="health_medical_biomedical_research", - description="Data limited to health, medical, or biomedical research (HMB)", + description="Data limited to health, medical, or biomedical research (HMB).", meaning=DUO["0000006"]) disease_specific_research = PermissibleValue( text="disease_specific_research", - description="Data limited to research on specified disease(s) (DS)", + description="Data limited to research on specified disease(s) (DS).", meaning=DUO["0000007"]) population_origins_ancestry_research = PermissibleValue( text="population_origins_ancestry_research", - description="Data limited to population origins or ancestry research (POA)", + description="Data limited to population origins or ancestry research (POA).", meaning=DUO["0000011"]) clinical_care_use = PermissibleValue( text="clinical_care_use", - description="Data available for clinical care and applications (CC)", + 
description="Data available for clinical care and applications (CC).", meaning=DUO["0000043"]) no_commercial_use = PermissibleValue( text="no_commercial_use", - description="Data use limited to non-commercial purposes (NCU)", + description="Data use limited to non-commercial purposes (NCU).", meaning=DUO["0000046"]) non_profit_use_only = PermissibleValue( text="non_profit_use_only", - description="Data use limited to not-for-profit organizations (NPU)", + description="Data use limited to not-for-profit organizations (NPU).", meaning=DUO["0000045"]) non_profit_use_and_non_commercial_use = PermissibleValue( text="non_profit_use_and_non_commercial_use", - description="Data limited to not-for-profit organizations and non-commercial use (NPUNCU)", + description="Data limited to not-for-profit organizations and non-commercial use (NPUNCU).", meaning=DUO["0000018"]) no_methods_development = PermissibleValue( text="no_methods_development", - description="Data cannot be used for methods or software development (NMDS)", + description="Data cannot be used for methods or software development (NMDS).", meaning=DUO["0000015"]) genetic_studies_only = PermissibleValue( text="genetic_studies_only", - description="Data limited to genetic studies only (GSO)", + description="Data limited to genetic studies only (GSO).", meaning=DUO["0000016"]) ethics_approval_required = PermissibleValue( text="ethics_approval_required", - description="Ethics approval (e.g., IRB/ERB) required for data use (IRB)", + description="Ethics approval (e.g., IRB/ERB) required for data use (IRB).", meaning=DUO["0000021"]) collaboration_required = PermissibleValue( text="collaboration_required", - description="Collaboration with primary investigator required (COL)", + description="Collaboration with primary investigator required (COL).", meaning=DUO["0000020"]) publication_required = PermissibleValue( text="publication_required", - description="Results must be published/shared with research community (PUB)", + 
description="Results must be published/shared with research community (PUB).", meaning=DUO["0000019"]) geographic_restriction = PermissibleValue( text="geographic_restriction", - description="Data use limited to specific geographic region (GS)", + description="Data use limited to specific geographic region (GS).", meaning=DUO["0000022"]) institution_specific = PermissibleValue( text="institution_specific", - description="Data use limited to approved institutions (IS)", + description="Data use limited to approved institutions (IS).", meaning=DUO["0000028"]) project_specific = PermissibleValue( text="project_specific", - description="Data use limited to approved project(s) (PS)", + description="Data use limited to approved project(s) (PS).", meaning=DUO["0000027"]) user_specific = PermissibleValue( text="user_specific", - description="Data use limited to approved users (US)", + description="Data use limited to approved users (US).", meaning=DUO["0000026"]) time_limit = PermissibleValue( text="time_limit", - description="Data use approved for limited time period (TS)", + description="Data use approved for limited time period (TS).", meaning=DUO["0000025"]) return_to_database = PermissibleValue( text="return_to_database", - description="Derived data must be returned to database/resource (RTN)", + description="Derived data must be returned to database/resource (RTN).", meaning=DUO["0000029"]) publication_moratorium = PermissibleValue( text="publication_moratorium", - description="Publication restricted until specified date (MOR)", + description="Publication restricted until specified date (MOR).", meaning=DUO["0000024"]) no_population_ancestry_research = PermissibleValue( text="no_population_ancestry_research", - description="Population/ancestry research prohibited (NPOA)", + description="Population/ancestry research prohibited (NPOA).", meaning=DUO["0000044"]) _defn = EnumDefinition( @@ -3594,39 +3815,39 @@ class FileTypeEnum(EnumDefinitionImpl): """ data_file = 
PermissibleValue( text="data_file", - description="A data file containing dataset content", + description="A data file containing dataset content.", meaning=SCHEMA["DataDownload"]) code_file = PermissibleValue( text="code_file", - description="A source code or script file", + description="A source code or script file.", meaning=SCHEMA["SoftwareSourceCode"]) documentation_file = PermissibleValue( text="documentation_file", - description="A documentation file (README, guide, etc.)", + description="A documentation file (README, guide, etc.).", meaning=SCHEMA["Documentation"]) metadata_file = PermissibleValue( text="metadata_file", - description="A metadata or annotation file", + description="A metadata or annotation file.", meaning=DCAT["CatalogRecord"]) configuration_file = PermissibleValue( text="configuration_file", - description="A configuration or settings file", + description="A configuration or settings file.", meaning=D4D["ConfigurationFile"]) notebook_file = PermissibleValue( text="notebook_file", - description="A computational notebook file (Jupyter, R Markdown, etc.)", + description="A computational notebook file (Jupyter, R Markdown, etc.).", meaning=D4D["NotebookFile"]) image_file = PermissibleValue( text="image_file", - description="An image or visualization file", + description="An image or visualization file.", meaning=SCHEMA["ImageObject"]) archive_file = PermissibleValue( text="archive_file", - description="An archive or compressed file", + description="An archive or compressed file.", meaning=D4D["ArchiveFile"]) other = PermissibleValue( text="other", - description="Other file type", + description="Other file type.", meaning=D4D["OtherFile"]) _defn = EnumDefinition( @@ -3640,43 +3861,43 @@ class FileCollectionTypeEnum(EnumDefinitionImpl): """ raw_data = PermissibleValue( text="raw_data", - description="Raw, unprocessed data files", + description="Raw, unprocessed data files.", meaning=D4D["RawData"]) processed_data = PermissibleValue( 
text="processed_data", - description="Cleaned, processed, or transformed data files", + description="Cleaned, processed, or transformed data files.", meaning=D4D["ProcessedData"]) training_split = PermissibleValue( text="training_split", - description="Files designated for model training", + description="Files designated for model training.", meaning=D4D["TrainingSplit"]) test_split = PermissibleValue( text="test_split", - description="Files designated for model testing", + description="Files designated for model testing.", meaning=D4D["TestSplit"]) validation_split = PermissibleValue( text="validation_split", - description="Files designated for model validation", + description="Files designated for model validation.", meaning=D4D["ValidationSplit"]) documentation = PermissibleValue( text="documentation", - description="Documentation files (README, codebook, etc.)", + description="Documentation files (README, codebook, etc.).", meaning=SCHEMA["Documentation"]) metadata = PermissibleValue( text="metadata", - description="Metadata or annotation files", + description="Metadata or annotation files.", meaning=DCAT["CatalogRecord"]) code = PermissibleValue( text="code", - description="Code or script files", + description="Code or script files.", meaning=SCHEMA["SoftwareSourceCode"]) supplementary = PermissibleValue( text="supplementary", - description="Supplementary materials", + description="Supplementary materials.", meaning=SCHEMA["SupplementalMaterial"]) other = PermissibleValue( text="other", - description="Other file collection type", + description="Other file collection type.", meaning=D4D["OtherFileCollection"]) _defn = EnumDefinition( @@ -3724,7 +3945,7 @@ class slots: slots.format = Slot(uri=DCTERMS.format, name="format", curie=DCTERMS.curie('format'), model_uri=DATA_SHEETS_SCHEMA.format, domain=None, range=Optional[Union[str, "FormatEnum"]]) -slots.encoding = Slot(uri=DCAT.mediaType, name="encoding", curie=DCAT.curie('mediaType'), +slots.encoding = 
Slot(uri=D4D.characterEncoding, name="encoding", curie=D4D.curie('characterEncoding'), model_uri=DATA_SHEETS_SCHEMA.encoding, domain=None, range=Optional[Union[str, "EncodingEnum"]]) slots.compression = Slot(uri=DCAT.compressFormat, name="compression", curie=DCAT.curie('compressFormat'), @@ -3733,22 +3954,22 @@ class slots: slots.media_type = Slot(uri=DCAT.mediaType, name="media_type", curie=DCAT.curie('mediaType'), model_uri=DATA_SHEETS_SCHEMA.media_type, domain=None, range=Optional[Union[str, "MediaTypeEnum"]]) -slots.hash = Slot(uri=DCTERMS.identifier, name="hash", curie=DCTERMS.curie('identifier'), +slots.hash = Slot(uri=D4D.hashValue, name="hash", curie=D4D.curie('hashValue'), model_uri=DATA_SHEETS_SCHEMA.hash, domain=None, range=Optional[str]) -slots.md5 = Slot(uri=DCTERMS.identifier, name="md5", curie=DCTERMS.curie('identifier'), +slots.md5 = Slot(uri=D4D.md5Checksum, name="md5", curie=D4D.curie('md5Checksum'), model_uri=DATA_SHEETS_SCHEMA.md5, domain=None, range=Optional[str]) -slots.sha256 = Slot(uri=DCTERMS.identifier, name="sha256", curie=DCTERMS.curie('identifier'), +slots.sha256 = Slot(uri=SCHEMA.sha256, name="sha256", curie=SCHEMA.curie('sha256'), model_uri=DATA_SHEETS_SCHEMA.sha256, domain=None, range=Optional[str]) slots.conforms_to = Slot(uri=DCTERMS.conformsTo, name="conforms_to", curie=DCTERMS.curie('conformsTo'), model_uri=DATA_SHEETS_SCHEMA.conforms_to, domain=None, range=Optional[str]) -slots.conforms_to_schema = Slot(uri=DCTERMS.conformsTo, name="conforms_to_schema", curie=DCTERMS.curie('conformsTo'), +slots.conforms_to_schema = Slot(uri=D4D.conformsToSchema, name="conforms_to_schema", curie=D4D.curie('conformsToSchema'), model_uri=DATA_SHEETS_SCHEMA.conforms_to_schema, domain=None, range=Optional[str]) -slots.conforms_to_class = Slot(uri=DCTERMS.conformsTo, name="conforms_to_class", curie=DCTERMS.curie('conformsTo'), +slots.conforms_to_class = Slot(uri=D4D.conformsToClass, name="conforms_to_class", curie=D4D.curie('conformsToClass'), 
model_uri=DATA_SHEETS_SCHEMA.conforms_to_class, domain=None, range=Optional[str]) slots.license = Slot(uri=DCTERMS.license, name="license", curie=DCTERMS.curie('license'), @@ -3757,7 +3978,7 @@ class slots: slots.keywords = Slot(uri=DCAT.keyword, name="keywords", curie=DCAT.curie('keyword'), model_uri=DATA_SHEETS_SCHEMA.keywords, domain=None, range=Optional[Union[str, list[str]]]) -slots.version = Slot(uri=DCTERMS.hasVersion, name="version", curie=DCTERMS.curie('hasVersion'), +slots.version = Slot(uri=SCHEMA.version, name="version", curie=SCHEMA.curie('version'), model_uri=DATA_SHEETS_SCHEMA.version, domain=None, range=Optional[str]) slots.created_by = Slot(uri=DCTERMS.creator, name="created_by", curie=DCTERMS.curie('creator'), @@ -3772,13 +3993,13 @@ class slots: slots.modified_by = Slot(uri=DCTERMS.contributor, name="modified_by", curie=DCTERMS.curie('contributor'), model_uri=DATA_SHEETS_SCHEMA.modified_by, domain=None, range=Optional[str]) -slots.status = Slot(uri=DCTERMS.type, name="status", curie=DCTERMS.curie('type'), +slots.status = Slot(uri=D4D.publicationStatus, name="status", curie=D4D.curie('publicationStatus'), model_uri=DATA_SHEETS_SCHEMA.status, domain=None, range=Optional[str]) slots.was_derived_from = Slot(uri=PROV.wasDerivedFrom, name="was_derived_from", curie=PROV.curie('wasDerivedFrom'), model_uri=DATA_SHEETS_SCHEMA.was_derived_from, domain=None, range=Optional[str]) -slots.doi = Slot(uri=DCTERMS.identifier, name="doi", curie=DCTERMS.curie('identifier'), +slots.doi = Slot(uri=D4D.doiIdentifier, name="doi", curie=D4D.curie('doiIdentifier'), model_uri=DATA_SHEETS_SCHEMA.doi, domain=None, range=Optional[str], pattern=re.compile(r'10\.\d{4,}\/.+')) @@ -3812,7 +4033,7 @@ class slots: slots.dataset__funders = Slot(uri=SCHEMA.funder, name="dataset__funders", curie=SCHEMA.curie('funder'), model_uri=DATA_SHEETS_SCHEMA.dataset__funders, domain=None, range=Optional[Union[Union[dict, FundingMechanism], list[Union[dict, FundingMechanism]]]]) 
-slots.dataset__subsets = Slot(uri=DCAT.distribution, name="dataset__subsets", curie=DCAT.curie('distribution'), +slots.dataset__subsets = Slot(uri=D4D.dataSubset, name="dataset__subsets", curie=D4D.curie('dataSubset'), model_uri=DATA_SHEETS_SCHEMA.dataset__subsets, domain=None, range=Optional[Union[dict[Union[str, DataSubsetId], Union[dict, DataSubset]], list[Union[dict, DataSubset]]]]) slots.dataset__instances = Slot(uri=D4D.instances, name="dataset__instances", curie=D4D.curie('instances'), @@ -3839,6 +4060,12 @@ class slots: slots.dataset__sensitive_elements = Slot(uri=D4D.sensitiveElements, name="dataset__sensitive_elements", curie=D4D.curie('sensitiveElements'), model_uri=DATA_SHEETS_SCHEMA.dataset__sensitive_elements, domain=None, range=Optional[Union[Union[dict, SensitiveElement], list[Union[dict, SensitiveElement]]]]) +slots.dataset__relationships = Slot(uri=D4D.relationships, name="dataset__relationships", curie=D4D.curie('relationships'), + model_uri=DATA_SHEETS_SCHEMA.dataset__relationships, domain=None, range=Optional[Union[Union[dict, Relationships], list[Union[dict, Relationships]]]]) + +slots.dataset__splits = Slot(uri=D4D.splits, name="dataset__splits", curie=D4D.curie('splits'), + model_uri=DATA_SHEETS_SCHEMA.dataset__splits, domain=None, range=Optional[Union[Union[dict, Splits], list[Union[dict, Splits]]]]) + slots.dataset__acquisition_methods = Slot(uri=D4D.acquisitionMethods, name="dataset__acquisition_methods", curie=D4D.curie('acquisitionMethods'), model_uri=DATA_SHEETS_SCHEMA.dataset__acquisition_methods, domain=None, range=Optional[Union[Union[dict, InstanceAcquisition], list[Union[dict, InstanceAcquisition]]]]) @@ -3854,6 +4081,18 @@ class slots: slots.dataset__collection_timeframes = Slot(uri=D4D.collectionTimeframes, name="dataset__collection_timeframes", curie=D4D.curie('collectionTimeframes'), model_uri=DATA_SHEETS_SCHEMA.dataset__collection_timeframes, domain=None, range=Optional[Union[Union[dict, CollectionTimeframe], 
list[Union[dict, CollectionTimeframe]]]]) +slots.dataset__direct_collection = Slot(uri=D4D.directCollection, name="dataset__direct_collection", curie=D4D.curie('directCollection'), + model_uri=DATA_SHEETS_SCHEMA.dataset__direct_collection, domain=None, range=Optional[Union[Union[dict, DirectCollection], list[Union[dict, DirectCollection]]]]) + +slots.dataset__collection_notifications = Slot(uri=D4D.collectionNotifications, name="dataset__collection_notifications", curie=D4D.curie('collectionNotifications'), + model_uri=DATA_SHEETS_SCHEMA.dataset__collection_notifications, domain=None, range=Optional[Union[Union[dict, CollectionNotification], list[Union[dict, CollectionNotification]]]]) + +slots.dataset__collection_consents = Slot(uri=D4D.collectionConsents, name="dataset__collection_consents", curie=D4D.curie('collectionConsents'), + model_uri=DATA_SHEETS_SCHEMA.dataset__collection_consents, domain=None, range=Optional[Union[Union[dict, CollectionConsent], list[Union[dict, CollectionConsent]]]]) + +slots.dataset__consent_revocations = Slot(uri=D4D.consentRevocations, name="dataset__consent_revocations", curie=D4D.curie('consentRevocations'), + model_uri=DATA_SHEETS_SCHEMA.dataset__consent_revocations, domain=None, range=Optional[Union[Union[dict, ConsentRevocation], list[Union[dict, ConsentRevocation]]]]) + slots.dataset__missing_data_documentation = Slot(uri=D4D.missingDataDocumentation, name="dataset__missing_data_documentation", curie=D4D.curie('missingDataDocumentation'), model_uri=DATA_SHEETS_SCHEMA.dataset__missing_data_documentation, domain=None, range=Optional[Union[Union[dict, MissingDataDocumentation], list[Union[dict, MissingDataDocumentation]]]]) @@ -3929,6 +4168,9 @@ class slots: slots.dataset__distribution_dates = Slot(uri=D4D.distributionDates, name="dataset__distribution_dates", curie=D4D.curie('distributionDates'), model_uri=DATA_SHEETS_SCHEMA.dataset__distribution_dates, domain=None, range=Optional[Union[Union[dict, DistributionDate], 
list[Union[dict, DistributionDate]]]]) +slots.dataset__third_party_sharing = Slot(uri=D4D.thirdPartySharing, name="dataset__third_party_sharing", curie=D4D.curie('thirdPartySharing'), + model_uri=DATA_SHEETS_SCHEMA.dataset__third_party_sharing, domain=None, range=Optional[Union[Union[dict, ThirdPartySharing], list[Union[dict, ThirdPartySharing]]]]) + slots.dataset__license_and_use_terms = Slot(uri=SCHEMA.license, name="dataset__license_and_use_terms", curie=SCHEMA.curie('license'), model_uri=DATA_SHEETS_SCHEMA.dataset__license_and_use_terms, domain=None, range=Optional[Union[dict, LicenseAndUseTerms]]) @@ -3950,7 +4192,7 @@ class slots: slots.dataset__retention_limit = Slot(uri=D4D.retentionLimit, name="dataset__retention_limit", curie=D4D.curie('retentionLimit'), model_uri=DATA_SHEETS_SCHEMA.dataset__retention_limit, domain=None, range=Optional[Union[dict, RetentionLimits]]) -slots.dataset__version_access = Slot(uri=DCAT.accessURL, name="dataset__version_access", curie=DCAT.curie('accessURL'), +slots.dataset__version_access = Slot(uri=D4D.versionAccess, name="dataset__version_access", curie=D4D.curie('versionAccess'), model_uri=DATA_SHEETS_SCHEMA.dataset__version_access, domain=None, range=Optional[Union[dict, VersionAccess]]) slots.dataset__extension_mechanism = Slot(uri=D4D.extensionMechanism, name="dataset__extension_mechanism", curie=D4D.curie('extensionMechanism'), @@ -3962,7 +4204,7 @@ class slots: slots.dataset__is_deidentified = Slot(uri=D4D.isDeidentified, name="dataset__is_deidentified", curie=D4D.curie('isDeidentified'), model_uri=DATA_SHEETS_SCHEMA.dataset__is_deidentified, domain=None, range=Optional[Union[dict, Deidentification]]) -slots.dataset__is_tabular = Slot(uri=SCHEMA.encodingFormat, name="dataset__is_tabular", curie=SCHEMA.curie('encodingFormat'), +slots.dataset__is_tabular = Slot(uri=D4D.isTabular, name="dataset__is_tabular", curie=D4D.curie('isTabular'), model_uri=DATA_SHEETS_SCHEMA.dataset__is_tabular, domain=None, 
range=Optional[Union[bool, Bool]]) slots.dataset__citation = Slot(uri=SCHEMA.citation, name="dataset__citation", curie=SCHEMA.curie('citation'), @@ -4008,7 +4250,7 @@ class slots: model_uri=DATA_SHEETS_SCHEMA.software__license, domain=None, range=Optional[str]) slots.software__url = Slot(uri=SCHEMA.url, name="software__url", curie=SCHEMA.curie('url'), - model_uri=DATA_SHEETS_SCHEMA.software__url, domain=None, range=Optional[str]) + model_uri=DATA_SHEETS_SCHEMA.software__url, domain=None, range=Optional[Union[str, URI]]) slots.person__affiliation = Slot(uri=SCHEMA.affiliation, name="person__affiliation", curie=SCHEMA.curie('affiliation'), model_uri=DATA_SHEETS_SCHEMA.person__affiliation, domain=None, range=Optional[Union[Union[str, OrganizationId], list[Union[str, OrganizationId]]]]) @@ -4016,7 +4258,7 @@ class slots: slots.person__email = Slot(uri=SCHEMA.email, name="person__email", curie=SCHEMA.curie('email'), model_uri=DATA_SHEETS_SCHEMA.person__email, domain=None, range=Optional[str]) -slots.person__orcid = Slot(uri=SCHEMA.identifier, name="person__orcid", curie=SCHEMA.curie('identifier'), +slots.person__orcid = Slot(uri=D4D.orcidIdentifier, name="person__orcid", curie=D4D.curie('orcidIdentifier'), model_uri=DATA_SHEETS_SCHEMA.person__orcid, domain=None, range=Optional[str], pattern=re.compile(r'^\d{4}-\d{4}-\d{4}-\d{3}[0-9X]$')) @@ -4035,19 +4277,19 @@ class slots: slots.formatDialect__quote_char = Slot(uri=DATA_SHEETS_SCHEMA.quote_char, name="formatDialect__quote_char", curie=DATA_SHEETS_SCHEMA.curie('quote_char'), model_uri=DATA_SHEETS_SCHEMA.formatDialect__quote_char, domain=None, range=Optional[str]) -slots.purpose__response = Slot(uri=DCTERMS.description, name="purpose__response", curie=DCTERMS.curie('description'), +slots.purpose__response = Slot(uri=D4D.questionResponse, name="purpose__response", curie=D4D.curie('questionResponse'), model_uri=DATA_SHEETS_SCHEMA.purpose__response, domain=None, range=Optional[str]) -slots.task__response = 
Slot(uri=DCTERMS.description, name="task__response", curie=DCTERMS.curie('description'), +slots.task__response = Slot(uri=D4D.questionResponse, name="task__response", curie=D4D.curie('questionResponse'), model_uri=DATA_SHEETS_SCHEMA.task__response, domain=None, range=Optional[str]) -slots.addressingGap__response = Slot(uri=DCTERMS.description, name="addressingGap__response", curie=DCTERMS.curie('description'), +slots.addressingGap__response = Slot(uri=D4D.questionResponse, name="addressingGap__response", curie=D4D.curie('questionResponse'), model_uri=DATA_SHEETS_SCHEMA.addressingGap__response, domain=None, range=Optional[str]) -slots.creator__principal_investigator = Slot(uri=DCTERMS.creator, name="creator__principal_investigator", curie=DCTERMS.curie('creator'), +slots.creator__principal_investigator = Slot(uri=D4D.principalInvestigator, name="creator__principal_investigator", curie=D4D.curie('principalInvestigator'), model_uri=DATA_SHEETS_SCHEMA.creator__principal_investigator, domain=None, range=Optional[Union[str, PersonId]]) -slots.creator__affiliations = Slot(uri=SCHEMA.affiliation, name="creator__affiliations", curie=SCHEMA.curie('affiliation'), +slots.creator__affiliations = Slot(uri=D4D.teamAffiliation, name="creator__affiliations", curie=D4D.curie('teamAffiliation'), model_uri=DATA_SHEETS_SCHEMA.creator__affiliations, domain=None, range=Optional[Union[dict[Union[str, OrganizationId], Union[dict, Organization]], list[Union[dict, Organization]]]]) slots.creator__credit_roles = Slot(uri=D4D.creditRoles, name="creator__credit_roles", curie=D4D.curie('creditRoles'), @@ -4059,16 +4301,16 @@ class slots: slots.fundingMechanism__grants = Slot(uri=SCHEMA.funding, name="fundingMechanism__grants", curie=SCHEMA.curie('funding'), model_uri=DATA_SHEETS_SCHEMA.fundingMechanism__grants, domain=None, range=Optional[Union[dict[Union[str, GrantId], Union[dict, Grant]], list[Union[dict, Grant]]]]) -slots.grant__grant_number = Slot(uri=SCHEMA.identifier, 
name="grant__grant_number", curie=SCHEMA.curie('identifier'), +slots.grant__grant_number = Slot(uri=D4D.grantIdentifier, name="grant__grant_number", curie=D4D.curie('grantIdentifier'), model_uri=DATA_SHEETS_SCHEMA.grant__grant_number, domain=None, range=Optional[str]) slots.instance__data_topic = Slot(uri=DCAT.theme, name="instance__data_topic", curie=DCAT.curie('theme'), model_uri=DATA_SHEETS_SCHEMA.instance__data_topic, domain=None, range=Optional[Union[str, URIorCURIE]]) -slots.instance__instance_type = Slot(uri=DCTERMS.type, name="instance__instance_type", curie=DCTERMS.curie('type'), +slots.instance__instance_type = Slot(uri=D4D.instanceType, name="instance__instance_type", curie=D4D.curie('instanceType'), model_uri=DATA_SHEETS_SCHEMA.instance__instance_type, domain=None, range=Optional[str]) -slots.instance__data_substrate = Slot(uri=DCTERMS.format, name="instance__data_substrate", curie=DCTERMS.curie('format'), +slots.instance__data_substrate = Slot(uri=DCTERMS.type, name="instance__data_substrate", curie=DCTERMS.curie('type'), model_uri=DATA_SHEETS_SCHEMA.instance__data_substrate, domain=None, range=Optional[Union[str, URIorCURIE]]) slots.instance__counts = Slot(uri=SCHEMA.numberOfItems, name="instance__counts", curie=SCHEMA.curie('numberOfItems'), @@ -4077,7 +4319,7 @@ class slots: slots.instance__label = Slot(uri=D4D.hasLabel, name="instance__label", curie=D4D.curie('hasLabel'), model_uri=DATA_SHEETS_SCHEMA.instance__label, domain=None, range=Optional[Union[bool, Bool]]) -slots.instance__label_description = Slot(uri=SCHEMA.description, name="instance__label_description", curie=SCHEMA.curie('description'), +slots.instance__label_description = Slot(uri=D4D.labelPattern, name="instance__label_description", curie=D4D.curie('labelPattern'), model_uri=DATA_SHEETS_SCHEMA.instance__label_description, domain=None, range=Optional[str]) slots.instance__sampling_strategies = Slot(uri=D4D.samplingStrategies, name="instance__sampling_strategies", 
curie=D4D.curie('samplingStrategies'), @@ -4087,18 +4329,18 @@ class slots: model_uri=DATA_SHEETS_SCHEMA.instance__missing_information, domain=None, range=Optional[Union[Union[dict, MissingInfo], list[Union[dict, MissingInfo]]]]) slots.samplingStrategy__is_sample = Slot(uri=D4D.isSample, name="samplingStrategy__is_sample", curie=D4D.curie('isSample'), - model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__is_sample, domain=None, range=Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]]) + model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__is_sample, domain=None, range=Optional[Union[bool, Bool]]) slots.samplingStrategy__is_random = Slot(uri=D4D.isRandom, name="samplingStrategy__is_random", curie=D4D.curie('isRandom'), - model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__is_random, domain=None, range=Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]]) + model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__is_random, domain=None, range=Optional[Union[bool, Bool]]) slots.samplingStrategy__source_data = Slot(uri=D4D.sourceData, name="samplingStrategy__source_data", curie=D4D.curie('sourceData'), model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__source_data, domain=None, range=Optional[Union[str, list[str]]]) slots.samplingStrategy__is_representative = Slot(uri=D4D.isRepresentative, name="samplingStrategy__is_representative", curie=D4D.curie('isRepresentative'), - model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__is_representative, domain=None, range=Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]]) + model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__is_representative, domain=None, range=Optional[Union[bool, Bool]]) -slots.samplingStrategy__representative_verification = Slot(uri=SCHEMA.description, name="samplingStrategy__representative_verification", curie=SCHEMA.curie('description'), +slots.samplingStrategy__representative_verification = Slot(uri=D4D.verificationDescription, name="samplingStrategy__representative_verification", 
curie=D4D.curie('verificationDescription'), model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__representative_verification, domain=None, range=Optional[Union[str, list[str]]]) slots.samplingStrategy__why_not_representative = Slot(uri=D4D.whyNotRepresentative, name="samplingStrategy__why_not_representative", curie=D4D.curie('whyNotRepresentative'), @@ -4107,10 +4349,10 @@ class slots: slots.samplingStrategy__strategies = Slot(uri=D4D.strategies, name="samplingStrategy__strategies", curie=D4D.curie('strategies'), model_uri=DATA_SHEETS_SCHEMA.samplingStrategy__strategies, domain=None, range=Optional[Union[str, list[str]]]) -slots.missingInfo__missing = Slot(uri=DCTERMS.description, name="missingInfo__missing", curie=DCTERMS.curie('description'), +slots.missingInfo__missing = Slot(uri=D4D.missingDataDescription, name="missingInfo__missing", curie=D4D.curie('missingDataDescription'), model_uri=DATA_SHEETS_SCHEMA.missingInfo__missing, domain=None, range=Optional[Union[str, list[str]]]) -slots.missingInfo__why_missing = Slot(uri=DCTERMS.description, name="missingInfo__why_missing", curie=DCTERMS.curie('description'), +slots.missingInfo__why_missing = Slot(uri=D4D.missingDataCause, name="missingInfo__why_missing", curie=D4D.curie('missingDataCause'), model_uri=DATA_SHEETS_SCHEMA.missingInfo__why_missing, domain=None, range=Optional[Union[str, list[str]]]) slots.relationships__relationship_details = Slot(uri=DCTERMS.description, name="relationships__relationship_details", curie=DCTERMS.curie('description'), @@ -4119,7 +4361,7 @@ class slots: slots.splits__split_details = Slot(uri=DCTERMS.description, name="splits__split_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.splits__split_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.dataAnomaly__anomaly_details = Slot(uri=DCTERMS.description, name="dataAnomaly__anomaly_details", curie=DCTERMS.curie('description'), +slots.dataAnomaly__anomaly_details = Slot(uri=D4D.anomalyDetails, 
name="dataAnomaly__anomaly_details", curie=D4D.curie('anomalyDetails'), model_uri=DATA_SHEETS_SCHEMA.dataAnomaly__anomaly_details, domain=None, range=Optional[Union[str, list[str]]]) slots.datasetBias__bias_type = Slot(uri=D4D.biasType, name="datasetBias__bias_type", curie=D4D.curie('biasType'), @@ -4146,13 +4388,13 @@ class slots: slots.datasetLimitation__recommended_mitigation = Slot(uri=D4D.recommendedMitigation, name="datasetLimitation__recommended_mitigation", curie=D4D.curie('recommendedMitigation'), model_uri=DATA_SHEETS_SCHEMA.datasetLimitation__recommended_mitigation, domain=None, range=Optional[str]) -slots.externalResource__future_guarantees = Slot(uri=DCTERMS.description, name="externalResource__future_guarantees", curie=DCTERMS.curie('description'), +slots.externalResource__future_guarantees = Slot(uri=D4D.availabilityGuarantee, name="externalResource__future_guarantees", curie=D4D.curie('availabilityGuarantee'), model_uri=DATA_SHEETS_SCHEMA.externalResource__future_guarantees, domain=None, range=Optional[Union[str, list[str]]]) -slots.externalResource__archival = Slot(uri=SCHEMA.archivedAt, name="externalResource__archival", curie=SCHEMA.curie('archivedAt'), - model_uri=DATA_SHEETS_SCHEMA.externalResource__archival, domain=None, range=Optional[Union[Union[bool, Bool], list[Union[bool, Bool]]]]) +slots.externalResource__archival = Slot(uri=D4D.hasArchivalVersion, name="externalResource__archival", curie=D4D.curie('hasArchivalVersion'), + model_uri=DATA_SHEETS_SCHEMA.externalResource__archival, domain=None, range=Optional[Union[bool, Bool]]) -slots.externalResource__restrictions = Slot(uri=DCTERMS.accessRights, name="externalResource__restrictions", curie=DCTERMS.curie('accessRights'), +slots.externalResource__restrictions = Slot(uri=D4D.externalResourceRestrictions, name="externalResource__restrictions", curie=D4D.curie('externalResourceRestrictions'), model_uri=DATA_SHEETS_SCHEMA.externalResource__restrictions, domain=None, range=Optional[Union[str, 
list[str]]]) slots.confidentiality__confidential_elements_present = Slot(uri=D4D.confidential_elements_present, name="confidentiality__confidential_elements_present", curie=D4D.curie('confidential_elements_present'), @@ -4170,10 +4412,10 @@ class slots: slots.subpopulation__subpopulation_elements_present = Slot(uri=D4D.subpopulationElementsPresent, name="subpopulation__subpopulation_elements_present", curie=D4D.curie('subpopulationElementsPresent'), model_uri=DATA_SHEETS_SCHEMA.subpopulation__subpopulation_elements_present, domain=None, range=Optional[Union[bool, Bool]]) -slots.subpopulation__identification = Slot(uri=DCTERMS.description, name="subpopulation__identification", curie=DCTERMS.curie('description'), +slots.subpopulation__identification = Slot(uri=D4D.subpopulationIdentification, name="subpopulation__identification", curie=D4D.curie('subpopulationIdentification'), model_uri=DATA_SHEETS_SCHEMA.subpopulation__identification, domain=None, range=Optional[Union[str, list[str]]]) -slots.subpopulation__distribution = Slot(uri=DCTERMS.description, name="subpopulation__distribution", curie=DCTERMS.curie('description'), +slots.subpopulation__distribution = Slot(uri=D4D.subpopulationDistribution, name="subpopulation__distribution", curie=D4D.curie('subpopulationDistribution'), model_uri=DATA_SHEETS_SCHEMA.subpopulation__distribution, domain=None, range=Optional[Union[str, list[str]]]) slots.deidentification__identifiable_elements_present = Slot(uri=D4D.identifiableElementsPresent, name="deidentification__identifiable_elements_present", curie=D4D.curie('identifiableElementsPresent'), @@ -4182,7 +4424,7 @@ class slots: slots.deidentification__method = Slot(uri=D4DCOMPOSITION.method, name="deidentification__method", curie=D4DCOMPOSITION.curie('method'), model_uri=DATA_SHEETS_SCHEMA.deidentification__method, domain=None, range=Optional[str]) -slots.deidentification__identifiers_removed = Slot(uri=SCHEMA.identifier, name="deidentification__identifiers_removed", 
curie=SCHEMA.curie('identifier'), +slots.deidentification__identifiers_removed = Slot(uri=D4D.removedIdentifierTypes, name="deidentification__identifiers_removed", curie=D4D.curie('removedIdentifierTypes'), model_uri=DATA_SHEETS_SCHEMA.deidentification__identifiers_removed, domain=None, range=Optional[Union[str, list[str]]]) slots.deidentification__deidentification_details = Slot(uri=DCTERMS.description, name="deidentification__deidentification_details", curie=DCTERMS.curie('description'), @@ -4194,7 +4436,7 @@ class slots: slots.sensitiveElement__sensitivity_details = Slot(uri=DCTERMS.description, name="sensitiveElement__sensitivity_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.sensitiveElement__sensitivity_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.datasetRelationship__target_dataset = Slot(uri=SCHEMA.identifier, name="datasetRelationship__target_dataset", curie=SCHEMA.curie('identifier'), +slots.datasetRelationship__target_dataset = Slot(uri=DCTERMS.relation, name="datasetRelationship__target_dataset", curie=DCTERMS.curie('relation'), model_uri=DATA_SHEETS_SCHEMA.datasetRelationship__target_dataset, domain=None, range=str) slots.datasetRelationship__relationship_type = Slot(uri=SCHEMA.additionalType, name="datasetRelationship__relationship_type", curie=SCHEMA.curie('additionalType'), @@ -4254,7 +4496,7 @@ class slots: slots.rawDataSource__source_description = Slot(uri=DCTERMS.description, name="rawDataSource__source_description", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.rawDataSource__source_description, domain=None, range=str) -slots.rawDataSource__source_type = Slot(uri=DCTERMS.type, name="rawDataSource__source_type", curie=DCTERMS.curie('type'), +slots.rawDataSource__source_type = Slot(uri=D4D.sourceType, name="rawDataSource__source_type", curie=D4D.curie('sourceType'), model_uri=DATA_SHEETS_SCHEMA.rawDataSource__source_type, domain=None, range=Optional[Union[str, list[str]]]) 
slots.rawDataSource__access_details = Slot(uri=D4D.accessDetails, name="rawDataSource__access_details", curie=D4D.curie('accessDetails'), @@ -4269,8 +4511,8 @@ class slots: slots.cleaningStrategy__cleaning_details = Slot(uri=DCTERMS.description, name="cleaningStrategy__cleaning_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.cleaningStrategy__cleaning_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.labelingStrategy__data_annotation_platform = Slot(uri=SCHEMA.instrument, name="labelingStrategy__data_annotation_platform", curie=SCHEMA.curie('instrument'), - model_uri=DATA_SHEETS_SCHEMA.labelingStrategy__data_annotation_platform, domain=None, range=Optional[str]) +slots.labelingStrategy__data_annotation_platform = Slot(uri=RAI.dataAnnotationPlatform, name="labelingStrategy__data_annotation_platform", curie=RAI.curie('dataAnnotationPlatform'), + model_uri=DATA_SHEETS_SCHEMA.labelingStrategy__data_annotation_platform, domain=None, range=Optional[Union[str, list[str]]]) slots.labelingStrategy__data_annotation_protocol = Slot(uri=D4D.dataAnnotationProtocol, name="labelingStrategy__data_annotation_protocol", curie=D4D.curie('dataAnnotationProtocol'), model_uri=DATA_SHEETS_SCHEMA.labelingStrategy__data_annotation_protocol, domain=None, range=Optional[Union[str, list[str]]]) @@ -4287,7 +4529,7 @@ class slots: slots.labelingStrategy__labeling_details = Slot(uri=DCTERMS.description, name="labelingStrategy__labeling_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.labelingStrategy__labeling_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.rawData__access_url = Slot(uri=DCAT.accessURL, name="rawData__access_url", curie=DCAT.curie('accessURL'), +slots.rawData__access_url = Slot(uri=D4D.rawDataAccessURL, name="rawData__access_url", curie=D4D.curie('rawDataAccessURL'), model_uri=DATA_SHEETS_SCHEMA.rawData__access_url, domain=None, range=Optional[Union[str, URI]]) 
slots.rawData__raw_data_details = Slot(uri=DCTERMS.description, name="rawData__raw_data_details", curie=DCTERMS.curie('description'), @@ -4320,7 +4562,7 @@ class slots: slots.annotationAnalysis__annotation_quality_details = Slot(uri=D4D.annotationQualityDetails, name="annotationAnalysis__annotation_quality_details", curie=D4D.curie('annotationQualityDetails'), model_uri=DATA_SHEETS_SCHEMA.annotationAnalysis__annotation_quality_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.machineAnnotationTools__tools = Slot(uri=SCHEMA.name, name="machineAnnotationTools__tools", curie=SCHEMA.curie('name'), +slots.machineAnnotationTools__tools = Slot(uri=D4D.toolNames, name="machineAnnotationTools__tools", curie=D4D.curie('toolNames'), model_uri=DATA_SHEETS_SCHEMA.machineAnnotationTools__tools, domain=None, range=Optional[Union[str, list[str]]]) slots.machineAnnotationTools__tool_descriptions = Slot(uri=D4D.toolDescriptions, name="machineAnnotationTools__tool_descriptions", curie=D4D.curie('toolDescriptions'), @@ -4359,11 +4601,11 @@ class slots: slots.prohibitedUse__prohibition_reason = Slot(uri=D4D.prohibitionReason, name="prohibitedUse__prohibition_reason", curie=D4D.curie('prohibitionReason'), model_uri=DATA_SHEETS_SCHEMA.prohibitedUse__prohibition_reason, domain=None, range=Optional[Union[str, list[str]]]) -slots.thirdPartySharing__is_shared = Slot(uri=DCTERMS.accessRights, name="thirdPartySharing__is_shared", curie=DCTERMS.curie('accessRights'), +slots.thirdPartySharing__is_shared = Slot(uri=D4D.isExternallyShared, name="thirdPartySharing__is_shared", curie=D4D.curie('isExternallyShared'), model_uri=DATA_SHEETS_SCHEMA.thirdPartySharing__is_shared, domain=None, range=Optional[Union[bool, Bool]]) slots.distributionFormat__access_urls = Slot(uri=DCAT.accessURL, name="distributionFormat__access_urls", curie=DCAT.curie('accessURL'), - model_uri=DATA_SHEETS_SCHEMA.distributionFormat__access_urls, domain=None, range=Optional[Union[str, list[str]]]) + 
model_uri=DATA_SHEETS_SCHEMA.distributionFormat__access_urls, domain=None, range=Optional[Union[Union[str, URI], list[Union[str, URI]]]]) slots.distributionDate__release_dates = Slot(uri=DCTERMS.available, name="distributionDate__release_dates", curie=DCTERMS.curie('available'), model_uri=DATA_SHEETS_SCHEMA.distributionDate__release_dates, domain=None, range=Optional[Union[str, list[str]]]) @@ -4374,7 +4616,7 @@ class slots: slots.maintainer__maintainer_details = Slot(uri=DCTERMS.description, name="maintainer__maintainer_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.maintainer__maintainer_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.erratum__erratum_url = Slot(uri=DCAT.accessURL, name="erratum__erratum_url", curie=DCAT.curie('accessURL'), +slots.erratum__erratum_url = Slot(uri=D4D.erratumURL, name="erratum__erratum_url", curie=D4D.curie('erratumURL'), model_uri=DATA_SHEETS_SCHEMA.erratum__erratum_url, domain=None, range=Optional[Union[str, URI]]) slots.erratum__erratum_details = Slot(uri=DCTERMS.description, name="erratum__erratum_details", curie=DCTERMS.curie('description'), @@ -4392,8 +4634,8 @@ class slots: slots.retentionLimits__retention_details = Slot(uri=DCTERMS.description, name="retentionLimits__retention_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.retentionLimits__retention_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.versionAccess__latest_version_doi = Slot(uri=SCHEMA.identifier, name="versionAccess__latest_version_doi", curie=SCHEMA.curie('identifier'), - model_uri=DATA_SHEETS_SCHEMA.versionAccess__latest_version_doi, domain=None, range=Optional[str]) +slots.versionAccess__latest_version_doi = Slot(uri=DCTERMS.hasVersion, name="versionAccess__latest_version_doi", curie=DCTERMS.curie('hasVersion'), + model_uri=DATA_SHEETS_SCHEMA.versionAccess__latest_version_doi, domain=None, range=Optional[Union[str, URIorCURIE]]) 
slots.versionAccess__versions_available = Slot(uri=D4D.versionsAvailable, name="versionAccess__versions_available", curie=D4D.curie('versionsAvailable'), model_uri=DATA_SHEETS_SCHEMA.versionAccess__versions_available, domain=None, range=Optional[Union[str, list[str]]]) @@ -4401,13 +4643,13 @@ class slots: slots.versionAccess__version_details = Slot(uri=DCTERMS.description, name="versionAccess__version_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.versionAccess__version_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.extensionMechanism__contribution_url = Slot(uri=DCAT.landingPage, name="extensionMechanism__contribution_url", curie=DCAT.curie('landingPage'), +slots.extensionMechanism__contribution_url = Slot(uri=D4D.contributionURL, name="extensionMechanism__contribution_url", curie=D4D.curie('contributionURL'), model_uri=DATA_SHEETS_SCHEMA.extensionMechanism__contribution_url, domain=None, range=Optional[Union[str, URI]]) slots.extensionMechanism__extension_details = Slot(uri=DCTERMS.description, name="extensionMechanism__extension_details", curie=DCTERMS.curie('description'), model_uri=DATA_SHEETS_SCHEMA.extensionMechanism__extension_details, domain=None, range=Optional[Union[str, list[str]]]) -slots.ethicalReview__contact_person = Slot(uri=SCHEMA.contactPoint, name="ethicalReview__contact_person", curie=SCHEMA.curie('contactPoint'), +slots.ethicalReview__contact_person = Slot(uri=D4D.ethicsContactPoint, name="ethicalReview__contact_person", curie=D4D.curie('ethicsContactPoint'), model_uri=DATA_SHEETS_SCHEMA.ethicalReview__contact_person, domain=None, range=Optional[Union[str, PersonId]]) slots.ethicalReview__reviewing_organization = Slot(uri=SCHEMA.provider, name="ethicalReview__reviewing_organization", curie=SCHEMA.curie('provider'), @@ -4494,13 +4736,13 @@ class slots: slots.atRiskPopulations__guardian_consent = Slot(uri=D4D.guardianConsent, name="atRiskPopulations__guardian_consent", 
curie=D4D.curie('guardianConsent'), model_uri=DATA_SHEETS_SCHEMA.atRiskPopulations__guardian_consent, domain=None, range=Optional[Union[str, list[str]]]) -slots.licenseAndUseTerms__license_terms = Slot(uri=DCTERMS.license, name="licenseAndUseTerms__license_terms", curie=DCTERMS.curie('license'), +slots.licenseAndUseTerms__license_terms = Slot(uri=D4D.licenseDescription, name="licenseAndUseTerms__license_terms", curie=D4D.curie('licenseDescription'), model_uri=DATA_SHEETS_SCHEMA.licenseAndUseTerms__license_terms, domain=None, range=Optional[Union[str, list[str]]]) slots.licenseAndUseTerms__data_use_permission = Slot(uri=DUO['0000001'], name="licenseAndUseTerms__data_use_permission", curie=DUO.curie('0000001'), model_uri=DATA_SHEETS_SCHEMA.licenseAndUseTerms__data_use_permission, domain=None, range=Optional[Union[Union[str, "DataUsePermissionEnum"], list[Union[str, "DataUsePermissionEnum"]]]]) -slots.licenseAndUseTerms__contact_person = Slot(uri=SCHEMA.contactPoint, name="licenseAndUseTerms__contact_person", curie=SCHEMA.curie('contactPoint'), +slots.licenseAndUseTerms__contact_person = Slot(uri=D4D.licenseContactPoint, name="licenseAndUseTerms__contact_person", curie=D4D.curie('licenseContactPoint'), model_uri=DATA_SHEETS_SCHEMA.licenseAndUseTerms__contact_person, domain=None, range=Optional[Union[str, PersonId]]) slots.iPRestrictions__restrictions = Slot(uri=DCTERMS.rights, name="iPRestrictions__restrictions", curie=DCTERMS.curie('rights'), @@ -4518,10 +4760,10 @@ class slots: slots.exportControlRegulatoryRestrictions__confidentiality_level = Slot(uri=D4D.confidentialityLevel, name="exportControlRegulatoryRestrictions__confidentiality_level", curie=D4D.curie('confidentialityLevel'), model_uri=DATA_SHEETS_SCHEMA.exportControlRegulatoryRestrictions__confidentiality_level, domain=None, range=Optional[Union[str, "ConfidentialityLevelEnum"]]) -slots.exportControlRegulatoryRestrictions__governance_committee_contact = Slot(uri=SCHEMA.contactPoint, 
name="exportControlRegulatoryRestrictions__governance_committee_contact", curie=SCHEMA.curie('contactPoint'), +slots.exportControlRegulatoryRestrictions__governance_committee_contact = Slot(uri=D4D.governanceContactPoint, name="exportControlRegulatoryRestrictions__governance_committee_contact", curie=D4D.curie('governanceContactPoint'), model_uri=DATA_SHEETS_SCHEMA.exportControlRegulatoryRestrictions__governance_committee_contact, domain=None, range=Optional[Union[str, PersonId]]) -slots.variableMetadata__variable_name = Slot(uri=SCHEMA.name, name="variableMetadata__variable_name", curie=SCHEMA.curie('name'), +slots.variableMetadata__variable_name = Slot(uri=D4D.variableName, name="variableMetadata__variable_name", curie=D4D.curie('variableName'), model_uri=DATA_SHEETS_SCHEMA.variableMetadata__variable_name, domain=None, range=str) slots.variableMetadata__data_type = Slot(uri=SCHEMA.DataType, name="variableMetadata__data_type", curie=SCHEMA.curie('DataType'), @@ -4545,7 +4787,7 @@ class slots: slots.variableMetadata__examples = Slot(uri=SKOS.example, name="variableMetadata__examples", curie=SKOS.curie('example'), model_uri=DATA_SHEETS_SCHEMA.variableMetadata__examples, domain=None, range=Optional[Union[str, list[str]]]) -slots.variableMetadata__is_identifier = Slot(uri=SCHEMA.identifier, name="variableMetadata__is_identifier", curie=SCHEMA.curie('identifier'), +slots.variableMetadata__is_identifier = Slot(uri=D4D.isIdentifier, name="variableMetadata__is_identifier", curie=D4D.curie('isIdentifier'), model_uri=DATA_SHEETS_SCHEMA.variableMetadata__is_identifier, domain=None, range=Optional[Union[bool, Bool]]) slots.variableMetadata__is_sensitive = Slot(uri=D4D.isSensitive, name="variableMetadata__is_sensitive", curie=D4D.curie('isSensitive'), @@ -4560,7 +4802,7 @@ class slots: slots.variableMetadata__derivation = Slot(uri=DCTERMS.provenance, name="variableMetadata__derivation", curie=DCTERMS.curie('provenance'), 
                    model_uri=DATA_SHEETS_SCHEMA.variableMetadata__derivation, domain=None, range=Optional[str])

-slots.variableMetadata__quality_notes = Slot(uri=DCTERMS.description, name="variableMetadata__quality_notes", curie=DCTERMS.curie('description'),
+slots.variableMetadata__quality_notes = Slot(uri=D4D.qualityNotes, name="variableMetadata__quality_notes", curie=D4D.curie('qualityNotes'),
                    model_uri=DATA_SHEETS_SCHEMA.variableMetadata__quality_notes, domain=None, range=Optional[Union[str, list[str]]])

 slots.file__file_type = Slot(uri=D4D.fileType, name="file__file_type", curie=D4D.curie('fileType'),
diff --git a/src/data_sheets_schema/schema/D4D_Base_import.yaml b/src/data_sheets_schema/schema/D4D_Base_import.yaml
index a846a0a9..96fbbc52 100644
--- a/src/data_sheets_schema/schema/D4D_Base_import.yaml
+++ b/src/data_sheets_schema/schema/D4D_Base_import.yaml
@@ -96,6 +96,8 @@ classes:
         slot_uri: schema:identifier
         range: uriorcurie
         description: A unique identifier for a thing.
+        annotations:
+          "d4d:docExample": "https://example.org/dataset/my-dataset-001"
       name:
         slot_uri: schema:name
         description: A human-readable name for a thing.
@@ -121,6 +123,8 @@ classes:
         slot_uri: schema:identifier
         range: uriorcurie
         description: An optional identifier for this property.
+        annotations:
+          "d4d:docExample": "https://example.org/dataset/property-001"
       name:
         slot_uri: schema:name
         description: A human-readable name for this property.
@@ -144,13 +148,17 @@ classes:
       - schema:SoftwareApplication
     attributes:
       version:
+        description: The version identifier of the software (e.g., "1.0.0", "2.3.1-beta").
         range: string
         slot_uri: schema:softwareVersion
       license:
+        description: >-
+          The license under which the software is distributed (e.g., "MIT", "Apache-2.0", "GPL-3.0").
         range: string
         slot_uri: schema:license
       url:
-        range: string
+        description: URL where the software can be found (e.g., homepage, repository, or documentation).
+        range: uri
         slot_uri: schema:url

   Person:
@@ -184,8 +192,10 @@ classes:
           Use this for stable cross-dataset identification.
         range: string
         pattern: "^\\d{4}-\\d{4}-\\d{4}-\\d{3}[0-9X]$"
-        slot_uri: schema:identifier
-        exact_mappings:
+        slot_uri: d4d:orcidIdentifier
+        annotations:
+          "d4d:docExample": "0000-0001-2345-6789"
+        broad_mappings:
           - schema:identifier

   Information:
@@ -217,13 +227,24 @@ classes:

   # From linkml Datasets schema
   FormatDialect:
-    description: Additional format information for a file
+    description: Additional format information for a file.
     attributes:
       comment_prefix:
+        description: Character(s) used to indicate comment lines (e.g., "#" for CSV comments).
       delimiter:
+        description: Field delimiter character (e.g., "," for CSV, "\t" for TSV).
       double_quote:
+        description: >-
+          Whether quotes within quoted fields are escaped by doubling them.
+          Expected values: "true" or "false" (as strings per CSV dialect specification).
+          Follows the W3C CSV-on-the-Web dialect specification.
       header:
+        description: >-
+          Whether the first row of the file contains column headers.
+          Expected values: "true" or "false" (as strings per CSV dialect specification).
+          Follows the W3C CSV-on-the-Web dialect specification.
       quote_char:
+        description: Character used for quoting fields (e.g., '"' for CSV).

 ## SHARED SLOTS ##
 slots:
@@ -233,24 +254,29 @@ slots:
   # https://github.com/linkml/linkml-model/blob/main/linkml_model/model/schema/datasets.yaml

   title:
-    description: the official title of the element
+    description: The official title of the element.
     slot_uri: dcterms:title

   language:
-    description: language in which the information is expressed
+    description: Language in which the information is expressed.
     slot_uri: dcterms:language
     exact_mappings:
       - schema:inLanguage

   publisher:
+    description: The organization or entity responsible for making the resource available.
     slot_uri: dcterms:publisher
     range: uriorcurie
+    annotations:
+      "d4d:docExample": "ror:04t3en479 # use a ROR ID, DOI, or URL — not a plain name"

   issued:
+    description: Date of formal issuance or publication of the resource.
     slot_uri: dcterms:issued
     range: datetime

   page:
+    description: A landing page or web page providing access to or information about the resource.
     slot_uri: dcat:landingPage

   dialect:
@@ -263,6 +289,7 @@ slots:
     slot_uri: dcat:byteSize

   path:
+    description: The file path or URL where the content is located.
     slot_uri: schema:contentUrl

   download_url:
@@ -283,13 +310,12 @@ slots:
     slot_uri: dcterms:format

   encoding:
-    description: the character encoding of the data
+    description: The character encoding of the data.
     range: EncodingEnum
-    slot_uri: dcat:mediaType
+    slot_uri: d4d:characterEncoding

   compression:
-    description: >-
-      compression format used, if any. e.g., gzip, bzip2, zip
+    description: Compression format used, if any (e.g., gzip, bzip2, zip).
     range: CompressionEnum
     slot_uri: dcat:compressFormat

@@ -302,62 +328,90 @@ slots:
       - schema:encodingFormat

   hash:
-    description: hash of the data
-    slot_uri: dcterms:identifier
+    description: >-
+      Cryptographic hash value of the data for integrity verification
+      (e.g., SHA-256: 'e3b0c44298fc1c149afb...', MD5: 'd41d8cd98f00b204e9800998ecf8427e').
+    slot_uri: d4d:hashValue
+    broad_mappings:
+      - dcterms:identifier

   md5:
-    description: md5 hash of the data
-    slot_uri: dcterms:identifier
+    description: MD5 hash value of the data (128-bit cryptographic hash).
+    slot_uri: d4d:md5Checksum
+    broad_mappings:
+      - dcterms:identifier

   sha256:
-    description: sha256 hash of the data
-    slot_uri: dcterms:identifier
+    description: SHA-256 hash value of the data (256-bit cryptographic hash, recommended).
+    slot_uri: schema:sha256

   conforms_to:
+    description: An established standard, specification, or schema to which the resource conforms.
     slot_uri: dcterms:conformsTo

   conforms_to_schema:
-    slot_uri: dcterms:conformsTo
+    description: The schema or data model to which the resource conforms.
+    slot_uri: d4d:conformsToSchema
+    broad_mappings:
+      - dcterms:conformsTo

   conforms_to_class:
-    slot_uri: dcterms:conformsTo
+    description: The specific class or type within a schema to which the resource conforms.
+    slot_uri: d4d:conformsToClass
+    broad_mappings:
+      - dcterms:conformsTo

   license:
+    description: The legal license under which the resource is made available (e.g., "MIT", "CC-BY-4.0").
     slot_uri: dcterms:license

   keywords:
+    description: Keywords or tags describing the resource for discovery and classification.
     multivalued: true
     slot_uri: dcat:keyword

   version:
-    slot_uri: dcterms:hasVersion
+    description: The version identifier of the resource (e.g., "1.0", "2.3.1").
+    slot_uri: schema:version

   created_by:
+    description: The person or organization primarily responsible for creating the resource.
     slot_uri: dcterms:creator

   created_on:
+    description: The date and time when the resource was created.
     slot_uri: dcterms:created
     range: datetime

   last_updated_on:
+    description: The date and time when the resource was most recently modified or updated.
     slot_uri: dcterms:modified
     range: datetime

   modified_by:
+    description: A person or organization that contributed to modifying or updating the resource.
     slot_uri: dcterms:contributor

   status:
-    slot_uri: dcterms:type
+    description: The status of the resource (e.g., draft, published, deprecated).
+    slot_uri: d4d:publicationStatus

   was_derived_from:
+    description: A resource from which this resource was derived, in whole or in part.
     slot_uri: prov:wasDerivedFrom
     exact_mappings:
       - dcterms:source

   doi:
-    description: digital object identifier
-    slot_uri: dcterms:identifier
+    description: >-
+      Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing persistent identification
+      (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').
+    slot_uri: d4d:doiIdentifier
     pattern: "10\\.\\d{4,}\\/.+"
+    broad_mappings:
+      - dcterms:identifier
+    exact_mappings:
+      - schema:identifier

   external_resources:
     description: >-
@@ -380,94 +434,194 @@ slots:

 enums:
   FormatEnum:
+    description: Common file format extensions for data files and documents.
     permissible_values:
       CSV:
+        description: Comma-Separated Values - tabular data format.
       TSV:
+        description: Tab-Separated Values - tabular data format with tab delimiters.
       XML:
+        description: Extensible Markup Language - structured markup format.
       JSON:
+        description: JavaScript Object Notation - structured data interchange format.
       JSONL:
+        description: JSON Lines - newline-delimited JSON format.
       YAML:
+        description: YAML Ain't Markup Language - human-readable data serialization format.
       HTML:
+        description: HyperText Markup Language - web page markup format.
       PDF:
+        description: Portable Document Format - fixed-layout document format.
       DOCX:
+        description: Microsoft Word Open XML Document - word processing document.
       XLSX:
+        description: Microsoft Excel Open XML Spreadsheet - spreadsheet format.
       PPTX:
+        description: Microsoft PowerPoint Open XML Presentation - presentation format.
       TXT:
+        description: Plain text file.
       MD:
+        description: Markdown - lightweight markup language.
       ZIP:
+        description: ZIP archive - compressed file container.
       TAR:
+        description: Tape Archive - file archive format.
       GZ:
+        description: Gzip compressed file.
       BZ2:
+        description: Bzip2 compressed file.
       XZ:
+        description: XZ compressed file.

   MediaTypeEnum:
+    description: MIME media types (Internet Media Types) for file content identification.
     permissible_values:
       text/csv:
+        description: MIME type for CSV (Comma-Separated Values) files.
       text/tab-separated-values:
+        description: MIME type for TSV (Tab-Separated Values) files.
       application/json:
+        description: MIME type for JSON (JavaScript Object Notation) files.
       application/xml:
+        description: MIME type for XML (Extensible Markup Language) files.
       text/xml:
+        description: Alternative MIME type for XML files (text variant).
       application/yaml:
+        description: MIME type for YAML files.
       text/yaml:
+        description: Alternative MIME type for YAML files (text variant).
       text/html:
+        description: MIME type for HTML (HyperText Markup Language) files.
       application/pdf:
+        description: MIME type for PDF (Portable Document Format) files.
       application/vnd.openxmlformats-officedocument.wordprocessingml.document:
+        description: MIME type for Microsoft Word DOCX files.
       application/vnd.openxmlformats-officedocument.spreadsheetml.sheet:
+        description: MIME type for Microsoft Excel XLSX files.
       application/vnd.openxmlformats-officedocument.presentationml.presentation:
+        description: MIME type for Microsoft PowerPoint PPTX files.
       text/plain:
+        description: MIME type for plain text files.
       text/markdown:
+        description: MIME type for Markdown files.
       application/zip:
+        description: MIME type for ZIP archive files.
       application/x-tar:
+        description: MIME type for TAR archive files.
       application/gzip:
+        description: MIME type for Gzip compressed files.
       application/x-bzip2:
+        description: MIME type for Bzip2 compressed files.
       application/x-xz:
+        description: MIME type for XZ compressed files.

   CompressionEnum:
+    description: Compression algorithms and formats for file compression.
     permissible_values:
       gzip:
+        description: GNU zip compression (commonly used with .gz extension).
       bzip2:
+        description: Burrows-Wheeler block-sorting compression (commonly used with .bz2 extension).
       zip:
+        description: ZIP archive compression format.
       tar:
+        description: Tape Archive format (typically combined with gzip or bzip2).
       xz:
+        description: XZ Utils compression using LZMA2 algorithm.
       lzma:
+        description: Lempel-Ziv-Markov chain algorithm compression.
       compress:
+        description: Unix compress utility (LZW compression).

   EncodingEnum:
+    description: Character encoding schemes for text representation in different languages and scripts.
permissible_values: ASCII: + description: American Standard Code for Information Interchange (7-bit, English characters only). Big5: + description: Traditional Chinese character encoding (primarily Taiwan and Hong Kong). EUC-JP: + description: Extended Unix Code for Japanese. EUC-KR: + description: Extended Unix Code for Korean. EUC-TW: + description: Extended Unix Code for Traditional Chinese. GB2312: + description: Simplified Chinese character encoding standard. HZ-GB-2312: + description: 7-bit encoding for Simplified Chinese (GB2312). ISO-2022-CN-EXT: + description: Extended ISO-2022 encoding for Chinese (includes both Simplified and Traditional). ISO-2022-CN: + description: ISO-2022 encoding for Chinese. ISO-2022-JP-2: + description: Extended ISO-2022 encoding for Japanese (includes additional character sets). ISO-2022-JP: + description: ISO-2022 encoding for Japanese. ISO-2022-KR: + description: ISO-2022 encoding for Korean. ISO-8859-10: + description: Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic). ISO-8859-11: + description: Latin/Thai encoding. ISO-8859-13: + description: Latin-7 (Baltic Rim languages). ISO-8859-14: + description: Latin-8 (Celtic languages). ISO-8859-15: + description: Latin-9 (Western European with Euro sign). ISO-8859-16: + description: Latin-10 (South-Eastern European languages). ISO-8859-1: + description: Latin-1 (Western European languages). ISO-8859-2: + description: Latin-2 (Central European languages). ISO-8859-3: + description: Latin-3 (South European languages - Turkish, Maltese, Esperanto). ISO-8859-4: + description: Latin-4 (North European languages). ISO-8859-5: + description: Latin/Cyrillic encoding. ISO-8859-6: + description: Latin/Arabic encoding. ISO-8859-7: + description: Latin/Greek encoding. ISO-8859-8: + description: Latin/Hebrew encoding. ISO-8859-9: + description: Latin-5 (Turkish). KOI8-R: + description: Russian character encoding (Kod Obmena Informatsiey). 
KOI8-U: + description: Ukrainian character encoding. Shift_JIS: + description: Japanese character encoding (Microsoft and other systems). UTF-16: + description: Unicode Transformation Format 16-bit (variable-width encoding). UTF-32: + description: Unicode Transformation Format 32-bit (fixed-width encoding). UTF-7: + description: Unicode Transformation Format 7-bit (for 7-bit channels). UTF-8: + description: Unicode Transformation Format 8-bit (variable-width, most common Unicode encoding). + Windows-1250: + description: Windows code page for Central European languages. + Windows-1251: + description: Windows code page for Cyrillic script. + Windows-1252: + description: Windows code page for Western European languages. + Windows-1253: + description: Windows code page for Greek. + Windows-1254: + description: Windows code page for Turkish. + Windows-1255: + description: Windows code page for Hebrew. + Windows-1256: + description: Windows code page for Arabic. + Windows-1257: + description: Windows code page for Baltic languages. + Windows-1258: + description: Windows code page for Vietnamese. CRediTRoleEnum: description: >- @@ -475,33 +629,33 @@ enums: See https://credit.niso.org/ permissible_values: conceptualization: - description: Ideas; formulation or evolution of overarching research goals and aims + description: Ideas; formulation or evolution of overarching research goals and aims. methodology: - description: Development or design of methodology; creation of models + description: Development or design of methodology; creation of models. software: - description: Programming, software development; designing computer programs + description: Programming, software development; designing computer programs. validation: - description: Verification of the overall replication/reproducibility of results + description: Verification of the overall replication/reproducibility of results. 
formal_analysis: - description: Application of statistical, mathematical, or other formal techniques + description: Application of statistical, mathematical, or other formal techniques. investigation: - description: Conducting the research and investigation process + description: Conducting the research and investigation process. resources: description: Provision of study materials, reagents, patients, laboratory samples, etc. data_curation: - description: Management activities to annotate, scrub data and maintain research data + description: Management activities to annotate, scrub data and maintain research data. writing_original_draft: - description: Preparation, creation and/or presentation of the published work + description: Preparation, creation and/or presentation of the published work. writing_review_editing: - description: Critical review, commentary or revision of the work + description: Critical review, commentary or revision of the work. visualization: - description: Preparation, creation and/or presentation of visualizations/data presentation + description: Preparation, creation and/or presentation of visualizations/data presentation. supervision: - description: Oversight and leadership responsibility for the research activity + description: Oversight and leadership responsibility for the research activity. project_administration: - description: Management and coordination responsibility for the research activity + description: Management and coordination responsibility for the research activity. funding_acquisition: - description: Acquisition of the financial support for the project + description: Acquisition of the financial support for the project. BiasTypeEnum: description: >- @@ -603,20 +757,11 @@ enums: See https://semver.org/ permissible_values: MAJOR: - description: Incompatible changes, breaking backward compatibility + description: Incompatible changes, breaking backward compatibility. 
MINOR: - description: Backward-compatible new functionality or enhancements + description: Backward-compatible new functionality or enhancements. PATCH: - description: Backward-compatible bug fixes or minor corrections - Windows-1250: - Windows-1251: - Windows-1252: - Windows-1253: - Windows-1254: - Windows-1255: - Windows-1256: - Windows-1257: - Windows-1258: + description: Backward-compatible bug fixes or minor corrections. CreatorOrMaintainerEnum: description: >- @@ -668,10 +813,14 @@ enums: description: Other type of creator or maintainer not listed. Boolean: + description: Three-valued boolean logic supporting true, false, and unknown states. permissible_values: "true": title: "True" + description: Affirmative or positive value. "false": title: "False" + description: Negative or false value. "unknown": - title: "Unknown" \ No newline at end of file + title: "Unknown" + description: Unknown, uncertain, or not applicable value. diff --git a/src/data_sheets_schema/schema/D4D_Collection.yaml b/src/data_sheets_schema/schema/D4D_Collection.yaml index b94e43a2..d2cf6bbb 100644 --- a/src/data_sheets_schema/schema/D4D_Collection.yaml +++ b/src/data_sheets_schema/schema/D4D_Collection.yaml @@ -38,24 +38,33 @@ classes: (e.g., directly observed, reported by subjects, inferred). attributes: was_directly_observed: - description: Whether the data was directly observed + description: >- + True if the data was directly observed by a researcher or instrument; + false if it was obtained through other means (e.g., reported, inferred). slot_uri: d4d:wasDirectlyObserved range: boolean was_reported_by_subjects: - description: Whether the data was reported directly by the subjects themselves + description: >- + True if the data was self-reported directly by the subjects themselves + (e.g., survey responses, questionnaires); false otherwise. 
slot_uri: d4d:wasReportedBySubjects range: boolean was_inferred_derived: - description: Whether the data was inferred or derived from other data + description: >- + True if the data was computationally inferred or derived from other data + (e.g., model outputs, imputed values); false otherwise. slot_uri: d4d:wasInferred range: boolean was_validated_verified: - description: Whether the data was validated or verified in any way + description: >- + True if the data underwent a validation or verification process + (e.g., expert review, cross-checking with ground truth); false otherwise. slot_uri: d4d:wasValidated range: boolean acquisition_details: description: > - Details on how data was acquired for each instance. + Free-text description of how data was acquired for each instance, including + instruments, protocols, and any manual steps involved. range: string multivalued: true slot_uri: dcterms:description @@ -71,7 +80,9 @@ classes: attributes: mechanism_details: description: > - Details on mechanisms or procedures used to collect the data. + Free-text description of the specific mechanisms or procedures used to collect + the data (e.g., hardware model, software API, manual curation process), + including how those mechanisms were validated. range: string multivalued: true slot_uri: dcterms:description @@ -84,12 +95,14 @@ classes: and how they were compensated. attributes: role: - description: Role of the data collector (e.g., researcher, crowdworker) + description: Role of the data collector (e.g., researcher, crowdworker). slot_uri: schema:roleName range: string collector_details: description: > - Details on who collected the data and their compensation. + Free-text description of who was involved in data collection (e.g., students, + crowdworkers, contractors), their training or qualifications, and how they + were compensated. 
range: string multivalued: true slot_uri: dcterms:description @@ -104,16 +117,18 @@ classes: - rai:dataCollectionTimeframe attributes: start_date: - description: Start date of data collection + description: Start date of data collection. slot_uri: schema:startDate range: date end_date: - description: End date of data collection + description: End date of data collection. slot_uri: schema:endDate range: date timeframe_details: description: > - Details on the collection timeframe and relationship to data creation dates. + Free-text description of the data collection period and whether this timeframe + matches the creation timeframe of the underlying data (e.g., historical records, + prospective collection). range: string multivalued: true slot_uri: dcterms:description @@ -126,12 +141,13 @@ classes: or obtained via third parties/other sources. attributes: is_direct: - description: Whether collection was direct from individuals + description: Whether collection was direct from individuals. slot_uri: d4d:isDirect range: boolean collection_details: description: > - Details on direct vs. indirect collection methods and sources. + Free-text description of whether data was collected directly from individuals + or obtained via third parties or other indirect sources, and what those sources are. range: string multivalued: true slot_uri: dcterms:description @@ -161,8 +177,8 @@ classes: multivalued: true handling_strategy: description: > - Strategy used to handle missing data (e.g., deletion, imputation, - flagging, multiple imputation). + The primary strategy used to handle missing data (e.g., listwise deletion, + mean imputation, multiple imputation, flagging with sentinel values). slot_uri: d4d:handlingStrategy range: string @@ -184,10 +200,12 @@ classes: required: true source_type: description: > - Type of raw source (sensor, database, user input, web scraping, etc.). 
- slot_uri: dcterms:type + One or more types of raw source (e.g., sensor, database, user input, web scraping). + slot_uri: d4d:sourceType range: string multivalued: true + broad_mappings: + - dcterms:type access_details: description: > Information on how to access or retrieve the raw source data. @@ -195,7 +213,7 @@ classes: range: string raw_data_format: description: > - Format of the raw data before any preprocessing. + One or more formats of the raw data before any preprocessing (e.g., CSV, DICOM, JSON). slot_uri: d4d:rawDataFormat range: string multivalued: true diff --git a/src/data_sheets_schema/schema/D4D_Composition.yaml b/src/data_sheets_schema/schema/D4D_Composition.yaml index c3f21405..f89f9ec1 100644 --- a/src/data_sheets_schema/schema/D4D_Composition.yaml +++ b/src/data_sheets_schema/schema/D4D_Composition.yaml @@ -56,21 +56,27 @@ classes: slot_uri: dcat:theme instance_type: description: > - Multiple types of instances? (e.g., movies, users, and ratings). + The type or types of instances in the dataset (e.g., "movie", "user", + "rating", "clinical record"). Use when the dataset contains multiple + instance types with different structures. range: string - slot_uri: dcterms:type + slot_uri: d4d:instanceType + broad_mappings: + - dcterms:type data_substrate: description: > Type of data (e.g., raw text, images) from Bridge2AI standards. range: uriorcurie values_from: - B2AI_SUBSTRATE - slot_uri: dcterms:format + slot_uri: dcterms:type counts: description: > How many instances are there in total (of each type, if appropriate)? range: integer slot_uri: schema:numberOfItems + annotations: + "d4d:docExample": "42000 (42,000 patient records)" label: description: > Is there a label or target associated with each instance? @@ -80,7 +86,9 @@ classes: description: > If labeled, what pattern or format do labels follow? 
range: string - slot_uri: schema:description + slot_uri: d4d:labelPattern + broad_mappings: + - schema:description sampling_strategies: description: > References to one or more SamplingStrategy objects. @@ -106,15 +114,13 @@ classes: description: "Indicates whether it is a sample of a larger set." slot_uri: d4d:isSample range: boolean - multivalued: true is_random: description: "Indicates whether the sample is random." slot_uri: d4d:isRandom range: boolean - multivalued: true source_data: description: > - Description of the larger set from which the sample was drawn, if any. + One or more descriptions of the larger sets from which the sample was drawn, if applicable. slot_uri: d4d:sourceData range: string multivalued: true @@ -123,22 +129,24 @@ classes: Indicates whether the sample is representative of the larger set. slot_uri: d4d:isRepresentative range: boolean - multivalued: true representative_verification: description: > - Explanation of how representativeness was validated or verified. - slot_uri: schema:description + One or more explanations of how representativeness was validated or verified (e.g., statistical tests, domain expert review). + slot_uri: d4d:verificationDescription range: string multivalued: true + broad_mappings: + - schema:description why_not_representative: description: > - Explanation of why the sample is not representative, if applicable. + One or more explanations of why the sample is not representative of the larger set, if applicable. slot_uri: d4d:whyNotRepresentative range: string multivalued: true strategies: description: > - Description of the sampling strategy (deterministic, probabilistic, etc.). + One or more sampling strategies used (e.g., deterministic, simple random, + stratified, cluster, systematic). slot_uri: d4d:strategies range: string multivalued: true @@ -155,13 +163,17 @@ classes: Description of the missing data fields or elements. 
range: string multivalued: true - slot_uri: dcterms:description + slot_uri: d4d:missingDataDescription + broad_mappings: + - dcterms:description why_missing: description: > Explanation of why each piece of data is missing. range: string multivalued: true - slot_uri: dcterms:description + slot_uri: d4d:missingDataCause + broad_mappings: + - dcterms:description Relationships: @@ -172,7 +184,9 @@ classes: attributes: relationship_details: description: > - Details on relationships between instances (e.g., graph edges, ratings). + Free-text description of how relationships between instances are represented + (e.g., graph edges, ratings matrices, foreign keys), including relationship types + and any associated metadata. range: string multivalued: true slot_uri: dcterms:description @@ -186,7 +200,8 @@ classes: attributes: split_details: description: > - Details on recommended data splits and their rationale. + Free-text description of the recommended data splits (e.g., 80/10/10 train/ + validation/test), how they are defined, and the rationale for the split strategy. range: string multivalued: true slot_uri: dcterms:description @@ -199,10 +214,13 @@ classes: attributes: anomaly_details: description: > - Details on errors, noise sources, or redundancies in the dataset. + Free-text description of errors, noise sources, or redundancies in the dataset, + including their known causes and estimated prevalence. range: string multivalued: true - slot_uri: dcterms:description + slot_uri: d4d:anomalyDetails + broad_mappings: + - dcterms:description DatasetBias: @@ -233,7 +251,8 @@ classes: range: string affected_subsets: description: > - Specific subsets or features of the dataset affected by this bias. + One or more specific subsets or features of the dataset affected by this bias + (e.g., "female participants", "non-English text", "images taken at night"). slot_uri: d4d:affectedSubsets range: string multivalued: true @@ -289,20 +308,24 @@ classes: available and stable over time. 
range: string multivalued: true - slot_uri: dcterms:description + slot_uri: d4d:availabilityGuarantee + broad_mappings: + - dcterms:description archival: description: > - Indication whether official archival versions of external resources - are included. - slot_uri: schema:archivedAt + Indicates whether official archival versions of external resources + are included in the dataset. + slot_uri: d4d:hasArchivalVersion range: boolean - multivalued: true restrictions: description: > - Description of any restrictions or fees associated with external resources. + One or more descriptions of restrictions or fees associated with accessing + these external resources (e.g., paywalls, registration requirements, API limits). range: string multivalued: true - slot_uri: dcterms:accessRights + slot_uri: d4d:externalResourceRestrictions + broad_mappings: + - dcterms:accessRights Confidentiality: @@ -317,7 +340,9 @@ classes: slot_uri: d4d:confidential_elements_present confidentiality_details: description: > - Details on confidential data elements and handling procedures. + Free-text description of which data elements are confidential, the basis for + confidentiality (e.g., legal privilege, patient data), and how they are handled + or restricted. range: string multivalued: true slot_uri: dcterms:description @@ -334,6 +359,10 @@ classes: description: "Indicates whether any content warnings are needed." slot_uri: d4d:content_warnings_present warnings: + description: >- + One or more specific content warnings describing potentially offensive, insulting, + threatening, or anxiety-provoking content present in the dataset + (e.g., violence, profanity, explicit imagery, hate speech). range: string multivalued: true slot_uri: dcterms:description @@ -350,13 +379,23 @@ classes: range: boolean description: "Indicates whether any subpopulations are explicitly identified." 
identification: + description: >- + How subpopulations are identified and defined (e.g., by age groups, gender, + geographic region, disease status, or other demographic/clinical characteristics). range: string multivalued: true - slot_uri: dcterms:description + slot_uri: d4d:subpopulationIdentification + broad_mappings: + - dcterms:description distribution: + description: >- + The distribution of instances across identified subpopulations, including counts, + percentages, or proportions for each subgroup. range: string multivalued: true - slot_uri: dcterms:description + slot_uri: d4d:subpopulationDistribution + broad_mappings: + - dcterms:description Deidentification: @@ -375,8 +414,10 @@ classes: identifiers_removed: range: string multivalued: true - description: "List of identifier types removed during de-identification." - slot_uri: schema:identifier + description: >- + List of identifier types removed during de-identification + (e.g., 'name', 'date of birth', 'SSN', 'email address', 'geographic subdivision'). + slot_uri: d4d:removedIdentifierTypes deidentification_details: description: > Details on de-identification procedures and residual risks. @@ -416,7 +457,7 @@ classes: description: >- The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object. - slot_uri: schema:identifier + slot_uri: dcterms:relation range: string required: true relationship_type: diff --git a/src/data_sheets_schema/schema/D4D_Data_Governance.yaml b/src/data_sheets_schema/schema/D4D_Data_Governance.yaml index 9a1b57d3..efa0afaa 100644 --- a/src/data_sheets_schema/schema/D4D_Data_Governance.yaml +++ b/src/data_sheets_schema/schema/D4D_Data_Governance.yaml @@ -47,18 +47,22 @@ classes: is_a: DatasetProperty attributes: license_terms: - description: > - Description of the dataset's license and terms of use (including - links, costs, or usage constraints). 
+ description: >- + Description of the dataset's license and terms of use, including links, costs, + or usage constraints (e.g., 'CC BY 4.0', 'Apache 2.0', 'MIT', 'CC BY-NC-SA 4.0', + 'proprietary - contact data@example.org for access'). range: string multivalued: true - slot_uri: dcterms:license + slot_uri: d4d:licenseDescription + broad_mappings: + - dcterms:license + - dcterms:rights data_use_permission: description: >- Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, - ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO + ethics approval required, collaboration required). See https://github.com/EBISPOT/DUO. range: DataUsePermissionEnum multivalued: true slot_uri: DUO:0000001 @@ -71,8 +75,8 @@ classes: This person can answer questions about licensing terms, usage restrictions, fees, and permissions. range: Person - slot_uri: schema:contactPoint - exact_mappings: + slot_uri: d4d:licenseContactPoint + broad_mappings: - schema:contactPoint @@ -85,7 +89,7 @@ classes: is_a: DatasetProperty attributes: restrictions: - description: "Explanation of third-party IP restrictions." + description: "One or more explanations of third-party IP restrictions or associated fees." range: string multivalued: true slot_uri: dcterms:rights @@ -104,7 +108,7 @@ classes: is_a: DatasetProperty attributes: regulatory_restrictions: - description: "Export or regulatory restrictions on the dataset." + description: "One or more export controls or regulatory restrictions applicable to the dataset (e.g., HIPAA, ITAR, GDPR)." range: string multivalued: true slot_uri: dcterms:accessRights @@ -137,8 +141,8 @@ classes: questions about data governance policies, access procedures, and oversight mechanisms. 
range: Person - slot_uri: schema:contactPoint - exact_mappings: + slot_uri: d4d:governanceContactPoint + broad_mappings: - schema:contactPoint @@ -211,69 +215,69 @@ enums: permissible_values: # Permissions no_restriction: - description: No restriction on data use + description: No restriction on data use. meaning: DUO:0000004 general_research_use: - description: Data available for any research purpose (GRU) + description: Data available for any research purpose (GRU). meaning: DUO:0000042 health_medical_biomedical_research: - description: Data limited to health, medical, or biomedical research (HMB) + description: Data limited to health, medical, or biomedical research (HMB). meaning: DUO:0000006 disease_specific_research: - description: Data limited to research on specified disease(s) (DS) + description: Data limited to research on specified disease(s) (DS). meaning: DUO:0000007 population_origins_ancestry_research: - description: Data limited to population origins or ancestry research (POA) + description: Data limited to population origins or ancestry research (POA). meaning: DUO:0000011 clinical_care_use: - description: Data available for clinical care and applications (CC) + description: Data available for clinical care and applications (CC). meaning: DUO:0000043 # Modifiers/Restrictions no_commercial_use: - description: Data use limited to non-commercial purposes (NCU) + description: Data use limited to non-commercial purposes (NCU). meaning: DUO:0000046 non_profit_use_only: - description: Data use limited to not-for-profit organizations (NPU) + description: Data use limited to not-for-profit organizations (NPU). meaning: DUO:0000045 non_profit_use_and_non_commercial_use: - description: Data limited to not-for-profit organizations and non-commercial use (NPUNCU) + description: Data limited to not-for-profit organizations and non-commercial use (NPUNCU). 
meaning: DUO:0000018 no_methods_development: - description: Data cannot be used for methods or software development (NMDS) + description: Data cannot be used for methods or software development (NMDS). meaning: DUO:0000015 genetic_studies_only: - description: Data limited to genetic studies only (GSO) + description: Data limited to genetic studies only (GSO). meaning: DUO:0000016 ethics_approval_required: - description: Ethics approval (e.g., IRB/ERB) required for data use (IRB) + description: Ethics approval (e.g., IRB/ERB) required for data use (IRB). meaning: DUO:0000021 collaboration_required: - description: Collaboration with primary investigator required (COL) + description: Collaboration with primary investigator required (COL). meaning: DUO:0000020 publication_required: - description: Results must be published/shared with research community (PUB) + description: Results must be published/shared with research community (PUB). meaning: DUO:0000019 geographic_restriction: - description: Data use limited to specific geographic region (GS) + description: Data use limited to specific geographic region (GS). meaning: DUO:0000022 institution_specific: - description: Data use limited to approved institutions (IS) + description: Data use limited to approved institutions (IS). meaning: DUO:0000028 project_specific: - description: Data use limited to approved project(s) (PS) + description: Data use limited to approved project(s) (PS). meaning: DUO:0000027 user_specific: - description: Data use limited to approved users (US) + description: Data use limited to approved users (US). meaning: DUO:0000026 time_limit: - description: Data use approved for limited time period (TS) + description: Data use approved for limited time period (TS). meaning: DUO:0000025 return_to_database: - description: Derived data must be returned to database/resource (RTN) + description: Derived data must be returned to database/resource (RTN). 
meaning: DUO:0000029 publication_moratorium: - description: Publication restricted until specified date (MOR) + description: Publication restricted until specified date (MOR). meaning: DUO:0000024 no_population_ancestry_research: - description: Population/ancestry research prohibited (NPOA) + description: Population/ancestry research prohibited (NPOA). meaning: DUO:0000044 \ No newline at end of file diff --git a/src/data_sheets_schema/schema/D4D_Distribution.yaml b/src/data_sheets_schema/schema/D4D_Distribution.yaml index ecd9ac73..88bf3f6d 100644 --- a/src/data_sheets_schema/schema/D4D_Distribution.yaml +++ b/src/data_sheets_schema/schema/D4D_Distribution.yaml @@ -52,7 +52,7 @@ classes: Boolean indicating whether the dataset is distributed to parties external to the dataset-creating entity. range: boolean - slot_uri: dcterms:accessRights + slot_uri: d4d:isExternallyShared DistributionFormat: @@ -62,10 +62,12 @@ classes: is_a: DatasetProperty attributes: access_urls: - description: "Details of the distribution channel(s) or format(s)." - range: string + description: "One or more URLs providing access to the distribution channel(s) or format(s)." + range: uri multivalued: true slot_uri: dcat:accessURL + annotations: + "d4d:docExample": "https://example.org/dataset/download" DistributionDate: @@ -75,8 +77,9 @@ classes: attributes: release_dates: description: > - Dates or timeframe for dataset release. Could be a one-time release date - or multiple scheduled releases. + One or more dates or timeframes for dataset release, in ISO 8601 format + (e.g., "2024-03-15") or as a descriptive string (e.g., "Q2 2024"). + Use multiple values for staged or scheduled releases. 
range: string multivalued: true slot_uri: dcterms:available diff --git a/src/data_sheets_schema/schema/D4D_Ethics.yaml b/src/data_sheets_schema/schema/D4D_Ethics.yaml index b8efdc6e..4f3d2412 100644 --- a/src/data_sheets_schema/schema/D4D_Ethics.yaml +++ b/src/data_sheets_schema/schema/D4D_Ethics.yaml @@ -50,8 +50,8 @@ classes: Contact person for questions about ethical review. Provides structured contact information including name, email, affiliation, and optional ORCID. range: Person - slot_uri: schema:contactPoint - exact_mappings: + slot_uri: d4d:ethicsContactPoint + broad_mappings: - schema:contactPoint reviewing_organization: description: >- @@ -64,7 +64,8 @@ classes: - schema:provider review_details: description: > - Details on ethical review processes, outcomes, and supporting documentation. + Free-text description of the ethical review process, board decisions, outcomes, + and any supporting documentation (e.g., IRB approval number, ethics committee name). range: string multivalued: true slot_uri: dcterms:description @@ -79,12 +80,16 @@ classes: attributes: impact_details: description: > - Details on data protection impact analysis, outcomes, and documentation. + Free-text description of the data protection impact analysis, including methodology, + privacy risks identified, mitigation measures taken, and any regulatory findings. range: string multivalued: true slot_uri: dcterms:description + # CollectionNotification, CollectionConsent, and ConsentRevocation cover general data + # collection ethics (notification, consent, revocation) applicable to any dataset + # touching individuals — broader than formal human subjects research (D4D_Human.yaml scope). CollectionNotification: description: > Were the individuals in question notified about the data collection? If so, please @@ -94,7 +99,9 @@ classes: attributes: notification_details: description: > - Details on how individuals were notified about data collection. 
+ Free-text description of how individuals were notified about data collection, + including the notification method (e.g., email, poster, in-person), timing, + and the language or text of the notification itself if available. range: string multivalued: true slot_uri: dcterms:description @@ -109,7 +116,8 @@ classes: attributes: consent_details: description: > - Details on how consent was requested, provided, and documented. + Free-text description of how consent was requested (e.g., opt-in form, verbal + agreement), provided, and documented, including the language individuals consented to. range: string multivalued: true slot_uri: dcterms:description @@ -123,7 +131,9 @@ classes: attributes: revocation_details: description: > - Details on consent revocation mechanisms and procedures. + Free-text description of the mechanism provided for individuals to revoke + consent (e.g., opt-out portal, written request), the scope of revocation + (full withdrawal or specific uses), and what happens to their data after revocation. range: string multivalued: true slot_uri: dcterms:description \ No newline at end of file diff --git a/src/data_sheets_schema/schema/D4D_Evaluation_Summary.yaml b/src/data_sheets_schema/schema/D4D_Evaluation_Summary.yaml index 10f0c72b..164abdfd 100644 --- a/src/data_sheets_schema/schema/D4D_Evaluation_Summary.yaml +++ b/src/data_sheets_schema/schema/D4D_Evaluation_Summary.yaml @@ -18,452 +18,456 @@ imports: classes: EvaluationSummary: - description: Complete evaluation summary for a rubric system + description: Complete evaluation summary for a rubric system. attributes: rubric_type: - description: Type of rubric used (rubric10 or rubric20) + description: Type of rubric used (rubric10 or rubric20). 
range: RubricTypeEnum required: true rubric_description: - description: Description of rubric structure and scoring + description: "Free-text description of the rubric structure, scoring criteria, and scale used in this evaluation (e.g., 'Rubric10: 10 elements × 5 sub-elements, binary scoring, max 50 points')." + range: string required: true total_files_evaluated: - description: Total number of D4D files evaluated + description: Total number of D4D files evaluated. range: integer required: true concatenated_file_count: - description: Number of concatenated D4D files + description: Number of concatenated D4D files. range: integer individual_file_count: - description: Number of individual D4D files + description: Number of individual D4D files. range: integer overall_performance: - description: Overall performance statistics + description: Summary statistics of evaluation performance across all D4D files. range: OverallPerformance required: true method_comparison: - description: Performance comparison across generation methods + description: Performance comparison across generation methods. range: MethodPerformance multivalued: true required: true project_comparison: - description: Performance comparison across Bridge2AI projects + description: Performance comparison across Bridge2AI projects. range: ProjectPerformance multivalued: true required: true top_performers: - description: List of top performing D4D files + description: List of top performing D4D files. range: TopPerformer multivalued: true element_performance: - description: Performance by rubric element (rubric10 only) + description: Performance by rubric element (rubric10 only). range: ElementPerformance multivalued: true category_performance: - description: Performance by category (rubric20 only) + description: Performance by category (rubric20 only). 
range: CategoryPerformance multivalued: true common_weaknesses: - description: Common weaknesses across all evaluations + description: Common weaknesses across all evaluations. range: CommonWeakness multivalued: true common_strengths: - description: Common strengths across all evaluations + description: Common strengths across all evaluations. range: CommonStrength multivalued: true key_insights: - description: Key analytical insights from evaluation + description: Key analytical insights from evaluation. range: KeyInsight multivalued: true input_type_comparison: - description: Concatenated vs individual file performance + description: Concatenated vs individual file performance. range: InputTypeComparison files_generated: - description: Output files generated by evaluation + description: Output files generated by evaluation. range: GeneratedFile multivalued: true OverallPerformance: - description: Overall performance statistics across all files + description: Overall performance statistics across all files. attributes: average_score: - description: Average score across all files + description: Average score across all files. range: float required: true average_percentage: - description: Average percentage score + description: Average score expressed as a percentage of maximum possible score. range: float required: true max_score: - description: Maximum possible score + description: Maximum possible score for the rubric being evaluated. range: integer required: true best_score: - description: Best score achieved + description: Highest score achieved across all evaluated D4D files. range: float required: true best_percentage: - description: Best percentage achieved + description: Highest score expressed as a percentage of maximum possible score. range: float required: true best_performer: - description: Project/method/type with best score + description: Identifier of the project, method, or file type that achieved the best score. 
required: true worst_score: - description: Worst score achieved + description: Lowest score achieved across all evaluated D4D files. range: float required: true worst_percentage: - description: Worst percentage achieved + description: Lowest score expressed as a percentage of maximum possible score. range: float required: true worst_performer: - description: Project/method/type with worst score + description: Identifier of the project, method, or file type that achieved the worst score. MethodPerformance: - description: Performance metrics for a specific generation method + description: Performance metrics for a specific generation method. attributes: method: - description: D4D generation method + description: The D4D generation method used to create the datasheets (e.g., claudecode_agent, gpt5). range: GenerationMethodEnum required: true input_type: - description: Type of input files (concatenated or individual) + description: Type of input files used for generation (concatenated or individual source documents). range: InputTypeEnum file_count: - description: Number of files evaluated with this method + description: Number of D4D files evaluated using this generation method. range: integer required: true average_score: - description: Average score for this method + description: Mean score achieved across all files generated with this method. range: float required: true average_percentage: - description: Average percentage for this method + description: Mean score expressed as a percentage of maximum possible for this method. range: float required: true max_score: - description: Maximum possible score + description: Maximum possible score for the rubric being evaluated. range: integer best_score: - description: Best score for this method + description: Highest score achieved by any file using this method. range: float worst_score: - description: Worst score for this method + description: Lowest score achieved by any file using this method. 
range: float rank: - description: Ranking among all methods (1 = best) + description: Ranking position of this method among all methods (1 = best performing). range: integer ProjectPerformance: - description: Performance metrics for a Bridge2AI Grand Challenge project + description: Performance metrics for a Bridge2AI Grand Challenge project. attributes: project: - description: Bridge2AI Grand Challenge project + description: The Bridge2AI Grand Challenge project for which D4D datasheets were evaluated (e.g., AI_READI, VOICE). range: Bridge2AIProjectEnum required: true file_count: - description: Number of files evaluated for this project + description: Number of D4D files evaluated for this specific Bridge2AI project. range: integer required: true average_score: - description: Average score for this project + description: Mean score achieved across all D4D files for this project. range: float required: true average_percentage: - description: Average percentage for this project + description: Mean score expressed as a percentage of maximum possible for this project. range: float required: true max_score: - description: Maximum possible score + description: Maximum possible score for the rubric being evaluated. range: integer rank: - description: Ranking among all projects (1 = best) + description: Ranking position of this project among all projects (1 = best performing). range: integer TopPerformer: - description: Information about a top-performing D4D file + description: Information about a top-performing D4D file. attributes: rank: - description: Rank in top performers list + description: Position in the top performers list (1 = highest score). range: integer required: true project: - description: Bridge2AI project + description: The Bridge2AI project associated with this top-performing datasheet. 
range: Bridge2AIProjectEnum required: true method: - description: Generation method used + description: The D4D generation method used to create this top-performing file. range: GenerationMethodEnum required: true input_type: - description: Input type (concatenated or individual) + description: Type of input used for generation (concatenated or individual source documents). range: InputTypeEnum required: true score: - description: Score achieved + description: Raw score achieved by this D4D file on the rubric. range: float required: true percentage: - description: Percentage score + description: Score expressed as a percentage of the maximum possible score. range: float required: true max_score: - description: Maximum possible score + description: Maximum possible score for the rubric being evaluated. range: integer required: true elements_passing: - description: Number of elements passing threshold (rubric10) + description: Number of rubric elements that met or exceeded the passing threshold (rubric10 only). range: integer file_name: - description: Name of the D4D file + description: Filename of the top-performing D4D datasheet. ElementPerformance: - description: Performance for a specific rubric element (rubric10) + description: Performance for a specific rubric element (rubric10). attributes: element_id: - description: Element number (1-10) + description: Numeric identifier of the rubric element being evaluated (1-10 for rubric10). range: integer required: true element_name: - description: Name of the element + description: Human-readable name of the rubric element (e.g., "Motivation", "Composition"). required: true max_score: - description: Maximum score for this element + description: Maximum possible score for this specific rubric element. range: integer required: true average_score: - description: Average score across all files + description: Mean score achieved for this element across all evaluated files. 
range: float required: true average_percentage: - description: Average percentage for this element + description: Mean score for this element expressed as a percentage of maximum possible. range: float strength_level: - description: Assessment of strength (strongest, strong, weak, weakest) + description: Qualitative assessment of performance level for this element (strongest, strong, weak, weakest). range: StrengthLevelEnum description: - description: Description of what this element measures + description: Description of what this element measures. CategoryPerformance: - description: Performance for a rubric category (rubric20) + description: Performance for a rubric category (rubric20). attributes: category_id: - description: Category number (1-4) + description: Numeric identifier of the rubric category being evaluated (1-4 for rubric20). range: integer required: true category_name: - description: Name of the category + description: Human-readable name of the category (e.g., "Dataset Motivation", "Data Composition"). required: true max_score: - description: Maximum score for this category + description: Maximum possible score for all questions within this category. range: integer required: true average_score: - description: Average score across all files + description: Mean score achieved for this category across all evaluated files. range: float required: true average_percentage: - description: Average percentage for this category + description: Mean score for this category expressed as a percentage of maximum possible. range: float rank: - description: Ranking among categories (1 = best) + description: Ranking position of this category among all categories (1 = best performing). range: integer question_count: - description: Number of questions in this category + description: Total number of evaluation questions within this category. 
range: integer CommonWeakness: - description: Common weakness identified across evaluations + description: Common weakness identified across evaluations. attributes: weakness_type: - description: Type of weakness + description: Classification of the weakness observed across multiple D4D files. range: WeaknessTypeEnum required: true description: - description: Description of the weakness + description: Detailed explanation of the weakness pattern and its manifestation. required: true frequency: - description: How frequently this weakness appears + description: How frequently this weakness appears across the evaluated dataset. slot_uri: d4d:frequency range: FrequencyEnum affected_element_or_question: - description: Specific element/question affected + description: Specific rubric element or question where this weakness commonly occurs. typical_score: - description: Typical score for this weakness area + description: Representative numeric score (float) typically achieved in areas affected by this weakness. + range: float CommonStrength: - description: Common strength identified across evaluations + description: Common strength identified across evaluations. attributes: strength_type: - description: Type of strength + description: Classification of the strength pattern observed across multiple D4D files. range: StrengthTypeEnum required: true description: - description: Description of the strength + description: Detailed explanation of the strength pattern and its positive impact. required: true frequency: - description: How frequently this strength appears + description: How frequently this strength appears across the evaluated dataset. slot_uri: d4d:frequency range: FrequencyEnum affected_element_or_question: - description: Specific element/question affected + description: Specific rubric element or question where this strength commonly occurs. 
typical_score: - description: Typical score for this strength area + description: Representative numeric score (float) typically achieved in areas demonstrating this strength. + range: float KeyInsight: - description: Key analytical insight from evaluation + description: Key analytical insight from evaluation. attributes: insight_type: - description: Type of insight + description: Classification of the analytical insight (e.g., trend, comparison, finding). range: InsightTypeEnum required: true title: - description: Brief title of the insight + description: Concise summary title capturing the essence of the insight. required: true description: - description: Detailed description of the insight + description: Comprehensive explanation of the insight and its implications. required: true supporting_data: - description: Data supporting this insight + description: Quantitative or qualitative evidence supporting this analytical insight. multivalued: true comparison_metric: - description: Comparison metric if applicable (e.g., "2.4× better") + description: Numerical comparison metric quantifying the insight (e.g., "2.4× better", "30% improvement"). InputTypeComparison: - description: Comparison between concatenated and individual file performance + description: Comparison between concatenated and individual file performance. attributes: concatenated_performance: - description: Performance on concatenated files + description: Performance metrics for D4D files generated from concatenated source documents. range: InputTypePerformance required: true individual_performance: - description: Performance on individual files + description: Performance metrics for D4D files generated from individual source documents. 
range: InputTypePerformance required: true synthesis_advantage: - description: Description of synthesis advantage for concatenated files + description: "Free-text explanation of the performance difference observed between multi-document synthesis (concatenated files) and individual source documents for this comparison, including any advantage or disadvantage." + range: string InputTypePerformance: - description: Performance metrics for an input type + description: Performance metrics for an input type. attributes: input_type: - description: Type of input (concatenated or individual) + description: Type of input (concatenated or individual). range: InputTypeEnum required: true file_count: - description: Number of files of this type + description: Number of files of this type. range: integer average_score: - description: Average score for this input type + description: Average score for this input type. range: float average_percentage: - description: Average percentage for this input type + description: Average percentage for this input type. range: float score_range: - description: Score range (min-max) + description: Range of scores observed for this input type (minimum to maximum values). best_method: - description: Best performing method for this input type + description: Generation method that achieved the highest performance for this input type. range: GenerationMethodEnum GeneratedFile: - description: Information about generated output file + description: Information about generated output file. attributes: file_path: - description: Path to the generated file + description: Filesystem path to the generated evaluation output file. required: true file_type: - description: Type of file (CSV, JSON, Markdown) - range: FileTypeEnum + description: File format type of the generated output (CSV, JSON, or Markdown). 
+ range: EvaluationOutputFormatEnum required: true description: - description: Description of file contents + description: Explanation of the file contents and what data it contains. required: true row_count: - description: Number of rows/entries (for CSV/JSON) + description: Number of data rows or entries in the file (applicable to CSV and JSON formats). range: integer enums: RubricTypeEnum: - description: Types of evaluation rubrics + description: Types of evaluation rubrics. permissible_values: rubric10: description: 10-element hierarchical rubric (50 sub-elements, max 50 points) @@ -471,109 +475,109 @@ enums: description: 20-question detailed rubric (4 categories, max 84 points) GenerationMethodEnum: - description: D4D generation methods + description: D4D generation methods. permissible_values: curated: - description: Manually curated comprehensive datasheets + description: Manually curated comprehensive datasheets. gpt5: - description: Generated using GPT-5 API + description: Generated using GPT-5 API. claudecode: - description: Direct synthesis at temperature=0.0 (concatenated only) + description: Direct synthesis at temperature=0.0 (concatenated only). claudecode_agent: - description: Claude Code agent-based generation + description: Claude Code agent-based generation. claudecode_assistant: - description: Claude Code assistant-based generation + description: Claude Code assistant-based generation. Bridge2AIProjectEnum: - description: Bridge2AI Grand Challenge projects + description: Bridge2AI Grand Challenge projects. permissible_values: AI_READI: - description: AI-READI - AI-Ready and Equitable Atlas for Diabetes Insights + description: AI-READI - AI-Ready and Equitable Atlas for Diabetes Insights. CHORUS: - description: CHoRUS - Collaborative Hospital Repository Uniting Standards + description: CHoRUS - Collaborative Hospital Repository Uniting Standards. CM4AI: - description: CM4AI - Cell Maps for AI + description: CM4AI - Cell Maps for AI. 
VOICE: - description: VOICE - Voice as a Biomarker of Health + description: VOICE - Voice as a Biomarker of Health. InputTypeEnum: - description: Types of input files for D4D generation + description: Types of input files for D4D generation. permissible_values: concatenated: - description: Multiple source documents concatenated, requires synthesis + description: Multiple source documents concatenated, requires synthesis. individual: - description: Single source document, single-source extraction + description: Single source document, single-source extraction. StrengthLevelEnum: - description: Assessment of element/category strength + description: Assessment of element/category strength. permissible_values: strongest: - description: Highest performing element/category + description: Highest performing element/category. strong: - description: Above average performance + description: Above average performance. weak: - description: Below average performance + description: Below average performance. weakest: - description: Lowest performing element/category + description: Lowest performing element/category. WeaknessTypeEnum: - description: Types of common weaknesses + description: Types of common weaknesses. permissible_values: missing_field: - description: Required or important field is missing + description: Required or important field is missing. incomplete_content: - description: Field present but content is incomplete + description: Field present but content is incomplete. low_quality: - description: Content quality is below standard + description: Content quality is below standard. inconsistent: - description: Information is inconsistent across fields + description: Information is inconsistent across fields. generic: - description: Content is too generic/boilerplate + description: Content is too generic/boilerplate. StrengthTypeEnum: - description: Types of common strengths + description: Types of common strengths. 
permissible_values: comprehensive: - description: Comprehensive coverage of topic + description: Comprehensive coverage of topic. high_quality: - description: High quality detailed content + description: High quality detailed content. consistent: - description: Consistent information across fields + description: Consistent information across fields. well_structured: - description: Well-structured and organized + description: Well-structured and organized. FrequencyEnum: - description: Frequency of occurrence + description: Frequency of occurrence. permissible_values: always: - description: Appears in all or nearly all files + description: Appears in all or nearly all files. frequently: - description: Appears in majority of files + description: Appears in majority of files. sometimes: - description: Appears in some files + description: Appears in some files. rarely: - description: Appears in few files + description: Appears in few files. InsightTypeEnum: - description: Types of analytical insights + description: Types of analytical insights. permissible_values: method_comparison: - description: Comparison between generation methods + description: Comparison between generation methods. synthesis_advantage: - description: Advantage of synthesis vs single-source + description: Advantage of synthesis vs single-source. performance_gap: - description: Gap between best and worst performers + description: Gap between best and worst performers. common_pattern: - description: Pattern observed across evaluations + description: Pattern observed across evaluations. improvement_opportunity: - description: Opportunity for improvement + description: Opportunity for improvement. - FileTypeEnum: - description: Types of generated output files + EvaluationOutputFormatEnum: + description: Types of generated output files. permissible_values: csv: - description: CSV file with tabular data + description: CSV file with tabular data. 
json: - description: JSON file with structured data + description: JSON file with structured data. markdown: - description: Markdown file with formatted text + description: Markdown file with formatted text. diff --git a/src/data_sheets_schema/schema/D4D_FileCollection.yaml b/src/data_sheets_schema/schema/D4D_FileCollection.yaml index 8bdfd70b..dddeb746 100644 --- a/src/data_sheets_schema/schema/D4D_FileCollection.yaml +++ b/src/data_sheets_schema/schema/D4D_FileCollection.yaml @@ -120,73 +120,77 @@ classes: description: Number of files in this collection. range: integer slot_uri: d4d:fileCount + annotations: + "d4d:docExample": "47" total_bytes: - description: Total size of all files in bytes. + description: "Total size of all files in this collection, in bytes (integer). Maps to dcat:byteSize." range: integer slot_uri: dcat:byteSize + annotations: + "d4d:docExample": "1073741824 (1 GiB = 1024³ bytes)" enums: FileTypeEnum: description: Types of individual files within datasets. permissible_values: data_file: - description: A data file containing dataset content + description: A data file containing dataset content. meaning: schema:DataDownload code_file: - description: A source code or script file + description: A source code or script file. meaning: schema:SoftwareSourceCode documentation_file: - description: A documentation file (README, guide, etc.) + description: A documentation file (README, guide, etc.). meaning: schema:Documentation metadata_file: - description: A metadata or annotation file + description: A metadata or annotation file. meaning: dcat:CatalogRecord configuration_file: - description: A configuration or settings file + description: A configuration or settings file. meaning: d4d:ConfigurationFile notebook_file: - description: A computational notebook file (Jupyter, R Markdown, etc.) + description: A computational notebook file (Jupyter, R Markdown, etc.). 
meaning: d4d:NotebookFile image_file: - description: An image or visualization file + description: An image or visualization file. meaning: schema:ImageObject archive_file: - description: An archive or compressed file + description: An archive or compressed file. meaning: d4d:ArchiveFile other: - description: Other file type + description: Other file type. meaning: d4d:OtherFile FileCollectionTypeEnum: description: Types of file collections within datasets. permissible_values: raw_data: - description: Raw, unprocessed data files + description: Raw, unprocessed data files. meaning: d4d:RawData processed_data: - description: Cleaned, processed, or transformed data files + description: Cleaned, processed, or transformed data files. meaning: d4d:ProcessedData training_split: - description: Files designated for model training + description: Files designated for model training. meaning: d4d:TrainingSplit test_split: - description: Files designated for model testing + description: Files designated for model testing. meaning: d4d:TestSplit validation_split: - description: Files designated for model validation + description: Files designated for model validation. meaning: d4d:ValidationSplit documentation: - description: Documentation files (README, codebook, etc.) + description: Documentation files (README, codebook, etc.). meaning: schema:Documentation metadata: - description: Metadata or annotation files + description: Metadata or annotation files. meaning: dcat:CatalogRecord code: - description: Code or script files + description: Code or script files. meaning: schema:SoftwareSourceCode supplementary: - description: Supplementary materials + description: Supplementary materials. meaning: schema:SupplementalMaterial other: - description: Other file collection type + description: Other file collection type. 
meaning: d4d:OtherFileCollection diff --git a/src/data_sheets_schema/schema/D4D_Maintenance.yaml b/src/data_sheets_schema/schema/D4D_Maintenance.yaml index 5edf13c9..d0400df6 100644 --- a/src/data_sheets_schema/schema/D4D_Maintenance.yaml +++ b/src/data_sheets_schema/schema/D4D_Maintenance.yaml @@ -50,7 +50,8 @@ classes: slot_uri: schema:maintainer maintainer_details: description: > - Details on who will support, host, or maintain the dataset. + Free-text description of the organization, team, or individual responsible for + maintaining the dataset, including contact information and hosting arrangements. range: string multivalued: true slot_uri: dcterms:description @@ -63,11 +64,14 @@ classes: attributes: erratum_url: description: "URL or access point for the erratum." - slot_uri: dcat:accessURL + slot_uri: d4d:erratumURL range: uri + annotations: + "d4d:docExample": "https://example.org/dataset/errata/2024-01-15" erratum_details: description: > - Details on any errata or corrections to the dataset. + Free-text description of the error, its scope, the affected data or records, + and the correction applied. range: string multivalued: true slot_uri: dcterms:description @@ -88,7 +92,8 @@ classes: range: string update_details: description: > - Details on update plans, responsible parties, and communication methods. + Free-text description of planned update types (e.g., corrections, additions, + deletions), responsible parties, and how updates will be communicated to users. range: string multivalued: true slot_uri: dcterms:description @@ -108,7 +113,9 @@ classes: range: string retention_details: description: > - Details on data retention limits and enforcement procedures. + Free-text description of applicable retention limits, legal or ethical basis + for those limits, and how they will be enforced (e.g., automated deletion, + anonymization after the retention period). 
range: string multivalued: true slot_uri: dcterms:description @@ -121,9 +128,9 @@ classes: is_a: DatasetProperty attributes: latest_version_doi: - description: "DOI or URL of the latest dataset version." - slot_uri: schema:identifier - range: string + description: "DOI or URL identifying the latest version of this dataset, written as a CURIE for a DOI (e.g., 'doi:10.5281/zenodo.1234567') or as a full URL (e.g., 'https://doi.org/10.5281/zenodo.1234567')." + slot_uri: dcterms:hasVersion + range: uriorcurie versions_available: description: "List of available versions with metadata." slot_uri: d4d:versionsAvailable @@ -131,7 +138,8 @@ classes: multivalued: true version_details: description: > - Details on version support policies and obsolescence communication. + Free-text description of version support policies, how long older versions will + be hosted, and how dataset consumers will be notified when versions become obsolete. range: string multivalued: true slot_uri: dcterms:description @@ -146,11 +154,15 @@ classes: attributes: contribution_url: description: "URL for contribution guidelines or process." - slot_uri: dcat:landingPage + slot_uri: d4d:contributionURL range: uri + annotations: + "d4d:docExample": "https://example.org/dataset/contributing" extension_details: description: > - Details on extension mechanisms, contribution validation, and communication. + Free-text description of how third parties can contribute to the dataset, + how contributions are validated (e.g., peer review, automated tests), + and how accepted contributions will be communicated to the community. 
range: string multivalued: true slot_uri: dcterms:description \ No newline at end of file diff --git a/src/data_sheets_schema/schema/D4D_Minimal.yaml b/src/data_sheets_schema/schema/D4D_Minimal.yaml index 1601c218..8b71e92e 100644 --- a/src/data_sheets_schema/schema/D4D_Minimal.yaml +++ b/src/data_sheets_schema/schema/D4D_Minimal.yaml @@ -31,7 +31,7 @@ classes: is_a: Information attributes: resources: - description: The datasets in this collection. + description: List of datasets in this collection. slot_uri: schema:hasPart range: MinimalDataset multivalued: true diff --git a/src/data_sheets_schema/schema/D4D_Motivation.yaml b/src/data_sheets_schema/schema/D4D_Motivation.yaml index d3da914b..f9ea2c3d 100644 --- a/src/data_sheets_schema/schema/D4D_Motivation.yaml +++ b/src/data_sheets_schema/schema/D4D_Motivation.yaml @@ -46,7 +46,9 @@ classes: response: description: "Short explanation describing the primary purpose of creating the dataset." range: string - slot_uri: dcterms:description + slot_uri: d4d:questionResponse + broad_mappings: + - dcterms:description Task: @@ -56,7 +58,9 @@ classes: response: description: "Short explanation describing the specific task or tasks for which this dataset was created." range: string - slot_uri: dcterms:description + slot_uri: d4d:questionResponse + broad_mappings: + - dcterms:description AddressingGap: @@ -66,7 +70,9 @@ classes: response: description: "Short explanation of the knowledge or resource gap that this dataset was intended to address." range: string - slot_uri: dcterms:description + slot_uri: d4d:questionResponse + broad_mappings: + - dcterms:description Creator: @@ -78,22 +84,25 @@ classes: principal_investigator: description: "A key individual (Principal Investigator) responsible for or overseeing dataset creation." 
range: Person - slot_uri: dcterms:creator - exact_mappings: + slot_uri: d4d:principalInvestigator + broad_mappings: + - dcterms:creator - schema:creator affiliations: description: "Organizations with which the creator or team is affiliated." range: Organization multivalued: true inlined_as_list: true - slot_uri: schema:affiliation + slot_uri: d4d:teamAffiliation + broad_mappings: + - schema:affiliation + # Roles are placed on Creator rather than Person because the same person + # may contribute in different capacities across different datasets. credit_roles: description: >- - Contributor roles using the CRediT (Contributor Roles Taxonomy) for - the principal investigator or creator team. Specifies the specific - contributions made to this dataset (e.g., Conceptualization, Data Curation, - Methodology). Note: roles are specified here rather than on Person directly, - since the same person may have different roles across different datasets. + One or more contributor roles using the CRediT (Contributor Roles Taxonomy) for + the principal investigator or creator team (e.g., Conceptualization, Data Curation, + Methodology). slot_uri: d4d:creditRoles range: CRediTRoleEnum multivalued: true @@ -135,4 +144,6 @@ classes: grant_number: description: "The alphanumeric identifier for the grant." range: string - slot_uri: schema:identifier \ No newline at end of file + slot_uri: d4d:grantIdentifier + broad_mappings: + - schema:identifier \ No newline at end of file diff --git a/src/data_sheets_schema/schema/D4D_Preprocessing.yaml b/src/data_sheets_schema/schema/D4D_Preprocessing.yaml index 91ae872f..b14c1362 100644 --- a/src/data_sheets_schema/schema/D4D_Preprocessing.yaml +++ b/src/data_sheets_schema/schema/D4D_Preprocessing.yaml @@ -51,7 +51,9 @@ classes: attributes: preprocessing_details: description: > - Details on preprocessing steps applied to the data. 
+ Free-text description of preprocessing steps applied to the data, + including tools used, parameters, order of operations, and rationale + for each step. range: string multivalued: true slot_uri: dcterms:description @@ -67,7 +69,9 @@ classes: attributes: cleaning_details: description: > - Details on data cleaning procedures applied. + Free-text description of data cleaning procedures applied, including + criteria for removing or correcting instances, tools used, and how + removed instances are accounted for. range: string multivalued: true slot_uri: dcterms:description @@ -81,12 +85,11 @@ classes: attributes: data_annotation_platform: description: >- - Platform or tool used for annotation (e.g., Label Studio, Prodigy, + One or more platforms or tools used for annotation (e.g., Label Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool). range: string - slot_uri: schema:instrument - exact_mappings: - - rai:dataAnnotationPlatform + multivalued: true + slot_uri: rai:dataAnnotationPlatform data_annotation_protocol: description: >- Annotation methodology, tasks, and protocols followed during labeling. @@ -104,6 +107,8 @@ classes: range: integer exact_mappings: - rai:annotationsPerItem + annotations: + "d4d:docExample": "3 (three independent annotators per item)" inter_annotator_agreement: description: >- Measure of agreement between annotators (e.g., Cohen's kappa, Fleiss' kappa, @@ -112,8 +117,8 @@ classes: slot_uri: schema:measurementMethod annotator_demographics: description: >- - Demographic information about annotators, if available and relevant - (e.g., geographic location, language background, expertise level). + One or more demographic characteristics of the annotators, if available and relevant + (e.g., geographic location, language background, expertise level, native language). 
slot_uri: d4d:annotatorDemographics range: string multivalued: true @@ -121,7 +126,8 @@ classes: - rai:annotatorDemographics labeling_details: description: > - Details on labeling/annotation procedures and quality metrics. + Free-text description of the labeling or annotation procedures, including + annotation guidelines, task definitions, and quality control metrics. range: string multivalued: true slot_uri: dcterms:description @@ -135,11 +141,14 @@ classes: attributes: access_url: description: "URL or access point for the raw data." - slot_uri: dcat:accessURL + slot_uri: d4d:rawDataAccessURL range: uri + annotations: + "d4d:docExample": "https://example.org/dataset/raw/raw-data.zip" raw_data_details: description: > - Details on raw data availability and access procedures. + Free-text description of raw data availability, access procedures, + and any conditions or restrictions on accessing the raw data. range: string multivalued: true slot_uri: dcterms:description @@ -233,9 +242,11 @@ classes: List of automated annotation tools with their versions. Format each entry as "ToolName version" (e.g., "spaCy 3.5.0", "NLTK 3.8", "GPT-4 turbo"). Use "unknown" for version if not available (e.g., "Custom NER Model unknown"). - slot_uri: schema:name + slot_uri: d4d:toolNames range: string multivalued: true + broad_mappings: + - schema:name tool_descriptions: description: > Descriptions of what each tool does in the annotation process and @@ -245,7 +256,7 @@ classes: multivalued: true tool_accuracy: description: > - Known accuracy or performance metrics for the automated tools (if available). + One or more known accuracy or performance metrics for the automated tools (if available). Include metric name and value (e.g., "spaCy F1: 0.95", "GPT-4 Accuracy: 92%"). 
slot_uri: d4d:toolAccuracy range: string diff --git a/src/data_sheets_schema/schema/D4D_Uses.yaml b/src/data_sheets_schema/schema/D4D_Uses.yaml index 55312da9..b4af8b21 100644 --- a/src/data_sheets_schema/schema/D4D_Uses.yaml +++ b/src/data_sheets_schema/schema/D4D_Uses.yaml @@ -51,17 +51,19 @@ classes: UseRepository: - description: > - Is there a repository that links to any or all papers or systems - that use the dataset? If so, provide a link or other access point. + description: >- + A repository or registry of known uses of this dataset by third parties. Documents where the dataset has been applied, enabling discoverability of downstream use cases and impact tracking. is_a: DatasetProperty attributes: repository_url: description: "URL to a repository of known dataset uses." range: uri + annotations: + "d4d:docExample": "https://example.org/dataset/known-uses" repository_details: description: > - Details on the repository of known dataset uses. + Free-text description of the repository of known dataset uses, including + how it is maintained and how to contribute new use cases. range: string multivalued: true slot_uri: dcterms:description @@ -74,7 +76,8 @@ classes: attributes: task_details: description: > - Details on other potential tasks the dataset could be used for. + Free-text description of other potential tasks the dataset could support, + including any prerequisites or limitations for those uses. range: string multivalued: true slot_uri: dcterms:description @@ -92,7 +95,9 @@ classes: attributes: impact_details: description: > - Details on potential impacts, risks, and mitigation strategies. + Free-text description of potential future impacts or risks arising from the + dataset's composition or collection (e.g., unfair treatment, privacy violations, + legal or financial risks), and any recommended mitigation strategies. 
range: string multivalued: true slot_uri: dcterms:description @@ -105,7 +110,9 @@ classes: attributes: discouragement_details: description: > - Details on tasks for which the dataset should not be used. + Free-text description of tasks or applications for which the dataset is + not recommended, with explanation of why (e.g., out-of-scope, risk of harm, + poor coverage). range: string multivalued: true slot_uri: dcterms:description @@ -127,11 +134,11 @@ classes: multivalued: true usage_notes: description: >- - Notes or caveats about using the dataset for intended purposes. + A note or caveat about using the dataset for its intended purposes. range: string use_category: description: >- - Category of intended use (e.g., research, clinical, educational, + One or more categories of intended use (e.g., research, clinical, educational, commercial, policy). slot_uri: d4d:useCategory range: string @@ -148,7 +155,7 @@ classes: attributes: prohibition_reason: description: >- - Reason why this use is prohibited (e.g., license restriction, + One or more reasons why this use is prohibited (e.g., license restriction, ethical concern, privacy risk, legal constraint). slot_uri: d4d:prohibitionReason range: string diff --git a/src/data_sheets_schema/schema/D4D_Variables.yaml b/src/data_sheets_schema/schema/D4D_Variables.yaml index fa1ccb89..593ae780 100644 --- a/src/data_sheets_schema/schema/D4D_Variables.yaml +++ b/src/data_sheets_schema/schema/D4D_Variables.yaml @@ -44,9 +44,10 @@ classes: The name or identifier of the variable as it appears in the data files. range: string required: true - slot_uri: schema:name - exact_mappings: + slot_uri: d4d:variableName + broad_mappings: - schema:name + - schema:identifier data_type: description: >- @@ -81,16 +82,20 @@ classes: The minimum value that the variable can take. Applicable to numeric variables. 
range: float slot_uri: schema:minValue + annotations: + "d4d:docExample": "0.0" maximum_value: description: >- The maximum value that the variable can take. Applicable to numeric variables. range: float slot_uri: schema:maxValue + annotations: + "d4d:docExample": "100.0" categories: description: >- - The permitted categories or values for a categorical variable. + One or more permitted categories or values for a categorical variable. Each entry should describe a possible value and its meaning. range: string multivalued: true @@ -108,7 +113,7 @@ classes: Indicates whether this variable serves as a unique identifier or key for records in the dataset. range: boolean - slot_uri: schema:identifier + slot_uri: d4d:isIdentifier is_sensitive: description: >- @@ -122,6 +127,8 @@ classes: The precision or number of decimal places for numeric variables. slot_uri: schema:valuePrecision range: integer + annotations: + "d4d:docExample": "2 (two decimal places, e.g., 3.14)" measurement_technique: description: >- @@ -141,9 +148,11 @@ classes: description: >- Notes about data quality, reliability, or known issues specific to this variable. - slot_uri: dcterms:description + slot_uri: d4d:qualityNotes range: string multivalued: true + broad_mappings: + - dcterms:description enums: diff --git a/src/data_sheets_schema/schema/data_sheets_schema.yaml b/src/data_sheets_schema/schema/data_sheets_schema.yaml index e19f8861..5e868726 100644 --- a/src/data_sheets_schema/schema/data_sheets_schema.yaml +++ b/src/data_sheets_schema/schema/data_sheets_schema.yaml @@ -133,52 +133,86 @@ classes: Can be aggregated from file_collections[].file_count. range: integer slot_uri: d4d:totalFileCount + annotations: + "d4d:docExample": "156" total_size_bytes: description: >- Total size of all files in bytes across all file collections. Can be aggregated from file_collections[].total_bytes. 
range: integer slot_uri: dcat:byteSize + annotations: + "d4d:docExample": "10737418240 (10 GiB = 10 × 1024³ bytes)" # Motivation module classes purposes: + description: >- + Purposes for which the dataset was created. List of Purpose objects + from the Motivation module, each describing a specific creation goal + or intended application. slot_uri: d4d:purposes range: Purpose multivalued: true inlined_as_list: true tasks: + description: >- + Tasks the dataset is intended to support. List of Task objects + from the Motivation module describing specific machine learning, + research, or analytical tasks. slot_uri: d4d:tasks range: Task multivalued: true inlined_as_list: true addressing_gaps: + description: >- + Research or practical gaps this dataset addresses. List of + AddressingGap objects from the Motivation module, each describing + a gap in existing datasets or knowledge that this dataset fills. slot_uri: d4d:addressingGaps range: AddressingGap multivalued: true inlined_as_list: true creators: + description: >- + Individuals or organizations who created the dataset. List of + Creator objects describing authorship, roles, and affiliations + of dataset creators. slot_uri: schema:creator range: Creator multivalued: true inlined_as_list: true funders: + description: >- + Funding mechanisms that supported dataset creation. List of + FundingMechanism objects describing grants, contracts, or other + funding sources including grantors and grant identifiers. slot_uri: schema:funder range: FundingMechanism multivalued: true inlined_as_list: true # Composition module classes subsets: + description: >- + Subsets or splits of this dataset. List of DataSubset objects + from the Composition module, each representing a logical partition + such as training, validation, or test splits, or demographic subgroups. 
range: DataSubset multivalued: true inlined_as_list: true - slot_uri: dcat:distribution - exact_mappings: - - schema:distribution + slot_uri: d4d:dataSubset instances: + description: >- + Individual data instances or records in the dataset. List of + Instance objects from the Composition module describing what + each data point represents, its type, and associated label information. slot_uri: d4d:instances range: Instance multivalued: true inlined_as_list: true anomalies: + description: >- + Known data quality issues, errors, or irregularities in the dataset. + List of DataAnomaly objects from the Composition module, each + documenting a specific anomaly and its potential impact. slot_uri: d4d:anomalies range: DataAnomaly multivalued: true @@ -201,51 +235,142 @@ classes: multivalued: true inlined_as_list: true confidential_elements: + description: >- + Confidential or restricted information within the dataset that + requires access controls. List of Confidentiality objects describing + what is confidential and why it cannot be released. slot_uri: d4d:confidentialElements range: Confidentiality multivalued: true inlined_as_list: true content_warnings: + description: >- + Content warnings for potentially harmful, offensive, or disturbing + material in the dataset. List of ContentWarning objects alerting + users to sensitive content categories. slot_uri: d4d:contentWarnings range: ContentWarning multivalued: true inlined_as_list: true subpopulations: + description: >- + Subpopulations represented within the dataset. List of Subpopulation + objects from the Composition module describing demographic or other + groups, their representation, and any imbalances. slot_uri: d4d:subpopulations range: Subpopulation multivalued: true inlined_as_list: true sensitive_elements: + description: >- + Sensitive data elements requiring special handling or access controls. 
+ List of SensitiveElement objects identifying sensitive attributes + such as personal identifiers, protected health information, or + legally sensitive content. slot_uri: d4d:sensitiveElements range: SensitiveElement multivalued: true inlined_as_list: true + relationships: + description: >- + Explicit relationships between individual instances in the dataset. + List of Relationships objects from the Composition module describing + how instances relate (e.g., graph edges, ratings, social network links). + slot_uri: d4d:relationships + range: Relationships + multivalued: true + inlined_as_list: true + splits: + description: >- + Recommended data splits for this dataset. List of Splits objects + from the Composition module describing train/validation/test partitions + and the rationale for each split strategy. + slot_uri: d4d:splits + range: Splits + multivalued: true + inlined_as_list: true # Collection module classes acquisition_methods: + description: >- + Methods used to acquire or obtain dataset instances. List of + InstanceAcquisition objects from the Collection module describing + how data was sourced, whether directly observed or derived. slot_uri: d4d:acquisitionMethods range: InstanceAcquisition multivalued: true inlined_as_list: true collection_mechanisms: + description: >- + Mechanisms, instruments, or tools used for data collection. + List of CollectionMechanism objects from the Collection module + describing sensors, surveys, APIs, or other collection instruments. slot_uri: d4d:collectionMechanisms range: CollectionMechanism multivalued: true inlined_as_list: true sampling_strategies: + description: >- + Strategies used to select data instances from a larger population. + List of SamplingStrategy objects from the Collection module + describing sampling methodology, inclusion criteria, and limitations. 
slot_uri: d4d:samplingStrategies range: SamplingStrategy multivalued: true inlined_as_list: true data_collectors: + description: >- + Individuals or organizations responsible for collecting the data. + List of DataCollector objects from the Collection module describing + who performed data collection and their roles. slot_uri: d4d:dataCollectors range: DataCollector multivalued: true inlined_as_list: true collection_timeframes: + description: >- + Time periods during which data was collected. List of + CollectionTimeframe objects from the Collection module describing + collection start and end dates, and any gaps in the collection period. slot_uri: d4d:collectionTimeframes range: CollectionTimeframe multivalued: true inlined_as_list: true + direct_collection: + description: >- + Whether data was collected directly from individuals or via third parties. + List of DirectCollection objects from the Collection module describing + direct vs. indirect collection methods and sources. + slot_uri: d4d:directCollection + range: DirectCollection + multivalued: true + inlined_as_list: true + collection_notifications: + description: >- + Notifications provided to individuals about data collection. + List of CollectionNotification objects from the Ethics module describing + how and when individuals were informed about the data collection. + slot_uri: d4d:collectionNotifications + range: CollectionNotification + multivalued: true + inlined_as_list: true + collection_consents: + description: >- + Consent obtained from individuals for data collection and use. + List of CollectionConsent objects from the Ethics module describing + how consent was requested, provided, and documented. + slot_uri: d4d:collectionConsents + range: CollectionConsent + multivalued: true + inlined_as_list: true + consent_revocations: + description: >- + Mechanisms for individuals to revoke previously given consent. 
+ List of ConsentRevocation objects from the Ethics module describing + how revocation works and what happens to data after revocation. + slot_uri: d4d:consentRevocations + range: ConsentRevocation + multivalued: true + inlined_as_list: true missing_data_documentation: description: >- Documentation of missing data patterns and handling strategies. @@ -255,18 +380,27 @@ classes: inlined_as_list: true raw_data_sources: description: >- - Description of raw data sources before preprocessing. + List of raw data sources before preprocessing. Each RawDataSource object + describes where the original data came from and how it can be accessed. slot_uri: d4d:rawDataSources range: RawDataSource multivalued: true inlined_as_list: true # Ethics module classes ethical_reviews: + description: >- + Ethical reviews and institutional oversight for the dataset. + List of EthicalReview objects from the Ethics module describing + IRB approvals, ethics committee reviews, and compliance certifications. slot_uri: d4d:ethicalReviews range: EthicalReview multivalued: true inlined_as_list: true data_protection_impacts: + description: >- + Data protection impact assessments (DPIAs) conducted for the dataset. + List of DataProtectionImpact objects from the Ethics module + documenting privacy risk assessments and mitigation measures. slot_uri: d4d:dataProtectionImpacts range: DataProtectionImpact multivalued: true @@ -311,28 +445,46 @@ classes: Information about compensation or incentives provided to human research participants. # Preprocessing module classes preprocessing_strategies: + description: >- + Preprocessing steps applied to the raw data. List of + PreprocessingStrategy objects from the Preprocessing module + describing normalization, transformation, and other preparation steps. 
slot_uri: d4d:preprocessingStrategies range: PreprocessingStrategy multivalued: true inlined_as_list: true cleaning_strategies: + description: >- + Data cleaning and quality control procedures applied to the dataset. + List of CleaningStrategy objects from the Preprocessing module + describing outlier removal, deduplication, and error correction steps. slot_uri: d4d:cleaningStrategies range: CleaningStrategy multivalued: true inlined_as_list: true labeling_strategies: + description: >- + Labeling or annotation methodologies applied to the data. List of + LabelingStrategy objects from the Preprocessing module describing + annotation procedures, annotator qualifications, and quality controls. slot_uri: d4d:labelingStrategies range: LabelingStrategy multivalued: true inlined_as_list: true raw_sources: + description: >- + Raw, unprocessed source data before any preprocessing was applied. + List of RawData objects from the Preprocessing module describing + original data sources and their formats. slot_uri: d4d:rawSources range: RawData multivalued: true inlined_as_list: true imputation_protocols: description: >- - Data imputation methodology and techniques. + Data imputation protocols applied to handle missing values. + List of ImputationProtocol objects from the Preprocessing module + describing the imputation technique, affected variables, and rationale. slot_uri: d4d:imputation_protocols range: ImputationProtocol multivalued: true @@ -352,26 +504,46 @@ classes: inlined_as_list: true # Uses module classes existing_uses: + description: >- + Known existing uses of the dataset at the time of publication. + List of ExistingUse objects from the Uses module describing + research, commercial, or other applications of the dataset. slot_uri: d4d:existingUses range: ExistingUse multivalued: true inlined_as_list: true use_repository: + description: >- + Repositories or registries tracking how the dataset has been used. 
+ List of UseRepository objects from the Uses module pointing to + papers with code, citation indices, or other use-tracking resources. slot_uri: d4d:useRepository range: UseRepository multivalued: true inlined_as_list: true other_tasks: + description: >- + Additional tasks the dataset may support beyond its original intent. + List of OtherTask objects from the Uses module describing potential + applications not originally planned by the dataset creators. slot_uri: d4d:otherTasks range: OtherTask multivalued: true inlined_as_list: true future_use_impacts: + description: >- + Anticipated impacts of future uses, including risks and benefits. + List of FutureUseImpact objects from the Uses module describing + foreseeable consequences of using this dataset in new applications. slot_uri: d4d:futureUseImpacts range: FutureUseImpact multivalued: true inlined_as_list: true discouraged_uses: + description: >- + Uses that are not recommended for this dataset due to limitations, + risks, or ethical concerns. List of DiscouragedUse objects from + the Uses module explaining why certain applications should be avoided. slot_uri: d4d:discouragedUses range: DiscouragedUse multivalued: true @@ -394,52 +566,107 @@ classes: Stronger than discouraged_uses - these are not permitted. # Distribution module classes distribution_formats: + description: >- + Formats in which the dataset is distributed or made available. + List of DistributionFormat objects from the Distribution module + describing file formats, compression, and access methods. slot_uri: d4d:distributionFormats range: DistributionFormat multivalued: true inlined_as_list: true distribution_dates: + description: >- + Dates when the dataset was or will be distributed or released. + List of DistributionDate objects from the Distribution module + describing initial release dates, version release dates, and + planned future releases. 
slot_uri: d4d:distributionDates range: DistributionDate multivalued: true inlined_as_list: true + third_party_sharing: + description: >- + Third-party distribution policies for the dataset. + List of ThirdPartySharing objects from the Distribution module describing + whether and how the dataset is shared with entities outside the + creating organization. + slot_uri: d4d:thirdPartySharing + range: ThirdPartySharing + multivalued: true + inlined_as_list: true # Data Governance module classes license_and_use_terms: + description: >- + License and usage terms governing dataset access and use. + LicenseAndUseTerms object from the Data Governance module describing + the applicable license, permitted uses, and any restrictions. slot_uri: schema:license range: LicenseAndUseTerms inlined: true ip_restrictions: + description: >- + Intellectual property restrictions on dataset use or redistribution. + IPRestrictions object from the Data Governance module describing + copyright, trademark, or other IP considerations. slot_uri: d4d:ipRestrictions range: IPRestrictions inlined: true regulatory_restrictions: + description: >- + Regulatory and export control restrictions applicable to the dataset. + ExportControlRegulatoryRestrictions object from the Data Governance + module describing compliance requirements such as ITAR, EAR, or GDPR. slot_uri: d4d:regulatoryRestrictions range: ExportControlRegulatoryRestrictions inlined: true # Maintenance module classes maintainers: + description: >- + Individuals or organizations responsible for maintaining the dataset. + List of Maintainer objects from the Maintenance module describing + maintenance contacts, roles, and support channels. slot_uri: d4d:maintainers range: Maintainer multivalued: true inlined_as_list: true errata: + description: >- + Known errors or corrections to the dataset since publication. + List of Erratum objects from the Maintenance module describing + discovered errors, affected records, and correction procedures. 
slot_uri: d4d:errata range: Erratum multivalued: true inlined_as_list: true updates: + description: >- + Plans for future updates or versioning of the dataset. + UpdatePlan object from the Maintenance module describing update + frequency, versioning policy, and planned enhancements. slot_uri: d4d:updates range: UpdatePlan inlined: true retention_limit: + description: >- + Data retention policies and limits for the dataset. + RetentionLimits object from the Maintenance module describing + how long the dataset will be available and any deletion schedules. slot_uri: d4d:retentionLimit range: RetentionLimits inlined: true version_access: - slot_uri: dcat:accessURL + description: >- + Information about access to different versions of the dataset. + VersionAccess object from the Maintenance module describing + where older versions can be found and how version history is maintained. + slot_uri: d4d:versionAccess range: VersionAccess inlined: true extension_mechanism: + description: >- + Mechanisms for extending or contributing to the dataset. + ExtensionMechanism object from the Maintenance module describing + how others can propose additions, corrections, or expansions. slot_uri: d4d:extensionMechanism range: ExtensionMechanism inlined: true @@ -455,11 +682,20 @@ classes: - schema:variableMeasured # Other attributes is_deidentified: + description: >- + De-identification status and procedures applied to the dataset. + Deidentification object describing whether the dataset contains + personal data, what de-identification methods were applied, and + any residual re-identification risks. slot_uri: d4d:isDeidentified range: Deidentification inlined: true is_tabular: - slot_uri: schema:encodingFormat + description: >- + Whether the dataset is in tabular format (rows and columns). + True if the data is structured as a table (e.g., CSV, TSV, relational + database); false for unstructured formats such as images or free text. 
+ slot_uri: d4d:isTabular range: boolean # Dataset citation and relationships citation: @@ -526,13 +762,15 @@ slots: # Additional main schema specific slots same_as: description: >- - URL of a reference web resource that is the same as this dataset. + One or more URLs or URIs identifying equivalent or related representations of this dataset. Used to link to canonical or alternative representations of the same dataset on different platforms (e.g., DOI resolver, institutional repository, data catalog). singular_name: same_as multivalued: true range: uriorcurie slot_uri: schema:sameAs + annotations: + "d4d:docExample": "doi:10.XXXXX/example-dataset" exact_mappings: - schema:sameAs diff --git a/src/data_sheets_schema/schema/data_sheets_schema_all.yaml b/src/data_sheets_schema/schema/data_sheets_schema_all.yaml index e864172e..8737139a 100644 --- a/src/data_sheets_schema/schema/data_sheets_schema_all.yaml +++ b/src/data_sheets_schema/schema/data_sheets_schema_all.yaml @@ -1,4 +1,3 @@ ---- name: data-sheets-schema description: A LinkML schema for Datasheets for Datasets. title: data-sheets-schema @@ -392,176 +391,293 @@ types: enums: FormatEnum: name: FormatEnum + description: Common file format extensions for data files and documents. from_schema: https://w3id.org/bridge2ai/data-sheets-schema permissible_values: CSV: text: CSV + description: Comma-Separated Values - tabular data format. TSV: text: TSV + description: Tab-Separated Values - tabular data format with tab delimiters. XML: text: XML + description: Extensible Markup Language - structured markup format. JSON: text: JSON + description: JavaScript Object Notation - structured data interchange format. JSONL: text: JSONL + description: JSON Lines - newline-delimited JSON format. YAML: text: YAML + description: YAML Ain't Markup Language - human-readable data serialization + format. HTML: text: HTML + description: HyperText Markup Language - web page markup format. 
PDF: text: PDF + description: Portable Document Format - fixed-layout document format. DOCX: text: DOCX + description: Microsoft Word Open XML Document - word processing document. XLSX: text: XLSX + description: Microsoft Excel Open XML Spreadsheet - spreadsheet format. PPTX: text: PPTX + description: Microsoft PowerPoint Open XML Presentation - presentation format. TXT: text: TXT + description: Plain text file. MD: text: MD + description: Markdown - lightweight markup language. ZIP: text: ZIP + description: ZIP archive - compressed file container. TAR: text: TAR + description: Tape Archive - file archive format. GZ: text: GZ + description: Gzip compressed file. BZ2: text: BZ2 + description: Bzip2 compressed file. XZ: text: XZ + description: XZ compressed file. MediaTypeEnum: name: MediaTypeEnum + description: MIME media types (Internet Media Types) for file content identification. from_schema: https://w3id.org/bridge2ai/data-sheets-schema permissible_values: text/csv: text: text/csv + description: MIME type for CSV (Comma-Separated Values) files. text/tab-separated-values: text: text/tab-separated-values + description: MIME type for TSV (Tab-Separated Values) files. application/json: text: application/json + description: MIME type for JSON (JavaScript Object Notation) files. application/xml: text: application/xml + description: MIME type for XML (Extensible Markup Language) files. text/xml: text: text/xml + description: Alternative MIME type for XML files (text variant). application/yaml: text: application/yaml + description: MIME type for YAML files. text/yaml: text: text/yaml + description: Alternative MIME type for YAML files (text variant). text/html: text: text/html + description: MIME type for HTML (HyperText Markup Language) files. application/pdf: text: application/pdf + description: MIME type for PDF (Portable Document Format) files. 
application/vnd.openxmlformats-officedocument.wordprocessingml.document: text: application/vnd.openxmlformats-officedocument.wordprocessingml.document + description: MIME type for Microsoft Word DOCX files. application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: text: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet + description: MIME type for Microsoft Excel XLSX files. application/vnd.openxmlformats-officedocument.presentationml.presentation: text: application/vnd.openxmlformats-officedocument.presentationml.presentation + description: MIME type for Microsoft PowerPoint PPTX files. text/plain: text: text/plain + description: MIME type for plain text files. text/markdown: text: text/markdown + description: MIME type for Markdown files. application/zip: text: application/zip + description: MIME type for ZIP archive files. application/x-tar: text: application/x-tar + description: MIME type for TAR archive files. application/gzip: text: application/gzip + description: MIME type for Gzip compressed files. application/x-bzip2: text: application/x-bzip2 + description: MIME type for Bzip2 compressed files. application/x-xz: text: application/x-xz + description: MIME type for XZ compressed files. CompressionEnum: name: CompressionEnum + description: Compression algorithms and formats for file compression. from_schema: https://w3id.org/bridge2ai/data-sheets-schema permissible_values: gzip: text: gzip + description: GNU zip compression (commonly used with .gz extension). bzip2: text: bzip2 + description: Burrows-Wheeler block-sorting compression (commonly used with + .bz2 extension). zip: text: zip + description: ZIP archive compression format. tar: text: tar + description: Tape Archive format (typically combined with gzip or bzip2). xz: text: xz + description: XZ Utils compression using LZMA2 algorithm. lzma: text: lzma + description: Lempel-Ziv-Markov chain algorithm compression. 
compress: text: compress + description: Unix compress utility (LZW compression). EncodingEnum: name: EncodingEnum + description: Character encoding schemes for text representation in different languages + and scripts. from_schema: https://w3id.org/bridge2ai/data-sheets-schema permissible_values: ASCII: text: ASCII + description: American Standard Code for Information Interchange (7-bit, English + characters only). Big5: text: Big5 + description: Traditional Chinese character encoding (primarily Taiwan and + Hong Kong). EUC-JP: text: EUC-JP + description: Extended Unix Code for Japanese. EUC-KR: text: EUC-KR + description: Extended Unix Code for Korean. EUC-TW: text: EUC-TW + description: Extended Unix Code for Traditional Chinese. GB2312: text: GB2312 + description: Simplified Chinese character encoding standard. HZ-GB-2312: text: HZ-GB-2312 + description: 7-bit encoding for Simplified Chinese (GB2312). ISO-2022-CN-EXT: text: ISO-2022-CN-EXT + description: Extended ISO-2022 encoding for Chinese (includes both Simplified + and Traditional). ISO-2022-CN: text: ISO-2022-CN + description: ISO-2022 encoding for Chinese. ISO-2022-JP-2: text: ISO-2022-JP-2 + description: Extended ISO-2022 encoding for Japanese (includes additional + character sets). ISO-2022-JP: text: ISO-2022-JP + description: ISO-2022 encoding for Japanese. ISO-2022-KR: text: ISO-2022-KR + description: ISO-2022 encoding for Korean. ISO-8859-10: text: ISO-8859-10 + description: Latin-6 (Nordic languages - Danish, Norwegian, Swedish, Icelandic). ISO-8859-11: text: ISO-8859-11 + description: Latin/Thai encoding. ISO-8859-13: text: ISO-8859-13 + description: Latin-7 (Baltic Rim languages). ISO-8859-14: text: ISO-8859-14 + description: Latin-8 (Celtic languages). ISO-8859-15: text: ISO-8859-15 + description: Latin-9 (Western European with Euro sign). ISO-8859-16: text: ISO-8859-16 + description: Latin-10 (South-Eastern European languages). 
ISO-8859-1: text: ISO-8859-1 + description: Latin-1 (Western European languages). ISO-8859-2: text: ISO-8859-2 + description: Latin-2 (Central European languages). ISO-8859-3: text: ISO-8859-3 + description: Latin-3 (South European languages - Turkish, Maltese, Esperanto). ISO-8859-4: text: ISO-8859-4 + description: Latin-4 (North European languages). ISO-8859-5: text: ISO-8859-5 + description: Latin/Cyrillic encoding. ISO-8859-6: text: ISO-8859-6 + description: Latin/Arabic encoding. ISO-8859-7: text: ISO-8859-7 + description: Latin/Greek encoding. ISO-8859-8: text: ISO-8859-8 + description: Latin/Hebrew encoding. ISO-8859-9: text: ISO-8859-9 + description: Latin-5 (Turkish). KOI8-R: text: KOI8-R + description: Russian character encoding (Kod Obmena Informatsiey). KOI8-U: text: KOI8-U + description: Ukrainian character encoding. Shift_JIS: text: Shift_JIS + description: Japanese character encoding (Microsoft and other systems). UTF-16: text: UTF-16 + description: Unicode Transformation Format 16-bit (variable-width encoding). UTF-32: text: UTF-32 + description: Unicode Transformation Format 32-bit (fixed-width encoding). UTF-7: text: UTF-7 + description: Unicode Transformation Format 7-bit (for 7-bit channels). UTF-8: text: UTF-8 + description: Unicode Transformation Format 8-bit (variable-width, most common + Unicode encoding). + Windows-1250: + text: Windows-1250 + description: Windows code page for Central European languages. + Windows-1251: + text: Windows-1251 + description: Windows code page for Cyrillic script. + Windows-1252: + text: Windows-1252 + description: Windows code page for Western European languages. + Windows-1253: + text: Windows-1253 + description: Windows code page for Greek. + Windows-1254: + text: Windows-1254 + description: Windows code page for Turkish. + Windows-1255: + text: Windows-1255 + description: Windows code page for Hebrew. + Windows-1256: + text: Windows-1256 + description: Windows code page for Arabic. 
+ Windows-1257: + text: Windows-1257 + description: Windows code page for Baltic languages. + Windows-1258: + text: Windows-1258 + description: Windows code page for Vietnamese. CRediTRoleEnum: name: CRediTRoleEnum description: Contributor roles based on the CRediT (Contributor Roles Taxonomy). @@ -571,22 +687,22 @@ enums: conceptualization: text: conceptualization description: Ideas; formulation or evolution of overarching research goals - and aims + and aims. methodology: text: methodology - description: Development or design of methodology; creation of models + description: Development or design of methodology; creation of models. software: text: software - description: Programming, software development; designing computer programs + description: Programming, software development; designing computer programs. validation: text: validation - description: Verification of the overall replication/reproducibility of results + description: Verification of the overall replication/reproducibility of results. formal_analysis: text: formal_analysis - description: Application of statistical, mathematical, or other formal techniques + description: Application of statistical, mathematical, or other formal techniques. investigation: text: investigation - description: Conducting the research and investigation process + description: Conducting the research and investigation process. resources: text: resources description: Provision of study materials, reagents, patients, laboratory @@ -594,26 +710,26 @@ enums: data_curation: text: data_curation description: Management activities to annotate, scrub data and maintain research - data + data. writing_original_draft: text: writing_original_draft - description: Preparation, creation and/or presentation of the published work + description: Preparation, creation and/or presentation of the published work. 
writing_review_editing: text: writing_review_editing - description: Critical review, commentary or revision of the work + description: Critical review, commentary or revision of the work. visualization: text: visualization description: Preparation, creation and/or presentation of visualizations/data - presentation + presentation. supervision: text: supervision - description: Oversight and leadership responsibility for the research activity + description: Oversight and leadership responsibility for the research activity. project_administration: text: project_administration - description: Management and coordination responsibility for the research activity + description: Management and coordination responsibility for the research activity. funding_acquisition: text: funding_acquisition - description: Acquisition of the financial support for the project + description: Acquisition of the financial support for the project. BiasTypeEnum: name: BiasTypeEnum description: Types of bias that may be present in datasets. Values are mapped @@ -715,31 +831,13 @@ enums: permissible_values: MAJOR: text: MAJOR - description: Incompatible changes, breaking backward compatibility + description: Incompatible changes, breaking backward compatibility. MINOR: text: MINOR - description: Backward-compatible new functionality or enhancements + description: Backward-compatible new functionality or enhancements. PATCH: text: PATCH - description: Backward-compatible bug fixes or minor corrections - Windows-1250: - text: Windows-1250 - Windows-1251: - text: Windows-1251 - Windows-1252: - text: Windows-1252 - Windows-1253: - text: Windows-1253 - Windows-1254: - text: Windows-1254 - Windows-1255: - text: Windows-1255 - Windows-1256: - text: Windows-1256 - Windows-1257: - text: Windows-1257 - Windows-1258: - text: Windows-1258 + description: Backward-compatible bug fixes or minor corrections. 
CreatorOrMaintainerEnum: name: CreatorOrMaintainerEnum description: Types of agents (persons or organizations) involved in dataset creation @@ -793,16 +891,20 @@ enums: description: Other type of creator or maintainer not listed. Boolean: name: Boolean + description: Three-valued boolean logic supporting true, false, and unknown states. from_schema: https://w3id.org/bridge2ai/data-sheets-schema permissible_values: 'true': text: 'true' + description: Affirmative or positive value. title: 'True' 'false': text: 'false' + description: Negative or false value. title: 'False' unknown: text: unknown + description: Unknown, uncertain, or not applicable value. title: Unknown DatasetRelationshipTypeEnum: name: DatasetRelationshipTypeEnum @@ -965,92 +1067,92 @@ enums: permissible_values: no_restriction: text: no_restriction - description: No restriction on data use + description: No restriction on data use. meaning: DUO:0000004 general_research_use: text: general_research_use - description: Data available for any research purpose (GRU) + description: Data available for any research purpose (GRU). meaning: DUO:0000042 health_medical_biomedical_research: text: health_medical_biomedical_research - description: Data limited to health, medical, or biomedical research (HMB) + description: Data limited to health, medical, or biomedical research (HMB). meaning: DUO:0000006 disease_specific_research: text: disease_specific_research - description: Data limited to research on specified disease(s) (DS) + description: Data limited to research on specified disease(s) (DS). meaning: DUO:0000007 population_origins_ancestry_research: text: population_origins_ancestry_research - description: Data limited to population origins or ancestry research (POA) + description: Data limited to population origins or ancestry research (POA). 
meaning: DUO:0000011 clinical_care_use: text: clinical_care_use - description: Data available for clinical care and applications (CC) + description: Data available for clinical care and applications (CC). meaning: DUO:0000043 no_commercial_use: text: no_commercial_use - description: Data use limited to non-commercial purposes (NCU) + description: Data use limited to non-commercial purposes (NCU). meaning: DUO:0000046 non_profit_use_only: text: non_profit_use_only - description: Data use limited to not-for-profit organizations (NPU) + description: Data use limited to not-for-profit organizations (NPU). meaning: DUO:0000045 non_profit_use_and_non_commercial_use: text: non_profit_use_and_non_commercial_use description: Data limited to not-for-profit organizations and non-commercial - use (NPUNCU) + use (NPUNCU). meaning: DUO:0000018 no_methods_development: text: no_methods_development - description: Data cannot be used for methods or software development (NMDS) + description: Data cannot be used for methods or software development (NMDS). meaning: DUO:0000015 genetic_studies_only: text: genetic_studies_only - description: Data limited to genetic studies only (GSO) + description: Data limited to genetic studies only (GSO). meaning: DUO:0000016 ethics_approval_required: text: ethics_approval_required - description: Ethics approval (e.g., IRB/ERB) required for data use (IRB) + description: Ethics approval (e.g., IRB/ERB) required for data use (IRB). meaning: DUO:0000021 collaboration_required: text: collaboration_required - description: Collaboration with primary investigator required (COL) + description: Collaboration with primary investigator required (COL). meaning: DUO:0000020 publication_required: text: publication_required - description: Results must be published/shared with research community (PUB) + description: Results must be published/shared with research community (PUB). 
meaning: DUO:0000019 geographic_restriction: text: geographic_restriction - description: Data use limited to specific geographic region (GS) + description: Data use limited to specific geographic region (GS). meaning: DUO:0000022 institution_specific: text: institution_specific - description: Data use limited to approved institutions (IS) + description: Data use limited to approved institutions (IS). meaning: DUO:0000028 project_specific: text: project_specific - description: Data use limited to approved project(s) (PS) + description: Data use limited to approved project(s) (PS). meaning: DUO:0000027 user_specific: text: user_specific - description: Data use limited to approved users (US) + description: Data use limited to approved users (US). meaning: DUO:0000026 time_limit: text: time_limit - description: Data use approved for limited time period (TS) + description: Data use approved for limited time period (TS). meaning: DUO:0000025 return_to_database: text: return_to_database - description: Derived data must be returned to database/resource (RTN) + description: Derived data must be returned to database/resource (RTN). meaning: DUO:0000029 publication_moratorium: text: publication_moratorium - description: Publication restricted until specified date (MOR) + description: Publication restricted until specified date (MOR). meaning: DUO:0000024 no_population_ancestry_research: text: no_population_ancestry_research - description: Population/ancestry research prohibited (NPOA) + description: Population/ancestry research prohibited (NPOA). meaning: DUO:0000044 VariableTypeEnum: name: VariableTypeEnum @@ -1135,39 +1237,39 @@ enums: permissible_values: data_file: text: data_file - description: A data file containing dataset content + description: A data file containing dataset content. meaning: schema:DataDownload code_file: text: code_file - description: A source code or script file + description: A source code or script file. 
meaning: schema:SoftwareSourceCode documentation_file: text: documentation_file - description: A documentation file (README, guide, etc.) + description: A documentation file (README, guide, etc.). meaning: schema:Documentation metadata_file: text: metadata_file - description: A metadata or annotation file + description: A metadata or annotation file. meaning: dcat:CatalogRecord configuration_file: text: configuration_file - description: A configuration or settings file + description: A configuration or settings file. meaning: d4d:ConfigurationFile notebook_file: text: notebook_file - description: A computational notebook file (Jupyter, R Markdown, etc.) + description: A computational notebook file (Jupyter, R Markdown, etc.). meaning: d4d:NotebookFile image_file: text: image_file - description: An image or visualization file + description: An image or visualization file. meaning: schema:ImageObject archive_file: text: archive_file - description: An archive or compressed file + description: An archive or compressed file. meaning: d4d:ArchiveFile other: text: other - description: Other file type + description: Other file type. meaning: d4d:OtherFile FileCollectionTypeEnum: name: FileCollectionTypeEnum @@ -1176,50 +1278,51 @@ enums: permissible_values: raw_data: text: raw_data - description: Raw, unprocessed data files + description: Raw, unprocessed data files. meaning: d4d:RawData processed_data: text: processed_data - description: Cleaned, processed, or transformed data files + description: Cleaned, processed, or transformed data files. meaning: d4d:ProcessedData training_split: text: training_split - description: Files designated for model training + description: Files designated for model training. meaning: d4d:TrainingSplit test_split: text: test_split - description: Files designated for model testing + description: Files designated for model testing. 
meaning: d4d:TestSplit validation_split: text: validation_split - description: Files designated for model validation + description: Files designated for model validation. meaning: d4d:ValidationSplit documentation: text: documentation - description: Documentation files (README, codebook, etc.) + description: Documentation files (README, codebook, etc.). meaning: schema:Documentation metadata: text: metadata - description: Metadata or annotation files + description: Metadata or annotation files. meaning: dcat:CatalogRecord code: text: code - description: Code or script files + description: Code or script files. meaning: schema:SoftwareSourceCode supplementary: text: supplementary - description: Supplementary materials + description: Supplementary materials. meaning: schema:SupplementalMaterial other: text: other - description: Other file collection type + description: Other file collection type. meaning: d4d:OtherFileCollection slots: same_as: name: same_as - description: URL of a reference web resource that is the same as this dataset. - Used to link to canonical or alternative representations of the same dataset - on different platforms (e.g., DOI resolver, institutional repository, data catalog). + description: One or more URLs or URIs identifying equivalent or related representations + of this dataset. Used to link to canonical or alternative representations of + the same dataset on different platforms (e.g., DOI resolver, institutional repository, + data catalog). from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - schema:sameAs @@ -1238,7 +1341,7 @@ slots: multivalued: true title: name: title - description: the official title of the element + description: The official title of the element. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:title domain_of: @@ -1249,7 +1352,7 @@ slots: - File language: name: language - description: language in which the information is expressed + description: Language in which the information is expressed. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - schema:inLanguage @@ -1262,6 +1365,7 @@ slots: - File publisher: name: publisher + description: The organization or entity responsible for making the resource available. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:publisher domain_of: @@ -1273,6 +1377,7 @@ slots: range: uriorcurie issued: name: issued + description: Date of formal issuance or publication of the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:issued domain_of: @@ -1284,6 +1389,8 @@ slots: range: datetime page: name: page + description: A landing page or web page providing access to or information about + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:landingPage domain_of: @@ -1310,6 +1417,7 @@ slots: range: integer path: name: path + description: The file path or URL where the content is located. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:contentUrl domain_of: @@ -1342,15 +1450,15 @@ slots: range: FormatEnum encoding: name: encoding - description: the character encoding of the data + description: The character encoding of the data. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcat:mediaType + slot_uri: d4d:characterEncoding domain_of: - File range: EncodingEnum compression: name: compression - description: compression format used, if any. e.g., gzip, bzip2, zip + description: Compression format used, if any (e.g., gzip, bzip2, zip). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:compressFormat domain_of: @@ -1373,27 +1481,34 @@ slots: range: MediaTypeEnum hash: name: hash - description: hash of the data + description: 'Cryptographic hash value of the data for integrity verification + (e.g., SHA-256: ''e3b0c44298fc1c149afb...'', MD5: ''d41d8cd98f00b204e9800998ecf8427e'').' from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:hashValue domain_of: - File md5: name: md5 - description: md5 hash of the data + description: MD5 hash value of the data (128-bit cryptographic hash). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:md5Checksum domain_of: - File sha256: name: sha256 - description: sha256 hash of the data + description: SHA-256 hash value of the data (256-bit cryptographic hash, recommended). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + slot_uri: schema:sha256 domain_of: - File conforms_to: name: conforms_to + description: An established standard, specification, or schema to which the resource + conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:conformsTo domain_of: @@ -1404,8 +1519,11 @@ slots: - File conforms_to_schema: name: conforms_to_schema + description: The schema or data model to which the resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToSchema domain_of: - Information - DatasetCollection @@ -1414,8 +1532,12 @@ slots: - File conforms_to_class: name: conforms_to_class + description: The specific class or type within a schema to which the resource + conforms. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToClass domain_of: - Information - DatasetCollection @@ -1424,6 +1546,8 @@ slots: - File license: name: license + description: The legal license under which the resource is made available (e.g., + "MIT", "CC-BY-4.0"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:license domain_of: @@ -1435,6 +1559,7 @@ slots: - File keywords: name: keywords + description: Keywords or tags describing the resource for discovery and classification. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:keyword domain_of: @@ -1446,8 +1571,9 @@ slots: multivalued: true version: name: version + description: The version identifier of the resource (e.g., "1.0", "2.3.1"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:hasVersion + slot_uri: schema:version domain_of: - Software - Information @@ -1457,6 +1583,8 @@ slots: - File created_by: name: created_by + description: The person or organization primarily responsible for creating the + resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:creator domain_of: @@ -1467,6 +1595,7 @@ slots: - File created_on: name: created_on + description: The date and time when the resource was created. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:created domain_of: @@ -1478,6 +1607,8 @@ slots: range: datetime last_updated_on: name: last_updated_on + description: The date and time when the resource was most recently modified or + updated. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:modified domain_of: @@ -1489,6 +1620,8 @@ slots: range: datetime modified_by: name: modified_by + description: A person or organization that contributed to modifying or updating + the resource. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:contributor domain_of: @@ -1499,8 +1632,9 @@ slots: - File status: name: status + description: The status of the resource (e.g., draft, published, deprecated). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:type + slot_uri: d4d:publicationStatus domain_of: - Information - DatasetCollection @@ -1509,6 +1643,7 @@ slots: - File was_derived_from: name: was_derived_from + description: A resource from which this resource was derived, in whole or in part. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - dcterms:source @@ -1521,9 +1656,14 @@ slots: - File doi: name: doi - description: digital object identifier + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing + persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567'). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + exact_mappings: + - schema:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:doiIdentifier domain_of: - Information - DatasetCollection @@ -1601,7 +1741,7 @@ classes: inlined_as_list: true compression: name: compression - description: compression format used, if any. e.g., gzip, bzip2, zip + description: Compression format used, if any (e.g., gzip, bzip2, zip). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:compressFormat alias: compression @@ -1616,6 +1756,8 @@ classes: range: CompressionEnum conforms_to: name: conforms_to + description: An established standard, specification, or schema to which the + resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:conformsTo alias: conforms_to @@ -1629,8 +1771,12 @@ classes: range: string conforms_to_class: name: conforms_to_class + description: The specific class or type within a schema to which the resource + conforms. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToClass alias: conforms_to_class owner: DatasetCollection domain_of: @@ -1642,8 +1788,11 @@ classes: range: string conforms_to_schema: name: conforms_to_schema + description: The schema or data model to which the resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToSchema alias: conforms_to_schema owner: DatasetCollection domain_of: @@ -1655,6 +1804,8 @@ classes: range: string created_by: name: created_by + description: The person or organization primarily responsible for creating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:creator alias: created_by @@ -1668,6 +1819,7 @@ classes: range: string created_on: name: created_on + description: The date and time when the resource was created. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:created alias: created_on @@ -1681,9 +1833,14 @@ classes: range: datetime doi: name: doi - description: digital object identifier + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing + persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567'). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + exact_mappings: + - schema:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:doiIdentifier alias: doi owner: DatasetCollection domain_of: @@ -1714,6 +1871,7 @@ classes: range: uri issued: name: issued + description: Date of formal issuance or publication of the resource. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:issued alias: issued @@ -1727,6 +1885,7 @@ classes: range: datetime keywords: name: keywords + description: Keywords or tags describing the resource for discovery and classification. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:keyword alias: keywords @@ -1741,7 +1900,7 @@ classes: multivalued: true language: name: language - description: language in which the information is expressed + description: Language in which the information is expressed. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - schema:inLanguage @@ -1757,6 +1916,8 @@ classes: range: string last_updated_on: name: last_updated_on + description: The date and time when the resource was most recently modified + or updated. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:modified alias: last_updated_on @@ -1770,6 +1931,8 @@ classes: range: datetime license: name: license + description: The legal license under which the resource is made available + (e.g., "MIT", "CC-BY-4.0"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:license alias: license @@ -1784,6 +1947,8 @@ classes: range: string modified_by: name: modified_by + description: A person or organization that contributed to modifying or updating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:contributor alias: modified_by @@ -1797,6 +1962,8 @@ classes: range: string page: name: page + description: A landing page or web page providing access to or information + about the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:landingPage alias: page @@ -1810,6 +1977,8 @@ classes: range: string publisher: name: publisher + description: The organization or entity responsible for making the resource + available. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:publisher alias: publisher @@ -1823,8 +1992,9 @@ classes: range: uriorcurie status: name: status + description: The status of the resource (e.g., draft, published, deprecated). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:type + slot_uri: d4d:publicationStatus alias: status owner: DatasetCollection domain_of: @@ -1836,7 +2006,7 @@ classes: range: string title: name: title - description: the official title of the element + description: The official title of the element. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:title alias: title @@ -1850,8 +2020,9 @@ classes: range: string version: name: version + description: The version identifier of the resource (e.g., "1.0", "2.3.1"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:hasVersion + slot_uri: schema:version alias: version owner: DatasetCollection domain_of: @@ -1864,6 +2035,8 @@ classes: range: string was_derived_from: name: was_derived_from + description: A resource from which this resource was derived, in whole or + in part. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - dcterms:source @@ -1880,6 +2053,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -2188,6 +2363,8 @@ classes: name: total_file_count description: Total number of files across all file collections in this dataset. Can be aggregated from file_collections[].file_count. + examples: + - value: '156' from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:totalFileCount alias: total_file_count @@ -2199,6 +2376,9 @@ classes: name: total_size_bytes description: Total size of all files in bytes across all file collections. 
Can be aggregated from file_collections[].total_bytes. + examples: + - value: '10737418240' + description: 10 GiB (10 × 1024³ bytes) from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:byteSize alias: total_size_bytes @@ -2208,6 +2388,9 @@ classes: range: integer purposes: name: purposes + description: Purposes for which the dataset was created. List of Purpose objects + from the Motivation module, each describing a specific creation goal or + intended application. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:purposes alias: purposes @@ -2220,6 +2403,9 @@ classes: inlined_as_list: true tasks: name: tasks + description: Tasks the dataset is intended to support. List of Task objects + from the Motivation module describing specific machine learning, research, + or analytical tasks. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:tasks alias: tasks @@ -2232,6 +2418,9 @@ classes: inlined_as_list: true addressing_gaps: name: addressing_gaps + description: Research or practical gaps this dataset addresses. List of AddressingGap + objects from the Motivation module, each describing a gap in existing datasets + or knowledge that this dataset fills. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:addressingGaps alias: addressing_gaps @@ -2244,6 +2433,9 @@ classes: inlined_as_list: true creators: name: creators + description: Individuals or organizations who created the dataset. List of + Creator objects describing authorship, roles, and affiliations of dataset + creators. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:creator alias: creators @@ -2256,6 +2448,9 @@ classes: inlined_as_list: true funders: name: funders + description: Funding mechanisms that supported dataset creation. List of FundingMechanism + objects describing grants, contracts, or other funding sources including + grantors and grant identifiers. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:funder alias: funders @@ -2268,10 +2463,11 @@ classes: inlined_as_list: true subsets: name: subsets + description: Subsets or splits of this dataset. List of DataSubset objects + from the Composition module, each representing a logical partition such + as training, validation, or test splits, or demographic subgroups. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - exact_mappings: - - schema:distribution - slot_uri: dcat:distribution + slot_uri: d4d:dataSubset alias: subsets owner: Dataset domain_of: @@ -2282,6 +2478,9 @@ classes: inlined_as_list: true instances: name: instances + description: Individual data instances or records in the dataset. List of + Instance objects from the Composition module describing what each data point + represents, its type, and associated label information. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:instances alias: instances @@ -2294,6 +2493,9 @@ classes: inlined_as_list: true anomalies: name: anomalies + description: Known data quality issues, errors, or irregularities in the dataset. + List of DataAnomaly objects from the Composition module, each documenting + a specific anomaly and its potential impact. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:anomalies alias: anomalies @@ -2335,6 +2537,9 @@ classes: inlined_as_list: true confidential_elements: name: confidential_elements + description: Confidential or restricted information within the dataset that + requires access controls. List of Confidentiality objects describing what + is confidential and why it cannot be released. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:confidentialElements alias: confidential_elements @@ -2347,6 +2552,9 @@ classes: inlined_as_list: true content_warnings: name: content_warnings + description: Content warnings for potentially harmful, offensive, or disturbing + material in the dataset. List of ContentWarning objects alerting users to + sensitive content categories. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:contentWarnings alias: content_warnings @@ -2359,6 +2567,9 @@ classes: inlined_as_list: true subpopulations: name: subpopulations + description: Subpopulations represented within the dataset. List of Subpopulation + objects from the Composition module describing demographic or other groups, + their representation, and any imbalances. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:subpopulations alias: subpopulations @@ -2371,6 +2582,10 @@ classes: inlined_as_list: true sensitive_elements: name: sensitive_elements + description: Sensitive data elements requiring special handling or access + controls. List of SensitiveElement objects identifying sensitive attributes + such as personal identifiers, protected health information, or legally sensitive + content. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:sensitiveElements alias: sensitive_elements @@ -2381,8 +2596,41 @@ classes: multivalued: true inlined: true inlined_as_list: true + relationships: + name: relationships + description: Explicit relationships between individual instances in the dataset. + List of Relationships objects from the Composition module describing how + instances relate (e.g., graph edges, ratings, social network links). 
+ from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:relationships + alias: relationships + owner: Dataset + domain_of: + - Dataset + range: Relationships + multivalued: true + inlined: true + inlined_as_list: true + splits: + name: splits + description: Recommended data splits for this dataset. List of Splits objects + from the Composition module describing train/validation/test partitions + and the rationale for each split strategy. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:splits + alias: splits + owner: Dataset + domain_of: + - Dataset + range: Splits + multivalued: true + inlined: true + inlined_as_list: true acquisition_methods: name: acquisition_methods + description: Methods used to acquire or obtain dataset instances. List of + InstanceAcquisition objects from the Collection module describing how data + was sourced, whether directly observed or derived. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:acquisitionMethods alias: acquisition_methods @@ -2395,6 +2643,9 @@ classes: inlined_as_list: true collection_mechanisms: name: collection_mechanisms + description: Mechanisms, instruments, or tools used for data collection. List + of CollectionMechanism objects from the Collection module describing sensors, + surveys, APIs, or other collection instruments. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:collectionMechanisms alias: collection_mechanisms @@ -2407,6 +2658,9 @@ classes: inlined_as_list: true sampling_strategies: name: sampling_strategies + description: Strategies used to select data instances from a larger population. + List of SamplingStrategy objects from the Collection module describing sampling + methodology, inclusion criteria, and limitations. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:samplingStrategies alias: sampling_strategies @@ -2420,6 +2674,9 @@ classes: inlined_as_list: true data_collectors: name: data_collectors + description: Individuals or organizations responsible for collecting the data. + List of DataCollector objects from the Collection module describing who + performed data collection and their roles. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:dataCollectors alias: data_collectors @@ -2432,6 +2689,9 @@ classes: inlined_as_list: true collection_timeframes: name: collection_timeframes + description: Time periods during which data was collected. List of CollectionTimeframe + objects from the Collection module describing collection start and end dates, + and any gaps in the collection period. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:collectionTimeframes alias: collection_timeframes @@ -2442,6 +2702,66 @@ classes: multivalued: true inlined: true inlined_as_list: true + direct_collection: + name: direct_collection + description: Whether data was collected directly from individuals or via third + parties. List of DirectCollection objects from the Collection module describing + direct vs. indirect collection methods and sources. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:directCollection + alias: direct_collection + owner: Dataset + domain_of: + - Dataset + range: DirectCollection + multivalued: true + inlined: true + inlined_as_list: true + collection_notifications: + name: collection_notifications + description: Notifications provided to individuals about data collection. + List of CollectionNotification objects from the Ethics module describing + how and when individuals were informed about the data collection. 
+ from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:collectionNotifications + alias: collection_notifications + owner: Dataset + domain_of: + - Dataset + range: CollectionNotification + multivalued: true + inlined: true + inlined_as_list: true + collection_consents: + name: collection_consents + description: Consent obtained from individuals for data collection and use. + List of CollectionConsent objects from the Ethics module describing how + consent was requested, provided, and documented. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:collectionConsents + alias: collection_consents + owner: Dataset + domain_of: + - Dataset + range: CollectionConsent + multivalued: true + inlined: true + inlined_as_list: true + consent_revocations: + name: consent_revocations + description: Mechanisms for individuals to revoke previously given consent. + List of ConsentRevocation objects from the Ethics module describing how + revocation works and what happens to data after revocation. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:consentRevocations + alias: consent_revocations + owner: Dataset + domain_of: + - Dataset + range: ConsentRevocation + multivalued: true + inlined: true + inlined_as_list: true missing_data_documentation: name: missing_data_documentation description: Documentation of missing data patterns and handling strategies. @@ -2457,7 +2777,8 @@ classes: inlined_as_list: true raw_data_sources: name: raw_data_sources - description: Description of raw data sources before preprocessing. + description: List of raw data sources before preprocessing. Each RawDataSource + object describes where the original data came from and how it can be accessed. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:rawDataSources alias: raw_data_sources @@ -2470,6 +2791,9 @@ classes: inlined_as_list: true ethical_reviews: name: ethical_reviews + description: Ethical reviews and institutional oversight for the dataset. + List of EthicalReview objects from the Ethics module describing IRB approvals, + ethics committee reviews, and compliance certifications. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:ethicalReviews alias: ethical_reviews @@ -2482,6 +2806,9 @@ classes: inlined_as_list: true data_protection_impacts: name: data_protection_impacts + description: Data protection impact assessments (DPIAs) conducted for the + dataset. List of DataProtectionImpact objects from the Ethics module documenting + privacy risk assessments and mitigation measures. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:dataProtectionImpacts alias: data_protection_impacts @@ -2561,6 +2888,9 @@ classes: inlined_as_list: true preprocessing_strategies: name: preprocessing_strategies + description: Preprocessing steps applied to the raw data. List of PreprocessingStrategy + objects from the Preprocessing module describing normalization, transformation, + and other preparation steps. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:preprocessingStrategies alias: preprocessing_strategies @@ -2573,6 +2903,9 @@ classes: inlined_as_list: true cleaning_strategies: name: cleaning_strategies + description: Data cleaning and quality control procedures applied to the dataset. + List of CleaningStrategy objects from the Preprocessing module describing + outlier removal, deduplication, and error correction steps. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:cleaningStrategies alias: cleaning_strategies @@ -2585,6 +2918,9 @@ classes: inlined_as_list: true labeling_strategies: name: labeling_strategies + description: Labeling or annotation methodologies applied to the data. List + of LabelingStrategy objects from the Preprocessing module describing annotation + procedures, annotator qualifications, and quality controls. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:labelingStrategies alias: labeling_strategies @@ -2597,6 +2933,9 @@ classes: inlined_as_list: true raw_sources: name: raw_sources + description: Raw, unprocessed source data before any preprocessing was applied. + List of RawData objects from the Preprocessing module describing original + data sources and their formats. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:rawSources alias: raw_sources @@ -2609,7 +2948,9 @@ classes: inlined_as_list: true imputation_protocols: name: imputation_protocols - description: Data imputation methodology and techniques. + description: Data imputation protocols applied to handle missing values. List + of ImputationProtocol objects from the Preprocessing module describing the + imputation technique, affected variables, and rationale. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:imputation_protocols alias: imputation_protocols @@ -2647,6 +2988,9 @@ classes: inlined_as_list: true existing_uses: name: existing_uses + description: Known existing uses of the dataset at the time of publication. + List of ExistingUse objects from the Uses module describing research, commercial, + or other applications of the dataset. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:existingUses alias: existing_uses @@ -2659,6 +3003,9 @@ classes: inlined_as_list: true use_repository: name: use_repository + description: Repositories or registries tracking how the dataset has been + used. List of UseRepository objects from the Uses module pointing to papers + with code, citation indices, or other use-tracking resources. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:useRepository alias: use_repository @@ -2671,6 +3018,9 @@ classes: inlined_as_list: true other_tasks: name: other_tasks + description: Additional tasks the dataset may support beyond its original + intent. List of OtherTask objects from the Uses module describing potential + applications not originally planned by the dataset creators. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:otherTasks alias: other_tasks @@ -2683,6 +3033,9 @@ classes: inlined_as_list: true future_use_impacts: name: future_use_impacts + description: Anticipated impacts of future uses, including risks and benefits. + List of FutureUseImpact objects from the Uses module describing foreseeable + consequences of using this dataset in new applications. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:futureUseImpacts alias: future_use_impacts @@ -2695,6 +3048,9 @@ classes: inlined_as_list: true discouraged_uses: name: discouraged_uses + description: Uses that are not recommended for this dataset due to limitations, + risks, or ethical concerns. List of DiscouragedUse objects from the Uses + module explaining why certain applications should be avoided. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:discouragedUses alias: discouraged_uses @@ -2735,6 +3091,9 @@ classes: inlined_as_list: true distribution_formats: name: distribution_formats + description: Formats in which the dataset is distributed or made available. 
+ List of DistributionFormat objects from the Distribution module describing + file formats, compression, and access methods. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:distributionFormats alias: distribution_formats @@ -2747,6 +3106,9 @@ classes: inlined_as_list: true distribution_dates: name: distribution_dates + description: Dates when the dataset was or will be distributed or released. + List of DistributionDate objects from the Distribution module describing + initial release dates, version release dates, and planned future releases. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:distributionDates alias: distribution_dates @@ -2757,8 +3119,26 @@ classes: multivalued: true inlined: true inlined_as_list: true + third_party_sharing: + name: third_party_sharing + description: Third-party distribution policies for the dataset. List of ThirdPartySharing + objects from the Distribution module describing whether and how the dataset + is shared with entities outside the creating organization. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:thirdPartySharing + alias: third_party_sharing + owner: Dataset + domain_of: + - Dataset + range: ThirdPartySharing + multivalued: true + inlined: true + inlined_as_list: true license_and_use_terms: name: license_and_use_terms + description: License and usage terms governing dataset access and use. LicenseAndUseTerms + object from the Data Governance module describing the applicable license, + permitted uses, and any restrictions. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:license alias: license_and_use_terms @@ -2769,6 +3149,9 @@ classes: inlined: true ip_restrictions: name: ip_restrictions + description: Intellectual property restrictions on dataset use or redistribution. + IPRestrictions object from the Data Governance module describing copyright, + trademark, or other IP considerations. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:ipRestrictions alias: ip_restrictions @@ -2779,6 +3162,9 @@ classes: inlined: true regulatory_restrictions: name: regulatory_restrictions + description: Regulatory and export control restrictions applicable to the + dataset. ExportControlRegulatoryRestrictions object from the Data Governance + module describing compliance requirements such as ITAR, EAR, or GDPR. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:regulatoryRestrictions alias: regulatory_restrictions @@ -2790,6 +3176,9 @@ classes: inlined: true maintainers: name: maintainers + description: Individuals or organizations responsible for maintaining the + dataset. List of Maintainer objects from the Maintenance module describing + maintenance contacts, roles, and support channels. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:maintainers alias: maintainers @@ -2802,6 +3191,9 @@ classes: inlined_as_list: true errata: name: errata + description: Known errors or corrections to the dataset since publication. + List of Erratum objects from the Maintenance module describing discovered + errors, affected records, and correction procedures. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:errata alias: errata @@ -2814,6 +3206,9 @@ classes: inlined_as_list: true updates: name: updates + description: Plans for future updates or versioning of the dataset. UpdatePlan + object from the Maintenance module describing update frequency, versioning + policy, and planned enhancements. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:updates alias: updates @@ -2824,6 +3219,9 @@ classes: inlined: true retention_limit: name: retention_limit + description: Data retention policies and limits for the dataset. RetentionLimits + object from the Maintenance module describing how long the dataset will + be available and any deletion schedules. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:retentionLimit alias: retention_limit @@ -2834,8 +3232,11 @@ classes: inlined: true version_access: name: version_access + description: Information about access to different versions of the dataset. + VersionAccess object from the Maintenance module describing where older + versions can be found and how version history is maintained. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcat:accessURL + slot_uri: d4d:versionAccess alias: version_access owner: Dataset domain_of: @@ -2844,6 +3245,9 @@ classes: inlined: true extension_mechanism: name: extension_mechanism + description: Mechanisms for extending or contributing to the dataset. ExtensionMechanism + object from the Maintenance module describing how others can propose additions, + corrections, or expansions. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:extensionMechanism alias: extension_mechanism @@ -2870,6 +3274,10 @@ classes: inlined_as_list: true is_deidentified: name: is_deidentified + description: De-identification status and procedures applied to the dataset. + Deidentification object describing whether the dataset contains personal + data, what de-identification methods were applied, and any residual re-identification + risks. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:isDeidentified alias: is_deidentified @@ -2880,8 +3288,11 @@ classes: inlined: true is_tabular: name: is_tabular + description: Whether the dataset is in tabular format (rows and columns). + True if the data is structured as a table (e.g., CSV, TSV, relational database); + false for unstructured formats such as images or free text. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: schema:encodingFormat + slot_uri: d4d:isTabular alias: is_tabular owner: Dataset domain_of: @@ -2968,7 +3379,7 @@ classes: inlined_as_list: true compression: name: compression - description: compression format used, if any. e.g., gzip, bzip2, zip + description: Compression format used, if any (e.g., gzip, bzip2, zip). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:compressFormat alias: compression @@ -2983,6 +3394,8 @@ classes: range: CompressionEnum conforms_to: name: conforms_to + description: An established standard, specification, or schema to which the + resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:conformsTo alias: conforms_to @@ -2996,8 +3409,12 @@ classes: range: string conforms_to_class: name: conforms_to_class + description: The specific class or type within a schema to which the resource + conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToClass alias: conforms_to_class owner: Dataset domain_of: @@ -3009,8 +3426,11 @@ classes: range: string conforms_to_schema: name: conforms_to_schema + description: The schema or data model to which the resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToSchema alias: conforms_to_schema owner: Dataset domain_of: @@ -3022,6 +3442,8 @@ classes: range: string created_by: name: created_by + description: The person or organization primarily responsible for creating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:creator alias: created_by @@ -3035,6 +3457,7 @@ classes: range: string created_on: name: created_on + description: The date and time when the resource was created. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:created alias: created_on @@ -3048,9 +3471,14 @@ classes: range: datetime doi: name: doi - description: digital object identifier + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing + persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567'). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + exact_mappings: + - schema:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:doiIdentifier alias: doi owner: Dataset domain_of: @@ -3081,6 +3509,7 @@ classes: range: uri issued: name: issued + description: Date of formal issuance or publication of the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:issued alias: issued @@ -3094,6 +3523,7 @@ classes: range: datetime keywords: name: keywords + description: Keywords or tags describing the resource for discovery and classification. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:keyword alias: keywords @@ -3108,7 +3538,7 @@ classes: multivalued: true language: name: language - description: language in which the information is expressed + description: Language in which the information is expressed. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - schema:inLanguage @@ -3124,6 +3554,8 @@ classes: range: string last_updated_on: name: last_updated_on + description: The date and time when the resource was most recently modified + or updated. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:modified alias: last_updated_on @@ -3137,6 +3569,8 @@ classes: range: datetime license: name: license + description: The legal license under which the resource is made available + (e.g., "MIT", "CC-BY-4.0"). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:license alias: license @@ -3151,6 +3585,8 @@ classes: range: string modified_by: name: modified_by + description: A person or organization that contributed to modifying or updating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:contributor alias: modified_by @@ -3164,6 +3600,8 @@ classes: range: string page: name: page + description: A landing page or web page providing access to or information + about the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:landingPage alias: page @@ -3177,6 +3615,8 @@ classes: range: string publisher: name: publisher + description: The organization or entity responsible for making the resource + available. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:publisher alias: publisher @@ -3190,8 +3630,9 @@ classes: range: uriorcurie status: name: status + description: The status of the resource (e.g., draft, published, deprecated). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:type + slot_uri: d4d:publicationStatus alias: status owner: Dataset domain_of: @@ -3203,7 +3644,7 @@ classes: range: string title: name: title - description: the official title of the element + description: The official title of the element. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:title alias: title @@ -3217,8 +3658,9 @@ classes: range: string version: name: version + description: The version identifier of the resource (e.g., "1.0", "2.3.1"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:hasVersion + slot_uri: schema:version alias: version owner: Dataset domain_of: @@ -3231,6 +3673,8 @@ classes: range: string was_derived_from: name: was_derived_from + description: A resource from which this resource was derived, in whole or + in part. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - dcterms:source @@ -3247,6 +3691,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -3583,6 +4029,8 @@ classes: name: total_file_count description: Total number of files across all file collections in this dataset. Can be aggregated from file_collections[].file_count. + examples: + - value: '156' from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:totalFileCount alias: total_file_count @@ -3594,6 +4042,9 @@ classes: name: total_size_bytes description: Total size of all files in bytes across all file collections. Can be aggregated from file_collections[].total_bytes. + examples: + - value: '10737418240' + description: 10 GiB (10 × 1024³ bytes) from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:byteSize alias: total_size_bytes @@ -3603,6 +4054,9 @@ classes: range: integer purposes: name: purposes + description: Purposes for which the dataset was created. List of Purpose objects + from the Motivation module, each describing a specific creation goal or + intended application. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:purposes alias: purposes @@ -3614,6 +4068,9 @@ classes: inlined_as_list: true tasks: name: tasks + description: Tasks the dataset is intended to support. List of Task objects + from the Motivation module describing specific machine learning, research, + or analytical tasks. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:tasks alias: tasks @@ -3625,6 +4082,9 @@ classes: inlined_as_list: true addressing_gaps: name: addressing_gaps + description: Research or practical gaps this dataset addresses. 
List of AddressingGap + objects from the Motivation module, each describing a gap in existing datasets + or knowledge that this dataset fills. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:addressingGaps alias: addressing_gaps @@ -3636,6 +4096,9 @@ classes: inlined_as_list: true creators: name: creators + description: Individuals or organizations who created the dataset. List of + Creator objects describing authorship, roles, and affiliations of dataset + creators. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:creator alias: creators @@ -3647,6 +4110,9 @@ classes: inlined_as_list: true funders: name: funders + description: Funding mechanisms that supported dataset creation. List of FundingMechanism + objects describing grants, contracts, or other funding sources including + grantors and grant identifiers. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:funder alias: funders @@ -3658,10 +4124,11 @@ classes: inlined_as_list: true subsets: name: subsets + description: Subsets or splits of this dataset. List of DataSubset objects + from the Composition module, each representing a logical partition such + as training, validation, or test splits, or demographic subgroups. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - exact_mappings: - - schema:distribution - slot_uri: dcat:distribution + slot_uri: d4d:dataSubset alias: subsets owner: DataSubset domain_of: @@ -3671,6 +4138,9 @@ classes: inlined_as_list: true instances: name: instances + description: Individual data instances or records in the dataset. List of + Instance objects from the Composition module describing what each data point + represents, its type, and associated label information. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:instances alias: instances @@ -3682,6 +4152,9 @@ classes: inlined_as_list: true anomalies: name: anomalies + description: Known data quality issues, errors, or irregularities in the dataset. + List of DataAnomaly objects from the Composition module, each documenting + a specific anomaly and its potential impact. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:anomalies alias: anomalies @@ -3720,6 +4193,9 @@ classes: inlined_as_list: true confidential_elements: name: confidential_elements + description: Confidential or restricted information within the dataset that + requires access controls. List of Confidentiality objects describing what + is confidential and why it cannot be released. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:confidentialElements alias: confidential_elements @@ -3731,6 +4207,9 @@ classes: inlined_as_list: true content_warnings: name: content_warnings + description: Content warnings for potentially harmful, offensive, or disturbing + material in the dataset. List of ContentWarning objects alerting users to + sensitive content categories. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:contentWarnings alias: content_warnings @@ -3742,6 +4221,9 @@ classes: inlined_as_list: true subpopulations: name: subpopulations + description: Subpopulations represented within the dataset. List of Subpopulation + objects from the Composition module describing demographic or other groups, + their representation, and any imbalances. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:subpopulations alias: subpopulations @@ -3753,6 +4235,10 @@ classes: inlined_as_list: true sensitive_elements: name: sensitive_elements + description: Sensitive data elements requiring special handling or access + controls. 
List of SensitiveElement objects identifying sensitive attributes + such as personal identifiers, protected health information, or legally sensitive + content. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:sensitiveElements alias: sensitive_elements @@ -3762,8 +4248,39 @@ classes: range: SensitiveElement multivalued: true inlined_as_list: true + relationships: + name: relationships + description: Explicit relationships between individual instances in the dataset. + List of Relationships objects from the Composition module describing how + instances relate (e.g., graph edges, ratings, social network links). + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:relationships + alias: relationships + owner: DataSubset + domain_of: + - Dataset + range: Relationships + multivalued: true + inlined_as_list: true + splits: + name: splits + description: Recommended data splits for this dataset. List of Splits objects + from the Composition module describing train/validation/test partitions + and the rationale for each split strategy. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:splits + alias: splits + owner: DataSubset + domain_of: + - Dataset + range: Splits + multivalued: true + inlined_as_list: true acquisition_methods: name: acquisition_methods + description: Methods used to acquire or obtain dataset instances. List of + InstanceAcquisition objects from the Collection module describing how data + was sourced, whether directly observed or derived. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:acquisitionMethods alias: acquisition_methods @@ -3775,6 +4292,9 @@ classes: inlined_as_list: true collection_mechanisms: name: collection_mechanisms + description: Mechanisms, instruments, or tools used for data collection. List + of CollectionMechanism objects from the Collection module describing sensors, + surveys, APIs, or other collection instruments. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:collectionMechanisms alias: collection_mechanisms @@ -3786,6 +4306,9 @@ classes: inlined_as_list: true sampling_strategies: name: sampling_strategies + description: Strategies used to select data instances from a larger population. + List of SamplingStrategy objects from the Collection module describing sampling + methodology, inclusion criteria, and limitations. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:samplingStrategies alias: sampling_strategies @@ -3798,6 +4321,9 @@ classes: inlined_as_list: true data_collectors: name: data_collectors + description: Individuals or organizations responsible for collecting the data. + List of DataCollector objects from the Collection module describing who + performed data collection and their roles. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:dataCollectors alias: data_collectors @@ -3809,6 +4335,9 @@ classes: inlined_as_list: true collection_timeframes: name: collection_timeframes + description: Time periods during which data was collected. List of CollectionTimeframe + objects from the Collection module describing collection start and end dates, + and any gaps in the collection period. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:collectionTimeframes alias: collection_timeframes @@ -3818,6 +4347,62 @@ classes: range: CollectionTimeframe multivalued: true inlined_as_list: true + direct_collection: + name: direct_collection + description: Whether data was collected directly from individuals or via third + parties. List of DirectCollection objects from the Collection module describing + direct vs. indirect collection methods and sources. 
+ from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:directCollection + alias: direct_collection + owner: DataSubset + domain_of: + - Dataset + range: DirectCollection + multivalued: true + inlined_as_list: true + collection_notifications: + name: collection_notifications + description: Notifications provided to individuals about data collection. + List of CollectionNotification objects from the Ethics module describing + how and when individuals were informed about the data collection. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:collectionNotifications + alias: collection_notifications + owner: DataSubset + domain_of: + - Dataset + range: CollectionNotification + multivalued: true + inlined_as_list: true + collection_consents: + name: collection_consents + description: Consent obtained from individuals for data collection and use. + List of CollectionConsent objects from the Ethics module describing how + consent was requested, provided, and documented. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:collectionConsents + alias: collection_consents + owner: DataSubset + domain_of: + - Dataset + range: CollectionConsent + multivalued: true + inlined_as_list: true + consent_revocations: + name: consent_revocations + description: Mechanisms for individuals to revoke previously given consent. + List of ConsentRevocation objects from the Ethics module describing how + revocation works and what happens to data after revocation. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:consentRevocations + alias: consent_revocations + owner: DataSubset + domain_of: + - Dataset + range: ConsentRevocation + multivalued: true + inlined_as_list: true missing_data_documentation: name: missing_data_documentation description: Documentation of missing data patterns and handling strategies. 
@@ -3832,7 +4417,8 @@ classes: inlined_as_list: true raw_data_sources: name: raw_data_sources - description: Description of raw data sources before preprocessing. + description: List of raw data sources before preprocessing. Each RawDataSource + object describes where the original data came from and how it can be accessed. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:rawDataSources alias: raw_data_sources @@ -3844,6 +4430,9 @@ classes: inlined_as_list: true ethical_reviews: name: ethical_reviews + description: Ethical reviews and institutional oversight for the dataset. + List of EthicalReview objects from the Ethics module describing IRB approvals, + ethics committee reviews, and compliance certifications. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:ethicalReviews alias: ethical_reviews @@ -3855,6 +4444,9 @@ classes: inlined_as_list: true data_protection_impacts: name: data_protection_impacts + description: Data protection impact assessments (DPIAs) conducted for the + dataset. List of DataProtectionImpact objects from the Ethics module documenting + privacy risk assessments and mitigation measures. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:dataProtectionImpacts alias: data_protection_impacts @@ -3930,6 +4522,9 @@ classes: inlined_as_list: true preprocessing_strategies: name: preprocessing_strategies + description: Preprocessing steps applied to the raw data. List of PreprocessingStrategy + objects from the Preprocessing module describing normalization, transformation, + and other preparation steps. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:preprocessingStrategies alias: preprocessing_strategies @@ -3941,6 +4536,9 @@ classes: inlined_as_list: true cleaning_strategies: name: cleaning_strategies + description: Data cleaning and quality control procedures applied to the dataset. 
+ List of CleaningStrategy objects from the Preprocessing module describing + outlier removal, deduplication, and error correction steps. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:cleaningStrategies alias: cleaning_strategies @@ -3952,6 +4550,9 @@ classes: inlined_as_list: true labeling_strategies: name: labeling_strategies + description: Labeling or annotation methodologies applied to the data. List + of LabelingStrategy objects from the Preprocessing module describing annotation + procedures, annotator qualifications, and quality controls. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:labelingStrategies alias: labeling_strategies @@ -3963,6 +4564,9 @@ classes: inlined_as_list: true raw_sources: name: raw_sources + description: Raw, unprocessed source data before any preprocessing was applied. + List of RawData objects from the Preprocessing module describing original + data sources and their formats. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:rawSources alias: raw_sources @@ -3974,7 +4578,9 @@ classes: inlined_as_list: true imputation_protocols: name: imputation_protocols - description: Data imputation methodology and techniques. + description: Data imputation protocols applied to handle missing values. List + of ImputationProtocol objects from the Preprocessing module describing the + imputation technique, affected variables, and rationale. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:imputation_protocols alias: imputation_protocols @@ -4009,6 +4615,9 @@ classes: inlined_as_list: true existing_uses: name: existing_uses + description: Known existing uses of the dataset at the time of publication. + List of ExistingUse objects from the Uses module describing research, commercial, + or other applications of the dataset. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:existingUses alias: existing_uses @@ -4020,6 +4629,9 @@ classes: inlined_as_list: true use_repository: name: use_repository + description: Repositories or registries tracking how the dataset has been + used. List of UseRepository objects from the Uses module pointing to papers + with code, citation indices, or other use-tracking resources. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:useRepository alias: use_repository @@ -4031,6 +4643,9 @@ classes: inlined_as_list: true other_tasks: name: other_tasks + description: Additional tasks the dataset may support beyond its original + intent. List of OtherTask objects from the Uses module describing potential + applications not originally planned by the dataset creators. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:otherTasks alias: other_tasks @@ -4042,6 +4657,9 @@ classes: inlined_as_list: true future_use_impacts: name: future_use_impacts + description: Anticipated impacts of future uses, including risks and benefits. + List of FutureUseImpact objects from the Uses module describing foreseeable + consequences of using this dataset in new applications. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:futureUseImpacts alias: future_use_impacts @@ -4053,6 +4671,9 @@ classes: inlined_as_list: true discouraged_uses: name: discouraged_uses + description: Uses that are not recommended for this dataset due to limitations, + risks, or ethical concerns. List of DiscouragedUse objects from the Uses + module explaining why certain applications should be avoided. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:discouragedUses alias: discouraged_uses @@ -4090,6 +4711,9 @@ classes: inlined_as_list: true distribution_formats: name: distribution_formats + description: Formats in which the dataset is distributed or made available. 
+ List of DistributionFormat objects from the Distribution module describing + file formats, compression, and access methods. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:distributionFormats alias: distribution_formats @@ -4101,6 +4725,9 @@ classes: inlined_as_list: true distribution_dates: name: distribution_dates + description: Dates when the dataset was or will be distributed or released. + List of DistributionDate objects from the Distribution module describing + initial release dates, version release dates, and planned future releases. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:distributionDates alias: distribution_dates @@ -4110,8 +4737,25 @@ classes: range: DistributionDate multivalued: true inlined_as_list: true + third_party_sharing: + name: third_party_sharing + description: Third-party distribution policies for the dataset. List of ThirdPartySharing + objects from the Distribution module describing whether and how the dataset + is shared with entities outside the creating organization. + from_schema: https://w3id.org/bridge2ai/data-sheets-schema + slot_uri: d4d:thirdPartySharing + alias: third_party_sharing + owner: DataSubset + domain_of: + - Dataset + range: ThirdPartySharing + multivalued: true + inlined_as_list: true license_and_use_terms: name: license_and_use_terms + description: License and usage terms governing dataset access and use. LicenseAndUseTerms + object from the Data Governance module describing the applicable license, + permitted uses, and any restrictions. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:license alias: license_and_use_terms @@ -4122,6 +4766,9 @@ classes: inlined: true ip_restrictions: name: ip_restrictions + description: Intellectual property restrictions on dataset use or redistribution. + IPRestrictions object from the Data Governance module describing copyright, + trademark, or other IP considerations. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:ipRestrictions alias: ip_restrictions @@ -4132,6 +4779,9 @@ classes: inlined: true regulatory_restrictions: name: regulatory_restrictions + description: Regulatory and export control restrictions applicable to the + dataset. ExportControlRegulatoryRestrictions object from the Data Governance + module describing compliance requirements such as ITAR, EAR, or GDPR. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:regulatoryRestrictions alias: regulatory_restrictions @@ -4143,6 +4793,9 @@ classes: inlined: true maintainers: name: maintainers + description: Individuals or organizations responsible for maintaining the + dataset. List of Maintainer objects from the Maintenance module describing + maintenance contacts, roles, and support channels. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:maintainers alias: maintainers @@ -4154,6 +4807,9 @@ classes: inlined_as_list: true errata: name: errata + description: Known errors or corrections to the dataset since publication. + List of Erratum objects from the Maintenance module describing discovered + errors, affected records, and correction procedures. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:errata alias: errata @@ -4165,6 +4821,9 @@ classes: inlined_as_list: true updates: name: updates + description: Plans for future updates or versioning of the dataset. UpdatePlan + object from the Maintenance module describing update frequency, versioning + policy, and planned enhancements. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:updates alias: updates @@ -4175,6 +4834,9 @@ classes: inlined: true retention_limit: name: retention_limit + description: Data retention policies and limits for the dataset. RetentionLimits + object from the Maintenance module describing how long the dataset will + be available and any deletion schedules. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:retentionLimit alias: retention_limit @@ -4185,8 +4847,11 @@ classes: inlined: true version_access: name: version_access + description: Information about access to different versions of the dataset. + VersionAccess object from the Maintenance module describing where older + versions can be found and how version history is maintained. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcat:accessURL + slot_uri: d4d:versionAccess alias: version_access owner: DataSubset domain_of: @@ -4195,6 +4860,9 @@ classes: inlined: true extension_mechanism: name: extension_mechanism + description: Mechanisms for extending or contributing to the dataset. ExtensionMechanism + object from the Maintenance module describing how others can propose additions, + corrections, or expansions. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:extensionMechanism alias: extension_mechanism @@ -4220,6 +4888,10 @@ classes: inlined_as_list: true is_deidentified: name: is_deidentified + description: De-identification status and procedures applied to the dataset. + Deidentification object describing whether the dataset contains personal + data, what de-identification methods were applied, and any residual re-identification + risks. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: d4d:isDeidentified alias: is_deidentified @@ -4230,8 +4902,11 @@ classes: inlined: true is_tabular: name: is_tabular + description: Whether the dataset is in tabular format (rows and columns). + True if the data is structured as a table (e.g., CSV, TSV, relational database); + false for unstructured formats such as images or free text. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: schema:encodingFormat + slot_uri: d4d:isTabular alias: is_tabular owner: DataSubset domain_of: @@ -4281,7 +4956,7 @@ classes: inlined_as_list: true compression: name: compression - description: compression format used, if any. e.g., gzip, bzip2, zip + description: Compression format used, if any (e.g., gzip, bzip2, zip). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:compressFormat alias: compression @@ -4296,6 +4971,8 @@ classes: range: CompressionEnum conforms_to: name: conforms_to + description: An established standard, specification, or schema to which the + resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:conformsTo alias: conforms_to @@ -4309,8 +4986,12 @@ classes: range: string conforms_to_class: name: conforms_to_class + description: The specific class or type within a schema to which the resource + conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToClass alias: conforms_to_class owner: DataSubset domain_of: @@ -4322,8 +5003,11 @@ classes: range: string conforms_to_schema: name: conforms_to_schema + description: The schema or data model to which the resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToSchema alias: conforms_to_schema owner: DataSubset domain_of: @@ -4335,6 +5019,8 @@ classes: range: string created_by: name: created_by + description: The person or organization primarily responsible for creating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:creator alias: created_by @@ -4348,6 +5034,7 @@ classes: range: string created_on: name: created_on + description: The date and time when the resource was created. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:created alias: created_on @@ -4361,9 +5048,14 @@ classes: range: datetime doi: name: doi - description: digital object identifier + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing + persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567'). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + exact_mappings: + - schema:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:doiIdentifier alias: doi owner: DataSubset domain_of: @@ -4394,6 +5086,7 @@ classes: range: uri issued: name: issued + description: Date of formal issuance or publication of the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:issued alias: issued @@ -4407,6 +5100,7 @@ classes: range: datetime keywords: name: keywords + description: Keywords or tags describing the resource for discovery and classification. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:keyword alias: keywords @@ -4421,7 +5115,7 @@ classes: multivalued: true language: name: language - description: language in which the information is expressed + description: Language in which the information is expressed. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - schema:inLanguage @@ -4437,6 +5131,8 @@ classes: range: string last_updated_on: name: last_updated_on + description: The date and time when the resource was most recently modified + or updated. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:modified alias: last_updated_on @@ -4450,6 +5146,8 @@ classes: range: datetime license: name: license + description: The legal license under which the resource is made available + (e.g., "MIT", "CC-BY-4.0"). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:license alias: license @@ -4464,6 +5162,8 @@ classes: range: string modified_by: name: modified_by + description: A person or organization that contributed to modifying or updating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:contributor alias: modified_by @@ -4477,6 +5177,8 @@ classes: range: string page: name: page + description: A landing page or web page providing access to or information + about the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:landingPage alias: page @@ -4490,6 +5192,8 @@ classes: range: string publisher: name: publisher + description: The organization or entity responsible for making the resource + available. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:publisher alias: publisher @@ -4503,8 +5207,9 @@ classes: range: uriorcurie status: name: status + description: The status of the resource (e.g., draft, published, deprecated). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:type + slot_uri: d4d:publicationStatus alias: status owner: DataSubset domain_of: @@ -4516,7 +5221,7 @@ classes: range: string title: name: title - description: the official title of the element + description: The official title of the element. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:title alias: title @@ -4530,8 +5235,9 @@ classes: range: string version: name: version + description: The version identifier of the resource (e.g., "1.0", "2.3.1"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:hasVersion + slot_uri: schema:version alias: version owner: DataSubset domain_of: @@ -4544,6 +5250,8 @@ classes: range: string was_derived_from: name: was_derived_from + description: A resource from which this resource was derived, in whole or + in part. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - dcterms:source @@ -4560,6 +5268,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -4822,6 +5532,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -5086,6 +5798,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -5349,6 +6063,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -5681,6 +6397,7 @@ classes: attributes: version: name: version + description: The version identifier of the software (e.g., "1.0.0", "2.3.1-beta"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:softwareVersion alias: version @@ -5694,6 +6411,8 @@ classes: range: string license: name: license + description: The license under which the software is distributed (e.g., "MIT", + "Apache-2.0", "GPL-3.0"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:license alias: license @@ -5707,16 +6426,20 @@ classes: range: string url: name: url + description: URL where the software can be found (e.g., homepage, repository, + or documentation). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:url alias: url owner: Software domain_of: - Software - range: string + range: uri id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -6013,9 +6736,9 @@ classes: identifier for researchers. Format: 0000-0000-0000-0000 (16 digits in groups of 4). Use this for stable cross-dataset identification.' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base - exact_mappings: + broad_mappings: - schema:identifier - slot_uri: schema:identifier + slot_uri: d4d:orcidIdentifier alias: orcid owner: Person domain_of: @@ -6025,6 +6748,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -6311,7 +7036,7 @@ classes: attributes: compression: name: compression - description: compression format used, if any. e.g., gzip, bzip2, zip + description: Compression format used, if any (e.g., gzip, bzip2, zip). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:compressFormat alias: compression @@ -6326,6 +7051,8 @@ classes: range: CompressionEnum conforms_to: name: conforms_to + description: An established standard, specification, or schema to which the + resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:conformsTo alias: conforms_to @@ -6339,8 +7066,12 @@ classes: range: string conforms_to_class: name: conforms_to_class + description: The specific class or type within a schema to which the resource + conforms. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToClass alias: conforms_to_class owner: Information domain_of: @@ -6352,8 +7083,11 @@ classes: range: string conforms_to_schema: name: conforms_to_schema + description: The schema or data model to which the resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToSchema alias: conforms_to_schema owner: Information domain_of: @@ -6365,6 +7099,8 @@ classes: range: string created_by: name: created_by + description: The person or organization primarily responsible for creating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:creator alias: created_by @@ -6378,6 +7114,7 @@ classes: range: string created_on: name: created_on + description: The date and time when the resource was created. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:created alias: created_on @@ -6391,9 +7128,14 @@ classes: range: datetime doi: name: doi - description: digital object identifier + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing + persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567'). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + exact_mappings: + - schema:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:doiIdentifier alias: doi owner: Information domain_of: @@ -6424,6 +7166,7 @@ classes: range: uri issued: name: issued + description: Date of formal issuance or publication of the resource. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:issued alias: issued @@ -6437,6 +7180,7 @@ classes: range: datetime keywords: name: keywords + description: Keywords or tags describing the resource for discovery and classification. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:keyword alias: keywords @@ -6451,7 +7195,7 @@ classes: multivalued: true language: name: language - description: language in which the information is expressed + description: Language in which the information is expressed. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - schema:inLanguage @@ -6467,6 +7211,8 @@ classes: range: string last_updated_on: name: last_updated_on + description: The date and time when the resource was most recently modified + or updated. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:modified alias: last_updated_on @@ -6480,6 +7226,8 @@ classes: range: datetime license: name: license + description: The legal license under which the resource is made available + (e.g., "MIT", "CC-BY-4.0"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:license alias: license @@ -6494,6 +7242,8 @@ classes: range: string modified_by: name: modified_by + description: A person or organization that contributed to modifying or updating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:contributor alias: modified_by @@ -6507,6 +7257,8 @@ classes: range: string page: name: page + description: A landing page or web page providing access to or information + about the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:landingPage alias: page @@ -6520,6 +7272,8 @@ classes: range: string publisher: name: publisher + description: The organization or entity responsible for making the resource + available. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:publisher alias: publisher @@ -6533,8 +7287,9 @@ classes: range: uriorcurie status: name: status + description: The status of the resource (e.g., draft, published, deprecated). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:type + slot_uri: d4d:publicationStatus alias: status owner: Information domain_of: @@ -6546,7 +7301,7 @@ classes: range: string title: name: title - description: the official title of the element + description: The official title of the element. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:title alias: title @@ -6560,8 +7315,9 @@ classes: range: string version: name: version + description: The version identifier of the resource (e.g., "1.0", "2.3.1"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:hasVersion + slot_uri: schema:version alias: version owner: Information domain_of: @@ -6574,6 +7330,8 @@ classes: range: string was_derived_from: name: was_derived_from + description: A resource from which this resource was derived, in whole or + in part. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - dcterms:source @@ -6590,6 +7348,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -6846,11 +7606,13 @@ classes: range: string FormatDialect: name: FormatDialect - description: Additional format information for a file + description: Additional format information for a file. from_schema: https://w3id.org/bridge2ai/data-sheets-schema attributes: comment_prefix: name: comment_prefix + description: Character(s) used to indicate comment lines (e.g., "#" for CSV + comments). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base alias: comment_prefix owner: FormatDialect @@ -6859,6 +7621,7 @@ classes: range: string delimiter: name: delimiter + description: Field delimiter character (e.g., "," for CSV, "\t" for TSV). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base alias: delimiter owner: FormatDialect @@ -6867,6 +7630,9 @@ classes: range: string double_quote: name: double_quote + description: 'Whether quotes within quoted fields are escaped by doubling + them. Expected values: "true" or "false" (as strings per CSV dialect specification). + Follows the W3C CSV-on-the-Web dialect specification.' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base alias: double_quote owner: FormatDialect @@ -6875,6 +7641,9 @@ classes: range: string header: name: header + description: 'Whether the first row of the file contains column headers. Expected + values: "true" or "false" (as strings per CSV dialect specification). Follows + the W3C CSV-on-the-Web dialect specification.' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base alias: header owner: FormatDialect @@ -6883,6 +7652,7 @@ classes: range: string quote_char: name: quote_char + description: Character used for quoting fields (e.g., '"' for CSV). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base alias: quote_char owner: FormatDialect @@ -6900,7 +7670,9 @@ classes: description: Short explanation describing the primary purpose of creating the dataset. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/motivation - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:questionResponse alias: response owner: Purpose domain_of: @@ -6911,6 +7683,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -7243,7 +8017,9 @@ classes: description: Short explanation describing the specific task or tasks for which this dataset was created. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/motivation - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:questionResponse alias: response owner: Task domain_of: @@ -7254,6 +8030,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -7588,7 +8366,9 @@ classes: description: Short explanation of the knowledge or resource gap that this dataset was intended to address. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/motivation - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:questionResponse alias: response owner: AddressingGap domain_of: @@ -7599,6 +8379,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -7936,9 +8718,10 @@ classes: description: A key individual (Principal Investigator) responsible for or overseeing dataset creation. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/motivation - exact_mappings: + broad_mappings: + - dcterms:creator - schema:creator - slot_uri: dcterms:creator + slot_uri: d4d:principalInvestigator alias: principal_investigator owner: Creator domain_of: @@ -7948,7 +8731,9 @@ classes: name: affiliations description: Organizations with which the creator or team is affiliated. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/motivation - slot_uri: schema:affiliation + broad_mappings: + - schema:affiliation + slot_uri: d4d:teamAffiliation alias: affiliations owner: Creator domain_of: @@ -7958,11 +8743,9 @@ classes: inlined_as_list: true credit_roles: name: credit_roles - description: 'Contributor roles using the CRediT (Contributor Roles Taxonomy) - for the principal investigator or creator team. Specifies the specific contributions - made to this dataset (e.g., Conceptualization, Data Curation, Methodology). - Note: roles are specified here rather than on Person directly, since the - same person may have different roles across different datasets.' + description: One or more contributor roles using the CRediT (Contributor Roles + Taxonomy) for the principal investigator or creator team (e.g., Conceptualization, + Data Curation, Methodology). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/motivation slot_uri: d4d:creditRoles alias: credit_roles @@ -7974,6 +8757,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -8332,6 +9117,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -8666,6 +9453,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -8933,7 +9722,9 @@ classes: name: grant_number description: The alphanumeric identifier for the grant. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/motivation - slot_uri: schema:identifier + broad_mappings: + - schema:identifier + slot_uri: d4d:grantIdentifier alias: grant_number owner: Grant domain_of: @@ -8942,6 +9733,8 @@ classes: id: name: id description: A unique identifier for a thing. + examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -9221,11 +10014,15 @@ classes: range: uriorcurie instance_type: name: instance_type - description: 'Multiple types of instances? (e.g., movies, users, and ratings). + description: 'The type or types of instances in the dataset (e.g., "movie", + "user", "rating", "clinical record"). Use when the dataset contains multiple + instance types with different structures. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:type + broad_mappings: + - dcterms:type + slot_uri: d4d:instanceType alias: instance_type owner: Instance domain_of: @@ -9239,7 +10036,7 @@ classes: from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition values_from: - B2AI_SUBSTRATE - slot_uri: dcterms:format + slot_uri: dcterms:type alias: data_substrate owner: Instance domain_of: @@ -9250,6 +10047,9 @@ classes: description: 'How many instances are there in total (of each type, if appropriate)? ' + examples: + - value: '42000' + description: 42,000 patient records from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition slot_uri: schema:numberOfItems alias: counts @@ -9275,7 +10075,9 @@ classes: ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: schema:description + broad_mappings: + - schema:description + slot_uri: d4d:labelPattern alias: label_description owner: Instance domain_of: @@ -9313,6 +10115,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -9655,7 +10459,6 @@ classes: domain_of: - SamplingStrategy range: boolean - multivalued: true is_random: name: is_random description: Indicates whether the sample is random. @@ -9666,11 +10469,10 @@ classes: domain_of: - SamplingStrategy range: boolean - multivalued: true source_data: name: source_data - description: 'Description of the larger set from which the sample was drawn, - if any. + description: 'One or more descriptions of the larger sets from which the sample + was drawn, if applicable. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition @@ -9694,14 +10496,16 @@ classes: domain_of: - SamplingStrategy range: boolean - multivalued: true representative_verification: name: representative_verification - description: 'Explanation of how representativeness was validated or verified. + description: 'One or more explanations of how representativeness was validated + or verified (e.g., statistical tests, domain expert review). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: schema:description + broad_mappings: + - schema:description + slot_uri: d4d:verificationDescription alias: representative_verification owner: SamplingStrategy domain_of: @@ -9710,7 +10514,8 @@ classes: multivalued: true why_not_representative: name: why_not_representative - description: 'Explanation of why the sample is not representative, if applicable. + description: 'One or more explanations of why the sample is not representative + of the larger set, if applicable. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition @@ -9723,8 +10528,8 @@ classes: multivalued: true strategies: name: strategies - description: 'Description of the sampling strategy (deterministic, probabilistic, - etc.). 
+ description: 'One or more sampling strategies used (e.g., deterministic, simple + random, stratified, cluster, systematic). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition @@ -9738,6 +10543,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -10075,7 +10882,9 @@ classes: ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:missingDataDescription alias: missing owner: MissingInfo domain_of: @@ -10088,7 +10897,9 @@ classes: ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:missingDataCause alias: why_missing owner: MissingInfo domain_of: @@ -10098,6 +10909,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -10431,8 +11244,9 @@ classes: attributes: relationship_details: name: relationship_details - description: 'Details on relationships between instances (e.g., graph edges, - ratings). + description: 'Free-text description of how relationships between instances + are represented (e.g., graph edges, ratings matrices, foreign keys), including + relationship types and any associated metadata. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition @@ -10446,6 +11260,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -10779,7 +11595,9 @@ classes: attributes: split_details: name: split_details - description: 'Details on recommended data splits and their rationale. + description: 'Free-text description of the recommended data splits (e.g., + 80/10/10 train/validation/test), how they are defined, and the rationale + for the split strategy. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition @@ -10793,6 +11611,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -11125,11 +11945,14 @@ classes: attributes: anomaly_details: name: anomaly_details - description: 'Details on errors, noise sources, or redundancies in the dataset. + description: 'Free-text description of errors, noise sources, or redundancies + in the dataset, including their known causes and estimated prevalence. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:anomalyDetails alias: anomaly_details owner: DataAnomaly domain_of: @@ -11139,6 +11962,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -11513,8 +12338,9 @@ classes: range: string affected_subsets: name: affected_subsets - description: 'Specific subsets or features of the dataset affected by this - bias.
+ description: 'One or more specific subsets or features of the dataset affected + by this bias (e.g., "female participants", "non-English text", "images taken + at night"). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition @@ -11528,6 +12354,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -11914,6 +12742,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -12260,7 +13090,9 @@ classes: ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:availabilityGuarantee alias: future_guarantees owner: ExternalResource domain_of: @@ -12269,26 +13101,30 @@ classes: multivalued: true archival: name: archival - description: 'Indication whether official archival versions of external resources - are included. + description: 'Indicates whether official archival versions of external resources + are included in the dataset. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: schema:archivedAt + broad_mappings: + - schema:archivedAt + slot_uri: d4d:hasArchivalVersion alias: archival owner: ExternalResource domain_of: - ExternalResource range: boolean - multivalued: true restrictions: name: restrictions - description: 'Description of any restrictions or fees associated with external - resources. + description: 'One or more descriptions of restrictions or fees associated + with accessing these external resources (e.g., paywalls, registration requirements, + API limits). 
' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:accessRights + broad_mappings: + - dcterms:accessRights + slot_uri: d4d:externalResourceRestrictions alias: restrictions owner: ExternalResource domain_of: @@ -12313,6 +13149,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -12656,7 +13494,9 @@ classes: range: boolean confidentiality_details: name: confidentiality_details - description: 'Details on confidential data elements and handling procedures. + description: 'Free-text description of which data elements are confidential, + the basis for confidentiality (e.g., legal privilege, patient data), and + how they are handled or restricted. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition @@ -12670,6 +13510,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -13013,6 +13855,9 @@ classes: range: boolean warnings: name: warnings + description: One or more specific content warnings describing potentially + offensive, insulting, threatening, or anxiety-provoking content present + in the dataset (e.g., violence, profanity, explicit imagery, hate speech). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition slot_uri: dcterms:description alias: warnings @@ -13024,6 +13869,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -13367,8 +14214,13 @@ classes: range: boolean identification: name: identification + description: How subpopulations are identified and defined (e.g., by age groups, + gender, geographic region, disease status, or other demographic/clinical + characteristics). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:subpopulationIdentification alias: identification owner: Subpopulation domain_of: @@ -13377,8 +14229,12 @@ classes: multivalued: true distribution: name: distribution + description: The distribution of instances across identified subpopulations, + including counts, percentages, or proportions for each subgroup. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:subpopulationDistribution alias: distribution owner: Subpopulation domain_of: @@ -13388,6 +14244,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -13740,9 +14598,10 @@ classes: range: string identifiers_removed: name: identifiers_removed - description: List of identifier types removed during de-identification. + description: List of identifier types removed during de-identification (e.g., + 'name', 'date of birth', 'SSN', 'email address', 'geographic subdivision'). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: schema:identifier + slot_uri: d4d:removedIdentifierTypes alias: identifiers_removed owner: Deidentification domain_of: @@ -13765,6 +14624,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -14124,6 +14985,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -14461,7 +15324,7 @@ classes: description: The dataset that this relationship points to. Can be specified by identifier, URL, or Dataset object. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/composition - slot_uri: schema:identifier + slot_uri: dcterms:relation alias: target_dataset owner: DatasetRelationship domain_of: @@ -14523,6 +15386,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -14773,7 +15638,8 @@ classes: attributes: was_directly_observed: name: was_directly_observed - description: Whether the data was directly observed + description: True if the data was directly observed by a researcher or instrument; + false if it was obtained through other means (e.g., reported, inferred). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: d4d:wasDirectlyObserved alias: was_directly_observed @@ -14783,7 +15649,8 @@ classes: range: boolean was_reported_by_subjects: name: was_reported_by_subjects - description: Whether the data was reported directly by the subjects themselves + description: True if the data was self-reported directly by the subjects themselves + (e.g., survey responses, questionnaires); false otherwise. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: d4d:wasReportedBySubjects alias: was_reported_by_subjects @@ -14793,7 +15660,8 @@ classes: range: boolean was_inferred_derived: name: was_inferred_derived - description: Whether the data was inferred or derived from other data + description: True if the data was computationally inferred or derived from + other data (e.g., model outputs, imputed values); false otherwise. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: d4d:wasInferred alias: was_inferred_derived @@ -14803,7 +15671,8 @@ classes: range: boolean was_validated_verified: name: was_validated_verified - description: Whether the data was validated or verified in any way + description: True if the data underwent a validation or verification process + (e.g., expert review, cross-checking with ground truth); false otherwise. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: d4d:wasValidated alias: was_validated_verified @@ -14813,7 +15682,8 @@ classes: range: boolean acquisition_details: name: acquisition_details - description: 'Details on how data was acquired for each instance. + description: 'Free-text description of how data was acquired for each instance, + including instruments, protocols, and any manual steps involved. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection @@ -14827,6 +15697,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -15163,7 +16035,9 @@ classes: attributes: mechanism_details: name: mechanism_details - description: 'Details on mechanisms or procedures used to collect the data. + description: 'Free-text description of the specific mechanisms or procedures + used to collect the data (e.g., hardware model, software API, manual curation + process), including how those mechanisms were validated. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection @@ -15177,6 +16051,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -15510,7 +16386,7 @@ classes: attributes: role: name: role - description: Role of the data collector (e.g., researcher, crowdworker) + description: Role of the data collector (e.g., researcher, crowdworker). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: schema:roleName alias: role @@ -15521,7 +16397,9 @@ classes: range: string collector_details: name: collector_details - description: 'Details on who collected the data and their compensation. + description: 'Free-text description of who was involved in data collection + (e.g., students, crowdworkers, contractors), their training or qualifications, + and how they were compensated. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection @@ -15535,6 +16413,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -15870,7 +16750,7 @@ classes: attributes: start_date: name: start_date - description: Start date of data collection + description: Start date of data collection. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: schema:startDate alias: start_date @@ -15880,7 +16760,7 @@ classes: range: date end_date: name: end_date - description: End date of data collection + description: End date of data collection. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: schema:endDate alias: end_date @@ -15890,8 +16770,9 @@ classes: range: date timeframe_details: name: timeframe_details - description: 'Details on the collection timeframe and relationship to data - creation dates. + description: 'Free-text description of the data collection period and whether + this timeframe matches the creation timeframe of the underlying data (e.g., + historical records, prospective collection). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection @@ -15905,6 +16786,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -16238,7 +17121,7 @@ classes: attributes: is_direct: name: is_direct - description: Whether collection was direct from individuals + description: Whether collection was direct from individuals. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection slot_uri: d4d:isDirect alias: is_direct @@ -16248,7 +17131,9 @@ classes: range: boolean collection_details: name: collection_details - description: 'Details on direct vs. indirect collection methods and sources. 
+ description: 'Free-text description of whether data was collected directly + from individuals or obtained via third parties or other indirect sources, + and what those sources are. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection @@ -16262,6 +17147,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -16625,8 +17512,8 @@ classes: multivalued: true handling_strategy: name: handling_strategy - description: 'Strategy used to handle missing data (e.g., deletion, imputation, - flagging, multiple imputation). + description: 'The primary strategy used to handle missing data (e.g., listwise + deletion, mean imputation, multiple imputation, flagging with sentinel values). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection @@ -16639,6 +17526,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -16989,12 +17878,14 @@ classes: required: true source_type: name: source_type - description: 'Type of raw source (sensor, database, user input, web scraping, - etc.). + description: 'One or more types of raw source (e.g., sensor, database, user + input, web scraping). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection - slot_uri: dcterms:type + broad_mappings: + - dcterms:type + slot_uri: d4d:sourceType alias: source_type owner: RawDataSource domain_of: @@ -17015,7 +17906,8 @@ classes: range: string raw_data_format: name: raw_data_format - description: 'Format of the raw data before any preprocessing. + description: 'One or more formats of the raw data before any preprocessing + (e.g., CSV, DICOM, JSON). 
' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/collection @@ -17029,6 +17921,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -17364,7 +18258,9 @@ classes: attributes: preprocessing_details: name: preprocessing_details - description: 'Details on preprocessing steps applied to the data. + description: 'Free-text description of preprocessing steps applied to the + data, including tools used, parameters, order of operations, and rationale + for each step. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling @@ -17378,6 +18274,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -17713,7 +18611,9 @@ classes: attributes: cleaning_details: name: cleaning_details - description: 'Details on data cleaning procedures applied. + description: 'Free-text description of data cleaning procedures applied, including + criteria for removing or correcting instances, tools used, and how removed + instances are accounted for. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling @@ -17727,6 +18627,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -18060,17 +18962,16 @@ classes: attributes: data_annotation_platform: name: data_annotation_platform - description: Platform or tool used for annotation (e.g., Label Studio, Prodigy, - Amazon Mechanical Turk, custom annotation tool). 
+ description: One or more platforms or tools used for annotation (e.g., Label + Studio, Prodigy, Amazon Mechanical Turk, custom annotation tool). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling - exact_mappings: - - rai:dataAnnotationPlatform - slot_uri: schema:instrument + slot_uri: rai:dataAnnotationPlatform alias: data_annotation_platform owner: LabelingStrategy domain_of: - LabelingStrategy range: string + multivalued: true data_annotation_protocol: name: data_annotation_protocol description: Annotation methodology, tasks, and protocols followed during @@ -18090,6 +18991,9 @@ classes: name: annotations_per_item description: Number of annotations collected per data item. Multiple annotations per item enable calculation of inter-annotator agreement. + examples: + - value: '3' + description: Three independent annotators per item from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling exact_mappings: - rai:annotationsPerItem @@ -18113,8 +19017,9 @@ classes: range: string annotator_demographics: name: annotator_demographics - description: Demographic information about annotators, if available and relevant - (e.g., geographic location, language background, expertise level). + description: One or more demographic characteristics of the annotators, if + available and relevant (e.g., geographic location, language background, + expertise level, native language). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling exact_mappings: - rai:annotatorDemographics @@ -18127,7 +19032,8 @@ classes: multivalued: true labeling_details: name: labeling_details - description: 'Details on labeling/annotation procedures and quality metrics. + description: 'Free-text description of the labeling or annotation procedures, + including annotation guidelines, task definitions, and quality control metrics. 
' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling @@ -18141,6 +19047,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -18475,8 +19383,10 @@ classes: access_url: name: access_url description: URL or access point for the raw data. + examples: + - value: https://example.org/dataset/raw/raw-data.zip from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling - slot_uri: dcat:accessURL + slot_uri: d4d:rawDataAccessURL alias: access_url owner: RawData domain_of: @@ -18484,7 +19394,8 @@ classes: range: uri raw_data_details: name: raw_data_details - description: 'Details on raw data availability and access procedures. + description: 'Free-text description of raw data availability, access procedures, + and any conditions or restrictions on accessing the raw data. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling @@ -18498,6 +19409,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -18887,6 +19800,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -19288,6 +20203,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -19631,7 +20548,9 @@ classes: ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling - slot_uri: schema:name + broad_mappings: + - schema:name + slot_uri: d4d:toolNames alias: tools owner: MachineAnnotationTools domain_of: @@ -19655,9 +20574,9 @@ classes: multivalued: true tool_accuracy: name: tool_accuracy - description: 'Known accuracy or performance metrics for the automated tools - (if available). Include metric name and value (e.g., "spaCy F1: 0.95", "GPT-4 - Accuracy: 92%"). + description: 'One or more known accuracy or performance metrics for the automated + tools (if available). Include metric name and value (e.g., "spaCy F1: 0.95", + "GPT-4 Accuracy: 92%"). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/preprocessing-cleaning-labeling @@ -19671,6 +20590,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -20017,6 +20938,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -20341,16 +21264,17 @@ classes: inlined_as_list: true UseRepository: name: UseRepository - description: 'Is there a repository that links to any or all papers or systems - that use the dataset? If so, provide a link or other access point. - - ' + description: A repository or registry of known uses of this dataset by third parties. + Documents where the dataset has been applied, enabling discoverability of downstream + use cases and impact tracking. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema is_a: DatasetProperty attributes: repository_url: name: repository_url description: URL to a repository of known dataset uses. + examples: + - value: https://example.org/dataset/known-uses from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses alias: repository_url owner: UseRepository @@ -20359,7 +21283,8 @@ classes: range: uri repository_details: name: repository_details - description: 'Details on the repository of known dataset uses. + description: 'Free-text description of the repository of known dataset uses, + including how it is maintained and how to contribute new use cases. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses @@ -20373,6 +21298,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -20705,7 +21632,8 @@ classes: attributes: task_details: name: task_details - description: 'Details on other potential tasks the dataset could be used for. + description: 'Free-text description of other potential tasks the dataset could + support, including any prerequisites or limitations for those uses. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses @@ -20719,6 +21647,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -21056,7 +21986,9 @@ classes: attributes: impact_details: name: impact_details - description: 'Details on potential impacts, risks, and mitigation strategies. 
+ description: 'Free-text description of potential future impacts or risks arising + from the dataset''s composition or collection (e.g., unfair treatment, privacy + violations, legal or financial risks), and any recommended mitigation strategies. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses @@ -21071,6 +22003,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -21403,7 +22337,9 @@ classes: attributes: discouragement_details: name: discouragement_details - description: 'Details on tasks for which the dataset should not be used. + description: 'Free-text description of tasks or applications for which the + dataset is not recommended, with explanation of why (e.g., out-of-scope, + risk of harm, poor coverage). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses @@ -21417,6 +22353,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -21765,7 +22703,7 @@ classes: multivalued: true usage_notes: name: usage_notes - description: Notes or caveats about using the dataset for intended purposes. + description: A note or caveat about using the dataset for its intended purposes. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses alias: usage_notes owner: IntendedUse @@ -21774,8 +22712,8 @@ classes: range: string use_category: name: use_category - description: Category of intended use (e.g., research, clinical, educational, - commercial, policy). + description: One or more categories of intended use (e.g., research, clinical, + educational, commercial, policy). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses slot_uri: d4d:useCategory alias: use_category @@ -21787,6 +22725,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -22121,8 +23061,8 @@ classes: attributes: prohibition_reason: name: prohibition_reason - description: Reason why this use is prohibited (e.g., license restriction, - ethical concern, privacy risk, legal constraint). + description: One or more reasons why this use is prohibited (e.g., license + restriction, ethical concern, privacy risk, legal constraint). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/uses slot_uri: d4d:prohibitionReason alias: prohibition_reason @@ -22134,6 +23074,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -22473,7 +23415,7 @@ classes: ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/distribution - slot_uri: dcterms:accessRights + slot_uri: d4d:isExternallyShared alias: is_shared owner: ThirdPartySharing domain_of: @@ -22482,6 +23424,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -22815,18 +23759,24 @@ classes: attributes: access_urls: name: access_urls - description: Details of the distribution channel(s) or format(s). + description: One or more URLs providing access to the distribution channel(s) + or format(s). 
+ examples: + - value: https://example.org/dataset/download + - value: https://example.org/api/v1/dataset from_schema: https://w3id.org/bridge2ai/data-sheets-schema/distribution slot_uri: dcat:accessURL alias: access_urls owner: DistributionFormat domain_of: - DistributionFormat - range: string + range: uri multivalued: true id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -23159,8 +24109,9 @@ classes: attributes: release_dates: name: release_dates - description: 'Dates or timeframe for dataset release. Could be a one-time - release date or multiple scheduled releases. + description: 'One or more dates or timeframes for dataset release, in ISO + 8601 format (e.g., "2024-03-15") or as a descriptive string (e.g., "Q2 2024"). + Use multiple values for staged or scheduled releases. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/distribution @@ -23174,6 +24125,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -23519,7 +24472,9 @@ classes: range: CreatorOrMaintainerEnum maintainer_details: name: maintainer_details - description: 'Details on who will support, host, or maintain the dataset. + description: 'Free-text description of the organization, team, or individual + responsible for maintaining the dataset, including contact information and + hosting arrangements. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance @@ -23533,6 +24488,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -23867,8 +24824,10 @@ classes: erratum_url: name: erratum_url description: URL or access point for the erratum. + examples: + - value: https://example.org/dataset/errata/2024-01-15 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance - slot_uri: dcat:accessURL + slot_uri: d4d:erratumURL alias: erratum_url owner: Erratum domain_of: @@ -23876,7 +24835,8 @@ classes: range: uri erratum_details: name: erratum_details - description: 'Details on any errata or corrections to the dataset. + description: 'Free-text description of the error, its scope, the affected + data or records, and the correction applied. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance @@ -23890,6 +24850,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -24236,8 +25198,9 @@ classes: range: string update_details: name: update_details - description: 'Details on update plans, responsible parties, and communication - methods. + description: 'Free-text description of planned update types (e.g., corrections, + additions, deletions), responsible parties, and how updates will be communicated + to users. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance @@ -24251,6 +25214,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -24596,7 +25561,9 @@ classes: range: string retention_details: name: retention_details - description: 'Details on data retention limits and enforcement procedures. 
+ description: 'Free-text description of applicable retention limits, legal + or ethical basis for those limits, and how they will be enforced (e.g., + automated deletion, anonymization after the retention period). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance @@ -24610,6 +25577,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -24943,14 +25912,16 @@ classes: attributes: latest_version_doi: name: latest_version_doi - description: DOI or URL of the latest dataset version. + description: DOI or URL identifying the latest version of this dataset (e.g., + '10.5281/zenodo.1234567' for a DOI or 'https://doi.org/10.5281/zenodo.1234567' + for a full URL). Use CURIE format for DOIs (e.g., 'doi:10.5281/zenodo.1234567'). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance - slot_uri: schema:identifier + slot_uri: dcterms:hasVersion alias: latest_version_doi owner: VersionAccess domain_of: - VersionAccess - range: string + range: uriorcurie versions_available: name: versions_available description: List of available versions with metadata. @@ -24964,7 +25935,9 @@ classes: multivalued: true version_details: name: version_details - description: 'Details on version support policies and obsolescence communication. + description: 'Free-text description of version support policies, how long + older versions will be hosted, and how dataset consumers will be notified + when versions become obsolete. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance @@ -24978,6 +25951,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -25313,8 +26288,10 @@ classes: contribution_url: name: contribution_url description: URL for contribution guidelines or process. + examples: + - value: https://example.org/dataset/contributing from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance - slot_uri: dcat:landingPage + slot_uri: d4d:contributionURL alias: contribution_url owner: ExtensionMechanism domain_of: @@ -25322,8 +26299,9 @@ classes: range: uri extension_details: name: extension_details - description: 'Details on extension mechanisms, contribution validation, and - communication. + description: 'Free-text description of how third parties can contribute to + the dataset, how contributions are validated (e.g., peer review, automated + tests), and how accepted contributions will be communicated to the community. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/maintenance @@ -25337,6 +26315,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -25675,9 +26655,9 @@ classes: description: Contact person for questions about ethical review. Provides structured contact information including name, email, affiliation, and optional ORCID. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/ethics - exact_mappings: + broad_mappings: - schema:contactPoint - slot_uri: schema:contactPoint + slot_uri: d4d:ethicsContactPoint alias: contact_person owner: EthicalReview domain_of: @@ -25700,8 +26680,9 @@ classes: range: Organization review_details: name: review_details - description: 'Details on ethical review processes, outcomes, and supporting - documentation. 
+ description: 'Free-text description of the ethical review process, board decisions, + outcomes, and any supporting documentation (e.g., IRB approval number, ethics + committee name). ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/ethics @@ -25715,6 +26696,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -26050,7 +27033,9 @@ classes: attributes: impact_details: name: impact_details - description: 'Details on data protection impact analysis, outcomes, and documentation. + description: 'Free-text description of the data protection impact analysis, + including methodology, privacy risks identified, mitigation measures taken, + and any regulatory findings. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/ethics @@ -26065,6 +27050,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -26399,7 +27386,10 @@ classes: attributes: notification_details: name: notification_details - description: 'Details on how individuals were notified about data collection. + description: 'Free-text description of how individuals were notified about + data collection, including the notification method (e.g., email, poster, + in-person), timing, and the language or text of the notification itself + if available. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/ethics @@ -26413,6 +27403,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -26747,7 +27739,9 @@ classes: attributes: consent_details: name: consent_details - description: 'Details on how consent was requested, provided, and documented. + description: 'Free-text description of how consent was requested (e.g., opt-in + form, verbal agreement), provided, and documented, including the language + individuals consented to. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/ethics @@ -26761,6 +27755,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -27095,7 +28091,10 @@ classes: attributes: revocation_details: name: revocation_details - description: 'Details on consent revocation mechanisms and procedures. + description: 'Free-text description of the mechanism provided for individuals + to revoke consent (e.g., opt-out portal, written request), the scope of + revocation (full withdrawal or specific uses), and what happens to their + data after revocation. ' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/ethics @@ -27109,6 +28108,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -27509,6 +28510,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -27909,6 +28912,8 @@ classes: id: name: id description: An optional identifier for this property. 
+ examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -28299,6 +29304,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -28685,6 +29692,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -29074,6 +30083,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -29408,12 +30419,14 @@ classes: attributes: license_terms: name: license_terms - description: 'Description of the dataset''s license and terms of use (including - links, costs, or usage constraints). - - ' + description: Description of the dataset's license and terms of use, including + links, costs, or usage constraints (e.g., 'CC BY 4.0', 'Apache 2.0', 'MIT', + 'CC BY-NC-SA 4.0', 'proprietary - contact data@example.org for access'). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/data-governance - slot_uri: dcterms:license + broad_mappings: + - dcterms:license + - dcterms:rights + slot_uri: d4d:licenseDescription alias: license_terms owner: LicenseAndUseTerms domain_of: @@ -29425,7 +30438,7 @@ classes: description: Structured data use permissions using the Data Use Ontology (DUO). Specifies permitted uses (e.g., general research, health/medical research, disease-specific research) and restrictions (e.g., non-commercial use, ethics - approval required, collaboration required). 
See https://github.com/EBISPOT/DUO + approval required, collaboration required). See https://github.com/EBISPOT/DUO. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/data-governance exact_mappings: - DUO:0000001 @@ -29443,9 +30456,9 @@ classes: person can answer questions about licensing terms, usage restrictions, fees, and permissions. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/data-governance - exact_mappings: + broad_mappings: - schema:contactPoint - slot_uri: schema:contactPoint + slot_uri: d4d:licenseContactPoint alias: contact_person owner: LicenseAndUseTerms domain_of: @@ -29455,6 +30468,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -29790,7 +30805,8 @@ classes: attributes: restrictions: name: restrictions - description: Explanation of third-party IP restrictions. + description: One or more explanations of third-party IP restrictions or associated + fees. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/data-governance broad_mappings: - DUO:0000046 @@ -29806,6 +30822,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -30142,7 +31160,8 @@ classes: attributes: regulatory_restrictions: name: regulatory_restrictions - description: Export or regulatory restrictions on the dataset. + description: One or more export controls or regulatory restrictions applicable + to the dataset (e.g., HIPAA, ITAR, GDPR). from_schema: https://w3id.org/bridge2ai/data-sheets-schema/data-governance broad_mappings: - DUO:0000021 @@ -30198,9 +31217,9 @@ classes: answer questions about data governance policies, access procedures, and oversight mechanisms. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/data-governance - exact_mappings: + broad_mappings: - schema:contactPoint - slot_uri: schema:contactPoint + slot_uri: d4d:governanceContactPoint alias: governance_committee_contact owner: ExportControlRegulatoryRestrictions domain_of: @@ -30209,6 +31228,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -30546,9 +31567,10 @@ classes: description: The name or identifier of the variable as it appears in the data files. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/variables - exact_mappings: + broad_mappings: - schema:name - slot_uri: schema:name + - schema:identifier + slot_uri: d4d:variableName alias: variable_name owner: VariableMetadata domain_of: @@ -30599,6 +31621,8 @@ classes: name: minimum_value description: The minimum value that the variable can take. Applicable to numeric variables. + examples: + - value: '0.0' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/variables slot_uri: schema:minValue alias: minimum_value @@ -30610,6 +31634,8 @@ classes: name: maximum_value description: The maximum value that the variable can take. Applicable to numeric variables. + examples: + - value: '100.0' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/variables slot_uri: schema:maxValue alias: maximum_value @@ -30619,8 +31645,8 @@ classes: range: float categories: name: categories - description: The permitted categories or values for a categorical variable. - Each entry should describe a possible value and its meaning. + description: One or more permitted categories or values for a categorical + variable. Each entry should describe a possible value and its meaning. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema/variables slot_uri: schema:valueReference alias: categories @@ -30647,7 +31673,7 @@ classes: description: Indicates whether this variable serves as a unique identifier or key for records in the dataset. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/variables - slot_uri: schema:identifier + slot_uri: d4d:isIdentifier alias: is_identifier owner: VariableMetadata domain_of: @@ -30667,6 +31693,9 @@ classes: precision: name: precision description: The precision or number of decimal places for numeric variables. + examples: + - value: '2' + description: Two decimal places, e.g., 3.14 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/variables slot_uri: schema:valuePrecision alias: precision @@ -30701,7 +31730,9 @@ classes: description: Notes about data quality, reliability, or known issues specific to this variable. from_schema: https://w3id.org/bridge2ai/data-sheets-schema/variables - slot_uri: dcterms:description + broad_mappings: + - dcterms:description + slot_uri: d4d:qualityNotes alias: quality_notes owner: VariableMetadata domain_of: @@ -30711,6 +31742,8 @@ classes: id: name: id description: An optional identifier for this property. + examples: + - value: https://example.org/dataset/property-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier alias: id @@ -31082,6 +32115,7 @@ classes: range: integer path: name: path + description: The file path or URL where the content is located. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: schema:contentUrl alias: path @@ -31103,9 +32137,9 @@ classes: range: FormatEnum encoding: name: encoding - description: the character encoding of the data + description: The character encoding of the data. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcat:mediaType + slot_uri: d4d:characterEncoding alias: encoding owner: File domain_of: @@ -31113,7 +32147,7 @@ classes: range: EncodingEnum compression: name: compression - description: compression format used, if any. e.g., gzip, bzip2, zip + description: Compression format used, if any (e.g., gzip, bzip2, zip). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:compressFormat alias: compression @@ -31140,9 +32174,12 @@ classes: range: MediaTypeEnum hash: name: hash - description: hash of the data + description: 'Cryptographic hash value of the data for integrity verification + (e.g., SHA-256: ''e3b0c44298fc1c149afb...'', MD5: ''d41d8cd98f00b204e9800998ecf8427e'').' from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:hashValue alias: hash owner: File domain_of: @@ -31150,9 +32187,11 @@ classes: range: string md5: name: md5 - description: md5 hash of the data + description: MD5 hash value of the data (128-bit cryptographic hash). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:md5Checksum alias: md5 owner: File domain_of: @@ -31160,9 +32199,9 @@ classes: range: string sha256: name: sha256 - description: sha256 hash of the data + description: SHA-256 hash value of the data (256-bit cryptographic hash, recommended). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + slot_uri: schema:sha256 alias: sha256 owner: File domain_of: @@ -31181,6 +32220,8 @@ classes: range: string conforms_to: name: conforms_to + description: An established standard, specification, or schema to which the + resource conforms. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:conformsTo alias: conforms_to @@ -31194,8 +32235,12 @@ classes: range: string conforms_to_class: name: conforms_to_class + description: The specific class or type within a schema to which the resource + conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToClass alias: conforms_to_class owner: File domain_of: @@ -31207,8 +32252,11 @@ classes: range: string conforms_to_schema: name: conforms_to_schema + description: The schema or data model to which the resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToSchema alias: conforms_to_schema owner: File domain_of: @@ -31220,6 +32268,8 @@ classes: range: string created_by: name: created_by + description: The person or organization primarily responsible for creating + the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:creator alias: created_by @@ -31233,6 +32283,7 @@ classes: range: string created_on: name: created_on + description: The date and time when the resource was created. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:created alias: created_on @@ -31246,9 +32297,14 @@ classes: range: datetime doi: name: doi - description: digital object identifier + description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing + persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567'). 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:identifier + exact_mappings: + - schema:identifier + broad_mappings: + - dcterms:identifier + slot_uri: d4d:doiIdentifier alias: doi owner: File domain_of: @@ -31279,6 +32335,7 @@ classes: range: uri issued: name: issued + description: Date of formal issuance or publication of the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:issued alias: issued @@ -31292,6 +32349,7 @@ classes: range: datetime keywords: name: keywords + description: Keywords or tags describing the resource for discovery and classification. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:keyword alias: keywords @@ -31306,7 +32364,7 @@ classes: multivalued: true language: name: language - description: language in which the information is expressed + description: Language in which the information is expressed. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - schema:inLanguage @@ -31322,6 +32380,8 @@ classes: range: string last_updated_on: name: last_updated_on + description: The date and time when the resource was most recently modified + or updated. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:modified alias: last_updated_on @@ -31335,6 +32395,8 @@ classes: range: datetime license: name: license + description: The legal license under which the resource is made available + (e.g., "MIT", "CC-BY-4.0"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:license alias: license @@ -31349,6 +32411,8 @@ classes: range: string modified_by: name: modified_by + description: A person or organization that contributed to modifying or updating + the resource. 
from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:contributor alias: modified_by @@ -31362,6 +32426,8 @@ classes: range: string page: name: page + description: A landing page or web page providing access to or information + about the resource. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcat:landingPage alias: page @@ -31375,6 +32441,8 @@ classes: range: string publisher: name: publisher + description: The organization or entity responsible for making the resource + available. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:publisher alias: publisher @@ -31388,8 +32456,9 @@ classes: range: uriorcurie status: name: status + description: The status of the resource (e.g., draft, published, deprecated). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:type + slot_uri: d4d:publicationStatus alias: status owner: File domain_of: @@ -31401,7 +32470,7 @@ classes: range: string title: name: title - description: the official title of the element + description: The official title of the element. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:title alias: title @@ -31415,8 +32484,9 @@ classes: range: string version: name: version + description: The version identifier of the resource (e.g., "1.0", "2.3.1"). from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:hasVersion + slot_uri: schema:version alias: version owner: File domain_of: @@ -31429,6 +32499,8 @@ classes: range: string was_derived_from: name: was_derived_from + description: A resource from which this resource was derived, in whole or + in part. from_schema: https://w3id.org/bridge2ai/data-sheets-schema exact_mappings: - dcterms:source @@ -31445,6 +32517,8 @@ classes: id: name: id description: A unique identifier for a thing. 
+ examples: + - value: https://example.org/dataset/my-dataset-001 from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base slot_uri: schema:identifier identifier: true @@ -31764,6 +32838,8 @@ classes: file_count: name: file_count description: Number of files in this collection. + examples: + - value: '47' from_schema: https://w3id.org/bridge2ai/data-sheets-schema/file-collection slot_uri: d4d:fileCount alias: file_count @@ -31773,7 +32849,11 @@ classes: range: integer total_bytes: name: total_bytes - description: Total size of all files in bytes. + description: Total size of all files in this collection, in bytes (integer). + Maps to dcat:byteSize. + examples: + - value: '1073741824' + description: 1 GiB (1024³ bytes) from_schema: https://w3id.org/bridge2ai/data-sheets-schema/file-collection slot_uri: dcat:byteSize alias: total_bytes @@ -31847,6 +32927,8 @@ classes: - range: FileCollection conforms_to: name: conforms_to + description: An established standard, specification, or schema to which the + resource conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema slot_uri: dcterms:conformsTo alias: conforms_to @@ -31860,8 +32942,12 @@ classes: range: string conforms_to_class: name: conforms_to_class + description: The specific class or type within a schema to which the resource + conforms. from_schema: https://w3id.org/bridge2ai/data-sheets-schema - slot_uri: dcterms:conformsTo + broad_mappings: + - dcterms:conformsTo + slot_uri: d4d:conformsToClass alias: conforms_to_class owner: FileCollection domain_of: @@ -31873,8 +32959,11 @@ classes: range: string conforms_to_schema: name: conforms_to_schema + description: The schema or data model to which the resource conforms. 
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
-        slot_uri: dcterms:conformsTo
+        broad_mappings:
+        - dcterms:conformsTo
+        slot_uri: d4d:conformsToSchema
         alias: conforms_to_schema
         owner: FileCollection
         domain_of:
@@ -31886,6 +32975,8 @@ classes:
         range: string
       created_by:
         name: created_by
+        description: The person or organization primarily responsible for creating
+          the resource.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:creator
         alias: created_by
@@ -31899,6 +32990,7 @@ classes:
         range: string
       created_on:
         name: created_on
+        description: The date and time when the resource was created.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:created
         alias: created_on
@@ -31912,9 +33004,14 @@ classes:
         range: datetime
       doi:
         name: doi
-        description: digital object identifier
+        description: Digital Object Identifier (DOI) in format 10.xxxx/xxxxx providing
+          persistent identification (e.g., '10.1038/s41586-020-2649-2', '10.5281/zenodo.1234567').
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
-        slot_uri: dcterms:identifier
+        exact_mappings:
+        - schema:identifier
+        broad_mappings:
+        - dcterms:identifier
+        slot_uri: d4d:doiIdentifier
         alias: doi
         owner: FileCollection
         domain_of:
@@ -31945,6 +33042,7 @@ classes:
         range: uri
       issued:
         name: issued
+        description: Date of formal issuance or publication of the resource.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:issued
         alias: issued
@@ -31958,6 +33056,7 @@ classes:
         range: datetime
       keywords:
         name: keywords
+        description: Keywords or tags describing the resource for discovery and classification.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcat:keyword
         alias: keywords
@@ -31972,7 +33071,7 @@ classes:
         multivalued: true
       language:
         name: language
-        description: language in which the information is expressed
+        description: Language in which the information is expressed.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         exact_mappings:
         - schema:inLanguage
@@ -31988,6 +33087,8 @@ classes:
         range: string
       last_updated_on:
         name: last_updated_on
+        description: The date and time when the resource was most recently modified
+          or updated.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:modified
         alias: last_updated_on
@@ -32001,6 +33102,8 @@ classes:
         range: datetime
       license:
         name: license
+        description: The legal license under which the resource is made available
+          (e.g., "MIT", "CC-BY-4.0").
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:license
         alias: license
@@ -32015,6 +33118,8 @@ classes:
         range: string
       modified_by:
         name: modified_by
+        description: A person or organization that contributed to modifying or updating
+          the resource.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:contributor
         alias: modified_by
@@ -32028,6 +33133,8 @@ classes:
         range: string
       page:
         name: page
+        description: A landing page or web page providing access to or information
+          about the resource.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcat:landingPage
         alias: page
@@ -32041,6 +33148,8 @@ classes:
         range: string
       publisher:
         name: publisher
+        description: The organization or entity responsible for making the resource
+          available.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:publisher
         alias: publisher
@@ -32054,8 +33163,9 @@ classes:
         range: uriorcurie
       status:
         name: status
+        description: The status of the resource (e.g., draft, published, deprecated).
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
-        slot_uri: dcterms:type
+        slot_uri: d4d:publicationStatus
         alias: status
         owner: FileCollection
         domain_of:
@@ -32067,7 +33177,7 @@ classes:
         range: string
       title:
         name: title
-        description: the official title of the element
+        description: The official title of the element.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         slot_uri: dcterms:title
         alias: title
@@ -32081,8 +33191,9 @@ classes:
         range: string
       version:
         name: version
+        description: The version identifier of the resource (e.g., "1.0", "2.3.1").
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
-        slot_uri: dcterms:hasVersion
+        slot_uri: schema:version
         alias: version
         owner: FileCollection
         domain_of:
@@ -32095,6 +33206,8 @@ classes:
         range: string
       was_derived_from:
         name: was_derived_from
+        description: A resource from which this resource was derived, in whole or
+          in part.
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema
         exact_mappings:
         - dcterms:source
@@ -32111,6 +33224,8 @@ classes:
       id:
         name: id
         description: A unique identifier for a thing.
+        examples:
+        - value: https://example.org/dataset/my-dataset-001
         from_schema: https://w3id.org/bridge2ai/data-sheets-schema/base
         slot_uri: schema:identifier
         identifier: true