diff --git a/mkdocs.yml b/mkdocs.yml index b860d6ca9e..b238763ab5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -10,6 +10,7 @@ nav: - Data summary files: modality-agnostic-files/data-summary-files.md - Phenotypic and assessment data: modality-agnostic-files/phenotypic-and-assessment-data.md - Code: modality-agnostic-files/code.md + - Provenance: modality-agnostic-files/provenance.md - Events: modality-agnostic-files/events.md - Modality specific files: - Magnetic Resonance Imaging: modality-specific-files/magnetic-resonance-imaging-data.md @@ -124,6 +125,9 @@ markdown_extensions: - name: tsvgz class: tsv format: !!python/name:bidsschematools.render.tsv.fence + - name: mermaid + class: mermaid + format: !!python/name:pymdownx.superfences.fence_code_format - admonition - pymdownx.details plugins: diff --git a/src/common-principles.md b/src/common-principles.md index e5b7240361..070b8c12e0 100644 --- a/src/common-principles.md +++ b/src/common-principles.md @@ -928,10 +928,10 @@ Bare DOIs such as `10.18112/openneuro.ds000001.v1.0.0` are [DEPRECATED][]. ### BIDS URI -To reference files in BIDS datasets, the following URI scheme may be used: +To reference files or directories in BIDS datasets, the following URI scheme may be used: ```plain -bids:[<dataset-name>]:<relative-path> +bids:[<dataset-name>]:<relative-path>[#<fragment>] ``` The scheme component `bids` identifies a BIDS URI, @@ -940,6 +940,7 @@ The `dataset-name` component is an identifier for a BIDS dataset, and the `relative-path` component is the location of a resource within that BIDS dataset, relative to the root of that dataset. The `relative-path` MUST NOT start with a forward-slash character (`/`). +The `fragment` MAY be used to identify a resource that is subordinate to the file or directory. 
Examples: @@ -947,11 +948,16 @@ Examples: bids::sub-01/fmap/sub-01_dir-AP_epi.nii.gz bids:ds000001:sub-02/anat/sub-02_T1w.nii.gz bids:myderivatives:sub-03/func/sub-03_task-rest_space-MNI152_bold.nii.gz +bids:fmriprep:sub-001/anat/sub-001_T1w_preproc.nii.gz#aa56ztg8 +bids::prov#preprocessing-00f3a18f ``` If no dataset name is specified, the URI is relative to the current BIDS dataset. This is made more precise in the next section. +!!! note + A BIDS dataset can be referenced using the BIDS URI of its root directory. For example: `bids:ds000001:.` + #### Resolution of BIDS URIs In order to resolve a BIDS URI, the dataset name must be mapped to a BIDS dataset. @@ -1003,7 +1009,7 @@ No protocol is currently proposed to automatically resolve all possible BIDS URI BIDS URIs are parsable as standard [URIs][] with scheme `bids` and path `[<dataset-name>]:<relative-path>`. -The authority, query and fragment components are unused. +The authority and query components are unused. Future versions of BIDS may specify interpretations for these components, but MUST NOT change the interpretation of a previously valid BIDS URI. For example, a future version may specify an authority that would allow BIDS diff --git a/src/introduction.md b/src/introduction.md index 2a5a377fe7..f0070de928 100644 --- a/src/introduction.md +++ b/src/introduction.md @@ -192,6 +192,15 @@ For example: Scientific Data 12, (13841). [doi:10.1038/s41597-025-05543-2](https://doi.org/10.1038/s41597-025-05543-2) +### Other extension-specific publications + +#### Provenance + +- Rémi Adon, Stefan Appelhoff, Tibor Auer, Laurent Guillo, Yaroslav O Halchenko, David Keator, Christopher J Markiewicz, Thomas E Nichols, Jean-Baptiste Poline, Satrajit Ghosh, Camille Maumet (2021). + **BIDS-prov: a provenance framework for BIDS**. + OHBM 2021 - 25th Annual Meeting of the Organization for Human Brain Mapping, Jun 2021, Online, South Korea. 
pp.1-3 + [https://inserm.hal.science/inserm-03478998v1](https://inserm.hal.science/inserm-03478998v1) + ### Research Resource Identifier (RRID) BIDS has also a diff --git a/src/metaschema.json b/src/metaschema.json index 49d0f8cab7..64d58c849f 100644 --- a/src/metaschema.json +++ b/src/metaschema.json @@ -362,6 +362,15 @@ }, "additionalProperties": false }, + "modality_agnostic": { + "type": "object", + "patternProperties": { + "^[a-zA-Z0-9_]+$": { + "$ref": "#/definitions/suffixRule" + } + }, + "additionalProperties": false + }, "tables": { "type": "object", "patternProperties": { @@ -375,7 +384,7 @@ "additionalProperties": false } }, - "required": ["core", "tables"], + "required": ["core", "modality_agnostic", "tables"], "additionalProperties": false }, "deriv": { diff --git a/src/modality-agnostic-files/dataset-description.md b/src/modality-agnostic-files/dataset-description.md index 79d75264c3..c4f45f259c 100644 --- a/src/modality-agnostic-files/dataset-description.md +++ b/src/modality-agnostic-files/dataset-description.md @@ -45,16 +45,9 @@ and a guide for using macros can be found at } ) }} -Each object in the `GeneratedBy` array includes the following REQUIRED, RECOMMENDED -and OPTIONAL keys: - - -{{ MACROS___make_subobject_table("metadata.GeneratedBy.items") }} +!!! Note + See the [Provenance of a BIDS dataset](provenance.md#provenance-of-a-bids-dataset) section + for more information on how to describe provenance using the `GeneratedBy` field. Example: @@ -107,8 +100,6 @@ Example: } ``` -### Derived dataset and pipeline description - As for any BIDS dataset, a `dataset_description.json` file MUST be found at the top level of every derived dataset: `/derivatives//dataset_description.json`. 
diff --git a/src/modality-agnostic-files/provenance.md b/src/modality-agnostic-files/provenance.md new file mode 100644 index 0000000000..243e261d0d --- /dev/null +++ b/src/modality-agnostic-files/provenance.md @@ -0,0 +1,771 @@ +# Provenance + +Support for provenance was developed as a [BIDS Extension Proposal](../extensions.md#bids-extension-proposals). +Please see [Citing BIDS](../introduction.md#citing-bids) on how to appropriately credit this extension when referring to it in the +context of the academic literature. + +!!! example "Example datasets" + + Several [example BIDS-Prov datasets](https://bids-website.readthedocs.io/en/latest/datasets/examples.html#provenance) have been formatted using this specification and can be used for practical guidance when curating a new dataset. + +This part of the BIDS specification is aimed at describing the provenance of a BIDS dataset. This description is retrospective: it describes a set of steps that were executed in order to obtain the dataset. Note: This is different from prospective provenance, which focuses on describing workflows that may be run on a dataset. This description is based on the [W3C Prov](https://www.w3.org/TR/2013/REC-prov-o-20130430/) standard (see the [Provenance from an RDF perspective](#provenance-from-a-rdf-perspective) section for more information). + +Provenance information SHOULD be included in a BIDS dataset when possible. If provenance information is included, it MUST be described using the conventions detailed by this specification. Provenance information reflects the provenance of a full dataset and/or of specific files at any level of the BIDS hierarchy. Provenance information SHOULD NOT include human subject identifying data. + +## Provenance of a BIDS file + +Provenance of a BIDS file SHOULD be stored inside its sidecar JSON. 
+ +For that purpose, any sidecar JSON file MAY include the following keys: + + +{{ MACROS___make_metadata_table( + { + "GeneratedById": "OPTIONAL", + "SidecarGeneratedBy": "OPTIONAL", + "Digest": "OPTIONAL", + "ProvEntityType": "OPTIONAL" + } +) }} + +!!! example "Example of metadata in a sidecar JSON file" + ```JSON + { + "GeneratedBy": "bids::prov#conversion-00f3a18f", + "SidecarGeneratedBy": [ + "bids::prov#preparation-conversion-1xkhm1ft", + "bids::prov#conversion-00f3a18f" + ], + "Digest": { + "sha256": "66eeafb465559148e0222d4079558a8354eb09b9efabcc47cd5b8af6eed51907" + } + } + ``` + This snippet is similar to fields described in [DICOM to NIfTI conversion with `heudiconv` example](https://github.com/bclenet/bids-examples/tree/BEP028_heudiconv/provenance_heudiconv). + +## Provenance of a BIDS dataset + +Provenance of a BIDS dataset (raw, derivative, or study) SHOULD be stored inside its `dataset_description.json` file. Corresponding metadata describes the provenance of the whole dataset. + +The `dataset_description.json` file of a **BIDS raw dataset** or **BIDS study dataset** MAY include the `GeneratedBy` key to describe provenance. + +The `dataset_description.json` file of a **BIDS derivative dataset** MUST include the `GeneratedBy` key to describe provenance. + +The `GeneratedBy` field MAY contain either of the following values: + +- Identifier(s) of the activity/activities responsible for the creation of the dataset (see the [Description using identifiers](#description-using-identifiers) section). +- A description of pipelines or processes responsible for the creation of the dataset (see the [Description of processes or pipelines](#description-of-processes-or-pipelines) section). + +### Description using identifiers + +This section details the way to describe provenance of a dataset in the `GeneratedBy` field, using identifiers. 
+ + +{{ MACROS___make_metadata_table( + { + "GeneratedById": "RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets" + } +) }} + +!!! example "Example of `GeneratedBy` contents in a `dataset_description.json`" + ```JSON + { + "GeneratedBy": "bids::prov#preprocessing-xMpFqB5q" + } + ``` + This is a snippet from the [fMRI preprocessing with `fMRIPrep` example](https://github.com/bclenet/bids-examples/tree/BEP028_fmriprep/provenance_fmriprep). + +### Description of processes or pipelines + +This section details a way to describe the provenance of a dataset, providing `GeneratedBy` with an array of objects representing pipelines or processes that generated the dataset. + +!!! note + This description can be equivalently represented using the previous section. This modeling is kept for backward-compatibility but might be removed in future BIDS releases (see BIDS 2.0). + + +{{ MACROS___make_metadata_table( + { + "GeneratedBy": "RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets" + } +) }} + +Each object in the `GeneratedBy` array includes the following REQUIRED, RECOMMENDED +and OPTIONAL keys: + + +{{ MACROS___make_metadata_table( + { + "Name__GeneratedBy": "REQUIRED", + "Version__GeneratedBy": "RECOMMENDED", + "Description__GeneratedBy": 'RECOMMENDED if `Name` is `"Manual"`, OPTIONAL otherwise', + "CodeURL": "OPTIONAL", + "Container": "OPTIONAL" + } +) }} + +!!! example "Example of `GeneratedBy` contents in a `dataset_description.json`" + ```JSON + { + "GeneratedBy": [ + { + "Name": "reproin", + "Version": "0.6.0", + "Container": { + "Type": "docker", + "Tag": "repronim/reproin:0.6.0" + } + } + ] + } + ``` + +## Provenance files + +When not inside sidecar JSON files or `dataset_description.json`, provenance information MUST be stored inside provenance files. + + +{{ MACROS___make_filename_template( + "common", + datatypes=["prov"], + suffixes=["act", "ent", "env", "soft"]) +}} + +!!! 
note + The [`prov entity`](../appendices/entities.md#prov) allows grouping related provenance files, using an arbitrary value for `