
Discussion: how to describe the nature of the content for Datasets? #629

Open
ljgarcia opened this issue Jan 24, 2023 · 4 comments
Labels
fix pending A pull request has been opened to address this issue topic: specification type: Dataset


@ljgarcia
Contributor

Some options that were mentioned in emails and community calls:

Which way would the community like to go? Please add your thoughts, pros, and cons to help us reach a community-based approach.

@ljgarcia
Contributor Author

I would go for "about": its range is Thing, which makes it possible to use Bioschemas types such as Taxon while still allowing DefinedTerms coming, for instance, from EDAM.

RO-Crate already uses both "about" and "keywords" in this way:
https://www.researchobject.org/ro-crate/1.1/contextual-entities.html#subjects--keywords
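A minimal sketch of what this option could look like in JSON-LD: "about" points to an EDAM topic expressed as a DefinedTerm, while "keywords" keeps the free-text description. The dataset name, the choice of EDAM topic_0622 (Genomics), and the "inDefinedTermSet" value are illustrative assumptions, not something agreed in this discussion.

```html
<!-- Hypothetical sketch: the name, the EDAM term and the inDefinedTermSet value are illustrative assumptions -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "name": "Example genomics dataset",
  "keywords": "genome sequence assembly, gene annotation",
  "about": {
    "@type": "DefinedTerm",
    "@id": "http://edamontology.org/topic_0622",
    "name": "Genomics",
    "termCode": "topic_0622",
    "inDefinedTermSet": "http://edamontology.org"
  }
}
</script>
```

Because the range of "about" is Thing, the same property could hold a Taxon instead of (or alongside) the DefinedTerm without changing the rest of the Dataset markup.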

@arendd

arendd commented Jan 26, 2023

@ljgarcia Thanks for opening the discussion.

In our case, the repository provides heterogeneous datasets focused on plant research data but without a specific data-domain focus, because the aim was to provide a generic platform for sharing datasets that are too large for, or out of the scope of, existing databases. We have genomic data, phenotypic images, metabolomics datasets, microscopy pictures, software, and so on. That is why the general specification is "dataset", but of course all of them are related to plants and can therefore be described with a "taxon".

I think I would prefer the solution of adding the taxon content to the "about" section, because it is clearer and the "keywords" section is already used for the general dataset description.

Here is an example:

<!-- The rest of the properties describing this dataset are omitted; JSON does not allow comments inside the script -->
<script type="application/ld+json">
{
  "@context":"http://schema.org/",
  "@type":"Dataset",
  "http://purl.org/dc/terms/conformsTo":"https://bioschemas.org/profiles/Dataset/1.0-RELEASE",
  "@id":"https://doi.ipk-gatersleben.de/DOI/b2f47dfb-47ff-4114-89ae-bad8dcc515a1/7eb2707b-d447-425c-be7a-fe3f1fae67cb/2",
  "keywords":"barley, Hordeum vulgare, genome sequence assembly, long read sequencing, gene annotation, transposable elements",
  "about": {
    "@type":"Taxon",
    "@id":"http://purl.bioontology.org/ontology/NCBITAXON/4513",
    "http://purl.org/dc/terms/conformsTo":"https://bioschemas.org/profiles/Taxon/0.6-RELEASE",
    "url":"https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=info&id=4513",
    "taxonRank":"species",
    "parentTaxon":"Hordeum",
    "http://rs.tdwg.org/dwc/terms/vernacularName":"barley"
  }
}
</script>

@frmichel
Member

Hi, I also fully agree with the "about" option; the example given by @arendd convincingly shows that it is appropriate.

@ljgarcia
Contributor Author

@gtsueng this discussion is also relevant to the "topic" and "organism" elements needed for the synthetic datasets.
We could use "about" to describe the topic/subject of the Dataset (including the organism); see also https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/subject.
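For the synthetic datasets, one hypothetical way to cover both elements with a single property is to give "about" an array: a Taxon for the organism and a DefinedTerm for the topic. The organism (Homo sapiens, NCBI Taxonomy 9606), the EDAM topic (topic_0121, Proteomics), the dataset name, and the OBO PURLs are placeholders chosen for illustration only.

```html
<!-- Hypothetical sketch: organism, topic and dataset name are placeholders, not agreed values -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "name": "Example synthetic dataset",
  "about": [
    {
      "@type": "Taxon",
      "@id": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
      "name": "Homo sapiens",
      "taxonRank": "species"
    },
    {
      "@type": "DefinedTerm",
      "@id": "http://edamontology.org/topic_0121",
      "name": "Proteomics",
      "termCode": "topic_0121",
      "inDefinedTermSet": "http://edamontology.org"
    }
  ]
}
</script>
```

A consumer could then tell the organism apart from the topic by checking the "@type" of each "about" entry, instead of relying on two separate properties.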

ljgarcia added a commit that referenced this issue Apr 12, 2024
Add about as recommended.

In Bioschemas, following what is done in RO-Crate, about should be used to specify the nature of the content of the dataset, using DefinedTerm whenever possible. For instance, if a dataset contains information about mice, the about could be ncit:10090. Ideally, this information would be part of the Bioschemas description, but currently there is no way to capture it.

This change is discussed in #629
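The recommendation to use DefinedTerm whenever possible could look like the following minimal sketch. It assumes the intended identifier is the NCBI Taxonomy entry 10090 (Mus musculus) and expresses it with its OBO PURL; the PURLs and the "inDefinedTermSet" value are illustrative choices, not part of the commit.

```html
<!-- Hypothetical sketch: assumes NCBI Taxonomy 10090 (Mus musculus); PURLs are illustrative choices -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "about": {
    "@type": "DefinedTerm",
    "@id": "http://purl.obolibrary.org/obo/NCBITaxon_10090",
    "name": "Mus musculus",
    "termCode": "10090",
    "inDefinedTermSet": "http://purl.obolibrary.org/obo/ncbitaxon.owl"
  }
}
</script>
```

Compared with a Taxon, a DefinedTerm only identifies the term; taxon-specific details such as taxonRank or parentTaxon would require the Taxon type instead.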
@gtsueng gtsueng added the fix pending A pull request has been opened to address this issue label Jul 11, 2024
4 participants