- Marco Nolden (German Cancer Research Center, Helmholtz Metadata Collaboration, Germany)
- Andrey Fedorov (Brigham and Women’s Hospital, USA)
- Paolo Zaffino (Magna Graecia University of Catanzaro, Italy)
- Maria Francesca Spadea (Institute of Biomedical Engineering, KIT - Karlsruher Institut für Technologie, Germany)
“Metadata is a love note to the future”
- Jason Scott
The Helmholtz Metadata Collaboration (HMC) is a cross-domain initiative spanning the whole Helmholtz Association, Germany's largest research organization. It aims to develop and establish novel methods and tools for documenting and sharing research data by means of enriched metadata, and to improve the interoperability of data across disciplines. The Hub Health of this initiative is anchored in the Division of Medical Image Computing at the German Cancer Research Center Heidelberg.
The FAIR principles are guidelines to make your data, including software, findable, accessible, interoperable and reusable. They are an important component of Open Science.
NCI Imaging Data Commons (IDC) is tasked with establishing a publicly available repository of cancer imaging data. In this role it is developing workflows to harmonize image and image-derived data representation into DICOM, make metadata searchable, and connect imaging metadata with clinical metadata. IDC's work may therefore be helpful to the HMC project. We will explore this connection this week!
We will investigate relevant metadata descriptions of medical images, cohorts, and medical image analysis pipelines and results such as machine learning models.
We will also look at generating, reviewing, and sharing metadata for research data that contain personally identifiable information.
Common standards, tools and practices can make interoperability much easier. Within this project we want to investigate which tools are already used in our community, which lessons were already learned, and perform experiments regarding interoperability of data and analysis pipelines as well as analysis results.
- Objective A. Create an overview of existing tools and standards.
- Objective B. Identify challenges.
- Objective C. Perform interoperability experiments.
- Have a walkthrough of the IDC project and tech stack - starting from this introductory tutorial series in IDC:
- Discuss best practices of data sharing with project attendees.
- Marco completed the IDC getting started tutorial.
- Set up a cloud project for experimentation: Andrey added Marco to a project that has billing set up.
- Explored BigQuery for querying IDC data and exporting metadata to JSON for exploration outside of IDC.
- Met with Paolo Zaffino and Maria Francesca Spadea to discuss recommended practices for data sharing (representation, repositories, issues related to de-identification).
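The BigQuery exploration mentioned above could look roughly like the following. This is a minimal sketch, not the queries actually run during the week. It assumes the `google-cloud-bigquery` package, IDC's public table `bigquery-public-data.idc_current.dicom_all`, and the column names shown; the function names and the choice of columns are purely illustrative.

```python
import json
from typing import Any, Iterable, Mapping

# Assumption: IDC's public BigQuery table; this page does not name it.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_series_query(limit: int = 100) -> str:
    """Build SQL listing distinct series for one collection.

    The collection is passed as the named query parameter @collection_id.
    """
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT DISTINCT collection_id, PatientID, StudyInstanceUID, "
        "SeriesInstanceUID, Modality\n"
        f"FROM `{IDC_TABLE}`\n"
        "WHERE collection_id = @collection_id\n"
        f"LIMIT {limit}"
    )


def rows_to_json(rows: Iterable[Mapping[str, Any]]) -> str:
    """Serialize row mappings to a JSON array for use outside of IDC."""
    # default=str covers values such as dates that json cannot encode natively.
    return json.dumps([dict(r) for r in rows], indent=2, sort_keys=True, default=str)


def fetch_series_metadata(project_id: str, collection_id: str, limit: int = 100) -> str:
    """Run the query in a billing-enabled cloud project and return JSON text."""
    # Imported lazily so the helpers above work without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("collection_id", "STRING", collection_id)
        ]
    )
    result = client.query(build_series_query(limit), job_config=job_config).result()
    return rows_to_json(dict(row.items()) for row in result)
```

Calling `fetch_series_metadata("my-cloud-project", "nsclc_radiomics")` would then return JSON text that can be written to a file; both argument values are placeholders, and the call requires Google Cloud credentials and a project with billing enabled.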
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).
Bridge, C.P., Gorman, C., Pieper, S. et al. Highdicom: a Python Library for Standardized Encoding of Image Annotations and Machine Learning Model Outputs in Pathology and Radiology. J Digit Imaging 35, 1719–1737 (2022).
Deepa Krishnaswamy, Dennis Bontempi, David Clunie, Hugo Aerts, & Andrey Fedorov. (2023). AI-derived annotations for the NLST and NSCLC-Radiomics computed tomography imaging collections [Data set]. Zenodo.
Zaffino P, Marzullo A, Moccia S, Calimeri F, De Momi E, Bertucci B, Arcuri PP, Spadea MF. An Open-Source COVID-19 CT Dataset with Automatic Lung Tissue Classification for Radiomics. Bioengineering. 2021; 8(2):26.