
Web2LearnEU/AISTER-Crowdsourcing-Pilot


Image: Folk painting "Portrait of a girl" by unknown - Online Museum of the traditional art of Ukraine - KROVETS, Ukraine - CC BY-NC-ND. https://www.europeana.eu/en/item/1413/KYD1791

AISTER Human-In-The-Loop Crowdsourcing Pilot

The repository contains the complete workflow, actionable code in notebooks and output files for:

  1. Automatically generating annotations (description tags) for artefacts on Europeana using AI tools (natural language processing, computer vision) and Europeana APIs, and
  2. Developing a human-in-the-loop crowdsourcing campaign on the CrowdHeritage platform to validate the annotations, also enabling participants to contribute additional user-generated annotations.

Pilot description

This case study forms a small-scale exploratory pilot within the field of Digital Humanities / Cultural Heritage Informatics, operationalising and analysing a human-in-the-loop (HITL) crowdsourcing framework for metadata enrichment in Europeana collections.

Context

The pilot is part of the AISTER project “AI-enabled Citizen Participation in University-driven Ukrainian Cultural Heritage Safeguarding”, under the Erasmus+ KA2, Higher Education sector programme. Learn more about the pilot project, the team, collaborating experts and partner institutions, and the workshops conducted: https://web2learn.eu/crowdsourcing-campaign-aister

Who can use the repository and how

The repository is designated as an open-source resource for digital humanities research, freely available for reproduction or reuse by scholars, students, and teachers. It is also intended for creative reuses.

Repository description

The pilot consists of 3 technical steps, each documented in its own folder within this repository (step_1, step_2, step_3). Each step folder includes:

  • 2 text files detailing the full technical stack and a structured report for each step.
  • 2 subfolders: notebooks and outputs. The notebooks folder contains executable code in Jupyter Notebook (.ipynb) format, while the outputs folder contains the files generated by running the notebooks (CSV and JSON).

Workflow overview

Step 1: Annotation generation from text (NLP-based)

  • Retrieve textual metadata (e.g., titles, descriptions) of artefacts from the Europeana API
  • Generate annotations (description tags) from the metadata using NLP (NER) with spaCy
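
The retrieval half of Step 1 can be sketched as follows. This is a minimal illustration rather than the code in the notebooks: it assumes the public Europeana Search API endpoint and its `wskey`, `query`, `rows`, `cursor` and `profile` parameters, and the result fields `title`, `dcDescription` and `edmIsShownBy`. The helper names `build_search_url`, `first` and `flatten_item` are hypothetical, and the notebooks may use a different endpoint.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Europeana Search API.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(api_key, query, rows=100, cursor="*"):
    """Build a Search API URL; cursor='*' starts cursor-based pagination."""
    params = {
        "wskey": api_key,
        "query": query,
        "rows": rows,
        "cursor": cursor,
        "profile": "rich",
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def first(values):
    """Return the first element of a list-valued field, or '' if absent."""
    return values[0] if values else ""


def flatten_item(item):
    """Reduce one search result to the text fields used for tagging."""
    return {
        "europeana_id": item.get("id", ""),
        "title": first(item.get("title")),
        "description": first(item.get("dcDescription")),
        "image_url": first(item.get("edmIsShownBy")),
    }
```

The flattened title and description strings are the kind of input a spaCy NER pipeline would then process to produce candidate tags.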

Step 2: Annotation generation from images (computer vision-based)

  • Download the artefacts as images
  • Generate image captions using computer vision
  • Generate annotations from the image captions

Step 3: Preparation for crowdsourcing (JSON-LD formatting)

  • Format all generated annotations based on the W3C annotation model for direct ingestion in the crowdsourcing platform
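
As an illustration of this formatting step, the sketch below builds one tag annotation using standard terms from the W3C Web Annotation Data Model (`Annotation`, `TextualBody`, `motivation`, `purpose`). It is an assumption-laden example, not the pilot's exporter: the function names, the creator name and the `http://data.europeana.eu/item/...` target URI pattern are chosen for illustration, and the platform may expect additional fields.

```python
import json
from datetime import datetime, timezone

# JSON-LD context published by the W3C for Web Annotations.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_tag_annotation(europeana_id, tag, creator_name, created=None):
    """Build one tagging annotation for an item id like '/1413/KYD1791'."""
    if created is None:
        created = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "type": "Annotation",
        "motivation": "tagging",
        "created": created,
        "creator": {"type": "Software", "name": creator_name},
        "body": {"type": "TextualBody", "value": tag, "purpose": "tagging"},
        # Assumed target URI pattern for Europeana items.
        "target": f"http://data.europeana.eu/item{europeana_id}",
    }


def wrap_graph(annotations):
    """Wrap annotations in a single JSON-LD document with @context/@graph."""
    return {"@context": ANNO_CONTEXT, "@graph": list(annotations)}


if __name__ == "__main__":
    annotation = make_tag_annotation("/1413/KYD1791", "girl", "example-model")
    print(json.dumps(wrap_graph([annotation]), indent=2, ensure_ascii=False))
```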

Applied heritage collection: The Krovets ethnographic collection

The Krovets ethnographic collection contains 3840 artefacts of Ukrainian traditional art and life spanning the 19th and 20th centuries. It is part of the Krovets Online Museum of Traditional Art of Ukraine (https://krovets.ua/en). The collection includes a variety of different artefacts, from everyday objects like utensils and clothing to folk art.
The pilot focuses on the folk paintings subset of the Krovets collection, which contains 312 folk art paintings depicting scenes from everyday rural life and religious themes. In some cases, they depict historical figures (e.g. Taras Shevchenko). The artefacts were collected as part of a private initiative to preserve Ukrainian cultural heritage. The folk art paintings can be explored as a public gallery on Europeana: https://www.europeana.eu/en/galleries/26106-ukrainian-folk-art

Data Infrastructures and Provenance

The institution providing the ethnographic collection and metadata is the Online Museum of the traditional art of Ukraine - KROVETS. The aggregator gathering the content and metadata is MUSEU. The platform for accessing the collection of 3840 artefacts and the folk art subcollection gallery of 312 artefacts is Europeana. The platform for running the crowdsourcing campaign is CrowdHeritage maintained by Datoptron.
The pilot is developed by Web2Learn, using Google Colab/Jupyter Notebook for writing executable code, GitHub for version control and code sharing, the Zenodo open repository for long-term preservation of the dataset and related digital scholarship, and Tableau for data modelling and visual analytics. In total, 5 crowdsourcing workshops and events were organised to perform human-in-the-loop tasks: 4 by Web2Learn (3 online and 1 onsite at the Library of the University of Latvia), and 1 online workshop by Young Folks and the Institute of Literature, Folklore and Art at the University of Latvia.

Technical Aspects

Step 1:

README for step 1: step_1/README.md. The krovets_folk_metadata.csv file contains metadata (such as title, description, creator and image URL) for the 312 records of folk paintings and icons, retrieved using the Europeana API. The generated krovets_folk_tags.csv file contains the NLP-generated tags (figures, objects, scenes) for each of the 312 records.

Step 2:

README for step 2: step_2/README.md. The captions.csv file contains the captions generated for each record using machine vision. The tags.csv file contains all candidate tags generated from the image captions (figures, objects, scenes, background, attire, text, damage).

Step 3:

README for step 3: step_3/README.md. Step 3 contains 6 separate CSV files: 5 that each capture data from one event, plus one combined CSV file for all events. The files contain information about each user annotation, such as its upvotes and downvotes and the resulting recommendation to accept or reject the annotation, which depends on the difference between the two.

To display the CSV files and their records correctly, we suggest using LibreOffice or OpenOffice. The recommended settings can be adjusted in the import popup window: set 'Character set' to 'Unicode (UTF-8)', select the separator options 'Tab', 'Comma' and 'Semicolon', and change the 'Column type' of Column A (Record ID) from 'Standard' to 'Text'.
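
The same settings can be approximated in Python. The sketch below, using only the standard library, reads the text as UTF-8, chooses among tab, comma and semicolon, and keeps every value (including record IDs) as text. The helper names and the header-counting heuristic are illustrative assumptions, not part of the repository.

```python
import csv
import io

# The three separators suggested for the spreadsheet import dialog.
CANDIDATE_DELIMITERS = ("\t", ",", ";")


def detect_delimiter(header_line):
    """Pick the candidate separator that occurs most often in the header."""
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def read_rows(text):
    """Parse CSV text into dicts; every value, including IDs, stays a string."""
    header = text.splitlines()[0]
    reader = csv.DictReader(io.StringIO(text), delimiter=detect_delimiter(header))
    return list(reader)


# Usage with a file from this repository (path relative to the outputs folder):
# with open("krovets_folk_metadata.csv", encoding="utf-8", newline="") as f:
#     rows = read_rows(f.read())
```

Because the csv module never converts values to numbers or dates, identifiers are preserved exactly as stored, which is the purpose of the 'Text' column type in the spreadsheet import.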

Analytical information about each CSV file:

KROVETS_FOLK_METADATA.CSV

| Column name | Description |
| --- | --- |
| europeana_id | Item's id on Europeana |
| Title | Item's title |
| Description | A description about what is depicted in the record, materials, timespan, place of origin etc. |
| ImageLink | The item's image URL |
| Creator | Creator of the depicted artifact (if they're known) |
| Subject | Ethnographical region of origin & Item's category |
| Type of Item | Type of the depicted artefact (e.g. painting, clothing) |
| Medium | Materials used for crafting the item |
| Providing Institution | All items are provided by the Online Museum of the traditional art of Ukraine - KROVETS |
| Aggregator | Items are gathered by MUSEU |
| Rights statement | Rights statement for each record |
| Creation date | Estimated timespan of the record (e.g. 20th century) |
| Places | Regions-oblasts where items originate from |
| Identifier | Record identifier on Europeana in the form of /{collection_id}/{item_id} |
| Is Part Of | All items are part of 1413_KROVETS_Museum |
| Providing Country | Country providing the record, in this case all items are from Ukraine |
| Collection Name | Same as Is Part Of |
| First time published on Europeana | Timestamp of when the item was first published on Europeana |
| Last time updated from providing institution | Timestamp of when the item was last updated on Europeana by its provider |

Metadata retrieved using the Europeana API endpoint for the specific set.
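
The Identifier format /{collection_id}/{item_id} maps directly onto the item page URLs used elsewhere in this README (for example https://www.europeana.eu/en/item/1413/KYD1791). A small sketch of that mapping, with hypothetical helper names:

```python
def split_identifier(identifier):
    """Split '/collection_id/item_id' into its two components."""
    parts = identifier.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected /collection_id/item_id, got {identifier!r}")
    return parts[0], parts[1]


def item_url(identifier, lang="en"):
    """Build the Europeana item page URL for a record identifier."""
    collection_id, item_id = split_identifier(identifier)
    return f"https://www.europeana.eu/{lang}/item/{collection_id}/{item_id}"


print(item_url("/1413/KYD1791"))
```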

KROVETS_FOLK_TAGS.CSV

| Column name | Description |
| --- | --- |
| europeana_id | Item's id on Europeana |
| Figures | Persons that can be visibly seen in each painting |
| Objects | Items such as materials that are depicted |
| Scenes | Actions, context or anything in the background that is clearly depicted |

CAPTIONS.CSV

| Column name | Description |
| --- | --- |
| image_id | Id of the image's item |
| Caption | Caption generated for each painting through machine vision |

TAGS.CSV

| Column name | Description |
| --- | --- |
| image_id | Id of the image's item |
| figures | Person(s) visible on the image |
| objects | Items depicted on the image |
| scenes | Context of the image, scenery depicted |
| background | What can be seen in the background |
| attire | Clothing used by the persons depicted |
| text | Any text depicted on the painting |
| damage | Signs of damage on the artefact |

*_UKRAINIAN-FOLKART-ANNOTATIONS.CSV

| Column name | Description |
| --- | --- |
| created | Date of creation for the annotation |
| value | Tag evaluated |
| europeana_id | Id of the item possessing the tag |
| upvotes | Amount of upvotes by users |
| downvotes | Amount of downvotes by users |
| recommendation | 'accept' if upvotes > downvotes else 'reject' (or 'unknown' if upvotes=downvotes) |
| event_number | The event in which the annotation was made (only in the combined csv file) |
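
The rule behind the recommendation column can be written out as a short function. This is a sketch of the rule as described above; the function name is hypothetical and the code is not taken from the notebooks.

```python
def recommend(upvotes, downvotes):
    """Return the recommendation implied by the vote counts."""
    if upvotes > downvotes:
        return "accept"
    if upvotes < downvotes:
        return "reject"
    return "unknown"  # tie, including the case of no votes at all


for up, down in [(5, 1), (0, 2), (3, 3)]:
    print(up, down, recommend(up, down))
```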

ANNOTATIONS JSON FILES

| Header name | Description |
| --- | --- |
| @context | Essential metadata for the json file |
| @graph | The body of the json file, holding the following info |

BODY FROM JSON FILES

| Header name | Description |
| --- | --- |
| type | 'Annotation' in this case |
| created | Date of creation + hour |
| creator | Creator of the annotation (if recorded), usually an AI model |
| confidence | Value in range 0-1 about the confidence with which the tag annotation was made |
| body | Info about the content of the actual tag in question |
| target | The item's id on Europeana |
| review | Info about the amount of downvotes and upvotes the tag received and subsequent recommendation |
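
As one way to work with this structure, the sketch below selects the entries of @graph whose confidence falls below a chosen threshold. It assumes that confidence is stored as a number directly on each annotation, as the table suggests; the function name and the sample document are invented for illustration.

```python
def annotations_below(document, threshold):
    """Return @graph entries whose numeric confidence is below threshold."""
    selected = []
    for annotation in document.get("@graph", []):
        confidence = annotation.get("confidence")
        # Skip entries without a recorded numeric confidence.
        if isinstance(confidence, (int, float)) and confidence < threshold:
            selected.append(annotation)
    return selected


# Invented sample data, shaped after the headers listed above.
sample = {
    "@context": "...",
    "@graph": [
        {"type": "Annotation", "confidence": 0.42},
        {"type": "Annotation", "confidence": 0.91},
        {"type": "Annotation"},
    ],
}
print(len(annotations_below(sample, 0.5)))
```

A loaded annotations file can be passed in the same way after `json.load`.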

Data Statistics

Step 1 and Step 2 annotation creation

Step 1 produced text-based annotations and Step 2 produced image-based annotations across 312 artefacts. After merging and deduplication, the combined set contains 3,927 annotations.

| Source | Annotations |
| --- | --- |
| Text-based annotations (Step 1) | 893 |
| Image-based annotations (Step 2, raw) | 4,581 |
| Combined annotations after merge/deduplication | 3,927 |
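
The raw counts add up to 893 + 4,581 = 5,474, so the combined set of 3,927 is 1,547 annotations smaller. The sketch below shows one plausible way such a merge can remove duplicates, by treating two tags on the same item as identical when they match after trimming and lower-casing. The actual deduplication key used in the pilot is not stated here, so this rule and the function name are assumptions.

```python
def merge_annotations(text_based, image_based):
    """Merge two lists of (europeana_id, tag) pairs, keeping first occurrences."""
    seen = set()
    merged = []
    for europeana_id, tag in [*text_based, *image_based]:
        # Assumed duplicate rule: same item, same tag ignoring case and padding.
        key = (europeana_id, tag.strip().lower())
        if key not in seen:
            seen.add(key)
            merged.append((europeana_id, tag.strip()))
    return merged


# Invented sample pairs for illustration.
text_tags = [("/1413/A", "Girl")]
image_tags = [("/1413/A", "girl "), ("/1413/A", "horse"), ("/1413/B", "girl")]
print(merge_annotations(text_tags, image_tags))
```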

Step 3 crowdsourcing

In Step 3, the annotations produced in the previous stages were presented on the crowdsourcing platform for participant validation, while expert participants could also contribute additional human annotations. The final Step 3 snapshot covers 311 artefacts and contains 5,946 annotations in total.

| Annotation type | Value |
| --- | --- |
| Software annotations | 3,917 |
| Human annotations | 2,029 |
| Total annotations | 5,946 |

Note: one artefact from Steps 1 and 2 (KYD1699) is not present in the final Step 3 snapshot, which is why the software-annotation count is 3,917 rather than 3,927.

Participants submitted 51,952 votes in total, including 48,599 upvotes and 3,353 downvotes.

| Voting metric | Value |
| --- | --- |
| Upvotes | 48,599 |
| Downvotes | 3,353 |
| Total votes | 51,952 |

The table below shows the number of annotations processed in each event, where processing refers to participant voting.

| Event | Value |
| --- | --- |
| Event 1 | 4,018 |
| Event 2 | 742 |
| Event 3 | 1,069 |
| Event 4 | 117 |
| Total processed annotations | 5,946 |
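
The totals reported in this section can be cross-checked with a few lines of arithmetic. The figures below are copied from the tables above; the variable names are illustrative.

```python
# Figures copied from the Step 3 tables in this README.
software, human, total_annotations = 3_917, 2_029, 5_946
upvotes, downvotes, total_votes = 48_599, 3_353, 51_952
per_event = {"Event 1": 4_018, "Event 2": 742, "Event 3": 1_069, "Event 4": 117}

checks = {
    "annotation types": software + human == total_annotations,
    "votes": upvotes + downvotes == total_votes,
    "events": sum(per_event.values()) == total_annotations,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```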

License

This data repository is released under the Apache 2.0 licence. The Krovets ethnographic collection is published under a CC BY-NC-ND 4.0 licence. All accompanying metadata are released into the public domain using CC0, to be freely copied, modified, distributed and reused.

Code of Conduct

This GitHub repository of Web2Learn follows the Contributor Covenant to be transparent and open, welcoming all people to engage and contribute, and pledging in return to value them. To make our open communities welcoming, diverse, and inclusive, we are encouraging the adoption of a mindful code of conduct to express and share those values. Any unacceptable behaviour, such as trolling, insulting/derogatory comments, or personal or political attacks, will not be tolerated. The Contributor Covenant is released under the Creative Commons Attribution 4.0 International Public License.

Project attribution

Web2Learn pilot team:
Mariana Ziku, Lead Researcher
Andreas Kouzelis, Information Systems Engineer
Andreas Darsaklis, IT trainee
Stefania Oikonomou, Research collaborator
Katerina Zourou, Director

Collaborators:
Yevgen Dmytruk, Director, Krovets Online-museum of Traditional Art of Ukraine
Eirini Kaldeli, Co-Founder & Software Engineer Datoptron, and CrowdHeritage
Hugo Manguinhas, Head of Engineering, Europeana Foundation

Workshop facilitators:
Sanita Reinsone, Associate Professor, Faculty of Humanities, University of Latvia
Lyudmyla Kruhlenko, Associate Professor, Pryazovskyi State Technical University
Anna Shilinh, Scientist and Assistant, Lviv Polytechnic National University
Olha Hapii, Stand with Ukraine Foundation
Alina Tsurkalenko
Ilze Ļaksa-Timinska, Researcher, Institute of Literature, Folklore and Art, University of Latvia
Uldis Zarinš, Deputy State Secretary, Ministry of Culture, Latvia
Konstantine Gagnidze, Senior Project Manager, Young Folks LV

Cite this dataset

When referring to or using the data repository in research publications and documentation, cite the dataset using its digital object identifier (DOI) on Zenodo. Citing the dataset from the HITL crowdsourcing pilot creates a mapping of attribution that supports future efforts to release other datasets. It also reduces the amount of "orphaned data," helping to retain source links.
Cite the repository as: Ziku, M., Kouzelis, A., Darsaklis, A., Oikonomou, S., & Zourou, K. (2026). Human-in-the-Loop Crowdsourced Annotation Dataset for Ukrainian Folk Art with Reproducible Jupyter Notebooks [Data set]. AISTER. https://doi.org/10.5281/zenodo.19475309

Acknowledgements

This README file adopts the structure of the KU Leuven Libraries Git-based dataset documentation, i.e., https://github.com/KU-Leuven-Libraries/Portraits-Collection-Dataset. See: KU Leuven Libraries, Digitisation Department. (2019). The Portraits Collection Dataset of KU Leuven Libraries, Special Collections (Version 01-beta2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3460785.

The documentation of the Jupyter Notebooks follows the criteria of quality assessment for Jupyter projects by GLAM institutions, as published in Candela, G., Chambers, S., & Sherratt, T. (2023). An approach to assess the quality of Jupyter projects published by GLAM institutions. Journal of the Association for Information Science and Technology, 74(13), 1550–1564. https://doi.org/10.1002/asi.24835

Disclaimer

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

About

Pilot implementing and analysing an experimental digital humanities workflow that combines AI tools (natural language processing, machine vision) and human-in-the-loop processes to enhance the accessibility and discoverability of Ukrainian ethnographic heritage by improving the quality of descriptive metadata.
