
The repository contains the complete workflow, actionable code in notebooks and output files for:
- Automatically generating annotations (description tags) for artefacts on Europeana using AI tools (natural language processing, computer vision) and Europeana APIs, and
- Developing a human-in-the-loop crowdsourcing campaign on the CrowdHeritage platform to validate the annotations, also enabling participants to contribute additional user-generated annotations.
This case study forms a small-scale exploratory pilot within the field of Digital Humanities / Cultural Heritage Informatics, operationalising and analysing a human-in-the-loop (HITL) crowdsourcing framework for metadata enrichment in Europeana collections.
The pilot is part of the AISTER project “AI-enabled Citizen Participation in University-driven Ukrainian Cultural Heritage Safeguarding”, under the Erasmus+ KA2, Higher Education sector programme. Learn more about the pilot project, the team, collaborating experts and partner institutions, and the workshops conducted: https://web2learn.eu/crowdsourcing-campaign-aister
The repository is intended as an open-source resource for digital humanities research, freely available for reproduction or reuse by scholars, students, and teachers. It is also intended for creative reuse.
The pilot consists of 3 technical steps, each documented in its respective folder within this repository (step_1, step_2, step_3). Each step folder includes:
- 2 text files detailing the full technical stack and a structured report for each step.
- 2 subfolders: notebooks and outputs. The notebooks folder contains executable code in Jupyter Notebook (.ipynb) format, while the outputs folder includes the files generated by running the notebooks (CSV and JSON).
Step 1: Annotation generation from text (NLP-based)
- Retrieve textual metadata (e.g., titles, descriptions) of artefacts from the Europeana API
- Generate annotations (description tags) from the metadata using NLP (NER) with spaCy
Step 2: Annotation generation from images (computer vision-based)
- Download the artefacts as images
- Generate image captions using computer vision
- Generate annotations from the image captions
Step 3: Preparation for crowdsourcing (JSON-LD formatting)
- Format all generated annotations based on the W3C annotation model for direct ingestion in the crowdsourcing platform
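
The formatting step in Step 3 can be illustrated with a minimal sketch that wraps a tag in a structure modelled on the W3C Web Annotation model. This is not the notebook's actual code: the function names, the example identifier, the `creator` layout and the assumption that the target is a `http://data.europeana.eu/item/...` URI are all hypothetical, and `confidence` is not part of the core W3C model but mirrors the field described for the JSON outputs in this README.

```python
import json
from datetime import datetime, timezone

# JSON-LD context published by the W3C for Web Annotations.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_tag_annotation(europeana_id, tag, creator_name, confidence, created=None):
    """Build one tag annotation as a plain dict (illustrative field layout)."""
    if created is None:
        created = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "type": "Annotation",
        "created": created,
        "creator": {"type": "Software", "name": creator_name},
        "confidence": confidence,  # assumed extension field, see lead-in
        "body": {"type": "TextualBody", "value": tag, "purpose": "tagging"},
        # Assumption: europeana_id has the form /{collection_id}/{item_id}.
        "target": f"http://data.europeana.eu/item{europeana_id}",
    }


def wrap_annotations(annotations):
    """Wrap annotations in a document with @context and @graph."""
    return {"@context": ANNO_CONTEXT, "@graph": list(annotations)}


# Hypothetical record id, tag and model name, used only for demonstration.
doc = wrap_annotations([
    make_tag_annotation(
        "/0000/example_item",
        "embroidered shirt",
        "example-model",
        0.87,
        created="2025-01-01T00:00:00+00:00",
    )
])
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

The exact JSON-LD profile expected by the crowdsourcing platform may differ; the step_3 notebooks remain the reference.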
The Krovets ethnographic collection contains 3840 artefacts of Ukrainian traditional art and life spanning the 19th and 20th centuries. It is part of the Krovets Online Museum of Traditional Art of Ukraine (https://krovets.ua/en). The collection includes a variety of different artefacts, from everyday objects like utensils and clothing to folk art.
The pilot focuses on the folk paintings subset of the Krovets collection, which contains 312 folk art paintings depicting scenes from everyday rural life and religious themes. In some cases, they depict historical figures (e.g. Taras Shevchenko). The artefacts were collected as part of a private initiative to preserve Ukrainian cultural heritage.
The folk art paintings can be explored as a public gallery on Europeana: https://www.europeana.eu/en/galleries/26106-ukrainian-folk-art
The institution providing the ethnographic collection and metadata is the Online Museum of the traditional art of Ukraine - KROVETS. The aggregator gathering the content and metadata is MUSEU. The platform for accessing the collection of 3840 artefacts and the folk art subcollection gallery of 312 artefacts is Europeana. The platform for running the crowdsourcing campaign is CrowdHeritage maintained by Datoptron.
The pilot is developed by Web2Learn, using Google Colab/Jupyter Notebook for writing executable code, GitHub for version control and code sharing, Zenodo open repository for long-term preservation of the dataset and related digital scholarship, and Tableau for data modelling and visual analytics.
In total, 5 crowdsourcing workshops and events were organised to perform human-in-the-loop tasks: 4 by Web2Learn (3 online and 1 onsite at the Library of the University of Latvia), and 1 online workshop by Young Folks and the Institute of Literature, Folklore and Art at the University of Latvia.
README for step 1: step_1/README.md
The krovets_folk_metadata.csv file contains metadata (such as title, description, creator and image URL) for the 312 folk painting and icon records, retrieved using the Europeana API. The generated krovets_folk_tags.csv file contains the NLP-generated tags (figures, objects, scenes) for each of the 312 records.
README for step 2: step_2/README.md
The captions.csv file contains captions generated for each record using machine vision. The tags.csv file contains all candidate tags generated from the image caption (figures, objects, scenes, background, attire, text, damage).
README for step 3: step_3/README.md
In step 3, there are 6 separate CSV files: 5 files each capturing data from one event, plus one combined CSV file for all events. The files contain information about each user annotation, such as its upvotes and downvotes and its eventual approval or rejection, which depends on the difference between the two.
To display the CSV files and their records correctly, we suggest opening them in LibreOffice or OpenOffice. Adjust the following settings in the import dialog: set 'Character set' to 'Unicode (UTF-8)', select the separator options 'Tab', 'Comma' and 'Semicolon', and change the 'Column type' of Column A (Record ID) from 'Standard' to 'Text'.
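
The same import settings can be approximated in Python with the standard library. The sketch below reads the text as UTF-8, picks the delimiter among tab, comma and semicolon by counting occurrences in the header line, and keeps every value as a string so that identifiers are not reinterpreted as numbers. The helper names and the sample row are hypothetical.

```python
import csv
import io


def detect_delimiter(header_line, candidates=("\t", ",", ";")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def read_records(handle):
    """Read all rows from an open text stream; every value stays a string."""
    header = handle.readline()
    handle.seek(0)
    reader = csv.DictReader(handle, delimiter=detect_delimiter(header))
    return list(reader)


# In practice, open a real file with:
#   open(path, encoding="utf-8", newline="")
# The in-memory sample below is made up for demonstration.
demo = io.StringIO("europeana_id;Title\n/0000/000123;Приклад\n")
rows = read_records(demo)
print(rows[0]["europeana_id"], rows[0]["Title"])
```

Because the `csv` module never converts values, leading zeros and long identifiers survive unchanged, which is the purpose of setting the column type to 'Text' in the spreadsheet import.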
KROVETS_FOLK_METADATA.CSV
| Column name | Description |
|---|---|
| europeana_id | Item's id on Europeana |
| Title | Item's title |
| Description | A description of what is depicted in the record, materials, timespan, place of origin etc. |
| ImageLink | The item's image URL |
| Creator | Creator of the depicted artifact (if they're known) |
| Subject | Ethnographical region of origin & Item's category |
| Type of Item | Type of the depicted artefact (e.g. painting, clothing) |
| Medium | Materials used for crafting the item |
| Providing Institution | All items are provided by the Online Museum of the traditional art of Ukraine - KROVETS |
| Aggregator | Items are gathered by MUSEU |
| Rights statement | Rights statement for each record |
| Creation date | Estimated timespan of the record (e.g. 20th century) |
| Places | Regions-oblasts where items originate from |
| Identifier | Record identifier on Europeana in the form of /{collection_id}/{item_id} |
| Is Part Of | All items are part of 1413_KROVETS_Museum |
| Providing Country | Country providing the record, in this case all items are from Ukraine |
| Collection Name | Same as Is Part Of |
| First time published on Europeana | Timestamp of when the item was first published on Europeana |
| Last time updated from providing institution | Timestamp of when the item was last updated on Europeana by its provider |
Metadata retrieved using the Europeana API endpoint for the specific set.
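
As a rough illustration of such a retrieval, the sketch below only builds a request URL for the Europeana Search API and does not send it. It is an assumption that the Search API and a `europeana_collectionName` query are suitable here; the step_1 notebook may use a different endpoint (for example one addressing the gallery set directly). `YOUR_API_KEY` is a placeholder and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Europeana Search API endpoint (assumed; the notebook may use another one).
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(api_key, query, rows=100, cursor="*"):
    """Return a Search API URL using cursor-based pagination."""
    params = {
        "wskey": api_key,   # API key parameter
        "query": query,
        "rows": rows,       # page size
        "cursor": cursor,   # "*" requests the first page
        "profile": "rich",  # ask for the fuller metadata profile
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# The collection name comes from the "Is Part Of" column described above.
url = build_search_url(
    "YOUR_API_KEY",
    'europeana_collectionName:"1413_KROVETS_Museum"',
)
print(url)
print(parse_qs(urlsplit(url).query)["query"][0])
```

A query on the collection name would match the whole collection rather than only the folk paintings subset, so further filtering would still be needed.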
KROVETS_FOLK_TAGS.CSV
| Column name | Description |
|---|---|
| europeana_id | Item's id on Europeana |
| Figures | Persons visible in each painting |
| Objects | Items such as materials that are depicted |
| Scenes | Actions, context or anything in the background that is clearly depicted |
CAPTIONS.CSV
| Column name | Description |
|---|---|
| image_id | Id of the image's item |
| Caption | Caption generated for each painting through machine vision |
TAGS.CSV
| Column name | Description |
|---|---|
| image_id | Id of the image's item |
| figures | Person(s) visible on the image |
| objects | Items depicted on the image |
| scenes | Context of the image, scenery depicted |
| background | What can be seen in the background |
| attire | Clothing used by the persons depicted |
| text | Any text depicted on the painting |
| damage | Signs of damage on the artefact |
*_UKRAINIAN-FOLKART-ANNOTATIONS.CSV
| Column name | Description |
|---|---|
| created | Date of creation for the annotation |
| value | Tag evaluated |
| europeana_id | Id of the item possessing the tag |
| upvotes | Number of upvotes by users |
| downvotes | Number of downvotes by users |
| recommendation | 'accept' if upvotes > downvotes else 'reject' (or 'unknown' if upvotes=downvotes) |
| event_number | The event in which the annotation was made (only in the combined csv file) |
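
The rule in the `recommendation` column can be written as a small function. This is a sketch of the rule as stated in the table, not the platform's own code, and the function name is hypothetical.

```python
def recommend(upvotes: int, downvotes: int) -> str:
    """Map vote counts to 'accept', 'reject' or 'unknown' (tie)."""
    if upvotes > downvotes:
        return "accept"
    if upvotes < downvotes:
        return "reject"
    return "unknown"


# Made-up vote counts for demonstration.
for up, down in [(5, 1), (0, 3), (2, 2)]:
    print(up, down, recommend(up, down))
```

An annotation with no votes at all also falls into the tie case and is labelled 'unknown' under this rule.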
ANNOTATIONS JSON FILES
| Header name | Description |
|---|---|
| @context | Essential metadata for the json file |
| @graph | The body of the json file, holding the following info |
BODY FROM JSON FILES
| Header name | Description |
|---|---|
| type | 'Annotation' in this case |
| created | Date and time of creation |
| creator | Creator of the annotation (if recorded), usually an AI model |
| confidence | Value in range 0-1 about the confidence with which the tag annotation was made |
| body | Info about the content of the actual tag in question |
| target | The item's id on Europeana |
| review | Number of downvotes and upvotes the tag received and the resulting recommendation |
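
A file with this layout could be read roughly as follows. The sample document is made up, and the key names inside `review` (`upvotes`, `downvotes`, `recommendation`) and the plain-string `creator` and `target` values are assumptions; check them against the actual JSON files before reusing the sketch.

```python
import json
from collections import Counter

# Made-up sample that mimics the @context / @graph layout described above.
sample = """
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "@graph": [
    {"type": "Annotation", "creator": "example-model", "confidence": 0.91,
     "body": {"value": "horse"}, "target": "/0000/example_a",
     "review": {"upvotes": 5, "downvotes": 1, "recommendation": "accept"}},
    {"type": "Annotation", "creator": "example-model", "confidence": 0.42,
     "body": {"value": "ship"}, "target": "/0000/example_b",
     "review": {"upvotes": 0, "downvotes": 3, "recommendation": "reject"}}
  ]
}
"""

doc = json.loads(sample)

# Count annotations per recommendation; entries without a review block are
# labelled "unreviewed" so they are not confused with ties.
counts = Counter(
    anno.get("review", {}).get("recommendation", "unreviewed")
    for anno in doc["@graph"]
)

# Collect (target, tag) pairs for the accepted annotations.
accepted = [
    (anno["target"], anno.get("body", {}).get("value"))
    for anno in doc["@graph"]
    if anno.get("review", {}).get("recommendation") == "accept"
]

print(dict(counts))
print(accepted)
```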
Step 1 produced text-based annotations and Step 2 produced image-based annotations across 312 artefacts. After merging and deduplication, the combined set contains 3,927 annotations.
| Source | Annotations |
|---|---|
| Text-based annotations (Step 1) | 893 |
| Image-based annotations (Step 2, raw) | 4,581 |
| Combined annotations after merge/deduplication | 3,927 |
In Step 3, the annotations produced in the previous stages were presented on the crowdsourcing platform for participant validation, while expert participants could also contribute additional human annotations. The final Step 3 snapshot covers 311 artefacts and contains 5,946 annotations in total.
| Annotation type | Value |
|---|---|
| Software annotations | 3,917 |
| Human annotations | 2,029 |
| Total annotations | 5,946 |
Note: one artefact from Steps 1 and 2 (KYD1699) is not present in the final Step 3 snapshot, which is why the software-annotation count is 3,917 rather than 3,927.
Participants submitted 51,952 votes in total, including 48,599 upvotes and 3,353 downvotes.
| Voting metric | Value |
|---|---|
| Upvotes | 48,599 |
| Downvotes | 3,353 |
| Total votes | 51,952 |
The table below shows the number of annotations processed in each event, where processing refers to participant voting.
| Event | Value |
|---|---|
| Event 1 | 4,018 |
| Event 2 | 742 |
| Event 3 | 1,069 |
| Event 4 | 117 |
| Total processed annotations | 5,946 |
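
The figures in the tables above can be cross-checked with a few lines of arithmetic. The numbers are copied from this README; the variable names are illustrative.

```python
# Figures copied from the tables in this README.
text_based, image_based_raw, combined = 893, 4581, 3927
software, human, total_annotations = 3917, 2029, 5946
upvotes, downvotes, total_votes = 48599, 3353, 51952
event_counts = {"Event 1": 4018, "Event 2": 742, "Event 3": 1069, "Event 4": 117}

# Totals should agree across tables.
assert software + human == total_annotations
assert upvotes + downvotes == total_votes
assert sum(event_counts.values()) == total_annotations

# Derived values: annotations dropped by merge/deduplication, and vote shares.
removed_by_merge = text_based + image_based_raw - combined
upvote_share = upvotes / total_votes
human_share = human / total_annotations

print(f"Removed by merge/deduplication: {removed_by_merge}")
print(f"Share of upvotes: {upvote_share:.1%}")
print(f"Share of human annotations: {human_share:.1%}")
```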
This data repository is released under the Apache 2.0 licence. The Krovets ethnographic collection is published under a CC BY-NC-ND 4.0 licence. All accompanying metadata are released into the public domain using CC0, to be freely copied, modified, distributed and reused.
This GitHub repository of Web2Learn follows the Contributor Covenant to be transparent and open, welcoming all people to engage and contribute, and pledging in return to value them. To make our open communities welcoming, diverse, and inclusive, we are encouraging the adoption of a mindful code of conduct to express and share those values. Any unacceptable behaviour, such as trolling, insulting/derogatory comments, or personal or political attacks, will not be tolerated. The Contributor Covenant is released under the Creative Commons Attribution 4.0 International Public License.
Web2Learn pilot team:
Mariana Ziku, Lead Researcher
Andreas Kouzelis, Information Systems Engineer
Andreas Darsaklis, IT trainee
Stefania Oikonomou, Research collaborator
Katerina Zourou, Director
Collaborators:
Yevgen Dmytruk, Director, Krovets Online-museum of Traditional Art of Ukraine
Eirini Kaldeli, Co-Founder & Software Engineer, Datoptron and CrowdHeritage
Hugo Manguinhas, Head of Engineering, Europeana Foundation
Workshop facilitators:
Sanita Reinsone, Associate Professor, Faculty of Humanities, University of Latvia
Lyudmyla Kruhlenko, Associate Professor, Pryazovskyi State Technical University
Anna Shilinh, Scientist and Assistant, Lviv Polytechnic National University
Olha Hapii, Stand with Ukraine Foundation
Alina Tsurkalenko
Ilze Ļaksa-Timinska, Researcher, Institute of Literature, Folklore and Art, University of Latvia
Uldis Zarinš, Deputy State Secretary, Ministry of Culture, Latvia
Konstantine Gagnidze, Senior Project Manager, Young Folks LV
When referring to or using the data repository in research publications and documentation, cite the dataset using its digital object identifier (DOI) on Zenodo. Citing the dataset from the HITL crowdsourcing pilot creates a mapping of attribution that supports future efforts to release other datasets. It also reduces the amount of "orphaned data," helping to retain source links.
Cite the repository as:
Ziku, M., Kouzelis, A., Darsaklis, A., Oikonomou, S., & Zourou, K. (2026). Human-in-the-Loop Crowdsourced Annotation Dataset for Ukrainian Folk Art with Reproducible Jupyter Notebooks [Data set]. AISTER. https://doi.org/10.5281/zenodo.19475309
This README file adopts the structure of the KU Leuven Libraries Git-based dataset documentation, i.e., https://github.com/KU-Leuven-Libraries/Portraits-Collection-Dataset. See: KU Leuven Libraries, Digitisation Department. (2019). The Portraits Collection Dataset of KU Leuven Libraries, Special Collections (Version 01-beta2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3460785.
The documentation of the Jupyter Notebooks follows the criteria of quality assessment for Jupyter projects by GLAM institutions, as published in Candela, G., Chambers, S., & Sherratt, T. (2023). An approach to assess the quality of Jupyter projects published by GLAM institutions. Journal of the Association for Information Science and Technology, 74(13), 1550–1564. https://doi.org/10.1002/asi.24835
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.