AISTER Human-In-The-Loop Crowdsourcing Pilot

Image: Folk painting "Portrait of a girl" by unknown - Online Museum of the traditional art of Ukraine - KROVETS, Ukraine - CC BY-NC-ND. https://www.europeana.eu/en/item/1413/KYD1791

The repository contains the complete workflow, actionable code in notebooks and output files for:

Automatically generating annotations (description tags) for artefacts on Europeana using AI tools (natural language processing, computer vision) and Europeana APIs, and
Developing a human-in-the-loop crowdsourcing campaign on the CrowdHeritage platform to validate the annotations, also enabling participants to contribute additional user-generated annotations.

Pilot description

This case study forms a small-scale exploratory pilot within the field of Digital Humanities / Cultural Heritage Informatics, operationalising and analysing a human-in-the-loop (HITL) crowdsourcing framework for metadata enrichment in Europeana collections.

Context

The pilot is part of the AISTER project “AI-enabled Citizen Participation in University-driven Ukrainian Cultural Heritage Safeguarding”, under the Erasmus+ KA2, Higher Education sector programme. Learn more about the pilot project, the team, collaborating experts and partner institutions, and the workshops conducted: https://web2learn.eu/crowdsourcing-campaign-aister

Who can use the repository and how

The repository is designated as an open-source resource for digital humanities research, freely available for reproduction or reuse by scholars, students, and teachers. It is also intended for creative reuses.

Repository description

The pilot consists of 3 technical steps, each documented in its respective folder within this repository (step_1, step_2, step_3). Each step folder includes:

2 text files detailing the full technical stack and a structured report for each step.
2 subfolders: notebooks and outputs. The notebooks folder contains executable code in Jupyter Notebook (.ipynb) format, while the outputs folder includes the files generated by running the notebooks (CSV and JSON).

Workflow overview

Step 1: Annotation generation from text (NLP-based)

Retrieve textual metadata (e.g., titles, descriptions) of artefacts from the Europeana API
Generate annotations (description tags) from the metadata using NLP (NER) with spaCy

Step 2: Annotation generation from images (computer vision-based)

Download the artefacts as images
Generate image captions using computer vision
Generate annotations from the image captions

Step 3: Preparation for crowdsourcing (JSON-LD formatting)

Format all generated annotations based on the W3C annotation model for direct ingestion in the crowdsourcing platform
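
The formatting step can be sketched as follows. This is a minimal illustration, not the repository's notebook code: it assumes the vocabulary of the W3C Web Annotation Data Model (`TextualBody`, `purpose: tagging`, the `anno.jsonld` context) and the field names listed later in this README. The helper names, the creator label and the use of the Europeana item page URL as the target are hypothetical, and the exact shape expected by CrowdHeritage may differ.

```python
import json
from datetime import datetime, timezone

# Context URL of the W3C Web Annotation Data Model.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_annotation(target, tag, creator_name, confidence, created=None):
    """Return one tag annotation as a dict (hypothetical helper)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in the range 0-1")
    if created is None:
        created = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "type": "Annotation",
        "created": created,
        "creator": {"type": "Software", "name": creator_name},
        # 'confidence' is a field named in this README, not a W3C term.
        "confidence": confidence,
        "body": {"type": "TextualBody", "value": tag, "purpose": "tagging"},
        "target": target,
    }


def to_jsonld(annotations):
    """Wrap annotations in a JSON-LD document with @context and @graph."""
    document = {"@context": ANNO_CONTEXT, "@graph": list(annotations)}
    return json.dumps(document, ensure_ascii=False, indent=2)


example = make_annotation(
    target="https://www.europeana.eu/en/item/1413/KYD1791",
    tag="girl",
    creator_name="example-model",  # placeholder, not a model used in the pilot
    confidence=0.9,
    created="2025-01-01T00:00:00+00:00",
)
print(to_jsonld([example]))
```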

Applied heritage collection: The Krovets ethnographic collection

The Krovets ethnographic collection contains 3840 artefacts of Ukrainian traditional art and life spanning the 19th and 20th centuries. It is part of the Krovets Online Museum of Traditional Art of Ukraine (https://krovets.ua/en). The collection includes a variety of different artefacts, from everyday objects like utensils and clothing to folk art.
The pilot focuses on the folk paintings subset of the Krovets collection, which contains 312 folk art paintings depicting scenes from everyday rural life and religious themes. In some cases, they depict historical figures (e.g. Taras Shevchenko). The artefacts were collected as part of a private initiative to preserve Ukrainian cultural heritage. The folk art paintings can be explored as a public gallery on Europeana: https://www.europeana.eu/en/galleries/26106-ukrainian-folk-art

Data Infrastructures and Provenance

The institution providing the ethnographic collection and metadata is the Online Museum of the traditional art of Ukraine - KROVETS. The aggregator gathering the content and metadata is MUSEU. The platform for accessing the collection of 3840 artefacts and the folk art subcollection gallery of 312 artefacts is Europeana. The platform for running the crowdsourcing campaign is CrowdHeritage maintained by Datoptron.
The pilot is developed by Web2Learn, using Google Colab/Jupyter Notebook for writing executable code, GitHub for version control and code sharing, the Zenodo open repository for long-term preservation of the dataset and related digital scholarship, and Tableau for data modelling and visual analytics. In total, 5 crowdsourcing workshops and events were organised to perform human-in-the-loop tasks: 4 were organised by Web2Learn (3 online and 1 onsite at the Library of the University of Latvia), and 1 online workshop was organised by Young Folks and the Institute of Literature, Folklore and Art at the University of Latvia.

Technical Aspects

Step 1:

README for step 1: step_1/README.md
The krovets_folk_metadata.csv file contains metadata (such as title, description, creator and image URL) for the 312 records of folk paintings and icons, retrieved using the Europeana API. The generated krovets_folk_tags.csv file contains the NLP-generated tags (figures, objects, scenes) for each of the 312 records.

Step 2:

README for step 2: step_2/README.md
The captions.csv file contains captions generated for each record using machine vision. The tags.csv file contains all candidate tags generated from the image caption (figures, objects, scenes, background, attire, text, damage).

Step 3:

README for step 3: step_3/README.md
Step 3 contains 6 separate CSV files: 5 capturing data from an event each, plus 1 combined CSV file for all events. The files record information about each annotation, such as its upvotes and downvotes and the resulting recommendation to accept or reject it, which depends on the difference between the two counts.

To display the CSV files and their records correctly, we suggest using LibreOffice or OpenOffice. The recommended settings can be adjusted in the import dialog: set 'Character set' to 'Unicode (UTF-8)', select the separator options 'Tab', 'Comma' and 'Semicolon', and change the 'Column type' of Column A (Record ID) from 'Standard' to 'Text'.
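
The same import settings can be approximated in code. The sketch below uses only the Python standard library: it decodes the file as UTF-8, guesses the separator among tab, comma and semicolon by counting occurrences in the header line (a simple heuristic chosen for this example), and keeps every cell as text so that record IDs are never reinterpreted as numbers. The function names and the sample data are hypothetical.

```python
import csv
import io


def detect_delimiter(header_line, candidates=("\t", ",", ";")):
    """Pick the candidate separator that occurs most often in the header."""
    return max(candidates, key=header_line.count)


def parse_csv_text(text):
    """Parse CSV text into a list of dicts; all values stay strings."""
    header = text.split("\n", 1)[0]
    reader = csv.DictReader(
        io.StringIO(text, newline=""),
        delimiter=detect_delimiter(header),
    )
    return list(reader)


def load_rows(path):
    """Read a CSV file as UTF-8 and parse it with parse_csv_text."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_csv_text(handle.read())


# Hypothetical sample: the leading zeros survive because nothing is cast.
sample = "record_id;title\n00123;Portrait of a girl\n"
rows = parse_csv_text(sample)
print(rows[0]["record_id"])
```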

Analytical information about each CSV file:

KROVETS_FOLK_METADATA.CSV

Column name	Description
europeana_id	Item's id on Europeana
Title	Item's title
Description	A description about what is depicted in the record, materials, timespan, place of origin etc.
ImageLink	The item's image URL
Creator	Creator of the depicted artifact (if they're known)
Subject	Ethnographical region of origin & Item's category
Type of Item	Type of the depicted artefact (e.g. painting, clothing)
Medium	Materials used for crafting the item
Providing Institution	All items are provided by the Online Museum of the traditional art of Ukraine - KROVETS
Aggregator	Items are gathered by MUSEU
Rights statement	Rights statement for each record
Creation date	Estimated timespan of the record (e.g. 20th century)
Places	Regions-oblasts where items originate from
Identifier	Record identifier on Europeana in the form of /{collection_id}/{item_id}
Is Part Of	All items are part of 1413_KROVETS_Museum
Providing Country	Country providing the record, in this case all items are from Ukraine
Collection Name	Same as Is Part Of
First time published on Europeana	Timestamp of when the item was first published on Europeana
Last time updated from providing institution	Timestamp of when the item was last updated on Europeana by its provider

Metadata retrieved using the Europeana API endpoint for the specific set.
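
As a small illustration of the Identifier column, the sketch below turns an identifier of the form /{collection_id}/{item_id} into the address of the item page on Europeana, following the URL pattern of the image credit at the top of this README. The function name and the `lang` parameter are assumptions made for this example.

```python
def item_url(identifier, lang="en"):
    """Build the Europeana item page URL from '/{collection_id}/{item_id}'."""
    parts = identifier.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"expected '/{{collection_id}}/{{item_id}}', got {identifier!r}"
        )
    collection_id, item_id = parts
    return f"https://www.europeana.eu/{lang}/item/{collection_id}/{item_id}"


print(item_url("/1413/KYD1791"))
```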

KROVETS_FOLK_TAGS.CSV

Column name	Description
europeana_id	Item's id on Europeana
Figures	Persons that can be visibly seen in each painting
Objects	Items such as materials that are depicted
Scenes	Actions, context or anything in the background that is clearly depicted

CAPTIONS.CSV

Column name	Description
image_id	Id of the image's item
Caption	Caption generated for each painting through machine vision

TAGS.CSV

Column name	Description
image_id	Id of the image's item
figures	Person(s) visible on the image
objects	Items depicted on the image
scenes	Context of the image, scenery depicted
background	What can be seen in the background
attire	Clothing used by the persons depicted
text	Any text depicted on the painting
damage	Signs of damage on the artefact

*_UKRAINIAN-FOLKART-ANNOTATIONS.CSV

Column name	Description
created	Date of creation for the annotation
value	Tag evaluated
europeana_id	Id of the item possessing the tag
upvotes	Number of upvotes by users
downvotes	Number of downvotes by users
recommendation	'accept' if upvotes > downvotes, 'reject' if upvotes < downvotes, 'unknown' if upvotes = downvotes
event_number	The event in which the annotation was made (only in the combined csv file)
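
The rule behind the recommendation column can be written as a short function. This is a sketch of the rule as described in the table above; the function name is hypothetical.

```python
def recommendation(upvotes, downvotes):
    """Return 'accept', 'reject' or 'unknown' from the two vote counts."""
    if upvotes > downvotes:
        return "accept"
    if upvotes < downvotes:
        return "reject"
    return "unknown"  # tie, including the case of no votes at all


for up, down in [(5, 1), (0, 2), (3, 3)]:
    print(up, down, recommendation(up, down))
```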

ANNOTATIONS JSON FILES

Header name	Description
@context	Essential metadata for the json file
@graph	The body of the json file, holding the following info

BODY FROM JSON FILES

Header name	Description
type	'Annotation' in this case
created	Date and time of creation
creator	Creator of the annotation (if recorded), usually an AI model
confidence	Value in range 0-1 about the confidence with which the tag annotation was made
body	Info about the content of the actual tag in question
target	The item's id on Europeana
review	Info about the amount of downvotes and upvotes the tag received and subsequent recommendation
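
A minimal sketch of reading such a file, assuming that `body` is an object with a `value` key and that `review` is an object with a `recommendation` key. These key names, the function name and the sample document are assumptions made for illustration and should be checked against the actual JSON files.

```python
import json


def iter_tags(jsonld_text):
    """Yield (target, tag value, recommendation) for each entry in @graph."""
    document = json.loads(jsonld_text)
    for annotation in document.get("@graph", []):
        body = annotation.get("body") or {}
        review = annotation.get("review") or {}
        yield (
            annotation.get("target"),
            body.get("value"),
            review.get("recommendation"),
        )


# Hypothetical sample document, not taken from the repository outputs.
sample = json.dumps({
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "@graph": [{
        "type": "Annotation",
        "body": {"value": "girl"},
        "target": "/1413/KYD1791",
        "review": {"upvotes": 4, "downvotes": 1, "recommendation": "accept"},
    }],
})

for target, value, verdict in iter_tags(sample):
    print(target, value, verdict)
```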

Data Statistics

Step 1 and Step 2 annotation creation

Step 1 produced text-based annotations and Step 2 produced image-based annotations across 312 artefacts. After merging and deduplication, the combined set contains 3,927 annotations.

Source	Annotations
Text-based annotations (Step 1)	893
Image-based annotations (Step 2, raw)	4,581
Combined annotations after merge/deduplication	3,927
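
The merge and deduplication could look roughly like the sketch below. The matching rule (same item ID and same tag after trimming whitespace and ignoring case) is an assumption for illustration; the criteria actually used in the notebooks may differ, and the sample tags are invented.

```python
def merge_tags(text_tags, image_tags):
    """Merge two lists of (europeana_id, tag) pairs, dropping duplicates.

    Two tags count as duplicates when they belong to the same item and are
    equal after whitespace normalisation and case folding (an assumption).
    The first occurrence is kept, so text-based tags take precedence.
    """
    seen = set()
    merged = []
    for item_id, tag in [*text_tags, *image_tags]:
        cleaned = " ".join(tag.split())
        key = (item_id, cleaned.casefold())
        if cleaned and key not in seen:
            seen.add(key)
            merged.append((item_id, cleaned))
    return merged


# Invented sample data.
text_tags = [("KYD1791", "girl")]
image_tags = [("KYD1791", "Girl "), ("KYD1791", "dress")]
print(merge_tags(text_tags, image_tags))
```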

Step 3 crowdsourcing

In Step 3, the annotations produced in the previous stages were presented on the crowdsourcing platform for participant validation, while expert participants could also contribute additional human annotations. The final Step 3 snapshot covers 311 artefacts and contains 5,946 annotations in total.

Annotation type	Value
Software annotations	3,917
Human annotations	2,029
Total annotations	5,946

Note: one artefact from Steps 1 and 2 (KYD1699) is not present in the final Step 3 snapshot, which is why the software-annotation count is 3,917 rather than 3,927.

Participants submitted 51,952 votes in total, including 48,599 upvotes and 3,353 downvotes.

Voting metric	Value
Upvotes	48,599
Downvotes	3,353
Total votes	51,952

The table below shows the number of annotations processed in each event, where processing refers to participant voting.

Event	Value
Event 1	4,018
Event 2	742
Event 3	1,069
Event 4	117
Total processed annotations	5,946
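
The figures reported in this section can be cross-checked with a few lines of arithmetic. The numbers below are copied from the tables above; the variable names are chosen for this example.

```python
software, human, total_annotations = 3917, 2029, 5946
upvotes, downvotes, total_votes = 48599, 3353, 51952
per_event = {"Event 1": 4018, "Event 2": 742, "Event 3": 1069, "Event 4": 117}

# Each reported total should equal the sum of its parts.
assert software + human == total_annotations
assert upvotes + downvotes == total_votes
assert sum(per_event.values()) == total_annotations

# Share of upvotes among all votes, in percent.
upvote_share = round(100 * upvotes / total_votes, 1)
print(f"Upvote share: {upvote_share}%")  # Upvote share: 93.5%
```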

License

This data repository is released under the Apache 2.0 licence. The Krovets ethnographic collection is published under a CC BY-NC-ND 4.0 licence. All accompanying metadata are released into the public domain using CC0, to be freely copied, modified, distributed and reused.

Code of Conduct

This GitHub repository of Web2Learn follows the Contributor Covenant to be transparent and open, welcoming all people to engage and contribute, and pledging in return to value them. To make our open communities welcoming, diverse, and inclusive, we are encouraging the adoption of a mindful code of conduct to express and share those values. Any unacceptable behaviour, such as trolling, insulting/derogatory comments, or personal or political attacks, will not be tolerated. The Contributor Covenant is released under the Creative Commons Attribution 4.0 International Public License.

Project attribution

Web2Learn pilot team:
Mariana Ziku, Lead Researcher
Andreas Kouzelis, Information Systems Engineer
Andreas Darsaklis, IT trainee
Stefania Oikonomou, Research collaborator
Katerina Zourou, Director

Collaborators:
Yevgen Dmytruk, Director, Krovets Online-museum of Traditional Art of Ukraine
Eirini Kaldeli, Co-Founder & Software Engineer Datoptron, and CrowdHeritage
Hugo Manguinhas, Head of Engineering, Europeana Foundation

Workshop facilitators:
Sanita Reinsone, Associate Professor, Faculty of Humanities, University of Latvia
Lyudmyla Kruhlenko, Associate Professor, Pryazovskyi State Technical University
Anna Shilinh, Scientist and Assistant, Lviv Polytechnic National University
Olha Hapii, Stand with Ukraine Foundation
Alina Tsurkalenko
Ilze Ļaksa-Timinska, Researcher, Institute of Literature, Folklore and Art, University of Latvia
Uldis Zarinš, Deputy State Secretary, Ministry of Culture, Latvia
Konstantine Gagnidze, Senior Project Manager, Young Folks LV

Cite this dataset

When referring to or using the data repository in research publications and documentation, cite the dataset using its digital object identifier (DOI) on Zenodo. Citing the dataset from the HITL crowdsourcing pilot creates a mapping of attribution that supports future efforts to release other datasets. It also reduces the amount of "orphaned data," helping to retain source links.
Cite the repository as: Ziku, M., Kouzelis, A., Darsaklis, A., Oikonomou, S., & Zourou, K. (2026). Human-in-the-Loop Crowdsourced Annotation Dataset for Ukrainian Folk Art with Reproducible Jupyter Notebooks [Data set]. AISTER. https://doi.org/10.5281/zenodo.19475309

Acknowledgements

This README file adopts the structure of the KU Leuven Libraries Git-based dataset documentation, i.e., https://github.com/KU-Leuven-Libraries/Portraits-Collection-Dataset. See: KU Leuven Libraries, Digitisation Department. (2019). The Portraits Collection Dataset of KU Leuven Libraries, Special Collections (Version 01-beta2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3460785.

The documentation of the Jupyter Notebooks follows the criteria of quality assessment for Jupyter projects by GLAM institutions, as published in Candela, G., Chambers, S., & Sherratt, T. (2023). An approach to assess the quality of Jupyter projects published by GLAM institutions. Journal of the Association for Information Science and Technology, 74(13), 1550–1564. https://doi.org/10.1002/asi.24835

Disclaimer

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.
