HPCI-Lab/yProv4WFs-openEO

yProv4WFs-openEO

This repository provides the yProv4WFs extension for openEO, which adds provenance tracking to openEO workflows. It extends the process graph parser (openeo-pg-parser-networkx) to record detailed information about each workflow step, including inputs, outputs, parameters, and execution time, without changing how openEO works. The collected data follows the W3C PROV standard and can be visualized with tools from the yProv ecosystem, such as yProvExplorer.

Overview

yProv4WFs is a provenance tracking library built on the W3C PROV standard.
It captures retrospective provenance, that is, a record of what actually happened during the execution of a workflow. In this integration, the provenance hooks are embedded in the graph.py module of openeo-pg-parser-networkx,
allowing openEO to record detailed metadata about each executed process node.


How It Works

When a workflow is defined and executed in openEO (e.g., via the Python client):

  1. The workflow is serialized as a process graph (JSON).
  2. openeo-pg-parser-networkx parses it into a NetworkX Directed Acyclic Graph (DAG).
  3. The modified graph.py triggers yProv4WFs calls during node traversal and execution.
  4. Metadata about each activity (start time, end time, inputs, outputs) is captured.
  5. A provenance document (JSON) compliant with W3C PROV is created.
  6. This document can then be visualized interactively using yProvExplorer.
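The steps above can be illustrated with a minimal, standard-library-only sketch. It does not use the real openeo-pg-parser-networkx or yProv4WFs APIs: the function names `find_dependencies` and `trace_execution` and the record layout are hypothetical, and `graphlib` (Python 3.9 or newer) stands in for NetworkX. It only shows the general idea of deriving a DAG from the `from_node` references in a flat process graph and recording timing metadata while visiting the nodes in dependency order.

```python
import time
from graphlib import TopologicalSorter  # stdlib stand-in for NetworkX (Python >= 3.9)


def find_dependencies(value):
    """Collect node ids referenced through {"from_node": ...} in a node's arguments."""
    deps = set()
    if isinstance(value, dict):
        if "from_node" in value:
            deps.add(value["from_node"])
        for key, child in value.items():
            # Nested callback graphs have their own node namespace; skip them here.
            if key != "process_graph":
                deps |= find_dependencies(child)
    elif isinstance(value, list):
        for child in value:
            deps |= find_dependencies(child)
    return deps


def trace_execution(process_graph):
    """Visit nodes in dependency order and record one metadata entry per node."""
    dag = {
        node_id: find_dependencies(node.get("arguments", {}))
        for node_id, node in process_graph.items()
    }
    records = []
    for node_id in TopologicalSorter(dag).static_order():
        start = time.time()
        # A real backend would execute the process here; this sketch only times a no-op.
        end = time.time()
        records.append({
            "node_id": node_id,
            "process_id": process_graph[node_id]["process_id"],
            "inputs": sorted(dag[node_id]),
            "start_time": start,
            "end_time": end,
        })
    return records


# A small flat process graph in the openEO JSON layout (illustrative values only).
EXAMPLE_GRAPH = {
    "load1": {"process_id": "load_collection", "arguments": {"id": "EXAMPLE"}},
    "ndvi1": {"process_id": "ndvi", "arguments": {"data": {"from_node": "load1"}}},
    "save1": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "ndvi1"}, "format": "GTiff"},
        "result": True,
    },
}

if __name__ == "__main__":
    for record in trace_execution(EXAMPLE_GRAPH):
        print(record["node_id"], record["process_id"], record["inputs"])
```

In the actual integration, the per-node bookkeeping is done by the yProv4WFs calls inside the modified graph.py rather than by a plain list of dictionaries.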

Installation

Make sure you have Python ≥ 3.8 and install the following dependencies:

pip install "openeo-pg-parser-networkx>=2023.5.1"
pip install "openeo-processes-dask>=2023.7.1"
pip install "yprov4wfs==0.0.8"
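To confirm which versions ended up in the active environment, a short check such as the following may help. It is a sketch that relies only on the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical, and the distribution names are taken from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("openeo-pg-parser-networkx", "openeo-processes-dask", "yprov4wfs"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```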

Integration Steps

Follow these steps to integrate yProv4WFs into your local openEO setup.

1. Locate the openEO parser directory

Find the folder where the parser module is installed: openeo-pg-parser-networkx/openeo_pg_parser_networkx/
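If the package was installed with pip rather than cloned, its directory can be looked up from Python. The sketch below assumes the import name is `openeo_pg_parser_networkx` (taken from the path above); the helper `locate_package_dir` is a hypothetical name and returns None when the module cannot be found.

```python
import importlib.util
from pathlib import Path


def locate_package_dir(module_name):
    """Return the directory containing an importable package, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package folder.
    return Path(spec.origin).parent


if __name__ == "__main__":
    print(locate_package_dir("openeo_pg_parser_networkx"))
```

The printed directory is where the existing graph.py lives.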

2. Replace the original graph.py

Copy the updated file from this repository and replace the existing one:

cp path/to/yProv4WFs-openEO/graph.py path/to/openeo-pg-parser-networkx/openeo_pg_parser_networkx/graph.py

Saving the Provenance File Locally

By default, provenance tracking runs automatically, but the provenance JSON file might not be saved to disk unless explicitly enabled.
If you want to directly save and inspect the provenance output after each workflow execution, you can uncomment the following lines in the code:

# To save the provenance
# timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# save_path = os.path.join(os.getcwd(), f"run_{timestamp}")
# print(f"Provenance file saved to: {save_path}")
# os.makedirs(save_path, exist_ok=True)
# self.workflow.prov_to_json(directory_path=save_path)

After uncommenting, each workflow run will create a new directory like:

run_20251103_134520/
 └── provenance.json
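Once a few runs have been saved, the most recent output can be inspected with a short script. This is a sketch under two assumptions: the run directories follow the `run_<timestamp>` naming shown above, and each saved file is a JSON object whose top-level keys (for example `entity` or `activity` in a PROV-JSON style document) map to dictionaries of records. The helper names are hypothetical.

```python
import json
from pathlib import Path


def latest_run_dir(base_dir="."):
    """Return the newest run_<timestamp> directory under base_dir, or None."""
    # The YYYYmmdd_HHMMSS timestamps sort chronologically as plain strings.
    runs = sorted(p for p in Path(base_dir).glob("run_*") if p.is_dir())
    return runs[-1] if runs else None


def summarize_provenance(run_dir):
    """Count the records under each top-level section of every JSON file in run_dir."""
    summary = {}
    for json_file in sorted(Path(run_dir).glob("*.json")):
        with open(json_file, encoding="utf-8") as handle:
            document = json.load(handle)
        summary[json_file.name] = {
            section: len(records)
            for section, records in document.items()
            if isinstance(records, dict)
        }
    return summary


if __name__ == "__main__":
    run_dir = latest_run_dir()
    if run_dir is None:
        print("No run_* directory found in the current working directory.")
    else:
        print(run_dir, summarize_provenance(run_dir))
```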

Research Reference and Implementation Source

This implementation is based on the concepts and architecture presented in the following paper:

H. Omidi, L. Sacco, V. Hutter, G. Irsiegler, M. Claus, M. Schobben, A. Jacob, M. Schramm, S. Fiore
Towards Provenance-Aware Earth Observation Workflows: the openEO Case Study
In Proceedings of the 2025 IEEE International Conference on eScience (eScience), pp. 58–66.
DOI: 10.1109/eScience65000.2025.00016
