This repository provides the yProv4WFs extension for openEO, which adds provenance tracking to openEO workflows.
It extends the process graph parser (openeo-pg-parser-networkx) to record detailed information about each workflow step, including inputs, outputs, parameters, and execution time, without changing how openEO works.
In doing so, it demonstrates how the parser can capture detailed workflow-level provenance using the yProv ecosystem.
The collected data follows the W3C PROV standard and can be visualized through the yProv ecosystem, using tools like yProvExplorer.
yProv4WFs is a provenance tracking library built upon the W3C PROV standard.
It captures retrospective provenance, that is, a record of what actually happened during the execution of a workflow.
In this integration, the provenance hooks are embedded into the graph.py of openeo-pg-parser-networkx,
allowing openEO to record detailed metadata about each executed process node.
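To illustrate the idea of a provenance hook, here is a minimal sketch that wraps a process callable and records its inputs, output, and timing. It does not use the yProv4WFs API or the actual code in graph.py; the names track_provenance, PROVENANCE_LOG, and add are hypothetical and exist only for this example.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory log; the real integration passes this data to yProv4WFs.
PROVENANCE_LOG = []


def track_provenance(node_id):
    """Wrap a process callable so each call appends one record to PROVENANCE_LOG."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            record = {
                "node": node_id,
                "process": func.__name__,
                "inputs": dict(kwargs),
                "start_time": datetime.now(timezone.utc).isoformat(),
                "status": "failed",  # overwritten below if the call succeeds
            }
            try:
                result = func(**kwargs)
                record["output"] = result
                record["status"] = "completed"
                return result
            finally:
                # Runs on success and on failure, so every call is logged.
                record["end_time"] = datetime.now(timezone.utc).isoformat()
                PROVENANCE_LOG.append(record)
        return wrapper
    return decorator


@track_provenance("add1")
def add(x, y):
    return x + y


total = add(x=1, y=2)
```

Because the wrapper only observes the call, the wrapped process returns the same value as before, which mirrors the goal of tracking provenance without changing how the workflow behaves.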
When a workflow is defined and executed in openEO (e.g., via the Python client):
- The workflow is serialized as a process graph (JSON).
- openeo-pg-parser-networkx parses it into a NetworkX Directed Acyclic Graph (DAG).
- The modified graph.py triggers yProv4WFs calls during node traversal and execution.
- Metadata about each activity (start time, end time, inputs, outputs) is captured.
- A provenance document (JSON) compliant with W3C PROV is created.
- This document can then be visualized interactively using yProvExplorer.
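The steps above can be sketched in plain Python with only the standard library. This is an illustration under stated assumptions, not the code in graph.py: it does not call openeo-pg-parser-networkx or yProv4WFs, the PROCESSES table and the ex: prefix are hypothetical, the graph is assumed to be acyclic, and the output only follows the general layout of PROV-JSON (activity, entity, used, wasGeneratedBy).

```python
import json
from datetime import datetime, timezone

# Step 1: the workflow arrives as a JSON process graph. openEO nodes reference
# upstream results with {"from_node": "<node id>"}.
PROCESS_GRAPH_JSON = """
{
  "mul1": {"process_id": "multiply",
           "arguments": {"x": {"from_node": "add1"}, "y": 10},
           "result": true},
  "add1": {"process_id": "add", "arguments": {"x": 1, "y": 2}}
}
"""

# Hypothetical stand-ins for real process implementations.
PROCESSES = {"add": lambda x, y: x + y, "multiply": lambda x, y: x * y}


def upstream(node):
    """Return the ids of the nodes whose results this node consumes."""
    return [arg["from_node"] for arg in node["arguments"].values()
            if isinstance(arg, dict) and "from_node" in arg]


def execution_order(graph):
    """Step 2: depth-first topological order, so dependencies run first."""
    order, seen = [], set()

    def visit(node_id):
        if node_id in seen:
            return
        seen.add(node_id)
        for dep in upstream(graph[node_id]):
            visit(dep)
        order.append(node_id)

    for node_id in graph:
        visit(node_id)
    return order


def run(graph):
    """Steps 3-5: execute each node and build a PROV-JSON-style document."""
    results = {}
    prov = {"prefix": {"ex": "http://example.org/"},
            "activity": {}, "entity": {}, "used": {}, "wasGeneratedBy": {}}
    for node_id in execution_order(graph):
        node = graph[node_id]
        activity, entity = f"ex:{node_id}", f"ex:{node_id}_output"
        args = {}
        for name, value in node["arguments"].items():
            if isinstance(value, dict) and "from_node" in value:
                src = value["from_node"]
                args[name] = results[src]
                # The activity used the output entity of the upstream node.
                prov["used"][f"_:u_{node_id}_{name}"] = {
                    "prov:activity": activity,
                    "prov:entity": f"ex:{src}_output"}
            else:
                args[name] = value
        start = datetime.now(timezone.utc).isoformat()
        results[node_id] = PROCESSES[node["process_id"]](**args)
        end = datetime.now(timezone.utc).isoformat()
        prov["activity"][activity] = {"prov:startTime": start,
                                      "prov:endTime": end,
                                      "ex:process_id": node["process_id"]}
        prov["entity"][entity] = {"prov:value": results[node_id]}
        prov["wasGeneratedBy"][f"_:g_{node_id}"] = {
            "prov:entity": entity, "prov:activity": activity}
    return results, prov


results, prov = run(json.loads(PROCESS_GRAPH_JSON))
print(json.dumps(prov, indent=2))
```

Although mul1 is listed first in the JSON, the traversal runs add1 before it, because execution order comes from the from_node references rather than from the order of the keys.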
Make sure you have Python ≥ 3.8 and install the following dependencies:
pip install "openeo-pg-parser-networkx>=2023.5.1"
pip install "openeo-processes-dask>=2023.7.1"
pip install yprov4wfs==0.0.8

Follow these steps to integrate yProv4WFs into your local openEO setup.
Find the folder where the parser module is installed: openeo-pg-parser-networkx/openeo_pg_parser_networkx/
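If you are unsure where the package lives in your environment, one option is to ask Python's import system. The sketch below uses only the standard library; the helper name package_dir is hypothetical, and the example assumes the package is importable as openeo_pg_parser_networkx, matching the folder name above.

```python
import importlib.util
from pathlib import Path


def package_dir(module_name):
    """Return the directory of an installed package, or None if it cannot be located."""
    spec = importlib.util.find_spec(module_name)
    # spec is None when the module is not installed; has_location is False
    # for built-in modules and namespace packages, which have no single file.
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).parent


target = package_dir("openeo_pg_parser_networkx")
if target is None:
    print("openeo_pg_parser_networkx is not installed in this environment")
else:
    print(f"File to replace: {target / 'graph.py'}")
```

Run it with the same interpreter (or virtual environment) that you use for openEO, since each environment has its own copy of the package.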
Copy the updated file from this repository and replace the existing one:
cp path/to/yProv4WFs-openEO/graph.py path/to/openeo-pg-parser-networkx/openeo_pg_parser_networkx/graph.py

By default, provenance tracking runs automatically, but the provenance JSON file might not be saved to disk unless explicitly enabled.
If you want to directly save and inspect the provenance output after each workflow execution, you can uncomment the following lines in the code:
# To save the provenance
# timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
# save_path = os.path.join(os.getcwd(), f"run_{timestamp}")
# print(f"Provenance file saved to: {save_path}")
# os.makedirs(save_path, exist_ok=True)
# self.workflow.prov_to_json(directory_path=save_path)

After uncommenting, each workflow run will create a new directory like:
run_20251103_134520/
└── provenance.json

This implementation is based on the concepts and architecture presented in the following paper:
H. Omidi, L. Sacco, V. Hutter, G. Irsiegler, M. Claus, M. Schobben, A. Jacob, M. Schramm, S. Fiore
Towards Provenance-Aware Earth Observation Workflows: the openEO Case Study
In Proceedings of the 2025 IEEE International Conference on eScience (eScience), pp. 58–66.
DOI: 10.1109/eScience65000.2025.00016