KEP-897: Propose centralized experiment tracking in Kubeflow #892
Conversation
> This proposal introduces an optional/configurable authorization mode for Model Registry that leverages Kubernetes
> subject access review without adding custom resource definitions or modifying the existing API concepts. Instead, it
> maps Kubernetes RBAC concepts to the existing REST API entities at the namespace level.
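For illustration only, a minimal sketch of the kind of namespace-scoped SubjectAccessReview check this implies, using the Kubernetes Python client. The resource the REST entities are mapped to ("services" below) and the user/namespace values are assumptions for the sketch, not something the KEP prescribes.

```python
from kubernetes import client, config

config.load_incluster_config()  # or config.load_kube_config() outside a pod
authz = client.AuthorizationV1Api()

# Ask the Kubernetes API server whether a given user may "get" a resource in
# the namespace that the Model Registry REST entity is mapped to.
review = client.V1SubjectAccessReview(
    spec=client.V1SubjectAccessReviewSpec(
        user="user@example.com",
        resource_attributes=client.V1ResourceAttributes(
            namespace="kubeflow-user-example",
            verb="get",
            resource="services",
        ),
    )
)
allowed = authz.create_subject_access_review(review).status.allowed
```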
I'm supportive of implementation solution(s) that do not force Model Registry to mix multiple cross-cutting concerns in its code, i.e. "business logic" (data transformation and queries) and "authorization" (tenancy) 👍
It is a general enough driving principle and best practice, but I thought I'd add it here as well :)
Yes, we need to keep multi-tenancy.
cc WGs to review
@kubeflow/wg-pipeline-leads @kubeflow/wg-data-leads @kubeflow/wg-automl-leads @kubeflow/wg-data-leads @kubeflow/kubeflow-steering-committee @kubeflow/wg-manifests-leads @kubeflow/wg-notebooks-leads
> This proposal aims to resolve the current fragmented and limited experiment tracking experience by expanding the
> **Kubeflow Model Registry** into a unified, centralized metadata store. Currently, experiment tracking is scattered
> across components like Kubeflow Pipelines (which requires pipeline execution for tracking) and Katib (limited to
I am interested in whether we should define the concept of a Kubeflow Experiment outside of KFP and Katib,
e.g. an Experiment that sits on top of TrainJob, Katib Jobs, Pipelines, and Spark Jobs.
I think we are on the same page. The overall goal is that an experiment in Model Registry could contain a mix of experiment runs from Pipelines, TrainJob, the SDK, etc. Each run would indicate the source it came from, but experiments are not exclusive to a particular source.
When Pipelines is integrated in this mode, it wouldn't have its own experiments in its database like today. Instead it would pull the list of available ones from Model Registry. We'd still keep the existing Pipelines experiments concept for standalone installations of Pipelines though.
> The proposal tackles these issues by:
>
> - **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts
Would it be in scope for Model Registry, or would we need another Kubeflow project for it?
If we decide to use Model Registry for it, we might want to find another name for this project.
cc @kubeflow/wg-data-leads
+1, in some regards we are seeing the need for an "AI Asset Registry", but I'm also in favour of simply "Kubeflow Registry".
I would stick with Model Registry.
I like the idea of Kubeflow Registry, since it makes it very clear to users that this project is designed for metadata storage, not only for model artifacts. Alternatively, we could name it Kubeflow Tracker/Tracking, or find a better name.
I think the plan is that Kubeflow Model Registry is MLflow-compatible, so we can then use the MLflow SDK for Pipelines, Trainer, etc. as well and get rid of ml-metadata. CC @franciscojavierarceo
BTW, the name Model Registry is still valid since experiment tracking is just part of the Model lifecycle and needs to be stored in Model Registry to be able to track model lineage.
I think we should align on what an Experiment means in the AI lifecycle.
For example, if we consider any type of MLOps activity to be an Experiment (e.g. data preparation, training, HPOs, evals), we might need to find a better name.
We could also consider a rename such as "Kubeflow Registry" (or anything else we converge on) after this work is completed; given that more capabilities are being added or will be added (like the catalog, experiments, etc.), we would then have more data and could make a better-informed decision.
Wearing my other hat: with our present focus on Graduation, making a name change at this time might also be strategically challenging.
cc @mprahl @andreyvelich wdyt?
I'm fine with postponing the renaming conversation. We can clarify through documentation and the UI as part of the delivery.
Sure, we can rename it after the CNCF graduation, and include it as a Northstar in this KEP.
I just would like us to align on the future project scope and goals.
> - **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts
>   across all Kubeflow components.
> - Providing **MLFlow SDK compatibility**, enabling users to leverage familiar APIs while storing data in Kubeflow.
Why MLflow and not Weights & Biases (https://wandb.ai/site/)?
I find that these days more and more organizations are adopting it.
@andreyvelich I do really like Weights & Biases, but MLflow has a plugin architecture that allows the SDK to use a different backend (Model Registry in this case). So we could add integrations with Weights & Biases (e.g. export experiments and metrics to it from KFP), but we could not easily reuse their SDK, which is the reason MLflow is proposed here.
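As an illustration of that plugin mechanism, here is a minimal sketch of how a tracking-store plugin could be registered so that the MLflow SDK routes tracking calls to a different backend. The package, module, class name, and URI scheme below are hypothetical placeholders, not the actual implementation.

```python
# setup.py of a hypothetical "mlflow-model-registry" plugin package.
# MLflow discovers tracking-store plugins through the "mlflow.tracking_store"
# entry-point group and dispatches to a plugin based on the tracking URI
# scheme, e.g. mlflow.set_tracking_uri("model-registry://model-registry.kubeflow:8080").
from setuptools import setup

setup(
    name="mlflow-model-registry",
    packages=["mlflow_model_registry"],
    install_requires=["mlflow"],
    entry_points={
        "mlflow.tracking_store": [
            "model-registry=mlflow_model_registry.store:ModelRegistryStore",
        ],
    },
)
```

The `ModelRegistryStore` class would subclass `mlflow.store.tracking.abstract_store.AbstractStore` and translate calls such as `create_experiment()`, `create_run()`, and `log_metric()` into requests against the experiment-tracking REST API proposed for Model Registry.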
What are the benefits of using the MLflow SDK if we will eventually implement a tracking API in the Kubeflow SDK?
MLFlow is the dominant player in the market. https://clickpy.clickhouse.com/dashboard/mlflow
FWIW Weights and Biases is a close second but MLFlow has dominated the market for longer. https://clickpy.clickhouse.com/dashboard/wandb
@andreyvelich the main benefits of the MLflow plugin are that the Kubeflow SDK could choose to wrap the MLflow SDK to support the powerful autolog feature, and that existing workflows leveraging the MLflow SDK would continue to work with minimal changes when pointed at a Model Registry API with experiment tracking.
I see, that makes sense!
@andreyvelich Model Registry isn't CRD based and they already added some of the APIs mentioned in this KEP here: kubeflow/model-registry#1318.
I am trying to understand how we will leverage the MLflow autolog feature in the context of runs.
I understand that Model Registry has the ability to orchestrate Experiments and Runs that are not CRD-based, but in the end users should be able to create Kubeflow CRDs (e.g. TrainJob, SparkJob, Workflow) within their Experiments.
@andreyvelich technically MLflow had 1 million more downloads last month than Weights & Biases.
Also, I believe both proofs of concept were done with MLflow, and I believe we intend to eventually try to be compatible with both.
@andreyvelich the autologging happens inside your training code that runs within the job containers, not at the orchestration level.
So for example you create a TrainJob CRD, then Kubeflow starts containers based on that CRD and inside those containers your training code runs with MLflow autologging enabled, which captures metrics and parameters and sends them to the Model Registry backend.
The CRDs orchestrate where/how the code runs and autologging just captures what happens during execution, so they're separate layers that work together. And users don't need to modify their CRDs at all, experiment tracking is just another library in their training code.
BTW, this is similar to the demo I shared earlier with MLflow SDK in Kubeflow SDK.
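To make this concrete, a minimal sketch of what the training code inside such a container could look like, assuming the tracking URI is injected via the standard `MLFLOW_TRACKING_URI` environment variable (the URI scheme and experiment name below are placeholders):

```python
# train.py — runs inside the containers created for the TrainJob CRD.
# Experiment tracking is just another library in the training code; the CRD
# itself does not change.
import os

import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumed to be injected into the pod spec, e.g.
# MLFLOW_TRACKING_URI=model-registry://model-registry-service.kubeflow:8080
mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "file:./mlruns"))
mlflow.set_experiment("my-experiment")

# Autologging captures parameters, metrics, and model artifacts emitted by
# supported frameworks during execution and sends them to the backend.
mlflow.autolog()

X, y = load_iris(return_X_y=True)
with mlflow.start_run(run_name="trainjob-run"):
    RandomForestClassifier(n_estimators=100).fit(X, y)
```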
I see, that looks great! I guess, in the case of distributed training (when we execute the train function across multiple nodes), we will enable autolog() only for the MASTER NODE, like this:

```python
if dist.get_rank() == 0:
    mlflow.autolog()
```
Exactly, but we still need to disable it on the other nodes, so we can just do it this way:

```python
rank = dist.get_rank()
mlflow.autolog(disable=(rank != 0))
```
> Kubeflow components without forcing everything through pipelines. This restriction often drives users to seek
> solutions outside the Kubeflow ecosystem.
> 1. **Fragmented experience**: Users must navigate multiple interfaces to correlate run results and evaluation data,
>    depending on which Kubeflow component they use (e.g., Pipeline runs, Katib experiments, Training Operator results).
Suggested change:
- depending on which Kubeflow component they use (e.g., Pipeline runs, Katib experiments, Training Operator results).
+ depending on which Kubeflow project they use (e.g., Pipeline runs, Katib experiments, TrainJob results).
> ### Non-Goals
>
> 1. **Decoupling Katib's experiments from their current implementation**: While the Kubeflow community can revisit
I think the concept of a Katib Experiment and a Pipeline is different.
In Katib, an Experiment is just a CRD which defines an HP tuning job: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/hp-tuning/random.yaml
Katib maintains its own database just to allow the metrics collector to push metrics (e.g. accuracy, loss): https://www.kubeflow.org/docs/components/katib/reference/architecture/#katib-control-plane-components.
It is much more lightweight compared to KFP.
@andreyvelich it would require more thought of how to integrate it, but at a high-level, the domain models seem to map well:
- Katib experiment -> Model Registry experiment
- Katib trial -> Model Registry run
- Katib trial parameters -> Model Registry run parameters
- Katib trial metrics -> Model Registry run metrics
So I think Katib's APIs could stay the same but just also create/reuse an experiment in Model Registry and then create a "run" in the Model Registry experiment per executed "trial".
I think for the purpose of this KEP, aligning on it being theoretically possible for Katib to export this metadata to the domain models proposed for Model Registry would be enough. The Katib maintainers/community can then decide if they'd like to integrate after the Model Registry implementation.
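Purely to illustrate the mapping above, a hypothetical sketch of how a Katib controller or metrics collector could export a finished trial, assuming the MLflow SDK is pointed at the Model Registry tracking backend (the experiment/trial names and values are made up):

```python
import mlflow

# Assumed: MLFLOW_TRACKING_URI points at the Model Registry tracking backend.
# Katib Experiment -> Model Registry experiment (created or reused by name).
mlflow.set_experiment("random-hp-tuning")

# Katib Trial -> Model Registry run; trial parameters and observed metrics
# map to run parameters and run metrics.
trial = {
    "name": "random-hp-tuning-abc123",
    "parameters": {"lr": 0.01, "num_layers": 3},
    "metrics": {"accuracy": 0.93, "loss": 0.21},
}

with mlflow.start_run(run_name=trial["name"]):
    mlflow.log_params(trial["parameters"])
    mlflow.log_metrics(trial["metrics"])
```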
I think the concept of a Katib Experiment is misleading. A Katib Experiment is a batch job (e.g. OptimizeJob) that has its own scheduler (e.g. Suggestion) which decides whether it needs to create more runs (e.g. Trials).
Ideally, we should refactor the Katib API towards OptimizeJob, where we have a clear definition of a hyperparameter optimization job.
@kubeflow/wg-training-leads @astefanutti @franciscojavierarceo Any thoughts?
Before the Experiment API, we had the StudyJob concept, which fully aligned with the original Google Vizier paper.
Ref: https://github.com/google/vizier
> - Enable users to leverage MLFlow's familiar APIs while storing data in centralized Kubeflow infrastructure
> - Support automatic logging for popular ML frameworks (TensorFlow, PyTorch, scikit-learn, etc.) through MLFlow's SDK
>
> #### Kubeflow SDK Enhancement
It would be nice to have a code snippet showing what those integrations will look like.
I'll add this in Kubeflow SDK Implementation under Design Details. The example is from a proof of concept developed by @kramaranya!
> experiments in Katib.
> 1. **Users**: A user to associate runs, metrics, artifacts, etc. for auditing. This will generally map to the Kubernetes
>    identity.
> 1. **Runs**: An execution in the machine learning workflow. In Pipelines, this maps to a pipeline run. In Katib, this
Shall we have a representation of Run depending on the Kubeflow CRDs, i.e.:

- TrainJob
- OptimizeJob
- SparkJob
- Pipeline/Workflow

I can imagine that users might want to create an Experiment where they only have a training job, or they might want to create a more complex Experiment with multiple steps (e.g. data processing, training, evals).
So a run could be just a single TrainJob, or it could be a Pipeline with multiple steps (each step is a nested run). I'm trying to model this after MLflow so it's agnostic to the source that logged it.
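For reference, this is a minimal sketch of how MLflow expresses that parent/child relationship with nested runs, which is the model being borrowed here (experiment, run, and metric names are illustrative):

```python
import mlflow

mlflow.set_experiment("pipeline-style-experiment")

# The parent run represents the whole pipeline execution; each step is a
# nested child run. A single TrainJob would simply be a run with no children.
with mlflow.start_run(run_name="pipeline-run"):
    with mlflow.start_run(run_name="data-prep", nested=True):
        mlflow.log_metric("rows_processed", 10000)
    with mlflow.start_run(run_name="train", nested=True):
        mlflow.log_param("lr", 0.01)
        mlflow.log_metric("accuracy", 0.93)
```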
> ### Model Registry Domain Models
>
> The expanded Model Registry will include the following domain models, heavily influenced by
We should add the schema / API spec in this KEP.
We can also link to:
kubeflow/model-registry#1224 (comment)
I'll clarify that the linked comment includes APIs as well.
> depending on which Kubeflow project they use (e.g., Pipeline runs, Katib experiments, Training Operator results).
> 1. **No unified tracking**: Users cannot easily compare runs, share insights, or maintain consistent metadata across the
>    entire Kubeflow ecosystem.
> 1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD creates technical debt and limits future
ML-metadata also violates a lot of security best practices and breaks hard multi-tenancy.
and has a lot of CVEs:

```
Scanning ghcr.io/kubeflow/kfp-metadata-envoy:2.5.0
+----------+------+--------+-----+
| Critical | High | Medium | Low |
+----------+------+--------+-----+
|        0 |    0 |     28 |   7 |
+----------+------+--------+-----+

Scanning ghcr.io/kubeflow/kfp-metadata-writer:2.5.0
+----------+------+--------+-----+
| Critical | High | Medium | Low |
+----------+------+--------+-----+
|       13 |  372 |    973 | 845 |
+----------+------+--------+-----+

Scanning gcr.io/tfx-oss-public/ml_metadata_store_server:1.14.0
+----------+------+--------+-----+
| Critical | High | Medium | Low |
+----------+------+--------+-----+
|        0 |    0 |     41 |  18 |
+----------+------+--------+-----+
```
> interoperability.
>
> ### Goals
Add per-namespace/profile multi-tenancy as a goal.
I'm closing this KEP because my team no longer has capacity to take this on. If others want to pursue this, feel free to fork the KEP and I'll be happy to review and advise. 😄
@mprahl may we keep it open for now? Just to have it tracked. The stalebot will close it anyway if there is no activity on this topic.
I agree with @juliusvonkohout! Maybe we should put out a call for contributors to help us add Experiment Tracking support via MLFlow for Kubeflow sub-projects. cc @kubeflow/wg-training-leads @kubeflow/wg-pipeline-leads @kubeflow/kubeflow-steering-committee @kubeflow/wg-manifests-leads @kubeflow/wg-notebooks-leads @kubeflow/wg-data-leads @kubeflow/kubeflow-sdk-team @kubeflow/kubeflow-outreach-committee @jbottum
Rather than tying it strictly to the MLflow implementation choice, I believe it would be very helpful to add an SPI (strongly inspired by MLflow Experiments/Runs to begin with), so that if one day you want to tie in another integration in this area, you could. Not to dispute MLflow's leading popularity, but in other community discussions other alternatives also have their market share, so an SPI would also prepare the ground for additional contributors, adding to what Andrey just said. What would be the @kubeflow/kubeflow-steering-committee PoV on this?
I fully agree - designing an extensible architecture makes sense, since it will let us easily swap between experiment tracking solutions (e.g., MLflow, W&B, or even a custom option).
Very much IMHO: in the short term, an SPI that is 1:1 with the MLflow API (with the MLflow integration as its implementation).
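Purely as an illustration of the kind of SPI being discussed, a hypothetical Python sketch mirroring the MLflow experiment/run surface; none of these names are an agreed interface:

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ExperimentTrackingSPI(ABC):
    """Hypothetical provider interface mirroring the MLflow experiment/run model.

    Concrete providers (MLflow, W&B, a custom backend, ...) would implement
    this, so the rest of Kubeflow depends only on the interface.
    """

    @abstractmethod
    def create_experiment(self, name: str) -> str:
        """Create (or reuse) an experiment and return its ID."""

    @abstractmethod
    def start_run(
        self,
        experiment_id: str,
        run_name: Optional[str] = None,
        parent_run_id: Optional[str] = None,
    ) -> str:
        """Start a run (optionally nested under a parent) and return its ID."""

    @abstractmethod
    def log_params(self, run_id: str, params: Dict[str, str]) -> None:
        """Log run parameters."""

    @abstractmethod
    def log_metrics(self, run_id: str, metrics: Dict[str, float], step: int = 0) -> None:
        """Log run metrics at an optional step."""

    @abstractmethod
    def end_run(self, run_id: str, status: str = "FINISHED") -> None:
        """Mark a run as finished (or failed)."""
```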
Experiment tracking is heavily dependent on the Registry and UI supporting it for visualizations and for tracking models, versions, and metrics. What are your thoughts on that with this SPI-based integration? If we say the SPI enables providers to capture data and lets users use the native tools they integrated with, for example using the MLflow UI separately? My next question is how we foresee bringing the champion model back into the Kubeflow Model Registry for deployment or management? Or do we need to? For me, this also defines the scope of Model Registry activities going forward. Thoughts?
I've reached out to the MLflow community to see their willingness for me to contribute a multi-tenancy feature, which would allow us to have a single MLflow instance for a Kubeflow installation. Then the Kubeflow community (could be the Pipeline WG) could maintain an MLflow plugin to handle Kubernetes RBAC requirements:
Thank you very much. Ping me on Slack if you need help.
GitHub issue: #897
This proposal aims to resolve the current fragmented and limited experiment tracking experience by expanding the
Kubeflow Model Registry into a unified, centralized metadata store. Currently, experiment tracking is scattered
across components like Kubeflow Pipelines (which requires pipeline execution for tracking) and Katib (limited to
hyperparameter tuning). This leads to challenges such as limited flexibility for direct logging from Python scripts
or Jupyter notebooks, a fragmented user experience across multiple interfaces, and maintenance difficulties due
to reliance on the inactive MLMD project.