WIP: Propose centralized experiment tracking in Kubeflow #1

mprahl · 2025-07-25T20:08:16Z

anishasthana · 2025-07-25T23:41:56Z

proposals/NNNN-experiment-tracking/README.md

+   experiment tracking system as part of this proposal. The initial focus is on the SDK and Pipelines experience.
+1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing
+   direct MLFlow UI support is not a targeted goal of this proposal.
+1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality


Suggested change

1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality

1. **Automatic experiment tracking from the Kubeflow Trainer**: While users can leverage the new functionality

anishasthana · 2025-07-25T23:42:57Z

proposals/NNNN-experiment-tracking/README.md

+   direct MLFlow UI support is not a targeted goal of this proposal.
+1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality
+   through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from
+   the Kubeflow Training operator as part of this proposal.


Suggested change

the Kubeflow Training operator as part of this proposal.

the Kubeflow Trainer as part of this proposal. The community should revisit integrations at a later date.

anishasthana · 2025-07-25T23:44:43Z

proposals/NNNN-experiment-tracking/README.md

+[kubeflow/model-registry#1224](https://github.com/kubeflow/model-registry/issues/1224#issuecomment-3068005968). This is
+meant to be a high-level design with further refinement being done in a Model Registry specific KEP or design document.
+
+1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could map to experiments in


Suggested change

1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could map to experiments in

1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could potentially map to experiments in


anishasthana

Should we add a note somewhere in here that we expect this to be followed up by KEPs in KFP and MR for sure, and possibly for Kubeflow Developer Experience?

anishasthana · 2025-07-28T09:37:14Z

proposals/NNNN-experiment-tracking/README.md

+
+Model Registry's current focus is managing metadata for the lifecycle of machine learning models, from registration and
+versioning through deployment and serving. The proposal is to expand the scope to cover experiment tracking to be a
+centralized metadata store for Kubeflow. As part of this expansion of scope, we should consider renaming it to reflect


Suggested change

centralized metadata store for Kubeflow. As part of this expansion of scope, we should consider renaming it to reflect

centralized metadata store for Kubeflow. As part of this expansion of scope, we could consider renaming it to reflect

anishasthana · 2025-07-28T09:45:47Z

proposals/NNNN-experiment-tracking/README.md

+   depending on which Kubeflow component they use (e.g., Pipeline runs, Katib experiments, Training Operator results).
+1. **No unified tracking**: Users cannot easily compare runs, share insights, or maintain consistent metadata across the
+   entire Kubeflow ecosystem.
+1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD (Machine Learning Metadata) creates technical


Suggested change

1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD (Machine Learning Metadata) creates technical

1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD creates technical

anishasthana · 2025-07-28T09:46:03Z

proposals/NNNN-experiment-tracking/README.md

+   ecosystem.
+1. **Pipelines integration**: Integrate Kubeflow Pipelines with the centralized experiment tracking system in Model
+   Registry, allowing pipeline runs and output artifacts to be automatically logged and tracked.
+1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD (Machine Learning Metadata) to


Suggested change

1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD (Machine Learning Metadata) to

1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD to

anishasthana · 2025-07-28T09:46:17Z

proposals/NNNN-experiment-tracking/README.md

+
+### Risks and Mitigations
+
+1. **Migration Challenges**: The proposal involves migrating away from MLMD (Machine Learning Metadata), which is


Suggested change

1. **Migration Challenges**: The proposal involves migrating away from MLMD (Machine Learning Metadata), which is

1. **Migration Challenges**: The proposal involves migrating away from MLMD, which is

anishasthana · 2025-07-28T11:12:25Z

proposals/NNNN-experiment-tracking/README.md

+Model Registry's current authorization model gates access at the API level through a proxy rather than at the individual
+resource or namespace level. This approach prevents the same Model Registry instance from being shared across teams that
+require isolation. This limitation conflicts with Kubeflow Pipelines' multiuser mode, which is the default deployment
+strategy for the Kubeflow platform. MLMD (ML Metadata) has the same limitation, so adding multi-tenancy support to Model


Suggested change

strategy for the Kubeflow platform. MLMD (ML Metadata) has the same limitation, so adding multi-tenancy support to Model

strategy for the Kubeflow platform. MLMD has the same limitation, so adding multi-tenancy support to Model

anishasthana · 2025-07-28T11:16:33Z

proposals/NNNN-experiment-tracking/README.md

+
+**Future Enhancements**: In subsequent phases, we may extend the plugin architecture to include:
+
+- **Tracing Support**: Adding tracing capabilities would require new domain models in Model Registry.


Suggested change

- **Tracing Support**: Adding tracing capabilities would require new domain models in Model Registry.

- **Tracing Support**: Adding tracing capabilities akin to [MLflow](https://mlflow.org/docs/latest/genai/tracing/) would require new domain models in Model Registry.

anishasthana · 2025-07-28T11:16:58Z

proposals/NNNN-experiment-tracking/README.md

+
+#### Kubeflow SDK Implementation
+
+Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,


Suggested change

Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,

Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a native

kramaranya

That looks great @mprahl!

kramaranya · 2025-07-28T06:28:35Z

proposals/NNNN-experiment-tracking/README.md

+1. **Extend SDK capabilities for experiment tracking**: Enhance the Kubeflow SDK to support direct logging of
+   experiments, runs, metrics, and artifacts without requiring pipeline execution.
+1. **Provide MLFlow SDK compatibility**: Create an
+   [MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the


It should be "MLflow plugin" instead, since these plugins extend MLflow's backend storage capabilities rather than the SDK itself.

Suggested change

[MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the

[MLFlow plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the

kramaranya · 2025-07-28T07:34:45Z

proposals/NNNN-experiment-tracking/README.md

+1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality
+   through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from
+   the Kubeflow Training operator as part of this proposal.


To clarify: The MLflow plugin allows saving metadata to the Model Registry backend when using MLflow APIs.

"Kubeflow SDK or MLflow SDK plugin" is confusing, since we could use either the Kubeflow SDK or the MLflow SDK, with the MLflow plugin regardless of which SDK you choose. However, I would just keep the Kubeflow SDK here to avoid confusing users.

Suggested change

1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality

through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from

the Kubeflow Training operator as part of this proposal.

1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality

through the Kubeflow SDK with MLflow plugin, we are not implementing automatic experiment tracking directly from

the Kubeflow Training operator as part of this proposal.
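To illustrate why the plugin is independent of the SDK choice: MLflow picks a tracking store from the scheme of the tracking URI, and storage plugins register a store class for a scheme through the `mlflow.tracking_store` entry point group. The sketch below is a simplified, self-contained model of that dispatch, not MLflow's actual code; the `modelregistry` scheme, `ModelRegistryStore`, and the helper functions are hypothetical.

```python
from urllib.parse import urlparse


class FileStore:
    """Stand-in for a built-in store that writes to the local filesystem."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


class ModelRegistryStore:
    """Hypothetical store that would persist runs to a Model Registry service."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


# Simplified registry mapping a URI scheme to a store class, similar in spirit
# to how MLflow maps tracking URI schemes to stores registered by plugins.
# An empty scheme (a plain path) falls back to the file store.
_STORES: dict[str, type] = {"": FileStore, "file": FileStore}


def register_store(scheme: str, store_cls: type) -> None:
    """What installing a plugin effectively does: bind a scheme to a store."""
    _STORES[scheme] = store_cls


def get_store(tracking_uri: str):
    """Resolve the store for a tracking URI purely from its scheme."""
    scheme = urlparse(tracking_uri).scheme
    store_cls = _STORES.get(scheme)
    if store_cls is None:
        raise ValueError(f"no tracking store registered for scheme {scheme!r}")
    return store_cls(tracking_uri)


# Hypothetical plugin registration for a Model Registry backend.
register_store("modelregistry", ModelRegistryStore)
```

Because resolution depends only on the URI, the same store is used whether the logging calls come from MLflow's own APIs or from a Kubeflow SDK wrapper around them.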

kramaranya · 2025-07-28T08:30:52Z

proposals/NNNN-experiment-tracking/README.md

+1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing
+   direct MLFlow UI support is not a targeted goal of this proposal.


Why theoretically? The plugin should definitely enable MLflow UI compatibility with Model Registry -- that's the purpose of MLflow plugins. You just need to install the MLflow package and point the UI to the Model Registry backend (with the Model Registry plugin installed), same as you would with other data sources like local files, databases, etc.

Suggested change

1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing

direct MLFlow UI support is not a targeted goal of this proposal.

1. **MLFlow UI support**: Although the MLFlow plugin should theoretically enable MLFlow UI compatibility, providing

direct MLFlow UI support is not a targeted goal of this proposal.

I think splitting the MLFlow plugin into a separate proposal will help, as it removes this kind of issue from the critical path.

@kramaranya it was mostly due to some of the comments Dhiraj made about UI issues he encountered when implementing the plugin, but I'll remove the word "theoretically".

@etirelli I see your point, but I like keeping it together to give the reader the whole vision and to ensure the new domain models in the Model Registry map well to MLFlow ones to enable this compatibility. Implementation of the MLFlow plugin can be separate and done later depending on the design/implementation of the Kubeflow SDK working group.

kramaranya · 2025-07-28T08:32:18Z

proposals/NNNN-experiment-tracking/README.md

+
+#### MLFlow SDK Compatibility
+
+- Implement MLFlow SDK plugins to connect to the Model Registry backend


Suggested change

- Implement MLFlow SDK plugins to connect to the Model Registry backend

- Implement MLFlow plugins to connect to the Model Registry backend

Thanks! I'll remove SDK from these occurrences.

kramaranya · 2025-07-28T11:21:28Z

proposals/NNNN-experiment-tracking/README.md

+#### Kubeflow SDK Enhancement
+
+- Provide a more native, Kubeflow-centric experience
+- Simplify setup and configuration
+- Integrate with existing Model Registry SDK functionality


How does this sound?

Suggested change

#### Kubeflow SDK Enhancement

- Provide a more native, Kubeflow-centric experience

- Simplify setup and configuration

- Integrate with existing Model Registry SDK functionality

#### Kubeflow SDK Enhancement

- Provide native experiment tracking capabilities within the Kubeflow SDK using MLflow compatibility

- Simplify experiment tracking setup and configuration

- Enable seamless integration with Model Registry for unified metadata management

- Support both local and remote experiment tracking workflows

+1 to the suggestions, except remove the MLFlow reference.

Great suggestion. Thank you @kramaranya!

kramaranya · 2025-07-28T11:27:51Z

proposals/NNNN-experiment-tracking/README.md

+   significant disruption for the MLFlow community if they were to change their SDK plugin stance, making such changes
+   unlikely.
+
+## Design Details


Shall we add a diagram to visualize the proposal design, similar to what we had for the f2f? What do you think?

Good suggestion! I'll create one before submitting this to the Kubeflow community.

kramaranya · 2025-07-28T11:29:25Z

proposals/NNNN-experiment-tracking/README.md

+
+### SDK Implementation Details
+
+#### MLFlow SDK Model Registry Plugin


Suggested change

#### MLFlow SDK Model Registry Plugin

#### MLFlow Model Registry Plugin

kramaranya · 2025-07-28T11:41:05Z

proposals/NNNN-experiment-tracking/README.md

+Enhancements could include:
+
+- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating
+  Kubernetes tokens.
+- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier
+  for users to visualize and manage their experiments.
+- **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to
+  select one.


Suggested change

Enhancements could include:

- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating

Kubernetes tokens.

- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier

for users to visualize and manage their experiments.

- **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to

select one.

Enhancements could include:

- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating

Kubernetes tokens.

- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier

for users to visualize and manage their experiments.

- **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to

select one.

- **Local Experiments**: Enable users to work with local experiments and selectively publish them to the remote Model Registry backend.

Good suggestion! Thank you.
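For the Authentication helper in the list above, one simple approach when code runs inside a pod is to reuse the service account token that Kubernetes mounts at `/var/run/secrets/kubernetes.io/serviceaccount/token` and send it as a bearer token. This is only a sketch: the helper name is hypothetical, and whether a given Model Registry deployment's proxy accepts this token is an assumption.

```python
from pathlib import Path

# Default location where Kubernetes mounts a pod's service account token.
DEFAULT_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def model_registry_auth_headers(token_path: Path = DEFAULT_TOKEN_PATH) -> dict[str, str]:
    """Hypothetical helper: build an Authorization header from a mounted token.

    The file is re-read on every call because projected service account
    tokens are periodically rotated by the kubelet.
    """
    token = Path(token_path).read_text().strip()
    if not token:
        raise ValueError(f"empty service account token at {token_path}")
    return {"Authorization": f"Bearer {token}"}
```

Outside a cluster (for example, a local notebook), the same helper could be pointed at a different token file, which is where an SDK could add value by choosing the right source automatically.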

kramaranya · 2025-07-28T11:41:38Z

proposals/NNNN-experiment-tracking/README.md

+
+Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,
+Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring
+the MLFlow SDK Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.


Suggested change

the MLFlow SDK Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.

the MLFlow Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.

kramaranya · 2025-07-28T11:42:19Z

proposals/NNNN-experiment-tracking/README.md

+Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,
+Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring


Suggested change

Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,

Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring

Building on the MLFlow Model Registry plugin, we can enhance the Kubeflow SDK to provide a native,

Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring

etirelli · 2025-07-28T12:17:31Z

proposals/NNNN-experiment-tracking/README.md

+
+- **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts
+  across all Kubeflow components.
+- Providing **MLFlow SDK compatibility**, enabling users to leverage familiar APIs while storing data in Kubeflow.


Maybe rephrase this along the lines of "supporting third-party" experiment tracking, and mention MLFlow as a possible target instead of a goal?

etirelli · 2025-07-28T12:26:38Z

proposals/NNNN-experiment-tracking/README.md

+
+### Goals
+
+1. **Kubeflow centralized experiment tracking**: Create an experiment tracking system that is independent from Kubeflow


Maybe merge the first and second goals, and rephrase to emphasize separation of concerns and experiment tracking as a first-class capability in Kubeflow?

etirelli · 2025-07-28T12:28:37Z

proposals/NNNN-experiment-tracking/README.md

+1. **Transform Model Registry into a unified metadata store**: Evolve the existing Model Registry component into a
+   centralized metadata store that can handle experiments, runs, metrics, and artifacts across the entire Kubeflow
+   ecosystem.
+1. **Pipelines integration**: Integrate Kubeflow Pipelines with the centralized experiment tracking system in Model


Rephrase to emphasize refactoring Pipelines to externalize experiment tracking, removing the dependency (tech debt) from Pipelines. This allows Pipelines to integrate with MR as well as third-party trackers.

etirelli · 2025-07-28T12:31:30Z

proposals/NNNN-experiment-tracking/README.md

+1. **Extend SDK capabilities for experiment tracking**: Enhance the Kubeflow SDK to support direct logging of
+   experiments, runs, metrics, and artifacts without requiring pipeline execution.
+1. **Provide MLFlow SDK compatibility**: Create an
+   [MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the


I am of two minds about explicitly setting this as a goal, since the plugin is standalone. Having the MLFlow plugin as a separate enhancement might make both easier to approve.

etirelli · 2025-07-28T12:52:54Z

proposals/NNNN-experiment-tracking/README.md

+
+As a data scientist working with Kubeflow, I want to log my experiments directly from Jupyter Notebooks without being
+forced to use pipelines, so that I can track my iterative model development process naturally. I should be able to use
+familiar MLFlow SDK APIs to log metrics, parameters, and artifacts, and then view all my experiments in a unified


Will we not have Kubeflow SDK support? Are we talking only about MLFlow?

Good point. I'll split this into two user stories.

dhirajsb

lgtm overall, but imho we don't need to go into the details of how to implement Model Registry resource-level access control right now.
Model Registry already supports RBAC at the service level, which is a good start for most applications. For architectures that want to restrict access to resources within a shared Model Registry instance, we can look at adding support for group-level access control in Model Registry in the future.

dhirajsb · 2025-07-28T18:35:05Z

proposals/NNNN-experiment-tracking/README.md

+
+The proposal tackles these issues by:
+
+- **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts


Model Registry was designed from the ground up to be a metadata store for AI/ML platforms, so adding support for experiments is just a matter of adding new metadata resource types to the existing API.
Much of this work has already been completed in the Model Registry component in the following issues and related PRs:
- Add support for Experiment tracking in Model Registry
- Add MLflow SDK support for Model Registry as a store

Thanks! I'll add a section in ### Model Registry Expansion to mention the original intent of how Model Registry was designed.

dhirajsb · 2025-07-28T18:38:33Z

proposals/NNNN-experiment-tracking/README.md

+   [MLFlow storage plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to continue
+   using familiar MLFlow APIs while storing experiment data in Kubeflow's centralized tracking system. This provides
+   seamless integration for existing MLFlow code and positions Kubeflow's SDK as complementary to MLFlow rather than
+   competitive.


The Model Registry MLflow SDK integration will initially start with a Tracking Store plugin to support the MLflow experiments API.
kubeflow/model-registry#1225
The intent is to give existing MLflow users the ability to integrate with the Kubeflow platform registry. It is not meant to replace the existing Kubeflow Model Registry SDK, which will have full support for the Kubeflow Model Registry Experiments, Registry, and Model Catalog APIs.

I'll reword it to mention the Model Registry SDK and to specify the tracking store plugin explicitly.

dhirajsb · 2025-07-28T18:48:25Z

proposals/NNNN-experiment-tracking/README.md

+   competitive.
+1. **Implement multitenancy in Model Registry**: Add multitenancy support to Model Registry to enable isolation at the
+   Kubernetes namespace level like Kubeflow Pipelines does today. This resolves the multitenancy gap that exists with
+   MLMD today.


The current Model Registry deployment architecture supports multiple Model Registry instances, which can be shared across teams working in multiple namespaces. Tenancy and isolation are achieved through standard Kubernetes RBAC rules that control access to the Model Registry REST API from different namespaces.
However, a Model Registry instance itself is not aware of, and does not enforce, namespace-level isolation. This makes access control and authorization an orthogonal, and therefore flexible, concern.
In other words, Model Registry supports multiple tenants by using an instance per tenant. Applications in a tenant may be distributed across namespaces. A Model Registry instance or service does not differentiate clients by namespace.
If a pipeline wishes to record the namespace where it was executed, it will need to add that to either an existing metadata property or a custom metadata property.

I think this is related to the discussion in #1 (comment). So let's consolidate that discussion to that thread.
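With the instance-per-tenant model described above, recording where a pipeline ran falls to the client. A minimal sketch of attaching the namespace as a custom property before creating a run follows; the payload shape is modeled on Model Registry's `customProperties` map, but the exact field names, the `kfp_namespace` key, and the helper name are assumptions.

```python
import copy
from typing import Any


def with_namespace_property(run: dict[str, Any], namespace: str) -> dict[str, Any]:
    """Return a copy of a run payload with the executing namespace recorded.

    Assumed shape: customProperties maps a key to a typed value object.
    The input payload is not modified.
    """
    if not namespace:
        raise ValueError("namespace must be non-empty")
    updated = copy.deepcopy(run)
    props = updated.setdefault("customProperties", {})
    props["kfp_namespace"] = {
        "metadataType": "MetadataStringValue",
        "string_value": namespace,
    }
    return updated
```

Since the instance does not act on this value, it records where a run happened but does not isolate data between namespaces, which is the gap the multi-tenancy thread is about.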

dhirajsb · 2025-07-28T18:52:42Z

proposals/NNNN-experiment-tracking/README.md

+- Filter by timestamps (before, after)
+- Fuzzy searching on entity names
+
+### Multi-tenancy Implementation


Model Registry already supports multiple tenants with a dedicated-instance-per-tenant deployment model. Tenants can span multiple namespaces, and access control is enforced through Kubernetes RBAC for the Model Registry service.

I personally consider that a workaround rather than true multi-tenancy support, and it would be a significant user experience and maintenance burden to require a new Model Registry instance for every isolated Kubeflow namespace. I'd like to work together towards a solution on this in this thread: #1 (comment)

dhirajsb · 2025-07-28T19:33:24Z

proposals/NNNN-experiment-tracking/README.md

+would address this gap for all of Kubeflow's metadata needs.
+
+This proposal adds an authorization mode to Model Registry that leverages Kubernetes subject access review. In this
+mode, all entities would be mapped to Kubernetes namespaces on the cluster to provide namespace-level isolation. For


We cannot go back to the namespace-scoped limitation of the earlier MLMD architecture. A registered model is not tied to a specific namespace but is shared across all clients within a higher-level tenant scope. Model Registry will implement multi-tenancy but use Kubernetes user groups and roles to do so, which decouples client location (namespace) from resource location.

The proposal is to have an authorization mode that is namespaced. So in that mode, a namespace would be required on all entities. When not in that mode, it's not a field that is exposed/accepted by the API.

We could start out by having this mode only apply to the new experiment tracking entities but still have the existing entities be global to the instance.
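The namespaced authorization mode discussed in this thread can be sketched in two parts: validating that an entity carries a namespace only when the mode is enabled (limited at first to the new experiment tracking entities), and building a Kubernetes `SubjectAccessReview` that asks the API server whether the caller may act in that namespace. The `SubjectAccessReview` fields come from the `authorization.k8s.io/v1` API; the `modelregistry.kubeflow.org` API group, the resource names, and the helper names are hypothetical.

```python
from typing import Any

# Hypothetical: entity types that require a namespace in namespaced mode,
# starting with only the new experiment tracking entities. Existing entities
# (e.g. registered models) stay global to the instance.
NAMESPACED_TYPES = {"experiments", "runs"}


def validate_namespace(entity_type: str, entity: dict[str, Any], namespaced_mode: bool) -> None:
    """Require or reject the namespace field depending on the authorization mode."""
    if namespaced_mode and entity_type in NAMESPACED_TYPES:
        if not entity.get("namespace"):
            raise ValueError(f"{entity_type}: namespace is required in namespaced mode")
    elif "namespace" in entity:
        # Outside namespaced mode (or for global types) the field is not accepted.
        raise ValueError(f"{entity_type}: namespace is not accepted here")


def build_access_review(user: str, groups: list[str], verb: str,
                        entity_type: str, namespace: str) -> dict[str, Any]:
    """Build a SubjectAccessReview body for the Kubernetes API server."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "groups": groups,
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "group": "modelregistry.kubeflow.org",  # hypothetical API group
                "resource": entity_type,
            },
        },
    }
```

The server would POST this body to `/apis/authorization.k8s.io/v1/subjectaccessreviews` and allow the request only if the response's `status.allowed` is true, which lets cluster RBAC decide who can act in which namespace.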

mprahl · 2025-07-28T20:02:02Z

lgtm overall, but imho we don't need to go into the details of how to implement Model Registry resource-level access control right now. Model Registry already supports RBAC at the service level, which is a good start for most applications. For architectures that want to restrict access to resources within a shared Model Registry instance, we can look at adding support for group-level access control in Model Registry in the future.

@dhirajsb the reason we need to solve the multi-tenancy issue in MR in the near future is that some KFP users are blocked from using KFP v2, because KFP v2 requires MLMD, which does not support multi-tenancy. The main way to install KFP is a single instance per cluster with namespace-level isolation. If we switch from MLMD to Model Registry, we just shift the same gap to another service.

Do you have any other suggestions on how to add RBAC to Model Registry?


briangallagher · 2025-07-30T08:02:30Z

proposals/NNNN-experiment-tracking/README.md

+
+- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating
+  Kubernetes tokens.
+- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier


This seems to suggest that the experiments UI will be part of the Model Registry UI. Is that correct? Or is the intention for the Experiments UI to be a standalone component?

Yes, expanding the MR UI to include experiments was the intent. Thanks for pointing out that this wasn't clear. My latest push adds this clarification.

briangallagher · 2025-07-30T16:58:36Z

proposals/NNNN-experiment-tracking/README.md

+Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs. This integration should
+feel cohesive with the existing Model Registry SDK functionality for registering models.
+
+Enhancements could include:


I'm trying to understand the distinction between what the Kubeflow SDK should provide versus what the Model Registry SDK will provide. My understanding is that the MR SDK will contain the MLflow plugin, plus an updated client SDK to reflect the new REST APIs. To me, the four points below would more naturally live in the Model Registry SDK, as they are all Registry/Experiment-specific features.
I think specifying the scope of features for the Model Registry SDK will tell us what to build upon in the Kubeflow SDK and will avoid duplication of effort.

I echo @briangallagher's comment, and I would like to avoid duplication of effort in the Kubeflow SDK.

I understand you want to use the MLFlow plugin for experiment tracking, and that is covered by the great work done by @dhirajsb and @syntaxsdev for the mlflow plugin.

When it comes to RegisteredModel, ModelVersion, ModelArtifact, etc., and the lifecycle of Model Registry user operations, we already have the MR Python SDK, and this can be integrated within the Kubeflow SDK scope.

I'm trying to understand the distinction between what the Kubeflow SDK should provide versus what the Model Registry SDK will provide. My understanding is that the MR SDK will contain the MLflow plugin, plus an updated client SDK to reflect the new REST APIs.

The MLFlow plugin won't be part of the MR SDK per se, it's just a module users can install to integrate with MR using MLflow SDK.

The MR SDK will have its own APIs for experimentation as well as for the existing RegisteredModel resources. So, I agree that duplicating experimentation support in a separate Kubeflow SDK doesn't add value. The Kubeflow SDK could instead focus on making it easier to locate MR services and configure an MR SDK client that users could use to access the MR API.

@dhirajsb when you say "The MR SDK will have its own APIs for Experimentation", do you mean the OpenAPI-generated client code or custom methods like log_metric?

If it's the OpenAPI-generated code, my thought is that the Kubeflow SDK can be the integration point for Model Registry and the Model Registry MLFlow plugin, like the POC. We can later create bespoke methods like log_metric in the Model Registry SDK and replace those transparently in the Kubeflow SDK if we need more functionality.
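The approach in the comment above, where the Kubeflow SDK is the integration point and its methods can later be re-pointed from the MLflow plugin path to bespoke Model Registry SDK methods, is essentially a facade over swappable backends. A minimal sketch; every class and method name here (`TrackingBackend`, `KubeflowTracker`, and the backends) is hypothetical and not part of any existing SDK, and the backends only record calls instead of contacting a server.

```python
from typing import Protocol


class TrackingBackend(Protocol):
    def log_metric(self, run_id: str, key: str, value: float, step: int) -> None: ...


class _RecordingBackend:
    """Placeholder that records calls instead of talking to a server."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, float, int]] = []

    def log_metric(self, run_id: str, key: str, value: float, step: int) -> None:
        self.calls.append((run_id, key, value, step))


class MlflowPluginBackend(_RecordingBackend):
    """Hypothetical: would call MLflow APIs with the Model Registry plugin configured."""


class ModelRegistrySdkBackend(_RecordingBackend):
    """Hypothetical: would call future bespoke Model Registry SDK methods."""


class KubeflowTracker:
    """User-facing facade; its API stays the same when the backend is swapped."""

    def __init__(self, backend: TrackingBackend) -> None:
        self._backend = backend

    def log_metric(self, run_id: str, key: str, value: float, step: int = 0) -> None:
        self._backend.log_metric(run_id, key, float(value), step)
```

User code calling `tracker.log_metric(...)` does not change when the backend is swapped, which is the transparency the comment describes.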


mprahl · 2025-08-01T19:14:05Z

Since it's been a week and I believe I've addressed most of the feedback, I opened a PR to upstream to continue the discussion there:
kubeflow#892

anishasthana reviewed Jul 25, 2025


Update teams to match acls repo (kubeflow#877)

00e46c5

* Update teams to match acls repo Signed-off-by: Anish Asthana <[email protected]> * Remove Chase from KOC Signed-off-by: Anish Asthana <[email protected]> --------- Signed-off-by: Anish Asthana <[email protected]>

anishasthana reviewed Jul 28, 2025


kramaranya reviewed Jul 28, 2025


etirelli reviewed Jul 28, 2025


mprahl force-pushed the experiment-tracking branch 3 times, most recently from 3587c70 to e6dd073 July 28, 2025 17:56

mprahl requested review from anishasthana, etirelli and kramaranya July 28, 2025 18:03

mprahl force-pushed the experiment-tracking branch from e6dd073 to 0d0833b July 28, 2025 18:28

dhirajsb reviewed Jul 28, 2025


mprahl force-pushed the experiment-tracking branch from 0d0833b to 39d8369 July 28, 2025 20:23

mprahl requested a review from dhirajsb July 28, 2025 20:23

anishasthana reviewed Jul 29, 2025



tenzen-y steps down from WG AutoML leadership (kubeflow#887)

3d1584e

Signed-off-by: Yuki Iwai <[email protected]>

mprahl force-pushed the experiment-tracking branch 4 times, most recently from 4a75ea2 to 99f1fc4 July 29, 2025 19:37

briangallagher reviewed Jul 30, 2025


mprahl force-pushed the experiment-tracking branch from 99f1fc4 to 8b33ad9 July 30, 2025 12:54

briangallagher reviewed Jul 30, 2025


mprahl force-pushed the experiment-tracking branch from 8b33ad9 to 336aa5a August 1, 2025 19:11

Propose centralized experiment tracking in Kubeflow

0eefb9d

Signed-off-by: mprahl <[email protected]>

mprahl force-pushed the experiment-tracking branch from 336aa5a to 0eefb9d August 1, 2025 19:12

mprahl closed this Aug 1, 2025

dhirajsb mentioned this pull request Aug 13, 2025

Add support for filtering resource properties and returning unique values kubeflow/model-registry#1460

Closed

jonburdo mentioned this pull request Aug 13, 2025

Add an endpoint to bulk request metric history for experiment runs kubeflow/model-registry#1466

Closed

	1. Automatic experiment tracking from the Kubeflow Training operator: While users can leverage the new functionality
	1. Automatic experiment tracking from the Kubeflow Trainer: While users can leverage the new functionality

	the Kubeflow Training operator as part of this proposal.
	the Kubeflow Trainer as part of this proposal. The community should revisit integrations at a later date.

	1. Experiments: A group of related runs. This would replace experiments in Pipelines and could map to experiments in
	1. Experiments: A group of related runs. This would replace experiments in Pipelines and could potentially map to experiments in

	centralized metadata store for Kubeflow. As part of this expansion of scope, we should consider renaming it to reflect
	centralized metadata store for Kubeflow. As part of this expansion of scope, we could consider renaming it to reflect

Suggested change

	1. Maintenance challenges: The Kubeflow Pipelines dependency on MLMD (Machine Learning Metadata) creates technical

	1. Maintenance challenges: The Kubeflow Pipelines dependency on MLMD creates technical

Suggested change

	1. Remove the MLMD dependency from Pipelines: Decouple Kubeflow Pipelines from MLMD (Machine Learning Metadata) to

	1. Remove the MLMD dependency from Pipelines: Decouple Kubeflow Pipelines from MLMD to


		### Risks and Mitigations

		1. Migration Challenges: The proposal involves migrating away from MLMD (Machine Learning Metadata), which is

Suggested change

	strategy for the Kubeflow platform. MLMD (ML Metadata) has the same limitation, so adding multi-tenancy support to Model

	strategy for the Kubeflow platform. MLMD has the same limitation, so adding multi-tenancy support to Model


		Future Enhancements: In subsequent phases, we may extend the plugin architecture to include:

		- Tracing Support: Adding tracing capabilities would require new domain models in Model Registry.


		#### Kubeflow SDK Implementation

		Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,

Suggested change

	[MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the

	[MLFlow plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the

		1. MLFlow UI support: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing
		direct MLFlow UI support is not a targeted goal of this proposal.


		#### MLFlow SDK Compatibility

		- Implement MLFlow SDK plugins to connect to the Model Registry backend

Suggested change

	- Implement MLFlow SDK plugins to connect to the Model Registry backend

	- Implement MLFlow plugins to connect to the Model Registry backend


		### SDK Implementation Details

		#### MLFlow SDK Model Registry Plugin

Suggested change

	#### MLFlow SDK Model Registry Plugin

	#### MLFlow Model Registry Plugin

Suggested change

	the MLFlow SDK Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.

	the MLFlow Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.

		Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,
		Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring


		### Goals

		1. Kubeflow centralized experiment tracking: Create an experiment tracking system that is independent from Kubeflow


		The proposal tackles these issues by:

		- Expanding Model Registry into a central experiment tracking store for experiments, runs, metrics, and artifacts

mprahl commented Jul 25, 2025

anishasthana left a comment

kramaranya left a comment

mprahl commented Jul 25, 2025

mprahl Jul 28, 2025