@mprahl (Owner) commented Jul 25, 2025

Replaced by kubeflow#892

experiment tracking system as part of this proposal. The initial focus is on the SDK and Pipelines experience.
1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing
direct MLFlow UI support is not a targeted goal of this proposal.
1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality

Suggested change
1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality
1. **Automatic experiment tracking from the Kubeflow Trainer**: While users can leverage the new functionality

direct MLFlow UI support is not a targeted goal of this proposal.
1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality
through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from
the Kubeflow Training operator as part of this proposal.

Suggested change
the Kubeflow Training operator as part of this proposal.
the Kubeflow Trainer as part of this proposal. The community should revisit integrations at a later date.

[kubeflow/model-registry#1224](https://github.com/kubeflow/model-registry/issues/1224#issuecomment-3068005968). This is
meant to be a high-level design with further refinement being done in a Model Registry specific KEP or design document.

1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could map to experiments in

Suggested change
1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could map to experiments in
1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could potentially map to experiments in

* Update teams to match acls repo

Signed-off-by: Anish Asthana <[email protected]>

* Remove Chase from KOC

Signed-off-by: Anish Asthana <[email protected]>

---------

Signed-off-by: Anish Asthana <[email protected]>

@anishasthana left a comment:

Should we add a note somewhere in here that we expect this to be followed up by KEPs in KFP and MR for sure, and possibly for Kubeflow Developer Experience?


Model Registry's current focus is managing metadata for the lifecycle of machine learning models, from registration and
versioning through deployment and serving. The proposal is to expand the scope to cover experiment tracking to be a
centralized metadata store for Kubeflow. As part of this expansion of scope, we should consider renaming it to reflect

Suggested change
centralized metadata store for Kubeflow. As part of this expansion of scope, we should consider renaming it to reflect
centralized metadata store for Kubeflow. As part of this expansion of scope, we could consider renaming it to reflect

depending on which Kubeflow component they use (e.g., Pipeline runs, Katib experiments, Training Operator results).
1. **No unified tracking**: Users cannot easily compare runs, share insights, or maintain consistent metadata across the
entire Kubeflow ecosystem.
1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD (Machine Learning Metadata) creates technical

Suggested change
1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD (Machine Learning Metadata) creates technical
1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD creates technical

ecosystem.
1. **Pipelines integration**: Integrate Kubeflow Pipelines with the centralized experiment tracking system in Model
Registry, allowing pipeline runs and output artifacts to be automatically logged and tracked.
1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD (Machine Learning Metadata) to

Suggested change
1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD (Machine Learning Metadata) to
1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD to


### Risks and Mitigations

1. **Migration Challenges**: The proposal involves migrating away from MLMD (Machine Learning Metadata), which is

Suggested change
1. **Migration Challenges**: The proposal involves migrating away from MLMD (Machine Learning Metadata), which is
1. **Migration Challenges**: The proposal involves migrating away from MLMD, which is

Model Registry's current authorization model gates access at the API level through a proxy rather than at the individual
resource or namespace level. This approach prevents the same Model Registry instance from being shared across teams that
require isolation. This limitation conflicts with Kubeflow Pipelines' multiuser mode, which is the default deployment
strategy for the Kubeflow platform. MLMD (ML Metadata) has the same limitation, so adding multi-tenancy support to Model

Suggested change
strategy for the Kubeflow platform. MLMD (ML Metadata) has the same limitation, so adding multi-tenancy support to Model
strategy for the Kubeflow platform. MLMD has the same limitation, so adding multi-tenancy support to Model


**Future Enhancements**: In subsequent phases, we may extend the plugin architecture to include:

- **Tracing Support**: Adding tracing capabilities would require new domain models in Model Registry.

Suggested change
- **Tracing Support**: Adding tracing capabilities would require new domain models in Model Registry.
- **Tracing Support**: Adding tracing capabilities akin to [MLflow](https://mlflow.org/docs/latest/genai/tracing/) would require new domain models in Model Registry.


#### Kubeflow SDK Implementation

Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,

Suggested change
Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,
Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a native

@kramaranya left a comment:

That looks great @mprahl!

1. **Extend SDK capabilities for experiment tracking**: Enhance the Kubeflow SDK to support direct logging of
experiments, runs, metrics, and artifacts without requiring pipeline execution.
1. **Provide MLFlow SDK compatibility**: Create an
[MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the

It should be MLflow plugin instead, since these extend MLflow's backend storage capabilities rather than the SDK itself

Suggested change
[MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the
[MLFlow plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the

Comment on lines 76 to 78
1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality
through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from
the Kubeflow Training operator as part of this proposal.

To clarify: The MLflow plugin allows saving metadata to the Model Registry backend when using MLflow APIs.

"Kubeflow SDK or MLflow SDK plugin" is confusing, since we could use either the Kubeflow SDK or the MLflow SDK, with the MLflow plugin regardless of which SDK you choose. However, I would keep just the Kubeflow SDK here to avoid confusing users.

Suggested change
1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality
through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from
the Kubeflow Training operator as part of this proposal.
1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality
through the Kubeflow SDK with MLflow plugin, we are not implementing automatic experiment tracking directly from
the Kubeflow Training operator as part of this proposal.

Comment on lines 74 to 75
1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing
direct MLFlow UI support is not a targeted goal of this proposal.

Why theoretically? The plugin should definitely enable MLflow UI compatibility with Model Registry -- that's the purpose of MLflow plugins. You just need to install the MLflow package and point the UI to the Model Registry backend (with the Model Registry plugin installed), same as you would with other data sources like local files, databases, etc.

Suggested change
1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing
direct MLFlow UI support is not a targeted goal of this proposal.
1. **MLFlow UI support**: Although the MLFlow plugin should theoretically enable MLFlow UI compatibility, providing
direct MLFlow UI support is not a targeted goal of this proposal.

I think splitting the MLFlow plugin into a separate proposal will help as it removes this kind of issue from the critical path

@mprahl (Owner, Author):

@kramaranya it was mostly due to comments Dhiraj made about UI issues he encountered when implementing the plugin, but I'll remove the word "theoretically".

@mprahl (Owner, Author):

@etirelli I see your point but I like keeping it together to provide the reader the whole vision and to also ensure the new domain models in the Model Registry map well to MLFlow ones to enable this compatibility. Implementation of the MLFlow plugin can be separate and done later depending on the design/implementation of the Kubeflow SDK working group.
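
As a rough illustration of the point raised earlier in this thread (pointing MLflow tooling at the Model Registry backend once the plugin is installed), the sketch below builds a tracking URI and exports it through `MLFLOW_TRACKING_URI`, the standard environment variable read by the MLflow client. The `modelregistry` scheme, the host name, and the port are assumptions made for illustration only; the actual scheme is whatever the plugin registers.

```python
import os

# Assumption: the URI scheme the Model Registry tracking-store plugin registers.
# The real scheme name is defined by the plugin and may differ.
ASSUMED_SCHEME = "modelregistry"


def build_tracking_uri(host: str, port: int, scheme: str = ASSUMED_SCHEME) -> str:
    """Build a tracking URI; MLflow picks the tracking store from the URI scheme."""
    return f"{scheme}://{host}:{port}"


def configure_mlflow_env(uri: str) -> None:
    """Export the URI so MLflow clients started from this process pick it up."""
    os.environ["MLFLOW_TRACKING_URI"] = uri


# Hypothetical in-cluster service address, used only as an example.
uri = build_tracking_uri("model-registry.kubeflow.svc", 8080)
configure_mlflow_env(uri)
print(uri)
```

With MLflow and the plugin installed, the same URI could then be passed to `mlflow.set_tracking_uri(uri)` or to `mlflow ui --backend-store-uri <uri>`, which is the workflow described above for other backends such as local files or databases.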


#### MLFlow SDK Compatibility

- Implement MLFlow SDK plugins to connect to the Model Registry backend

Suggested change
- Implement MLFlow SDK plugins to connect to the Model Registry backend
- Implement MLFlow plugins to connect to the Model Registry backend

@mprahl (Owner, Author):

Thanks! I'll remove SDK from these occurrences.

Comment on lines 121 to 125
#### Kubeflow SDK Enhancement

- Provide a more native, Kubeflow-centric experience
- Simplify setup and configuration
- Integrate with existing Model Registry SDK functionality

How does this sound?

Suggested change
#### Kubeflow SDK Enhancement
- Provide a more native, Kubeflow-centric experience
- Simplify setup and configuration
- Integrate with existing Model Registry SDK functionality
#### Kubeflow SDK Enhancement
- Provide native experiment tracking capabilities within the Kubeflow SDK using MLflow compatibility
- Simplify experiment tracking setup and configuration
- Enable seamless integration with Model Registry for unified metadata management
- Support both local and remote experiment tracking workflows

+1 to the suggestions, except, remove the MLFlow reference

@mprahl (Owner, Author):

Great suggestion. Thank you @kramaranya!

significant disruption for the MLFlow community if they were to change their SDK plugin stance, making such changes
unlikely.

## Design Details

Shall we add a diagram to visualize the proposal design, similar to what we had for the f2f? What do you think?

@mprahl (Owner, Author):

Good suggestion! I'll create one before submitting this to the Kubeflow community.


### SDK Implementation Details

#### MLFlow SDK Model Registry Plugin

Suggested change
#### MLFlow SDK Model Registry Plugin
#### MLFlow Model Registry Plugin

Comment on lines +345 to +369
Enhancements could include:

- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating
Kubernetes tokens.
- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier
for users to visualize and manage their experiments.
- **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to
select one.

Suggested change
Enhancements could include:
- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating
Kubernetes tokens.
- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier
for users to visualize and manage their experiments.
- **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to
select one.
Enhancements could include:
- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating
Kubernetes tokens.
- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier
for users to visualize and manage their experiments.
- **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to
select one.
- **Local Experiments**: Enable users to work with local experiments and selectively publish them to the remote Model Registry backend.

@mprahl (Owner, Author):

Good suggestion! Thank you.


Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,
Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring
the MLFlow SDK Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.

Suggested change
the MLFlow SDK Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.
the MLFlow Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs.

Comment on lines 340 to 341
Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,
Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring

Suggested change
Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native,
Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring
Building on the MLFlow Model Registry plugin, we can enhance the Kubeflow SDK to provide a native,
Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring


- **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts
across all Kubeflow components.
- Providing **MLFlow SDK compatibility**, enabling users to leverage familiar APIs while storing data in Kubeflow.

Maybe rephrase this along the lines of "supporting third-party experiment tracking" and mention MLFlow as a possible target instead of a goal?


### Goals

1. **Kubeflow centralized experiment tracking**: Create an experiment tracking system that is independent from Kubeflow

Maybe merge the first and second goals, and rephrase to emphasize separation of concerns and experiment tracking as a first-class capability in Kubeflow?

1. **Transform Model Registry into a unified metadata store**: Evolve the existing Model Registry component into a
centralized metadata store that can handle experiments, runs, metrics, and artifacts across the entire Kubeflow
ecosystem.
1. **Pipelines integration**: Integrate Kubeflow Pipelines with the centralized experiment tracking system in Model

Rephrase to emphasize refactoring Pipelines to externalize experiment tracking, which removes the dependency (tech debt) from Pipelines. This allows Pipelines to integrate with MR as well as third-party trackers.

1. **Extend SDK capabilities for experiment tracking**: Enhance the Kubeflow SDK to support direct logging of
experiments, runs, metrics, and artifacts without requiring pipeline execution.
1. **Provide MLFlow SDK compatibility**: Create an
[MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the

I am of two minds about explicitly setting this as a goal, since the plugin is standalone. Having the MLFlow plugin as a separate enhancement might make both easier to approve.


As a data scientist working with Kubeflow, I want to log my experiments directly from Jupyter Notebooks without being
forced to use pipelines, so that I can track my iterative model development process naturally. I should be able to use
familiar MLFlow SDK APIs to log metrics, parameters, and artifacts, and then view all my experiments in a unified

Will we not have Kubeflow SDK support? Are we talking only about MLFlow?

@mprahl (Owner, Author):

Good point. I'll split this into two user stories.

@mprahl mprahl force-pushed the experiment-tracking branch 3 times, most recently from 3587c70 to e6dd073 Compare July 28, 2025 17:56
@mprahl mprahl force-pushed the experiment-tracking branch from e6dd073 to 0d0833b Compare July 28, 2025 18:28

@dhirajsb left a comment:

lgtm overall, but imho we don't need to go into the details of how to implement Model Registry resource-level access control right now.
Model Registry already supports RBAC at the service level, which is a good start for most applications. For architectures that want to restrict access to resources within a shared Model Registry instance, we can look at adding support for group-level access control in Model Registry in the future.


The proposal tackles these issues by:

- **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts

Model Registry was designed from the ground up to be a metadata store for AI/ML platforms, so adding support for experiments is just a matter of adding new metadata resource types to the existing API.
Much of this work has already been completed in the Model Registry component through the following issues and their related PRs:
- Add support for Experiment tracking in Model Registry
- Add MLflow SDK support for Model Registry as a store

@mprahl (Owner, Author):

Thanks! I'll add a section in ### Model Registry Expansion to mention the original intent of how Model Registry was designed.

[MLFlow storage plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to continue
using familiar MLFlow APIs while storing experiment data in Kubeflow's centralized tracking system. This provides
seamless integration for existing MLFlow code and positions Kubeflow's SDK as complementary to MLFlow rather than
competitive.

The Model Registry MLflow SDK integration will initially start with a tracking store plugin to support the MLflow experiments API (kubeflow/model-registry#1225).
The intent is to give existing MLflow users the ability to integrate with the Kubeflow platform registry. It is not meant to replace the existing Kubeflow Model Registry SDK, which will have full support for the Kubeflow Model Registry Experiments, Registry, and Model Catalog APIs.

@mprahl (Owner, Author), Jul 28, 2025:

I'll reword it to mention the Model Registry SDK and to specify the tracking store plugin explicitly.

competitive.
1. **Implement multitenancy in Model Registry**: Add multitenancy support to Model Registry to enable isolation at the
Kubernetes namespace level like Kubeflow Pipelines does today. This resolves the multitenancy gap that exists with
MLMD today.

The current Model Registry deployment architecture supports multiple Model Registry instances, each of which can be shared across teams working in multiple namespaces. Tenancy and isolation are achieved through standard Kubernetes RBAC rules that control access to the Model Registry REST API from different namespaces.
However, a Model Registry instance itself is not aware of, and does not enforce, namespace-level isolation. This makes access control and authorization an orthogonal, and therefore flexible, concern.
In other words, Model Registry supports multiple tenants by using an instance per tenant. Applications in a tenant may be distributed across namespaces. A Model Registry instance or service does not differentiate clients by namespace.
If a pipeline wishes to record the namespace where it was executed, it will need to add that to either an existing metadata property or a custom metadata property.

@mprahl (Owner, Author):

I think this is related to the discussion in #1 (comment). So let's consolidate that discussion to that thread.

- Filter by timestamps (before, after)
- Fuzzy searching on entity names

### Multi-tenancy Implementation

Model Registry already supports multiple tenants through a dedicated instance-per-tenant deployment model. Tenants can span multiple namespaces, and access control is enforced through Kubernetes RBAC for the Model Registry service.

@mprahl (Owner, Author):

I personally consider that a workaround rather than supporting multiple tenants and it would be a significant user experience and maintenance burden to require a new Model Registry instance for every isolated Kubeflow namespace. I'd like to try to work together towards a solution on this in this thread: #1 (comment)

would address this gap for all of Kubeflow's metadata needs.

This proposal adds an authorization mode to Model Registry that leverages Kubernetes subject access review. In this
mode, all entities would be mapped to Kubernetes namespaces on the cluster to provide namespace-level isolation. For

We cannot go back to the namespace-scoped limitation of the earlier MLMD architecture. A registered model is not tied to a specific namespace but is shared across all clients within a higher-level tenant scope. Model Registry will implement multi-tenancy, but it will use Kubernetes user groups and roles to do so, which decouples client location (namespace) from resource location.

@mprahl (Owner, Author):

The proposal is to have an authorization mode that is namespaced. So in that mode, a namespace would be required on all entities. When not in that mode, it's not a field that is exposed/accepted by the API.

We could start out by having this mode only apply to the new experiment tracking entities but still have the existing entities be global to the instance.
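
To make the namespaced authorization mode discussed above more concrete, here is a minimal sketch, not the actual Model Registry implementation. It shows the two behaviors described: a namespace is required on entities only when the mode is enabled, and an access check can be expressed as a Kubernetes `SubjectAccessReview` body. The `Experiment` class, the API group `modelregistry.kubeflow.org`, and the resource name `experiments` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Experiment:
    """Hypothetical experiment entity; namespace is only used in namespaced mode."""
    name: str
    namespace: Optional[str] = None


def validate_namespace(entity: Experiment, namespaced_mode: bool) -> None:
    """Require a namespace in namespaced mode and reject it otherwise."""
    if namespaced_mode and not entity.namespace:
        raise ValueError("namespace is required in namespaced authorization mode")
    if not namespaced_mode and entity.namespace is not None:
        raise ValueError("namespace is not accepted outside namespaced authorization mode")


def subject_access_review(user: str, groups: List[str], verb: str, entity: Experiment) -> dict:
    """Build a SubjectAccessReview body asking whether `user` may `verb` the entity."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "groups": groups,
            "resourceAttributes": {
                "namespace": entity.namespace,
                "verb": verb,
                "group": "modelregistry.kubeflow.org",  # hypothetical API group
                "resource": "experiments",              # hypothetical resource name
                "name": entity.name,
            },
        },
    }


exp = Experiment(name="fraud-detection", namespace="team-a")
validate_namespace(exp, namespaced_mode=True)
review = subject_access_review("alice", ["data-scientists"], "get", exp)
```

In a real service the body would be posted to the Kubernetes API server and the request allowed only if `status.allowed` comes back true. Because the check is driven by users, groups, and roles, it could in principle accommodate the group-based model raised in this thread as well.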

@mprahl (Owner, Author) commented Jul 28, 2025

> lgtm overall, but imho we don't need to go into the details of how to implement Model Registry resource-level access control right now. Model Registry already supports RBAC at the service level, which is a good start for most applications. For architectures that want to restrict access to resources within a shared Model Registry instance, we can look at adding support for group-level access control in Model Registry in the future.

@dhirajsb the reason behind needing to solve the multi-tenancy issue in MR in the near future is that some KFP users are blocked from using KFP v2 due to MLMD being required in KFP v2 and it not supporting multi-tenancy. The main way to install KFP is a single instance per cluster with namespace level isolation. If we switch over from MLMD to Model Registry, we just shift the same gap to another service.

Do you have any other suggestions on how to add RBAC to Model Registry?

@mprahl mprahl force-pushed the experiment-tracking branch from 0d0833b to 39d8369 Compare July 28, 2025 20:23
@mprahl mprahl requested a review from dhirajsb July 28, 2025 20:23
@mprahl mprahl force-pushed the experiment-tracking branch 4 times, most recently from 4a75ea2 to 99f1fc4 Compare July 29, 2025 19:37

- **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating
Kubernetes tokens.
- **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier

This seems to suggest that the experiments UI will be part of the model Registry UI - is that correct? Or is the intention for the Experiments UI to be a standalone component?

@mprahl (Owner, Author):

Yes, expanding the MR UI to include experiments was the intent. Thanks for pointing out that this wasn't clear. My latest push adds this clarification.

@mprahl mprahl force-pushed the experiment-tracking branch from 99f1fc4 to 8b33ad9 Compare July 30, 2025 12:54

Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs. This integration should
feel cohesive with the existing Model Registry SDK functionality for registering models.

Enhancements could include:

I'm trying to understand the distinction between what the Kubeflow SDK should provide versus what the Model Registry SDK will provide. My understanding is that the MR SDK will contain the MLflow plugin, plus an updated client SDK to reflect the new REST APIs. The 4 points below would, for me, more naturally live in the Model Registry SDK, as they are all Registry/Experiment-specific features.
I think specifying the scope of features for the Model Registry SDK will let us know what to build upon in the Kubeflow SDK and avoid duplication of effort.

I echo @briangallagher's comment, and I would like to avoid duplication of effort in the Kubeflow SDK.

I understand you want to use the MLFlow plugin for experiment tracking, and that is covered by the great work done by @dhirajsb and @syntaxsdev for the mlflow plugin.

When it comes to RegisteredModel, ModelVersion, ModelArtifact, etc., and the lifecycle of Model Registry user operations, we already have the MR Python SDK, and this can be integrated into the Kubeflow SDK scope.

> I'm trying to understand the distinction between what the Kubeflow SDK should provide versus what the Model Registry SDK will provide. My understanding is that the MR SDK will contain the MLflow plugin, plus an updated client SDK to reflect the new REST APIs.

The MLflow plugin won't be part of the MR SDK per se; it's just a module users can install to integrate with MR using the MLflow SDK.

The MR SDK will have its own APIs for Experimentation as well as the existing RegisteredModel resources. So, I agree that duplicating Experimentation support in a separate Kubeflow SDK doesn't add value. The Kubeflow SDK could instead focus on making it easier to locate MR services and configure an MR SDK client that users could use to access the MR API.

@mprahl (Owner, Author):

@dhirajsb when you say "The MR SDK will have its own APIs for Experimentation", do you mean the OpenAPI-generated client code or custom methods like log_metric?

If it's the OpenAPI generated code, my thought is that the Kubeflow SDK can be the integration point for Model Registry and the Model Registry MLFlow plugin like the POC. We later create bespoke methods like log_metric in the Model Registry SDK and replace those transparently in the Kubeflow SDK if we need more functionality.
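
The idea of the Kubeflow SDK acting as an integration point whose backend can later be swapped transparently could look roughly like the sketch below. This is an illustration under stated assumptions, not the design agreed in this thread: `TrackingBackend`, `InMemoryBackend`, and `ExperimentTracker` are hypothetical names, and the in-memory backend merely stands in for an MLflow-plugin-backed or Model Registry SDK-backed implementation.

```python
from typing import Dict, List, Protocol, Tuple


class TrackingBackend(Protocol):
    """Anything that can record a metric for a run (hypothetical interface)."""

    def log_metric(self, run_id: str, key: str, value: float) -> None: ...


class InMemoryBackend:
    """Stand-in backend; a real one might delegate to the MLflow plugin or MR SDK."""

    def __init__(self) -> None:
        self.metrics: Dict[str, List[Tuple[str, float]]] = {}

    def log_metric(self, run_id: str, key: str, value: float) -> None:
        self.metrics.setdefault(run_id, []).append((key, value))


class ExperimentTracker:
    """Hypothetical Kubeflow SDK facade; callers never see which backend is used."""

    def __init__(self, backend: TrackingBackend) -> None:
        self._backend = backend

    def log_metric(self, run_id: str, key: str, value: float) -> None:
        # Normalize the value once here so every backend receives a float.
        self._backend.log_metric(run_id, key, float(value))


backend = InMemoryBackend()
tracker = ExperimentTracker(backend)
tracker.log_metric("run-1", "accuracy", 0.93)
```

Because user code depends only on the facade, replacing the backend with bespoke Model Registry SDK methods later would not require changes on the caller's side.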

@mprahl mprahl force-pushed the experiment-tracking branch from 8b33ad9 to 336aa5a Compare August 1, 2025 19:11
@mprahl mprahl force-pushed the experiment-tracking branch from 336aa5a to 0eefb9d Compare August 1, 2025 19:12
@mprahl (Owner, Author) commented Aug 1, 2025

Since it's been a week and I believe I've addressed most of the feedback, I opened a PR to upstream to continue the discussion there:
kubeflow#892

9 participants