WIP: Propose centralized experiment tracking in Kubeflow #1
| experiment tracking system as part of this proposal. The initial focus is on the SDK and Pipelines experience. | ||
| 1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing | ||
| direct MLFlow UI support is not a targeted goal of this proposal. | ||
| 1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality |
| 1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality | |
| 1. **Automatic experiment tracking from the Kubeflow Trainer**: While users can leverage the new functionality |
| direct MLFlow UI support is not a targeted goal of this proposal. | ||
| 1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality | ||
| through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from | ||
| the Kubeflow Training operator as part of this proposal. |
| the Kubeflow Training operator as part of this proposal. | |
| the Kubeflow Trainer as part of this proposal. The community should revisit integrations at a later date. |
| [kubeflow/model-registry#1224](https://github.com/kubeflow/model-registry/issues/1224#issuecomment-3068005968). This is | ||
| meant to be a high-level design with further refinement being done in a Model Registry specific KEP or design document. | ||
|
|
||
| 1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could map to experiments in |
| 1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could map to experiments in | |
| 1. **Experiments**: A group of related runs. This would replace experiments in Pipelines and could potentially map to experiments in |
* Update teams to match acls repo Signed-off-by: Anish Asthana <[email protected]> * Remove Chase from KOC Signed-off-by: Anish Asthana <[email protected]> --------- Signed-off-by: Anish Asthana <[email protected]>
anishasthana
left a comment
Should we add a note somewhere in here that we expect this to be followed up by KEPs in KFP and MR for sure, and possibly for Kubeflow Developer Experience?
|
|
||
| Model Registry's current focus is managing metadata for the lifecycle of machine learning models, from registration and | ||
| versioning through deployment and serving. The proposal is to expand the scope to cover experiment tracking to be a | ||
| centralized metadata store for Kubeflow. As part of this expansion of scope, we should consider renaming it to reflect |
| centralized metadata store for Kubeflow. As part of this expansion of scope, we should consider renaming it to reflect | |
| centralized metadata store for Kubeflow. As part of this expansion of scope, we could consider renaming it to reflect |
| depending on which Kubeflow component they use (e.g., Pipeline runs, Katib experiments, Training Operator results). | ||
| 1. **No unified tracking**: Users cannot easily compare runs, share insights, or maintain consistent metadata across the | ||
| entire Kubeflow ecosystem. | ||
| 1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD (Machine Learning Metadata) creates technical |
| 1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD (Machine Learning Metadata) creates technical | |
| 1. **Maintenance challenges**: The Kubeflow Pipelines dependency on MLMD creates technical |
| ecosystem. | ||
| 1. **Pipelines integration**: Integrate Kubeflow Pipelines with the centralized experiment tracking system in Model | ||
| Registry, allowing pipeline runs and output artifacts to be automatically logged and tracked. | ||
| 1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD (Machine Learning Metadata) to |
| 1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD (Machine Learning Metadata) to | |
| 1. **Remove the MLMD dependency from Pipelines**: Decouple Kubeflow Pipelines from MLMD to |
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| 1. **Migration Challenges**: The proposal involves migrating away from MLMD (Machine Learning Metadata), which is |
| 1. **Migration Challenges**: The proposal involves migrating away from MLMD (Machine Learning Metadata), which is | |
| 1. **Migration Challenges**: The proposal involves migrating away from MLMD, which is |
| Model Registry's current authorization model gates access at the API level through a proxy rather than at the individual | ||
| resource or namespace level. This approach prevents the same Model Registry instance from being shared across teams that | ||
| require isolation. This limitation conflicts with Kubeflow Pipelines' multiuser mode, which is the default deployment | ||
| strategy for the Kubeflow platform. MLMD (ML Metadata) has the same limitation, so adding multi-tenancy support to Model |
| strategy for the Kubeflow platform. MLMD (ML Metadata) has the same limitation, so adding multi-tenancy support to Model | |
| strategy for the Kubeflow platform. MLMD has the same limitation, so adding multi-tenancy support to Model |
|
|
||
| **Future Enhancements**: In subsequent phases, we may extend the plugin architecture to include: | ||
|
|
||
| - **Tracing Support**: Adding tracing capabilities would require new domain models in Model Registry. |
| - **Tracing Support**: Adding tracing capabilities would require new domain models in Model Registry. | |
| - **Tracing Support**: Adding tracing capabilities akin to [MLflow](https://mlflow.org/docs/latest/genai/tracing/) would require new domain models in Model Registry. |
|
|
||
| #### Kubeflow SDK Implementation | ||
|
|
||
| Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native, |
| Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native, | |
| Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a native |
kramaranya
left a comment
That looks great @mprahl!
| 1. **Extend SDK capabilities for experiment tracking**: Enhance the Kubeflow SDK to support direct logging of | ||
| experiments, runs, metrics, and artifacts without requiring pipeline execution. | ||
| 1. **Provide MLFlow SDK compatibility**: Create an | ||
| [MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the |
It should be "MLflow plugin" instead, since these plugins extend MLflow's backend storage capabilities rather than the SDK itself.
| [MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the | |
| [MLFlow plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the |
| 1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality | ||
| through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from | ||
| the Kubeflow Training operator as part of this proposal. |
To clarify: The MLflow plugin allows saving metadata to the Model Registry backend when using MLflow APIs.
"Kubeflow SDK or MLflow SDK plugin" is confusing, since users could use either the Kubeflow SDK or the MLflow SDK, and the MLflow plugin applies regardless of which SDK they choose. I would keep only "Kubeflow SDK" here to avoid confusing users.
| 1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality | |
| through the Kubeflow SDK or MLFlow SDK plugin, we are not implementing automatic experiment tracking directly from | |
| the Kubeflow Training operator as part of this proposal. | |
| 1. **Automatic experiment tracking from the Kubeflow Training operator**: While users can leverage the new functionality | |
| through the Kubeflow SDK with MLflow plugin, we are not implementing automatic experiment tracking directly from | |
| the Kubeflow Training operator as part of this proposal. |
| 1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing | ||
| direct MLFlow UI support is not a targeted goal of this proposal. |
Why theoretically? The plugin should definitely enable MLflow UI compatibility with Model Registry -- that's the purpose of MLflow plugins. You just need to install the MLflow package and point the UI to the Model Registry backend (with the Model Registry plugin installed), same as you would with other data sources like local files, databases, etc.
| 1. **MLFlow UI support**: Although the MLFlow SDK plugin should theoretically enable MLFlow UI compatibility, providing | |
| direct MLFlow UI support is not a targeted goal of this proposal. | |
| 1. **MLFlow UI support**: Although the MLFlow plugin should theoretically enable MLFlow UI compatibility, providing | |
| direct MLFlow UI support is not a targeted goal of this proposal. |
I think splitting the MLFlow plugin into a separate proposal will help as it removes this kind of issue from the critical path
@kramaranya it was mostly due to some of the comments Dhiraj made about some UI issues he encountered when implementing the plugin but I'll remove the word theoretical.
@etirelli I see your point but I like keeping it together to provide the reader the whole vision and to also ensure the new domain models in the Model Registry map well to MLFlow ones to enable this compatibility. Implementation of the MLFlow plugin can be separate and done later depending on the design/implementation of the Kubeflow SDK working group.
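For readers less familiar with the plugin mechanism discussed in this thread: MLflow chooses a tracking store from the scheme of the tracking URI, and third-party stores register a scheme through the `mlflow.tracking_store` entry-point group. The sketch below imitates that lookup in plain Python. It is not MLflow's code, and the `modelregistry` scheme, the `ModelRegistryStore` class, and the service URL are hypothetical names used only for illustration.

```python
from urllib.parse import urlparse


class ModelRegistryStore:
    """Hypothetical stand-in for a tracking store backed by Model Registry."""

    def __init__(self, store_uri: str) -> None:
        self.store_uri = store_uri


class StoreRegistry:
    """Toy scheme-based store lookup (an illustration, not MLflow's implementation)."""

    def __init__(self) -> None:
        self._builders = {}

    def register(self, scheme: str, builder) -> None:
        # A real plugin declares this mapping through a package entry point.
        self._builders[scheme] = builder

    def get_store(self, store_uri: str):
        # The URI scheme alone decides which store class is instantiated.
        scheme = urlparse(store_uri).scheme
        if scheme not in self._builders:
            raise ValueError(f"no tracking store registered for scheme {scheme!r}")
        return self._builders[scheme](store_uri)


registry = StoreRegistry()
registry.register("modelregistry", ModelRegistryStore)  # hypothetical scheme
store = registry.get_store("modelregistry://model-registry.kubeflow.svc:8080")
```

Because any client that resolves a tracking URI goes through the same kind of lookup, the SDK and the UI would both reach the same backend once the plugin package is installed, which is the point made earlier in this thread.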
|
|
||
| #### MLFlow SDK Compatibility | ||
|
|
||
| - Implement MLFlow SDK plugins to connect to the Model Registry backend |
| - Implement MLFlow SDK plugins to connect to the Model Registry backend | |
| - Implement MLFlow plugins to connect to the Model Registry backend |
Thanks! I'll remove SDK from these occurrences.
| #### Kubeflow SDK Enhancement | ||
|
|
||
| - Provide a more native, Kubeflow-centric experience | ||
| - Simplify setup and configuration | ||
| - Integrate with existing Model Registry SDK functionality |
How does this sound?
| #### Kubeflow SDK Enhancement | |
| - Provide a more native, Kubeflow-centric experience | |
| - Simplify setup and configuration | |
| - Integrate with existing Model Registry SDK functionality | |
| #### Kubeflow SDK Enhancement | |
| - Provide native experiment tracking capabilities within the Kubeflow SDK using MLflow compatibility | |
| - Simplify experiment tracking setup and configuration | |
| - Enable seamless integration with Model Registry for unified metadata management | |
| - Support both local and remote experiment tracking workflows |
+1 to the suggestions, except remove the MLFlow reference.
Great suggestion. Thank you @kramaranya!
| significant disruption for the MLFlow community if they were to change their SDK plugin stance, making such changes | ||
| unlikely. | ||
|
|
||
| ## Design Details |
Shall we add a diagram to visualize the proposal design, similar to what we had for the f2f? What do you think?
Good suggestion! I'll create one before submitting this to the Kubeflow community.
|
|
||
| ### SDK Implementation Details | ||
|
|
||
| #### MLFlow SDK Model Registry Plugin |
| #### MLFlow SDK Model Registry Plugin | |
| #### MLFlow Model Registry Plugin |
| Enhancements could include: | ||
|
|
||
| - **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating | ||
| Kubernetes tokens. | ||
| - **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier | ||
| for users to visualize and manage their experiments. | ||
| - **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to | ||
| select one. |
| Enhancements could include: | |
| - **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating | |
| Kubernetes tokens. | |
| - **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier | |
| for users to visualize and manage their experiments. | |
| - **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to | |
| select one. | |
| Enhancements could include: | |
| - **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating | |
| Kubernetes tokens. | |
| - **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier | |
| for users to visualize and manage their experiments. | |
| - **Discovery Services**: Automatically detect available Model Registry installations on the cluster and allow users to | |
| select one. | |
| - **Local Experiments**: Enable users to work with local experiments and selectively publish them to the remote Model Registry backend. |
Good suggestion! Thank you.
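As a rough illustration of the **Authentication** item in the list above, here is a minimal sketch of a helper that reads the pod's service account token and turns it into request headers. The path is the standard in-cluster token location. The helper name `auth_headers` is hypothetical, and the sketch assumes the Model Registry endpoint (or the proxy in front of it) accepts a Kubernetes bearer token.

```python
from pathlib import Path

# Standard location of the service account token mounted into a pod.
DEFAULT_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def auth_headers(token_path: str = DEFAULT_TOKEN_PATH) -> dict:
    """Build HTTP headers carrying the service account token (hypothetical helper)."""
    token = Path(token_path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"token file {token_path} is empty")
    # Assumption: the service validates standard bearer tokens.
    return {"Authorization": f"Bearer {token}"}
```

Outside a cluster the default path will not exist, so a real helper would likely also need a fallback such as a token taken from the user's kubeconfig.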
|
|
||
| Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native, | ||
| Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring | ||
| the MLFlow SDK Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs. |
| the MLFlow SDK Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs. | |
| the MLFlow Model Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs. |
| Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native, | ||
| Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring |
| Building on the MLFlow SDK Model Registry plugin, we can enhance the Kubeflow SDK to provide a more native, | |
| Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring | |
| Building on the MLFlow Model Registry plugin, we can enhance the Kubeflow SDK to provide a native, | |
| Kubeflow-centric experience. The initial implementation would focus on simplifying setup by automatically configuring |
|
|
||
| - **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts | ||
| across all Kubeflow components. | ||
| - Providing **MLFlow SDK compatibility**, enabling users to leverage familiar APIs while storing data in Kubeflow. |
Maybe rephrase this along the lines of "supporting third-party experiment tracking", and mention MLFlow as a possible target instead of a goal?
|
|
||
| ### Goals | ||
|
|
||
| 1. **Kubeflow centralized experiment tracking**: Create an experiment tracking system that is independent from Kubeflow |
Maybe merge the first and second goals, and rephrase to emphasize separation of concerns and experiment tracking as a first-class capability in Kubeflow?
| 1. **Transform Model Registry into a unified metadata store**: Evolve the existing Model Registry component into a | ||
| centralized metadata store that can handle experiments, runs, metrics, and artifacts across the entire Kubeflow | ||
| ecosystem. | ||
| 1. **Pipelines integration**: Integrate Kubeflow Pipelines with the centralized experiment tracking system in Model |
Rephrase to emphasize refactoring Pipelines to externalize experiment tracking, which removes the dependency (tech debt) from Pipelines. This allows Pipelines to integrate with MR as well as third-party trackers.
| 1. **Extend SDK capabilities for experiment tracking**: Enhance the Kubeflow SDK to support direct logging of | ||
| experiments, runs, metrics, and artifacts without requiring pipeline execution. | ||
| 1. **Provide MLFlow SDK compatibility**: Create an | ||
| [MLFlow SDK plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to leverage the |
I am of two minds about explicitly setting this as a goal, since the plugin is standalone. Having the MLFlow plugin as a separate enhancement might make both easier to approve.
|
|
||
| As a data scientist working with Kubeflow, I want to log my experiments directly from Jupyter Notebooks without being | ||
| forced to use pipelines, so that I can track my iterative model development process naturally. I should be able to use | ||
| familiar MLFlow SDK APIs to log metrics, parameters, and artifacts, and then view all my experiments in a unified |
Will we not have Kubeflow SDK support? Are we talking only about MLFlow?
Good point. I'll split this into two user stories.
3587c70 to
e6dd073
Compare
e6dd073 to
0d0833b
Compare
dhirajsb
left a comment
lgtm overall, but imho we don't need to go into details of how to implement model registry resource level access control right now.
Model Registry already supports RBAC at the service level, which is a good start for most applications. For architectures that want to restrict access to resources within a shared Model Registry instance, we can look at adding support for group-level access control in Model Registry in the future.
|
|
||
| The proposal tackles these issues by: | ||
|
|
||
| - **Expanding Model Registry** into a central experiment tracking store for experiments, runs, metrics, and artifacts |
Model registry was designed from the ground up to be a metadata store for AI/ML platforms. So adding support for experiments is just a matter of adding new metadata resource types to the existing API.
Much of this work has already been completed in the Model Registry component, in the following issues and related PRs:
Add support for Experiment tracking in Model Registry
Add MLflow SDK support for Model Registry as a store
Thanks! I'll add a section in ### Model Registry Expansion to mention the original intent of how Model Registry was designed.
| [MLFlow storage plugin](https://mlflow.org/docs/latest/ml/plugins/#storage-plugins) that allows users to continue | ||
| using familiar MLFlow APIs while storing experiment data in Kubeflow's centralized tracking system. This provides | ||
| seamless integration for existing MLFlow code and positions Kubeflow's SDK as complementary to MLFlow rather than | ||
| competitive. |
The Model Registry MLflow SDK integration will start with a Tracking Store plugin to support the MLflow experiments API.
kubeflow/model-registry#1225
The intent is to allow existing MLflow users the ability to integrate with Kubeflow platform registry. It is not meant to replace the existing Kubeflow Model Registry SDK which will have full support for Kubeflow model registry Experiments, Registry, and Model Catalog APIs.
I'll reword it to mention the Model Registry SDK and to specify the tracking store plugin explicitly.
| competitive. | ||
| 1. **Implement multitenancy in Model Registry**: Add multitenancy support to Model Registry to enable isolation at the | ||
| Kubernetes namespace level like Kubeflow Pipelines does today. This resolves the multitenancy gap that exists with | ||
| MLMD today. |
The current Model Registry deployment architecture supports multiple Model Registry instances, which can be shared across teams working in multiple namespaces. Tenancy and isolation are achieved through standard Kubernetes RBAC rules that control access to the Model Registry REST API from different namespaces.
But a Model Registry instance itself is not aware of, and does not enforce, namespace-level isolation. This makes access control and authorization an orthogonal and therefore flexible concern.
In other words, Model Registry supports multiple tenants by using an instance per tenant. Applications in a tenant may be distributed across namespaces. A Model Registry instance or service does not differentiate clients by namespace.
If a pipeline wishes to record the namespace where it was executed, it will need to add that to either an existing metadata property or use a custom metadata property.
I think this is related to the discussion in #1 (comment). So let's consolidate that discussion to that thread.
| - Filter by timestamps (before, after) | ||
| - Fuzzy searching on entity names | ||
|
|
||
| ### Multi-tenancy Implementation |
Model Registry already supports multiple tenants with a dedicated instance-per-tenant deployment model. Tenants can span multiple namespaces, and access control is enforced through Kubernetes RBAC for the Model Registry service.
I personally consider that a workaround rather than supporting multiple tenants and it would be a significant user experience and maintenance burden to require a new Model Registry instance for every isolated Kubeflow namespace. I'd like to try to work together towards a solution on this in this thread: #1 (comment)
| would address this gap for all of Kubeflow's metadata needs. | ||
|
|
||
| This proposal adds an authorization mode to Model Registry that leverages Kubernetes subject access review. In this | ||
| mode, all entities would be mapped to Kubernetes namespaces on the cluster to provide namespace-level isolation. For |
We cannot go back to the namespace-scoped limitation of the earlier MLMD architecture. A registered model is not tied to a specific namespace, but shared across all clients within a higher-level tenant scope. Model Registry will implement multi-tenancy but use Kubernetes user groups and roles to do so, which decouples client location (namespace) from resource location.
The proposal is to have an authorization mode that is namespaced. So in that mode, a namespace would be required on all entities. When not in that mode, it's not a field that is exposed/accepted by the API.
We could start out by having this mode only apply to the new experiment tracking entities but still have the existing entities be global to the instance.
@dhirajsb the reason behind needing to solve the multi-tenancy issue in MR in the near future is that some KFP users are blocked from using KFP v2 due to MLMD being required in KFP v2 and it not supporting multi-tenancy. The main way to install KFP is a single instance per cluster with namespace level isolation. If we switch over from MLMD to Model Registry, we just shift the same gap to another service. Do you have any other suggestions on how to add RBAC to Model Registry?
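To make the namespaced authorization mode described in this thread more concrete, here is a minimal sketch of the two pieces it implies: validating that a namespace is present only when the mode is enabled, and building the body of a Kubernetes `SubjectAccessReview` for the request. The `apiVersion`, `kind`, and `spec` field names follow the Kubernetes authorization API. The API group `modelregistry.kubeflow.org`, the `experiments` resource name, and the function names are assumptions made for illustration, not part of the proposal.

```python
import json
from typing import Optional


def require_namespace(namespaced_mode: bool, namespace: Optional[str]) -> Optional[str]:
    """Namespace is mandatory in namespaced mode and rejected otherwise."""
    if namespaced_mode and not namespace:
        raise ValueError("namespace is required in namespaced authorization mode")
    if not namespaced_mode and namespace:
        raise ValueError("namespace is not accepted outside namespaced mode")
    return namespace


def subject_access_review(user: str, groups: list, namespace: str,
                          verb: str, resource: str) -> dict:
    """Build a SubjectAccessReview body asking whether `user` may `verb` `resource`."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "groups": groups,
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "group": "modelregistry.kubeflow.org",  # hypothetical API group
                "resource": resource,  # hypothetical resource name
            },
        },
    }


ns = require_namespace(True, "team-a")
review = subject_access_review("alice@example.com", ["team-a-devs"], ns, "get", "experiments")
print(json.dumps(review, indent=2))
```

The server would post this body to the Kubernetes API and allow the request only if the returned `status.allowed` is true. Since the check is keyed on user and groups as well as namespace, the same call shape could also serve a group-based model like the one suggested earlier in the thread.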
0d0833b to
39d8369
Compare
Signed-off-by: Yuki Iwai <[email protected]>
4a75ea2 to
99f1fc4
Compare
|
|
||
| - **Authentication**: Provide helpers for handling authentication with the Model Registry service, such as generating | ||
| Kubernetes tokens. | ||
| - **Integrated UI Access**: Provide direct links to the Model Registry UI from within the Kubeflow SDK, making it easier |
This seems to suggest that the experiments UI will be part of the model Registry UI - is that correct? Or is the intention for the Experiments UI to be a standalone component?
Yes, expanding the MR UI to include experiments was the intent. Thanks for pointing out that this wasn't clear. My latest push adds this clarification.
99f1fc4 to
8b33ad9
Compare
| Registry plugin and creating convenient wrapper functions around the supported MLFlow SDK APIs. This integration should | ||
| feel cohesive with the existing Model Registry SDK functionality for registering models. | ||
|
|
||
| Enhancements could include: |
I'm trying to understand the distinction between what the Kubeflow SDK should provide versus what the Model Registry SDK will provide. My understanding is that the MR SDK will contain the MLflow plugin, plus an updated client SDK to reflect the new REST APIs. The 4 points below would, to me, more naturally live in the Model Registry SDK, as they are all Registry/Experiment-specific features.
I think specifying the scope of features for the Model Registry SDK will allow us to know what to build upon in the Kubeflow SDK. I think this will avoid duplication of effort.
I echo @briangallagher's comment and I would like to avoid duplication of efforts in the Kubeflow SDK.
I understand you want to use the MLFlow plugin for experiment tracking, and that is covered by the great work done by @dhirajsb and @syntaxsdev for the MLflow plugin.
When it comes to RegisteredModel, ModelVersion, ModelArtifact, etc. and the lifecycle of Model Registry user operations, we already have the MR Python SDK, and this can be integrated in the Kubeflow SDK scope.
I'm trying to understand the distinction between what the Kubeflow SDK should provide versus what the Model Registry SDK will provide. My understanding is that the MR SDK will contain the MLflow plugin, plus an updated client SDK to reflect the new REST APIs.
The MLFlow plugin won't be part of the MR SDK per se; it's just a module users can install to integrate with MR using the MLflow SDK.
The MR SDK will have its own APIs for Experimentation as well as the existing RegisteredModel resources. So, I agree that duplicating Experimentation support in a separate Kubeflow SDK doesn't add value. The Kubeflow SDK could instead focus on making it easier to locate MR services and configure an MR SDK client that users could use to access the MR API.
@dhirajsb when you say "The MR SDK will have its own APIs for Experimentation", do you mean the OpenAPI generated client code or custom methods like log_metric?
If it's the OpenAPI generated code, my thought is that the Kubeflow SDK can be the integration point for Model Registry and the Model Registry MLFlow plugin like the POC. We later create bespoke methods like log_metric in the Model Registry SDK and replace those transparently in the Kubeflow SDK if we need more functionality.
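One way to picture the "replace those transparently" idea is a thin facade in the Kubeflow SDK that delegates to whichever backend is configured. The sketch below is only an illustration under that assumption: `ExperimentTracker`, `MetricBackend`, and `InMemoryBackend` are hypothetical names, not existing Kubeflow or Model Registry SDK classes.

```python
from typing import Protocol


class MetricBackend(Protocol):
    """Anything that can record a metric; hypothetical interface."""

    def log_metric(self, key: str, value: float, step: int = 0) -> None: ...


class InMemoryBackend:
    """Test double; a real backend would call MLflow or the Model Registry SDK."""

    def __init__(self) -> None:
        self.records = []

    def log_metric(self, key: str, value: float, step: int = 0) -> None:
        self.records.append((key, value, step))


class ExperimentTracker:
    """Hypothetical Kubeflow SDK facade that hides which backend is in use."""

    def __init__(self, backend: MetricBackend) -> None:
        self._backend = backend

    def log_metric(self, key: str, value: float, step: int = 0) -> None:
        # User code only ever calls the facade, so the backend can be swapped later.
        self._backend.log_metric(key, value, step=step)


backend = InMemoryBackend()
tracker = ExperimentTracker(backend)
tracker.log_metric("accuracy", 0.93, step=1)
```

Since the `mlflow` module itself exposes a `log_metric(key, value, step=...)` function, it could plausibly serve as the first backend, with a bespoke Model Registry SDK method replacing it later without changes to user code.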
8b33ad9 to
336aa5a
Compare
Signed-off-by: mprahl <[email protected]>
336aa5a to
0eefb9d
Compare
Since it's been a week and I believe I've addressed most of the feedback, I opened a PR upstream to continue the discussion there:
Replaced by kubeflow#892