Ability to discover all running models #130
Thank you for this comment @fbricon! Per the original proposal document for Model Registry, this is accounted for, especially for audit purposes. We have the ServingEnvironment and InferenceService entities in the OpenAPI spec, mapped for this scope. I realize now that, ultimately, the "valid endpoints" source of truth depends on the serving runtime used.
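For illustration, fetching those InferenceService records over the REST API could look roughly like the sketch below; the service host and the `v1alpha3` path prefix are assumptions to verify against the published OpenAPI spec:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Hypothetical in-cluster service address; check the version prefix
	// against the Model Registry OpenAPI specification.
	url := "http://model-registry-service:8080/api/model_registry/v1alpha3/inference_services"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The response is a paged list of InferenceService records, each linking
	// a ServingEnvironment to a RegisteredModel/ModelVersion pair.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}
```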
For context:
```mermaid
sequenceDiagram
    actor U as UI Dashboard
    participant K as Kubernetes
    participant MC as ODH Model Controller
    participant MR as Model Registry
    U->>+MR: Retrieve indexed model version
    MR-->>-U: Indexed model version
    U->>K: Create InferenceService (ISVC)
    Note right of U: Annotate/Label the ISVC with indexed<br/>model information, like RegisteredModel and<br/>ModelVersion IDs.
    Note right of K: Here all operators/controllers in charge of deploying<br/>the model take their actions, e.g., KServe or ModelMesh.
    loop Every ISVC creation/deletion/update
        K-->>+MC: Send notification
        MC->>+K: Retrieve affected ISVC in the cluster
        K-->>-MC: ISVC resource
        MC->>+MR: Create/Update InferenceService in Model Registry
        Note left of MR: InferenceService records in Model Registry<br/>are used to keep track of every deployment that<br/>occurred in the monitored Kubernetes cluster.
        MR-->>-MC: InferenceService record
        MC-->>-K: Update ISVC with Model Registry record ID
    end
```
That could be a very useful example: we could create a custom folder containing a bare-minimal controller that implements only that logic, along the lines of the sketch below. Given that controllers/mr_inferenceservice_controller is already pretty much isolated, I am not expecting too much effort.
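For illustration, a minimal sketch of that isolated reconcile logic with controller-runtime; the label/annotation keys and the MRClient interface are assumptions for illustration, not the actual odh-model-controller code:

```go
package controllers

import (
	"context"
	"fmt"

	kservev1beta1 "github.com/kserve/kserve/pkg/apis/serving/v1beta1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Hypothetical label/annotation keys carrying the indexed-model information
// set on the ISVC at creation time (see the sequence diagram above); the
// real keys used in odh-model-controller may differ.
const (
	labelRegisteredModelID = "modelregistry.example.io/registered-model-id"
	labelModelVersionID    = "modelregistry.example.io/model-version-id"
	annotationMRRecordID   = "modelregistry.example.io/inference-service-id"
)

// MRClient is a hypothetical thin wrapper over the Model Registry REST API.
type MRClient interface {
	UpsertInferenceService(ctx context.Context, registeredModelID, modelVersionID, namespace, name string) (recordID string, err error)
}

// ISVCReconciler keeps Model Registry's InferenceService records in sync
// with the ISVCs observed in the cluster.
type ISVCReconciler struct {
	client.Client
	MR MRClient
}

func (r *ISVCReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var isvc kservev1beta1.InferenceService
	if err := r.Get(ctx, req.NamespacedName, &isvc); err != nil {
		// Deletions could also be recorded in Model Registry here.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	rmID, hasRM := isvc.Labels[labelRegisteredModelID]
	mvID, hasMV := isvc.Labels[labelModelVersionID]
	if !hasRM || !hasMV {
		// Not an indexed model deployment: nothing to track.
		return ctrl.Result{}, nil
	}

	// Create or update the corresponding InferenceService record.
	recordID, err := r.MR.UpsertInferenceService(ctx, rmID, mvID, isvc.Namespace, isvc.Name)
	if err != nil {
		return ctrl.Result{}, fmt.Errorf("upserting Model Registry record: %w", err)
	}

	// Write the record ID back onto the ISVC for traceability.
	patched := isvc.DeepCopy()
	if patched.Annotations == nil {
		patched.Annotations = map[string]string{}
	}
	patched.Annotations[annotationMRRecordID] = recordID
	if err := r.Patch(ctx, patched, client.MergeFrom(&isvc)); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```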
A similar requirement came in for a possible integration with Backstage. I am not sure I understood the proposal above: is there a way to solve this for the Kubeflow offering without an operator? Should we deploy yet another container alongside the REST server for this? I would typically like to see something working OOB rather than something the user has to configure explicitly.
Thanks for looping this in @rareddy. The requirements have been emerging more naturally and clearly recently; here is what I captured so far:
Beyond these general requirements,
Given the architecture proposal advanced by @ederign in the KF community meeting on 2024-08-06 (mailing-list post), my recommendation, in conclusion, is to tackle these capabilities in the Model Registry BFF, as that would be the most natural fit considering all the most recent directions. BTW @ederign, assuming this, what would be the best way to formalize this BFF functionality/requirement, please?
@tarilabs, you are right. Having multiple clients consume our APIs is precisely one reason we designed the BFF. Having VS Code and Backstage consuming our BFF would be awesome.

@tarilabs, currently we are planning to 'talk' with Kubernetes only to fetch the MR endpoint. After getting the MR endpoint, the BFF will make REST calls to the Model Registry REST API for all the operations and data currently needed in the MR Web UI. I want to double-check whether the requirements you described can be fulfilled by the Model Registry REST API, or whether the BFF would be required to 'talk' with another Kubeflow project (KServe, perhaps) to provide all the data needed.

If the MR REST API can provide all the data needed, a good starting point for our discussion would be understanding the endpoints and JSON schema needed for Backstage and VS Code. Then we can check whether there is an overlap with the APIs we are currently planning for the Web UI, or whether we need a new endpoint. I'm happy to implement those in the community.

If the Model Registry REST API cannot fulfill those requirements and the BFF is required to 'talk' with other Kubeflow projects, I suggest we hold a design session to discuss the implications for our architecture (orthogonal use cases).

Either way, I'm working towards a PR to add an OpenAPI + Swagger definition for the current APIs. I'll send something this week!
I just want to clarify that I did not imply "talking to other projects", but rather something like the above. This is required for the R2 flow, and further to support a user story where a presently running model gets catalogued/indexed in Model Registry. The rest sounds aligned to me, and I'm happy to discuss live anytime!
I just had a quick call with @tarilabs, and we agree that the BFF is the best option for this use case. So what we need to move forward is:
@tarilabs I thought MR created InferenceService entities, and that with the above reconciling we are collating the deployment info, which could then be exposed directly through the MR REST API. Since we are going to write a reconciler for the StorageInitializer, why not just use that? I understand the BFF proposition, but I am thinking about how external access from Backstage components would need to deal with two different endpoints, security, etc.
To baseline the discussion:
With the above premised:
But trying to walk in those shoes anyway: even if we exploit the audit-oriented logical model entries for the fresh-snapshot purpose, it won't solve the requirement of knowing about deployed models which are not indexed/catalogued in Model Registry. For these reasons, I believe the BFF approach, as I mentioned in #130 (comment), is the most appropriate. To me, we need:
I'm not sure I understood this comment.
This is a matter of the BFF's deployment model, and if it becomes "an issue", to me this would be a blocker well beyond the Backstage integration, worth being resolved fully. Hope these are relevant comments for consideration, and I hope putting them in writing was helpful, but I expect this is a conversation that would also be easier to have in the meetings!
I'm not sure I understand who will be responsible for querying KServe's ISVCs (kubectl get isvc). Will it be the model-registry, under the hood? Users? If the latter, my understanding (from discussions with @guimou) is that those resources will most likely live in namespaces that are not available to regular users.
@fbricon this discussion is indeed about avoiding having to ask users to run kubectl get isvc. As the comments show, it is about the implementation choices for how to do this within the Model Registry scope, between what was recently presented (the BFF) and the previously available reconcile loop (intended for auditing). Hope this clarifies.
@fbricon @rareddy, in short, the gist of what we are discussing is the BFF becoming the API for such services:

{VS Code/Backstage} => REST call => BFF (abstracts, coordinates, and formats data) => {K8s resources | Model Registry APIs}

For sure, we are going to need to discuss security and other implications, but first we need to agree on whether the BFF will be the 'API' for those external integrations.
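For illustration, a minimal sketch of what such a BFF endpoint could look like; the route path, the response fields, and the listDeployedModels stub are all assumptions, not a committed design:

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"net/http"
)

// DeployedModel is a hypothetical response item; the actual JSON schema is
// exactly what this thread proposes to agree on with Backstage/VS Code.
type DeployedModel struct {
	Label  string `json:"label"`
	Type   string `json:"type"`
	APIURL string `json:"apiUrl"`
}

// listDeployedModels would, in a real BFF, coordinate calls to the K8s API
// (ISVCs) and/or the Model Registry REST API, then format the results.
func listDeployedModels(ctx context.Context) ([]DeployedModel, error) {
	return []DeployedModel{
		{Label: "granite-7b", Type: "openai-compatible", APIURL: "https://llm.example.cluster/v1"},
	}, nil
}

func main() {
	mux := http.NewServeMux()
	// Hypothetical route: one stable endpoint for external integrations.
	mux.HandleFunc("/api/v1/deployed-models", func(w http.ResponseWriter, r *http.Request) {
		models, err := listDeployedModels(r.Context())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(models)
	})
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```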
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
#130 (comment): this is being worked on as part of the InferenceService reconciler and related tasks, most recently:
Is your feature request related to a problem? Please describe.
From a tooling standpoint, we need the ability to discover all running LLM endpoints, so we can pick one and use it as an AI assistant in an IDE (using the continue.dev extension in VS Code/IntelliJ, for instance).
Describe the solution you'd like
The model endpoints should be listed with at least their label, type, and API URL, e.g.:
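For illustration only, a hypothetical sketch of such a listing, here as a small Go program; the field names are assumptions, not an agreed schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelEndpoint mirrors the "label, type, api url" request above;
// all field names here are illustrative assumptions.
type ModelEndpoint struct {
	Label  string `json:"label"`
	Type   string `json:"type"`
	APIURL string `json:"apiUrl"`
}

func main() {
	// Hypothetical discovery result a tool like continue.dev could consume.
	endpoints := []ModelEndpoint{
		{Label: "mistral-7b-instruct", Type: "openai-compatible", APIURL: "https://llm.example.cluster/v1"},
	}
	out, _ := json.MarshalIndent(endpoints, "", "  ")
	fmt.Println(string(out))
}
```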
Describe alternatives you've considered
AFAIK, there's no other way to discover running inference engines at the moment.
cc @amfred