[EPIC] Runtime/Engine #132

Open
EItanya opened this issue Mar 11, 2025 · 6 comments

@EItanya
Contributor

EItanya commented Mar 11, 2025

Description

Runtime/Engine improvements comprise all of the work that has to be done on the core agent framework.

Autogen laid the groundwork for an expressive, declarative API to configure agents/workflows. However, there are certain additions we will have to make in order for the system to solve all of the use-cases we're envisioning.

Multi-Agent systems

The first release of kagent supports single agents running in order, but more complex systems built up of many agents will be a fast follow.

True Graph Execution

In order to unlock all use-cases, true declarative graph style execution needs to be added to autogen. There is already an issue here upstream.

Workflows

A workflow is a pre-determined set of steps for a set of agents to perform. This workflow can either be a simple linear progression, or a DAG (directed acyclic graph). The system we design should support both types.
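As a rough illustration of the two workflow shapes described above, here is a minimal sketch using Python's standard-library `graphlib`. It is not how kagent or AutoGen implement workflows. The step names (`collect_logs`, `diagnose`, and so on) and the `execution_order` helper are hypothetical. Each key maps a step to the steps it depends on, so a linear progression is just a DAG where every step has at most one predecessor.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(workflow: dict[str, set[str]]) -> list[str]:
    """Return one valid step order; raise ValueError if the graph has a cycle.

    `workflow` maps each step name to the set of steps it depends on.
    (Hypothetical helper, for illustration only.)
    """
    try:
        # static_order() yields every step after all of its dependencies.
        return list(TopologicalSorter(workflow).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"workflow is not a DAG: {exc.args[1]}") from exc


# A simple linear progression: collect_logs -> diagnose -> report.
linear = {"diagnose": {"collect_logs"}, "report": {"diagnose"}}

# A DAG: two independent collection steps feed a single diagnosis step.
dag = {
    "diagnose": {"collect_logs", "collect_metrics"},
    "report": {"diagnose"},
}

if __name__ == "__main__":
    print(execution_order(linear))
    print(execution_order(dag))
```

In the DAG case the two collection steps have no ordering constraint between them, so an engine would be free to run them in parallel. A graph with a cycle is rejected instead of ordered.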

Support for multiple LLM providers

Currently we only support OpenAI as our LLM provider. This was an expedient decision when testing and releasing, but in future this will no longer be required, and testing different scenarios with different models will be prudent.
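One common way to leave room for more than one provider is a small interface plus a registry keyed by provider name. The sketch below is only an assumption about what that could look like. `ChatProvider`, `register_provider`, `get_provider`, and `EchoProvider` are hypothetical names, not kagent or AutoGen APIs, and the stand-in provider makes no network calls.

```python
from typing import Callable, Dict, Protocol


class ChatProvider(Protocol):
    """Hypothetical minimal interface every provider would implement."""

    def complete(self, prompt: str) -> str: ...


# Registry of provider factories, keyed by a short provider name.
_PROVIDERS: Dict[str, Callable[[], ChatProvider]] = {}


def register_provider(name: str, factory: Callable[[], ChatProvider]) -> None:
    """Make a provider available under `name`."""
    _PROVIDERS[name] = factory


def get_provider(name: str) -> ChatProvider:
    """Build the provider registered under `name`, or fail with a clear error."""
    factory = _PROVIDERS.get(name)
    if factory is None:
        known = ", ".join(sorted(_PROVIDERS)) or "none"
        raise ValueError(f"unknown provider {name!r} (registered: {known})")
    return factory()


class EchoProvider:
    """Stand-in provider for tests; a real one would call a model API."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


register_provider("echo", EchoProvider)

if __name__ == "__main__":
    print(get_provider("echo").complete("hello"))
```

With this shape, agent code depends only on `complete`, so running the same scenario against a different model means changing the name passed to `get_provider`.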

@jrbe228

jrbe228 commented Mar 17, 2025

True Graph Execution

In order to unlock all use-cases, true declarative graph style execution needs to be added to autogen. There is already an issue here upstream.

Workflows

A workflow is a pre-determined set of steps for a set of agents to perform. This workflow can either be a simple linear progression, or a DAG (directed acyclic graph). The system we design should support both types.

  1. Is it possible to merge the above categories into a single category "DAGs"? Saying "graphs" opens the door to cyclic graphs, which I believe is out-of-scope for the project. Also the term "workflows" is applied to so many situations... it might be confusing.
  2. Is it possible to use an existing DAG engine? Integration with existing engines may have already been evaluated. I'm new to the Autogen and Kagent roadmaps. In theory Kagent might integrate with a declarative, K8s-native DAG engine.

Support for multiple LLM providers

  1. Looking forward to it! Especially this one.

However, there are certain additions we will have to make in order for the system to solve all of the use-cases we're envisioning.

  1. Like the independent scaling use case! Currently the runtime deploys as 3 containers within 1 pod -

Image

Eventually we would want 1 container per pod (plus any helpful sidecars) for independent scaling and resilience. Then pod-to-pod communication happens via ClusterIP services. I started a PR for that purpose.

@EItanya
Contributor Author

EItanya commented Mar 19, 2025

Hi @jrbe228, thanks so much for your interest in these features! I agree with you about DAGs and we have been in active talks with the AutoGen community to get that work done.

In terms of the model providers, we definitely wanna add Azure ASAP, it's very high on our list. Hopefully we will have an official priority list shortly which we can share.

In terms of splitting up the containers into multiple pods, it's definitely an interesting idea but I'm not sure in this case it's as simple as just splitting it up. I'm not saying that it's not worth doing, but I don't think the added complexity is worthwhile in the short term. Currently the focus is on ease of use, and a single pod can be great for that. Would definitely love to hear feedback from others about what they think, but not something we're looking to do right now.

@jrbe228

jrbe228 commented Mar 20, 2025

Multi-container pods can make sense in some situations. But it would be rare for a UI + Controller + App system. I would be surprised if you can find a similar example in production. Regardless we can wait for community feedback. If you decide to try migration to multiple pods, my hope is the process would be easy. Mostly changing hostnames to target ClusterIP services instead of localhost.
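To make the "changing hostnames" point concrete, here is a minimal sketch in which the target host comes from configuration instead of being hard-coded. The environment variable `KAGENT_APP_HOST`, the service name `kagent-app`, the namespace `kagent`, and the port are all made up for illustration and are not taken from the kagent codebase. The only fixed convention used is the standard Kubernetes service DNS form `<service>.<namespace>.svc.cluster.local`.

```python
import os
from typing import Mapping, Optional


def service_url(
    env_var: str,
    port: int,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a base URL for a peer component.

    Falls back to localhost (the single-pod layout) when `env_var` is unset,
    and uses the configured hostname (for example a ClusterIP service name)
    when it is set. `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    host = env.get(env_var, "localhost")
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # Single pod: containers share a network namespace, so localhost works.
    print(service_url("KAGENT_APP_HOST", 8081, env={}))

    # Separate pods: point the same code at a ClusterIP service instead.
    multi_pod_env = {"KAGENT_APP_HOST": "kagent-app.kagent.svc.cluster.local"}
    print(service_url("KAGENT_APP_HOST", 8081, env=multi_pod_env))
```

Under these assumptions the application code stays the same in both layouts, and only the deployment manifest decides which hostname is injected.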

@EItanya
Contributor Author

EItanya commented Mar 20, 2025

I 100% agree. Mostly in the short term I'm worried about 2 things:

  1. Simplicity
  2. Testing

Multiple containers in a pod is definitely not ideal, but it can be simpler when trying out a new system. Since we aren't prod-ready yet I think that's an ok trade-off.

Our testing is definitely not good enough yet to make me confident in any systemic changes like that. We are hoping to build out that framework soon, but until then we'd prefer to keep the change set smaller.

We definitely appreciate the input though, and I agree that this would be a much better end goal for us.

In fact, would you mind creating a new issue for this specifically? It doesn't feel like it exactly fits here. This issue is really meant to address additions we'd like to make on the Autogen side of the house, not necessarily the deployment details. I think it deserves its own detailed issue/design.

@jrbe228

jrbe228 commented Mar 20, 2025

Those reasons make sense. Basic functionality is more important than scaling for new projects. Early adopters should have a good experience during first contact with Kagent.

New issue created! Future discussion can continue there.

@psschwei

psschwei commented Apr 3, 2025

Is it possible to use an existing DAG engine?

I'm also interested in the answer to this question. The CNCF landscape has a few workflow style tools already, would be great IMO to utilize one of them if possible. Or is there a hard dependency on Autogen for how workflows are implemented?

3 participants