[EPIC] Runtime/Engine #132

EItanya · 2025-03-11T18:01:52Z

Description

The Runtime/Engine epic comprises all of the work that needs to be done on the core agent framework.

Autogen laid the groundwork for an expressive, declarative API to configure agents/workflows. However, there are certain additions we will have to make in order for the system to solve all of the use-cases we're envisioning.

Multi-Agent systems

The first release of kagent supports only single agents, but support for more complex systems built up of many agents will be a fast follow.

True Graph Execution

To unlock all use cases, true declarative graph-style execution needs to be added to AutoGen. There is already an upstream issue for this.

Workflows

A workflow is a predetermined set of steps for a set of agents to perform. A workflow can be either a simple linear progression or a DAG (directed acyclic graph). The system we design should support both types.
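One way to support both shapes with a single mechanism: treat a linear workflow as a DAG in which each step depends only on the previous one, and execute any workflow in topological order. The sketch below is purely illustrative and is not kagent or AutoGen code; the step names, the `deps` mapping format, and the `topological_order` / `run_workflow` helpers are all hypothetical, and each "step" is just a Python callable standing in for an agent invocation. It uses Kahn's algorithm, which also rejects cyclic graphs.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical: a step is a callable standing in for an agent invocation.
# It receives the outputs of the steps that have already run.
Step = Callable[[Dict[str, str]], str]


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return an execution order for a workflow graph (Kahn's algorithm).

    deps maps each step name to the names of the steps it depends on.
    Raises ValueError if a dependency is unknown or the graph has a cycle.
    """
    # Number of unmet dependencies per step, plus reverse edges.
    indegree = {step: len(parents) for step, parents in deps.items()}
    children: Dict[str, List[str]] = {step: [] for step in deps}
    for step, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"unknown dependency {parent!r} for step {step!r}")
            children[parent].append(step)

    # Start with steps that have no dependencies.
    ready = deque(step for step, n in indegree.items() if n == 0)
    order: List[str] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for child in children[step]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Steps left unvisited can only be part of (or behind) a cycle.
    if len(order) != len(deps):
        raise ValueError("workflow graph contains a cycle; only DAGs are supported")
    return order


def run_workflow(steps: Dict[str, Step], deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Run every step once, after all of its dependencies have finished."""
    results: Dict[str, str] = {}
    for name in topological_order(deps):
        # Each step sees the outputs of every step that ran before it.
        results[name] = steps[name](results)
    return results
```

A linear workflow such as `{"plan": [], "act": ["plan"], "report": ["act"]}` runs as `plan`, `act`, `report`, while a fan-in DAG such as `{"fetch_logs": [], "fetch_metrics": [], "diagnose": ["fetch_logs", "fetch_metrics"]}` runs both fetch steps before `diagnose`. A real engine would more likely run independent ready steps concurrently rather than one at a time.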

Support for multiple LLM providers

Currently we only support OpenAI as our LLM provider. This was an expedient decision for testing and the initial release, but in the future this restriction will no longer be necessary, and testing different scenarios with different models will be prudent.


jrbe228 · 2025-03-17T19:46:31Z

True Graph Execution

To unlock all use cases, true declarative graph-style execution needs to be added to AutoGen. There is already an upstream issue for this.

Workflows

A workflow is a predetermined set of steps for a set of agents to perform. A workflow can be either a simple linear progression or a DAG (directed acyclic graph). The system we design should support both types.

Is it possible to merge the above categories into a single category, "DAGs"? Saying "graphs" opens the door to cyclic graphs, which I believe are out of scope for the project. Also, the term "workflows" is applied to so many situations that it might be confusing.
Is it possible to use an existing DAG engine? Integration with existing engines may have already been evaluated. I'm new to the Autogen and Kagent roadmaps. In theory Kagent might integrate with a declarative, K8s-native DAG engine.

Support for multiple LLM providers

Looking forward to it! Especially this one.

However, there are certain additions we will have to make in order for the system to solve all of the use-cases we're envisioning.

Like the independent scaling use case! Currently the runtime deploys as 3 containers within 1 pod.

Eventually we would want 1 container per pod (plus any helpful sidecars) for independent scaling and resilience. Then pod-to-pod communication happens via ClusterIP services. I started a PR for that purpose.

EItanya · 2025-03-19T22:38:36Z

Hi @jrbe228, thanks so much for your interest in these features! I agree with you about DAGs, and we have been in active talks with the AutoGen community to get that work done.

In terms of the model providers, we definitely want to add Azure ASAP; it's very high on our list. Hopefully we will have an official priority list to share shortly.

In terms of splitting up the containers into multiple pods, it's definitely an interesting idea, but I'm not sure that in this case it's as simple as just splitting them up. I'm not saying it's not worth doing, but I don't think the added complexity is worthwhile in the short term. Currently the focus is on ease of use, and a single pod is great for that. We would definitely love to hear what others think, but it's not something we're looking to do right now.

jrbe228 · 2025-03-20T12:38:19Z

Multi-container pods can make sense in some situations, but they would be rare for a UI + Controller + App system. I would be surprised if you could find a similar example in production. Regardless, we can wait for community feedback. If you decide to try migrating to multiple pods, my hope is that the process would be easy: it should mostly mean changing hostnames to target ClusterIP services instead of localhost.
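The hostname change described above can be sketched as a single helper that picks the address one component uses to reach another. This is a hypothetical illustration, not code from kagent: the `backend_url` name, the example service name `kagent-app`, the port, and the convention that `namespace=None` means "same pod" are all assumptions. Containers in one pod share a network namespace, so they can reach each other on `localhost`; across pods, a ClusterIP Service is reachable through cluster DNS as `<service>.<namespace>.svc.cluster.local`, assuming the default `cluster.local` cluster domain.

```python
from typing import Optional


def backend_url(service: str, port: int, namespace: Optional[str] = None) -> str:
    """Build the base URL used to reach another kagent component.

    Hypothetical helper: namespace=None means the target container runs in
    the same pod; otherwise the target is assumed to sit behind a ClusterIP
    Service in that namespace.
    """
    if namespace is None:
        # Same pod: containers share a network namespace, so localhost works.
        return f"http://localhost:{port}"
    # Separate pods: Kubernetes Service DNS name, assuming the default
    # "cluster.local" cluster domain.
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"
```

With this approach, moving from one pod to several would mostly mean supplying a namespace (for example from configuration) rather than changing call sites.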

EItanya · 2025-03-20T12:58:55Z

I 100% agree. Mostly in the short term I'm worried about 2 things:

Simplicity
Testing

Multiple containers in a pod is definitely not ideal, but it can be simpler when trying out a new system. Since we aren't prod-ready yet I think that's an ok trade-off.

Our testing is definitely not good enough yet to make me confident in any systemic changes like that. We are hoping to build out that framework soon, but until then we'd prefer to keep the change set smaller.

We definitely appreciate the input though, and I agree that this would be a much better end goal for us.

In fact, would you mind creating a new issue specifically for this? It doesn't feel like it exactly fits here; this issue is really meant to address additions we'd like to make on the AutoGen side of the house, not necessarily the deployment details. I think it deserves its own detailed issue/design.

jrbe228 · 2025-03-20T15:36:57Z

Those reasons make sense. Basic functionality is more important than scaling for new projects. Early adopters should have a good experience during first contact with Kagent.

New issue created! Future discussion can continue there.

psschwei · 2025-04-03T17:25:15Z

Is it possible to use an existing DAG engine?

I'm also interested in the answer to this question. The CNCF landscape already has a few workflow-style tools, and IMO it would be great to use one of them if possible. Or is there a hard dependency on AutoGen for how workflows are implemented?

EItanya mentioned this issue on Mar 19, 2025 in:

Use independent pods / deployments for each container service #180 (Open)

This issue was referenced on Mar 20, 2025 by:

Add support for AWS Bedrock and Azure Open AI #174 (Closed)
Use independent pods / deployments for each container service #184 (Open)
