[PROPOSAL] Communication structure #474

JanPokorny · 2025-03-27T15:01:21Z

JanPokorny
Mar 27, 2025
Maintainer

Communication structure

Motivation

Communication with a modern LLM agent is quite complex. While the most basic use case is "send request, receive reply", we quickly run into more complex scenarios, such as:

continuing the conversation after agent answer
agent asking for clarification before starting
agent asking for more info during execution (e.g. function calls / interrupts)
human in the loop (e.g. ask before running tool)
agent notifying of partial results (e.g. LLM streaming, tool calling)
long-running agents that can send a message to the caller anytime

This proposal aims to provide a general framework for communication with agents, without imposing rigid structure on how the agents are supposed to work.

Proposal

⚠️ For this proposal, we build on another proposal for message body format and schema, which defines MessageBody and MessageBodySchema. However, this proposal is independent of it: it only requires some message body format and schema, be it the proposed one or, e.g., JSON Schema. To keep the examples specific, we use the proposed one.

Data structures

We define these data structures:

"Message" - defined by originating party, type, and body.
"Message schema" - defined by originating party, type, body schema, and the name of the next state.
"Run" - a sequence of messages accompanied by a communication schema.
"Communication schema" - a finite state machine definition. Together with the messages sent so far, it determines the current state, which in turn determines which messages are valid continuations of the run. All runs start in the idle state, awaiting a user message.

Note that these data structures are "virtual" / "abstract". They do not correspond directly to any physical communication channel, since channels always come with limitations and protocol-related details such as refused messages and timeouts. A "Run" can be thought of as a shared/distributed data structure that is kept in sync between the two parties over some communication channel (which is not defined in this proposal).

This is how we define the types:

interface Message {
    party: 'client' | 'server';
    type: string;
    body: MessageBody;
}

interface Run {
    id: string;
    messages: Message[];
    currentState: string;
}

interface MessageSchema {
    party: 'client' | 'server';
    type: string;
    schema: MessageBodySchema;
    nextState: string;
}

type CommunicationSchema = { [state: string]: MessageSchema[] }

High-level overview

The client and server (caller and agent) both know the CommunicationSchema. Each Run starts with no messages and in the idle state. The idle entry of the CommunicationSchema defines which messages are valid in that state, i.e. which messages can start the conversation.

A simple agent may just switch between idle and running states, but more complex agents may have many more states, such as done, waiting_for_user, waiting_for_function_call, etc. The idea is that at each point in the conversation only some messages are valid. For example, while waiting for "human in the loop" confirmation, the only valid message is one that confirms or denies the request, and not e.g. another query for the agent.
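
To make the overview concrete, here is a minimal sketch of how a run could advance through a communication schema. The helpers createRun and applyMessage, the confirmSchema object, and the message types confirmation_request and confirmation_reply are all hypothetical names chosen for illustration, and MessageBody / MessageBodySchema are stubbed as unknown because they belong to the separate message-body proposal. Body validation is skipped entirely.

```typescript
// Placeholders: the real types come from the separate message-body proposal.
type MessageBody = unknown;
type MessageBodySchema = unknown;
type Party = 'client' | 'server';

interface Message { party: Party; type: string; body: MessageBody; }
interface Run { id: string; messages: Message[]; currentState: string; }
interface MessageSchema { party: Party; type: string; schema: MessageBodySchema; nextState: string; }
type CommunicationSchema = { [state: string]: MessageSchema[] };

// Hypothetical schema: the agent asks for confirmation before running a tool.
const confirmSchema: CommunicationSchema = {
    idle: [
        { party: 'client', type: 'user_message', schema: [], nextState: 'running' },
    ],
    running: [
        { party: 'server', type: 'agent_message', schema: [], nextState: 'idle' },
        { party: 'server', type: 'confirmation_request', schema: [], nextState: 'waiting_for_user' },
    ],
    waiting_for_user: [
        { party: 'client', type: 'confirmation_reply', schema: [], nextState: 'running' },
    ],
};

// Every run starts with no messages, in the idle state.
function createRun(id: string): Run {
    return { id, messages: [], currentState: 'idle' };
}

// Returns a new run extended by `message`, or throws if the message is not
// listed for the run's current state. The message body is not validated here.
function applyMessage(schema: CommunicationSchema, run: Run, message: Message): Run {
    for (const candidate of schema[run.currentState] ?? []) {
        if (candidate.party === message.party && candidate.type === message.type) {
            return {
                ...run,
                messages: [...run.messages, message],
                currentState: candidate.nextState,
            };
        }
    }
    throw new Error(
        `'${message.type}' from ${message.party} is not valid in state '${run.currentState}'`,
    );
}

let hitlRun = createRun('run-1');
hitlRun = applyMessage(confirmSchema, hitlRun, {
    party: 'client', type: 'user_message', body: 'Clean up the temp files',
});
hitlRun = applyMessage(confirmSchema, hitlRun, {
    party: 'server', type: 'confirmation_request', body: 'May I run the cleanup tool?',
});
// The run is now in waiting_for_user: only confirmation_reply is accepted,
// and another user_message would make applyMessage throw.
```

Returning a new Run instead of mutating the old one is only a design choice for this sketch; it keeps every intermediate state of the shared structure easy to compare.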

Examples

Chat agent

This is a communication schema for a simple chat agent. It has two states: idle and running. In the idle state, the agent can receive a message from the user. In the running state, the agent generates a response and sends it back.

{
    "idle": [
        {
            "party": "client",
            "type": "user_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "running"
        }
    ],
    "running": [
        {
            "party": "server",
            "type": "agent_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "idle"
        }
    ]
}
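
One thing the chat schema encodes implicitly is whose turn it is. As a rough illustration, the helper below (partiesAllowed, a hypothetical name) lists which parties may send a message in a given state. The body schema type is stubbed as unknown, since it comes from the separate message-body proposal.

```typescript
type Party = 'client' | 'server';
interface MessageSchema { party: Party; type: string; schema: unknown; nextState: string; }
type CommunicationSchema = { [state: string]: MessageSchema[] };

// The chat agent schema from above, with the body schema copied as plain data.
const textBody = [{ contentType: 'text/plain', required: true }];
const chatSchema: CommunicationSchema = {
    idle: [
        { party: 'client', type: 'user_message', schema: textBody, nextState: 'running' },
    ],
    running: [
        { party: 'server', type: 'agent_message', schema: textBody, nextState: 'idle' },
    ],
};

// Hypothetical helper: which parties may send a message in `state`?
// Unknown states yield an empty list.
function partiesAllowed(schema: CommunicationSchema, state: string): Party[] {
    const parties: Party[] = [];
    for (const candidate of schema[state] ?? []) {
        if (parties.indexOf(candidate.party) === -1) {
            parties.push(candidate.party);
        }
    }
    return parties;
}

const idleTurn = partiesAllowed(chatSchema, 'idle');       // ['client']
const runningTurn = partiesAllowed(chatSchema, 'running'); // ['server']
```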

Researcher agent

The researcher agent does not support long-lasting conversations, so after providing the reply it transitions to a done state in which no further messages are valid.

{
    "idle": [
        {
            "party": "client",
            "type": "user_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "running"
        }
    ],
    "running": [
        {
            "party": "server",
            "type": "agent_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                },
                {
                    "name": "/sources/*",
                    "contentType": "text/x-uri",
                    "required": false
                }
            ],
            "nextState": "done"
        }
    ],
    "done": []
}
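
A state with an empty message list, like done here, ends the run. A caller could detect this with a small check such as the following sketch; isTerminal is a hypothetical helper name, and the body schema type is stubbed as unknown.

```typescript
interface MessageSchema {
    party: 'client' | 'server';
    type: string;
    schema: unknown; // stand-in for MessageBodySchema
    nextState: string;
}
type CommunicationSchema = { [state: string]: MessageSchema[] };

// The researcher agent schema from above, copied as plain data.
const researcherSchema: CommunicationSchema = {
    idle: [
        {
            party: 'client',
            type: 'user_message',
            schema: [{ contentType: 'text/plain', required: true }],
            nextState: 'running',
        },
    ],
    running: [
        {
            party: 'server',
            type: 'agent_message',
            schema: [
                { contentType: 'text/plain', required: true },
                { name: '/sources/*', contentType: 'text/x-uri', required: false },
            ],
            nextState: 'done',
        },
    ],
    done: [],
};

// Hypothetical helper: a state is terminal when no message is valid in it.
// Throws for states the schema does not define, so typos are not mistaken
// for finished runs.
function isTerminal(schema: CommunicationSchema, state: string): boolean {
    if (!Object.prototype.hasOwnProperty.call(schema, state)) {
        throw new Error(`Unknown state '${state}'`);
    }
    return (schema[state] ?? []).length === 0;
}

const doneIsTerminal = isTerminal(researcherSchema, 'done');       // true
const runningIsTerminal = isTerminal(researcherSchema, 'running'); // false
```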

Function-calling agent

Function-calling means executing code on the client. This agent acts like a chat agent, but has a special state for function calls, in which the agent requests the client to execute a function with the given arguments. The client must respond with the result of the function call in order for the agent to continue.

{
    "idle": [
        {
            "party": "client",
            "type": "user_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "running"
        }
    ],
    "running": [
        {
            "party": "server",
            "type": "agent_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "idle"
        },
        {
            "party": "server",
            "type": "function_call",
            "schema": [
                {
                    "name": "/function",
                    "contentType": "application/json",
                    "required": true
                }
            ],
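            "nextState": "waiting_for_function_call"
        }
    ],
    "waiting_for_function_call": [
        {
            "party": "client",
            "type": "function_result",
            "schema": [
                {
                    "name": "/result",
                    "contentType": "application/json",
                    "required": true
                }
            ],
            "nextState": "running"
        }
    ]
}

Here waiting_for_function_call is the special state for function calls: the only valid message in it is the client's function_result, which returns the run to running. The function_result type and the /result part name are illustrative.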

Interruptible chat agent

This agent can be interrupted by the user while generating a message.

{
    "idle": [
        {
            "party": "client",
            "type": "user_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "running"
        }
    ],
    "running": [
        {
            "party": "server",
            "type": "agent_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "idle"
        },
        {
            "party": "client",
            "type": "cancel",
            "schema": [],
            "nextState": "idle"
        }
    ]
}

Long-running agent

A long-running agent still needs to be started up by a client message, but then it can run indefinitely and report information about its progress.

{
    "idle": [
        {
            "party": "client",
            "type": "user_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "running"
        }
    ],
    "running": [
        {
            "party": "server",
            "type": "agent_message",
            "schema": [
                {
                    "contentType": "text/plain",
                    "required": true
                }
            ],
            "nextState": "running"
        }
    ]
}
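
Note that this schema never leads back to idle. Properties like this can be read off a schema mechanically. The sketch below shows two such checks under assumed helper names (reachableStates and danglingStates are hypothetical): which states can be reached from a given state, and whether any nextState points to a state the schema does not define.

```typescript
interface MessageSchema {
    party: 'client' | 'server';
    type: string;
    schema: unknown; // stand-in for MessageBodySchema
    nextState: string;
}
type CommunicationSchema = { [state: string]: MessageSchema[] };

// The long-running agent schema from above, copied as plain data.
const textPart = [{ contentType: 'text/plain', required: true }];
const longRunningSchema: CommunicationSchema = {
    idle: [
        { party: 'client', type: 'user_message', schema: textPart, nextState: 'running' },
    ],
    running: [
        { party: 'server', type: 'agent_message', schema: textPart, nextState: 'running' },
    ],
};

// Breadth-first walk over nextState transitions; the result includes `start`.
function reachableStates(schema: CommunicationSchema, start: string): string[] {
    const seen: string[] = [start];
    const queue: string[] = [start];
    while (queue.length > 0) {
        const state = queue.shift() as string;
        for (const candidate of schema[state] ?? []) {
            if (seen.indexOf(candidate.nextState) === -1) {
                seen.push(candidate.nextState);
                queue.push(candidate.nextState);
            }
        }
    }
    return seen;
}

// Lists every nextState that is not itself a key of the schema.
function danglingStates(schema: CommunicationSchema): string[] {
    const missing: string[] = [];
    for (const state of Object.keys(schema)) {
        for (const candidate of schema[state] ?? []) {
            const defined = Object.prototype.hasOwnProperty.call(schema, candidate.nextState);
            if (!defined && missing.indexOf(candidate.nextState) === -1) {
                missing.push(candidate.nextState);
            }
        }
    }
    return missing;
}

const fromIdle = reachableStates(longRunningSchema, 'idle');       // ['idle', 'running']
const fromRunning = reachableStates(longRunningSchema, 'running'); // ['running']
const dangling = danglingStates(longRunningSchema);                // []
```

Since idle is not reachable from running, a client of this agent can tell from the schema alone that it will never be asked for another user_message within the same run.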

SDK support

Given that we have defined a shared / distributed data structure, the problem of communication reduces to keeping that data structure synchronized. The SDK would support this over several protocols (HTTP SSE, WebSocket, etc.), so that the caller can pick the best protocol for the situation while the agent does not have to implement all of them. From the agent implementer's point of view, the SDK does the heavy lifting and hands the agent the run's messages and state directly.
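
To give a feel for what "the agent is handed the run" could mean, here is a purely illustrative, in-memory sketch with no transport at all. Nothing in it is an actual SDK API: AgentStep, handleClientMessage, serverMayAct, and the maxSteps guard are assumptions made up for this example, and the body types are stubbed as unknown.

```typescript
type Party = 'client' | 'server';
interface Message { party: Party; type: string; body: unknown; }
interface Run { id: string; messages: Message[]; currentState: string; }
interface MessageSchema { party: Party; type: string; schema: unknown; nextState: string; }
type CommunicationSchema = { [state: string]: MessageSchema[] };

// Hypothetical agent callback: receives the whole run, returns its next message.
type AgentStep = (run: Run) => Message;

// Appends `message` if the schema allows it in the current state, else throws.
function applyMessage(schema: CommunicationSchema, run: Run, message: Message): Run {
    for (const candidate of schema[run.currentState] ?? []) {
        if (candidate.party === message.party && candidate.type === message.type) {
            const messages = [...run.messages, message];
            return { ...run, messages, currentState: candidate.nextState };
        }
    }
    throw new Error(`'${message.type}' is not valid in state '${run.currentState}'`);
}

// True if the schema lists at least one server message for `state`.
function serverMayAct(schema: CommunicationSchema, state: string): boolean {
    for (const candidate of schema[state] ?? []) {
        if (candidate.party === 'server') return true;
    }
    return false;
}

// Hypothetical SDK-side loop: accept the client's message, then keep calling
// the agent while a server message is valid. maxSteps bounds agents whose
// schema lets them send messages indefinitely.
function handleClientMessage(
    schema: CommunicationSchema,
    run: Run,
    incoming: Message,
    agent: AgentStep,
    maxSteps: number = 10,
): Run {
    let current = applyMessage(schema, run, incoming);
    for (let i = 0; i < maxSteps && serverMayAct(schema, current.currentState); i++) {
        current = applyMessage(schema, current, agent(current));
    }
    return current;
}

// Example agent for the chat schema: echoes the last message back.
const echoAgent: AgentStep = (run) => {
    const last = run.messages[run.messages.length - 1];
    const text = last === undefined ? '' : String(last.body);
    return { party: 'server', type: 'agent_message', body: `echo: ${text}` };
};

const sdkChatSchema: CommunicationSchema = {
    idle: [{ party: 'client', type: 'user_message', schema: [], nextState: 'running' }],
    running: [{ party: 'server', type: 'agent_message', schema: [], nextState: 'idle' }],
};

const finishedRun = handleClientMessage(
    sdkChatSchema,
    { id: 'run-1', messages: [], currentState: 'idle' },
    { party: 'client', type: 'user_message', body: 'hi' },
    echoAgent,
);
// finishedRun is back in idle and holds the user message plus the echo reply.
```

A real SDK would replace the direct function call with synchronization over a channel; the point here is only that the agent code never touches the protocol.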

Let's discuss

To reemphasize, this proposal only provides a general communication framework over a shared "run" data structure, not a way to map messages to actual communication channels like HTTP / WebSocket etc.

Is this structure general enough? Or is it perhaps too general? Can you think of agents that would be awkward / impossible to implement with this?
Is the concept of conversation having a "state" (in the finite state machine sense) understandable?
Is this a good way to model agent interactions?
