[PROPOSAL] Communication structure #474
JanPokorny
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Communication structure
Motivation
The communication with a modern LLM agent is quite complex. While the most basic use case is "send request - receive reply", we quickly run into more complex scenarios, such as:
This proposal aims to provide a general framework for communication with agents, without imposing rigid structure on how the agents are supposed to work.
Proposal
Data structures
We define these data structures:
idle
state, awaiting user message.It needs to be understood that these data structures are quite "virtual" / "abstract". They do not correspond directly to any physical communication channels, as these will always have limitations and protocol-related details, like refused messages, timeouts, etc. "Run" can be thought of as a shared/distributed data structure which is kept in sync between the two parties using some sort of communication channel (which is not defined in this proposal).
This is how we define the types:
High-level overview
The client and server (caller and agent) both know the
CommunicationSchema
. It's understood that eachRun
starts with no messages and in theidle
state. Then, in theidle
state ofCommunicationSchema
, it's defined what messages are valid to be sent in theidle
state -- essentially starting the conversation.A simple agent may just switch between
idle
andrunning
states, but more complex agents may have many more possible states, like:done
,waiting_for_user
,waiting_for_function_call
, etc. The idea is that at each point in the conversation, only some messages are valid -- for example when waiting for "human in the loop" confirmation, the only valid way is a message to confirm / deny the request, and not e.g. another query for the agent.Examples
Chat agent
This is a communication schema for a simple chat agent. It has two states:
idle
andrunning
. In theidle
state, the agent can receive a message from the user. In therunning
state, the agent generates a response and sends it back.Researcher agent
Researcher agent does not support long-lasting conversations, so after providing the reply, it transitions to an empty
done
state.Function-calling agent
Function-calling means executing code on the client. This agent acts like a chat agent, but has a special state for function calls, which request the client to execute a function with a given arguments. The client must respond with the result of the function call in order for the agent to continue.
Interruptable chat agent
This agent can be interrupted by the user while generating a message.
Long-running agent
A long-running agent still needs to be started up by a client message, but then it can run indefinitely and report information about its progress.
SDK support
Given that we have defined a shared / distributed data structure, the problem of communication breaks down into synchronization of the data structure. The SDK would support this over several protocols (HTTP SSE, WebSocket, etc.), so that the caller could use the best protocol for the given situation, while the agent won't have to implement all of them. From the agent's implementation point of view, the SDK would do the heavy lifting and the agent would be directly provided with the run messages and state.
Let's discuss
To reemphasize, this proposal only provides a general communication framework over a shared "run" data structure, not a way to map messages to actual communication channels like HTTP / WebSocket etc.
Beta Was this translation helpful? Give feedback.
All reactions