[PROPOSAL] Message body structure #473
JanPokorny started this conversation in Ideas
Message body structure
Motivation
We want to support the various message formats real-world agents may use; however, we don't want to "overshoot" it. While JSON Schema may come to mind as a broadly supported standard for defining input and output schemas, it's quite powerful and supports complex cases like tagged unions, `not`, `allOf`, `oneOf`, etc. This makes reasoning over it hard -- for example when streaming a partial response, or when deciding whether one agent's output is compatible with another agent's input. Furthermore, plain JSON isn't the best thing to show in the UI -- it always needs some sort of presentation layer. And last but not least, we don't always want to inline all data in the response -- multimedia files in particular are better handled by saving them to S3-like storage and passing just the URL around.
For this reason, we present a custom, semi-structured message body format, including a way to define schemas over this format. This format would be used to define both input and output schemas for agents, as well as schemas for notifications, interrupts and other intermediate communication.
Proposal
MessageBody
`MessageBody` consists of parts, roughly corresponding to files / blobs / objects. Each part has an optional name, a content type, and either inline content or a content URL.
When a name is provided, it must be a valid POSIX absolute file path -- meaning that it starts with a slash and contains only the characters A-Z, a-z, 0-9, dot, dash, underscore, and slash, with no two consecutive slashes and not ending in a slash. Example: `/foo/bar.png`. The provided names must be unique within a given message.
A message is a single ordered list of parts. Parts without names typically represent the main content to be displayed to the user (like text, images, audio), while named parts can serve as attachments, metadata, or state information.
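To make the naming rules concrete, here is a minimal sketch in TypeScript. The field names `content` and `contentUrl`, the helper names, and treating the content type as mandatory are all assumptions for illustration; none of this is a settled API.

```typescript
// Hypothetical shape: the proposal names `MessageBody` but not its exact fields.
type MessageBodyPart = {
  name?: string; // optional absolute, POSIX-style path
  contentType: string; // e.g. "text/plain" (assumed to be mandatory here)
} & ({ content: string } | { contentUrl: string }); // inline content or a URL

type MessageBody = MessageBodyPart[]; // a single ordered list of parts

// Starts with "/", uses only A-Z a-z 0-9 . - _ and "/",
// contains no "//" and does not end in "/".
const PART_NAME = /^(\/[A-Za-z0-9._-]+)+$/;

function isValidPartName(partName: string): boolean {
  return PART_NAME.test(partName);
}

// Returns a list of problems; an empty list means all names are acceptable.
function checkNames(body: MessageBody): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const part of body) {
    if (part.name === undefined) continue; // unnamed parts need no check
    if (!isValidPartName(part.name)) problems.push(`invalid name: ${part.name}`);
    if (seen.has(part.name)) problems.push(`duplicate name: ${part.name}`);
    seen.add(part.name);
  }
  return problems;
}

const exampleBody: MessageBody = [
  { contentType: "text/plain", content: "Here is the chart you asked for." },
  { name: "/foo/bar.png", contentType: "image/png", contentUrl: "https://example.com/bar.png" },
];
```

Modelling "inline content or content URL" as a union means the type checker rejects a part that carries neither.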
The filesystem-like structure means that when necessary, nested data can be expressed (e.g. `/sources/1/urls/5`), but the structure still acts as a flat list for practical purposes -- like presentation in the UI, or when feeding into an LLM. Alternatively, one can just use an `application/json` part and store structured data there.
MessageBodySchema
`MessageBodySchema` defines what kind of message is expected. The format is similar to `Message`, but the `MessageBodySchemaPart`s only define a `name` and `contentType`, with the standard glob wildcards `*`, `**` and `{a,b}` usable in names and content types.
For a message to be valid, each of its parts must match at least one `MessageBodySchemaPart` in the `MessageBodySchema`. Furthermore, if a `MessageBodySchemaPart` is `required`, it must be matched by at least one part in the `Message`.
Examples
Chat agent
A simple chat agent may define the output schema as:
...with example message:
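As an illustration only, such a schema and message could look like the sketch below, together with a small matcher for the two validity rules (every part matches some schema part, and every `required` schema part is matched). Several things here are assumptions rather than part of the proposal: the `content` / `contentUrl` field names, the exact glob semantics (`*` stays within one segment, `**` may cross slashes), and the rule that a schema part without a `name` matches only unnamed parts.

```typescript
// Hypothetical shapes; only `name`, `contentType` and `required` come from the proposal.
type MessageBodyPart = { name?: string; contentType: string; content?: string; contentUrl?: string };
type MessageBody = MessageBodyPart[];
type MessageBodySchemaPart = { name?: string; contentType: string; required?: boolean };
type MessageBodySchema = MessageBodySchemaPart[];

// Assumed glob semantics: `*` stays within one path segment, `**` may cross
// "/", and `{a,b}` lists alternatives. Malformed patterns are not handled.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let inBraces = false;
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      source += ".*";
      i++; // consume the second "*"
    } else if (ch === "*") {
      source += "[^/]*";
    } else if (ch === "{") {
      source += "(?:";
      inBraces = true;
    } else if (ch === "}" && inBraces) {
      source += ")";
      inBraces = false;
    } else if (ch === "," && inBraces) {
      source += "|";
    } else {
      source += ch.replace(/[.+?^$()|[\]\\{}]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}

// Assumption: a schema part without a name matches only unnamed parts.
function partMatches(schemaPart: MessageBodySchemaPart, part: MessageBodyPart): boolean {
  const nameOk =
    schemaPart.name === undefined
      ? part.name === undefined
      : part.name !== undefined && globToRegExp(schemaPart.name).test(part.name);
  return nameOk && globToRegExp(schemaPart.contentType).test(part.contentType);
}

function matchesSchema(schema: MessageBodySchema, body: MessageBody): boolean {
  // Rule 1: every part matches at least one schema part.
  const everyPartAllowed = body.every((part) => schema.some((sp) => partMatches(sp, part)));
  // Rule 2: every required schema part is matched by at least one part.
  const requiredPresent = schema
    .filter((sp) => sp.required === true)
    .every((sp) => body.some((part) => partMatches(sp, part)));
  return everyPartAllowed && requiredPresent;
}

// Chat agent: at least one unnamed plain-text or Markdown part.
const chatSchema: MessageBodySchema = [{ contentType: "text/{plain,markdown}", required: true }];
const chatMessage: MessageBody = [{ contentType: "text/plain", content: "Hello! How can I help?" }];
```

With these definitions, `matchesSchema(chatSchema, chatMessage)` holds, while a message containing an `image/png` part, or an empty message, would be rejected.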
Multi-modal chat agent
When the chat agent supports images, it may define the output schema as:
Example:
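One possible shape for this case is sketched below, assuming text is sent inline while images travel as URLs. The `content` / `contentUrl` field names and the `describeParts` helper are hypothetical.

```typescript
// Hypothetical shapes; a part carries either inline content or a URL.
type MessageBodyPart =
  | { name?: string; contentType: string; content: string }
  | { name?: string; contentType: string; contentUrl: string };
type MessageBodySchemaPart = { name?: string; contentType: string; required?: boolean };

// Any text subtype is required; images of any subtype are optional.
const multiModalSchema: MessageBodySchemaPart[] = [
  { contentType: "text/*", required: true },
  { contentType: "image/*" },
];

// Unnamed parts, in the order they should be shown to the user.
const multiModalMessage: MessageBodyPart[] = [
  { contentType: "text/markdown", content: "Here is the diagram you asked for:" },
  { contentType: "image/png", contentUrl: "https://example.com/diagram.png" },
  { contentType: "text/markdown", content: "Let me know if it needs changes." },
];

// Summarises each part as "<content type> (inline|url)", preserving order.
function describeParts(body: MessageBodyPart[]): string[] {
  return body.map((part) => `${part.contentType} (${"contentUrl" in part ? "url" : "inline"})`);
}
```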
Software-writing agent
A software-writing agent, which produces files, may define the output schema as:
Example:
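A rough sketch of how this could look, with an unnamed summary for the user and the generated files as named parts. The field names, the sample file contents, and the `splitParts` helper are assumptions for illustration.

```typescript
// Hypothetical shapes; only `name`, `contentType` and `required` come from the proposal.
type MessageBodyPart = { name?: string; contentType: string; content?: string; contentUrl?: string };
type MessageBodySchemaPart = { name?: string; contentType: string; required?: boolean };

const codeSchema: MessageBodySchemaPart[] = [
  { contentType: "text/*" }, // optional unnamed summary for the user
  { name: "/**", contentType: "*/*", required: true }, // at least one file, any path and type
];

const codeMessage: MessageBodyPart[] = [
  { contentType: "text/markdown", content: "I created a small Node.js project." },
  { name: "/package.json", contentType: "application/json", content: '{ "name": "demo" }' },
  { name: "/src/index.ts", contentType: "text/plain", content: 'console.log("hello");' },
];

// Unnamed parts are the main content; named parts are treated as files.
function splitParts(body: MessageBodyPart[]): { display: MessageBodyPart[]; files: MessageBodyPart[] } {
  return {
    display: body.filter((part) => part.name === undefined),
    files: body.filter((part) => part.name !== undefined),
  };
}
```

A UI could render `display` directly and offer `files` as downloads, which is one way to read the "flat list" idea.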
Researcher agent
A researcher agent may provide a long write-up with sources, which are represented as attachments:
Example:
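A hedged sketch of such a message, with the write-up as an unnamed Markdown part and the sources as named parts under `/sources/`. The field names and the `listParts` helper are hypothetical, and the helper supports only the single-segment `*` wildcard.

```typescript
// Hypothetical shapes; only `name`, `contentType` and `required` come from the proposal.
type MessageBodyPart = { name?: string; contentType: string; content?: string; contentUrl?: string };
type MessageBodySchemaPart = { name?: string; contentType: string; required?: boolean };

const researchSchema: MessageBodySchemaPart[] = [
  { contentType: "text/markdown", required: true }, // the write-up itself
  { name: "/sources/*", contentType: "*/*" }, // any number of attached sources
];

const researchMessage: MessageBodyPart[] = [
  { contentType: "text/markdown", content: "Findings: ... see sources [1] and [2]." },
  { name: "/sources/1", contentType: "text/html", contentUrl: "https://example.com/article" },
  { name: "/sources/2", contentType: "application/pdf", contentUrl: "https://example.com/paper.pdf" },
];

// Returns the named parts whose name matches `pattern`.
// Only `*` (any characters except "/") is supported in this sketch.
function listParts(body: MessageBodyPart[], pattern: string): MessageBodyPart[] {
  const escaped = pattern
    .split("*")
    .map((piece) => piece.replace(/[.+?^$()|[\]\\{}]/g, "\\$&")); // escape regex metacharacters
  const matcher = new RegExp(`^${escaped.join("[^/]*")}$`);
  return body.filter((part) => part.name !== undefined && matcher.test(part.name));
}

const sources = listParts(researchMessage, "/sources/*");
```

Here `sources` holds the two attachments in message order, and the unnamed write-up is left out.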
SDK support
It's clear that this format is too "low-level" for structured validators like Zod. It's expected that the SDK would provide convenience wrappers for validation and unification of `content` (encoding, remote vs. inline). We are also considering providing some "filesystem-like" operations to simplify working with the structure, e.g. `messageBody.list("/sources/*")`.
Let's discuss!