[PROPOSAL] Message body structure #473

JanPokorny (Maintainer) · 2025-03-27T13:49:53Z

Message body structure

Motivation

We want to support the various message formats real-world agents may use, but we don't want to overshoot. While JSON Schema may come to mind as a broadly supported standard for defining input and output schemas, it's quite powerful and supports complex constructs like tagged unions, not, allOf, oneOf, etc. This makes it hard to reason about -- for example when streaming a partial response, or when deciding whether one agent's output is compatible with another agent's input. Furthermore, plain JSON isn't well suited to being shown in the UI -- it always needs some sort of presentation layer. And last but not least, we don't always want to inline all data in the response -- multimedia files in particular are better handled by saving them to S3-like storage and passing just the URL around.

For this reason, we present a custom, semi-structured message body format, including a way to define schemas over this format. This format would be used to define both input and output schemas for agents, as well as schemas for notifications, interrupts and other intermediate communication.

Proposal

`MessageBody`

MessageBody consists of parts, roughly corresponding to files / blobs / objects. Each part has an optional name, content type, and either inline content or content URL.

When a name is provided, it must be a valid POSIX absolute file path -- meaning that it starts with a slash, contains only the characters A-Z, a-z, 0-9, dot, dash, underscore, and slash, has no two consecutive slashes, and does not end in a slash. Example: /foo/bar.png. Names must be unique within a given message.
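The naming rule above can be sketched as a single regular expression plus a uniqueness check. This is only an illustration: `isValidPartName` and `findDuplicateNames` are hypothetical helper names, not part of the proposal, and the sketch follows the rule exactly as worded (so segments such as `.` or `..` are not rejected).

```typescript
// Hypothetical helpers illustrating the part-name rule; not part of the proposal.

// One or more "/segment" groups, where a segment is a non-empty run of the
// allowed characters. This rules out a missing leading slash, consecutive
// slashes, and a trailing slash in one pattern.
const PART_NAME_PATTERN = /^(?:\/[A-Za-z0-9._-]+)+$/;

function isValidPartName(name: string): boolean {
    return PART_NAME_PATTERN.test(name);
}

// Returns every name that occurs more than once. Unnamed parts (undefined)
// are skipped, since the uniqueness rule only applies to provided names.
function findDuplicateNames(names: Array<string | undefined>): string[] {
    const seen = new Set<string>();
    const duplicates = new Set<string>();
    for (const name of names) {
        if (name === undefined) continue;
        if (seen.has(name)) duplicates.add(name);
        seen.add(name);
    }
    return Array.from(duplicates);
}
```

For instance, `isValidPartName("/foo/bar.png")` passes, while `"foo"`, `"/foo//bar"` and `"/foo/"` are rejected.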

A message is a single ordered list of parts. Parts without names typically represent the main content to be displayed to the user (like text, images, audio), while named parts can serve as attachments, metadata, or state information.

The filesystem-like structure means that when necessary, nested data can be expressed (e.g. /sources/1/urls/5), but the structure still acts as a flat list for practical purposes -- like presentation in the UI, or when feeding into an LLM. Alternatively, one can just use an application/json part and store structured data there.

type MessageBodyPart = {
    name?: string;
    contentType?: string; // defaults to 'text/plain'
} & ({
    content: string;
    contentEncoding?: 'plain' | 'base64'; // defaults to 'plain'
} | {
    contentUrl: string;
});

type MessageBody = MessageBodyPart[];
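As a rough sketch of how a consumer might apply the defaults and tell inline parts from URL-backed ones, the following normalizes a part into an explicit tagged shape. `NormalizedPart`, the `kind` field and `normalizePart` are assumptions made for illustration; the proposal itself does not define them.

```typescript
// Sketch only: NormalizedPart and normalizePart are hypothetical names.

type MessageBodyPart = {
    name?: string;
    contentType?: string; // defaults to 'text/plain'
} & (
    | { content: string; contentEncoding?: "plain" | "base64" } // defaults to 'plain'
    | { contentUrl: string }
);

type NormalizedPart =
    | {
          kind: "inline";
          name: string | undefined;
          contentType: string;
          content: string;
          contentEncoding: "plain" | "base64";
      }
    | {
          kind: "remote";
          name: string | undefined;
          contentType: string;
          contentUrl: string;
      };

// Fills in the documented defaults and makes the inline/remote split explicit.
function normalizePart(part: MessageBodyPart): NormalizedPart {
    const contentType = part.contentType ?? "text/plain";
    if ("content" in part) {
        return {
            kind: "inline",
            name: part.name,
            contentType,
            content: part.content,
            contentEncoding: part.contentEncoding ?? "plain",
        };
    }
    return { kind: "remote", name: part.name, contentType, contentUrl: part.contentUrl };
}
```

With this, a minimal part such as `{ content: "Hello, world!" }` normalizes to an inline `text/plain` part with `plain` encoding.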

`MessageBodySchema`

MessageBodySchema defines what kind of message is expected. The format is similar to MessageBody, but each MessageBodySchemaPart defines only a name and a contentType, both of which may use the standard glob wildcards *, ** and {a,b}.

For a message to be valid, every part must match at least one MessageBodySchemaPart in the MessageBodySchema. Furthermore, if a MessageBodySchemaPart is required, it must be matched by at least one part in the MessageBody.

interface MessageBodySchemaPart {
    name?: string;        // glob pattern, defaults to "/**"
    contentType?: string; // glob pattern, defaults to "*/*"
    required?: boolean;   // defaults to false
}

type MessageBodySchema = MessageBodySchemaPart[];
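The two validation rules could be sketched roughly as follows. Several things here are assumptions rather than part of the proposal: the helper names (`globToRegExp`, `partMatches`, `validate`), the reading of `*` as "anything except a slash" and `**` as "anything", and in particular the treatment of unnamed parts, which the text does not spell out. This sketch lets an unnamed part match only schema parts whose name is left at the default `/**`, which is how the examples in this post appear to behave.

```typescript
// Sketch only; all function names and the unnamed-part rule are assumptions.

interface MessageBodySchemaPart {
    name?: string;        // glob pattern, defaults to "/**"
    contentType?: string; // glob pattern, defaults to "*/*"
    required?: boolean;   // defaults to false
}

// Loosened part shape: only name and contentType matter for matching.
interface PartLike {
    name?: string;
    contentType?: string;
    content?: string;
    contentUrl?: string;
}

// Translates *, ** and {a,b} into an anchored regular expression.
// Malformed patterns (e.g. an unclosed "{") are not handled here.
function globToRegExp(glob: string): RegExp {
    let pattern = "";
    let braceDepth = 0;
    for (let i = 0; i < glob.length; i++) {
        const ch = glob.charAt(i);
        if (ch === "*") {
            if (glob.charAt(i + 1) === "*") {
                pattern += ".*"; // "**" may cross slashes
                i++;
            } else {
                pattern += "[^/]*"; // "*" stays within one segment
            }
        } else if (ch === "{") {
            pattern += "(?:";
            braceDepth++;
        } else if (ch === "}" && braceDepth > 0) {
            pattern += ")";
            braceDepth--;
        } else if (ch === "," && braceDepth > 0) {
            pattern += "|";
        } else {
            pattern += ch.replace(/[.+?^$()|[\]\\{}]/g, "\\$&");
        }
    }
    return new RegExp(`^${pattern}$`);
}

function partMatches(part: PartLike, schemaPart: MessageBodySchemaPart): boolean {
    const namePattern = schemaPart.name ?? "/**";
    const typePattern = schemaPart.contentType ?? "*/*";
    const contentType = part.contentType ?? "text/plain";
    // Assumption: unnamed parts match only the default name pattern.
    const nameMatches =
        part.name === undefined
            ? namePattern === "/**"
            : globToRegExp(namePattern).test(part.name);
    return nameMatches && globToRegExp(typePattern).test(contentType);
}

// Returns a list of human-readable problems; an empty list means valid.
function validate(body: PartLike[], schema: MessageBodySchemaPart[]): string[] {
    const errors: string[] = [];
    body.forEach((part, index) => {
        if (!schema.some((s) => partMatches(part, s))) {
            errors.push(`part ${index} matches no schema part`);
        }
    });
    schema.forEach((s, index) => {
        if (s.required && !body.some((part) => partMatches(part, s))) {
            errors.push(`required schema part ${index} is not matched`);
        }
    });
    return errors;
}
```

Returning a list of problems rather than a boolean is a design choice of this sketch; it makes it easier to report which part or schema entry caused a rejection.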

Examples

Chat agent

A simple chat agent may define the output schema as:

[
    {
        "contentType": "text/plain",
        "required": true
    }
]

...with example message:

[
    {
        "content": "Hello, world!"
    }
]

Multi-modal chat agent

When the chat agent supports images, it may define the output schema as:

[
    {
        "contentType": "text/plain",
        "required": true
    },
    {
        "contentType": "image/*",
        "required": false
    }
]

Example:

[
    {
        "content": "This is a cute cat:"
    },
    {
        "contentType": "image/png",
        "contentUrl": "https://s3.example.com/12345678901234567890/image.png"
    },
    {
        "content": "Nice, huh?"
    }
]

Software-writing agent

A software-writing agent, which produces files, may define the output schema as:

[
    {
        "contentType": "text/plain",
        "required": true
    },
    {
        "name": "/files/**",
        "contentType": "*/*",
        "required": false
    }
]

[
    {
        "name": "/document/*/part/*",
        "contentType": "{text/plain,image/png}",
        "required": true
    }
]

Example:

[
    {
        "content": "I have created a hello world project in Python."
    },
    {
        "name": "/files/hello_world.py",
        "contentType": "text/x-python",
        "content": "print('Hello, world!')"
    }
]

Researcher agent

A researcher agent may provide a long write-up with sources, which are represented as attachments:

[
    {
        "contentType": "text/markdown",
        "required": true
    },
    {
        "contentType": "image/*",
        "required": false
    },
    {
        "name": "/sources/*",
        "contentType": "text/x-uri",
        "required": false
    }
]

Example:

[
    {
        "contentType": "text/markdown",
        "content": "Cats are known to be [cute](source:1) and [funny](source:2)."
    },
    {
        "name": "/sources/1",
        "contentType": "text/x-uri",
        "content": "https://example.com/cat-facts"
    },
    {
        "name": "/sources/2",
        "contentType": "text/x-uri",
        "content": "https://example.com/it-came-to-me-in-a-dream"
    }
]

[
    {
        "name": "/documents/*",
        "contentType": "text/markdown",
        "required": true
    },
    {
        "name": "/instruction",
        "contentType": "text/plain",
        "required": true
    }
]

SDK support

It's clear that this format is too "low-level" for structured validators like Zod. It's expected that the SDK would provide convenience wrappers for validation and unification of content (encoding, remote vs. inline). We are also considering providing some "filesystem-like" operations to simplify working with the structure, e.g. messageBody.list("/sources/*").
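To make the "filesystem-like" idea a bit more concrete, here is a minimal sketch of what a `list` operation might look like. It is written as a free function rather than the `messageBody.list(...)` method mentioned above, supports only the `*` and `**` wildcards, and returns named parts only; all of these are simplifying assumptions, not a proposed SDK API.

```typescript
// Sketch only: a hypothetical list() helper, not an actual SDK function.

interface NamedPartLike {
    name?: string;
    contentType?: string;
    content?: string;
    contentUrl?: string;
}

// Minimal glob support: "*" matches within one path segment, "**" matches
// across segments. Brace alternation is deliberately left out here.
function nameGlobToRegExp(glob: string): RegExp {
    const source = glob
        .split(/(\*\*|\*)/) // keeps the wildcards as separate tokens
        .map((token) =>
            token === "**"
                ? ".*"
                : token === "*"
                  ? "[^/]*"
                  : token.replace(/[.+?^${}()|[\]\\]/g, "\\$&"),
        )
        .join("");
    return new RegExp(`^${source}$`);
}

// Returns the named parts whose name matches the pattern, in message order.
function list(body: NamedPartLike[], pattern: string): NamedPartLike[] {
    const matcher = nameGlobToRegExp(pattern);
    return body.filter((part) => part.name !== undefined && matcher.test(part.name));
}
```

Applied to the researcher example, `list(body, "/sources/*")` would return the two source parts and skip the unnamed markdown part.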

Let's discuss!

Is this structure and schema language clear and understandable?
Are there any potentially missing use cases (types of agents that would be hard / impossible to express with this)?
Do you think this has advantages or disadvantages over other formats?
