This repository contains a comprehensive demonstration of the Solace Agent Mesh (SAM) evaluation framework. The framework's primary focus is evaluating agent behavior; it also provides tools for benchmarking Large Language Models (LLMs) within the SAM ecosystem.
Our evaluation framework offers two primary capabilities:
- Agent evaluation: Test your agents by tracing the path they take to complete a task and verifying that their final response is correct. This can be treated as a "unit test" for your agents, providing a clear pass/fail grade.
- LLM benchmarking: Compare the performance of different LLMs by running the same set of tasks with two or more models and scoring them on efficiency and response quality.
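To make these two capabilities concrete, here is a purely illustrative sketch of a test suite configuration: a task with an expected path and response for the agent "unit test", plus a list of models to compare for benchmarking. Every field name below is a hypothetical placeholder; the real schema is defined by `demo/test_suite_config.json` and the starter files in `templates/`.

```json
{
  "_comment": "Illustrative only -- field names are hypothetical; see templates/ for the real schema.",
  "test_cases": [
    {
      "name": "capital_lookup",
      "query": "What is the capital of France?",
      "expected_agent_path": ["orchestrator", "geography_agent"],
      "expected_response": "Paris"
    }
  ],
  "models": ["model-a", "model-b"]
}
```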
The following items are required when creating your own evaluations. If you simply want to run the included demo, you can skip this section and proceed to the Running the Demo tutorial below.
- SAM Initialization: The Solace Agent Mesh (SAM) must be initialized (`sam init`).
- Broker: A running Solace PubSub+ broker is required. The framework does not support "dev mode".
- REST Gateway: The `sam-rest-gateway` plugin must be installed. You can find and install it using `sam plugin catalog`.
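As a minimal sketch, the CLI steps behind the first and third prerequisites use the two commands named above; any additional prompts or flags depend on your setup.

```sh
# Initialize the Solace Agent Mesh in your project (prerequisite 1)
sam init

# Browse the plugin catalog to find and install sam-rest-gateway (prerequisite 3)
sam plugin catalog
```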
This section provides a tutorial for running the included demo.
- Set up your environment variables:

  Ensure that your `.env` file is populated with all the necessary values (see the illustrative `.env` sketch after this list).
- Create and activate a virtual environment:

  ```sh
  python3 -m venv .venv
  source .venv/bin/activate
  ```

- Install the required packages:

  ```sh
  pip install -r requirements.txt
  ```
- Run the evaluation suite:

  ```sh
  sam eval demo/test_suite_config.json
  ```

  This command will execute the test cases defined in `demo/test_suite_config.json` and save the results in the `results/` directory.
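As a companion to step 1 in the list above, here is a minimal sketch of what a populated `.env` file might look like. The variable names are assumed placeholders, not the definitive list; the values your deployment needs depend on how SAM was initialized and how your broker is configured.

```sh
# Illustrative placeholders only -- variable names are assumptions,
# not the definitive list for your deployment.
LLM_SERVICE_ENDPOINT=https://api.your-llm-provider.example/v1
LLM_SERVICE_API_KEY=your-api-key
SOLACE_BROKER_URL=ws://localhost:8008
SOLACE_BROKER_USERNAME=default
SOLACE_BROKER_PASSWORD=default
SOLACE_BROKER_VPN=default
```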
To run an evaluation suite, use the following command:

```sh
sam eval <path/to/test/suite>
```

This command will execute the test cases defined in the test suite configurations and save the results in the `results/` directory.
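For example, assuming a suite saved at `my_evals/suite.json` (a hypothetical path), a run and a quick check of its output might look like this:

```sh
# Run a custom test suite (hypothetical path)
sam eval my_evals/suite.json

# Inspect the generated output
ls results/
```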
The `templates/` directory provides starter files to help you create your own evaluations. For more details on the structure and purpose of these files, please refer to the `templates/` folder.