This repository contains a comprehensive demonstration of the Solace Agent Mesh (SAM) evaluation framework. The framework's primary focus is evaluating agent behavior; it also provides tools for benchmarking Large Language Models (LLMs) within the SAM ecosystem.
Our evaluation framework offers two primary capabilities:
- Agent evaluation: Test your agents by tracing the path they take to complete a task and verifying that their final response is correct. This can be treated as a "unit test" for your agents, providing a clear pass/fail grade.
- LLM benchmarking: Compare the performance of different LLMs by running the same set of tasks with two or more models and scoring them on efficiency and response quality.
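To make these two capabilities concrete, here is a purely illustrative sketch of a test suite configuration: a task with an expected path and response for the agent "unit test", plus a list of models to compare for benchmarking. Every field name below is a hypothetical placeholder; the real schema is defined by `demo/test_suite_config.json` and the starter files in `templates/`.

```json
{
  "_comment": "Illustrative only -- field names are hypothetical; see templates/ for the real schema.",
  "test_cases": [
    {
      "name": "capital_lookup",
      "query": "What is the capital of France?",
      "expected_agent_path": ["orchestrator", "geography_agent"],
      "expected_response": "Paris"
    }
  ],
  "models": ["model-a", "model-b"]
}
```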
The following items are required when creating your own evaluations. If you simply want to run the included demo, you can skip this section and proceed to the Running the Demo tutorial below.
- SAM Initialization: The Solace Agent Mesh (SAM) must be initialized (`sam init`).
- Broker: A running Solace PubSub+ broker is required. The framework does not support "dev mode".
- REST Gateway: The `sam-rest-gateway` plugin must be installed. You can find and install it using `sam plugin catalog`.
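As a minimal sketch, the CLI steps behind the first and third prerequisites use the two commands named above; any additional prompts or flags depend on your setup.

```sh
# Initialize the Solace Agent Mesh in your project (prerequisite 1)
sam init

# Browse the plugin catalog to find and install sam-rest-gateway (prerequisite 3)
sam plugin catalog
```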
This section provides a tutorial for running the included demo.
- Set up your environment variables:

  Ensure that your `.env` file is populated with all the necessary values (see the illustrative `.env` sketch after this list).
- Create and activate a virtual environment:

  ```sh
  python3 -m venv .venv
  source .venv/bin/activate
  ```

- Install the required packages:

  ```sh
  pip install -r requirements.txt
  ```
- Run the evaluation suite:

  ```sh
  sam eval demo/test_suite_config.json
  ```

  This command will execute the test cases defined in `demo/test_suite_config.json` and save the results in the `results/` directory.
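As a companion to step 1 in the list above, here is a minimal sketch of what a populated `.env` file might look like. The variable names are assumed placeholders, not the definitive list; the values your deployment needs depend on how SAM was initialized and how your broker is configured.

```sh
# Illustrative placeholders only -- variable names are assumptions,
# not the definitive list for your deployment.
LLM_SERVICE_ENDPOINT=https://api.your-llm-provider.example/v1
LLM_SERVICE_API_KEY=your-api-key
SOLACE_BROKER_URL=ws://localhost:8008
SOLACE_BROKER_USERNAME=default
SOLACE_BROKER_PASSWORD=default
SOLACE_BROKER_VPN=default
```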
To run an evaluation suite, use the following command:

```sh
sam eval <path/to/test/suite>
```

This command will execute the test cases defined in the test suite configurations and save the results in the `results/` directory.
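For example, assuming a suite saved at `my_evals/suite.json` (a hypothetical path), a run and a quick check of its output might look like this:

```sh
# Run a custom test suite (hypothetical path)
sam eval my_evals/suite.json

# Inspect the generated output
ls results/
```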
The `templates/` directory provides starter files to help you create your own evaluations. For more details on the structure and purpose of these files, please refer to the `templates/` folder.