Evaluating AI Agents

This repository documents the development of a Data Analyst Agent with routing, tracing, and comprehensive evaluation mechanisms, built primarily with Arize AI's Phoenix and OpenTelemetry (OTel).

Project Overview

  1. Data Analyst Agent Development: We began by creating an AI agent capable of performing data analysis tasks, leveraging advanced machine learning models to interpret and process data effectively.

  2. Routing and Tracing Integration: To manage and monitor the agent's operations, we incorporated routing mechanisms and implemented tracing using OpenTelemetry. This setup allows for detailed tracking of the agent's decision-making pathways and performance metrics.

  3. Trajectory Evaluation and Convergence Scoring: Understanding the efficiency of the agent's problem-solving process is crucial. We introduced trajectory evaluation to assess the agent's reasoning paths and calculate convergence scores, which measure the consistency and optimality of the agent's actions over time.

  4. Structured Evaluations and Monitoring: Building upon the evaluation framework, we established structured methods to continuously assess and monitor the agent's performance, ensuring reliability and facilitating ongoing improvements.

Key Components

  • Arize Phoenix Integration: Phoenix serves as our AI observability platform, enabling effective experimentation, evaluation, and troubleshooting. It integrates seamlessly with OpenTelemetry, providing comprehensive tracing and evaluation capabilities.

  • OpenTelemetry (OTel) Implementation: We utilized OTel for standardized tracing across our AI systems. The arize-otel package offers a convenient setup for OpenTelemetry, streamlining the tracing process and ensuring compatibility with Phoenix.

  • Convergence Score Calculation: To quantify the agent's efficiency, we compute a convergence score. This metric evaluates whether the agent can respond to queries in an optimal number of steps, providing insights into its operational effectiveness.



AI Agent Structure

  • Router (routes tasks to the appropriate tools based on the use case)
  • Skills (e.g., RAG)
  • Memory and State (stores each response as a memory)

Router

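As a rough illustration of the router's role, the sketch below dispatches a query to a named skill by keyword matching. This is a hypothetical stand-in: a real router in this setup would typically ask an LLM to pick the tool, and the skill names here are illustrative, not from the repository.

```python
def route(query: str) -> str:
    """Return the name of the skill that should handle the query."""
    q = query.lower()
    # Keyword heuristics stand in for an LLM-backed routing call.
    if any(kw in q for kw in ("sql", "table", "database")):
        return "database_lookup"
    if any(kw in q for kw in ("plot", "chart", "graph")):
        return "data_visualization"
    return "data_analysis"  # default skill
```

The key design point is that the router returns a tool name only; executing the skill is a separate step, which is what makes each routing decision easy to trace and evaluate.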

Skills

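A RAG skill, in minimal form, retrieves relevant context and folds it into a prompt. The sketch below uses token overlap as a toy relevance measure; a real pipeline would use embeddings and an actual LLM call, and the function names are assumptions for illustration.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k docs sharing the most tokens with the query."""
    q_tokens = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_tokens & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"
```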

Memory and State

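Storing each response as a memory can be as simple as appending (query, response) pairs to a running state that later turns can read back as context. A minimal sketch, with field names that are illustrative rather than taken from the repository:

```python
class Memory:
    """Accumulates conversation turns so later steps can see earlier ones."""

    def __init__(self):
        self.turns = []

    def add(self, query: str, response: str) -> None:
        self.turns.append({"query": query, "response": response})

    def as_context(self) -> str:
        # Flatten the history into a transcript to prepend to the next prompt.
        return "\n".join(
            f"User: {t['query']}\nAgent: {t['response']}" for t in self.turns
        )
```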

Example Agent Build (Data Analyst Agent)

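The three components compose into a single agent loop: route, execute the chosen skill, record the turn. The sketch below uses stub skills in place of the real data analysis tools; all names are illustrative, not the repository's actual implementation.

```python
SKILLS = {
    # Stubs standing in for real data analysis tools.
    "database_lookup": lambda q: f"rows matching {q!r}",
    "data_analysis": lambda q: f"analysis of {q!r}",
}

def route(query: str) -> str:
    # Stand-in for an LLM-backed router.
    return "database_lookup" if "find" in query.lower() else "data_analysis"

def run_agent(query: str, history: list) -> str:
    """One agent turn: route, execute the skill, record the result."""
    skill = route(query)
    answer = SKILLS[skill](query)
    history.append({"query": query, "skill": skill, "answer": answer})
    return answer
```

Recording the chosen skill alongside the answer is what later makes router and skill evaluations possible: each turn carries its own routing decision.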

Tracing the Agent


Adding Router and Skill Evaluations

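A router evaluation can be as direct as comparing the tool the router picked on each traced query against a labeled expected tool. The sketch below is a hypothetical function-based evaluator (skill evaluations would follow the same pattern with an LLM-as-judge or heuristic scorer); the record field names are assumptions.

```python
def eval_router(records: list[dict]) -> tuple[list[dict], float]:
    """records: dicts with "query", "picked_tool", and "expected_tool" keys.

    Returns per-query results and overall routing accuracy.
    """
    results = [
        {"query": r["query"], "correct": r["picked_tool"] == r["expected_tool"]}
        for r in records
    ]
    accuracy = sum(res["correct"] for res in results) / len(results)
    return results, accuracy
```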

Adding Trajectory Evaluations

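One plausible way to compute the convergence score described above: run the same query several times, treat the shortest observed trajectory as optimal, score each run as optimal steps divided by actual steps, and average. This is a sketch of the idea; the exact formula used in the project may differ.

```python
def convergence_score(step_counts: list[int]) -> float:
    """Average of optimal_steps / actual_steps across runs of one query.

    1.0 means every run took the minimal number of steps; lower values
    mean the agent often wandered before converging on an answer.
    """
    optimal = min(step_counts)
    return sum(optimal / n for n in step_counts) / len(step_counts)
```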

Adding Structure to Evaluations

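Structuring evaluations for ongoing monitoring amounts to rolling individual eval rows into per-evaluator summaries that can be tracked over time. A minimal sketch, with illustrative field names (not the repository's schema):

```python
from collections import defaultdict

def summarize(eval_rows: list[dict]) -> dict[str, float]:
    """eval_rows: dicts with "eval_name" and "score" keys.

    Returns the mean score per evaluator, e.g. router accuracy and
    skill quality side by side.
    """
    scores = defaultdict(list)
    for row in eval_rows:
        scores[row["eval_name"]].append(row["score"])
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}
```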

License

This project is licensed under the MIT License. See the LICENSE file for details.
