neuraloverflow/agenteval

Automated testing and benchmarking for code generation agents.

Get started

To get started:

  • Create a .env file containing your OPENAI_API_KEY.
  • Run yarn install.
  • Run node evals.js.

The eval in evals/eval-001 will be run ten times, and the results will be saved to ./output.
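
For example, the full setup can be scripted as below; the key value is a placeholder, and the exact commands are a sketch rather than part of this repository:

    # Provide the API key the evals need (placeholder value).
    echo "OPENAI_API_KEY=<your-key>" > .env

    # Install dependencies and run the eval harness.
    yarn install
    node evals.js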

Eval structure

Each eval contains the following (see the example layout after this list):

  • app: the codebase before transformation.
  • prompt.py: a description of the transformation to be made.
  • solution: the canonical solution, i.e. the complete codebase after the transformation has been applied.
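
A minimal sketch of one eval directory, assuming only the layout described above; the files shown inside app and solution are hypothetical:

    evals/eval-001/
      app/           # codebase before transformation
        index.js     # hypothetical file
      prompt.py      # description of the transformation
      solution/      # canonical transformed codebase
        index.js     # hypothetical file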

Purpose

By treating each eval as an integration test, you can automate the testing of your agent or run Monte Carlo simulations over repeated runs.
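
As a hypothetical illustration of the Monte Carlo idea (the function names below are placeholders, not part of this repository), an eval can be run many times and the pass rate reported:

    // Hypothetical sketch: run an eval N times and report the pass rate.
    // runEvalOnce stands in for whatever invokes the agent on an eval's app
    // and checks its output against the eval's solution.
    async function runEvalOnce(evalName) {
      // Placeholder: pretend the agent succeeds about 70% of the time.
      return Math.random() < 0.7;
    }

    async function monteCarlo(evalName, runs) {
      let passes = 0;
      for (let i = 0; i < runs; i++) {
        if (await runEvalOnce(evalName)) passes += 1;
      }
      const rate = ((100 * passes) / runs).toFixed(1);
      console.log(`${evalName}: ${passes}/${runs} passed (${rate}%)`);
    }

    monteCarlo('eval-001', 10);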

AgentEval

Demo

AgentEval.Demo.mp4
