Building AI agents and the evaluation systems that make them reliable.
Currently building open-source agent benchmarks at UC Berkeley.
UC Berkeley MIDS · Amgen AI Innovation Lab · AgentBeats core contributor
| Project | What I Built |
|---|---|
| AgentBeats Evaluation Pipeline | First automated evaluation pipeline for CORE-Bench. Designed step-level metrics to replace binary pass/fail scoring, improving agent accuracy from 51% to 63% on reproducing scientific-paper results (scoring sketch below the table). |
| LLM Reasoning Efficiency Study | Quantified the tradeoff between chain-of-thought verbosity and accuracy. Found that additional reasoning tokens yield 2x the accuracy benefit on complex problems compared with simple ones. |
| RAG Campus Assistant | Built custom Scrapy pipelines to transform unstructured campus data into a searchable knowledge base (ingestion sketch below). Migrated to the OpenAI Assistants API within days of its November 2023 beta release. |
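
The step-level scoring idea from the AgentBeats row fits in a few lines. This is a minimal sketch of the concept, not the actual CORE-Bench pipeline; the `Step` class, the weights, and the example run are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One checkable unit of a reproduction task (e.g. "figure 2 regenerated")."""
    name: str
    passed: bool
    weight: float = 1.0

def binary_score(steps: list[Step]) -> float:
    """Old-style scoring: the task counts only if every step succeeds."""
    return 1.0 if all(s.passed for s in steps) else 0.0

def step_level_score(steps: list[Step]) -> float:
    """Partial credit: weighted fraction of steps the agent completed."""
    total = sum(s.weight for s in steps)
    return sum(s.weight for s in steps if s.passed) / total if total else 0.0

run = [
    Step("environment builds", True),
    Step("code executes end-to-end", True),
    Step("reported numbers match paper", False, weight=2.0),
]
print(binary_score(run))      # 0.0 — all-or-nothing hides the partial progress
print(step_level_score(run))  # 0.5 — the step-level metric surfaces it
```

The binary scorer throws away everything the agent got right; the step-level scorer is what lets partial progress show up in aggregate numbers.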
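
On the ingestion side of the RAG project, a Scrapy item pipeline handles cleanup before anything gets indexed. A minimal sketch, assuming dict-style items with `url` and `body` fields; the field names and the dedupe rule are placeholders, not the campus assistant's real code.

```python
import re
from scrapy.exceptions import DropItem

class CleanTextPipeline:
    """Normalize scraped page text before it enters the knowledge base."""

    def __init__(self):
        self.seen_urls = set()

    def process_item(self, item, spider):
        # Drop pages we've already indexed (assumed dedupe-by-URL rule).
        url = item.get("url")
        if url in self.seen_urls:
            raise DropItem(f"duplicate page: {url}")
        self.seen_urls.add(url)

        # Collapse whitespace left over from stripped HTML.
        item["body"] = re.sub(r"\s+", " ", item.get("body", "")).strip()
        if not item["body"]:
            raise DropItem(f"empty page: {url}")
        return item
```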
Previously · Amgen AI Innovation Lab: designed evaluation frameworks for drug-discovery agents using LLM judges, embedding-based metrics, and synthetic test generation; increased test coverage by 59% (judge sketch below).
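
The LLM-judge loop is also small enough to sketch. Everything here is a hypothetical stand-in for the real framework: the rubric, the 1-to-5 scale, and the injected `complete` callable (any text-in, text-out model call).

```python
JUDGE_PROMPT = """You are grading an agent's answer against a reference.
Question: {question}
Reference: {reference}
Answer: {answer}
Reply with a single integer score from 1 (wrong) to 5 (fully correct)."""

def judge_answer(question: str, reference: str, answer: str, complete) -> int:
    """Score one (question, answer) pair with an LLM judge.

    `complete` is any str -> str LLM call, injected so the sketch stays
    model-agnostic. Rubric, scale, and parsing are illustrative assumptions.
    """
    reply = complete(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 1  # fail closed on unparseable output
```

Injecting `complete` keeps the judge decoupled from any one provider and trivially testable with a stub.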
CS @ CSUCI (magna cum laude) · CA

