tildahh

Building AI agents and the evaluation systems that make them reliable.

Currently building open-source agent benchmarks at UC Berkeley.

UC Berkeley MIDS · Amgen AI Innovation Lab · AgentBeats core contributor

Featured Work

Project	What I Built
AgentBeats Evaluation Pipeline	Built the first automated evaluation pipeline for CORE-Bench. Designed step-level metrics to replace binary scoring, improving agent accuracy from 51% to 63% on reproducing scientific paper results.
LLM Reasoning Efficiency Study	Quantified the trade-off between chain-of-thought verbosity and accuracy. Found that additional reasoning tokens yield 2x the accuracy benefit on complex problems compared with simple ones.
RAG Campus Assistant	Built custom Scrapy pipelines to transform unstructured campus data into a searchable knowledge base. Migrated to the OpenAI Assistants API within days of its November 2023 beta release.

Previously at Amgen AI Innovation Lab: designed evaluation frameworks for drug discovery agents using LLM judges, embedding metrics, and synthetic test generation. Increased test coverage by 59%.

CS @ CSUCI (magna cum laude) · CA
