sbarman25's Collections
Agentic
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 187
AppAgent: Multimodal Agents as Smartphone Users
Paper • 2312.13771 • Published • 52
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Paper • 2401.01614 • Published • 22
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Paper • 2401.13919 • Published • 27
LARP: Language-Agent Role Play for Open-World Games
Paper • 2312.17653 • Published • 31
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Paper • 2402.01622 • Published • 34
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
Paper • 2402.00559 • Published • 3