- Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
  Paper • 2410.05243 • Published • 18
- GPT-4V(ision) is a Generalist Web Agent, if Grounded
  Paper • 2401.01614 • Published • 22
- osunlp/UGround
  Image-Text-to-Text • Updated • 6.67k • 19
- osunlp/UGround-V1-2B
  Image-Text-to-Text • Updated • 376 • 5
Boyu Gou
BoyuNLP
AI & ML interests
AI Agents, Foundation Models, GUI Agents
Recent Activity
updated a model 29 minutes ago
osunlp/UGround-V1-72B
updated a collection about 1 hour ago
UGround
upvoted a paper about 17 hours ago
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Organizations
Collections: 1
Models: None public yet
Datasets: None public yet