Tianbao Xie's picture

Tianbao Xie PRO

tianbaoxiexxx

·

https://tianbaoxie.com

AI & ML interests

NLP, AI, RL, Robotics

Recent Activity

liked a model 4 days ago

xlangai/Aguvis-7B-720P

upvoted an article 8 days ago

✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

upvoted a paper 9 days ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

View all activity

Organizations

tianbaoxiexxx's activity

upvoted an article 8 days ago

Article

✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

By

•

8 days ago

• 11

upvoted a paper 9 days ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published 15 days ago • 78

upvoted a paper 30 days ago

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Paper • 2412.09605 • Published 30 days ago • 27

upvoted 2 papers about 1 month ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 124

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 59

upvoted 2 papers about 2 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 71

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published Nov 15, 2024 • 31

upvoted 2 papers 2 months ago

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 46

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24, 2024 • 32

upvoted 3 papers 4 months ago

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25, 2024 • 61

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 140

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024 • 44

upvoted 3 papers 5 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 136

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16, 2024 • 98

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 51

upvoted 2 papers 6 months ago

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 33

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Paper • 2407.10956 • Published Jul 15, 2024 • 6

upvoted a paper 7 months ago

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11, 2024 • 53

upvoted an article 8 months ago

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15, 2024

• 171

upvoted a paper 8 months ago

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19, 2024 • 54