2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 10 days ago • 91
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 10 days ago • 91 • 7
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 10 days ago • 91 • 7
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 11 days ago • 40
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 10 days ago • 91 • 7
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 10 days ago • 91
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 10 days ago • 91 • 7
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data Paper • 2410.18558 • Published Oct 24, 2024 • 18
Distill Visual Chart Reasoning Ability from LLMs to MLLMs Paper • 2410.18798 • Published Oct 24, 2024 • 20
Distill Visual Chart Reasoning Ability from LLMs to MLLMs Paper • 2410.18798 • Published Oct 24, 2024 • 20 • 5
Distill Visual Chart Reasoning Ability from LLMs to MLLMs Paper • 2410.18798 • Published Oct 24, 2024 • 20 • 5
Can Knowledge Editing Really Correct Hallucinations? Paper • 2410.16251 • Published Oct 21, 2024 • 54
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89