Collections
Discover the best community collections!
Collections including paper arxiv:2404.05719
-
Towards a World-English Language Model for On-Device Virtual Assistants
Paper • 2403.18783 • Published • 4 -
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 125 -
ReALM: Reference Resolution As Language Modeling
Paper • 2403.20329 • Published • 21 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83
-
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 19 -
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper • 2403.07508 • Published • 74 -
DragAnything: Motion Control for Anything using Entity Representation
Paper • 2403.07420 • Published • 13 -
Learning and Leveraging World Models in Visual Representation Learning
Paper • 2403.00504 • Published • 31
-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 19 -
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 5 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 3
-
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Paper • 2402.11131 • Published • 42 -
Generative Representational Instruction Tuning
Paper • 2402.09906 • Published • 53 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 104 -
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 19
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 187 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 48 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83 -
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
Paper • 2409.02897 • Published • 45
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 45 -
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper • 2312.14125 • Published • 44 -
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 -
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Paper • 2401.01256 • Published • 19