Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.05719

Papers - Mobile - User Entity Context Understanding

ReALM: Reference Resolution As Language Modeling

Paper • 2403.20329 • Published Mar 29, 2024 • 21
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 83

Towards a World-English Language Model for On-Device Virtual Assistants

Paper • 2403.18783 • Published Mar 27, 2024 • 4
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 125
ReALM: Reference Resolution As Language Modeling

Paper • 2403.20329 • Published Mar 29, 2024 • 21
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 83

How Far Are We from Intelligent Visual Deductive Reasoning?

Paper • 2403.04732 • Published Mar 7, 2024 • 19
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12, 2024 • 74
DragAnything: Motion Control for Anything using Entity Representation

Paper • 2403.07420 • Published Mar 12, 2024 • 13
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 31

Papers - Multimodal

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22, 2024 • 19
ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 5
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

Paper • 2206.02770 • Published Jun 6, 2022 • 3

vikhyatk/moondream1

Text Generation • Updated Feb 7, 2024 • 117k • 485
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 83

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Paper • 2402.11131 • Published Feb 16, 2024 • 42
Generative Representational Instruction Tuning

Paper • 2402.09906 • Published Feb 15, 2024 • 53
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 104
BitDelta: Your Fine-Tune May Only Be Worth One Bit

Paper • 2402.10193 • Published Feb 15, 2024 • 19

about 7 hours ago

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 16
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 9
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 11
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 47

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 187
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Paper • 2404.02575 • Published Apr 3, 2024 • 48
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 83
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Paper • 2409.02897 • Published Sep 4, 2024 • 45

Text to image papers

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Paper • 2311.09257 • Published Nov 14, 2023 • 45
VideoPoet: A Large Language Model for Zero-Shot Video Generation

Paper • 2312.14125 • Published Dec 21, 2023 • 44
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 30
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM

Paper • 2401.01256 • Published Jan 2, 2024 • 19

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs