A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 16
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks Paper • 2312.08583 • Published Dec 14, 2023 • 9
Pearl: A Production-ready Reinforcement Learning Agent Paper • 2312.03814 • Published Dec 6, 2023 • 14
TinySAM: Pushing the Envelope for Efficient Segment Anything Model Paper • 2312.13789 • Published Dec 21, 2023 • 13
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation Paper • 2312.17276 • Published Dec 27, 2023 • 15
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback Paper • 2204.05862 • Published Apr 12, 2022 • 2
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 79
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 181
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4, 2024 • 62
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA Paper • 2312.03732 • Published Nov 28, 2023 • 8
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8, 2024 • 70
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon Paper • 2401.03462 • Published Jan 7, 2024 • 27
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation Paper • 2401.04468 • Published Jan 9, 2024 • 48
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9, 2024 • 25
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9, 2024 • 42
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages Paper • 2401.05811 • Published Jan 11, 2024 • 6
Self-Instruct: Aligning Language Model with Self Generated Instructions Paper • 2212.10560 • Published Dec 20, 2022 • 9
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference Paper • 2401.08671 • Published Jan 9, 2024 • 14
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16, 2024 • 36
DiffusionGPT: LLM-Driven Text-to-Image Generation System Paper • 2401.10061 • Published Jan 18, 2024 • 29
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 54
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023 • 37
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents Paper • 2401.12963 • Published Jan 23, 2024 • 12
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23, 2024 • 86
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Paper • 2311.16502 • Published Nov 27, 2023 • 35
Proactive Detection of Voice Cloning with Localized Watermarking Paper • 2401.17264 • Published Jan 30, 2024 • 17
LongAlign: A Recipe for Long Context Alignment of Large Language Models Paper • 2401.18058 • Published Jan 31, 2024 • 20
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 22
A Long Way to Go: Investigating Length Correlations in RLHF Paper • 2310.03716 • Published Oct 5, 2023 • 9
Transforming and Combining Rewards for Aligning Large Language Models Paper • 2402.00742 • Published Feb 1, 2024 • 11
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 49
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15, 2024 • 36
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models Paper • 2309.14509 • Published Sep 25, 2023 • 17
S-LoRA: Serving Thousands of Concurrent LoRA Adapters Paper • 2311.03285 • Published Nov 6, 2023 • 28
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 88
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29, 2024 • 19
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 39
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 125
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Paper • 2402.09844 • Published Feb 15, 2024 • 20
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 243
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22, 2024 • 126
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 75
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 127
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 77
Pheme: Efficient and Conversational Speech Generation Paper • 2401.02839 • Published Jan 5, 2024 • 17
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published May 20, 2024 • 36
Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions Paper • 2310.07301 • Published Oct 11, 2023 • 1
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Paper • 2405.11273 • Published May 18, 2024 • 17
SimPO: Simple Preference Optimization with a Reference-Free Reward Paper • 2405.14734 • Published May 23, 2024 • 11
Aligning Large Multimodal Models with Factually Augmented RLHF Paper • 2309.14525 • Published Sep 25, 2023 • 30
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection Paper • 2310.11511 • Published Oct 17, 2023 • 75
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper • 2406.00888 • Published Jun 2, 2024 • 31
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published Jun 6, 2024 • 28
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 12
RewardBench: Evaluating Reward Models for Language Modeling Paper • 2403.13787 • Published Mar 20, 2024 • 21
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 87
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 31
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices Paper • 2312.16886 • Published Dec 28, 2023 • 19
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6, 2024 • 13
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 66
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 98
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval Paper • 2407.19669 • Published Jul 29, 2024 • 23
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 124
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27, 2024 • 13
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild Paper • 2409.03753 • Published Sep 5, 2024 • 19
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10, 2024 • 56
MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis Paper • 2409.07129 • Published Sep 11, 2024 • 7
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 76
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published Sep 20, 2024 • 40
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Paper • 2410.08146 • Published Oct 10, 2024
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024 • 44
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17, 2024 • 53
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 83
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published Oct 24, 2024 • 16
Continuous Speech Synthesis using per-token Latent Diffusion Paper • 2410.16048 • Published Oct 21, 2024 • 29
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning Paper • 2410.21845 • Published Oct 29, 2024 • 13
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization Paper • 2410.19609 • Published Oct 25, 2024 • 17
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 40
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning Paper • 2410.22304 • Published Oct 29, 2024 • 17
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks Paper • 2410.22391 • Published Oct 29, 2024 • 22
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30, 2024 • 24
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 53
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published Nov 4, 2024 • 24
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 113
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 58
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 43
SpiRit-LM: Interleaved Spoken and Written Language Model Paper • 2402.05755 • Published Feb 8, 2024 • 13
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 121
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 29 days ago • 136
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 24 days ago • 121
DynaSaur: Large Language Agents Beyond Predefined Actions Paper • 2411.01747 • Published Nov 4, 2024 • 20
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 54
Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022 • 8
Self-Consistency Improves Chain of Thought Reasoning in Language Models Paper • 2203.11171 • Published Mar 21, 2022 • 3
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 8 days ago • 71
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 3 days ago • 176
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published 2 days ago • 42