Exclibur's Collections
Interest
CompCap: Improving Multimodal Large Language Models with Composite Captions
Paper • 2412.05243 • Published • 18
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 45
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 47
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
Paper • 2412.05939 • Published • 14
Chimera: Improving Generalist Model with Domain-Specific Experts
Paper • 2412.05983 • Published • 9
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Paper • 2412.06673 • Published • 11
Video Motion Transfer with Diffusion Transformers
Paper • 2412.07776 • Published • 17
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Paper • 2412.03548 • Published • 17
Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation
Paper • 2412.07334 • Published • 16
StreamChat: Chatting with Streaming Video
Paper • 2412.08646 • Published • 18
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Paper • 2412.05552 • Published • 4
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
Paper • 2412.09585 • Published • 10
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published • 42
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published • 52
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Paper • 2412.09596 • Published • 92
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Paper • 2412.02186 • Published • 22
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
Paper • 2412.17739 • Published • 39
Large Concept Models: Language Modeling in a Sentence Representation Space
Paper • 2412.08821 • Published • 13