REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models • Paper • 2501.03262 • Published 8 days ago • 71
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining • Paper • 2501.00958 • Published 10 days ago • 91
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM • Paper • 2501.00599 • Published 11 days ago • 40
PixMo Collection • A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated 5 days ago • 53
Inf-CL Collection • The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated 7 days ago • 3
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models • Paper • 2410.23266 • Published Oct 30, 2024 • 20
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss • Paper • 2410.17243 • Published Oct 22, 2024 • 89
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective • Paper • 2410.12490 • Published Oct 16, 2024 • 8
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio • Paper • 2410.12787 • Published Oct 16, 2024 • 31
A Controlled Study on Long Context Extension and Generalization in LLMs • Paper • 2409.12181 • Published Sep 18, 2024 • 44
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages • Paper • 2407.19672 • Published Jul 29, 2024 • 56
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • Paper • 2406.05132 • Published Jun 7, 2024 • 27
What If We Recaption Billions of Web Images with LLaMA-3? • Paper • 2406.08478 • Published Jun 12, 2024 • 39