- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
  Paper • 2501.04001 • Published • 34
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 40
- An Empirical Study of Autoregressive Pre-training from Videos
  Paper • 2501.05453 • Published • 26
Oğuzhan Ercan
oguzhanercan
AI & ML interests
Computer Vision, Generative Vision, first trajectory bender
Recent Activity
- updated a collection 1 day ago: Image-Video General Tasks
- updated a collection 2 days ago: Face Generation-Swap-Contol-Edit
- updated a collection 2 days ago: Image-Video General Tasks
Organizations
None yet
Collections
18
Models
None public yet