Post 417
LLaVA-Mini 🔥 An efficient multimodal model for image and video understanding, released by the Chinese Academy of Sciences
Model: ICTNLP/llava-mini-llama-3.1-8b
Paper: LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (2501.03895)
✨ Matches LLaVA-v1.5 using just 1 vision token
✨ Delivers <40ms response time
✨ Reduces vision tokens while maintaining strong visual understanding
Post 2235
Excited to see Alibaba DAMO Academy release a multimodal dataset for vision-language pretraining on the Hub 🔥
Paper: 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining (2501.00958)
Dataset: DAMO-NLP-SG/multimodal_textbook
✨ 6.5M images + 0.8B text from 22k hours of instructional videos
✨ Covers subjects like math, physics, and chemistry
✨ Apache 2.0
Hub
China AI policy research (Running, 13)
Watermark Demo (Running, 11): Demo of watermarking with gradio
Llm Race To The Top (Running on CPU Upgrade, 14): View Chatbot Arena ELO of top models increasing
Open LLM Progress Tracker (Running on CPU Upgrade, 145)
Fun Spaces ✨
LEDITS (Running on A10G, 440)
MusicGen (Running on A10G, 4.69k)
IllusionDiffusion (Running on Zero, 4.9k): Generate stunning high quality illusion artwork
AI Comic Factory (Running on CPU Upgrade, 9.06k): Create your own AI comic with a single prompt