QvQ-72B-Preview🎄 an open-weight model for visual reasoning, just released by the Alibaba Qwen team Qwen/qvq-676448c820912236342b9888 ✨ Combines visual understanding & language reasoning ✨ Scores 70.3 on MMMU ✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
Megrez-3B-Omni 🔥 an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem. Model: Infinigence/Megrez-3B-Omni Demo: Infinigence/Megrez-3B-Omni ✨Supports analysis of image, text, and audio modalities ✨Leads in bilingual speech (English & Chinese) input, multi-turn conversations, and voice-based queries ✨Strong performance in scene understanding and OCR across major benchmarks
Audio models: ✨Fish Speech 1.5, text-to-speech in 13 languages, trained on 1M+ hours of audio by FishAudio fishaudio/fish-speech-1.5 ✨ClearVoice, an advanced voice processing framework by Alibaba Tongyi SpeechAI https://huggingface.co/alibabasglab
HunyuanVideo 📹 The new open video generation model by Tencent! 👉 tencent/HunyuanVideo zh-ai-community/video-models-666afd86cfa4e4dd1473b64c ✨ 13B parameters: Probably the largest open video model to date ✨ Unified architecture for image & video generation ✨ Powered by advanced features: MLLM Text Encoder, 3D VAE, and Prompt Rewrite ✨ Delivers stunning visuals, diverse motion, and unparalleled stability 🔓 Fully open with code & weights
Zhipu AI, the Chinese generative AI startup behind CogVideo, just launched its first productized AI agent, AutoGLM 🔥 👉 https://agent.aminer.cn
With simple text or voice commands, it: ✨ Simulates phone operations effortlessly ✨ Autonomously handles 50+ step tasks ✨ Seamlessly operates across apps
Powered by Zhipu's "Decoupled Interface" and "Self-Evolving Learning Framework" to achieve major performance gains in Phone Use and Web Browser Use!
Meanwhile, GLM4-Edge is now on Hugging Face hub🚀 👉 THUDM/glm-edge-6743283c5809de4a7b9e0b8b Packed with advanced dialogue + multimodal models: 📱 1.5B / 2B models: Built for mobile & in-car systems 💻 4B / 5B models: Optimized for PCs
China launched an algorithm governance campaign to ensure algorithms are more positive, transparent, controllable, fair, and accountable🇨🇳📑 zh-ai-community/china-ai-policy-research
Highlights: ✨ Combat "echo chambers" and addictive content: ban forced tags, data misuse, and excessive collection. ✨ Make rankings transparent: explain algorithms, keep logs, and detect fake accounts. ✨ Protect workers: disclose delivery algorithms and provide appeal channels. ✨ Ban unfair pricing: ensure promo transparency and honest explanations for failures. ✨ Support users: improve recommendations for minors and seniors, promote good content, and detect fakes. ✨ Ensure safety: audit algorithms, secure data, fix flaws, and regularly evaluate models.
⏰ Timeline: Company Self-Checks: before Dec 31, 2024 Verification: before Jan 31, 2025 Effectiveness Review: before Feb 14, 2025
Throughout the campaign: open reporting channels for algorithm issues, monitor complaints, enforce corrections, and provide feedback to users.
✨Fine-tuned with CoT data (open-source + synthetic). ✨Expands solution space with MCTS, guided by model confidence. ✨Novel reasoning strategies & self-reflection enhance complex problem-solving. ✨Pioneers LRM in multilingual machine translation.
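The MCTS bullet above can be illustrated with a toy sketch. This is my own minimal illustration, not the model's actual implementation: it searches over short "reasoning step" sequences with standard UCT selection, using a stand-in `confidence` function where the real system would use the LLM's own confidence over its outputs.

```python
import math
import random

# Stand-in "model confidence" (assumption: the real system scores paths with
# the LLM's own probabilities). Here: prefer step sums close to a target.
def confidence(seq, target=10):
    return 1.0 / (1.0 + abs(sum(seq) - target))

class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq          # partial reasoning path (here: list of ints)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # accumulated confidence from rollouts

    def uct(self, c=1.4):
        # Unvisited children are explored first
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(iters=200, depth=4, actions=(1, 2, 3, 4, 5)):
    random.seed(0)
    root = Node([])
    for _ in range(iters):
        # 1. Selection: walk down via UCT until a leaf
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: add one child per candidate next step
        if len(node.seq) < depth:
            node.children = [Node(node.seq + [a], node) for a in actions]
            node = random.choice(node.children)
        # 3. Rollout: finish the path randomly, score with confidence
        seq = list(node.seq)
        while len(seq) < depth:
            seq.append(random.choice(actions))
        score = confidence(seq)
        # 4. Backpropagation: credit the whole selected path
        while node:
            node.visits += 1
            node.value += score
            node = node.parent
    # Return the most-visited first step
    best = max(root.children, key=lambda n: n.visits)
    return best.seq

print(mcts())
```

The confidence function is what steers the expansion: paths the "model" is more confident in get visited more, so the tree grows deeper exactly where the search believes a good solution lies.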
✨ Unified 3D generation & text understanding. ✨ 3D meshes as plain text for seamless LLM integration. ✨ High-quality 3D outputs rivaling specialized models.
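The "3D meshes as plain text" bullet can be sketched with a toy OBJ-style serializer. The helpers `mesh_to_text` / `text_to_mesh` are hypothetical names for illustration, not the model's API; the point is that vertices and faces become ordinary lines of text an LLM can read and emit as tokens.

```python
# Toy sketch (assumption: OBJ-like "v"/"f" records) of serializing a mesh
# as plain text so an LLM can consume and produce it directly.

def mesh_to_text(vertices, faces):
    lines = [f"v {x:.2f} {y:.2f} {z:.2f}" for x, y, z in vertices]
    # OBJ face indices are 1-based
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines)

def text_to_mesh(text):
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:]))
        elif parts[0] == "f":
            faces.append(tuple(int(p) - 1 for p in parts[1:]))
    return vertices, faces

# A single triangle round-trips through the text representation
tri_v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tri_f = [(0, 1, 2)]
text = mesh_to_text(tri_v, tri_f)
print(text)
```

Because the mesh is just lines of text, no special tokenizer or geometry head is needed for the LLM to handle it alongside natural language.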
China is advancing rapidly in AI technology while maintaining a strong focus on governance 🇨🇳📑 We've collected key AI governance documents released since 2017 and will continue updating them in this organization on the hub 👉China LLMs on Hugging Face ✨ zh-ai-community/china-ai-policy-research Any feedback is welcome🤗
With the open-weight release of CogVideoX-5B from THUDM, i.e. the GLM team, the video generation model field (how about calling it VGM?) has officially become the next booming "LLM"
What does the landscape look like? What other video generation models are out there? The collection below is all you need.