Jia-Ying Lin

linekin

Recent Activity

reacted to m-ric's post with 👍 2 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: Welcome ModernBERT! 🤗

We talk a lot about ✨Generative AI✨, meaning the decoder version of the Transformer architecture, but that is only one way to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models. The workhorse in this category has been BERT since its release in 2018 (prehistory for LLMs). It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic of LLMs. Many applications use BERT-family models; the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn has introduced BERT's replacement: ModernBERT.

TL;DR:

🏛️ Architecture changes:

⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- GeGLU activations instead of GeLU
- Flash Attention 2

✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

🥇 As a result, the model tops the field of encoder models: it beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
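To make the first architecture change above concrete, here is a minimal NumPy sketch of rotary positional embeddings (RoPE): instead of adding a position vector to the input, each pair of channels in a query or key is rotated by an angle that grows with the token's position. This is an illustrative sketch, not ModernBERT's actual code; the split-half channel pairing and the `base=10000.0` frequency base are common conventions that vary between implementations.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Channel pair (i, i + dim/2) at position m is rotated by
    m * base**(-2i/dim), so position is encoded as a rotation angle.
    Illustrative sketch; real implementations differ in pairing/layout.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per channel pair, decaying geometrically
    inv_freq = base ** (-np.arange(half) * 2.0 / dim)   # (half,)
    angles = np.outer(np.arange(seq_len), inv_freq)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied to each (x1_i, x2_i) pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

The useful property is that the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n (and the original vectors), so attention scores see relative position without any learned position table.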