Article: Superposition in Transformers: A Novel Way of Building Mixture of Experts • By BenChaliah • 7 days ago
Collection: Scaling Test-Time Compute with Open Models • Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated 6 days ago
Paper: MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages • arXiv 2410.01036 • Published Oct 1, 2024
Collection: Moshi v0.1 Release • MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024
Article: Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth • By mlabonne • Jul 29, 2024
Article: Memory-efficient Diffusion Transformers with Quanto and Diffusers • Jul 30, 2024
Article: Extracting Concepts from LLMs: Anthropic’s recent discoveries 📖 • By m-ric • Jun 20, 2024
Article: makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch • By AviSoori1x • May 7, 2024
Article: SeeMoE: Implementing a MoE Vision Language Model from Scratch • By AviSoori1x • Jun 23, 2024
Article: seemore: Implement a Vision Language Model from Scratch • By AviSoori1x • Jun 23, 2024
Paper: Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone • arXiv 2404.14219 • Published Apr 22, 2024
Paper: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • arXiv 2404.02258 • Published Apr 2, 2024