Deepseek V3 (All Versions) Collection Deepseek V3 - available in bf16, original, and GGUF formats, with support for 2, 3, 4, 5, 6 and 8-bit quantized versions. • 3 items • Updated 3 days ago • 21
Maya: An Instruction Finetuned Multilingual Multimodal Model Paper • 2412.07112 • Published Dec 10, 2024 • 26
Reasoning Datasets Collection Reasoning datasets that are trending 🔥 • 10 items • Updated 8 days ago • 16
Falcon3 Collection Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated 3 days ago • 78
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 23 days ago • 122
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 20 days ago • 208
Hymba Collection A series of Hybrid Small Language Models. • 2 items • Updated about 12 hours ago • 25
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 20 days ago • 198
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 101
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 12
Article How to build a custom text classifier without days of human labeling By sdiazlor • Oct 17, 2024 • 55
⛈️ Llama-3.1 Storm Models Collection Fine-tuned Llama 3.1 8B model with superior reasoning, conversation abilities, and function calling! • 3 items • Updated Aug 25, 2024 • 15
Code Evaluation Collection Collection of Papers on Code Evaluation (from code generation language models) • 45 items • Updated Oct 29, 2024 • 15
Llama-3.1 Quantization Collection Neural Magic quantized Llama-3.1 models • 22 items • Updated Nov 22, 2024 • 42