🐺🐦⬛ LLM Comparison/Test: Phi-4, Qwen2 VL 72B Instruct, Aya Expanse 32B in my updated MMLU-Pro CS benchmark By wolfram • about 14 hours ago • 1
TerjamaBench: A Cultural Benchmark for English-Darija Machine Translation By imomayiz • 1 day ago • 15
Beyond Image Preferences - Rich Human Feedback for Text-to-Image Generation By RapidataAI • 1 day ago • 12
Building Effective Agents with Anthropic’s Best Practices and smolagents ❤️ By Sri-Vigneshwar-DJ • 7 days ago • 4
Superposition in Transformers: A Novel Way of Building Mixture of Experts By BenChaliah • 7 days ago • 14
Building a System That Can Build Systems: Toward a Self-Replicating Ecosystem Framework By adityagaharawar • 8 days ago • 1
Fine-tune a SmolLM on domain-specific synthetic data from a LLM By davidberenstein1957 • 8 days ago • 29
✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use By Ziyang • 8 days ago • 11