Blog, Articles, and discussions

Preference Optimization for Vision Language Models

By July 10, 2024 • 55

Community Articles

🐺🐦‍⬛ LLM Comparison/Test: Phi-4, Qwen2 VL 72B Instruct, Aya Expanse 32B in my updated MMLU-Pro CS benchmark

about 14 hours ago

Python Is All You Need? Introducing Dria-Agent-α

about 17 hours ago

Search the Web with AI

TerjamaBench: A Cultural Benchmark for English-Darija Machine Translation

Beyond Image Preferences - Rich Human Feedback for Text-to-Image Generation

🅰️ℹ️ 1️⃣0️⃣1️⃣ What is HtmlRAG, Multimodal RAG and Agentic RAG?

AI-Powered Content Creation for Release Notes Using KaibanJS

Synthetic Data Generation with FastData and Hugging Face

Crowd-sourced Open Preference Dataset for Text-to-Image Generation

Accelerating Language Model Inference with Mixture of Attentions

🌁#82: AI and ML in Real Life

Announcing NVIDIA Cosmos World Foundation Models

How to Automate Reddit Comment Generation with AI Agents in KaibanJS

Fine-tune SmolLM's on custom synthetic data

Building Effective Agents with Anthropic’s Best Practices and smolagents ❤️

AI in 2025: A Combinatorial Explosion of Possibilities, but NOT AGI

Superposition in Transformers: A Novel Way of Building Mixture of Experts

Building a System That Can Build Systems: Toward a Self-Replicating Ecosystem Framework

Fine-tune a SmolLM on domain-specific synthetic data from a LLM

✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

Putting RL back in RLHF

By June 12, 2024 • 66

Constitutional AI with Open LLMs

By February 1, 2024 • 13

Preference Tuning LLMs with Direct Preference Optimization Methods

By January 18, 2024 • 41

The N Implementation Details of RLHF with PPO

By October 24, 2023 • 24

Finetune Stable Diffusion Models with DDPO via TRL

By September 29, 2023 guest • 7

Fine-tune Llama 2 with DPO

By August 8, 2023 • 37

StackLLaMA: A hands-on guide to train LLaMA with RLHF

By April 5, 2023 • 23

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

By March 9, 2023 • 37

Red-Teaming Large Language Models

By February 24, 2023 • 18

What Makes a Dialog Agent Useful?

By January 24, 2023 • 1

Illustrating Reinforcement Learning from Human Feedback (RLHF)

By December 9, 2022 • 127
