FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!
📄 The paper introduces the FACTS Grounding benchmark, which evaluates whether LLM responses stay grounded in a provided source document.
🤖 Fact-checking is automated by an ensemble of LLM judges that verify whether a response is fully grounded in a factual reference document.
🧪 The authors tested different prompt templates on held-out data to ensure they generalize.
📖 Reading these templates is highly educational: they show how a frontier lab designs prompts and where those prompts hit their limits.
💾 You can now download and reuse these prompt templates via the prompt-templates library!
🔗 The library simplifies sharing prompt templates on the HF Hub or locally via standardized YAML files; a loading sketch is below. Let's make LLM work more transparent and reproducible by sharing more templates like this!
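Here is a minimal sketch of what loading one of these shared YAML templates can look like. It uses plain huggingface_hub and PyYAML rather than the prompt-templates loader itself (which ships its own API), and the repo id, filename, YAML keys, and placeholder names are all illustrative assumptions, not the actual artifacts:

```python
# Illustrative sketch: fetch a YAML prompt template from the HF Hub and fill it.
# NOTE: repo_id, filename, and the YAML layout are hypothetical; the
# prompt-templates library also provides its own dedicated loader.
import yaml
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="some-user/facts-grounding-prompts",  # hypothetical repo id
    filename="judge_prompt.yaml",                 # hypothetical filename
)
with open(path) as f:
    spec = yaml.safe_load(f)

# Standardized template files store the prompt text with named placeholders
# that are filled in before calling the LLM judge.
prompt = spec["template"].format(
    context_document="France is a country in Europe. Its capital is Paris.",
    user_request="What is the capital of France?",
    response="The capital of France is Paris.",
)
print(prompt)
```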
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:
🧠 Process reward trainer: Enables training of process-supervised reward models (PRMs), which reward the quality of intermediate steps rather than only the final answer, promoting structured reasoning. Perfect for stepwise reasoning tasks (first sketch after this list).
🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending the reference and policy models, with the option to push merged models to the Hugging Face Hub (second sketch below).
🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning, with examples like dynamic temperature fetching in prompts (third sketch below).
⚖️ Mixture of judges: The new AllTrueJudge combines the decisions of multiple binary judges for more nuanced evaluation (fourth sketch below).
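First, a minimal PRM training sketch following the pattern from the TRL docs; the model and dataset choices are illustrative, and argument names may differ slightly across TRL versions:

```python
# Sketch: train a process-supervised reward model with TRL's PRMTrainer.
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # any small backbone works
tokenizer = AutoTokenizer.from_pretrained(model_id)
# A PRM scores each intermediate step, so it is framed as token
# classification (label 1 = correct step, 0 = incorrect step).
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=2)

# math_shepherd provides prompts with stepwise completions and per-step labels.
dataset = load_dataset("trl-lib/math_shepherd", split="train")

trainer = PRMTrainer(
    model=model,
    args=PRMConfig(output_dir="qwen2-0.5b-prm"),
    processing_class=tokenizer,
    train_dataset=dataset,
)
trainer.train()
```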
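Second, a sketch of wiring the merging callback into a trainer, based on the release notes; the MergeConfig defaults and the callback's exact keyword arguments are assumptions, and mergekit must be installed:

```python
# Sketch: merge reference and policy models during DPO training via mergekit.
# Assumes mergekit is installed; callback kwargs here are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, MergeModelCallback
from trl.mergekit_utils import MergeConfig

model_id = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="qwen2-dpo-merged"),
    train_dataset=dataset,
    processing_class=tokenizer,
    # Merge at checkpoints; set push_to_hub=True to upload merged models.
    callbacks=[MergeModelCallback(MergeConfig(), push_to_hub=False)],
)
trainer.train()
```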
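Third, the temperature example follows the tool-use pattern in transformers chat templates; here is a minimal sketch of the data side, where a Python function's signature and docstring become the tool schema (the model choice and dummy return value are mine):

```python
# Sketch: render a tool schema into a prompt via apply_chat_template.
from transformers import AutoTokenizer

def get_current_temperature(location: str) -> float:
    """Get the current temperature for a location.

    Args:
        location: The city to get the temperature for.
    """
    return 22.0  # dummy value for illustration

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
messages = [{"role": "user", "content": "How warm is it in Paris right now?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],  # schema is rendered into the prompt
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```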
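Finally, a toy sketch of the mixture-of-judges idea; the two rule-based judges below are hypothetical stand-ins for real LLM judges, and the BaseBinaryJudge method signature is assumed from the TRL judges interface:

```python
# Sketch: AllTrueJudge returns 1 only when every sub-judge approves.
from trl import AllTrueJudge, BaseBinaryJudge

class NotEmptyJudge(BaseBinaryJudge):
    """Toy judge: passes completions that are non-empty."""
    def judge(self, prompts, completions, gold_completions=None, shuffle_order=True):
        return [1 if c.strip() else 0 for c in completions]

class ShortEnoughJudge(BaseBinaryJudge):
    """Toy judge: passes completions under 200 characters."""
    def judge(self, prompts, completions, gold_completions=None, shuffle_order=True):
        return [1 if len(c) < 200 else 0 for c in completions]

judge = AllTrueJudge([NotEmptyJudge(), ShortEnoughJudge()])
verdicts = judge.judge(
    prompts=["What is 2 + 2?"],
    completions=["4"],
)
print(verdicts)  # [1] only if every binary judge returned 1
```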