Don't throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding Paper • 2309.15028 • Published Sep 26, 2023 • 1
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts Paper • 2310.02255 • Published Oct 3, 2023 • 2
Crystal: Introspective Reasoners Reinforced with Self-Feedback Paper • 2310.04921 • Published Oct 7, 2023 • 1
NaturalProofs: Mathematical Theorem Proving in Natural Language Paper • 2104.01112 • Published Mar 24, 2021
Generated Knowledge Prompting for Commonsense Reasoning Paper • 2110.08387 • Published Oct 15, 2021
Minds versus Machines: Rethinking Entailment Verification with Language Models Paper • 2402.03686 • Published Feb 6, 2024 • 1
NaturalProver: Grounded Mathematical Proof Generation with Language Models Paper • 2205.12910 • Published May 25, 2022
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering Paper • 2210.03078 • Published Oct 6, 2022 • 1
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback Paper • 2406.09279 • Published Jun 13, 2024 • 2
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text Paper • 2410.04265 • Published Oct 5, 2024
Establishing Task Scaling Laws via Compute-Efficient Model Ladders Paper • 2412.04403 • Published Dec 5, 2024 • 2