The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24, 2024 • 3
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Paper • 2410.18252 • Published Oct 23, 2024 • 5
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 58
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published 23 days ago • 4
Post: Google drops Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more. Now available in anychat, try it out: akhaliq/anychat
Post: @s3nh Hey man check your discord! Got some news.
Think Beyond Size: Adaptive Prompting for More Effective Reasoning Paper • 2410.08130 • Published Oct 10, 2024 • 1
Post: QwQ-32B-Preview is now available in anychat. A reasoning model that is competitive with OpenAI o1-mini and o1-preview. Try it out: akhaliq/anychat
Post: New model drop in anychat: allenai/Llama-3.1-Tulu-3-8B is now available. Try it here: akhaliq/anychat
Post: anychat supports chatgpt, gemini, perplexity, claude, meta llama, grok all in one app. Try it out there: akhaliq/anychat
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 48
Cascade-DETR: Delving into High-Quality Universal Object Detection Paper • 2307.11035 • Published Jul 20, 2023
Behavior Contrastive Learning for Unsupervised Skill Discovery Paper • 2305.04477 • Published May 8, 2023
Rethinking Memory and Communication Cost for Efficient Large Language Model Training Paper • 2310.06003 • Published Oct 9, 2023 • 2
SemiReward: A General Reward Model for Semi-supervised Learning Paper • 2310.03013 • Published Oct 4, 2023 • 1
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory Paper • 2404.11163 • Published Apr 17, 2024
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published May 30, 2024 • 31