- ZKJ.com
- Beijing, China
- https://blog.parsing.nl
- https://orcid.org/0000-0002-3838-640X
Highlights
- Pro
Stars
Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification.
My learning notes and code for ML systems.
FlashMLA: Efficient MLA decoding kernels
A very simple GRPO implementation for reproducing R1-like LLM thinking.
Fully open data curation for reasoning models
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)
A financial agent for investment research
Pretraining code for a large-scale depth-recurrent language model
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Fully open reproduction of DeepSeek-R1
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath
Code implementation of synthetic continued pretraining
SGLang is a fast serving framework for large language models and vision language models.
Free, simple, fast interactive diagrams for any GitHub repository
Materials for EACL2024 tutorial: Transformer-specific Interpretability
Official implementation for "GLaPE: Gold Label-agnostic Prompt Evaluation and Optimization for Large Language Models" (stay tuned & more will be updated)
RUCAIBox / GPO
Forked from txy77/GPO — The official GitHub page for "Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers"