Skip to content
Draft

2.5 rl0 #1004

Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
99 commits
Select commit Hold shift + click to select a range
77f91c3
all changes from olmo3 but for olmo2.5
mnoukhov Aug 14, 2025
13c057b
example script
mnoukhov Aug 14, 2025
ee61222
fix path and uv lock
mnoukhov Aug 14, 2025
1423264
olmo2 retrofit naming
mnoukhov Aug 15, 2025
ed2ec83
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 16, 2025
a663287
updated script
mnoukhov Aug 19, 2025
abe3902
makefile delete old image
mnoukhov Aug 19, 2025
7d74b69
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 19, 2025
e5002cc
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 19, 2025
8dcf71c
resumable
mnoukhov Aug 20, 2025
f9b82f2
logging oe eval to wandb when using new oe-eval-interal
mnoukhov Aug 20, 2025
2ea2e37
fix for 4 nodes maybe
mnoukhov Aug 20, 2025
f2d6e97
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 20, 2025
93b88e2
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 21, 2025
667963b
revert change, 3 - 1 node still not working
mnoukhov Aug 21, 2025
e2925ec
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 21, 2025
98af8e3
wandb run step arg
mnoukhov Aug 21, 2025
18e3f7c
custom vllm in pyproject no need to clone
mnoukhov Aug 21, 2025
5820756
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 22, 2025
512651d
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 25, 2025
c30b5b8
Merge branch 'main' into log-oe-eval-wandb
mnoukhov Aug 25, 2025
ebcac11
vllm is extra dependency
mnoukhov Aug 26, 2025
6cbafa3
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 26, 2025
5d0e81a
Merge branch 'main' of github.com:allenai/open-instruct into log-oe-e…
mnoukhov Aug 26, 2025
1e5e1f9
make vllm a dependency either way but do local vllm as extra
mnoukhov Aug 27, 2025
24c8dc8
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 27, 2025
77048d2
back to basics, make setup to git clone
mnoukhov Aug 27, 2025
cb87b45
editable
mnoukhov Aug 27, 2025
d0b6bfc
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 28, 2025
6cac122
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 28, 2025
db7e3d4
merge in main (#962)
jacob-morrison Aug 28, 2025
3ed8657
debug script
mnoukhov Aug 29, 2025
0c432cd
smaller run on one node
mnoukhov Sep 3, 2025
782337c
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 3, 2025
1c609b8
attention type fix
mnoukhov Sep 3, 2025
20354e3
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 4, 2025
ef9e855
synchronous weight sync
mnoukhov Sep 5, 2025
bd28584
start generate thread trigger event
mnoukhov Sep 5, 2025
e43aa8e
single weight sync and generate thread
mnoukhov Sep 5, 2025
5aef3bd
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 6, 2025
1bea281
sync weight sync
mnoukhov Sep 6, 2025
ce3fec0
cleanup
mnoukhov Sep 8, 2025
ee243ef
fix env var check
mnoukhov Sep 8, 2025
755ac15
temporary logging
mnoukhov Sep 8, 2025
782ac53
disable log stats
mnoukhov Sep 8, 2025
ad89b37
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 8, 2025
54ed043
fix lock file and revert extra logging
mnoukhov Sep 8, 2025
d9dd800
un-revert weight sync
mnoukhov Sep 8, 2025
3a833c0
olmo dapo
mnoukhov Sep 8, 2025
c4497c5
olmo simple thinker
mnoukhov Sep 9, 2025
b7bd670
Merge branch 'main' into log-oe-eval-wandb
mnoukhov Sep 9, 2025
013c6b7
undo formatting
mnoukhov Sep 9, 2025
1b69161
Merge branch 'log-oe-eval-wandb' of github.com:allenai/open-instruct …
mnoukhov Sep 11, 2025
54c9a39
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 11, 2025
551f58c
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 14, 2025
4b3dd54
good r1-zero script and olmo simple thinker template
mnoukhov Sep 14, 2025
3f21704
2 epochs
mnoukhov Sep 14, 2025
5455f64
deepseek evals
mnoukhov Sep 15, 2025
f188425
shorter run
mnoukhov Sep 19, 2025
7177b0a
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 19, 2025
e00ef62
filtering vllm top p
mnoukhov Sep 19, 2025
cfc9b8d
fix copy since we need the folder
mnoukhov Sep 21, 2025
68fe5ef
Merge branch 'olmo2-retrofit' of github.com:allenai/open-instruct int…
mnoukhov Sep 21, 2025
bec6c40
generate script
mnoukhov Sep 23, 2025
c4ec086
test run of RL 0
mnoukhov Sep 24, 2025
e85463a
Merge branch 'main' of github.com:allenai/open-instruct into 2.5-rl0
mnoukhov Sep 25, 2025
6c152b5
fix oe eval and eval on 0
mnoukhov Sep 25, 2025
3a844c1
whoami without jq
mnoukhov Sep 25, 2025
f7c572f
correct whoami
mnoukhov Sep 25, 2025
5f6c75d
simpler template
mnoukhov Sep 27, 2025
9aaae46
gpu multiplier
mnoukhov Sep 27, 2025
0c2a1c6
new hyperparams
mnoukhov Sep 27, 2025
a9527ed
actually nochat template
mnoukhov Sep 28, 2025
4a27a61
nearly there
mnoukhov Sep 28, 2025
361618c
final script maybe
mnoukhov Oct 2, 2025
d0945b6
intermediate commit
mnoukhov Oct 10, 2025
ff60dad
makefile
mnoukhov Oct 10, 2025
c28b8b2
Merge branch 'main' of github.com:allenai/open-instruct into 2.5-rl0
mnoukhov Oct 10, 2025
d928a7c
lets go
mnoukhov Oct 10, 2025
af2f2f3
active refilling
mnoukhov Oct 15, 2025
cebee9e
replenish filtered prompts
mnoukhov Oct 15, 2025
605b6d8
fix
mnoukhov Oct 16, 2025
982e27b
single concat batch
mnoukhov Oct 16, 2025
c3d93e5
another fix
mnoukhov Oct 16, 2025
bbfec99
hero run
mnoukhov Oct 16, 2025
e86813a
mask don't filter truncations
mnoukhov Oct 16, 2025
5d38fec
fix
mnoukhov Oct 16, 2025
d65732f
convert script
mnoukhov Oct 17, 2025
585f3e7
debug script
mnoukhov Oct 26, 2025
ef83951
Merge branch 'main' of github.com:allenai/open-instruct into 2.5-rl0
mnoukhov Oct 26, 2025
843e830
olmo3 and new transformers, vllm
mnoukhov Oct 26, 2025
9edb406
new script
mnoukhov Oct 26, 2025
b492b14
rlzero fix
mnoukhov Oct 27, 2025
5bbf88a
new image
mnoukhov Oct 27, 2025
b9eee00
no positive resampling
mnoukhov Oct 28, 2025
608e654
add exclude list to state and deepspeed 0.17.3
mnoukhov Oct 29, 2025
4cac939
exclude list state
mnoukhov Oct 30, 2025
6b4c7ab
12k generation
mnoukhov Oct 30, 2025
f2ba0b2
logging how many groups are filtered / resampled
mnoukhov Oct 31, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -159,3 +159,4 @@ dmypy.json
cache/
local_dataset_cache/
scratch/
vllm_olmo2.5/
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ COPY configs configs
COPY scripts scripts
COPY mason.py mason.py
# Copy oe-eval-internal if it exists (wildcard pattern won't fail if missing)
COPY oe-eval-interna[l] oe-eval-internal/
COPY oe-eval-internal oe-eval-internal
COPY open_instruct open_instruct

# Add build arguments for git information
Expand Down
12 changes: 11 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
.PHONY: style quality
.PHONY: style quality docker

# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = open_instruct
Expand All @@ -16,3 +16,13 @@ style-check: ## *fail* if anything needs rewriting

quality-check: ## *fail* if any rewrite was needed
uv run ruff check --exit-non-zero-on-fix $(check_dirs)

setup:
git clone -b shanea/olmo2-retrofit https://github.com/2015aroras/vllm.git vllm_olmo2.5

docker:
DOCKER_BUILDKIT=1 docker build -f Dockerfile --build-arg UV_CACHE_DIR=$(UV_CACHE_DIR) -t open_instruct_rlzero .
# if you are internally at AI2, you can create an image like this:
$(eval beaker_user := $(shell beaker account whoami --format json | jq -r '.[0].name'))
beaker image delete $(beaker_user)/open_instruct_rlzero
beaker image create open_instruct_rlzero -n open_instruct_rlzero -w ai2/$(beaker_user)
33 changes: 33 additions & 0 deletions generate_olmo25.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#!/bin/bash

MODEL_NAME_OR_PATH="/weka/oe-training-default/ai2-llm/checkpoints/tylerr/long-context/olmo25_7b_lc_64k_6T_M100B_round5-sparkle_6634-pre_s2pdf_gzip2080_cweN-yake-all-olmo_packing_yarn-fullonly_50B-fb13a737/step11921-hf"
# DATASET="mnoukhov/DAPO-Math-14k-Processed-RLVR"
DATASET="TTTXXX01/MATH_3000_Filtered"
EXP_NAME="generate_olmo25_teng3k"

python mason.py \
--task_name ${EXP_NAME} \
--cluster ai2/jupiter \
--image ${1:-michaeln/open_instruct_olmo2_retrofit} \
--workspace ai2/tulu-thinker \
--priority high \
--pure_docker_mode \
--preemptible \
--gpus 2 \
--num_nodes 1 \
--max_retries 0 \
--budget ai2/oe-adapt \
--env VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
--env VLLM_ATTENTION_BACKEND="FLASH_ATTN" \
-- \
python scripts/data/rlvr/filtering_vllm.py \
--model $MODEL_NAME_OR_PATH \
--dataset $DATASET \
--split train \
--temperature 0.7 \
--top_p 0.95 \
--offset 0 \
--size 100000 \
--chat_template olmo_thinker_r1_style_nochat \
--output-file filtered_datasets/olmo25_7b_lc_dapo.jsonl \
--number_samples 16
27 changes: 27 additions & 0 deletions open_instruct/dataset_transformation.py
Original file line number Diff line number Diff line change
Expand Up @@ -442,6 +442,33 @@ def visualize_token_role(tokens: list[int], masks: list[int], tokenizer: PreTrai
"{% endif %}"
"{% endfor %}"
),
"olmo_thinker_r1_style_nochat": (
"Solve the following math problem step by step. "
"Reason about the question in <think> </think> tags "
"then provide the final answer in <answer> </answer> tags "
"so the full response is <think> reasoning process here </think> "
"<answer> answer here </answer>."
"\n\n"
"{% for message in messages %}"
"{{ '\n\n' if not loop.first else '' }}"
"{{ message['content'] + '\n' }}"
"{% if loop.last and add_generation_prompt %}"
"{{ 'Solving step by step\n<think>' }}"
"{% endif %}"
"{% endfor %}"
),
"olmo_thinker_dapo": (
"Solve the following math problem step by step. "
"The last line of your response should be the answer to the problem in form Answer: $Answer (without quotes) where $Answer is the answer to the problem."
"\n\n"
"{% for message in messages %}"
"{{ '\n\n' if not loop.first else '' }}"
"{{ message['content'] + '\n' }}"
"{% if loop.last and add_generation_prompt %}"
"{{ '\nRemember to put your answer on its own line after \"Answer:\"' }}"
"{% endif %}"
"{% endfor %}"
),
# template is taken from https://arxiv.org/abs/2501.12948.
"r1_simple_chat": (
"A conversation between User and Assistant. "
Expand Down
Loading