
Conversation

ollmer (Collaborator) commented Oct 6, 2025

Ray-based implementation of ActorLoop that replaces multiprocessing and in-memory queues.

Task Execution

  • Uses ray.remote() instead of multiprocessing.Process
  • Initializes Ray with configurable worker count and dashboard
  • Tasks execute the rollout policy in separate processes, one process per CPU. Each Ray task handles async_batch_size problems concurrently in an async loop (see the sketch after this list).
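
A minimal sketch of that task shape, assuming Ray and asyncio; rollout_one here is a placeholder for the actual rollout policy call, not the PR's code:

```python
import asyncio
import ray

@ray.remote
def rollout_batch(cfg, llm, problem_batch):
    """One Ray task: roll out a whole batch of problems concurrently in an async loop."""

    async def rollout_one(problem):
        # Placeholder for the real rollout policy call against the given LLM.
        await asyncio.sleep(0)
        return {"problem": problem, "llm_url": llm}

    async def run_all():
        # All problems in this batch run concurrently inside the single Ray task.
        return await asyncio.gather(*(rollout_one(p) for p in problem_batch))

    return asyncio.run(run_all())

# Usage sketch: one Ray task per batch of async_batch_size problems.
# refs = [rollout_batch.remote(cfg, llm, batch) for batch in batches]
```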

Load Balancing

  • Tracks number of tasks assigned per LLM URL
  • Submits tasks to the least busy LLM (see the sketch after this list)
  • Checks capacity constraints per LLM before submission
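
A sketch of the least-busy selection, assuming a simple per-URL counter of in-flight tasks (the names tasks_per_llm and max_tasks_per_llm are illustrative):

```python
from typing import Optional

def pick_llm_url(tasks_per_llm: dict[str, int], max_tasks_per_llm: int) -> Optional[str]:
    """Return the LLM URL with the fewest in-flight tasks, or None if all are at capacity."""
    if not tasks_per_llm:
        return None
    url = min(tasks_per_llm, key=tasks_per_llm.get)
    if tasks_per_llm[url] >= max_tasks_per_llm:
        return None  # every LLM has hit its per-LLM limit
    return url

# On submission: tasks_per_llm[url] += 1; when a task finishes: tasks_per_llm[url] -= 1.
```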

Queue Management

  • Replaces SharedMemoryQueue with in-memory lists, as Ray handles passing results between processes on its own
  • Uses ray.wait() to poll for finished tasks (up to 100 at a time)
  • Groups results by problem ID before returning (see the polling sketch after this list)
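
A rough sketch of the polling step, assuming each Ray task returns a list of results tagged with a problem_id (the exact result structure is an assumption):

```python
from collections import defaultdict
import ray

def poll_finished(pending_refs, timeout_s=0.1):
    """Poll Ray for finished tasks and group their results by problem ID."""
    if not pending_refs:
        return {}, []
    done, pending = ray.wait(
        pending_refs, num_returns=min(100, len(pending_refs)), timeout=timeout_s
    )
    grouped = defaultdict(list)
    for ref in done:
        for result in ray.get(ref):  # each finished task returns a list of rollout results
            grouped[result["problem_id"]].append(result)
    return grouped, pending
```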

Monitoring

  • Logs task latencies, Ray overhead, token throughput, and the number of failed rollouts
  • Reports per-LLM utilization (see the sketch after this list)
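
For illustration, per-LLM utilization could be derived from the same counters used for load balancing; the function and metric names here are assumptions:

```python
def llm_utilization(tasks_per_llm: dict[str, int], max_tasks_per_llm: int) -> dict[str, float]:
    """Fraction of per-LLM capacity currently in use, keyed by LLM URL."""
    return {url: count / max_tasks_per_llm for url, count in tasks_per_llm.items()}

# e.g. {"http://llm-0:8000/v1": 0.8, "http://llm-1:8000/v1": 0.4}  (URLs are illustrative)
```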

Method Overrides

  • start_backend(): Initialize Ray runtime
  • have_capacity(): Check task count + per-LLM limits
  • submit_problem(): Create Ray tasks for each attempt
  • get_new_results(): Poll Ray and return completed groups
  • stop_tasks(): Shutdown Ray
  • Queue-size methods adapted for in-memory list tracking (see the skeleton sketched after this list)
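
Putting the overrides together, a skeleton of the subclass might look roughly like this; the class name RayActorLoop, the stubbed base class, and the cfg fields are assumptions, not the PR's actual code:

```python
import ray

class ActorLoop:  # stub standing in for the existing base class
    def __init__(self, cfg):
        self.cfg = cfg

class RayActorLoop(ActorLoop):
    def start_backend(self):
        # Initialize the Ray runtime with the configured worker count and dashboard.
        ray.init(num_cpus=self.cfg.num_workers, include_dashboard=True)

    def have_capacity(self) -> bool:
        # Total in-flight task count plus per-LLM limits (see load balancing above).
        raise NotImplementedError

    def submit_problem(self, problem):
        # Create one Ray task per attempt for this problem.
        raise NotImplementedError

    def get_new_results(self):
        # Poll Ray with ray.wait() and return completed, problem-grouped results.
        raise NotImplementedError

    def stop_tasks(self):
        ray.shutdown()
```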

Configuration

Enabled via cfg.use_ray=true in the config. Selected automatically in run_actor_loop(), as sketched below.
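
A hypothetical sketch of that selection; the non-Ray class name MultiprocessActorLoop and the run()/stop flow are assumptions, and RayActorLoop refers to the skeleton above:

```python
def run_actor_loop(cfg):
    # Pick the Ray-based loop when cfg.use_ray is set, otherwise the original implementation.
    loop = RayActorLoop(cfg) if cfg.use_ray else MultiprocessActorLoop(cfg)
    loop.start_backend()
    try:
        loop.run()
    finally:
        loop.stop_tasks()
```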

MCP Server config

  • The server startup command was replaced with a shorter one that expects the mcp-run-python module to already be installed. Skipping the installation at startup speeds up the actor loop significantly, since this startup happens once per task.

ollmer commented Oct 14, 2025

We need to investigate these; probably deno instances are dying. This is a multi-node full training run.

Should not be a huge issue, as we spawn a whole new deno instance for every new task, so a single failure will result in only one failed trace. But we should definitely monitor the number of such cases when running the actor.

ollmer commented Oct 14, 2025

I've added a rollout_errors field to the metrics sent to wandb to monitor the number of such errors.
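
A minimal sketch of that metric, assuming the metrics are logged to wandb as a flat dict (names other than rollout_errors are illustrative):

```python
import wandb

def log_actor_metrics(step: int, num_finished: int, rollout_errors: int) -> None:
    # rollout_errors counts traces that failed, e.g. because a deno/MCP subprocess died.
    wandb.log(
        {
            "actor/finished_rollouts": num_finished,
            "actor/rollout_errors": rollout_errors,
        },
        step=step,
    )
```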

)
llm = self.llms_by_url[llm_url]
task_ref = self.ray_remote.remote(self.cfg_dict, llm, problem_batch, self.problem_id)
time.sleep(1.0) # TODO: remove this
Collaborator:

Let's not forget to remove this. It's capping us at 1 batch/sec per LLM I guess?

ollmer (Author):

No, it's to spread out task submissions to the workers over time. We submit 1 task per second, up to 255 with the current configuration, and then they all run in parallel. Assuming an average task latency of ~100 sec, we are effectively running around 100 tasks in parallel even with this slowdown.
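
For intuition, the back-of-the-envelope version of that argument, using the numbers from the comment above:

```python
submission_rate = 1.0      # tasks submitted per second (one per time.sleep(1.0))
avg_task_latency = 100.0   # rough average task latency in seconds
max_in_flight = 255        # cap under the current configuration

# Little's law: steady-state concurrency ≈ arrival rate × latency, capped by the limit.
effective_parallelism = min(submission_rate * avg_task_latency, max_in_flight)
print(effective_parallelism)  # -> 100.0
```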
