feat: Add minimal SeedRL-style example for SWE-bench #10

joshgreaves wants to merge 4 commits into main from
Conversation
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
```python
response = llm_clients.LLMResponse(
    chat_completion_response=openai.types.chat.chat_completion.ChatCompletion(
        id=str(uuid.uuid4()),
        choices=[
            openai.types.chat.chat_completion.Choice(
                message=openai.types.chat.chat_completion_message.ChatCompletionMessage(
                    content=output_text,
```
The LLMResponse construction (lines 324‑347) is repeated again later (465‑488). Could we pull that into a helper to keep inference logic DRY?
Prompt for AI Agents:
In examples/03_seedrl_swebench.py around lines 324-347 and 465-488, the LLMResponse
construction is duplicated inside the _run_inference and _run_batched_inference logic.
Refactor by adding a new private helper method on Learner (e.g., def
_build_llm_response(self, output_text: str, num_input_tokens: int, num_output_tokens:
int) -> llm_clients.LLMResponse) that builds and returns the llm_clients.LLMResponse
using self.model_name and uuid/time as currently done. Replace the inline response
construction at lines 324-347 and 465-488 with calls to this helper, passing the
calculated output_text and token counts, and update any variable names accordingly to
remove duplication.
Finding type: Conciseness
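A minimal sketch of the kind of helper the comment proposes. The names and fields here are illustrative stand-ins: the real `llm_clients.LLMResponse` wraps an OpenAI `ChatCompletion` object, while this sketch uses a plain dataclass to show the DRY refactor shape.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class LLMResponse:
    """Stand-in for llm_clients.LLMResponse, which wraps a ChatCompletion."""
    id: str
    model: str
    content: str
    created: int
    num_input_tokens: int
    num_output_tokens: int


def build_llm_response(
    model_name: str,
    output_text: str,
    num_input_tokens: int,
    num_output_tokens: int,
) -> LLMResponse:
    # Single construction site: both inference paths would call this
    # instead of repeating the response-building boilerplate inline.
    return LLMResponse(
        id=str(uuid.uuid4()),
        model=model_name,
        content=output_text,
        created=int(time.time()),
        num_input_tokens=num_input_tokens,
        num_output_tokens=num_output_tokens,
    )
```

With this in place, both `_run_inference` and `_run_batched_inference` reduce to a one-line call per generated completion.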
Let’s make a helper function inside ARES to construct LLM requests easily, and use it here and in example 2
Got it—that’s a helpful pattern; I’ll remember to suggest a shared ARES helper for constructing LLM requests in similar situations after this PR is merged.
```python
# Get generation parameters from requests (use first request's params)
temperature = batch[0].request.temperature or 1.0
do_sample = True if batch[0].request.temperature else False

# Generate completions for the entire batch
with torch.no_grad():
    outputs = self.model.generate(
        input_ids=batched_input_ids,
        attention_mask=batched_attention_mask,
        max_new_tokens=2048,
        temperature=temperature,
        do_sample=do_sample,
        pad_token_id=self.tokenizer.pad_token_id,
```
The batched inference path always pulls temperature/do_sample from batch[0].request, so every subsequent request in the same batch silently inherits the first actor’s generation settings even though LLMRequest exposes per-request temperatures. This breaks the public contract that callers can specify their own sampling parameters; batches with differing temperatures should either be split into per-parameter sub-batches or handled per element before responding.
Finding type: Breaking Changes
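One way to honor per-request sampling parameters, sketched here with plain dicts standing in for `LLMRequest` objects: group requests by their (temperature, do_sample) pair, generate each group as its own sub-batch, and restore caller-visible order via the saved indices. The function name is hypothetical.

```python
from collections import defaultdict


def group_by_generation_params(batch):
    """Group (original_index, request) pairs by sampling parameters.

    Each group can then be generated as one sub-batch; sorting the collected
    (original_index, response) pairs afterwards restores the original order.
    """
    groups = defaultdict(list)
    for idx, req in enumerate(batch):
        temperature = 1.0 if req.get("temperature") is None else req["temperature"]
        do_sample = temperature != 0.0  # greedy decoding when temperature is 0.0
        groups[(temperature, do_sample)].append((idx, req))
    return groups


batch = [
    {"temperature": 0.7},
    {"temperature": None},  # unset: defaults to 1.0
    {"temperature": 0.7},
    {"temperature": 0.0},   # greedy: do_sample=False
]
groups = group_by_generation_params(batch)
```

Note the `is None` check rather than `or 1.0`, so an explicit temperature of 0.0 survives grouping (see the Logical Bugs finding below on the same pattern).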
Addresses PR #10 review feedback:

- Add build_openai_compatible_llm_response() helper in llm_clients.py with full type hints and docstring for constructing LLMResponse from raw generation outputs
- Refactor examples/02_local_llm.py to use the new helper, removing duplicate OpenAI object construction boilerplate
- Refactor examples/03_seedrl_swebench.py to use the new helper in both single and batched inference paths
- Fix batched inference contract: group requests by generation parameters (temperature, do_sample) and process each group separately to ensure parameters are respected per request while preserving original order
- Pass compileall and ruff linter checks

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
```python
temperature=req.request.temperature or 1.0,
do_sample=True if req.request.temperature else False,
```
Because the temperature is derived as req.request.temperature or 1.0, any request that explicitly requests temperature=0.0 (e.g., a deterministic greedy policy) is silently converted to 1.0 even though do_sample remains False, so the caller never receives the requested hyperparameter. The same or 1.0 pattern is reused when batching/grouping, so every 0.0-temperature request will run at 1.0 instead of the behavior the caller asked for.
```diff
-temperature=req.request.temperature or 1.0,
-do_sample=True if req.request.temperature else False,
+temperature=1.0 if req.request.temperature is None else req.request.temperature,
+do_sample=False if req.request.temperature == 0.0 else True,
```
Finding type: Logical Bugs
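The root cause is that Python treats 0.0 as falsy, so `x or default` cannot distinguish "unset" from "explicitly zero":

```python
requested = 0.0                    # caller asks for greedy decoding

buggy = requested or 1.0           # 0.0 is falsy, so this evaluates to 1.0
fixed = 1.0 if requested is None else requested

print(buggy, fixed)                # 1.0 0.0
```

The explicit `is None` check preserves a requested temperature of 0.0 while still defaulting unset values to 1.0.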
@coderabbitai full review

✅ Actions performed
Full review triggered.
📝 Walkthrough

Added comprehensive coding guidelines, introduced a helper function for OpenAI-compatible LLM response construction, refactored an example script to use the helper, and created a new SeedRL-style asynchronous example featuring batch inference with actor-learner architecture for SWE-bench environments.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant A as Actor
    participant L as Learner (main)
    participant Q as Request Queue
    participant BT as Batch Thread
    participant M as LLM Model
    participant E as SWE-bench Env
    A->>L: request_inference(actor_id, LLMRequest)
    L->>Q: enqueue InferenceRequest
    L-->>A: return Future[LLMResponse]
    BT->>Q: check queue (timeout or max_batch_size)
    BT->>BT: group requests by generation params
    BT->>BT: batch and pad requests
    BT->>M: generate (batched) in thread pool
    BT->>BT: compute C51 value head outputs
    BT->>BT: build OpenAI-compatible responses
    BT->>L: resolve futures with LLMResponses
    L-->>A: future.result() = LLMResponse
    A->>E: step(action)
    E-->>A: observation, reward, done
    A->>A: loop until done or max_steps
```
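The queue-and-future handshake in the diagram can be sketched in a few lines of standard-library Python. This is illustrative only: the real example runs the batch loop in a background thread, waits on the queue with a timeout, and calls the model instead of echoing the observation.

```python
import queue
from concurrent.futures import Future

request_queue: queue.Queue = queue.Queue()


def request_inference(actor_id: int, observation: str) -> Future:
    # Learner side: enqueue the request and hand the actor a Future
    # that the batch thread resolves later.
    future: Future = Future()
    request_queue.put((actor_id, observation, future))
    return future


def batch_thread_step(max_batch_size: int = 4) -> None:
    # Batch-thread side: drain up to max_batch_size pending requests,
    # run them together, and resolve each future with its response.
    batch = []
    while len(batch) < max_batch_size and not request_queue.empty():
        batch.append(request_queue.get())
    for actor_id, observation, future in batch:
        # Placeholder "model": echo the observation back.
        future.set_result(f"action-for:{observation}")


fut = request_inference(actor_id=0, observation="obs-0")
batch_thread_step()
print(fut.result())  # action-for:obs-0
```

The actor blocks on `fut.result()` (or awaits it in the async variant), so inference stays centralized in the learner while episodes run concurrently.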
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/03_seedrl_swebench.py`:
- Around line 546-549: Guard the nullable observation before calling
request_inference: check ts.observation is not None in the loop around the call
to self.learner.request_inference(self.actor_id, ts.observation) and only invoke
request_inference when it is non-null; if it is None, either continue the loop
(skip/await next timestep) or assign a safe default action and/or log a warning
so tokenization in request_inference is never called with a None observation.
- Around line 249-250: The pairing of requests to responses uses zip(batch,
responses) which silently truncates on length mismatch; change these to
zip(batch, responses, strict=True) at the three occurrences (the loop using req,
response and calling req.response_future.set_result, and the other two analogous
loops around lines 371 and 476) so a ValueError is raised immediately on
misalignment and prevents leaving unresolved response_future promises; no other
API changes needed—just add strict=True to the zip calls where batch and
responses are iterated together.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cc134caf-8f41-4e76-8c18-dd7d6f1a967f
📒 Files selected for processing (4)
- Claude.md
- examples/02_local_llm.py
- examples/03_seedrl_swebench.py
- src/ares/llms/llm_clients.py
```python
for req, response in zip(batch, responses):
    req.response_future.set_result(response)
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "Python version constraints:"
fd -i 'pyproject.toml' --exec rg -n "requires-python|target-version" {}
echo
echo "zip() call sites in examples/03_seedrl_swebench.py:"
rg -nP '\bzip\s*\(' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares
Length of output: 396
🏁 Script executed:
```shell
# Read the relevant sections of the file to understand the context
echo "=== Context around line 249 ==="
sed -n '240,260p' examples/03_seedrl_swebench.py
echo
echo "=== Context around line 371 ==="
sed -n '365,380p' examples/03_seedrl_swebench.py
echo
echo "=== Context around line 476 ==="
sed -n '470,485p' examples/03_seedrl_swebench.py
# Check if there are any assertions or length checks in the file
echo
echo "=== Length/size assertions in the file ==="
rg -n 'len\(.*\)|assert|ValueError' examples/03_seedrl_swebench.py | head -20
```

Repository: withmartian/ares
Length of output: 3059
🏁 Script executed:
```shell
# Find the _run_batched_inference function
echo "=== _run_batched_inference implementation ==="
rg -A 15 'def _run_batched_inference' examples/03_seedrl_swebench.py
echo
echo "=== _run_single_param_batch implementation ==="
rg -A 15 'def _run_single_param_batch' examples/03_seedrl_swebench.py
# Check how tokenized_inputs is created
echo
echo "=== Tokenization context around line 476 ==="
sed -n '450,475p' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares
Length of output: 2658
🏁 Script executed:
```shell
# Look at more context to see tokenized_inputs creation
echo "=== Full context for tokenized_inputs creation ==="
sed -n '380,475p' examples/03_seedrl_swebench.py | tail -50
```

Repository: withmartian/ares
Length of output: 2347
🏁 Script executed:
```shell
# Find where tokenized_inputs is defined/created
echo "=== Search for tokenized_inputs definition ==="
rg -n 'tokenized_inputs' examples/03_seedrl_swebench.py
echo
echo "=== Full _run_single_param_batch function ==="
sed -n '378,520p' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares
Length of output: 6150
Use strict=True in zip() to fail fast on request/response misalignment.
At lines 249, 371, and 476, zip() without strict=True silently truncates if batch and response lengths mismatch. This leaves unresolved futures (line 252-256's exception handler shows the authors are aware of this risk), which can hang indefinitely in the async learner flow. The docstrings promise "one per request in the same order," so failures to honor this contract should surface immediately.
💡 Proposed fix
```diff
-        for req, response in zip(batch, responses):
+        for req, response in zip(batch, responses, strict=True):
             req.response_future.set_result(response)
@@
-        for (orig_idx, _), response in zip(group, group_responses):
+        for (orig_idx, _), response in zip(group, group_responses, strict=True):
             all_responses.append((orig_idx, response))
@@
-        for i, (req, original_inputs) in enumerate(zip(batch, tokenized_inputs)):
+        for i, (req, original_inputs) in enumerate(zip(batch, tokenized_inputs, strict=True)):
```

Python 3.12 (per pyproject.toml) supports strict=True (available since 3.10).
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 249-249: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/03_seedrl_swebench.py` around lines 249 - 250, The pairing of
requests to responses uses zip(batch, responses) which silently truncates on
length mismatch; change these to zip(batch, responses, strict=True) at the three
occurrences (the loop using req, response and calling
req.response_future.set_result, and the other two analogous loops around lines
371 and 476) so a ValueError is raised immediately on misalignment and prevents
leaving unresolved response_future promises; no other API changes needed—just
add strict=True to the zip calls where batch and responses are iterated
together.
```python
while not ts.last() and step_count < self.max_steps:
    # Request inference from the centralized learner
    action = await self.learner.request_inference(self.actor_id, ts.observation)
```
Guard against None observations before requesting inference.
ts.observation is nullable by contract, but this path passes it directly into request_inference(). If it is None, inference crashes in tokenization.
💡 Proposed fix
```diff
 while not ts.last() and step_count < self.max_steps:
+    if ts.observation is None:
+        raise RuntimeError(
+            f"[Actor {self.actor_id}] Missing observation before inference request."
+        )
     # Request inference from the centralized learner
     action = await self.learner.request_inference(self.actor_id, ts.observation)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/03_seedrl_swebench.py` around lines 546 - 549, Guard the nullable
observation before calling request_inference: check ts.observation is not None
in the loop around the call to self.learner.request_inference(self.actor_id,
ts.observation) and only invoke request_inference when it is non-null; if it is
None, either continue the loop (skip/await next timestep) or assign a safe
default action and/or log a warning so tokenization in request_inference is
never called with a None observation.
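A tiny standalone version of the guard, with hypothetical names; in the example the check would sit just before the `request_inference` call in `Actor.run_episode`.

```python
def guard_observation(observation, actor_id: int = 0) -> str:
    # Fail fast with a clear, actor-tagged message instead of crashing
    # later inside tokenization when the observation is None.
    if observation is None:
        raise RuntimeError(
            f"[Actor {actor_id}] Missing observation before inference request."
        )
    return observation


print(guard_observation("repo state"))  # repo state
```

Raising (rather than skipping the step) is the stricter choice: a None observation mid-episode usually signals an environment bug worth surfacing immediately.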
Generated description
```mermaid
graph LR
    main_("main"):::added
    Learner_start_("Learner.start"):::added
    Actor_run_episode_("Actor.run_episode"):::added
    Learner_request_inference_("Learner.request_inference"):::added
    Learner_run_batched_inference_("Learner._run_batched_inference"):::added
    Learner_run_inference_("Learner._run_inference"):::added
    build_openai_compatible_llm_response_("build_openai_compatible_llm_response"):::added
    C51ValueHead_forward_("C51ValueHead.forward"):::added
    main_ -- "Now starts Learner inference loop before spawning actors." --> Learner_start_
    main_ -- "Main now gathers Actor.run_episode tasks for parallel episodes." --> Actor_run_episode_
    Actor_run_episode_ -- "Actor requests LLMResponse each step via Learner.request_inference." --> Learner_request_inference_
    Learner_run_batched_inference_ -- "Falls back to single-request path when batch size equals one." --> Learner_run_inference_
    Learner_run_batched_inference_ -- "Uses helper to construct OpenAI-compatible responses per batch element." --> build_openai_compatible_llm_response_
    Learner_run_batched_inference_ -- "Invokes value head on batched hidden states for diagnostics." --> C51ValueHead_forward_
    Learner_run_inference_ -- "Wraps decoded generation into OpenAI-style LLMResponse via helper." --> build_openai_compatible_llm_response_
    Learner_run_inference_ -- "Runs value head on last-token hidden states to estimate value." --> C51ValueHead_forward_
    classDef added stroke:#15AA7A
    classDef removed stroke:#CD5270
    classDef modified stroke:#EDAC4C
    linkStyle default stroke:#CBD5E1,font-size:13px
```

Adds a minimal SeedRL-style architecture example for the `ares` module's LLM integration, demonstrating a centralized `Learner` for batched inference and distributed `Actor`s for running SWE-bench environment episodes. Introduces a new helper function, `build_openai_compatible_llm_response`, to standardize LLM response creation across examples and components.

- `Learner` for batched LLM inference and distributed `Actor`s to run environment episodes. This includes a new helper function `build_openai_compatible_llm_response` to standardize LLM responses, which is also retrofitted into an existing example.

Modified files (3)
Latest Contributors (1)
Claude.md: document outlining coding guidelines for the ARES codebase, focusing on commenting philosophy, general coding practices, and project-specific patterns.

Modified files (1)
Latest Contributors (0)
Summary by CodeRabbit
New Features
Documentation
Refactor