
feat: Add minimal SeedRL-style example for SWE-bench #10

Open
joshgreaves wants to merge 4 commits into main from feat/add-seedrl-example

Conversation

@joshgreaves (Contributor) commented Jan 12, 2026

Generated description

```mermaid
graph LR
main_("main"):::added
Learner_start_("Learner.start"):::added
Actor_run_episode_("Actor.run_episode"):::added
Learner_request_inference_("Learner.request_inference"):::added
Learner_run_batched_inference_("Learner._run_batched_inference"):::added
Learner_run_inference_("Learner._run_inference"):::added
build_openai_compatible_llm_response_("build_openai_compatible_llm_response"):::added
C51ValueHead_forward_("C51ValueHead.forward"):::added
main_ -- "Now starts Learner inference loop before spawning actors." --> Learner_start_
main_ -- "Main now gathers Actor.run_episode tasks for parallel episodes." --> Actor_run_episode_
Actor_run_episode_ -- "Actor requests LLMResponse each step via Learner.request_inference." --> Learner_request_inference_
Learner_run_batched_inference_ -- "Falls back to single-request path when batch size equals one." --> Learner_run_inference_
Learner_run_batched_inference_ -- "Uses helper to construct OpenAI-compatible responses per batch element." --> build_openai_compatible_llm_response_
Learner_run_batched_inference_ -- "Invokes value head on batched hidden states for diagnostics." --> C51ValueHead_forward_
Learner_run_inference_ -- "Wraps decoded generation into OpenAI-style LLMResponse via helper." --> build_openai_compatible_llm_response_
Learner_run_inference_ -- "Runs value head on last-token hidden states to estimate value." --> C51ValueHead_forward_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px
```

Adds a minimal SeedRL-style architecture example for the ares module's LLM integration, demonstrating a centralized Learner for batched inference and distributed Actors for running SWE-bench environment episodes. Introduces a new helper function, build_openai_compatible_llm_response, to standardize LLM response creation across examples and components.
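The helper's body is not reproduced in this description, but its shape can be sketched with plain dicts standing in for the openai.types.chat objects the real helper constructs. The signature and field names below are illustrative assumptions, not the actual ARES code:

```python
import time
import uuid


def build_openai_compatible_llm_response(
    model_name: str,
    output_text: str,
    num_input_tokens: int,
    num_output_tokens: int,
) -> dict:
    """Build a dict shaped like an OpenAI ChatCompletion response.

    The real helper returns typed openai objects; plain dicts are used
    here so the sketch stays dependency-free.
    """
    return {
        "id": str(uuid.uuid4()),  # auto-generated response ID
        "object": "chat.completion",
        "created": int(time.time()),  # timestamp, as in the PR description
        "model": model_name,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": output_text},
                "finish_reason": "stop",
            }
        ],
        "usage": {
            "prompt_tokens": num_input_tokens,
            "completion_tokens": num_output_tokens,
            "total_tokens": num_input_tokens + num_output_tokens,
        },
    }
```

Centralizing the nested choices/usage structure in one function is what lets both the single and batched inference paths avoid duplicating this boilerplate.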

Topic | Details

SeedRL Example — Implements a minimal SeedRL-style architecture for the SWE-bench environment, featuring a centralized Learner for batched LLM inference and distributed Actors to run environment episodes. This includes a new helper function build_openai_compatible_llm_response to standardize LLM responses, which is also retrofitted into an existing example.

Modified files (3)
  • examples/02_local_llm.py
  • examples/03_seedrl_swebench.py
  • src/ares/llms/llm_clients.py

Latest contributors (1): User | Commit | Date
  • [email protected] — Add-example-2-local-LL... — January 09, 2026
Coding Guidelines — Adds a new Claude.md document outlining coding guidelines for the ARES codebase, focusing on commenting philosophy, general coding practices, and project-specific patterns.

Modified files (1)
  • Claude.md

Latest contributors (0)

Summary by CodeRabbit

  • New Features

    • Added SWE-bench example demonstrating async inference batching with value estimation
    • Improved LLM response construction for consistent output handling
  • Documentation

    • Added comprehensive coding standards and guidelines documentation
  • Refactor

    • Simplified LLM response creation across examples

Comment thread: examples/03_seedrl_swebench.py (Outdated)

Comment on lines +324 to +330:

```python
response = llm_clients.LLMResponse(
    chat_completion_response=openai.types.chat.chat_completion.ChatCompletion(
        id=str(uuid.uuid4()),
        choices=[
            openai.types.chat.chat_completion.Choice(
                message=openai.types.chat.chat_completion_message.ChatCompletionMessage(
                    content=output_text,
```

Contributor:

The LLMResponse construction (lines 324–347) is repeated again later (465–488). Could we pull that into a helper to keep inference logic DRY?
Prompt for AI Agents:

In examples/03_seedrl_swebench.py around lines 324-347 and 465-488, the LLMResponse
construction is duplicated inside the _run_inference and _run_batched_inference logic.
Refactor by adding a new private helper method on Learner (e.g., def
_build_llm_response(self, output_text: str, num_input_tokens: int, num_output_tokens:
int) -> llm_clients.LLMResponse) that builds and returns the llm_clients.LLMResponse
using self.model_name and uuid/time as currently done. Replace the inline response
construction at lines 324-347 and 465-488 with calls to this helper, passing the
calculated output_text and token counts, and update any variable names accordingly to
remove duplication.



Finding type: Conciseness

Contributor (Author):
Let’s make a helper function inside ARES to construct LLM requests easily, and use it here and in example 2

Contributor:

Got it—that’s a helpful pattern; I’ll remember to suggest a shared ARES helper for constructing LLM requests in similar situations after this PR is merged.

Comment on lines +413 to +425:

```python
# Get generation parameters from requests (use first request's params)
temperature = batch[0].request.temperature or 1.0
do_sample = True if batch[0].request.temperature else False

# Generate completions for the entire batch
with torch.no_grad():
    outputs = self.model.generate(
        input_ids=batched_input_ids,
        attention_mask=batched_attention_mask,
        max_new_tokens=2048,
        temperature=temperature,
        do_sample=do_sample,
        pad_token_id=self.tokenizer.pad_token_id,
```

Contributor:
The batched inference path always pulls temperature/do_sample from batch[0].request, so every subsequent request in the same batch silently inherits the first actor's generation settings even though LLMRequest exposes per-request temperatures. This breaks the public contract that callers can specify their own sampling parameters: requests with differing temperatures should either be split into separate batches or handled per element before responding.


Finding type: Breaking Changes

Rowan and others added 2 commits January 13, 2026 17:28
Addresses PR #10 review feedback:

- Add build_openai_compatible_llm_response() helper in llm_clients.py
  with full type hints and docstring for constructing LLMResponse from
  raw generation outputs
- Refactor examples/02_local_llm.py to use the new helper, removing
  duplicate OpenAI object construction boilerplate
- Refactor examples/03_seedrl_swebench.py to use the new helper in both
  single and batched inference paths
- Fix batched inference contract: group requests by generation parameters
  (temperature, do_sample) and process each group separately to ensure
  parameters are respected per request while preserving original order
- Pass compileall and ruff linter checks

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Comment on lines +286 to +287:

```python
temperature=req.request.temperature or 1.0,
do_sample=True if req.request.temperature else False,
```

Contributor:
Because the temperature is derived as req.request.temperature or 1.0, any request that explicitly requests temperature=0.0 (e.g., a deterministic greedy policy) is silently converted to 1.0 even though do_sample remains False, so the caller never receives the requested hyperparameter. The same or 1.0 pattern is reused when batching/grouping, so every 0.0-temperature request will run at 1.0 instead of the behavior the caller asked for.

Suggested change:

```diff
-temperature=req.request.temperature or 1.0,
-do_sample=True if req.request.temperature else False,
+temperature=1.0 if req.request.temperature is None else req.request.temperature,
+do_sample=False if req.request.temperature == 0.0 else True,
```

Finding type: Logical Bugs

@rsmith49 (Contributor):

@coderabbitai full review

coderabbitai bot commented Apr 10, 2026

✅ Actions performed

Full review triggered.

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Added comprehensive coding guidelines, introduced a helper function for OpenAI-compatible LLM response construction, refactored an example script to use the helper, and created a new SeedRL-style asynchronous example featuring batch inference with actor-learner architecture for SWE-bench environments.

Changes

Cohort / File(s) | Summary

Documentation — Claude.md
New coding guidelines covering commenting philosophy (WHY over WHAT), type hints, async patterns, docstrings, logging, error handling, naming conventions, and project-specific patterns for containers, agents, and SWE-bench testing.

LLM Response Helper — src/ares/llms/llm_clients.py
Added the build_openai_compatible_llm_response() helper function to construct OpenAI-compatible ChatCompletion responses with auto-generated IDs, timestamps, and token usage tracking.

Example Refactoring — examples/02_local_llm.py
Simplified response construction by replacing manual LLMResponse building with a call to the new build_openai_compatible_llm_response() helper; removed redundant imports for time, uuid, and OpenAI-specific modules.

SeedRL Example — examples/03_seedrl_swebench.py
New example implementing a SeedRL-style async architecture with a Learner class performing continuous batched inference via a background thread, an Actor class executing SWE-bench episodes, a C51ValueHead module for value prediction, and an InferenceRequest dataclass for request-response mediation.

Sequence Diagram

```mermaid
sequenceDiagram
    participant A as Actor
    participant L as Learner (main)
    participant Q as Request Queue
    participant BT as Batch Thread
    participant M as LLM Model
    participant E as SWE-bench Env

    A->>L: request_inference(actor_id, LLMRequest)
    L->>Q: enqueue InferenceRequest
    L-->>A: return Future[LLMResponse]

    BT->>Q: check queue (timeout or max_batch_size)
    BT->>BT: group requests by generation params
    BT->>BT: batch and pad requests
    BT->>M: generate (batched) in thread pool
    BT->>BT: compute C51 value head outputs
    BT->>BT: build OpenAI-compatible responses
    BT->>L: resolve futures with LLMResponses

    L-->>A: future.result() = LLMResponse

    A->>E: step(action)
    E-->>A: observation, reward, done
    A->>A: loop until done or max_steps
```
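The queue-and-future handoff in the diagram can be sketched in a few lines of asyncio. This is a simplified single-process sketch: the class and method names mirror the diagram, but the bodies are illustrative, with a string stand-in where the real example calls model.generate in a thread pool:

```python
import asyncio


class Learner:
    """Actors enqueue requests paired with a future; a batch loop resolves them."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def request_inference(self, actor_id: int, prompt: str) -> str:
        # Hand the learner a future to resolve once the batch is generated.
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((actor_id, prompt, future))
        return await future

    async def batch_loop(self) -> None:
        while True:
            # Block for one request, then drain whatever else is queued
            # so concurrent actors get batched together.
            batch = [await self.queue.get()]
            while not self.queue.empty():
                batch.append(self.queue.get_nowait())
            for actor_id, prompt, future in batch:
                future.set_result(f"response to {prompt!r}")  # model.generate stand-in


async def main() -> str:
    learner = Learner()
    loop_task = asyncio.create_task(learner.batch_loop())
    result = await learner.request_inference(0, "fix the bug")
    loop_task.cancel()
    return result
```

The real example additionally groups by sampling parameters and runs generation off the event loop; this sketch only shows the future-based mediation that lets request_inference stay an ordinary awaitable call from each Actor.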

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Async requests now batch with care,
Learners and actors beyond compare,
Responses built with helper's grace,
SWE-bench runs at nimble pace!
Guidelines carved in Claude's wise scroll,
Code hops forward toward its goal.

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 70.59%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately captures the main purpose of the PR: adding a SeedRL-style example for SWE-bench, demonstrated by the new examples/03_seedrl_swebench.py file containing the Learner and Actor architecture.


coderabbitai bot left a comment:

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/03_seedrl_swebench.py`:
- Around line 546-549: Guard the nullable observation before calling
request_inference: check ts.observation is not None in the loop around the call
to self.learner.request_inference(self.actor_id, ts.observation) and only invoke
request_inference when it is non-null; if it is None, either continue the loop
(skip/await next timestep) or assign a safe default action and/or log a warning
so tokenization in request_inference is never called with a None observation.
- Around line 249-250: The pairing of requests to responses uses zip(batch,
responses) which silently truncates on length mismatch; change these to
zip(batch, responses, strict=True) at the three occurrences (the loop using req,
response and calling req.response_future.set_result, and the other two analogous
loops around lines 371 and 476) so a ValueError is raised immediately on
misalignment and prevents leaving unresolved response_future promises; no other
API changes needed—just add strict=True to the zip calls where batch and
responses are iterated together.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cc134caf-8f41-4e76-8c18-dd7d6f1a967f

📥 Commits

Reviewing files that changed from the base of the PR and between 93733b8 and f5f9328.

📒 Files selected for processing (4)
  • Claude.md
  • examples/02_local_llm.py
  • examples/03_seedrl_swebench.py
  • src/ares/llms/llm_clients.py

Comment on lines +249 to +250:

```python
for req, response in zip(batch, responses):
    req.response_future.set_result(response)
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail

echo "Python version constraints:"
fd -i 'pyproject.toml' --exec rg -n "requires-python|target-version" {}

echo
echo "zip() call sites in examples/03_seedrl_swebench.py:"
rg -nP '\bzip\s*\(' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares

Length of output: 396


🏁 Script executed:

```shell
# Read the relevant sections of the file to understand the context
echo "=== Context around line 249 ==="
sed -n '240,260p' examples/03_seedrl_swebench.py

echo
echo "=== Context around line 371 ==="
sed -n '365,380p' examples/03_seedrl_swebench.py

echo
echo "=== Context around line 476 ==="
sed -n '470,485p' examples/03_seedrl_swebench.py

# Check if there are any assertions or length checks in the file
echo
echo "=== Length/size assertions in the file ==="
rg -n 'len\(.*\)|assert|ValueError' examples/03_seedrl_swebench.py | head -20
```

Repository: withmartian/ares

Length of output: 3059


🏁 Script executed:

```shell
# Find the _run_batched_inference function
echo "=== _run_batched_inference implementation ==="
rg -A 15 'def _run_batched_inference' examples/03_seedrl_swebench.py

echo
echo "=== _run_single_param_batch implementation ==="
rg -A 15 'def _run_single_param_batch' examples/03_seedrl_swebench.py

# Check how tokenized_inputs is created
echo
echo "=== Tokenization context around line 476 ==="
sed -n '450,475p' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares

Length of output: 2658


🏁 Script executed:

```shell
# Look at more context to see tokenized_inputs creation
echo "=== Full context for tokenized_inputs creation ==="
sed -n '380,475p' examples/03_seedrl_swebench.py | tail -50
```

Repository: withmartian/ares

Length of output: 2347


🏁 Script executed:

```shell
# Find where tokenized_inputs is defined/created
echo "=== Search for tokenized_inputs definition ==="
rg -n 'tokenized_inputs' examples/03_seedrl_swebench.py

echo
echo "=== Full _run_single_param_batch function ==="
sed -n '378,520p' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares

Length of output: 6150


Use strict=True in zip() to fail fast on request/response misalignment.

At lines 249, 371, and 476, zip() without strict=True silently truncates if batch and response lengths mismatch. This leaves unresolved futures (line 252-256's exception handler shows the authors are aware of this risk), which can hang indefinitely in the async learner flow. The docstrings promise "one per request in the same order," so failures to honor this contract should surface immediately.

💡 Proposed fix

```diff
-                for req, response in zip(batch, responses):
+                for req, response in zip(batch, responses, strict=True):
                     req.response_future.set_result(response)
 @@
-            for (orig_idx, _), response in zip(group, group_responses):
+            for (orig_idx, _), response in zip(group, group_responses, strict=True):
                 all_responses.append((orig_idx, response))
 @@
-        for i, (req, original_inputs) in enumerate(zip(batch, tokenized_inputs)):
+        for i, (req, original_inputs) in enumerate(zip(batch, tokenized_inputs, strict=True)):
```

Python 3.12 (per pyproject.toml) supports strict=True (available since 3.10).

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 249-249: zip() without an explicit strict= parameter — add explicit value for parameter strict= (B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/03_seedrl_swebench.py` around lines 249 - 250, The pairing of
requests to responses uses zip(batch, responses) which silently truncates on
length mismatch; change these to zip(batch, responses, strict=True) at the three
occurrences (the loop using req, response and calling
req.response_future.set_result, and the other two analogous loops around lines
371 and 476) so a ValueError is raised immediately on misalignment and prevents
leaving unresolved response_future promises; no other API changes needed—just
add strict=True to the zip calls where batch and responses are iterated
together.

Comment on lines +546 to +549:

```python
while not ts.last() and step_count < self.max_steps:
    # Request inference from the centralized learner
    action = await self.learner.request_inference(self.actor_id, ts.observation)
```

⚠️ Potential issue | 🟠 Major

Guard against None observations before requesting inference.

ts.observation is nullable by contract, but this path passes it directly into request_inference(). If it is None, inference crashes in tokenization.

💡 Proposed fix

```diff
         while not ts.last() and step_count < self.max_steps:
+            if ts.observation is None:
+                raise RuntimeError(
+                    f"[Actor {self.actor_id}] Missing observation before inference request."
+                )
             # Request inference from the centralized learner
             action = await self.learner.request_inference(self.actor_id, ts.observation)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/03_seedrl_swebench.py` around lines 546 - 549, Guard the nullable
observation before calling request_inference: check ts.observation is not None
in the loop around the call to self.learner.request_inference(self.actor_id,
ts.observation) and only invoke request_inference when it is non-null; if it is
None, either continue the loop (skip/await next timestep) or assign a safe
default action and/or log a warning so tokenization in request_inference is
never called with a None observation.
