feat: Add minimal SeedRL-style example for SWE-bench #10

joshgreaves wants to merge 4 commits into main from
Conversation
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
```python
response = llm_clients.LLMResponse(
    chat_completion_response=openai.types.chat.chat_completion.ChatCompletion(
        id=str(uuid.uuid4()),
        choices=[
            openai.types.chat.chat_completion.Choice(
                message=openai.types.chat.chat_completion_message.ChatCompletionMessage(
                    content=output_text,
```
The LLMResponse construction (lines 324‑347) is repeated again later (465‑488). Could we pull that into a helper to keep inference logic DRY?
Prompt for AI Agents:
In examples/03_seedrl_swebench.py around lines 324-347 and 465-488, the LLMResponse
construction is duplicated inside the _run_inference and _run_batched_inference logic.
Refactor by adding a new private helper method on Learner (e.g., def
_build_llm_response(self, output_text: str, num_input_tokens: int, num_output_tokens:
int) -> llm_clients.LLMResponse) that builds and returns the llm_clients.LLMResponse
using self.model_name and uuid/time as currently done. Replace the inline response
construction at lines 324-347 and 465-488 with calls to this helper, passing the
calculated output_text and token counts, and update any variable names accordingly to
remove duplication.
Finding type: Conciseness
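A minimal sketch of the kind of helper the comment proposes. The names and fields here are illustrative stand-ins: the real `llm_clients.LLMResponse` wraps an OpenAI `ChatCompletion` object, while this sketch uses a plain dataclass to show the DRY refactor shape.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class LLMResponse:
    """Stand-in for llm_clients.LLMResponse, which wraps a ChatCompletion."""
    id: str
    model: str
    content: str
    created: int
    num_input_tokens: int
    num_output_tokens: int


def build_llm_response(
    model_name: str,
    output_text: str,
    num_input_tokens: int,
    num_output_tokens: int,
) -> LLMResponse:
    # Single construction site: both inference paths would call this
    # instead of repeating the response-building boilerplate inline.
    return LLMResponse(
        id=str(uuid.uuid4()),
        model=model_name,
        content=output_text,
        created=int(time.time()),
        num_input_tokens=num_input_tokens,
        num_output_tokens=num_output_tokens,
    )
```

With this in place, both `_run_inference` and `_run_batched_inference` reduce to a one-line call per generated completion.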
Let’s make a helper function inside ARES to construct LLM requests easily, and use it here and in example 2
Got it—that’s a helpful pattern; I’ll remember to suggest a shared ARES helper for constructing LLM requests in similar situations after this PR is merged.
```python
# Get generation parameters from requests (use first request's params)
temperature = batch[0].request.temperature or 1.0
do_sample = True if batch[0].request.temperature else False

# Generate completions for the entire batch
with torch.no_grad():
    outputs = self.model.generate(
        input_ids=batched_input_ids,
        attention_mask=batched_attention_mask,
        max_new_tokens=2048,
        temperature=temperature,
        do_sample=do_sample,
        pad_token_id=self.tokenizer.pad_token_id,
```
The batched inference path always pulls temperature/do_sample from batch[0].request, so every subsequent request in the same batch silently inherits the first actor’s generation settings even though LLMRequest exposes per-request temperatures. This breaks the public contract that callers can specify their own sampling parameters; batches with differing temperatures should either be split into per-parameter sub-batches or handled per element before responding.
Finding type: Breaking Changes
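One way to honor per-request sampling parameters, sketched here with plain dicts standing in for `LLMRequest` objects: group requests by their (temperature, do_sample) pair, generate each group as its own sub-batch, and restore caller-visible order via the saved indices. The function name is hypothetical.

```python
from collections import defaultdict


def group_by_generation_params(batch):
    """Group (original_index, request) pairs by sampling parameters.

    Each group can then be generated as one sub-batch; sorting the collected
    (original_index, response) pairs afterwards restores the original order.
    """
    groups = defaultdict(list)
    for idx, req in enumerate(batch):
        temperature = 1.0 if req.get("temperature") is None else req["temperature"]
        do_sample = temperature != 0.0  # greedy decoding when temperature is 0.0
        groups[(temperature, do_sample)].append((idx, req))
    return groups


batch = [
    {"temperature": 0.7},
    {"temperature": None},  # unset: defaults to 1.0
    {"temperature": 0.7},
    {"temperature": 0.0},   # greedy: do_sample=False
]
groups = group_by_generation_params(batch)
```

Note the `is None` check rather than `or 1.0`, so an explicit temperature of 0.0 survives grouping (see the Logical Bugs finding below on the same pattern).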
Addresses PR #10 review feedback:

- Add build_openai_compatible_llm_response() helper in llm_clients.py with full type hints and docstring for constructing LLMResponse from raw generation outputs
- Refactor examples/02_local_llm.py to use the new helper, removing duplicate OpenAI object construction boilerplate
- Refactor examples/03_seedrl_swebench.py to use the new helper in both single and batched inference paths
- Fix batched inference contract: group requests by generation parameters (temperature, do_sample) and process each group separately to ensure parameters are respected per request while preserving original order
- Pass compileall and ruff linter checks

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
```python
temperature=req.request.temperature or 1.0,
do_sample=True if req.request.temperature else False,
```
Because the temperature is derived as req.request.temperature or 1.0, any request that explicitly requests temperature=0.0 (e.g., a deterministic greedy policy) is silently converted to 1.0 even though do_sample remains False, so the caller never receives the requested hyperparameter. The same or 1.0 pattern is reused when batching/grouping, so every 0.0-temperature request will run at 1.0 instead of the behavior the caller asked for.
```diff
-temperature=req.request.temperature or 1.0,
-do_sample=True if req.request.temperature else False,
+temperature=1.0 if req.request.temperature is None else req.request.temperature,
+do_sample=False if req.request.temperature == 0.0 else True,
```
Finding type: Logical Bugs
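The root cause is that Python treats 0.0 as falsy, so `x or default` cannot distinguish "unset" from "explicitly zero":

```python
requested = 0.0                    # caller asks for greedy decoding

buggy = requested or 1.0           # 0.0 is falsy, so this evaluates to 1.0
fixed = 1.0 if requested is None else requested

print(buggy, fixed)                # 1.0 0.0
```

The explicit `is None` check preserves a requested temperature of 0.0 while still defaulting unset values to 1.0.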
@coderabbitai full review

✅ Actions performed
Full review triggered.
📝 Walkthrough

Added comprehensive coding guidelines, introduced a helper function for OpenAI-compatible LLM response construction, refactored an example script to use the helper, and created a new SeedRL-style asynchronous example featuring batch inference with actor-learner architecture for SWE-bench environments.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant A as Actor
    participant L as Learner (main)
    participant Q as Request Queue
    participant BT as Batch Thread
    participant M as LLM Model
    participant E as SWE-bench Env
    A->>L: request_inference(actor_id, LLMRequest)
    L->>Q: enqueue InferenceRequest
    L-->>A: return Future[LLMResponse]
    BT->>Q: check queue (timeout or max_batch_size)
    BT->>BT: group requests by generation params
    BT->>BT: batch and pad requests
    BT->>M: generate (batched) in thread pool
    BT->>BT: compute C51 value head outputs
    BT->>BT: build OpenAI-compatible responses
    BT->>L: resolve futures with LLMResponses
    L-->>A: future.result() = LLMResponse
    A->>E: step(action)
    E-->>A: observation, reward, done
    A->>A: loop until done or max_steps
```
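The queue-and-future handshake in the diagram can be sketched in a few lines of standard-library Python. This is illustrative only: the real example runs the batch loop in a background thread, waits on the queue with a timeout, and calls the model instead of echoing the observation.

```python
import queue
from concurrent.futures import Future

request_queue: queue.Queue = queue.Queue()


def request_inference(actor_id: int, observation: str) -> Future:
    # Learner side: enqueue the request and hand the actor a Future
    # that the batch thread resolves later.
    future: Future = Future()
    request_queue.put((actor_id, observation, future))
    return future


def batch_thread_step(max_batch_size: int = 4) -> None:
    # Batch-thread side: drain up to max_batch_size pending requests,
    # run them together, and resolve each future with its response.
    batch = []
    while len(batch) < max_batch_size and not request_queue.empty():
        batch.append(request_queue.get())
    for actor_id, observation, future in batch:
        # Placeholder "model": echo the observation back.
        future.set_result(f"action-for:{observation}")


fut = request_inference(actor_id=0, observation="obs-0")
batch_thread_step()
print(fut.result())  # action-for:obs-0
```

The actor blocks on `fut.result()` (or awaits it in the async variant), so inference stays centralized in the learner while episodes run concurrently.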
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/03_seedrl_swebench.py`:
- Around line 546-549: Guard the nullable observation before calling
request_inference: check ts.observation is not None in the loop around the call
to self.learner.request_inference(self.actor_id, ts.observation) and only invoke
request_inference when it is non-null; if it is None, either continue the loop
(skip/await next timestep) or assign a safe default action and/or log a warning
so tokenization in request_inference is never called with a None observation.
- Around line 249-250: The pairing of requests to responses uses zip(batch,
responses) which silently truncates on length mismatch; change these to
zip(batch, responses, strict=True) at the three occurrences (the loop using req,
response and calling req.response_future.set_result, and the other two analogous
loops around lines 371 and 476) so a ValueError is raised immediately on
misalignment and prevents leaving unresolved response_future promises; no other
API changes needed—just add strict=True to the zip calls where batch and
responses are iterated together.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cc134caf-8f41-4e76-8c18-dd7d6f1a967f
📒 Files selected for processing (4)
- Claude.md
- examples/02_local_llm.py
- examples/03_seedrl_swebench.py
- src/ares/llms/llm_clients.py
```python
for req, response in zip(batch, responses):
    req.response_future.set_result(response)
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "Python version constraints:"
fd -i 'pyproject.toml' --exec rg -n "requires-python|target-version" {}
echo
echo "zip() call sites in examples/03_seedrl_swebench.py:"
rg -nP '\bzip\s*\(' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares
Length of output: 396
🏁 Script executed:
```shell
# Read the relevant sections of the file to understand the context
echo "=== Context around line 249 ==="
sed -n '240,260p' examples/03_seedrl_swebench.py
echo
echo "=== Context around line 371 ==="
sed -n '365,380p' examples/03_seedrl_swebench.py
echo
echo "=== Context around line 476 ==="
sed -n '470,485p' examples/03_seedrl_swebench.py
# Check if there are any assertions or length checks in the file
echo
echo "=== Length/size assertions in the file ==="
rg -n 'len\(.*\)|assert|ValueError' examples/03_seedrl_swebench.py | head -20
```

Repository: withmartian/ares
Length of output: 3059
🏁 Script executed:
```shell
# Find the _run_batched_inference function
echo "=== _run_batched_inference implementation ==="
rg -A 15 'def _run_batched_inference' examples/03_seedrl_swebench.py
echo
echo "=== _run_single_param_batch implementation ==="
rg -A 15 'def _run_single_param_batch' examples/03_seedrl_swebench.py
# Check how tokenized_inputs is created
echo
echo "=== Tokenization context around line 476 ==="
sed -n '450,475p' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares
Length of output: 2658
🏁 Script executed:
```shell
# Look at more context to see tokenized_inputs creation
echo "=== Full context for tokenized_inputs creation ==="
sed -n '380,475p' examples/03_seedrl_swebench.py | tail -50
```

Repository: withmartian/ares
Length of output: 2347
🏁 Script executed:
```shell
# Find where tokenized_inputs is defined/created
echo "=== Search for tokenized_inputs definition ==="
rg -n 'tokenized_inputs' examples/03_seedrl_swebench.py
echo
echo "=== Full _run_single_param_batch function ==="
sed -n '378,520p' examples/03_seedrl_swebench.py
```

Repository: withmartian/ares
Length of output: 6150
Use strict=True in zip() to fail fast on request/response misalignment.
At lines 249, 371, and 476, zip() without strict=True silently truncates if batch and response lengths mismatch. This leaves unresolved futures (line 252-256's exception handler shows the authors are aware of this risk), which can hang indefinitely in the async learner flow. The docstrings promise "one per request in the same order," so failures to honor this contract should surface immediately.
💡 Proposed fix
```diff
-        for req, response in zip(batch, responses):
+        for req, response in zip(batch, responses, strict=True):
             req.response_future.set_result(response)
@@
-        for (orig_idx, _), response in zip(group, group_responses):
+        for (orig_idx, _), response in zip(group, group_responses, strict=True):
             all_responses.append((orig_idx, response))
@@
-        for i, (req, original_inputs) in enumerate(zip(batch, tokenized_inputs)):
+        for i, (req, original_inputs) in enumerate(zip(batch, tokenized_inputs, strict=True)):
```

Python 3.12 (per pyproject.toml) supports strict=True (available since 3.10).
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 249-249: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/03_seedrl_swebench.py` around lines 249 - 250, The pairing of
requests to responses uses zip(batch, responses) which silently truncates on
length mismatch; change these to zip(batch, responses, strict=True) at the three
occurrences (the loop using req, response and calling
req.response_future.set_result, and the other two analogous loops around lines
371 and 476) so a ValueError is raised immediately on misalignment and prevents
leaving unresolved response_future promises; no other API changes needed—just
add strict=True to the zip calls where batch and responses are iterated
together.
```python
while not ts.last() and step_count < self.max_steps:
    # Request inference from the centralized learner
    action = await self.learner.request_inference(self.actor_id, ts.observation)
```
Guard against None observations before requesting inference.
ts.observation is nullable by contract, but this path passes it directly into request_inference(). If it is None, inference crashes in tokenization.
💡 Proposed fix
```diff
 while not ts.last() and step_count < self.max_steps:
+    if ts.observation is None:
+        raise RuntimeError(
+            f"[Actor {self.actor_id}] Missing observation before inference request."
+        )
     # Request inference from the centralized learner
     action = await self.learner.request_inference(self.actor_id, ts.observation)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/03_seedrl_swebench.py` around lines 546 - 549, Guard the nullable
observation before calling request_inference: check ts.observation is not None
in the loop around the call to self.learner.request_inference(self.actor_id,
ts.observation) and only invoke request_inference when it is non-null; if it is
None, either continue the loop (skip/await next timestep) or assign a safe
default action and/or log a warning so tokenization in request_inference is
never called with a None observation.
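A tiny standalone version of the guard, with hypothetical names; in the example the check would sit just before the `request_inference` call in `Actor.run_episode`.

```python
def guard_observation(observation, actor_id: int = 0) -> str:
    # Fail fast with a clear, actor-tagged message instead of crashing
    # later inside tokenization when the observation is None.
    if observation is None:
        raise RuntimeError(
            f"[Actor {actor_id}] Missing observation before inference request."
        )
    return observation


print(guard_observation("repo state"))  # repo state
```

Raising (rather than skipping the step) is the stricter choice: a None observation mid-episode usually signals an environment bug worth surfacing immediately.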
Generated description
```mermaid
graph LR
    main_("main"):::added
    Learner_start_("Learner.start"):::added
    Actor_run_episode_("Actor.run_episode"):::added
    Learner_request_inference_("Learner.request_inference"):::added
    Learner_run_batched_inference_("Learner._run_batched_inference"):::added
    Learner_run_inference_("Learner._run_inference"):::added
    build_openai_compatible_llm_response_("build_openai_compatible_llm_response"):::added
    C51ValueHead_forward_("C51ValueHead.forward"):::added
    main_ -- "Now starts Learner inference loop before spawning actors." --> Learner_start_
    main_ -- "Main now gathers Actor.run_episode tasks for parallel episodes." --> Actor_run_episode_
    Actor_run_episode_ -- "Actor requests LLMResponse each step via Learner.request_inference." --> Learner_request_inference_
    Learner_run_batched_inference_ -- "Falls back to single-request path when batch size equals one." --> Learner_run_inference_
    Learner_run_batched_inference_ -- "Uses helper to construct OpenAI-compatible responses per batch element." --> build_openai_compatible_llm_response_
    Learner_run_batched_inference_ -- "Invokes value head on batched hidden states for diagnostics." --> C51ValueHead_forward_
    Learner_run_inference_ -- "Wraps decoded generation into OpenAI-style LLMResponse via helper." --> build_openai_compatible_llm_response_
    Learner_run_inference_ -- "Runs value head on last-token hidden states to estimate value." --> C51ValueHead_forward_
    classDef added stroke:#15AA7A
    classDef removed stroke:#CD5270
    classDef modified stroke:#EDAC4C
    linkStyle default stroke:#CBD5E1,font-size:13px
```

Adds a minimal SeedRL-style architecture example for the `ares` module's LLM integration, demonstrating a centralized `Learner` for batched inference and distributed `Actor`s for running SWE-bench environment episodes. Introduces a new helper function, `build_openai_compatible_llm_response`, to standardize LLM response creation across examples and components.

- `Learner` for batched LLM inference and distributed `Actor`s to run environment episodes. This includes a new helper function `build_openai_compatible_llm_response` to standardize LLM responses, which is also retrofitted into an existing example.

Modified files (3)
Latest Contributors (1)
Claude.md: document outlining coding guidelines for the ARES codebase, focusing on commenting philosophy, general coding practices, and project-specific patterns.

Modified files (1)
Latest Contributors (0)
Summary by CodeRabbit
New Features
Documentation
Refactor