[codex] Implement Symphony Elixir runtime parity#2
Conversation
WalkthroughComplete migration from a Python implementation to an Elixir-based Symphony: new Mix project, Elixir libraries and escript CLI, comprehensive orchestrator, Codex client and tracker integrations, workspace/repo planning, HTTP server and self‑healing, many new ExUnit tests, workflow/docs, macOS launchd plists; removes previous Python modules, tests and pyproject configuration. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI (escript)
participant Orch as Orchestrator
participant Agent as AgentRunner
participant WS as Workspace Manager
participant Codex as CodexClient
participant Tracker as Linear Tracker
participant HTTP as HTTPServer
CLI->>Orch: start (config, workflow)
Orch->>Tracker: fetch candidate issues
Orch->>Agent: dispatch issue
Agent->>WS: create/materialize workspace & repo plan
Agent->>Codex: start_session / run_turn
Codex-->>Agent: streamed notifications / turn result
Agent->>Tracker: refresh issue state or write Codex workpad (fallback)
Orch->>HTTP: serve snapshot / accept refresh requests
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested reviewers
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Pull request overview
This PR migrates the Symphony runner from the legacy Python implementation to an Elixir/Mix-based implementation, aiming to restore runtime behavior parity (OTP orchestration, dashboard/state APIs, repo planning, retry/block handling) while updating workflow guidance for credentialed production data operations.
Changes:
- Removes the Python package, implementation modules, and pytest suite.
- Adds a full Elixir application (Mix project) plus ExUnit test coverage for core subsystems (workspace, tracker, review, repo planner, Codex client, CLI).
- Updates docs/workflows/launchd examples to reflect the Elixir escript-based runner and revised operational guidance.
Reviewed changes
Copilot reviewed 65 out of 67 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/test_workspace.py | Removes legacy Python workspace tests. |
| tests/test_workflow_config_template.py | Removes legacy Python workflow/config/template tests. |
| tests/test_tracker.py | Removes legacy Python tracker tests. |
| tests/test_review.py | Removes legacy Python review resolver tests. |
| tests/test_orchestrator.py | Removes legacy Python orchestrator tests. |
| tests/test_codex_client.py | Removes legacy Python Codex client tests. |
| tests/test_agent_runner.py | Removes legacy Python agent-runner unit tests. |
| test/workspace_test.exs | Adds ExUnit coverage for workspace creation, hooks, repo materialization, and git guards/timeouts. |
| test/workflow_config_template_test.exs | Adds ExUnit coverage for workflow parsing, config resolution/reload validation, templating, and coding-context augmentation. |
| test/tracker_test.exs | Adds ExUnit coverage for Linear GraphQL + Linear MCP tracker behavior and Codex MCP gateway decoding/approvals. |
| test/review_test.exs | Adds ExUnit coverage for PR evidence resolution, dependency parsing, and merge gating logic. |
| test/repo_planner_http_test.exs | Adds ExUnit coverage for repo planning (rules/LLM/fallback) plus HTTP state/dashboard endpoints. |
| test/codex_client_test.exs | Adds ExUnit coverage for Codex JSONL session behavior (approvals, tool input, timeouts). |
| test/cli_test.exs | Adds ExUnit coverage for CLI non-zero exit on startup errors. |
| test/agent_runner_test.exs | Adds ExUnit coverage for agent runner flow, delivery fallback, and blocker detection. |
| test/test_helper.exs | Starts ExUnit for the new Elixir test suite. |
| symphony/workflow.py | Removes Python workflow loader. |
| symphony/utils.py | Removes Python utilities (time, sanitization, redaction, tool input helpers). |
| symphony/templating.py | Removes Python Liquid-based prompt rendering and continuation prompt. |
| symphony/review.py | Removes Python PR inspection/resolution logic. |
| symphony/repo_planner.py | Removes Python repo planner (rules/LLM) and prompt injection logic. |
| symphony/models.py | Removes Python dataclass models for issues/state/repo plans. |
| symphony/logging.py | Removes Python logging helpers. |
| symphony/errors.py | Removes Python exception hierarchy. |
| symphony/dashboard_summary.py | Removes Python dashboard summarization logic. |
| symphony/coding_context.py | Removes Python coding-context classification/injection logic. |
| symphony/cli.py | Removes Python CLI entrypoint. |
| symphony/agent_runner.py | Removes Python agent runner implementation. |
| symphony/main.py | Removes Python module entrypoint. |
| symphony/init.py | Removes Python package init/exports. |
| pyproject.toml | Removes Python packaging/test configuration. |
| mix.lock | Adds Elixir dependency lockfile. |
| mix.exs | Adds Mix project configuration (deps, escript, coverage tool). |
| lib/symphony/workflow.ex | Adds Elixir workflow parsing/loading and mtime tracking. |
| lib/symphony/utils.ex | Adds Elixir utilities (time parsing, sanitization, redaction, tool input helpers). |
| lib/symphony/templating.ex | Adds Elixir prompt rendering and continuation prompt. |
| lib/symphony/review.ex | Adds Elixir PR parsing/inspection and merge gating logic. |
| lib/symphony/models.ex | Adds Elixir models (Issue, RepoPlan, runtime state structs). |
| lib/symphony/logging.ex | Adds Elixir logging wrapper with key-value formatting. |
| lib/symphony/error.ex | Adds Elixir exception type with stable error code/message. |
| lib/symphony/dashboard_summary.ex | Adds Elixir activity summarization via Codex client. |
| lib/symphony/coding_context.ex | Adds Elixir coding-context classification and prompt augmentation. |
| lib/symphony/codex_client.ex | Adds Elixir Codex JSONL/port-based session management. |
| lib/symphony/tracker.ex | Adds Elixir tracker implementations (Linear GraphQL, Linear MCP via Codex gateway). |
| lib/symphony/repo_planner.ex | Adds Elixir repo planning (rules/LLM/fallback) and prompt guardrails. |
| lib/symphony/workspace.ex | Adds Elixir workspace manager (hooks, repo materialization, quarantine, git branch guard). |
| lib/symphony/orchestrator.ex | Adds Elixir OTP orchestrator with state persistence/snapshots and dashboard integration. |
| lib/symphony/http_server.ex | Adds Elixir HTTP server endpoints for state/dashboard. |
| lib/symphony/cli.ex | Adds Elixir CLI and escript entry. |
| lib/symphony/agent_runner.ex | Adds Elixir agent runner with delivery fallback and blocker detection. |
| lib/symphony.ex | Adds top-level Elixir module and version. |
| WORKFLOW.md | Adds/updates the primary workflow definition used by the runner. |
| WORKFLOW.linear-mcp.example.md | Updates workflow example with credentialed/data operation guidance. |
| docs/IMPLEMENTATION.md | Updates implementation notes to reflect Elixir rewrite and runtime snapshot behavior. |
| README.md | Updates README for Elixir install/run/test instructions and project layout. |
| launchd/com.symphony.linear-mcp.example.plist | Updates launchd example to invoke the built escript binary. |
| launchd/com.caretta.symphony.local.plist | Adds a local launchd example plist. |
| .gitignore | Adds Elixir build/deps artifacts and the escript output to ignores. |
| .formatter.exs | Adds Elixir formatter configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| defp parse_workflow(raw), do: {%{}, raw} | ||
|
|
||
| def mtime(path) do | ||
| case File.stat(path, time: :nanosecond) do |
There was a problem hiding this comment.
Actionable comments posted: 20
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (2)
lib/symphony/cli.ex-17-17 (1)
17-17:⚠️ Potential issue | 🟡 Minor
--log_levelis accepted but functionally ignored.Line 17 advertises a runtime control that never takes effect, which is misleading in production ops.
Proposed fix
defmodule Symphony.CLI do `@moduledoc` false + require Logger alias Symphony.Config.ConfigManager alias Symphony.HTTPServer alias Symphony.Orchestrator @@ def run(argv) do {opts, args, _invalid} = OptionParser.parse(argv, switches: [port: :integer, log_level: :string, once: :boolean], aliases: [p: :port] ) + + case Keyword.get(opts, :log_level) do + nil -> :ok + "debug" -> Logger.configure(level: :debug) + "info" -> Logger.configure(level: :info) + "warning" -> Logger.configure(level: :warning) + "error" -> Logger.configure(level: :error) + level -> raise Symphony.Error, code: :invalid_log_level, message: "invalid --log_level: #{level}" + end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/cli.ex` at line 17, The CLI advertises a --log_level switch in the switches list but never applies it; update the CLI parsing flow to read the :log_level value from the parsed options (where switches: [port: :integer, log_level: :string, once: :boolean] is defined), validate it against allowed levels ("debug","info","warn","error"), convert it to an atom (e.g. String.to_existing_atom/1 or String.to_atom after validation) and call Logger.configure(level: level_atom) (or Logger.level/1) early in the startup path (e.g. in the function that handles parsed args such as parse_args/1 or main/1) so the provided --log_level actually controls the runtime logger level. Ensure invalid values produce a clear error/exit.lib/symphony/dashboard_summary.ex-49-60 (1)
49-60:⚠️ Potential issue | 🟡 MinorNormalise quoted booleans before setting
needs_human.Line 56 turns any non-
nilJSON value intotrue, so"false"becomestrue. Models drift into quoted booleans often enough that this will produce bogus dashboard escalations.Possible fix
%__MODULE__{ summary: Utils.truncate( to_string(data["summary"] || "No substantive activity has been summarized yet."), 800 ), current_step: Utils.truncate(to_string(data["current_step"] || "Unknown."), 300), - needs_human: !!data["needs_human"], + needs_human: normalize_boolean(data["needs_human"]), human_reason: if(data["human_reason"], do: Utils.truncate(to_string(data["human_reason"]), 500)), risk: normalize_risk(data["risk"]), confidence: normalize_confidence(data["confidence"]) } @@ def normalize_confidence(value) do case Utils.to_float(value) do nil -> nil float -> float |> max(0.0) |> min(1.0) end end + + defp normalize_boolean(value) when value in [true, false], do: value + defp normalize_boolean(nil), do: false + + defp normalize_boolean(value) when is_binary(value) do + case String.downcase(String.trim(value)) do + "true" -> true + "false" -> false + _ -> true + end + end + + defp normalize_boolean(_value), do: true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/dashboard_summary.ex` around lines 49 - 60, The needs_human assignment currently coerces any non-nil JSON value to true (data["needs_human"]) so quoted booleans like "false" become true; update the DashboardSummary construction to normalize quoted booleans first (e.g., add or use a helper like Utils.parse_bool / parse_boolean) and set needs_human to the parsed boolean (treat "true"/true -> true, "false"/false -> false, nil/unknown -> false) instead of using double-bang on the raw JSON value; call the parser where needs_human is assigned so the struct receives a real boolean.
🧹 Nitpick comments (4)
lib/symphony.ex (1)
11-13: Avoid a second source of truth for versioning.Line 11 and Line 13 hard-code the version in-module. That will drift from release metadata sooner or later. Prefer deriving this value from project/app metadata at build time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony.ex` around lines 11 - 13, The module currently hard-codes `@version` "0.1.0" and returns it from version/0, creating a second source-of-truth; replace the literal with a value derived from build/project metadata (e.g., use Mix.Project.config()[:version] at compile time or Application.spec(:symphony, :vsn) at runtime) so version/0 always reflects the package version. Update the `@version` attribute and/or the implementation of version/0 in the Symphony module accordingly, keeping the symbol names (`@version` and version/0) so callers are unchanged.launchd/com.caretta.symphony.local.plist (1)
21-22: Consider a safer restart policy than unconditionalKeepAlive=true.Line 21-22 will aggressively restart on persistent startup failures. Add throttling/conditional keep-alive to reduce churn and noisy logs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@launchd/com.caretta.symphony.local.plist` around lines 21 - 22, Replace the unconditional <key>KeepAlive</key><true/> with a conditional KeepAlive dictionary to avoid aggressive restarts: change KeepAlive from a boolean to a dict that specifies restart conditions (for example include a SuccessfulExit entry to avoid restarting after clean exits) and add a throttling key (ThrottleInterval or similar launchd throttling option) so repeated failures are rate-limited; update the KeepAlive element in the plist accordingly so the service only restarts under defined conditions and with a backoff interval.lib/symphony/cli.ex (1)
15-19: Invalid CLI options are silently dropped.You already capture
_invalid; letting typos pass quietly is an operator trap. Fail fast and print a precise error.Proposed fix
- {opts, args, _invalid} = + {opts, args, invalid} = OptionParser.parse(argv, switches: [port: :integer, log_level: :string, once: :boolean], aliases: [p: :port] ) + + if invalid != [] do + formatted = + invalid + |> Enum.map(fn + {k, nil} -> "--#{k}" + {k, v} -> "--#{k}=#{v}" + end) + |> Enum.join(", ") + + raise Symphony.Error, code: :invalid_cli_args, message: "unrecognised options: #{formatted}" + end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/cli.ex` around lines 15 - 19, Option parsing currently ignores the third return value from OptionParser.parse (bound as _invalid), letting unknown/typo options be dropped silently; change the code around the OptionParser.parse call to inspect the invalid list (third tuple element) and, if non-empty, print a clear error message listing the invalid switches and exit non-zero (use the same context where argv/opts are used); reference the OptionParser.parse call and the variables _invalid/opts/args to locate the code and replace the silent discard with a fail-fast log + System.halt(1) (or the project’s standard exit mechanism).lib/symphony/workflow.ex (1)
40-41: Front-matter detection is broader than intended.Line 40 accepts any file starting with
"---"(including malformed delimiters like"---abc"). Treat only an exact first-line delimiter as YAML front matter.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/workflow.ex` around lines 40 - 41, The current parse_workflow clause incorrectly treats any string starting with "---" as YAML front-matter; change detection to only accept an exact first-line delimiter. Replace the head pattern match (defp parse_workflow("---" <> _ = raw)) with a single clause that splits raw into lines (String.split(raw, ~r/\R/, trim: false)) and then branch on the first line being exactly "---" (e.g., case lines do ["---" | rest] -> handle front-matter; _ -> handle no front-matter end), keeping the rest of the parsing logic in the respective branches and retaining the parse_workflow function name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@launchd/com.caretta.symphony.local.plist`:
- Around line 10-15: The plist currently contains user-specific absolute paths
(/Users/omarelamin/...) in the top <string> and inside ProgramArguments; remove
those hard-coded personal paths and replace them with a portable placeholder or
standard install location (e.g., ${SYMPHONY_INSTALL_DIR} or
/usr/local/bin/symphony) and a relative or templated path for the WORKFLOW file,
update the ProgramArguments entries (and the standalone <string> entry) in
launchd/com.caretta.symphony.local.plist accordingly, and add a note in
deployment docs explaining how to substitute the placeholder at install time.
In `@lib/symphony/agent_runner.ex`:
- Around line 407-434: The try_delivery_fallback function currently swallows all
errors with rescue _ -> false which can hide partial writes (e.g.,
save_issue_comment/3 succeeds but save_issue_state/2 fails); change
try_delivery_fallback to stop rescuing everything and instead perform the
tracker calls in a with/chain so failures propagate or return a tagged error
(e.g., {:error, reason})—use call_tracker(:save_issue_comment, ...) and then
call_tracker(:save_issue_state, ...) inside a with that returns true on success
and returns or bubbles up the actual error from call_tracker on failure; remove
the blanket rescue and ensure callers handle the {:error, reason} or allow the
exception to propagate so half-deliveries are not silently ignored.
In `@lib/symphony/codex_client.ex`:
- Around line 218-239: receive_response/3 currently reuses the full timeout_ms
on each recursive call which can block forever if many notifications arrive;
change it to compute a monotonic deadline at the first call (using
System.monotonic_time(:millisecond) or :erlang.monotonic_time(:millisecond)) and
pass the deadline or remaining_timeout through recursive calls, recomputing
remaining = deadline - now before each read_message call and before recursing
into handle_server_request/handle_notification branches, and raise or return a
timeout error when remaining <= 0; update call sites that invoke
receive_response (e.g., initialize and turn/start) to pass the initial
timeout_ms as the remaining budget or new deadline as needed so read_message and
recursion never reuse the full original timeout repeatedly.
- Around line 314-323: The reader currently only checks the incoming chunk size
but not the accumulated buffer; update the receive branch in read_message to
compute combined_size = byte_size(session.buffer) + byte_size(data) and compare
that against Utils.jsonl_read_limit_bytes() before concatenating, raising Error
(code: :response_error, message: "app-server JSONL message exceeded reader
limit") if the combined size exceeds the limit; ensure you still use
read_message(%{session | buffer: session.buffer <> data}, timeout_ms) when the
combined size is within the limit.
In `@lib/symphony/coding_context.ex`:
- Around line 196-224: The code currently calls
CodexClient.stop_session(session) twice (once immediately after
CodexClient.run_turn and again in the after block), which can cause errors if
stop_session/1 is not idempotent; remove the inline call to
CodexClient.stop_session(session) that appears right after
CodexClient.run_turn(session, ...) and keep the cleanup in the after block so
the session is stopped only once while leaving the rest of the logic
(parse_classifier_json(result.agent_message_text), Utils.to_float handling, and
returning the %CodingClassification{}) unchanged.
In `@lib/symphony/config.ex`:
- Around line 651-658: resolve_path/2 currently converts a nil result from
resolve_env_reference/2 into the workflow directory by calling to_string(nil)
and Path.expand, so missing env refs silently point at the workflow folder;
change resolve_path/2 to detect when value is non-nil but
resolve_env_reference(environ) returns nil and fail fast (raise ArgumentError or
return an explicit error) with a clear message that includes the original value
and context (e.g., the :workflow_dir and which key was being resolved) instead
of proceeding to Path.expand; update any callers of resolve_path/2 accordingly
if they expect exceptions vs error tuples.
In `@lib/symphony/http_server.ex`:
- Around line 57-60: The code is waiting unnecessarily because drain_headers/1
always performs a fresh recv even when the initial :gen_tcp.recv/3 returned the
full header block; update the logic so the bytes already read in request_line
are inspected for the header terminator "\r\n\r\n" and, if present, avoid
further reads. Concretely, change the drain call to a variant that accepts the
initial buffer (e.g., drain_headers/2 or modify drain_headers to accept
request_line) and have drain_headers check request_line for the end-of-headers
sequence and only perform additional :gen_tcp.recv/3 calls if the headers are
incomplete; keep the existing behavior for sockets that need more bytes.
- Line 523: The escape map in the esc helper arrow function is using an invalid
key `"\\\""` causing a SyntaxError; update the replacement object inside the esc
= (value) => ... function so the double-quote key is a valid string (e.g.,
change the problematic entry from the escaped-backslash key to a proper
double-quote key like '"' : """ or use single-quoted '"' : """),
ensuring the map becomes
{"&":"&","<":"<",">":">","\"":""","'":"'"} so the esc
function (esc) parses and runs correctly.
In `@lib/symphony/orchestrator.ex`:
- Around line 1138-1162: process_due_retries currently only checks that the
refreshed issue is in an active state before dispatching, which allows persisted
retries to be re-dispatched even when other async gates (labels/Todo
blockers/claimed) should block them; mirror the safer conditional used in
handle_info({:retry_due, ...}) by adding the same gate checks used there (e.g.
claimed/lock checks, required-labels/Todo blockers) when deciding to call
dispatch_issue_sync(orchestrator, issue, tracker, retry.attempt), and in the
else branch make sure to clear the retry's claimed state from
orchestrator.state.retry_attempts and call persist_state so a later reopen isn't
permanently undispatchable; use the same helper functions/conditions referenced
by handle_info to keep behavior consistent with dispatch_issue_sync,
refresh_retry_issue, available_global_slots, and persist_state.
- Around line 1072-1100: dispatch_issue_sync/4 currently hard-codes
Symphony.AgentRunner.new(...), which ignores any injected :agent_runner
override; change the code to use the orchestrator's injected agent runner (e.g.
get module with Map.get(orchestrator, :agent_runner, Symphony.AgentRunner) or
access orchestrator.agent_runner) and call that module's new(...) and
run_issue(...) so retries/tick behavior matches the async path (replace
Symphony.AgentRunner.new(orchestrator.config_manager, tracker) |> ... with the
injected runner's new call and subsequent run_issue invocation).
In `@lib/symphony/repo_planner.ex`:
- Around line 381-383: The current truthy?/1 considers the string "false" as
true, breaking planner boolean normalization; update the truthy?/1
implementation so it treats string values case-insensitively and only returns
true for explicit true values (e.g., true, "true", "1", 1, "yes") and returns
false for "false", "0", "no", nil, false, and other non-explicit-true inputs;
then ensure places that call truthy? (e.g., the checks around
coding_task/is_coding_task, needs_human, edit_allowed in the repo planner logic)
will use the corrected truthy?/1 so the planner respects explicit "false"
strings.
- Around line 367-378: After deduping secondary and read_only with dedupe_items,
resolve cross-list conflicts by removing any repos from read_only whose slug
appears in secondary (so a repo won't be marked both "you may edit" and "do not
edit"); update the block handling secondary/read_only (near variables secondary,
read_only, dedupe_items and primary.slug) to compute the set of secondary slugs
and then filter read_only to reject entries with those slugs (keeping secondary
as the authoritative editable role).
In `@lib/symphony/review.ex`:
- Around line 93-100: The call to System.cmd in run_gh_json lacks a timeout and
can hang; add a timeout option (e.g. timeout: 15_000) to the System.cmd("gh",
args, ...) options list so external gh invocations have a deadline, and ensure
the existing error/rescue paths still return nil on timeout or non-zero exit;
update the System.cmd invocation inside run_gh_json to include the timeout and
keep stderr_to_stdout: true and the Jason.decode!/nil handling.
In `@lib/symphony/templating.ex`:
- Around line 24-25: The clause matching {:ok, value} currently calls
to_string(value) which will crash for lists/maps; change it to safely convert
scalars and fall back to a non-crashing representation for composites: e.g. in
the {:ok, value} -> branch replace to_string(value) with a safe conversion that
returns binaries as-is, numbers/atoms via to_string/1, and everything else via
inspect(value, pretty: false) (or return a structured {:error,
:template_render_error, value} if you prefer preserving the error flow) so
lists/maps no longer raise during template rendering; locate the pattern
matching the {:ok, value} tuple in templating.ex and update that branch
accordingly.
In `@lib/symphony/tracker.ex`:
- Around line 599-603: read_message/1 is calling Jason.decode! and letting
Jason.DecodeError escape (bypassing call_tool/3's rescue of %Symphony.Error{});
change read_message/1 to wrap JSON decode failures in a %Symphony.Error{} (same
pattern used by Symphony.CodexClient.decode_line/1) so malformed gateway lines
are converted to Symphony.Error and go through the existing retry/handling;
apply the same change to the other decode site around lines 680-683 so both
Jason.decode! usages are wrapped/converted to Symphony.Error instead of letting
Jason exceptions bubble up.
In `@lib/symphony/workflow.ex`:
- Around line 92-95: The mtime/1 function calls File.stat with an invalid option
(:nanosecond) causing the call to fail and return nil; update the call in
mtime/1 (in lib/symphony/workflow.ex) to use a valid time option such as :posix
(e.g., File.stat(path, time: :posix)) or remove the time option entirely, keep
the {:ok, stat} -> stat.mtime match, and ensure the fallback still returns nil
on error so reload/change detection works again.
In `@lib/symphony/workspace.ex`:
- Around line 241-277: The existing-checkout path currently leaves the workspace
on whatever branch is checked out (current_branch) which can be wrong; update
the block after computing current_branch (involving git_output, repo_path,
repo_config, expected_branch, and install_pre_push_guard) to validate that
current_branch equals expected_branch and, if not, programmatically check out
the expected branch: attempt a normal git checkout of expected_branch via
git_output and, if that fails (branch missing), create/reset the local branch to
track origin/expected_branch (e.g., git checkout -B expected_branch
origin/expected_branch) so the returned map's "current_branch" is the
expected_branch and "branch_prepared" remains false until further prep.
In `@WORKFLOW.linear-mcp.example.md`:
- Around line 167-168: Update the two workflow bullets that currently require
"dry-run/apply output" (the paragraph starting "For Supabase/Postgres data
migrations..." and the second instance at line 202) to require "redacted
evidence" instead of raw data: change wording to insist on redacted
dry-run/apply logs or summarized verification artifacts that omit or mask row
contents, identifiers, PII, and any customer data, and include examples of
acceptable evidence (screenshots with redactions, checksums/counts, audit IDs,
or anonymized diffs) and an explicit note that full raw outputs must never be
uploaded to the issue tracker; ensure both occurrences (the existing migration
bullet and the later rule about moving issues to In Review) reference "redacted
evidence" and the same acceptable-evidence examples so reviewers know what to
expect.
- Around line 162-166: Update the guidance that currently says "inspect local
repo `.env*` files" to explicitly restrict checks to only the presence of
`.env*` files and the names of variables (no file contents or values), require
explicit redaction of any discovered variable values, and forbid opening or
pasting raw `.env*` contents into agent/tool transcripts; ensure the revised
sentence references `.env*` and variable name checks, instructs to only log
existence and variable keys, and reiterates "Never paste secret values" so tools
like terminal summaries, Linear, or PRs never include secret values.
In `@WORKFLOW.md`:
- Around line 12-13: The WORKFLOW.md currently embeds operator-specific/local
values (e.g., mcp_command and mcp_server using absolute/local paths), making the
workflow non-portable; update WORKFLOW.md to replace any hard-coded local paths
and machine-bound commands (mcp_command) and host identifiers (mcp_server) with
generic placeholders or variable references, move operator-specific overrides
into an ignored overlay like WORKFLOW.local.md or use environment
interpolation/placeholders and document the expected env vars or secret keys so
local values are not committed.
---
Minor comments:
In `@lib/symphony/cli.ex`:
- Line 17: The CLI advertises a --log_level switch in the switches list but
never applies it; update the CLI parsing flow to read the :log_level value from
the parsed options (where switches: [port: :integer, log_level: :string, once:
:boolean] is defined), validate it against allowed levels
("debug","info","warn","error"), convert it to an atom (e.g.
String.to_existing_atom/1 or String.to_atom after validation) and call
Logger.configure(level: level_atom) (or Logger.level/1) early in the startup
path (e.g. in the function that handles parsed args such as parse_args/1 or
main/1) so the provided --log_level actually controls the runtime logger level.
Ensure invalid values produce a clear error/exit.
In `@lib/symphony/dashboard_summary.ex`:
- Around line 49-60: The needs_human assignment currently coerces any non-nil
JSON value to true (data["needs_human"]) so quoted booleans like "false" become
true; update the DashboardSummary construction to normalize quoted booleans
first (e.g., add or use a helper like Utils.parse_bool / parse_boolean) and set
needs_human to the parsed boolean (treat "true"/true -> true, "false"/false ->
false, nil/unknown -> false) instead of using double-bang on the raw JSON value;
call the parser where needs_human is assigned so the struct receives a real
boolean.
---
Nitpick comments:
In `@launchd/com.caretta.symphony.local.plist`:
- Around line 21-22: Replace the unconditional <key>KeepAlive</key><true/> with
a conditional KeepAlive dictionary to avoid aggressive restarts: change
KeepAlive from a boolean to a dict that specifies restart conditions (for
example include a SuccessfulExit entry to avoid restarting after clean exits)
and add a throttling key (ThrottleInterval or similar launchd throttling option)
so repeated failures are rate-limited; update the KeepAlive element in the plist
accordingly so the service only restarts under defined conditions and with a
backoff interval.
In `@lib/symphony.ex`:
- Around line 11-13: The module currently hard-codes `@version` "0.1.0" and
returns it from version/0, creating a second source-of-truth; replace the
literal with a value derived from build/project metadata (e.g., use
Mix.Project.config()[:version] at compile time or Application.spec(:symphony,
:vsn) at runtime) so version/0 always reflects the package version. Update the
`@version` attribute and/or the implementation of version/0 in the Symphony module
accordingly, keeping the symbol names (`@version` and version/0) so callers are
unchanged.
In `@lib/symphony/cli.ex`:
- Around line 15-19: Option parsing currently ignores the third return value
from OptionParser.parse (bound as _invalid), letting unknown/typo options be
dropped silently; change the code around the OptionParser.parse call to inspect
the invalid list (third tuple element) and, if non-empty, print a clear error
message listing the invalid switches and exit non-zero (use the same context
where argv/opts are used); reference the OptionParser.parse call and the
variables _invalid/opts/args to locate the code and replace the silent discard
with a fail-fast log + System.halt(1) (or the project’s standard exit
mechanism).
In `@lib/symphony/workflow.ex`:
- Around line 40-41: The current parse_workflow clause incorrectly treats any
string starting with "---" as YAML front-matter; change detection to only accept
an exact first-line delimiter. Replace the head pattern match (defp
parse_workflow("---" <> _ = raw)) with a single clause that splits raw into
lines (String.split(raw, ~r/\R/, trim: false)) and then branch on the first line
being exactly "---" (e.g., case lines do ["---" | rest] -> handle front-matter;
_ -> handle no front-matter end), keeping the rest of the parsing logic in the
respective branches and retaining the parse_workflow function name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 58ed8ce7-b6fb-499c-ba70-0fb816252cb5
⛔ Files ignored due to path filters (1)
mix.lockis excluded by!**/*.lock
📒 Files selected for processing (66)
.formatter.exs.gitignoreREADME.mdWORKFLOW.linear-mcp.example.mdWORKFLOW.mddocs/IMPLEMENTATION.mdlaunchd/com.caretta.symphony.local.plistlaunchd/com.symphony.linear-mcp.example.plistlib/symphony.exlib/symphony/agent_runner.exlib/symphony/cli.exlib/symphony/codex_client.exlib/symphony/coding_context.exlib/symphony/config.exlib/symphony/dashboard_summary.exlib/symphony/error.exlib/symphony/http_server.exlib/symphony/logging.exlib/symphony/models.exlib/symphony/orchestrator.exlib/symphony/repo_planner.exlib/symphony/review.exlib/symphony/templating.exlib/symphony/tracker.exlib/symphony/utils.exlib/symphony/workflow.exlib/symphony/workspace.exmix.exspyproject.tomlsymphony/__init__.pysymphony/__main__.pysymphony/agent_runner.pysymphony/cli.pysymphony/codex_client.pysymphony/coding_context.pysymphony/config.pysymphony/dashboard_summary.pysymphony/errors.pysymphony/http_server.pysymphony/logging.pysymphony/models.pysymphony/orchestrator.pysymphony/repo_planner.pysymphony/review.pysymphony/templating.pysymphony/tracker.pysymphony/utils.pysymphony/workflow.pysymphony/workspace.pytest/agent_runner_test.exstest/cli_test.exstest/codex_client_test.exstest/orchestrator_test.exstest/repo_planner_http_test.exstest/review_test.exstest/test_helper.exstest/tracker_test.exstest/workflow_config_template_test.exstest/workspace_test.exstests/test_agent_runner.pytests/test_codex_client.pytests/test_orchestrator.pytests/test_review.pytests/test_tracker.pytests/test_workflow_config_template.pytests/test_workspace.py
💤 Files with no reviewable changes (28)
- symphony/main.py
- pyproject.toml
- tests/test_agent_runner.py
- symphony/init.py
- symphony/cli.py
- symphony/logging.py
- symphony/http_server.py
- symphony/templating.py
- symphony/utils.py
- symphony/workflow.py
- symphony/repo_planner.py
- tests/test_review.py
- symphony/config.py
- symphony/workspace.py
- symphony/codex_client.py
- tests/test_tracker.py
- symphony/models.py
- symphony/orchestrator.py
- symphony/dashboard_summary.py
- tests/test_codex_client.py
- tests/test_workflow_config_template.py
- tests/test_workspace.py
- symphony/tracker.py
- symphony/errors.py
- symphony/coding_context.py
- symphony/agent_runner.py
- symphony/review.py
- tests/test_orchestrator.py
| <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following</string> | ||
|
|
||
| <key>ProgramArguments</key> | ||
| <array> | ||
| <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following/symphony</string> | ||
| <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following/WORKFLOW.md</string> |
There was a problem hiding this comment.
Do not commit user-specific absolute paths.
Line 10 and Line 14-15 bake in a personal filesystem path (/Users/omarelamin/...). That is both non-portable and a mild PII leak in-repo. Use deploy paths or placeholders in committed launchd configs.
Suggested fix
- <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following</string>
+ <string>/opt/symphony/runner</string>
@@
- <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following/symphony</string>
- <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following/WORKFLOW.md</string>
+ <string>/opt/symphony/runner/symphony</string>
+ <string>/opt/symphony/config/WORKFLOW.md</string>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following</string> | |
| <key>ProgramArguments</key> | |
| <array> | |
| <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following/symphony</string> | |
| <string>/Users/omarelamin/Documents/Codex/2026-04-28/implement-symphony-according-to-the-following/WORKFLOW.md</string> | |
| <string>/opt/symphony/runner</string> | |
| <key>ProgramArguments</key> | |
| <array> | |
| <string>/opt/symphony/runner/symphony</string> | |
| <string>/opt/symphony/config/WORKFLOW.md</string> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@launchd/com.caretta.symphony.local.plist` around lines 10 - 15, The plist
currently contains user-specific absolute paths (/Users/omarelamin/...) in the
top <string> and inside ProgramArguments; remove those hard-coded personal paths
and replace them with a portable placeholder or standard install location (e.g.,
${SYMPHONY_INSTALL_DIR} or /usr/local/bin/symphony) and a relative or templated
path for the WORKFLOW file, update the ProgramArguments entries (and the
standalone <string> entry) in launchd/com.caretta.symphony.local.plist
accordingly, and add a note in deployment docs explaining how to substitute the
placeholder at install time.
| def try_delivery_fallback(tracker, %Issue{} = issue, agent_message_text, opts) do | ||
| if agent_reported_linear_delivery_blocker?(agent_message_text) and | ||
| not agent_reported_unresolved_external_blocker?(agent_message_text) and | ||
| tracker_supports_writes?(tracker) do | ||
| comments = call_tracker(tracker, :list_issue_comments, [issue.identifier]) | ||
| comment_id = existing_workpad_comment_id(comments) | ||
|
|
||
| body = | ||
| fallback_workpad_body(issue, agent_message_text, Keyword.fetch!(opts, :workspace_path)) | ||
|
|
||
| call_tracker(tracker, :save_issue_comment, [ | ||
| issue.identifier, | ||
| body, | ||
| [comment_id: comment_id] | ||
| ]) | ||
|
|
||
| call_tracker(tracker, :save_issue_state, [ | ||
| issue.identifier, | ||
| Keyword.fetch!(opts, :handoff_state) | ||
| ]) | ||
|
|
||
| true | ||
| else | ||
| false | ||
| end | ||
| rescue | ||
| _ -> false | ||
| end |
There was a problem hiding this comment.
Don’t hide partial tracker writes behind rescue _ -> false.
If save_issue_comment/3 succeeds and save_issue_state/2 fails, this returns false and the turn loop carries on as if nothing happened. That leaves the tracker in a half-delivered state, which is worse than an explicit failure. Fail the fallback path loudly, or return a tagged error that halts the run.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/agent_runner.ex` around lines 407 - 434, The
try_delivery_fallback function currently swallows all errors with rescue _ ->
false which can hide partial writes (e.g., save_issue_comment/3 succeeds but
save_issue_state/2 fails); change try_delivery_fallback to stop rescuing
everything and instead perform the tracker calls in a with/chain so failures
propagate or return a tagged error (e.g., {:error, reason})—use
call_tracker(:save_issue_comment, ...) and then call_tracker(:save_issue_state,
...) inside a with that returns true on success and returns or bubbles up the
actual error from call_tracker on failure; remove the blanket rescue and ensure
callers handle the {:error, reason} or allow the exception to propagate so
half-deliveries are not silently ignored.
| defp receive_response(session, request_id, timeout_ms) do | ||
| {msg, session} = read_message(session, timeout_ms) | ||
|
|
||
| cond do | ||
| Map.get(msg, "id") == request_id and !Map.has_key?(msg, "method") -> | ||
| if Map.has_key?(msg, "error") do | ||
| raise Error, code: :response_error, message: Jason.encode!(msg["error"]) | ||
| end | ||
|
|
||
| {Map.get(msg, "result", %{}), session} | ||
|
|
||
| Map.has_key?(msg, "method") and Map.has_key?(msg, "id") -> | ||
| session = handle_server_request(session, msg) | ||
| receive_response(session, request_id, timeout_ms) | ||
|
|
||
| Map.has_key?(msg, "method") -> | ||
| session = handle_notification(session, msg) | ||
| receive_response(session, request_id, timeout_ms) | ||
|
|
||
| true -> | ||
| receive_response(session, request_id, timeout_ms) | ||
| end |
There was a problem hiding this comment.
Use a real deadline for request waits.
receive_response/3 reuses the full timeout_ms on every recursive call. If the app-server keeps sending notifications but never answers the request, initialize and turn/start can wait forever. Carry a monotonic deadline and recompute the remaining budget on each loop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/codex_client.ex` around lines 218 - 239, receive_response/3
currently reuses the full timeout_ms on each recursive call which can block
forever if many notifications arrive; change it to compute a monotonic deadline
at the first call (using System.monotonic_time(:millisecond) or
:erlang.monotonic_time(:millisecond)) and pass the deadline or remaining_timeout
through recursive calls, recomputing remaining = deadline - now before each
read_message call and before recursing into
handle_server_request/handle_notification branches, and raise or return a
timeout error when remaining <= 0; update call sites that invoke
receive_response (e.g., initialize and turn/start) to pass the initial
timeout_ms as the remaining budget or new deadline as needed so read_message and
recursion never reuse the full original timeout repeatedly.
| receive do | ||
| {port, {:data, data}} when port == session.port -> | ||
| if byte_size(data) > Utils.jsonl_read_limit_bytes() do | ||
| raise Error, | ||
| code: :response_error, | ||
| message: "app-server JSONL message exceeded reader limit" | ||
| end | ||
|
|
||
| read_message(%{session | buffer: session.buffer <> data}, timeout_ms) | ||
|
|
There was a problem hiding this comment.
Enforce the JSONL limit on the accumulated buffer.
Line 316 only rejects a single oversized chunk. A peer can send many small fragments without a newline and grow session.buffer without bound, which turns the reader limit into theatre. Check byte_size(session.buffer) + byte_size(data) before concatenating.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/codex_client.ex` around lines 314 - 323, The reader currently
only checks the incoming chunk size but not the accumulated buffer; update the
receive branch in read_message to compute combined_size =
byte_size(session.buffer) + byte_size(data) and compare that against
Utils.jsonl_read_limit_bytes() before concatenating, raising Error (code:
:response_error, message: "app-server JSONL message exceeded reader limit") if
the combined size exceeds the limit; ensure you still use read_message(%{session
| buffer: session.buffer <> data}, timeout_ms) when the combined size is within
the limit.
| try do | ||
| {result, session} = | ||
| CodexClient.run_turn(session, classification_prompt(issue), capture_agent_text: true) | ||
|
|
||
| CodexClient.stop_session(session) | ||
| data = parse_classifier_json(result.agent_message_text) | ||
| needed = data["coding_context_needed"] || data["is_coding_task"] | ||
|
|
||
| unless is_boolean(needed) do | ||
| raise ArgumentError, "classifier JSON missing boolean coding_context_needed" | ||
| end | ||
|
|
||
| confidence = | ||
| case Utils.to_float(data["confidence"]) do | ||
| nil -> nil | ||
| value -> value |> max(0.0) |> min(1.0) | ||
| end | ||
|
|
||
| reason = if data["reason"], do: Utils.truncate(data["reason"], 500) | ||
|
|
||
| %CodingClassification{ | ||
| is_coding_task: needed, | ||
| source: "llm", | ||
| confidence: confidence, | ||
| reason: reason | ||
| } | ||
| after | ||
| CodexClient.stop_session(session) | ||
| end |
There was a problem hiding this comment.
Stop the classifier session once, not twice.
On the success path you call CodexClient.stop_session(session) before parsing the result, and then again in after. If stop_session/1 is not idempotent, cleanup can raise and mask a valid classification. Keep the after; drop the inline stop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/coding_context.ex` around lines 196 - 224, The code currently
calls CodexClient.stop_session(session) twice (once immediately after
CodexClient.run_turn and again in the after block), which can cause errors if
stop_session/1 is not idempotent; remove the inline call to
CodexClient.stop_session(session) that appears right after
CodexClient.run_turn(session, ...) and keep the cleanup in the after block so
the session is stopped only once while leaving the rest of the logic
(parse_classifier_json(result.agent_message_text), Utils.to_float handling, and
returning the %CodingClassification{}) unchanged.
| def mtime(path) do | ||
| case File.stat(path, time: :nanosecond) do | ||
| {:ok, stat} -> stat.mtime | ||
| _ -> nil |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In current Elixir (and its underlying Erlang/OTP), what are the valid :timevalues forFile.stat/2? Is time: :nanosecond supported?
💡 Result:
Valid :time values for File.stat/2 (and therefore what Elixir passes through to Erlang/OTP’s :file.read_file_info/2) are exactly: :local, :universal, and :posix. No, time: :nanosecond is not supported. When you use :posix, the timestamps are returned as integer seconds since (or before) the Unix epoch; :local/:universal return {{date, time}}-style calendar datetimes (not a nanosecond integer).
Citations:
- 1: https://hexdocs.pm/elixir/File.html
- 2: https://hexdocs.pm/elixir/File.Stat.html
- 3: https://erlang.org/doc/man/file.html
🏁 Script executed:
cat -n lib/symphony/workflow.ex | sed -n '85,100p'Repository: CarettaAI/caretta-symphony
Length of output: 477
File.stat/2 doesn't support time: :nanosecond; function always returns nil.
The :nanosecond option is invalid—Elixir/OTP only accepts :local, :universal, and :posix. This causes silent failure: the File.stat call errors, matching the catch-all clause and returning nil. Reload and change detection are broken.
Fix
def mtime(path) do
- case File.stat(path, time: :nanosecond) do
- {:ok, stat} -> stat.mtime
+ case File.stat(path, time: :posix) do
+ {:ok, stat} -> System.convert_time_unit(stat.mtime, :second, :nanosecond)
_ -> nil
end
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def mtime(path) do | |
| case File.stat(path, time: :nanosecond) do | |
| {:ok, stat} -> stat.mtime | |
| _ -> nil | |
| def mtime(path) do | |
| case File.stat(path, time: :posix) do | |
| {:ok, stat} -> System.convert_time_unit(stat.mtime, :second, :nanosecond) | |
| _ -> nil | |
| end | |
| end |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/workflow.ex` around lines 92 - 95, The mtime/1 function calls
File.stat with an invalid option (:nanosecond) causing the call to fail and
return nil; update the call in mtime/1 (in lib/symphony/workflow.ex) to use a
valid time option such as :posix (e.g., File.stat(path, time: :posix)) or remove
the time option entirely, keep the {:ok, stat} -> stat.mtime match, and ensure
the fallback still returns nil on error so reload/change detection works again.
| if File.exists?(repo_path) do | ||
| unless File.dir?(repo_path), | ||
| do: | ||
| raise(Error, | ||
| code: :repo_path_not_directory, | ||
| message: "repo path exists and is not a directory: #{repo_path}" | ||
| ) | ||
|
|
||
| unless repo_checkout_matches?(repo_path, repo_config) do | ||
| raise Error, | ||
| code: :repo_checkout_mismatch, | ||
| message: | ||
| "repo path exists but remote does not match #{repo_config.slug}: #{repo_path}" | ||
| end | ||
|
|
||
| current_branch = | ||
| git_output( | ||
| repo_path, | ||
| ["branch", "--show-current"], | ||
| timeout_ms, | ||
| :repo_branch_read_failed, | ||
| "failed reading current branch for #{repo_config.slug}" | ||
| ) | ||
|
|
||
| install_pre_push_guard(repo_path, expected_branch) | ||
|
|
||
| %{ | ||
| "base_branch" => base_branch, | ||
| "base_ref" => "origin/#{base_branch}", | ||
| "base_sha" => nil, | ||
| "expected_branch" => expected_branch, | ||
| "expected_ref" => "refs/heads/#{expected_branch}", | ||
| "current_branch" => current_branch, | ||
| "branch_prepared" => false, | ||
| "pre_push_guard" => true, | ||
| "remote_url" => git_remote(repo_path) | ||
| } |
There was a problem hiding this comment.
Reused workspaces are not put back on the expected branch.
In the existing-checkout path you verify the remote, then keep whatever branch happens to be checked out. If the workspace was left on main, detached HEAD, or another issue branch, the agent edits the wrong branch and only discovers it at push time via the hook. This path needs to validate or re-checkout expected_branch as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/workspace.ex` around lines 241 - 277, The existing-checkout path
currently leaves the workspace on whatever branch is checked out
(current_branch) which can be wrong; update the block after computing
current_branch (involving git_output, repo_path, repo_config, expected_branch,
and install_pre_push_guard) to validate that current_branch equals
expected_branch and, if not, programmatically check out the expected branch:
attempt a normal git checkout of expected_branch via git_output and, if that
fails (branch missing), create/reset the local branch to track
origin/expected_branch (e.g., git checkout -B expected_branch
origin/expected_branch) so the returned map's "current_branch" is the
expected_branch and "branch_prepared" remains false until further prep.
| - You run under the same macOS user context as Symphony. Before declaring missing non-GitHub auth, inspect configured local auth and secret sources without printing secret values: | ||
| - `which supabase && supabase projects list` | ||
| - `which aws && aws sts get-caller-identity` | ||
| - local repo `.env*` files, Vercel env, AWS Secrets Manager/SSM names, Supabase project links, and connected MCP tools when relevant. | ||
| - Never paste secret values into Linear, PRs, terminal summaries, or final messages. Load credentials into the command environment or an untracked temporary file only when required for the operation. |
There was a problem hiding this comment.
Don’t point the agent at raw .env* files.
Line 165 is too loose. In an unattended run, “inspect local repo .env* files” is how secret values end up in tool transcripts. Restrict this to existence checks and variable names only, with explicit redaction.
Suggested wording
-- local repo `.env*` files, Vercel env, AWS Secrets Manager/SSM names, Supabase project links, and connected MCP tools when relevant.
+- local repo `.env*` filenames and variable names only (never values), Vercel env names, AWS Secrets Manager/SSM secret names, Supabase project links, and connected MCP tools when relevant.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - You run under the same macOS user context as Symphony. Before declaring missing non-GitHub auth, inspect configured local auth and secret sources without printing secret values: | |
| - `which supabase && supabase projects list` | |
| - `which aws && aws sts get-caller-identity` | |
| - local repo `.env*` files, Vercel env, AWS Secrets Manager/SSM names, Supabase project links, and connected MCP tools when relevant. | |
| - Never paste secret values into Linear, PRs, terminal summaries, or final messages. Load credentials into the command environment or an untracked temporary file only when required for the operation. | |
| - You run under the same macOS user context as Symphony. Before declaring missing non-GitHub auth, inspect configured local auth and secret sources without printing secret values: | |
| - `which supabase && supabase projects list` | |
| - `which aws && aws sts get-caller-identity` | |
| - local repo `.env*` filenames and variable names only (never values), Vercel env names, AWS Secrets Manager/SSM secret names, Supabase project links, and connected MCP tools when relevant. | |
| - Never paste secret values into Linear, PRs, terminal summaries, or final messages. Load credentials into the command environment or an untracked temporary file only when required for the operation. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WORKFLOW.linear-mcp.example.md` around lines 162 - 166, Update the guidance
that currently says "inspect local repo `.env*` files" to explicitly restrict
checks to only the presence of `.env*` files and the names of variables (no file
contents or values), require explicit redaction of any discovered variable
values, and forbid opening or pasting raw `.env*` contents into agent/tool
transcripts; ensure the revised sentence references `.env*` and variable name
checks, instructs to only log existence and variable keys, and reiterates "Never
paste secret values" so tools like terminal summaries, Linear, or PRs never
include secret values.
| - For Supabase/Postgres data migrations, a PR or migration script alone is not completion. Record dry-run output and either apply output or a concrete verified reason the data operation must not be run. | ||
| - If the issue explicitly asks to move, copy, backfill, delete, or repair production rows or cloud resources, do not move it to `In Review` just because code was written. Move it to `In Review` only after the operation has been executed and verified, or after the requester explicitly converts the issue to a code-only preparatory task. |
There was a problem hiding this comment.
Record evidence, not raw data-operation output.
Lines 167-168 and Line 202 ask for dry-run/apply output to be recorded. For production data work, that can include row contents, identifiers, or other customer data. The workflow should require redacted evidence instead.
Suggested wording
-- For Supabase/Postgres data migrations, a PR or migration script alone is not completion. Record dry-run output and either apply output or a concrete verified reason the data operation must not be run.
+- For Supabase/Postgres data migrations, a PR or migration script alone is not completion. Record redacted execution evidence only (command, timestamp, affected counts/checksums, verification query results) and either a verified apply summary or a concrete verified reason the data operation must not be run. Never paste raw row data or sensitive output.
-- For credentialed data or cloud operations, the requested operation is executed and verified, with dry-run/apply output or read-only verification recorded. A code-only helper script is not enough unless the requester explicitly asked only for a helper script.
+- For credentialed data or cloud operations, the requested operation is executed and verified, with redacted execution evidence or read-only verification recorded. A code-only helper script is not enough unless the requester explicitly asked only for a helper script.Also applies to: 202-202
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WORKFLOW.linear-mcp.example.md` around lines 167 - 168, Update the two
workflow bullets that currently require "dry-run/apply output" (the paragraph
starting "For Supabase/Postgres data migrations..." and the second instance at
line 202) to require "redacted evidence" instead of raw data: change wording to
insist on redacted dry-run/apply logs or summarized verification artifacts that
omit or mask row contents, identifiers, PII, and any customer data, and include
examples of acceptable evidence (screenshots with redactions, checksums/counts,
audit IDs, or anonymized diffs) and an explicit note that full raw outputs must
never be uploaded to the issue tracker; ensure both occurrences (the existing
migration bullet and the later rule about moving issues to In Review) reference
"redacted evidence" and the same acceptable-evidence examples so reviewers know
what to expect.
| mcp_command: /Applications/Codex.app/Contents/Resources/codex app-server | ||
| mcp_server: codex_apps |
There was a problem hiding this comment.
Committed workflow leaks operator-specific local topology.
Hard-coded /Users/omarelamin/... paths and machine-bound command locations expose personal/workstation metadata and make the default workflow non-portable. Keep this file generic; move local values to ignored overlays (WORKFLOW.local.md, env interpolation, or secrets-managed config).
Also applies to: 38-39, 61-149
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WORKFLOW.md` around lines 12 - 13, The WORKFLOW.md currently embeds
operator-specific/local values (e.g., mcp_command and mcp_server using
absolute/local paths), making the workflow non-portable; update WORKFLOW.md to
replace any hard-coded local paths and machine-bound commands (mcp_command) and
host identifiers (mcp_server) with generic placeholders or variable references,
move operator-specific overrides into an ignored overlay like WORKFLOW.local.md
or use environment interpolation/placeholders and document the expected env vars
or secret keys so local values are not committed.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (4)
WORKFLOW.md (1)
1-149:⚠️ Potential issue | 🟠 MajorRemove hard-coded operator workstation paths/commands from committed workflow.
This workflow is currently non-portable and leaks operator-specific local topology:
- Line 12 hard-codes the Codex binary path (
/Applications/.../codex app-server).- Line 38 hard-codes an absolute skills path (
/Users/omarelamin/...).- Lines 61-149 hard-code
repositories.known[].local_pathfor a specific machine.Move these values out of committed
WORKFLOW.mdinto an ignored overlay (e.g.WORKFLOW.local.md) or a configuration mechanism that the runtime actually supports, and keepWORKFLOW.mdgeneric with placeholders. Also consider quoting YAML scalars that contain spaces (e.g. thecodex app-serverpart) if you keep any command strings in the template.Suggested direction (example placeholders)
- mcp_command: /Applications/Codex.app/Contents/Resources/codex app-server + mcp_command: ${SYMPHONY_MCP_COMMAND} # or move to WORKFLOW.local.md - - /Users/omarelamin/.codex/skills/caretta-architecture + - ${SYMPHONY_SKILLS_DIR}/caretta-architecture - local_path: /Users/omarelamin/Documents/Caretta/repos/caretta-app/Project-N + local_path: ${SYMPHONY_REPOS_ROOT}/caretta-app/Project-N#!/bin/bash # Verify how WORKFLOW.local.md / env interpolation is handled in the runtime. set -euo pipefail # 1) Look for overlay loading logic. rg -n "WORKFLOW\.local\.md|local\.md|overlay" --glob '*.ex' --glob '*.exs' --glob '*.md' # 2) Look for env interpolation support beyond api_key. rg -n "resolve_env_reference|System\.get_env|interpolat" --glob '*.ex' --glob '*.exs' # 3) Specifically check whether mcp_command / skill_paths / local_path are interpolated # or are taken literally from YAML. rg -n "mcp_command|skill_paths|local_path" --glob 'lib/**/*.ex'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@WORKFLOW.md` around lines 1 - 149, The WORKFLOW.md contains operator-specific hard-coded paths/commands (mcp_command and codex.command), an absolute skill path (context.coding.skill_paths), and many repositories.known[].local_path entries; replace these concrete values with generic placeholders (e.g. "${CODEX_CMD}", "${SKILL_PATH}", "${REPO_LOCAL_PATH}") and update mcp_command/codex.command to quoted strings if kept, move actual machine-specific values into a non-committed overlay like WORKFLOW.local.md or environment variables referenced at runtime, add WORKFLOW.local.md to .gitignore, and ensure the runtime uses env interpolation or overlay loading for mcp_command, context.coding.skill_paths, and repositories.known[].local_path so these keys (mcp_command, codex.command, context.coding.skill_paths, repositories.known[].local_path) are populated from local config instead of being committed.lib/symphony/repo_planner.ex (2)
442-453:⚠️ Potential issue | 🟠 MajorResolve cross-role repo conflicts before you emit the plan.
secondary_reposandread_only_context_reposare deduped independently, then only stripped of the primary. If the same slug lands in both lists, the prompt can tell the agent both “you may edit” and “do not edit” for the same repo. That breaks the guardrail.Possible fix
secondary = dedupe_items(secondary) read_only = dedupe_items(read_only) secondary = if primary, do: Enum.reject(secondary, &(&1.slug == primary.slug)), else: secondary + editable_slugs = + secondary + |> Enum.map(& &1.slug) + |> then(fn slugs -> if(primary, do: [primary.slug | slugs], else: slugs) end) + |> MapSet.new() + read_only = - if primary, - do: Enum.reject(read_only, &(&1.slug == primary.slug)), - else: read_only + Enum.reject(read_only, &MapSet.member?(editable_slugs, &1.slug))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/repo_planner.ex` around lines 442 - 453, secondary and read_only are deduped independently which allows the same repo slug to appear in both roles; compute the intersection of slugs between secondary and read_only after dedupe (and after removing primary) and remove overlapping slugs from the editable set (secondary) so a repo cannot be both "may edit" and "do not edit"; update the logic around dedupe_items, the secondary and read_only variables (and the primary-based reject blocks) to perform this cross-role conflict resolution before emitting the plan.
537-539:⚠️ Potential issue | 🟠 Major
truthy?/1still turns"false"intotrue.That is a bad place to be loose. It can flip
coding_task,needs_human, andedit_allowedaway from what the planner actually meant, especially on slightly off-spec JSON.Possible fix
- defp truthy?(false), do: false - defp truthy?(nil), do: false - defp truthy?(_), do: true + defp truthy?(value) when value in [true, false], do: value + defp truthy?(nil), do: false + + defp truthy?(value) when is_binary(value) do + case String.downcase(String.trim(value)) do + "true" -> true + "false" -> false + "1" -> true + "0" -> false + "yes" -> true + "no" -> false + "" -> false + _ -> false + end + end + + defp truthy?(1), do: true + defp truthy?(0), do: false + defp truthy?(_), do: false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/repo_planner.ex` around lines 537 - 539, The helper truthy?/1 is too permissive (e.g., "false" string becomes true); modify the truthy?/1 implementation so it explicitly handles common falsey string forms for string inputs (e.g., case-insensitive "false", "0", "nil", "", "no") by normalizing the binary (String.trim |> String.downcase) and returning false for those values, while keeping the existing clauses for false and nil; update the truthy?/1 function to use a binary clause that normalizes and matches these falsey tokens so fields like coding_task, needs_human, and edit_allowed are not flipped by off-spec JSON.lib/symphony/agent_runner.ex (1)
461-488:⚠️ Potential issue | 🟠 MajorDon’t hide partial tracker writes behind
rescue _ -> false.This can write the workpad comment successfully, fail on
save_issue_state/2, and then pretend nothing happened. Half-delivered state is worse than an explicit failure.Possible fix
def try_delivery_fallback(tracker, %Issue{} = issue, agent_message_text, opts) do if agent_reported_linear_delivery_blocker?(agent_message_text) and not agent_reported_unresolved_external_blocker?(agent_message_text) and tracker_supports_writes?(tracker) do comments = call_tracker(tracker, :list_issue_comments, [issue.identifier]) comment_id = existing_workpad_comment_id(comments) body = fallback_workpad_body(issue, agent_message_text, Keyword.fetch!(opts, :workspace_path)) - call_tracker(tracker, :save_issue_comment, [ - issue.identifier, - body, - [comment_id: comment_id] - ]) - - call_tracker(tracker, :save_issue_state, [ - issue.identifier, - Keyword.fetch!(opts, :handoff_state) - ]) - - true + with _ <- + call_tracker(tracker, :save_issue_comment, [ + issue.identifier, + body, + [comment_id: comment_id] + ]), + _ <- + call_tracker(tracker, :save_issue_state, [ + issue.identifier, + Keyword.fetch!(opts, :handoff_state) + ]) do + true + end else false end - rescue - _ -> false end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/agent_runner.ex` around lines 461 - 488, The current try_delivery_fallback function swallows all exceptions with a broad rescue, which can hide partial successes (e.g., comment saved but state save failed); remove the blanket rescue and instead explicitly handle results from call_tracker(:save_issue_comment, ...) and call_tracker(:save_issue_state, ...) (use a with/ case to sequence calls), verify return values or errors from existing_workpad_comment_id, call_tracker(:list_issue_comments, ...), fallback_workpad_body, and tracker_supports_writes?; if the tracker API returns {:ok, ...} / {:error, ...} patterns, propagate or log errors rather than returning true on partial success—ensure the function only returns true after both save_issue_comment and save_issue_state succeed, return {:error, reason} or raise on failure, and add minimal logging around failures instead of rescuing all exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/symphony/agent_runner.ex`:
- Around line 247-250: Move the workspace/config setup into the protected
try/rescue path inside run_issue/5 so exceptions are handled and after_run
always receives a possibly created workspace; specifically, stop calling
ConfigManager.current(config_manager) and
WorkspaceManager.create_for_issue(workspace_manager, issue.identifier) before
the try block and instead call ConfigManager.current/1, WorkspaceManager.new/2
and WorkspaceManager.create_for_issue/2 inside the try, ensuring any raised
errors are caught and the resulting workspace (or nil) is passed into the
existing after_run/.. handling and AgentRunResult flow.
In `@WORKFLOW.md`:
- Around line 228-259: The persistent "## Codex Workpad" stamp currently embeds
machine-specific details in the line
"<hostname>:<abs-workspace-path>@<short-sha>", which must be removed; update the
template used for the Codex Workpad stamp (the stamp line under "## Codex
Workpad") to include only non-sensitive identifiers (for example just
"<short-sha>", or "<rel-workspace-path>@<short-sha>", or a hashed hostname/path)
so that hostname and absolute workspace paths are not written into the
persistent comment; locate the stamp string in the Workpad Template and replace
the sensitive format with one of the non-sensitive alternatives and ensure any
code that generates the stamp produces the new non-sensitive value.
---
Duplicate comments:
In `@lib/symphony/agent_runner.ex`:
- Around line 461-488: The current try_delivery_fallback function swallows all
exceptions with a broad rescue, which can hide partial successes (e.g., comment
saved but state save failed); remove the blanket rescue and instead explicitly
handle results from call_tracker(:save_issue_comment, ...) and
call_tracker(:save_issue_state, ...) (use a with/ case to sequence calls),
verify return values or errors from existing_workpad_comment_id,
call_tracker(:list_issue_comments, ...), fallback_workpad_body, and
tracker_supports_writes?; if the tracker API returns {:ok, ...} / {:error, ...}
patterns, propagate or log errors rather than returning true on partial
success—ensure the function only returns true after both save_issue_comment and
save_issue_state succeed, return {:error, reason} or raise on failure, and add
minimal logging around failures instead of rescuing all exceptions.
In `@lib/symphony/repo_planner.ex`:
- Around line 442-453: secondary and read_only are deduped independently which
allows the same repo slug to appear in both roles; compute the intersection of
slugs between secondary and read_only after dedupe (and after removing primary)
and remove overlapping slugs from the editable set (secondary) so a repo cannot
be both "may edit" and "do not edit"; update the logic around dedupe_items, the
secondary and read_only variables (and the primary-based reject blocks) to
perform this cross-role conflict resolution before emitting the plan.
- Around line 537-539: The helper truthy?/1 is too permissive (e.g., "false"
string becomes true); modify the truthy?/1 implementation so it explicitly
handles common falsey string forms for string inputs (e.g., case-insensitive
"false", "0", "nil", "", "no") by normalizing the binary (String.trim |>
String.downcase) and returning false for those values, while keeping the
existing clauses for false and nil; update the truthy?/1 function to use a
binary clause that normalizes and matches these falsey tokens so fields like
coding_task, needs_human, and edit_allowed are not flipped by off-spec JSON.
In `@WORKFLOW.md`:
- Around line 1-149: The WORKFLOW.md contains operator-specific hard-coded
paths/commands (mcp_command and codex.command), an absolute skill path
(context.coding.skill_paths), and many repositories.known[].local_path entries;
replace these concrete values with generic placeholders (e.g. "${CODEX_CMD}",
"${SKILL_PATH}", "${REPO_LOCAL_PATH}") and update mcp_command/codex.command to
quoted strings if kept, move actual machine-specific values into a non-committed
overlay like WORKFLOW.local.md or environment variables referenced at runtime,
add WORKFLOW.local.md to .gitignore, and ensure the runtime uses env
interpolation or overlay loading for mcp_command, context.coding.skill_paths,
and repositories.known[].local_path so these keys (mcp_command, codex.command,
context.coding.skill_paths, repositories.known[].local_path) are populated from
local config instead of being committed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 13652ac9-5be8-4953-9ce5-7870dbbe9002
📒 Files selected for processing (6)
WORKFLOW.linear-mcp.example.mdWORKFLOW.mdlib/symphony/agent_runner.exlib/symphony/repo_planner.extest/agent_runner_test.exstest/repo_planner_http_test.exs
✅ Files skipped from review due to trivial changes (1)
- WORKFLOW.linear-mcp.example.md
🚧 Files skipped from review as they are similar to previous changes (1)
- test/agent_runner_test.exs
| {_manager, workflow, config} = ConfigManager.current(config_manager) | ||
| workspace_manager = WorkspaceManager.new(config.workspace, config.hooks) | ||
| workspace = WorkspaceManager.create_for_issue(workspace_manager, issue.identifier) | ||
|
|
There was a problem hiding this comment.
Move workspace setup inside the protected path.
Right now ConfigManager.current/1 and WorkspaceManager.create_for_issue/2 happen before the try. If either raises, run_issue/5 skips the structured AgentRunResult path, and a partially created workspace never reaches after_run.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/agent_runner.ex` around lines 247 - 250, Move the
workspace/config setup into the protected try/rescue path inside run_issue/5 so
exceptions are handled and after_run always receives a possibly created
workspace; specifically, stop calling ConfigManager.current(config_manager) and
WorkspaceManager.create_for_issue(workspace_manager, issue.identifier) before
the try block and instead call ConfigManager.current/1, WorkspaceManager.new/2
and WorkspaceManager.create_for_issue/2 inside the try, ensuring any raised
errors are caught and the resulting workspace (or nil) is passed into the
existing after_run/.. handling and AgentRunResult flow.
| ## Workpad Template | ||
|
|
||
| Keep this structure and edit it in place: | ||
|
|
||
| ````md | ||
| ## Codex Workpad | ||
|
|
||
| ```text | ||
| <hostname>:<abs-workspace-path>@<short-sha> | ||
| ``` | ||
|
|
||
| ### Plan | ||
|
|
||
| - [ ] 1. Parent task | ||
| - [ ] 1.1 Child task | ||
|
|
||
| ### Acceptance Criteria | ||
|
|
||
| - [ ] Criterion | ||
|
|
||
| ### Validation | ||
|
|
||
| - [ ] `<command>` - result | ||
|
|
||
| ### Notes | ||
|
|
||
| - <timestamped concise note> | ||
|
|
||
| ### Confusions | ||
|
|
||
| - <only include when something was unclear> | ||
| ```` |
There was a problem hiding this comment.
Avoid embedding absolute workspace paths + hostname in the persistent Workpad stamp.
Line 236 (<hostname>:<abs-workspace-path>@<short-sha>) will cause Symphony to write machine-specific details into the persistent ## Codex Workpad comment. Even if it’s not “secrets”, this is still sensitive operator topology (and can vary per environment), and it’s unnecessary for runtime correctness.
Suggest switching to a non-sensitive stamp (e.g. short SHA only, or workspace-relative path, or hashing hostname/path), and keep anything absolute/local out of the comment body.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@WORKFLOW.md` around lines 228 - 259, The persistent "## Codex Workpad" stamp
currently embeds machine-specific details in the line
"<hostname>:<abs-workspace-path>@<short-sha>", which must be removed; update the
template used for the Codex Workpad stamp (the stamp line under "## Codex
Workpad") to include only non-sensitive identifiers (for example just
"<short-sha>", or "<rel-workspace-path>@<short-sha>", or a hashed hostname/path)
so that hostname and absolute workspace paths are not written into the
persistent comment; locate the stamp string in the Workpad Template and replace
the sensitive format with one of the non-sensitive alternatives and ensure any
code that generates the stamp produces the new non-sensitive value.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
lib/symphony/orchestrator.ex (2)
1255-1259:⚠️ Potential issue | 🟠 MajorUse the injected runner in the sync path too.
This still hard-codes
Symphony.AgentRunner, sotick/1and sync retries ignore the:agent_runneroverride and can diverge from the async path.Suggested fix
+ runner_module = orchestrator.agent_runner + runner = apply(runner_module, :new, [orchestrator.config_manager, tracker]) + result = - Symphony.AgentRunner.new(orchestrator.config_manager, tracker) - |> Symphony.AgentRunner.run_issue(issue, attempt, fn issue_id, event -> - Agent.update(holder, &handle_codex_event(&1, issue_id, event)) - end) + apply(runner_module, :run_issue, [ + runner, + issue, + attempt, + fn issue_id, event -> + Agent.update(holder, &handle_codex_event(&1, issue_id, event)) + end + ])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/orchestrator.ex` around lines 1255 - 1259, The sync path is hard-coding Symphony.AgentRunner in the call that builds and runs the runner, so tick/1 and sync retry paths ignore the injected :agent_runner override; change the code that calls Symphony.AgentRunner.new(...)->... to use the injected runner from the orchestrator (the same agent_runner used by the async path) — i.e., obtain the agent runner via orchestrator.agent_runner (or the equivalent field) and call its new and run_issue functions (preserving the existing tracker, issue, attempt and the Agent.update(... &handle_codex_event/3 callback) invocation) so tick/1, sync retries, and async paths all use the same injected runner implementation.
1305-1321:⚠️ Potential issue | 🟠 MajorMirror the async retry gate before redispatching persisted retries.
This branch only checks active state. A due retry can therefore be redispatched after required labels were removed or a
Todoblocker reappeared, and the fallback issue fromrefresh_retry_issue/2can slip through with blank fields. In the discard path,claimedis never cleared, so the issue can stay permanently undispatchable.Suggested fix
- if MapSet.member?( - TrackerConfig.active_state_set(config.tracker), - Utils.normalize_state(issue.state) - ) do - dispatch_issue_sync(acc, issue, tracker, retry.attempt) - else - persist_state(acc) - end + if MapSet.member?( + TrackerConfig.active_state_set(config.tracker), + Utils.normalize_state(issue.state) + ) and + is_dispatch_eligible(acc, issue, config, ignore_claimed_issue_id: retry.issue_id) do + dispatch_issue_sync(acc, issue, tracker, retry.attempt) + else + state = %{ + acc.state + | claimed: MapSet.delete(acc.state.claimed, retry.issue_id) + } + + persist_state(%{acc | state: state}) + end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/orchestrator.ex` around lines 1305 - 1321, Before redispatching persisted retries, re-evaluate the full async-retry gate (not just active state) and clear any claimed marker when we fall back to persisting: add the same additional predicate used by the async retry path (the labels/blocker checks applied elsewhere) to the existing MapSet.member?(TrackerConfig.active_state_set(config.tracker), Utils.normalize_state(issue.state)) guard so dispatch_issue_sync(acc, issue, tracker, retry.attempt) only runs if both active-state and async-retry conditions pass, and in the else/persist branch update acc.state.retry_attempts for retry.issue_id to remove or clear the :claimed flag (so the retry isn’t permanently blocked) before calling persist_state(acc).
🧹 Nitpick comments (1)
lib/symphony/orchestrator.ex (1)
128-132: Drop the second snapshot publish on this hot path.
publish_snapshot_cache/1runs twice for every Codex event here. That means two full snapshot builds and two ETS writes on the busiest path in the module.Suggested simplification
def handle_cast({:codex_event, issue_id, event}, orchestrator) do orchestrator = handle_codex_event(orchestrator, issue_id, event) orchestrator = maybe_schedule_summary(orchestrator, issue_id) - orchestrator = publish_snapshot_cache(orchestrator) {:noreply, publish_snapshot_cache(orchestrator)} end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/symphony/orchestrator.ex` around lines 128 - 132, The code calls publish_snapshot_cache/1 twice in handle_cast for {:codex_event,...}; remove the duplicate so the snapshot is built and written only once: assign orchestrator = handle_codex_event(orchestrator, issue_id, event), then orchestrator = maybe_schedule_summary(orchestrator, issue_id), then orchestrator = publish_snapshot_cache(orchestrator) and return {:noreply, orchestrator} (i.e., eliminate the extra publish_snapshot_cache(orchestrator) in the final tuple).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/symphony/orchestrator.ex`:
- Around line 1253-1263: The holder Agent started via Agent.start_link(fn ->
orchestrator end) must always be stopped even if
Symphony.AgentRunner.run_issue/4 raises; wrap the run_issue call and subsequent
Agent.get/handle_worker_done logic in a try ... after block (or equivalent) so
Agent.stop(holder) is invoked in the after clause. Keep the existing update
callback (handle_codex_event/3) and result handling (handle_worker_done/3) but
ensure Agent.stop(holder) is moved into the guaranteed cleanup path so the
holder process cannot leak on errors.
- Around line 239-249: The handler handle_info({ref, {:summary_result, ...}},
orchestrator) currently applies late summary results even when meta is nil
(meaning the task timed out/was removed), so change it to skip processing when
meta is nil: after {meta, summary_tasks} = Map.pop(orchestrator.summary_tasks,
ref) check if meta is nil and if so just set orchestrator = %{orchestrator |
summary_tasks: summary_tasks} and return {:noreply, orchestrator} (without
calling apply_summary_result or publish_snapshot_cache); only call
apply_summary_result(issue_id, activity_revision, result) |>
publish_snapshot_cache() when meta is present. Ensure you still demonitor and
cancel_timer when meta exists as currently done.
---
Duplicate comments:
In `@lib/symphony/orchestrator.ex`:
- Around line 1255-1259: The sync path is hard-coding Symphony.AgentRunner in
the call that builds and runs the runner, so tick/1 and sync retry paths ignore
the injected :agent_runner override; change the code that calls
Symphony.AgentRunner.new(...)->... to use the injected runner from the
orchestrator (the same agent_runner used by the async path) — i.e., obtain the
agent runner via orchestrator.agent_runner (or the equivalent field) and call
its new and run_issue functions (preserving the existing tracker, issue, attempt
and the Agent.update(... &handle_codex_event/3 callback) invocation) so tick/1,
sync retries, and async paths all use the same injected runner implementation.
- Around line 1305-1321: Before redispatching persisted retries, re-evaluate the
full async-retry gate (not just active state) and clear any claimed marker when
we fall back to persisting: add the same additional predicate used by the async
retry path (the labels/blocker checks applied elsewhere) to the existing
MapSet.member?(TrackerConfig.active_state_set(config.tracker),
Utils.normalize_state(issue.state)) guard so dispatch_issue_sync(acc, issue,
tracker, retry.attempt) only runs if both active-state and async-retry
conditions pass, and in the else/persist branch update acc.state.retry_attempts
for retry.issue_id to remove or clear the :claimed flag (so the retry isn’t
permanently blocked) before calling persist_state(acc).
---
Nitpick comments:
In `@lib/symphony/orchestrator.ex`:
- Around line 128-132: The code calls publish_snapshot_cache/1 twice in
handle_cast for {:codex_event,...}; remove the duplicate so the snapshot is
built and written only once: assign orchestrator =
handle_codex_event(orchestrator, issue_id, event), then orchestrator =
maybe_schedule_summary(orchestrator, issue_id), then orchestrator =
publish_snapshot_cache(orchestrator) and return {:noreply, orchestrator} (i.e.,
eliminate the extra publish_snapshot_cache(orchestrator) in the final tuple).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e54353fc-90b1-413a-8230-11bf3e4272d2
📒 Files selected for processing (2)
lib/symphony/orchestrator.extest/orchestrator_test.exs
🚧 Files skipped from review as they are similar to previous changes (1)
- test/orchestrator_test.exs
| def handle_info({ref, {:summary_result, issue_id, activity_revision, result}}, orchestrator) | ||
| when is_reference(ref) do | ||
| Process.demonitor(ref, [:flush]) | ||
| {meta, summary_tasks} = Map.pop(orchestrator.summary_tasks, ref) | ||
| if meta && meta.timer_ref, do: Process.cancel_timer(meta.timer_ref) | ||
| orchestrator = %{orchestrator | summary_tasks: summary_tasks} | ||
|
|
||
| {:noreply, | ||
| orchestrator | ||
| |> apply_summary_result(issue_id, activity_revision, result) | ||
| |> publish_snapshot_cache()} |
There was a problem hiding this comment.
Ignore summary results once the timeout path has already won.
If Line 175 already removed this ref and recorded a timeout, meta is nil here but the late result is still applied. A raced completion can therefore overwrite the timeout error and make the timeout meaningless.
Suggested fix
def handle_info({ref, {:summary_result, issue_id, activity_revision, result}}, orchestrator)
when is_reference(ref) do
Process.demonitor(ref, [:flush])
{meta, summary_tasks} = Map.pop(orchestrator.summary_tasks, ref)
if meta && meta.timer_ref, do: Process.cancel_timer(meta.timer_ref)
- orchestrator = %{orchestrator | summary_tasks: summary_tasks}
-
- {:noreply,
- orchestrator
- |> apply_summary_result(issue_id, activity_revision, result)
- |> publish_snapshot_cache()}
+
+ if meta do
+ {:noreply,
+ %{orchestrator | summary_tasks: summary_tasks}
+ |> apply_summary_result(issue_id, activity_revision, result)
+ |> publish_snapshot_cache()}
+ else
+ {:noreply, orchestrator}
+ end
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def handle_info({ref, {:summary_result, issue_id, activity_revision, result}}, orchestrator) | |
| when is_reference(ref) do | |
| Process.demonitor(ref, [:flush]) | |
| {meta, summary_tasks} = Map.pop(orchestrator.summary_tasks, ref) | |
| if meta && meta.timer_ref, do: Process.cancel_timer(meta.timer_ref) | |
| orchestrator = %{orchestrator | summary_tasks: summary_tasks} | |
| {:noreply, | |
| orchestrator | |
| |> apply_summary_result(issue_id, activity_revision, result) | |
| |> publish_snapshot_cache()} | |
| def handle_info({ref, {:summary_result, issue_id, activity_revision, result}}, orchestrator) | |
| when is_reference(ref) do | |
| Process.demonitor(ref, [:flush]) | |
| {meta, summary_tasks} = Map.pop(orchestrator.summary_tasks, ref) | |
| if meta && meta.timer_ref, do: Process.cancel_timer(meta.timer_ref) | |
| if meta do | |
| {:noreply, | |
| %{orchestrator | summary_tasks: summary_tasks} | |
| |> apply_summary_result(issue_id, activity_revision, result) | |
| |> publish_snapshot_cache()} | |
| else | |
| {:noreply, orchestrator} | |
| end | |
| end |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/orchestrator.ex` around lines 239 - 249, The handler
handle_info({ref, {:summary_result, ...}}, orchestrator) currently applies late
summary results even when meta is nil (meaning the task timed out/was removed),
so change it to skip processing when meta is nil: after {meta, summary_tasks} =
Map.pop(orchestrator.summary_tasks, ref) check if meta is nil and if so just set
orchestrator = %{orchestrator | summary_tasks: summary_tasks} and return
{:noreply, orchestrator} (without calling apply_summary_result or
publish_snapshot_cache); only call apply_summary_result(issue_id,
activity_revision, result) |> publish_snapshot_cache() when meta is present.
Ensure you still demonitor and cancel_timer when meta exists as currently done.
| {:ok, holder} = Agent.start_link(fn -> orchestrator end) | ||
|
|
||
| result = | ||
| Symphony.AgentRunner.new(orchestrator.config_manager, tracker) | ||
| |> Symphony.AgentRunner.run_issue(issue, attempt, fn issue_id, event -> | ||
| Agent.update(holder, &handle_codex_event(&1, issue_id, event)) | ||
| end) | ||
|
|
||
| orchestrator = Agent.get(holder, & &1) | ||
| Agent.stop(holder) | ||
| handle_worker_done(orchestrator, issue.id, result) |
There was a problem hiding this comment.
Always stop the holder agent in an after block.
Agent.stop(holder) only runs on the happy path. If run_issue/4 raises, the GenServer survives but the linked holder process does not get cleaned up, so repeated sync failures leak processes.
Suggested fix
{:ok, holder} = Agent.start_link(fn -> orchestrator end)
- result =
- Symphony.AgentRunner.new(orchestrator.config_manager, tracker)
- |> Symphony.AgentRunner.run_issue(issue, attempt, fn issue_id, event ->
- Agent.update(holder, &handle_codex_event(&1, issue_id, event))
- end)
-
- orchestrator = Agent.get(holder, & &1)
- Agent.stop(holder)
- handle_worker_done(orchestrator, issue.id, result)
+ try do
+ result =
+ Symphony.AgentRunner.new(orchestrator.config_manager, tracker)
+ |> Symphony.AgentRunner.run_issue(issue, attempt, fn issue_id, event ->
+ Agent.update(holder, &handle_codex_event(&1, issue_id, event))
+ end)
+
+ orchestrator = Agent.get(holder, & &1)
+ handle_worker_done(orchestrator, issue.id, result)
+ after
+ Agent.stop(holder)
+ end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/orchestrator.ex` around lines 1253 - 1263, The holder Agent
started via Agent.start_link(fn -> orchestrator end) must always be stopped even
if Symphony.AgentRunner.run_issue/4 raises; wrap the run_issue call and
subsequent Agent.get/handle_worker_done logic in a try ... after block (or
equivalent) so Agent.stop(holder) is invoked in the after clause. Keep the
existing update callback (handle_codex_event/3) and result handling
(handle_worker_done/3) but ensure Agent.stop(holder) is moved into the
guaranteed cleanup path so the holder process cannot leak on errors.
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/symphony/cli.ex`:
- Around line 17-30: The OptionParser.parse call currently ignores the {_opts,
args, _invalid} third element so mistyped flags silently proceed; update the
parsing logic (the tuple assignment that binds {_opts, args, _invalid} from
OptionParser.parse) to inspect the third element (currently named _invalid) and,
if it's non-empty, print the usage/help (same output as the existing help path)
and exit with a non-zero status; apply this same fix to the second parse site
around the lines referenced (both places where OptionParser.parse returns {opts,
args, _invalid}) so any invalid CLI flags cause a usage message and non-zero
exit instead of continuing.
In `@lib/symphony/self_heal.ex`:
- Around line 235-266: collect_evidence currently returns raw outputs (patch,
tails, remotes, state) and writes the full patch to disk; create a
redact_sensitive_evidence/1 helper that strips or masks secrets/PII from keys
like "current-working-tree.patch"/patch content, "git_remote", "stdout_tail",
"stderr_tail", and the payload returned from fetch_state/1, then: 1) call this
redactor before File.write!/Path.join(run_dir, "current-working-tree.patch")
(write a redacted patch on disk or omit sensitive hunks), 2) return the redacted
map from collect_evidence, and 3) ensure any caller that injects the evidence
into the repair prompt (the code that consumes collect_evidence) also invokes
redact_sensitive_evidence/1 immediately before constructing the prompt so
nothing unredacted is sent to external models. Use the existing symbols
collect_evidence/4, fetch_state/1, run_shell/3 and command_output/3 to locate
where to apply the redaction.
- Around line 512-531: with_lock currently fails fast on {:error, :eexist} and
never recovers stale locks; modify with_lock (and use lock_path,
SelfHealingConfig) to detect and recover stale lock files by checking the lock
file's mtime (File.stat or File.lstat) and comparing it to a TTL (hardcode a
reasonable default or read from config), remove the lock if it's older than the
TTL, then retry File.open(lock_path, [:write, :exclusive]) once more; ensure you
handle races by treating a subsequent :eexist as a real active lock and return
%RunResult{status: :skipped, ...}, and preserve the existing cleanup in the
after block to File.rm(lock_path) when the lock is acquired.
- Around line 302-334: The reduce_while currently turns repair agent failures
into the same {:error, history} shape as validation failures, hiding the root
cause; update the branch that handles {:error, reason} from repair_fun.(...) to
halt with a distinct error payload (e.g. {:halt, {:error, {:repair_agent,
reason}}}) so agent errors are preserved, then adjust the final case after the
reduce_while to pattern-match and return agent failures verbatim (propagate
{:error, {:repair_agent, reason}}) separately from the validation-failure branch
that still returns {:error, "validation failed after ..."}; use the existing
symbols repair_fun, build_repair_prompt, validate, validation_success?, and
encode_results to locate the logic to change.
- Around line 275-277: The worktree is being created from HEAD instead of the
configured base branch; update the git worktree creation command in
lib/symphony/self_heal.ex so the "git worktree add -B #{shell(branch)}
#{shell(worktree_path)} ..." uses #{shell(config.self_healing.base_branch)} (the
configured base branch) instead of HEAD, ensuring the repair branch (branch) is
created from config.self_healing.base_branch; locate the command near where
branch and worktree_path are used and replace the HEAD token accordingly.
In `@lib/symphony/watchdog.ex`:
- Around line 104-125: The watchdog loops (functions loop and loop_config)
currently call run_once every poll and ignore self_healing.cooldown_ms and
self_healing.max_attempts; modify these loops to carry simple state across
iterations (e.g., last_trigger_time and consecutive_attempts stored on the
manager or passed alongside config), and before invoking run_once check that
either consecutive_attempts < self_healing.max_attempts and (now -
last_trigger_time) >= self_healing.cooldown_ms; if a self-heal is attempted
increment consecutive_attempts and set last_trigger_time, if a poll reports
healthy reset consecutive_attempts to 0 and clear last_trigger_time, and ensure
watchdog_interval(config) still governs sleep; update both loop and loop_config
to implement this short-circuit/cooldown behavior and reference run_once,
watchdog_interval, and the self_healing.{cooldown_ms,max_attempts} config
fields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dbc5ea63-685a-4ff1-897e-113703fc4ac5
📒 Files selected for processing (21)
README.mdWORKFLOW.linear-mcp.example.mdWORKFLOW.mddocs/IMPLEMENTATION.mdlaunchd/com.caretta.symphony.local.plistlaunchd/com.caretta.symphony.watchdog.plistlib/symphony/cli.exlib/symphony/config.exlib/symphony/models.exlib/symphony/orchestrator.exlib/symphony/review.exlib/symphony/self_heal.exlib/symphony/tracker.exlib/symphony/watchdog.exscripts/symphony-managed.shtest/orchestrator_test.exstest/review_test.exstest/self_heal_test.exstest/tracker_test.exstest/watchdog_test.exstest/workflow_config_template_test.exs
✅ Files skipped from review due to trivial changes (3)
- launchd/com.caretta.symphony.local.plist
- launchd/com.caretta.symphony.watchdog.plist
- lib/symphony/models.ex
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/symphony/orchestrator.ex
| {opts, args, _invalid} = | ||
| OptionParser.parse(argv, | ||
| switches: [ | ||
| port: :integer, | ||
| log_level: :string, | ||
| once: :boolean, | ||
| watchdog: :boolean, | ||
| self_heal_once: :boolean, | ||
| restart_managed: :boolean, | ||
| reason: :string, | ||
| help: :boolean | ||
| ], | ||
| aliases: [p: :port, h: :help] | ||
| ) |
There was a problem hiding this comment.
Fail fast on invalid CLI flags.
_invalid is ignored, so typoed options can silently start the default runtime path. Reject invalid options with usage + non-zero exit.
Suggested patch
- {opts, args, _invalid} =
+ {opts, args, invalid} =
OptionParser.parse(argv,
switches: [
...
)
workflow_path = List.first(args)
- if Keyword.get(opts, :help, false) do
+ cond do
+ invalid != [] ->
+ IO.puts(:stderr, "Invalid options: #{Enum.map_join(invalid, ", ", fn {k, _v} -> "--#{k}" end)}")
+ IO.puts(:stderr, usage())
+ 1
+
+ Keyword.get(opts, :help, false) ->
IO.puts(usage())
0
- else
+ true ->
try do
manager = ConfigManager.new(workflow_path)Also applies to: 34-37
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/cli.ex` around lines 17 - 30, The OptionParser.parse call
currently ignores the {_opts, args, _invalid} third element so mistyped flags
silently proceed; update the parsing logic (the tuple assignment that binds
{_opts, args, _invalid} from OptionParser.parse) to inspect the third element
(currently named _invalid) and, if it's non-empty, print the usage/help (same
output as the existing help path) and exit with a non-zero status; apply this
same fix to the second parse site around the lines referenced (both places where
OptionParser.parse returns {opts, args, _invalid}) so any invalid CLI flags
cause a usage message and non-zero exit instead of continuing.
| defp collect_evidence(%ServiceConfig{} = config, reason, run_dir, opts) do | ||
| root = repo_root(config) | ||
| port = config.self_healing.restart_port | ||
| diff = run_shell("git diff --binary HEAD", root, opts) | ||
| File.write!(Path.join(run_dir, "current-working-tree.patch"), diff.output || "") | ||
|
|
||
| %{ | ||
| "reason" => reason, | ||
| "captured_at" => Utils.isoformat_z(Utils.now_utc()), | ||
| "repo_root" => root, | ||
| "workflow_path" => config.workflow_path, | ||
| "git_status" => command_output("git status --short", root, opts), | ||
| "git_head" => command_output("git rev-parse HEAD", root, opts), | ||
| "git_branch" => command_output("git branch --show-current", root, opts), | ||
| "git_remote" => command_output("git remote -v", root, opts), | ||
| "state" => fetch_state(port), | ||
| "process" => | ||
| command_output("ps -ax -o pid,ppid,stat,command | grep '[s]ymphony'", root, opts), | ||
| "listeners" => command_output("lsof -nP -iTCP:#{port} -sTCP:LISTEN", root, opts), | ||
| "stdout_tail" => | ||
| command_output( | ||
| "tail -n 200 /var/tmp/caretta-symphony.out.log 2>/dev/null || true", | ||
| root, | ||
| opts | ||
| ), | ||
| "stderr_tail" => | ||
| command_output( | ||
| "tail -n 200 /var/tmp/caretta-symphony.err.log 2>/dev/null || true", | ||
| root, | ||
| opts | ||
| ) | ||
| } |
There was a problem hiding this comment.
Redact evidence before persisting and before sending it to the repair model.
Current evidence includes command tails, API state payload, remotes, and full patch content, then Line 122 injects it into the repair prompt. That can leak secrets/PII into external model context and disk artefacts.
Also applies to: 104-123
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/self_heal.ex` around lines 235 - 266, collect_evidence currently
returns raw outputs (patch, tails, remotes, state) and writes the full patch to
disk; create a redact_sensitive_evidence/1 helper that strips or masks
secrets/PII from keys like "current-working-tree.patch"/patch content,
"git_remote", "stdout_tail", "stderr_tail", and the payload returned from
fetch_state/1, then: 1) call this redactor before File.write!/Path.join(run_dir,
"current-working-tree.patch") (write a redacted patch on disk or omit sensitive
hunks), 2) return the redacted map from collect_evidence, and 3) ensure any
caller that injects the evidence into the repair prompt (the code that consumes
collect_evidence) also invokes redact_sensitive_evidence/1 immediately before
constructing the prompt so nothing unredacted is sent to external models. Use
the existing symbols collect_evidence/4, fetch_state/1, run_shell/3 and
command_output/3 to locate where to apply the redaction.
| "git fetch origin #{shell(config.self_healing.base_branch)} --quiet || true", | ||
| "git worktree add -B #{shell(branch)} #{shell(worktree_path)} HEAD" | ||
| ] |
There was a problem hiding this comment.
Use the configured base branch when creating the worktree.
Line 276 creates the repair branch from HEAD, not from self_healing.base_branch. That can produce PRs from the wrong ancestry.
Suggested patch
- commands = [
- "git fetch origin #{shell(config.self_healing.base_branch)} --quiet || true",
- "git worktree add -B #{shell(branch)} #{shell(worktree_path)} HEAD"
- ]
+ base = config.self_healing.base_branch
+ commands = [
+ "git fetch origin #{shell(base)} --quiet",
+ "git worktree add -B #{shell(branch)} #{shell(worktree_path)} #{shell("origin/#{base}")}"
+ ]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "git fetch origin #{shell(config.self_healing.base_branch)} --quiet || true", | |
| "git worktree add -B #{shell(branch)} #{shell(worktree_path)} HEAD" | |
| ] | |
| base = config.self_healing.base_branch | |
| commands = [ | |
| "git fetch origin #{shell(base)} --quiet", | |
| "git worktree add -B #{shell(branch)} #{shell(worktree_path)} #{shell("origin/#{base}")}" | |
| ] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/self_heal.ex` around lines 275 - 277, The worktree is being
created from HEAD instead of the configured base branch; update the git worktree
creation command in lib/symphony/self_heal.ex so the "git worktree add -B
#{shell(branch)} #{shell(worktree_path)} ..." uses
#{shell(config.self_healing.base_branch)} (the configured base branch) instead
of HEAD, ensuring the repair branch (branch) is created from
config.self_healing.base_branch; locate the command near where branch and
worktree_path are used and replace the HEAD token accordingly.
| Enum.reduce_while(1..max_attempts, {:error, []}, fn attempt, {_status, history} -> | ||
| prompt = | ||
| build_repair_prompt( | ||
| reason, | ||
| Map.put(evidence, "validation_history", history), | ||
| attempt, | ||
| max_attempts | ||
| ) | ||
|
|
||
| case repair_fun.(config, worktree_path, prompt, attempt, opts) do | ||
| {:ok, _text} -> | ||
| validation = validate(worktree_path, config.self_healing.validation_commands, opts) | ||
|
|
||
| history = | ||
| history ++ [%{"attempt" => attempt, "validation" => encode_results(validation)}] | ||
|
|
||
| if validation_success?(validation) do | ||
| {:halt, {:ok, attempt, validation}} | ||
| else | ||
| {:cont, {:error, history}} | ||
| end | ||
|
|
||
| {:error, reason} -> | ||
| {:halt, {:error, "repair agent failed: #{reason}"}} | ||
| end | ||
| end) | ||
| |> case do | ||
| {:ok, attempts, validation} -> | ||
| {:ok, attempts, validation} | ||
|
|
||
| {:error, history} -> | ||
| {:error, "validation failed after #{max_attempts} attempt(s): #{Jason.encode!(history)}"} | ||
| end |
There was a problem hiding this comment.
Do not collapse agent failures into “validation failed”.
When repair_fun returns {:error, reason}, it is later reported as validation failed after .... That is the wrong failure mode and obscures root cause.
Suggested patch
- Enum.reduce_while(1..max_attempts, {:error, []}, fn attempt, {_status, history} ->
+ Enum.reduce_while(1..max_attempts, {:validation_failed, []}, fn attempt, {_status, history} ->
...
- {:cont, {:error, history}}
+ {:cont, {:validation_failed, history}}
...
- {:halt, {:error, "repair agent failed: #{reason}"}}
+ {:halt, {:agent_error, reason}}
end
end)
|> case do
{:ok, attempts, validation} ->
{:ok, attempts, validation}
- {:error, history} ->
+ {:agent_error, reason} ->
+ {:error, "repair agent failed: #{reason}"}
+
+ {:validation_failed, history} ->
{:error, "validation failed after #{max_attempts} attempt(s): #{Jason.encode!(history)}"}
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Enum.reduce_while(1..max_attempts, {:error, []}, fn attempt, {_status, history} -> | |
| prompt = | |
| build_repair_prompt( | |
| reason, | |
| Map.put(evidence, "validation_history", history), | |
| attempt, | |
| max_attempts | |
| ) | |
| case repair_fun.(config, worktree_path, prompt, attempt, opts) do | |
| {:ok, _text} -> | |
| validation = validate(worktree_path, config.self_healing.validation_commands, opts) | |
| history = | |
| history ++ [%{"attempt" => attempt, "validation" => encode_results(validation)}] | |
| if validation_success?(validation) do | |
| {:halt, {:ok, attempt, validation}} | |
| else | |
| {:cont, {:error, history}} | |
| end | |
| {:error, reason} -> | |
| {:halt, {:error, "repair agent failed: #{reason}"}} | |
| end | |
| end) | |
| |> case do | |
| {:ok, attempts, validation} -> | |
| {:ok, attempts, validation} | |
| {:error, history} -> | |
| {:error, "validation failed after #{max_attempts} attempt(s): #{Jason.encode!(history)}"} | |
| end | |
| Enum.reduce_while(1..max_attempts, {:validation_failed, []}, fn attempt, {_status, history} -> | |
| prompt = | |
| build_repair_prompt( | |
| reason, | |
| Map.put(evidence, "validation_history", history), | |
| attempt, | |
| max_attempts | |
| ) | |
| case repair_fun.(config, worktree_path, prompt, attempt, opts) do | |
| {:ok, _text} -> | |
| validation = validate(worktree_path, config.self_healing.validation_commands, opts) | |
| history = | |
| history ++ [%{"attempt" => attempt, "validation" => encode_results(validation)}] | |
| if validation_success?(validation) do | |
| {:halt, {:ok, attempt, validation}} | |
| else | |
| {:cont, {:validation_failed, history}} | |
| end | |
| {:error, reason} -> | |
| {:halt, {:agent_error, reason}} | |
| end | |
| end) | |
| |> case do | |
| {:ok, attempts, validation} -> | |
| {:ok, attempts, validation} | |
| {:agent_error, reason} -> | |
| {:error, "repair agent failed: #{reason}"} | |
| {:validation_failed, history} -> | |
| {:error, "validation failed after #{max_attempts} attempt(s): #{Jason.encode!(history)}"} | |
| end |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/self_heal.ex` around lines 302 - 334, The reduce_while currently
turns repair agent failures into the same {:error, history} shape as validation
failures, hiding the root cause; update the branch that handles {:error, reason}
from repair_fun.(...) to halt with a distinct error payload (e.g. {:halt,
{:error, {:repair_agent, reason}}}) so agent errors are preserved, then adjust
the final case after the reduce_while to pattern-match and return agent failures
verbatim (propagate {:error, {:repair_agent, reason}}) separately from the
validation-failure branch that still returns {:error, "validation failed after
..."}; use the existing symbols repair_fun, build_repair_prompt, validate,
validation_success?, and encode_results to locate the logic to change.
| defp with_lock(%SelfHealingConfig{} = config, fun) do | ||
| lock_path = Path.join(config.workspace_root, "self-heal.lock") | ||
| File.mkdir_p!(Path.dirname(lock_path)) | ||
|
|
||
| case File.open(lock_path, [:write, :exclusive]) do | ||
| {:ok, io} -> | ||
| try do | ||
| IO.write(io, "#{System.os_time(:second)}\n") | ||
| File.close(io) | ||
| fun.() | ||
| after | ||
| File.rm(lock_path) | ||
| end | ||
|
|
||
| {:error, :eexist} -> | ||
| %RunResult{status: :skipped, error: "another self-heal run is active"} | ||
|
|
||
| {:error, reason} -> | ||
| %RunResult{status: :error, error: "failed to create self-heal lock: #{inspect(reason)}"} | ||
| end |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Recover stale lock files, or self-heal can stay disabled indefinitely.
If the process dies before cleanup, self-heal.lock remains and every future run is skipped on :eexist. There is no stale-lock recovery path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/self_heal.ex` around lines 512 - 531, with_lock currently fails
fast on {:error, :eexist} and never recovers stale locks; modify with_lock (and
use lock_path, SelfHealingConfig) to detect and recover stale lock files by
checking the lock file's mtime (File.stat or File.lstat) and comparing it to a
TTL (hardcode a reasonable default or read from config), remove the lock if it's
older than the TTL, then retry File.open(lock_path, [:write, :exclusive]) once
more; ensure you handle races by treating a subsequent :eexist as a real active
lock and return %RunResult{status: :skipped, ...}, and preserve the existing
cleanup in the after block to File.rm(lock_path) when the lock is acquired.
| defp loop(manager, opts) do | ||
| {manager, _changed} = ConfigManager.reload_if_changed(manager) | ||
| {_manager, _workflow, config} = ConfigManager.current(manager) | ||
| log_once(run_once(config, opts)) | ||
| Process.sleep(watchdog_interval(config)) | ||
| loop(manager, opts) | ||
| rescue | ||
| error -> | ||
| IO.puts(:stderr, "Symphony watchdog poll failed: #{Exception.message(error)}") | ||
| Process.sleep(30_000) | ||
| loop(manager, opts) | ||
| end | ||
|
|
||
| defp loop_config(config, opts) do | ||
| log_once(run_once(config, opts)) | ||
| Process.sleep(watchdog_interval(config)) | ||
| loop_config(config, opts) | ||
| rescue | ||
| error -> | ||
| IO.puts(:stderr, "Symphony watchdog poll failed: #{Exception.message(error)}") | ||
| Process.sleep(30_000) | ||
| loop_config(config, opts) |
There was a problem hiding this comment.
Throttle self-heal attempts instead of firing on every unhealthy poll.
The loop never uses self_healing.cooldown_ms or self_healing.max_attempts, so a stale or unreachable service will trigger repair again on the very next poll. Here that is dangerous: self-heal is not a read-only action; it deploys artefacts, restarts the managed process, pushes branches, and opens PRs.
Carry watchdog state between iterations — at least last trigger time and consecutive attempt count — and short-circuit until cooldown expires. Reset the counter after a healthy poll.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/symphony/watchdog.ex` around lines 104 - 125, The watchdog loops
(functions loop and loop_config) currently call run_once every poll and ignore
self_healing.cooldown_ms and self_healing.max_attempts; modify these loops to
carry simple state across iterations (e.g., last_trigger_time and
consecutive_attempts stored on the manager or passed alongside config), and
before invoking run_once check that either consecutive_attempts <
self_healing.max_attempts and (now - last_trigger_time) >=
self_healing.cooldown_ms; if a self-heal is attempted increment
consecutive_attempts and set last_trigger_time, if a poll reports healthy reset
consecutive_attempts to 0 and clear last_trigger_time, and ensure
watchdog_interval(config) still governs sleep; update both loop and loop_config
to implement this short-circuit/cooldown behavior and reference run_once,
watchdog_interval, and the self_healing.{cooldown_ms,max_attempts} config
fields.
|
Superseded by #4, the combined dev -> main integration PR. |
Summary
Validation
mix format --check-formattedmix testmix escript.build./symphony WORKFLOW.mdcurl /api/v1/stateand the local dashboard athttp://127.0.0.1:8765/Notes
mainis blocked by repository rules, so this branch contains the latest commit for PR review.Summary by CodeRabbit
Rewrite
New Features
Documentation
Configuration
Tests