Skip to content

feat(daemon): default daemon mode, hardcoded wait timeout, session kill#532

Merged
RyderFreeman4Logos merged 4 commits intomainfrom
feat/daemon-default-and-session-kill
Mar 31, 2026
Merged

feat(daemon): default daemon mode, hardcoded wait timeout, session kill#532
RyderFreeman4Logos merged 4 commits intomainfrom
feat/daemon-default-and-session-kill

Conversation

@RyderFreeman4Logos
Copy link
Copy Markdown
Owner

Summary

  • Daemon mode is now the default for csa run — returns SID immediately, runs in background. Use --no-daemon to opt back into blocking mode. Old --daemon flag is kept as hidden no-op.
  • csa session wait timeout hardcoded to 250s--timeout parameter removed. Callers no longer choose wait duration.
  • New csa session kill subcommand — sends SIGTERM to daemon process group, with 5s grace period before SIGKILL escalation. Uses new daemon.pid file written by spawn_daemon.
  • All 10 pattern families updated — removed --daemon flags and session wait --timeout parameters to match new defaults.

Test plan

  • csa run "prompt" returns SID immediately (0.016s) without --daemon flag
  • csa run --no-daemon "prompt" blocks until completion
  • csa session wait --help shows no --timeout parameter
  • csa session kill --session <SID> successfully terminates daemon
  • daemon.pid file written to session directory on spawn
  • All 18 patterns pass weave compile
  • 16/16 tests pass
  • just pre-commit clean

🤖 Generated with Claude Code

RyderFreeman4Logos and others added 3 commits March 31, 2026 06:59
Three behavioral changes:

1. `csa run` now launches in daemon mode by default (returns SID
   immediately). Use `--no-daemon` to opt back into blocking mode.
   The old `--daemon` flag is kept as a hidden no-op for compatibility.

2. `csa session wait` timeout is hardcoded to 250s. The `--timeout`
   parameter has been removed — callers no longer choose wait duration.

3. New `csa session kill` subcommand sends SIGTERM to the daemon
   process group, with 5s grace period before SIGKILL escalation.

All 10 pattern families updated: removed `--daemon` flags and
`session wait --timeout` parameters to match the new defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… kill

spawn_daemon now writes a `daemon.pid` file to the session directory,
which `read_daemon_pid` reads as the primary PID source. The previous
approach (parsing stderr.log for the RPJ directive) failed because the
directive is written to the parent's stderr, not the session's.

Keeps the stderr fallback for legacy sessions spawned before this fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review findings R3/R4: use libc::pid_t instead of bare i32
cast, and check kill() return value with error reporting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ac6da6dcb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

cd: Option<String>,
) -> Result<i32> {
/// Hardcoded wait timeout in seconds.
const WAIT_TIMEOUT_SECS: u64 = 250;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore configurable timeout for session wait

Hardcoding WAIT_TIMEOUT_SECS to 250 causes csa session wait to return 124 after ~4 minutes even when the daemon is still running, which regresses long-running delegated tasks. In this same change, workflows such as patterns/pr-bot/workflow.toml still launch runs with --timeout 1800/--timeout 5400 and treat non-zero session wait as failure, so valid runs now fail early and can incorrectly mark bot/rebase gates as failed while work continues in the background.

Useful? React with 👍 / 👎.

Comment on lines +230 to +231
let pgid = -(pid as libc::pid_t);
let rc = unsafe { libc::kill(pgid, libc::SIGTERM) };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Verify daemon identity before killing process group

This sends signals to -pid based solely on daemon.pid, but there is no validation that the PID still belongs to the target session daemon (for example, if the session already ended and the PID was reused). In that case csa session kill can terminate an unrelated process group; adding a session-completion check and process-identity validation before signaling would prevent accidental cross-process kills.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the cli-sub-agent to make daemon mode the default for the run command, introduces a new session kill command, and implements a daemon.pid file for better process tracking. It also removes the configurable timeout from the session wait command in favor of a hardcoded 250-second limit. Feedback highlights several critical issues: the argument forwarding logic loses global flags passed before the run subcommand, the use of unstable Rust let_chains will cause compilation failures on stable toolchains, and the session kill implementation lacks PID validation which could lead to unintended system-wide process termination. Additionally, the hardcoded timeout is noted as a regression for long-running tasks, and the liveness check in the kill command should verify the entire process group rather than just the leader.

Comment on lines +29 to +34
let forwarded_args: Vec<String> = all_args
.iter()
.skip(run_pos + 1)
.filter(|a| *a != "--daemon") // daemon is now default; strip no-op flag
.cloned()
.collect();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current logic for collecting forwarded_args skips all arguments before the run subcommand. This causes global flags (e.g., --format json) passed before run to be lost when spawning the daemon child. The code should preserve arguments both before and after the run subcommand to ensure the child process inherits the same global configuration, as suggested by the comment on line 26.

Suggested change
let forwarded_args: Vec<String> = all_args
.iter()
.skip(run_pos + 1)
.filter(|a| *a != "--daemon") // daemon is now default; strip no-op flag
.cloned()
.collect();
let mut forwarded_args: Vec<String> = all_args.iter().take(run_pos).skip(1).cloned().collect();
forwarded_args.extend(
all_args
.iter()
.skip(run_pos + 1)
.filter(|a| *a != "--daemon") // daemon is now default; strip no-op flag
.cloned(),
)

Comment on lines +25 to +27
if let Ok(content) = fs::read_to_string(&pid_path)
&& let Ok(pid) = content.trim().parse()
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This code uses let_chains (e.g., if let ... && let ...), which is an unstable feature in Rust (tracking issue #53667). Unless the project is using a nightly toolchain with the #![feature(let_chains)] attribute, this will fail to compile on stable Rust. This pattern is used in several other places in this file (lines 32, 71, 186) and should be replaced with nested if let or match statements for compatibility with stable Rust.

Suggested change
if let Ok(content) = fs::read_to_string(&pid_path)
&& let Ok(pid) = content.trim().parse()
{
if let Ok(content) = fs::read_to_string(&pid_path) {
if let Ok(pid) = content.trim().parse() {
return Some(pid);
}
}

)
})?;

if !is_pid_alive(pid) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The PID read from the session directory should be validated before use in libc::kill. If pid is 0 or 1, negating it to create a process group ID (pgid) can have dangerous side effects. Specifically, kill(-1, SIGTERM) sends the signal to every process the caller has permission to signal (except init), which could lead to unintended system-wide process termination. Please ensure pid > 1 before proceeding.

Suggested change
if !is_pid_alive(pid) {
if pid <= 1 {
anyhow::bail!("Invalid daemon PID {} for session {}", pid, resolved.session_id);
}
if !is_pid_alive(pid) {

cd: Option<String>,
) -> Result<i32> {
/// Hardcoded wait timeout in seconds.
const WAIT_TIMEOUT_SECS: u64 = 250;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hardcoding the wait timeout to 250 seconds is a significant regression in flexibility. Several patterns updated in this PR (e.g., pr-bot at line 358) explicitly expect to wait up to 10 minutes or more. This change will cause long-running background tasks to be reported as timed out by orchestrator scripts even if the agent is still working correctly. Consider keeping the timeout configurable or setting a much higher default (e.g., 3600s).


// Wait for reaping.
std::thread::sleep(std::time::Duration::from_millis(500));
if is_pid_alive(pid) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The final liveness check is_pid_alive(pid) only verifies the session leader process. If the leader has exited but other processes in the process group are still running, the command will report success while leaving runaway processes. Since the pgid is available, consider using libc::kill(pgid, 0) == 0 (where pgid is the negative value calculated at line 230) to check if the entire process group has been terminated.

Guard against invalid daemon PIDs (0 or 1) which would send signals
to the caller's process group or init. Addresses gemini-code-assist
security finding on PR #532.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@RyderFreeman4Logos
Copy link
Copy Markdown
Owner Author

Bot Review Arbitration — Audit Trail

Reviewed by: CSA adversarial debate (session 01KN24DZR62K8GDT0T85JGD0ZN)
Verdict: 6 false positives dismissed, 1 real issue fixed

Fixed (Category C)

  • gemini SEC-HIGH (PID validation): Added pid <= 1 guard in handle_session_kill to prevent sending signals to init or caller's process group. Fixed in commit 1aa4a6a.

Dismissed False Positives (Category B)

Finding Bot Rationale
250s hardcoded timeout (×3) codex P1, gemini MEDIUM ×2 By explicit user design: user requested wait子命令不应该可以设timeout, 而是写死250s. Pre-production software (rule 019).
PID reuse risk codex P2 Standard Unix: PID reuse within 5s grace is astronomically unlikely. setsid creates unique process group. kill() on dead PID returns ESRCH.
Global flags lost in daemon gemini HIGH Daemon child writes to log files (stdout.log/stderr.log), not parent's stdout. Parent's --format flag is irrelevant for file-captured output.
let_chains unstable gemini HIGH Factual error: let_chains stabilized in Rust 1.87.0. This project uses rust-version = "1.88", compiled with rustc 1.94.1.
Liveness only checks leader gemini MEDIUM Standard Unix: kill(-pgid, SIGTERM) sends to ALL group members. Leader death confirms signal delivery. Orphaned children reparented to init.

All 6 false positives confirmed with HIGH confidence by independent adversarial debate.

@RyderFreeman4Logos
Copy link
Copy Markdown
Owner Author

/gemini review

@RyderFreeman4Logos
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1aa4a6a17f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

fi
set +e
GATE_RESULT="$(csa session wait --session "${GATE_SID}" --timeout 5400)"
GATE_RESULT="$(csa session wait --session "${GATE_SID}")"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve long wait windows for delegated workflow steps

This change removes per-call wait budgeting and now always times out csa session wait after ~250s, but this workflow still launches delegated tasks that are explicitly allowed to run much longer (--timeout 5400 plus instructions to wait 5+10 minutes per cycle). In this post-rebase gate path, GATE_RESULT can return non-zero before the delegated daemon finishes, causing REBASE_REVIEW_HAS_ISSUES/FALLBACK_REVIEW_HAS_ISSUES to be set and aborting the gate even when the background task would have succeeded.

Useful? React with 👍 / 👎.

Comment on lines +237 to +238
let pgid = -(pid as libc::pid_t);
let rc = unsafe { libc::kill(pgid, libc::SIGTERM) };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard session kill against stale PID reuse

The new kill path sends signals to -pid (entire process group) based only on daemon.pid liveness, without validating that the PID still belongs to this session. If the session already ended and the PID was reused, csa session kill can terminate an unrelated process group owned by the same user, which is a high-impact safety regression for shared/long-lived hosts.

Useful? React with 👍 / 👎.

@RyderFreeman4Logos
Copy link
Copy Markdown
Owner Author

Round 2 Audit Trail

Codex round 2 findings (2 × P1) are identical categories to round 1:

  1. Timeout 250s — by explicit user design (see audit trail above)
  2. PID reuse guard — addressed via pid <= 1 validation + debate confirmation (see audit trail above)

No new issues discovered. All findings covered by existing audit trail. Proceeding to merge.

@RyderFreeman4Logos RyderFreeman4Logos merged commit 65b780e into main Mar 31, 2026
4 of 6 checks passed
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request changes the default behavior of the csa run command to daemon mode, deprecating the --daemon flag and introducing a --no-daemon option for foreground execution. It also implements a new csa session kill command to terminate daemon sessions using process groups and updates csa session wait to use a hardcoded 250-second timeout. Additionally, the system now writes a daemon.pid file for more reliable process tracking. Feedback was provided to improve the robustness of the session kill grace period by using std::time::Instant instead of a fixed loop count to ensure a more accurate 5-second timeout.

Comment on lines +245 to +251
for _ in 0..50 {
if !is_pid_alive(pid) {
eprintln!("Session {} terminated", resolved.session_id);
return Ok(());
}
std::thread::sleep(std::time::Duration::from_millis(100));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The for loop for the grace period is not guaranteed to last exactly 5 seconds, as it doesn't account for the execution time of is_pid_alive(pid). Using std::time::Instant ensures the total grace period is more accurate, which improves the robustness of the timeout logic.

Suggested change
for _ in 0..50 {
if !is_pid_alive(pid) {
eprintln!("Session {} terminated", resolved.session_id);
return Ok(());
}
std::thread::sleep(std::time::Duration::from_millis(100));
}
let start = std::time::Instant::now();
while start.elapsed() < std::time::Duration::from_secs(5) {
if !is_pid_alive(pid) {
eprintln!("Session {} terminated", resolved.session_id);
return Ok(());
}
std::thread::sleep(std::time::Duration::from_millis(100));
}

RyderFreeman4Logos added a commit that referenced this pull request Mar 31, 2026
Generalize the daemon spawn mechanism (introduced in PR #532 for `csa run`)
to also apply to `csa review` and `csa debate`. All three execution commands
now return a session ID immediately and run in the background by default.
`--no-daemon` opts into blocking mode.

Key changes:
- Add `subcommand` field to `DaemonSpawnConfig` (replaces hardcoded "run")
- Extract `check_daemon_flags()` helper in `run_cmd_daemon.rs`, shared by
  all three commands (run/review/debate) — eliminates inline duplication
- Add daemon flags to `ReviewArgs` and `DebateArgs` (daemon, no_daemon,
  daemon_child, session_id)
- Fix #534: rename `daemon_session_id` → `session_id` to match the
  `--session-id` flag that `spawn_daemon()` passes to the child process
- Propagate session exit code from `result.toml` in `session wait`
- Convert all workflow patterns to daemon+wait idiom:
  `SID=$(csa review ...) && csa session wait --session "$SID"`
- Add `serial_test` crate for env-var-mutating tests (fixes flaky failures)

Design note: `session wait` uses a 250s hardcoded timeout by design — this
returns control to the caller periodically so the caller's KV cache stays
warm. Callers should retry `session wait` after timeout, not treat it as
failure. The daemon child continues running independently of the wait timeout.

Closes #534

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant