Description
When multiple agents are active and using a local Ollama backend, their LLM requests queue behind each other. Each queued agent waits silently — no tokens flowing, no activity the heartbeat monitor can see — and gets falsely declared Crashed before its request is even reached.
The underlying cause: run_agent_loop_streaming does not call KernelHandle::touch_agent before invoking the LLM, so the heartbeat clock is never reset when a streaming call begins. The non-streaming run_agent_loop has this call (with an explicit comment explaining its purpose); the streaming path does not. The result is that any chat session using the WebChat UI or a WebSocket client can trigger a crash-recovery loop whenever generation takes longer than the configured timeout — which is routine for large local models (qwen3.5:35b, llama3:70b, etc.).
The cascade makes it worse: each auto-recovery re-queues another streaming LLM request to Ollama, which is already saturated, lengthening every other agent's wait and causing them to crash in turn.
Compare the two loops:
- crates/openfang-runtime/src/agent_loop.rs:446-450 (non-streaming) — calls k.touch_agent(&agent_id_str) with the comment: "Stamp last_active before the (potentially long) LLM call so the heartbeat monitor doesn't flag us as unresponsive mid-iteration."
- crates/openfang-runtime/src/agent_loop.rs:1660-1670 (streaming) — no equivalent touch before stream_with_retry(...).
Secondary concern: even the non-streaming path only touches once at iteration start. For providers slower than the full timeout window, a single in-flight call still trips the heartbeat. The streaming path is ideal for finer-grained touches — one per chunk received would make the false-positive mathematically impossible as long as tokens are arriving.
Expected Behavior
An agent actively waiting on or receiving streamed tokens from its LLM provider should not be marked Crashed by the heartbeat monitor. last_active should reflect "last evidence of forward progress", not "last completed iteration".
Steps to Reproduce
- Configure two or more agents with a local Ollama provider (any model where generation exceeds 60s):
[default_model]
provider = "ollama"
model = "qwen3.5:35b"
- Start the daemon:
openfang start --config config.toml
- Open the WebChat dashboard and send messages to two agents simultaneously.
- Watch logs. Within 180s you will see:
WARN openfang_kernel::heartbeat: Agent is unresponsive agent=<name> inactive_secs=210 timeout_secs=180
WARN openfang_kernel::kernel: Unresponsive Running agent marked as Crashed for recovery
INFO openfang_kernel::kernel: Auto-recovering crashed agent (attempt 1/3)
- Each recovery re-queues another streaming LLM request, compounding Ollama load and perpetuating the loop. Both chats appear frozen in the UI.
Autonomous agents are hit harder: heartbeat_interval_secs * UNRESPONSIVE_MULTIPLIER (default 2) produces timeouts as short as 60s, and the crash loop never resolves because each recovery restarts the same generation.
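To make that arithmetic concrete (the interval and throughput figures here are illustrative assumptions, not measured values): with heartbeat_interval_secs = 30 in the manifest, the effective timeout is 30 * 2 = 60s, while a 35B-class model streaming at roughly 10 tokens/s needs on the order of 100s for a 1000-token reply, so every non-trivial iteration overruns the window.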
Proposed Fix
Minimum fix — mirror the non-streaming path:
// crates/openfang-runtime/src/agent_loop.rs, inside run_agent_loop_streaming,
// immediately before the stream_with_retry call at ~line 1660:
if let Some(k) = &kernel {
    k.touch_agent(&agent_id_str);
}
Better fix — touch on every streamed chunk. StreamEvent::Delta (or equivalent) already fires per token/chunk; wrapping the stream consumer to call touch_agent on each event eliminates the false-positive entirely while local generation is making progress, and still lets the heartbeat catch a genuinely stalled stream.
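A minimal sketch of that per-chunk variant, assuming futures::StreamExt is in scope, that stream_with_retry yields an async stream of StreamEvent values, and that kernel / agent_id_str are the same bindings the streaming loop already uses; the real item and error types in agent_loop.rs may differ:
// Hedged sketch: stream_with_retry's exact signature and the stream's item/error
// types are assumptions; only the placement of the touch is the point.
let mut stream = stream_with_retry(/* existing arguments */).await?;
while let Some(event) = stream.next().await {
    // Every received event is evidence of forward progress, so reset the
    // heartbeat clock; the monitor then fires only if tokens stop arriving.
    if let Some(k) = &kernel {
        k.touch_agent(&agent_id_str);
    }
    match event? {
        StreamEvent::Delta(chunk) => { /* existing per-chunk handling */ }
        _other => { /* existing handling for stop/usage/tool events */ }
    }
}
Combined with the pre-call touch from the minimum fix, last_active stays fresh from the moment the request is issued until the stream completes, while a stream that stops producing events still times out as before.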
Either fix is self-contained to agent_loop.rs plus (for the better fix) the stream pump. No config surface changes required.
Workarounds (current)
Reactive agents — add to config.toml (hot-reload safe):
[heartbeat]
default_timeout_secs = 600
Autonomous agents — default_timeout_secs has no effect. Must raise heartbeat_interval_secs in the agent manifest. No API surface for this; requires container stop + direct openfang.db edit (agents.manifest is MessagePack-encoded).
OpenFang Version
Reproduced on 0.5.10. The same missing touch_agent call is present on 0.6.0 (main @ e6bab99, crates/openfang-runtime/src/agent_loop.rs:1657), so this bug carries forward unchanged — PR #1090 patches both.
Operating System
Linux (x86_64) — Ubuntu 25.10, kernel 6.19.3, Framework Desktop / AMD Ryzen AI Max+ 395. Ollama via Vulkan backend.
Logs
2026-04-20T00:32:30Z WARN openfang_kernel::heartbeat: Agent is crashed — eligible for recovery agent=collector-hand inactive_secs=30
2026-04-20T00:33:04Z WARN openfang_runtime::agent_loop: Max tokens hit (streaming), continuing iteration=0
2026-04-20T00:34:00Z WARN openfang_kernel::heartbeat: Agent is unresponsive agent=collector-hand inactive_secs=89 timeout_secs=60
2026-04-20T00:34:00Z WARN openfang_kernel::kernel: Unresponsive Running agent marked as Crashed for recovery agent=collector-hand inactive_secs=89
2026-04-20T00:35:30Z WARN openfang_kernel::heartbeat: Agent is unresponsive agent="DevOps Engineer" inactive_secs=210 timeout_secs=180
Notes for PR
- Commit style candidate:
fix(runtime): stamp last_active in streaming agent loop to prevent heartbeat false-positives
- Test coverage:
crates/openfang-kernel/src/heartbeat.rs already has test_active_agent_within_timeout_is_ok — extend it with a case where a streaming run lasts longer than the default timeout but per-chunk touches land in between.
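A rough shape for that test; every name below apart from the existing test is hypothetical and would need to be mapped onto the helpers heartbeat.rs actually exposes (monitor construction, mock clock, touch):
#[test]
fn test_streaming_touches_prevent_false_positive() {
    // Hypothetical helpers: a monitor with a 60s timeout and a controllable clock.
    let mut monitor = HeartbeatMonitor::with_timeout(Duration::from_secs(60));
    let agent = monitor.register("streaming-agent");
    // Simulate a 180s generation in which a chunk (and therefore a touch) lands every 10s.
    for _ in 0..18 {
        monitor.advance(Duration::from_secs(10));
        monitor.touch(&agent);
        assert!(
            !monitor.is_unresponsive(&agent),
            "per-chunk touches should keep a long streaming run healthy"
        );
    }
}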
- release-fast profile verification:
cargo clippy -p openfang-runtime --all-targets -- -D warnings
cargo test -p openfang-runtime
cargo test -p openfang-kernel