Description
When multiple agents are active and using a local Ollama backend, their LLM requests queue behind each other. Each queued agent waits silently — no tokens flowing, no activity the heartbeat monitor can see — and gets falsely declared Crashed before its request is even reached.
The underlying cause: run_agent_loop_streaming does not call KernelHandle::touch_agent before invoking the LLM, so the heartbeat clock is never reset when a streaming call begins. The non-streaming run_agent_loop has this call (with an explicit comment explaining its purpose); the streaming path does not. The result is that any chat session using the WebChat UI or a WebSocket client can trigger a crash-recovery loop whenever generation takes longer than the configured timeout — which is routine for large local models (qwen3.5:35b, llama3:70b, etc.).
The cascade makes it worse: each auto-recovery re-queues another streaming LLM request to Ollama, which is already saturated, lengthening every other agent's wait and causing them to crash in turn.
Compare the two loops:
- crates/openfang-runtime/src/agent_loop.rs:446-450 (non-streaming) — calls k.touch_agent(&agent_id_str) with the comment: "Stamp last_active before the (potentially long) LLM call so the heartbeat monitor doesn't flag us as unresponsive mid-iteration."
- crates/openfang-runtime/src/agent_loop.rs:1660-1670 (streaming) — no equivalent touch before stream_with_retry(...).
Secondary concern: even the non-streaming path only touches once at iteration start. For providers slower than the full timeout window, a single in-flight call still trips the heartbeat. The streaming path is ideal for finer-grained touches — one per chunk received would make the false-positive mathematically impossible as long as tokens are arriving.
Expected Behavior
An agent actively waiting on or receiving streamed tokens from its LLM provider should not be marked Crashed by the heartbeat monitor. last_active should reflect "last evidence of forward progress", not "last completed iteration".
Steps to Reproduce
- Configure two or more agents with a local Ollama provider (any model where generation exceeds 60s):
[default_model]
provider = "ollama"
model = "qwen3.5:35b"
- Start the daemon:
openfang start --config config.toml
- Open the WebChat dashboard and send messages to two agents simultaneously.
- Watch logs. Within 180s you will see:
WARN openfang_kernel::heartbeat: Agent is unresponsive agent=<name> inactive_secs=210 timeout_secs=180
WARN openfang_kernel::kernel: Unresponsive Running agent marked as Crashed for recovery
INFO openfang_kernel::kernel: Auto-recovering crashed agent (attempt 1/3)
- Each recovery re-queues another streaming LLM request, compounding Ollama load and perpetuating the loop. Both chats appear frozen in the UI.
Autonomous agents are hit harder: heartbeat_interval_secs * UNRESPONSIVE_MULTIPLIER (default 2) produces timeouts as short as 60s, and the crash loop never resolves because each recovery restarts the same generation.
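To make that arithmetic concrete (the interval and throughput figures here are illustrative assumptions, not measured values): with heartbeat_interval_secs = 30 in the manifest, the effective timeout is 30 * 2 = 60s, while a 35B-class model streaming at roughly 10 tokens/s needs on the order of 100s for a 1000-token reply, so every non-trivial iteration overruns the window.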
Proposed Fix
Minimum fix — mirror the non-streaming path:
// crates/openfang-runtime/src/agent_loop.rs, inside run_agent_loop_streaming,
// immediately before the stream_with_retry call at ~line 1660:
if let Some(k) = &kernel {
    k.touch_agent(&agent_id_str);
}
Better fix — touch on every streamed chunk. StreamEvent::Delta (or equivalent) already fires per token/chunk; wrapping the stream consumer to call touch_agent on each event eliminates the false-positive entirely while local generation is making progress, and still lets the heartbeat catch a genuinely stalled stream.
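A minimal sketch of that per-chunk variant, assuming futures::StreamExt is in scope, that stream_with_retry yields an async stream of StreamEvent values, and that kernel / agent_id_str are the same bindings the streaming loop already uses; the real item and error types in agent_loop.rs may differ:
// Hedged sketch: stream_with_retry's exact signature and the stream's item/error
// types are assumptions; only the placement of the touch is the point.
let mut stream = stream_with_retry(/* existing arguments */).await?;
while let Some(event) = stream.next().await {
    // Every received event is evidence of forward progress, so reset the
    // heartbeat clock; the monitor then fires only if tokens stop arriving.
    if let Some(k) = &kernel {
        k.touch_agent(&agent_id_str);
    }
    match event? {
        StreamEvent::Delta(chunk) => { /* existing per-chunk handling */ }
        _other => { /* existing handling for stop/usage/tool events */ }
    }
}
Combined with the pre-call touch from the minimum fix, last_active stays fresh from the moment the request is issued until the stream completes, while a stream that stops producing events still times out as before.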
Either fix is self-contained to agent_loop.rs plus (for the better fix) the stream pump. No config surface changes required.
Workarounds (current)
Reactive agents — add to config.toml (hot-reload safe):
[heartbeat]
default_timeout_secs = 600
Autonomous agents — default_timeout_secs has no effect. Must raise heartbeat_interval_secs in the agent manifest. No API surface for this; requires container stop + direct openfang.db edit (agents.manifest is MessagePack-encoded).
OpenFang Version
Reproduced on 0.5.10. The same missing touch_agent call is present on 0.6.0 (main @ e6bab99, crates/openfang-runtime/src/agent_loop.rs:1657), so this bug carries forward unchanged — PR #1090 patches both.
Operating System
Linux (x86_64) — Ubuntu 25.10, kernel 6.19.3, Framework Desktop / AMD Ryzen AI Max+ 395. Ollama via Vulkan backend.
Logs
2026-04-20T00:32:30Z WARN openfang_kernel::heartbeat: Agent is crashed — eligible for recovery agent=collector-hand inactive_secs=30
2026-04-20T00:33:04Z WARN openfang_runtime::agent_loop: Max tokens hit (streaming), continuing iteration=0
2026-04-20T00:34:00Z WARN openfang_kernel::heartbeat: Agent is unresponsive agent=collector-hand inactive_secs=89 timeout_secs=60
2026-04-20T00:34:00Z WARN openfang_kernel::kernel: Unresponsive Running agent marked as Crashed for recovery agent=collector-hand inactive_secs=89
2026-04-20T00:35:30Z WARN openfang_kernel::heartbeat: Agent is unresponsive agent="DevOps Engineer" inactive_secs=210 timeout_secs=180
Notes for PR
- Commit style candidate:
fix(runtime): stamp last_active in streaming agent loop to prevent heartbeat false-positives
- Test coverage:
crates/openfang-kernel/src/heartbeat.rs already has test_active_agent_within_timeout_is_ok — extend it with a case where a streaming run lasts longer than the default timeout but per-chunk touches land in between.
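A rough shape for that test; every name below apart from the existing test is hypothetical and would need to be mapped onto the helpers heartbeat.rs actually exposes (monitor construction, mock clock, touch):
#[test]
fn test_streaming_touches_prevent_false_positive() {
    // Hypothetical helpers: a monitor with a 60s timeout and a controllable clock.
    let mut monitor = HeartbeatMonitor::with_timeout(Duration::from_secs(60));
    let agent = monitor.register("streaming-agent");
    // Simulate a 180s generation in which a chunk (and therefore a touch) lands every 10s.
    for _ in 0..18 {
        monitor.advance(Duration::from_secs(10));
        monitor.touch(&agent);
        assert!(
            !monitor.is_unresponsive(&agent),
            "per-chunk touches should keep a long streaming run healthy"
        );
    }
}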
- release-fast profile verification:
cargo clippy -p openfang-runtime --all-targets -- -D warnings
cargo test -p openfang-runtime
cargo test -p openfang-kernel