WebUI memory leak causes the session to completely freeze #2233

@LineTang

Environment

  • Hermes Agent: hermes-webui (latest, GitHub: nesquena/hermes-webui)
  • OS: WSL2 Ubuntu 26.04 on Windows
  • Hardware: Samsung Galaxy Book Flex (i5-1035G4, 12GB RAM, WSL2 memory limit 2GB)
  • Model: mimo-v2.5-pro via xiaomi provider

Bug Description

After extended use, the WebUI server process (server.py) memory grows to 1GB+ RSS, causing all API endpoints to time out. The frontend then falls back to returning locally cached content — specifically the compression marker — for any user input, making the session appear completely broken.

Steps to Reproduce

  1. Start WebUI on port 7002
  2. Have a long conversation (500+ messages, 800KB session file) with context compression triggered
  3. Continue using the WebUI for ~30+ minutes without restart
  4. Observe that sending any message immediately returns [Your active task list was preserved across context compression] — no API call is made

Evidence

Process memory at time of failure

PID 15005 — started 11:38, observed at ~11:47 (9 minutes uptime)
  %CPU 17.1  %MEM 56.3  VIRT 3.4GB  RSS 1.07GB
  command: python server.py (port 7002)
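
To capture the growth over time rather than from a single snapshot, the RSS of the server process could be sampled periodically. The following is a minimal sketch, assuming a Linux/WSL2 `/proc` filesystem; the function names `parse_vmrss_kib` and `sample_rss` are made up for illustration and are not part of hermes-webui.

```python
import re
import time
from pathlib import Path
from typing import List, Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid: int, interval_s: float = 10.0, samples: int = 6) -> List[int]:
    """Collect up to `samples` RSS readings (KiB) for `pid`, one every `interval_s` seconds."""
    readings: List[int] = []
    for i in range(samples):
        # /proc/<pid>/status is Linux-specific; this raises if the process has exited.
        text = Path(f"/proc/{pid}/status").read_text()
        rss = parse_vmrss_kib(text)
        if rss is not None:
            readings.append(rss)
            print(f"{time.strftime('%H:%M:%S')}  pid={pid}  rss={rss / 1024:.1f} MiB")
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Calling `sample_rss(15005)` right after a restart would show whether RSS climbs steadily or jumps when a large session is loaded.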

API timeout

$ curl -s http://127.0.0.1:7002/api/health/agent
# Timed out after 60s — no response
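
The same check can be scripted so that a hung server (accepts the connection, never answers) is distinguished from one that is not listening at all. This is a sketch only: `probe_health` and its return labels are hypothetical, and the only detail taken from this report is the `/api/health/agent` endpoint on port 7002.

```python
import socket
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the target is a loopback address.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_health(url: str, timeout_s: float = 5.0) -> str:
    """Classify one health-check attempt as 'ok', 'http_error', 'timeout', or 'unreachable'."""
    try:
        with _OPENER.open(url, timeout=timeout_s):
            return "ok"
    except urllib.error.HTTPError:
        # The server answered, but with a non-2xx status.
        return "http_error"
    except urllib.error.URLError as exc:
        # Connect-phase failures are wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"
    except (socket.timeout, TimeoutError):
        # Read-phase timeouts are raised directly while waiting for the response.
        return "timeout"
    except OSError:
        # Connection reset or dropped before a response arrived.
        return "unreachable"
```

For the failure described here, `probe_health("http://127.0.0.1:7002/api/health/agent", 60)` would be expected to return `"timeout"` rather than `"unreachable"`, since the process is still alive and listening.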

Crash diagnostics (auto-collected by keepalive)

crash-20260514_114711/ — WebUI health check failed, keepalive restarted

Session data (backend is healthy)

  • 11 independent sessions all triggered context compression at message index 104
  • After compression, the backend continued normally — all sessions show valid tool calls and assistant responses for 400+ messages after compression
  • The failure is confined to the WebUI layer: when server.py is unresponsive, the browser-side code returns cached content without attempting an API call

Affected session

session_20260514_065326_8a3de3.json — 544 messages, 796KB, mimo-v2.5-pro

Expected Behavior

  • WebUI should not leak memory during normal use
  • If the API is unreachable, the frontend should show an error toast/banner, not silently return stale content
  • The compression marker [Your active task list was preserved...] should never be returned as a "response" to user input
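
The intended fallback order can be sketched as below. It is written in Python only to illustrate the logic; the real handling would live in the browser-side code (ui.js), and `Reply`, `resolve_reply`, and `COMPRESSION_MARKER_PREFIX` are hypothetical names, not existing hermes-webui identifiers.

```python
from dataclasses import dataclass
from typing import Callable

# Prefix of the marker text quoted in this report (assumed stable for the sketch).
COMPRESSION_MARKER_PREFIX = "[Your active task list was preserved"


@dataclass
class Reply:
    text: str
    is_error: bool = False


def resolve_reply(send_to_api: Callable[[str], str], user_input: str) -> Reply:
    """Return the model's reply, or an explicit error; never stale cached content."""
    try:
        text = send_to_api(user_input)
    except (TimeoutError, ConnectionError) as exc:
        # API unreachable: surface an error instead of falling back to the cache.
        return Reply(f"API unreachable: {exc!r}", is_error=True)
    if text.lstrip().startswith(COMPRESSION_MARKER_PREFIX):
        # The internal marker must not be shown as if the model had said it.
        return Reply("Internal marker received instead of a model response", is_error=True)
    return Reply(text)
```

The point of the sketch is that both failure paths end in a visible error state, so the user never sees the same marker repeated as a "response".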

Actual Behavior

  • server.py RSS grows to 1GB+ (56% of system memory) within ~9 minutes
  • All API calls time out
  • Frontend returns the compression marker text as if it were the model's response, with no API round-trip
  • User sees the same message repeated for every input — appears as if the model is "stuck"
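
To narrow down where the memory goes, two `tracemalloc` snapshots taken inside the server process could be compared. This is a generic diagnostic sketch using the Python standard library, not something hermes-webui is known to ship; `top_growth` is a hypothetical helper, and where to call it inside server.py is left open.

```python
import tracemalloc
from typing import List


def top_growth(
    before: tracemalloc.Snapshot,
    after: tracemalloc.Snapshot,
    limit: int = 10,
) -> List[tracemalloc.StatisticDiff]:
    """Return the source lines whose allocated size grew the most between two snapshots."""
    # compare_to() sorts by absolute size difference, largest first.
    diffs = after.compare_to(before, "lineno")
    # Keep only lines that grew; lines that shrank are not leak candidates.
    return [d for d in diffs if d.size_diff > 0][:limit]
```

A possible usage: call `tracemalloc.start()` at startup, take one snapshot with `tracemalloc.take_snapshot()` before loading the 544-message session and another a few minutes later, then print each entry returned by `top_growth(before, after)`.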

Workaround

Kill the WebUI process and let keepalive restart it:

kill $(pgrep -f "server.py.*7002")
# keepalive auto-restarts within 5s

Additional Notes

  • The session file itself is not corrupted — all 544 messages are valid
  • The compression itself works correctly — the backend continues normally after compression
  • The _isPreservedCompressionTaskListMessage() function at ui.js:4544 detects this specific message pattern, so the frontend already has special handling for it; when the server is unreachable, however, the marker still surfaces as a "response"
  • This may be related to how the frontend handles SSE stream disconnection when the server is overloaded

Labels: bug, needinfo, performance
