Skip to content

fix: clarify update network failures#1684

Merged
2 commits merged into
nesquena:masterfrom
Michaelyklam:fix/issue-1321-update-network-error
May 5, 2026
Merged

fix: clarify update network failures#1684
2 commits merged into
nesquena:masterfrom
Michaelyklam:fix/issue-1321-update-network-error

Conversation

@Michaelyklam
Copy link
Copy Markdown
Contributor

Thinking Path

Issue #1321 was narrowed by maintainer comments to a frontend UX gap: fetch('/api/updates/apply') can reject before any HTTP response arrives, so the browser's raw Failed to fetch text leaked into the Update Now error UI. Server-reachable updater failures already return structured JSON and should keep the existing targeted message path.

What Changed

  • Added frontend detection for update-apply network/interrupted fetch failures (Failed to fetch, NetworkError, Load failed).
  • Replaced the raw browser error with recovery-oriented guidance: the WebUI may have restarted or the connection was interrupted; wait, reload, and check the server if needed.
  • Kept structured server JSON errors flowing through _showUpdateError(target, res) unchanged.
  • Added an in-flight guard so repeated Update Now clicks do not send duplicate apply requests during restart-race windows.
  • Added source-level regression coverage for the clearer network message, structured JSON error path, and duplicate-submit guard.

Why It Matters

Users hitting a dying-process, stale-port, service-worker, or interrupted connection during self-update now get actionable recovery steps instead of the opaque browser string Failed to fetch. The duplicate-submit guard also reduces the likely double-click race that can hit the server while it is already restarting.

Evidence / UI Media

Mocked TypeError: Failed to fetch from /api/updates/apply on an isolated local WebUI server:

Update network failure message

Verification

  • RED: /home/michael/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_update_apply_ui.py -q initially failed on the missing friendly network-error formatter and in-flight guard.
  • node --check static/ui.js
  • /home/michael/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_update_apply_ui.py tests/test_updates.py tests/test_version_badge.py tests/test_service_worker_api_cache.py tests/test_subpath_frontend_routes.py -q45 passed in 1.89s
  • git diff --check
  • Browser validation on temporary server http://127.0.0.1:18791/?test_updates=1 with synthetic mocked failures:
    • mocked TypeError: Failed to fetch shows the recovery-oriented message and does not show only raw Failed to fetch.
    • mocked {ok:false,message:'structured server failure'} still shows Update failed (webui): structured server failure.

Risks / Follow-ups

  • The fix intentionally stays frontend-only and does not redesign the updater or environmental diagnostics.
  • The duplicate-click guard remains active through successful update/restart, and re-enables after failures; network/interrupted failures use a short 5s cooldown before retry.

Closes #1321

Model Used

AI-assisted with Hermes Agent using OpenAI Codex gpt-5.5; tools used included terminal, file editing, browser validation, and GitHub CLI.

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Review

Reading the PR diff against origin/master and pulling the worktree at 03949f8, the change does exactly what the issue thread converged on: distinguish a fetch-rejected-before-response case from server-reachable JSON failures, and add an in-flight guard so a double-click during the dying-process restart window can't land a second POST.

Code I read

  • static/ui.js PR HEAD lines 3205-3257 (the new _isUpdateApplyNetworkError, _formatUpdateApplyExceptionMessage, applyUpdates rewrite, plus the existing _showUpdateError it calls into).
  • static/ui.js on origin/master lines 3205-3234 (the original raw 'Update failed: '+e.message catch).
  • static/workspace.js:1-40 — the shared api() wrapper, since the catch in applyUpdates is what receives whatever this throws.
  • api/updates.py:386-496 and api/routes.py:4319-4325 — confirms every server-reachable failure goes through _apply_update_inner returning structured JSON via j(handler, …), so the try body's _showUpdateError(target,res) path covers all server-side failures and the catch(e) block is genuinely only reached on transport-level failure.
  • tests/test_update_apply_ui.py (new, 42 lines) — same source-assertion pattern already in use at tests/test_ollama_model_chip_label_regression.py:5.

What I like

The browser-string regex is the right shape — Failed to fetch (Chrome), NetworkError (Firefox), Load failed (Safari) covers all three engines, and the /i flag is conservative. The 5s cooldown only on the network branch, with resetApplyButton(0) everywhere else, is the right tradeoff: structured server errors should be retryable immediately, transport failures benefit from a brief debounce. And keeping _showUpdateError(target,res) untouched means the Force update recovery flow (forceBtn.dataset.target = target) and conflict-message paths stay intact, which is what I'd want from a frontend-only patch.

The interaction with static/workspace.js:1 api() is also clean: api() already retries TypeError up to 2 times before re-throwing, so by the time the catch(e) here fires, three attempts have already been burned. The 5s cooldown after that is appropriate — the connection has been confirmed dead, give the restart a moment.

One small nit (non-blocking)

api() distinguishes HTTP-error throws (which set err.status) from transport throws (which do not). The current _isUpdateApplyNetworkError only inspects error.message, which means a server response whose body happens to contain the literal string "Failed to fetch" would be misclassified. Practically impossible given how j(handler, dict) serializes, but it'd be more defensive to gate on !error.status:

function _isUpdateApplyNetworkError(error){
  if(error && error.status) return false;  // got an HTTP response
  const message=(error&&error.message)||String(error||'');
  return /Failed to fetch|NetworkError|Load failed/i.test(message);
}

Optional — happy to merge as-is and revisit if it ever surfaces.

Test coverage

The three new tests are source-level assertions, consistent with how the rest of the JS-touching tests in this repo are written (tests/test_ollama_model_chip_label_regression.py, tests/test_subpath_frontend_routes.py). They lock down: (a) the new recovery message text, (b) that the structured-server-error path still goes through _showUpdateError before resetApplyButton(0), (c) that the _updateApplyInFlight early-return is in applyUpdates. All three CI workers (3.11/3.12/3.13) green.

Verification I did

Read-only diff inspection only, per cron policy — I didn't run anything from the worktree. CI passing across 3.11/3.12/3.13 covers what I'd otherwise have run locally.

LGTM. Will let nesquena take the merge call.

@Michaelyklam
Copy link
Copy Markdown
Contributor Author

Addressed the optional reviewer nit in d51510a.

What changed:

  • _isUpdateApplyNetworkError(error) now returns false immediately when error.status is present, so HTTP-response errors are not treated as transport/fetch failures.
  • Added source-level regression coverage for the status guard while keeping Failed to fetch / NetworkError / Load failed covered as transport failures.

Verification:

  • RED: pytest tests/test_update_apply_ui.py::test_update_apply_network_error_classifier_ignores_http_status_errors -q failed before the guard.
  • node --check static/ui.js
  • pytest tests/test_update_apply_ui.py tests/test_update_banner_fixes.py tests/test_updates.py tests/test_update_checker.py -q → 56 passed
  • git diff --check
  • gh pr checks 1684 --repo nesquena/hermes-webui --watch --interval 10 → test (3.11), test (3.12), test (3.13) all passed

Cleanup: disposable worktree /tmp/hermes-webui-kanban/t_041bd198/pr1684 was removed.

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Follow-up review on d51510a

Reading the diff at d51510a against my previous review point: the optional nit is now addressed exactly the way I sketched it.

function _isUpdateApplyNetworkError(error){
  if(error && error.status) return false;
  const message=(error&&error.message)||String(error||'');
  return /Failed to fetch|NetworkError|Load failed/i.test(message);
}

The error.status short-circuit lands before the message regex, so any thrown error carrying an HTTP status (which is how static/workspace.js:1 api() re-throws non-2xx responses) bypasses the transport-error classification entirely. That closes the theoretical misclassification I called out, and the regex still covers Chrome / Firefox / Safari for actual transport failures.

New test coverage

tests/test_update_apply_ui.py:34-44 (test_update_apply_network_error_classifier_ignores_http_status_errors) is the right shape — it slices out just the function body between _isUpdateApplyNetworkError and _formatUpdateApplyExceptionMessage, then asserts both:

assert "if(error&&error.status)returnfalse;" in compact
assert body.index("error.status") < body.index("/Failed to fetch|NetworkError|Load failed/i")

The ordering check is what I'd want — if a future refactor moves the regex check before the status guard, this test fails and tells you why. Source-level pattern matches the rest of the file's tests (test_update_apply_structured_server_errors_still_use_json_message_path, test_update_apply_prevents_duplicate_apply_requests_while_in_flight).

CI

3.11/3.12/3.13 green per statusCheckRollup. RED-GREEN history reported in the previous comment confirms the new test would have caught this regression on master.

LGTM. Ready for nesquena to merge. Closes #1321.

@nesquena-hermes nesquena-hermes closed this pull request by merging all changes into nesquena:master in 4daa238 May 5, 2026
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Closed by the v0.51.4 release in PR #1707 (merged at 4daa238, deployed to production).

Live on production: https://github.com/nesquena/hermes-webui/releases/tag/v0.51.4

🚀

Michaelyklam pushed a commit to Michaelyklam/hermes-webui that referenced this pull request May 5, 2026
Michaelyklam added a commit to Michaelyklam/hermes-webui that referenced this pull request May 5, 2026
10 PRs (3 surfaces additions, 7 fixes):
- nesquena#1644 model picker chip + group count (@bergeouss, closes nesquena#1425)
- nesquena#1684 update network failures UX (@Michaelyklam, closes nesquena#1321)
- nesquena#1685 Codex spark models (@Michaelyklam, closes nesquena#1680)
- nesquena#1689 normalize profile base homes (@Michaelyklam, refs nesquena#749)
- nesquena#1693 adaptive title refresh deadlock (@ai-ag2026)
- nesquena#1701 normalize update banner URL (@Michaelyklam, closes nesquena#1691)
- nesquena#1702 workspace double-click rename (@Michaelyklam, closes nesquena#1698)
- nesquena#1703 cache invalidation on auth-store drift (@Michaelyklam, closes nesquena#1699)
- nesquena#1704 markdown fence lengths (@Michaelyklam, closes nesquena#1696)
- nesquena#1706 multi-image paste fix (@Michaelyklam, closes nesquena#1697)

Tests: 4477 → 4503 (+26). Opus: SHIP, 7/7 verification clean.

Co-authored-by: Michael Lam <Michaelyklam1@gmail.com>
Co-authored-by: ai-ag2026 <noreply@github.com>
Co-authored-by: bergeouss <noreply@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update failed: Failed to Fetch

2 participants