Skip to content

Harden record-pears so Slack failures can't crash the board#561

Merged
marcdel merged 3 commits into
mainfrom
harden/record-pears-slack-resilience
Jun 10, 2026
Merged

Harden record-pears so Slack failures can't crash the board#561
marcdel merged 3 commits into
mainfrom
harden/record-pears-slack-resilience

Conversation

@marcdel

@marcdel marcdel commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Follow-up to #560. Stacked on #560 — until that merges, this PR's diff also includes #560's commits (OTP bump + CI unblock); review the Harden record-pears… commit for this change, or merge after #560.

#560 fixes the root cause (the OTP 27.2 TLS bug). This makes record-pears resilient so a slow/failing Slack post can't crash Save or time out the browser.

Changes

  • Pears.Slack.do_send_message/3 rescues transport-level exceptions and returns {:error, _} instead of raising — no caller can be crashed by a Slack hiccup.
  • record-pears posts the summary off the reply path (self-sent message + handle_info) and flashes the user when the post fails, instead of silently dropping the result.
  • Fixed the fallback case branch that returned a bare O11y.set_error/1 instead of {:noreply, socket} (itself a crash).
  • Dropped the redundant push_navigate(to: "/") on success — the board already refreshes via the PubSub team-updated broadcast.

Design note

Approved design floated a Task.Supervisor; I used a self-sent message + handle_info instead — same goals (off reply path, crash-safe, failure feedback), deterministically testable, no new supervision infra. Trade-off: a genuinely slow Slack call blocks the LiveView process (not the client reply); acceptable since #560 removes the hang and the call is now crash-safe. Can switch to a Task if you prefer true concurrency.

Tests (TDD)

  • slack_test: send returns {:error,_} instead of raising when the client blows up.
  • pairing_board_live_test: clicking Save still records + warns the user without crashing when Slack fails.
  • Full suite: 290 tests, 0 failures; compile (warnings-as-errors)/format/credo green.

🤖 Generated with Claude Code

marcdel and others added 3 commits June 10, 2026 10:40
OTP 27.2.x has a TLS certificate-validation bug that rejects Slack's
(recently rotated) certificate chain with a fatal `key_usage_mismatch`
alert — an intermediate carrying serverAuth EKU alongside CA key usage.
This raised out of SlackClient.send_message and crashed the record-pears
LiveView handler when posting the daily pairs summary, so "Save" timed
out in the browser ({timeout: true}) and no message posted.

Reproduced with a bare :ssl.connect to api.slack.com:
  - OTP 27.2.1:            fails (key_usage_mismatch)
  - OTP 27.3.4.11 / 28.5:  handshake succeeds

Bumps the pin in .tool-versions, both CI matrices, and the Dockerfile
builder image (with the matching debian-bookworm-20260518 snapshot that
has a published hexpm image; runner image moves with it).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two issues kept main red and undeployable:

- `mix deps.unlock --check-unused` failed because :castore became unused
  after the Phoenix 1.8 upgrade. Removed it from the lock.
- config/runtime.exs called System.fetch_env! for SLACK_CLIENT_ID,
  SLACK_CLIENT_SECRET and SLACK_SIGNING_SECRET unconditionally, crashing
  the app at boot in test/CI whenever SLACK_SIGNING_SECRET is unset. These
  are now required only in :prod; dev/test fall back to placeholders.

Note: prod (fly app pears-app) is still missing the SLACK_SIGNING_SECRET
secret, so it must be set before the next deploy will boot.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The record-pears handler posted the daily summary synchronously and
ignored the result, so a raised Slack error (e.g. the OTP 27.2 TLS
handshake bug) crashed the LiveView and the click timed out in the
browser. Even with that bug fixed, a slow or failing Slack post
shouldn't be able to take down "Save".

- Pears.Slack.do_send_message now rescues transport-level exceptions and
  returns {:error, _} instead of raising.
- record-pears posts the summary off the reply path (self-sent message +
  handle_info) so it can't block the response, and flashes the user when
  the post fails instead of silently dropping it.
- Fixed the fallback case branch that returned a bare O11y.set_error/1
  instead of {:noreply, socket}, which would itself crash the handler.
- Dropped the now-unnecessary push_navigate on the success path; the
  board already refreshes via the PubSub team-updated broadcast.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@marcdel marcdel merged commit 762708b into main Jun 10, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant