
feat(quiet-hours): add quiet hours scheduling to notification alert rules #2615

Merged
koala73 merged 9 commits into main from feat/news-alerts-quiet-hours
Apr 2, 2026

Conversation

koala73 (Owner) commented Apr 2, 2026

Summary

  • Adds quietHoursEnabled, quietHoursStart, quietHoursEnd, quietHoursTimezone, quietHoursOverride fields to Convex alertRules schema
  • New setQuietHours / setQuietHoursForUser Convex mutations with 0-23 bounds validation
  • set-quiet-hours action wired through /relay/notification-channels HTTP endpoint and edge function
  • QuietHoursOverride type + setQuietHours() service function added to client
  • Enforcement in notification-relay.cjs: resolveQuietAction() applied per rule before dispatch

Three override modes:

| Mode | Non-critical during quiet hours | Critical |
| --- | --- | --- |
| critical_only (default) | Silently dropped | Delivers immediately |
| silence_all | Silently dropped | Also dropped |
| batch_on_wake | Held in Redis (TTL 24h) | Delivers immediately |

drainBatchOnWake() runs on each processEvent() call — flushes held batches as a mini-digest when quiet hours end for a user.

Windows that span midnight work correctly (e.g. 23:00-07:00): start >= end inverts the comparison.
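The inverted comparison can be sketched as a small pure function. This is a minimal sketch under two assumptions: the helper is named `isInQuietHours` (the PR does not state the internal name) and the end hour is exclusive; the relay's actual code may differ.

```js
// Sketch of the midnight-spanning window check described above.
// Assumptions: hours are integers 0-23 and the end hour is exclusive.
function isInQuietHours(hour, start, end) {
  if (start >= end) {
    // Window spans midnight, e.g. 23:00-07:00: quiet if we are past the
    // start OR before the end.
    return hour >= start || hour < end;
  }
  // Same-day window, e.g. 09:00-17:00.
  return hour >= start && hour < end;
}
```

For example, with a 23:00-07:00 window, hour 1 is quiet and hour 12 is not.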

Test plan

  • Set quiet hours 22-07 UTC on a test rule; send a medium severity event — relay logs Quiet hours suppress
  • Set batch_on_wake; send event during quiet hours — LLEN digest:quiet-held:... shows 1 entry
  • Advance clock past quietHoursEnd; next event triggers drain — held batch delivered and key deleted
  • Send critical event during any quiet hours mode — delivers immediately
  • Midnight-spanning window (23:00-07:00): at hour 01 → suppressed; at hour 12 → delivers

Post-Deploy Monitoring & Validation

  • Log queries: [relay] Quiet hours suppress / [relay] Quiet hours hold / [relay] drainBatchOnWake: sent
  • Expected healthy behavior: suppress/hold logs appear during configured quiet windows; drain logs appear at quiet-hours-end
  • Failure signal: drainBatchOnWake: channel fetch failed or keys persisting past 24h TTL without drain
  • Validation window: 48h after deploy, during at least one full quiet-hours cycle
  • No schema migration needed: all new fields are v.optional()

…ules

Users can now configure quiet hours on their alert rules to suppress or hold
non-critical notifications during specified times (e.g., 11pm-7am local time).

Three override modes:
- critical_only (default): only critical severity passes through; all others dropped
- silence_all: complete silence including critical alerts (shows warning in UI)
- batch_on_wake: non-critical events held in Redis (TTL 24h) and sent as a
  mini-digest when quiet hours end; critical still delivers immediately

Full stack: Convex schema + validators + mutations, relay/notification-channels
HTTP action, edge function, client service types.

Enforcement is in notification-relay.cjs: resolveQuietAction() is called per
rule before dispatch; held events RPUSH to digest:quiet-held:v1:{userId}:{variant}
and drainBatchOnWake() runs on each processEvent() call to flush held batches
for users whose quiet hours have ended.
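The per-rule decision described above can be sketched as a pure function. This is a hedged sketch: the field names follow the PR description, but the actual `resolveQuietAction` signature and return values in notification-relay.cjs are assumptions.

```js
// Illustrative sketch of the per-rule quiet-hours decision.
// Returns 'deliver' | 'suppress' | 'hold', mirroring the three modes
// described in the PR summary; the real implementation may differ.
function resolveQuietAction(rule, severity, localHour) {
  if (!rule.quietHoursEnabled) return 'deliver';

  // start >= end means the window spans midnight (e.g. 23:00-07:00).
  const inWindow = rule.quietHoursStart >= rule.quietHoursEnd
    ? localHour >= rule.quietHoursStart || localHour < rule.quietHoursEnd
    : localHour >= rule.quietHoursStart && localHour < rule.quietHoursEnd;
  if (!inWindow) return 'deliver';

  const mode = rule.quietHoursOverride ?? 'critical_only';
  if (mode === 'silence_all') return 'suppress';      // drops critical too
  if (severity === 'critical') return 'deliver';      // critical passes through
  return mode === 'batch_on_wake' ? 'hold' : 'suppress';
}
```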

vercel bot commented Apr 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| worldmonitor | Ignored Ignored | Preview | Apr 2, 2026 4:15pm |


greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR adds a quiet hours feature to notification alert rules, allowing users to suppress, silence, or batch-hold alerts during a configured time window, with three override modes (critical_only, silence_all, batch_on_wake). The schema and Convex mutations are cleanly implemented and backward-compatible (all new fields are optional). The full request path is correctly wired from the client service → edge function → Convex HTTP → internal mutation.

Key areas to address before merging:

  • drainBatchOnWake performance (scripts/notification-relay.cjs line 412): The drain routine is called inside processEvent on every incoming event that matches any rule, issuing at least one Redis LLEN per batch_on_wake user even when their held-event list is empty. This creates O(batch_users) extra Redis round-trips per event and will degrade relay throughput as user counts grow. A per-user timestamp throttle or a dedicated drain schedule is needed.
  • Non-atomic RPUSH + EXPIRE (scripts/notification-relay.cjs line 125): If the EXPIRE call silently fails (as signalled by upstashRest returning null), the held-events key has no TTL and can persist indefinitely. The failure is undetected — at minimum a warning log should be added.
  • Dedup bypassed for held events (scripts/notification-relay.cjs line 427): The checkDedup call is skipped for batch_on_wake events. Because the dedup TTL is only 30 minutes, a quiet window longer than 30 minutes allows the same alert to be held multiple times, producing duplicate entries in the digest.
  • No timezone validation (convex/alertRules.ts line 112): An invalid IANA timezone is accepted and persisted; at enforcement time toLocalHour catches the RangeError and returns -1, causing quiet hours to silently never apply.

Confidence Score: 2/5

  • Not safe to merge until the per-event drain overhead and non-atomic TTL issues in notification-relay.cjs are addressed.
  • Three P1 logic issues in the relay script could cause production problems: the per-event drainBatchOnWake call will degrade event dispatch throughput as users adopt batch_on_wake; the non-atomic RPUSH+EXPIRE can leave Redis keys without TTL on transient failures; and the bypassed dedup allows duplicate digest entries during long quiet windows. The Convex schema/mutation layer is solid, but the relay enforcement code needs fixes before this ships.
  • scripts/notification-relay.cjs requires the most attention — all three P1 issues are in this file. convex/alertRules.ts needs timezone validation added.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/notification-relay.cjs | Core enforcement logic for quiet hours — contains three P1 issues: drain runs on every event (O(n) Redis calls), RPUSH+EXPIRE are non-atomic leaving keys without TTL on failure, and held events bypass dedup allowing duplicate batch entries. |
| convex/alertRules.ts | Adds setQuietHours and setQuietHoursForUser mutations with integer-range validation; missing timezone validation means invalid IANA strings are silently persisted and cause quiet hours to never apply. |
| convex/schema.ts | Schema additions are all v.optional() — no migration required and backward compatible. |
| convex/constants.ts | Adds quietHoursOverrideValidator union; clean and consistent with existing validator patterns. |
| convex/http.ts | Adds set-quiet-hours action to the relay HTTP route; validates variant, quietHoursEnabled, and override enum before calling the internal mutation. |
| api/notification-channels.ts | Adds set-quiet-hours action to the edge function; input validation is duplicated across the edge layer and Convex HTTP layer but consistent. |
| src/services/notification-channels.ts | Adds QuietHoursOverride type and setQuietHours() client service function; straightforward and well-typed. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant EdgeFn as api/notification-channels.ts
    participant ConvexHTTP as convex/http.ts
    participant ConvexDB as Convex alertRules
    participant Relay as notification-relay.cjs
    participant Redis as Upstash Redis

    Client->>EdgeFn: POST set-quiet-hours (JWT)
    EdgeFn->>EdgeFn: validateBearerToken
    EdgeFn->>ConvexHTTP: POST /relay/notification-channels (RELAY_SECRET)
    ConvexHTTP->>ConvexHTTP: validate variant, quietHoursEnabled, override enum
    ConvexHTTP->>ConvexDB: setQuietHoursForUser mutation
    ConvexDB->>ConvexDB: validateQuietHoursArgs (range check)
    ConvexDB-->>ConvexHTTP: ok
    ConvexHTTP-->>EdgeFn: {ok: true}
    EdgeFn-->>Client: {ok: true}

    Note over Relay,Redis: On each incoming event...
    Relay->>ConvexDB: getByEnabled (all enabled rules)
    Relay->>Relay: resolveQuietAction(rule, severity)
    alt suppress (critical_only or silence_all during quiet hours)
        Relay->>Relay: log + skip
    else hold (batch_on_wake, non-critical)
        Relay->>Redis: RPUSH digest:quiet-held:userId:variant
        Relay->>Redis: EXPIRE key 86400
    else deliver
        Relay->>Relay: checkDedup
        Relay->>Relay: send via channel
    end
    Relay->>Relay: drainBatchOnWake(allRules)
    loop each batch_on_wake rule outside quiet hours
        Relay->>Redis: LLEN digest:quiet-held:userId:variant
        Relay->>Redis: LRANGE key
        Relay->>Relay: format digest message
        Relay->>Relay: send digest via channel
        Relay->>Redis: DEL key
    end
```


Comment on lines +427 to +430
```js
if (quietAction === 'hold') {
  console.log(`[relay] Quiet hours hold for ${rule.userId} — queuing for batch_on_wake`);
  await holdEvent(rule.userId, rule.variant ?? 'full', JSON.stringify(event));
  continue;
```

P1 Dedup check bypassed for batch_on_wake held events — duplicate entries possible

When a held event is queued (quietAction === 'hold'), the code continues before reaching checkDedup. Because the dedup TTL is only 30 minutes, a quiet window longer than 30 minutes allows the same event title/type to arrive again through the queue, pass the dedup check (TTL expired), and get held a second time in Redis. The digest would then contain duplicates.

Consider running the dedup check for held events as well:

```js
if (quietAction === 'hold') {
  const isNew = await checkDedup(rule.userId, event.eventType, event.payload?.title ?? '');
  if (!isNew) { console.log(`[relay] Dedup hit (hold) for ${rule.userId}`); continue; }
  console.log(`[relay] Quiet hours hold for ${rule.userId} — queuing for batch_on_wake`);
  await holdEvent(rule.userId, rule.variant ?? 'full', JSON.stringify(event));
  continue;
}
```

Comment on lines +84 to +96
```js
function toLocalHour(nowMs, timezone) {
  try {
    const parts = new Intl.DateTimeFormat('en-US', {
      timeZone: timezone,
      hour: 'numeric',
      hour12: false,
    }).formatToParts(new Date(nowMs));
    const h = parts.find(p => p.type === 'hour');
    return h ? parseInt(h.value, 10) : -1;
  } catch {
    return -1;
  }
}
```

P2 hour12: false may produce '24' for midnight on some platforms — use hourCycle: 'h23' explicitly

hour12: false asks the formatter to use 24-hour style, but ECMA-402 does not mandate a specific hourCycle. Some platforms (particularly older ICU builds or certain locales) resolve this to h24, which represents midnight as 24 rather than 0. Parsing 24 as an hour results in values outside the 0–23 range and will break the midnight-spanning window comparison (isInQuietHours).

Using hourCycle: 'h23' explicitly guarantees the 0–23 contract:

```js
const parts = new Intl.DateTimeFormat('en-US', {
  timeZone: timezone,
  hour: 'numeric',
  hourCycle: 'h23',
}).formatToParts(new Date(nowMs));
```

Comment on lines +112 to +122
```ts
function validateQuietHoursArgs(args: {
  quietHoursStart?: number;
  quietHoursEnd?: number;
}) {
  if (args.quietHoursStart !== undefined && (args.quietHoursStart < 0 || args.quietHoursStart > 23 || !Number.isInteger(args.quietHoursStart))) {
    throw new ConvexError("quietHoursStart must be an integer 0–23");
  }
  if (args.quietHoursEnd !== undefined && (args.quietHoursEnd < 0 || args.quietHoursEnd > 23 || !Number.isInteger(args.quietHoursEnd))) {
    throw new ConvexError("quietHoursEnd must be an integer 0–23");
  }
}
```

P2 No timezone validation — invalid timezone silently stored and causes quiet hours to silently fail

validateQuietHoursArgs validates the numeric range of quietHoursStart/quietHoursEnd but accepts any string for quietHoursTimezone. An invalid timezone (e.g. "America/Fake") is persisted to the database, and at enforcement time toLocalHour catches the RangeError and returns -1, causing isInQuietHours to always return false. Quiet hours silently never fire.

Consider validating the timezone at write time:

```ts
function validateQuietHoursArgs(args: {
  quietHoursStart?: number;
  quietHoursEnd?: number;
  quietHoursTimezone?: string;
}) {
  if (args.quietHoursStart !== undefined && (args.quietHoursStart < 0 || args.quietHoursStart > 23 || !Number.isInteger(args.quietHoursStart))) {
    throw new ConvexError("quietHoursStart must be an integer 0–23");
  }
  if (args.quietHoursEnd !== undefined && (args.quietHoursEnd < 0 || args.quietHoursEnd > 23 || !Number.isInteger(args.quietHoursEnd))) {
    throw new ConvexError("quietHoursEnd must be an integer 0–23");
  }
  if (args.quietHoursTimezone !== undefined) {
    try { Intl.DateTimeFormat(undefined, { timeZone: args.quietHoursTimezone }); }
    catch { throw new ConvexError("quietHoursTimezone is not a valid IANA timezone"); }
  }
}
```

Note: validateQuietHoursArgs is called in both setQuietHours and setQuietHoursForUser; the same fix covers both paths.

Comment on lines +412 to +413
```js
// Drain batch_on_wake held events for users whose quiet hours just ended
await drainBatchOnWake(enabledRules);
```

P1 drainBatchOnWake runs on every incoming event

drainBatchOnWake(enabledRules) is called inside processEvent for every event that matches at least one rule. It filters for users with batch_on_wake configured who are outside quiet hours, and issues at least one Redis LLEN call per such user. Once quiet hours end, every event triggers O(batch_users) extra Redis round-trips indefinitely — even for users with an empty held-events list — until the 24h TTL on their key eventually expires.

With even a modest number of batch_on_wake users this adds measurable latency to every event dispatch. Consider throttling the drain check with a coarse-grained in-memory timestamp guard so it runs at most once per minute per user rather than on every event.
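A minimal version of such a guard could look like this. It is a sketch, not the PR's code, and assumes a single relay process (an in-memory map does not coordinate across instances); `shouldCheckDrain` and the interval constant are illustrative names.

```js
// Coarse-grained per-user throttle: the Redis LLEN probe runs at most once
// per interval per user, instead of on every incoming event.
const lastDrainCheck = new Map(); // userId -> epoch ms of last probe
const DRAIN_CHECK_INTERVAL_MS = 60_000;

function shouldCheckDrain(userId, nowMs = Date.now()) {
  const last = lastDrainCheck.get(userId) ?? 0;
  if (nowMs - last < DRAIN_CHECK_INTERVAL_MS) return false;
  lastDrainCheck.set(userId, nowMs);
  return true;
}
```

Inside drainBatchOnWake, the per-user loop would then skip any user for whom `shouldCheckDrain` returns false.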

Comment on lines +125 to +129
```js
async function holdEvent(userId, variant, eventJson) {
  const key = `digest:quiet-held:${userId}:${variant}`;
  await upstashRest('RPUSH', key, eventJson);
  await upstashRest('EXPIRE', key, String(QUIET_HELD_TTL));
}
```

P1 Non-atomic RPUSH + EXPIRE — key may persist without a TTL

holdEvent issues two separate Redis commands. If RPUSH succeeds but the subsequent EXPIRE call fails (upstashRest returns null on HTTP error), the newly created key has no expiry and will never be cleaned up automatically. Neither return value is checked, so this failure is completely silent.

If drainBatchOnWake subsequently never drains the key (e.g. the user's channels are all invalid or no event ever triggers a drain), those held events persist in Redis indefinitely — contrary to the documented 24h TTL guarantee.

At minimum, log a warning when the expiry call returns null. A more robust fix would be to check for an existing TTL before issuing a fresh EXPIRE, or to use a Redis pipeline to execute both commands atomically.
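The minimum fix (warning logs) could be sketched as below, with the Redis caller injected so the behaviour is testable. `holdEventSafe` is an illustrative name; the sketch assumes, as the review states, that upstashRest returns null on HTTP error.

```js
const QUIET_HELD_TTL = 86400; // 24h, per the PR description

// Hold an event and surface (rather than swallow) Redis command failures.
// upstashRest is injected: assumed to return null on HTTP error.
async function holdEventSafe(upstashRest, userId, variant, eventJson) {
  const key = `digest:quiet-held:${userId}:${variant}`;
  const pushed = await upstashRest('RPUSH', key, eventJson);
  if (pushed === null) {
    console.warn(`[relay] holdEvent: RPUSH failed for ${key}`);
    return false; // nothing was stored, so no TTL concern
  }
  const expired = await upstashRest('EXPIRE', key, String(QUIET_HELD_TTL));
  if (expired === null) {
    // The key now exists without a TTL: log it instead of failing silently.
    console.warn(`[relay] holdEvent: EXPIRE failed for ${key}; key may lack TTL`);
  }
  return true;
}
```

A pipelined RPUSH+EXPIRE would remove the gap entirely, at the cost of a slightly more involved REST call.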

…d events

P1: drainBatchOnWake was only called inside processEvent, so held events never
flushed when the queue was idle after quiet hours ended. Refactored to be
self-contained (fetches its own rules) and wired into the poll loop on a
5-minute timer, independent of queue activity.

P1: All four send helpers returned void so delivery failures were invisible.
sendTelegram/Slack/Discord/Email now return boolean (true=2xx success).
drainBatchOnWake uses anyDelivered pattern; DEL only fires on confirmed delivery.

P2: Held events bypassed checkDedup, allowing duplicate alerts to accumulate
in the Redis list. Added dedup check before holdEvent in the hold branch.
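The anyDelivered pattern mentioned above can be illustrated as follows. The sender stubs are hypothetical; the relay's real helpers (sendTelegram/Slack/Discord/Email) take channel-specific arguments.

```js
// Attempt delivery on every channel; report whether at least one succeeded.
// Each sender is assumed to return a boolean (true = 2xx success), matching
// the commit message. The caller DELs the held-events key only on true.
async function sendDigest(senders, message) {
  let anyDelivered = false;
  for (const send of senders) {
    if (await send(message)) anyDelivered = true;
  }
  return anyDelivered;
}
```

This keeps at-least-once semantics: if every channel fails, the Redis key survives for the next drain cycle.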
…tch_on_wake

Held alerts were silently expiring (24h TTL) if a user disabled quiet hours or
changed the override before the wake-up drain ran, because drainBatchOnWake
only processes rules still matching batch_on_wake + not-in-quiet-hours.

Fix: when set-quiet-hours is saved with quietHoursEnabled=false or any override
other than batch_on_wake, the edge function publishes flush_quiet_held to the
relay queue. The relay delivers the held batch immediately via drainHeldForUser,
then DELs the key. If there's nothing held, it's a no-op.

Extracted drainHeldForUser() helper to share logic between drainBatchOnWake
(timer-driven wake-up) and processFlushQuietHeld (settings-change path).
koala73 added 4 commits April 2, 2026 13:48
processFlushQuietHeld was passing null to drainHeldForUser which meant
"all verified channels", ignoring the rule's configured channel list.
A user with only telegram enabled could have held alerts delivered to
their slack or email after changing settings.

Fix: fetch the alertRule for the user+variant and pass rule.channels
as allowedChannels. Falls back to all verified channels only if the
rule fetch fails, preserving at-least-once delivery semantics.
alertRules:getAlertRulesByUserId is an internalQuery — unreachable via
ConvexHttpClient from the relay. Switched processFlushQuietHeld to use
alertRules:getByEnabled (public) with an in-memory userId+variant filter,
matching the same query already used in drainBatchOnWake. On lookup
failure, return early (preserve held events) instead of falling back to
all channels.
When no matching enabled rule exists for the user+variant, or the rule
has an empty channels list, allowedChannels stays null. Without this
guard the call falls through to drainHeldForUser(null) which delivers
to all verified channels — defeating the channel-scoping contract.
Now return early and preserve held events for the next drain cycle.
…validation, and Quiet Hours UI

- notification-relay.cjs: add QUIET_HOURS_BATCH_ENABLED env flag (default on);
  when =0, batch_on_wake behaves as critical_only and drainBatchOnWake no-ops
- convex/alertRules.ts: validate quietHoursTimezone via Intl.DateTimeFormat so
  only valid IANA strings (e.g. America/New_York) are accepted; throw ConvexError
  with descriptive message on invalid input
- preferences-content.ts: add Quiet Hours section with enabled toggle, from/to
  hour selects, timezone text input, and override select; wired to setQuietHours
  with 800ms debounce; details panel shown/hidden on toggle change

Fixes gaps D, E, H from docs/plans/2026-04-02-003-fix-news-alerts-pr-gaps-plan.md
…BATCH_ENABLED=0

When the relay flag disables batch-on-wake, the option was still visible in the
preferences UI. Users selecting it would silently get critical_only behaviour
instead. Gate the option on VITE_QUIET_HOURS_BATCH_ENABLED to keep UI and relay
contract in sync.

Addresses P2 review finding on #2615
…n tests

Covers three paths flagged as untested by reviewers:
- VITE_QUIET_HOURS_BATCH_ENABLED gates batch_on_wake option rendering
- setQuietHours (public) validates quietHoursTimezone via validateQuietHoursArgs
- setQuietHoursForUser (internalMutation) also validates quietHoursTimezone
  to prevent silent bypass through the edge-to-Convex path
@koala73 koala73 merged commit 72a9801 into main Apr 2, 2026
7 checks passed
@koala73 koala73 deleted the feat/news-alerts-quiet-hours branch April 2, 2026 16:47