feat(quiet-hours): add quiet hours scheduling to notification alert rules (#2615)
Users can now configure quiet hours on their alert rules to suppress or hold
non-critical notifications during specified times (e.g., 11pm-7am local time).
Three override modes:
- critical_only (default): only critical severity passes through; all others dropped
- silence_all: complete silence including critical alerts (shows warning in UI)
- batch_on_wake: non-critical events held in Redis (TTL 24h) and sent as a
mini-digest when quiet hours end; critical still delivers immediately
Full stack: Convex schema + validators + mutations, relay/notification-channels
HTTP action, edge function, client service types.
Enforcement lives in notification-relay.cjs: resolveQuietAction() is called per
rule before dispatch; held events are RPUSHed to digest:quiet-held:v1:{userId}:{variant},
and drainBatchOnWake() runs on each processEvent() call to flush held batches
for users whose quiet hours have ended.
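The enforcement path described above can be sketched roughly as follows. This is an illustrative reconstruction, not the relay's actual code: resolveQuietAction() and the rule fields exist in the PR, but their bodies are not shown in this thread, and the isInQuietHours signature is an assumption.

```javascript
// Sketch (assumed shapes): decide what to do with one event for one rule.
function isInQuietHours(hour, start, end) {
  if (hour < 0) return false;             // unknown local hour: fail open
  if (start === end) return false;        // zero-length window
  if (start < end) return hour >= start && hour < end;
  return hour >= start || hour < end;     // window spans midnight (e.g. 23 -> 7)
}

function resolveQuietAction(rule, severity, localHour) {
  if (!rule.quietHoursEnabled) return 'deliver';
  if (!isInQuietHours(localHour, rule.quietHoursStart, rule.quietHoursEnd)) {
    return 'deliver';
  }
  const override = rule.quietHoursOverride ?? 'critical_only';
  if (override === 'silence_all') return 'suppress';
  if (severity === 'critical') return 'deliver';
  return override === 'batch_on_wake' ? 'hold' : 'suppress';
}
```

The three return values map to the three dispatch branches: deliver immediately, drop with a log line, or queue for the wake-up digest.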
Greptile Summary

This PR adds a quiet hours feature to notification alert rules, allowing users to suppress, silence, or batch-hold alerts during a configured time window, with three override modes (critical_only, silence_all, batch_on_wake). Key areas to address before merging:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant EdgeFn as api/notification-channels.ts
    participant ConvexHTTP as convex/http.ts
    participant ConvexDB as Convex alertRules
    participant Relay as notification-relay.cjs
    participant Redis as Upstash Redis
    Client->>EdgeFn: POST set-quiet-hours (JWT)
    EdgeFn->>EdgeFn: validateBearerToken
    EdgeFn->>ConvexHTTP: POST /relay/notification-channels (RELAY_SECRET)
    ConvexHTTP->>ConvexHTTP: validate variant, quietHoursEnabled, override enum
    ConvexHTTP->>ConvexDB: setQuietHoursForUser mutation
    ConvexDB->>ConvexDB: validateQuietHoursArgs (range check)
    ConvexDB-->>ConvexHTTP: ok
    ConvexHTTP-->>EdgeFn: {ok: true}
    EdgeFn-->>Client: {ok: true}
    Note over Relay,Redis: On each incoming event...
    Relay->>ConvexDB: getByEnabled (all enabled rules)
    Relay->>Relay: resolveQuietAction(rule, severity)
    alt suppress (critical_only or silence_all during quiet hours)
        Relay->>Relay: log + skip
    else hold (batch_on_wake, non-critical)
        Relay->>Redis: RPUSH digest:quiet-held:userId:variant
        Relay->>Redis: EXPIRE key 86400
    else deliver
        Relay->>Relay: checkDedup
        Relay->>Relay: send via channel
    end
    Relay->>Relay: drainBatchOnWake(allRules)
    loop each batch_on_wake rule outside quiet hours
        Relay->>Redis: LLEN digest:quiet-held:userId:variant
        Relay->>Redis: LRANGE key
        Relay->>Relay: format digest message
        Relay->>Relay: send digest via channel
        Relay->>Redis: DEL key
    end
```
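The set-quiet-hours request at the top of the flow might carry a payload like the one below. The field names mirror the schema fields added in this PR; the exact wire shape and action name are assumptions, since the edge function's request body is not shown in this thread.

```javascript
// Hypothetical request body for POST set-quiet-hours (shape assumed,
// field names taken from the alertRules schema fields added in this PR).
const payload = {
  action: 'set-quiet-hours',
  variant: 'full',
  quietHoursEnabled: true,
  quietHoursStart: 23,              // integer 0-23, local hour
  quietHoursEnd: 7,                 // start >= end means the window spans midnight
  quietHoursTimezone: 'America/New_York',
  quietHoursOverride: 'batch_on_wake',
};
```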
Reviews (1): Last reviewed commit: "feat(quiet-hours): add quiet hours sched..."
```js
if (quietAction === 'hold') {
  console.log(`[relay] Quiet hours hold for ${rule.userId} — queuing for batch_on_wake`);
  await holdEvent(rule.userId, rule.variant ?? 'full', JSON.stringify(event));
  continue;
```
Dedup check bypassed for batch_on_wake held events — duplicate entries possible
When a held event is queued (quietAction === 'hold'), the code continues before reaching checkDedup. Because the dedup TTL is only 30 minutes, a quiet window longer than 30 minutes allows the same event title/type to arrive again through the queue, pass the dedup check (TTL expired), and get held a second time in Redis. The digest would then contain duplicates.
Consider running the dedup check for held events as well:
```js
if (quietAction === 'hold') {
  const isNew = await checkDedup(rule.userId, event.eventType, event.payload?.title ?? '');
  if (!isNew) { console.log(`[relay] Dedup hit (hold) for ${rule.userId}`); continue; }
  console.log(`[relay] Quiet hours hold for ${rule.userId} — queuing for batch_on_wake`);
  await holdEvent(rule.userId, rule.variant ?? 'full', JSON.stringify(event));
  continue;
}
```

```js
function toLocalHour(nowMs, timezone) {
  try {
    const parts = new Intl.DateTimeFormat('en-US', {
      timeZone: timezone,
      hour: 'numeric',
      hour12: false,
    }).formatToParts(new Date(nowMs));
    const h = parts.find(p => p.type === 'hour');
    return h ? parseInt(h.value, 10) : -1;
  } catch {
    return -1;
  }
}
```
hour12: false may produce '24' for midnight on some platforms — use hourCycle: 'h23' explicitly
hour12: false asks the formatter to use 24-hour style, but ECMA-402 does not mandate a specific hourCycle. Some platforms (particularly older ICU builds or certain locales) resolve this to h24, which represents midnight as 24 rather than 0. Parsing 24 as an hour results in values outside the 0–23 range and will break the midnight-spanning window comparison (isInQuietHours).
Using hourCycle: 'h23' explicitly guarantees the 0–23 contract:
```js
const parts = new Intl.DateTimeFormat('en-US', {
  timeZone: timezone,
  hour: 'numeric',
  hourCycle: 'h23',
}).formatToParts(new Date(nowMs));
```

```ts
function validateQuietHoursArgs(args: {
  quietHoursStart?: number;
  quietHoursEnd?: number;
}) {
  if (args.quietHoursStart !== undefined && (args.quietHoursStart < 0 || args.quietHoursStart > 23 || !Number.isInteger(args.quietHoursStart))) {
    throw new ConvexError("quietHoursStart must be an integer 0–23");
  }
  if (args.quietHoursEnd !== undefined && (args.quietHoursEnd < 0 || args.quietHoursEnd > 23 || !Number.isInteger(args.quietHoursEnd))) {
    throw new ConvexError("quietHoursEnd must be an integer 0–23");
  }
}
```
No timezone validation — an invalid timezone is stored and quiet hours silently fail
validateQuietHoursArgs validates the numeric range of quietHoursStart/quietHoursEnd but accepts any string for quietHoursTimezone. An invalid timezone (e.g. "America/Fake") is persisted to the database, and at enforcement time toLocalHour catches the RangeError and returns -1, causing isInQuietHours to always return false. Quiet hours silently never fire.
Consider validating the timezone at write time:
```ts
function validateQuietHoursArgs(args: {
  quietHoursStart?: number;
  quietHoursEnd?: number;
  quietHoursTimezone?: string;
}) {
  if (args.quietHoursStart !== undefined && (args.quietHoursStart < 0 || args.quietHoursStart > 23 || !Number.isInteger(args.quietHoursStart))) {
    throw new ConvexError("quietHoursStart must be an integer 0–23");
  }
  if (args.quietHoursEnd !== undefined && (args.quietHoursEnd < 0 || args.quietHoursEnd > 23 || !Number.isInteger(args.quietHoursEnd))) {
    throw new ConvexError("quietHoursEnd must be an integer 0–23");
  }
  if (args.quietHoursTimezone !== undefined) {
    try { Intl.DateTimeFormat(undefined, { timeZone: args.quietHoursTimezone }); }
    catch { throw new ConvexError("quietHoursTimezone is not a valid IANA timezone"); }
  }
}
```

Note: validateQuietHoursArgs is called in both setQuietHours and setQuietHoursForUser; the same fix covers both paths.
scripts/notification-relay.cjs (outdated)
```js
// Drain batch_on_wake held events for users whose quiet hours just ended
await drainBatchOnWake(enabledRules);
```
drainBatchOnWake runs on every incoming event
drainBatchOnWake(enabledRules) is called inside processEvent for every event that matches at least one rule. It filters for users with batch_on_wake configured who are outside quiet hours, and issues at least one Redis LLEN call per such user. Once quiet hours end, every event triggers O(batch_users) extra Redis round-trips indefinitely — even for users with an empty held-events list — until the 24h TTL on their key eventually expires.
With even a modest number of batch_on_wake users this adds measurable latency to every event dispatch. Consider throttling the drain check with a coarse-grained in-memory timestamp guard so it runs at most once per minute per user rather than on every event.
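A coarse in-memory guard along the lines suggested above could look like this. It is a minimal sketch; the names (shouldProbeDrain, lastDrainCheckMs, DRAIN_CHECK_INTERVAL_MS) are hypothetical and not part of the PR.

```javascript
// Sketch: allow at most one Redis drain probe per user per minute.
// All names here are hypothetical illustrations, not the actual relay code.
const DRAIN_CHECK_INTERVAL_MS = 60_000;
const lastDrainCheckMs = new Map(); // userId -> timestamp of last probe

function shouldProbeDrain(userId, nowMs = Date.now()) {
  const last = lastDrainCheckMs.get(userId);
  if (last !== undefined && nowMs - last < DRAIN_CHECK_INTERVAL_MS) {
    return false; // probed recently; skip the LLEN round-trip
  }
  lastDrainCheckMs.set(userId, nowMs);
  return true;
}
```

drainBatchOnWake would call shouldProbeDrain(rule.userId) before issuing LLEN, turning the per-event cost into at most one probe per user per interval.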
```js
async function holdEvent(userId, variant, eventJson) {
  const key = `digest:quiet-held:${userId}:${variant}`;
  await upstashRest('RPUSH', key, eventJson);
  await upstashRest('EXPIRE', key, String(QUIET_HELD_TTL));
}
```
Non-atomic RPUSH + EXPIRE — key may persist without a TTL
holdEvent issues two separate Redis commands. If RPUSH succeeds but the subsequent EXPIRE call fails (upstashRest returns null on HTTP error), the newly created key has no expiry and will never be cleaned up automatically. Neither return value is checked, so this failure is completely silent.
If drainBatchOnWake subsequently never drains the key (e.g. the user's channels are all invalid or no event ever triggers a drain), those held events persist in Redis indefinitely — contrary to the documented 24h TTL guarantee.
At minimum, log a warning when the expiry call returns null. A more robust fix would be to check for an existing TTL before issuing a fresh EXPIRE, or to use a Redis pipeline to execute both commands atomically.
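The pipeline variant could be built as below. Upstash's REST API accepts an ordered array of commands in a single request; the helper name and key format here follow the holdEvent excerpt above, but the surrounding send logic is an assumption.

```javascript
// Sketch: build RPUSH + EXPIRE as one Upstash REST pipeline payload so both
// commands travel in a single round-trip. buildHoldPipeline is a hypothetical
// helper name; the key format matches the holdEvent excerpt above.
const QUIET_HELD_TTL = 86400; // 24h, per the documented TTL

function buildHoldPipeline(userId, variant, eventJson, ttl = QUIET_HELD_TTL) {
  const key = `digest:quiet-held:${userId}:${variant}`;
  return [
    ['RPUSH', key, eventJson],
    ['EXPIRE', key, String(ttl)],
  ];
}
```

The array would be POSTed to `${UPSTASH_REDIS_REST_URL}/pipeline` with the usual bearer token, and a non-2xx response logged as a warning rather than ignored.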
…d events

- P1: drainBatchOnWake was only called inside processEvent, so held events never flushed when the queue was idle after quiet hours ended. Refactored to be self-contained (fetches its own rules) and wired into the poll loop on a 5-minute timer, independent of queue activity.
- P1: All four send helpers returned void, so delivery failures were invisible. sendTelegram/Slack/Discord/Email now return boolean (true = 2xx success). drainBatchOnWake uses an anyDelivered pattern; DEL only fires on confirmed delivery.
- P2: Held events bypassed checkDedup, allowing duplicate alerts to accumulate in the Redis list. Added a dedup check before holdEvent in the hold branch.
…tch_on_wake

Held alerts were silently expiring (24h TTL) if a user disabled quiet hours or changed the override before the wake-up drain ran, because drainBatchOnWake only processes rules still matching batch_on_wake + not-in-quiet-hours.

Fix: when set-quiet-hours is saved with quietHoursEnabled=false or any override other than batch_on_wake, the edge function publishes flush_quiet_held to the relay queue. The relay delivers the held batch immediately via drainHeldForUser, then DELs the key. If there's nothing held, it's a no-op.

Extracted a drainHeldForUser() helper to share logic between drainBatchOnWake (timer-driven wake-up) and processFlushQuietHeld (settings-change path).
processFlushQuietHeld was passing null to drainHeldForUser, which meant "all verified channels", ignoring the rule's configured channel list. A user with only telegram enabled could have held alerts delivered to their slack or email after changing settings.

Fix: fetch the alertRule for the user+variant and pass rule.channels as allowedChannels. Falls back to all verified channels only if the rule fetch fails, preserving at-least-once delivery semantics.
alertRules:getAlertRulesByUserId is an internalQuery — unreachable via ConvexHttpClient from the relay. Switched processFlushQuietHeld to use alertRules:getByEnabled (public) with an in-memory userId+variant filter, matching the same query already used in drainBatchOnWake. On lookup failure, return early (preserve held events) instead of falling back to all channels.
When no matching enabled rule exists for the user+variant, or the rule has an empty channels list, allowedChannels stays null. Without this guard the call falls through to drainHeldForUser(null) which delivers to all verified channels — defeating the channel-scoping contract. Now return early and preserve held events for the next drain cycle.
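The guard described in this fix can be sketched as follows. The helper name and rule shape are hypothetical reconstructions for illustration, not the actual diff:

```javascript
// Sketch (hypothetical helper): resolve the channel scope for a held-batch
// drain. Returning null signals the caller to return early and preserve
// held events rather than fall through to all verified channels.
function resolveAllowedChannels(rules, userId, variant) {
  const rule = rules.find(
    r => r.userId === userId && (r.variant ?? 'full') === variant
  );
  if (!rule || !Array.isArray(rule.channels) || rule.channels.length === 0) {
    return null; // no matching rule or empty channel list: do not drain
  }
  return rule.channels;
}
```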
…validation, and Quiet Hours UI

- notification-relay.cjs: add QUIET_HOURS_BATCH_ENABLED env flag (default on); when =0, batch_on_wake behaves as critical_only and drainBatchOnWake no-ops
- convex/alertRules.ts: validate quietHoursTimezone via Intl.DateTimeFormat so only valid IANA strings (e.g. America/New_York) are accepted; throw ConvexError with a descriptive message on invalid input
- preferences-content.ts: add Quiet Hours section with enabled toggle, from/to hour selects, timezone text input, and override select; wired to setQuietHours with 800ms debounce; details panel shown/hidden on toggle change

Fixes gaps D, E, H from docs/plans/2026-04-02-003-fix-news-alerts-pr-gaps-plan.md
…BATCH_ENABLED=0

When the relay flag disables batch-on-wake, the option was still visible in the preferences UI. Users selecting it would silently get critical_only behaviour instead. Gate the option on VITE_QUIET_HOURS_BATCH_ENABLED to keep the UI and relay contract in sync.

Addresses P2 review finding on #2615
…n tests

Covers three paths flagged as untested by reviewers:
- VITE_QUIET_HOURS_BATCH_ENABLED gates batch_on_wake option rendering
- setQuietHours (public) validates quietHoursTimezone via validateQuietHoursArgs
- setQuietHoursForUser (internalMutation) also validates quietHoursTimezone to prevent silent bypass through the edge-to-Convex path
Summary
- `quietHoursEnabled`, `quietHoursStart`, `quietHoursEnd`, `quietHoursTimezone`, `quietHoursOverride` fields added to the Convex `alertRules` schema
- `setQuietHours` / `setQuietHoursForUser` Convex mutations with 0-23 bounds validation
- `set-quiet-hours` action wired through the `/relay/notification-channels` HTTP endpoint and edge function
- `QuietHoursOverride` type + `setQuietHours()` service function added to the client
- `notification-relay.cjs`: `resolveQuietAction()` applied per rule before dispatch

Three override modes:

- `critical_only` (default)
- `silence_all`
- `batch_on_wake`: `drainBatchOnWake()` runs on each `processEvent()` call and flushes held batches as a mini-digest when quiet hours end for a user

Spans-midnight windows work correctly (e.g. 23:00-07:00): `start >= end` inverts the comparison.

Test plan

- `medium` severity event — relay logs `Quiet hours suppress`
- `batch_on_wake`: send an event during quiet hours — `LLEN digest:quiet-held:...` shows 1 entry
- after `quietHoursEnd`, next event triggers drain — held batch delivered and key deleted
- `critical` event during any quiet hours mode — delivers immediately

Post-Deploy Monitoring & Validation

- Watch for `[relay] Quiet hours suppress` / `[relay] Quiet hours hold` / `[relay] drainBatchOnWake: sent` log lines
- Watch for `drainBatchOnWake: channel fetch failed`, or keys persisting past the 24h TTL without a drain
- New schema fields are `v.optional()`