Ghost user cleanup #303

Draft
mikewillems wants to merge 18 commits into staging from mw/fix/ghost-user-cleanup

mikewillems commented Feb 13, 2026

Ghost user Cleanup #282

What is in this PR?

After #281 established the heartbeat and mostRecentPresentTime, no backend mechanism detected when those updates stopped. Users who disconnected without explicitly leaving continued to appear as present participants, inflating room counts and voting populations. Additionally, room-based queries (breakoutRoomParticipantsStream, VoteToKick) did not filter by isPresent, and the UpdateLiveStreamParticipantCount cloud function's pre-check window excluded currently-running events, freezing livestream counts at 1.

This PR adds a scheduled cleanup function, fixes the livestream count function, closes query filter gaps, fixes participant count displays outside the meeting, hardens breakout room transitions, and reduces heartbeat frequency by 4x.

Commit 1 -- 97db878: Presence heartbeat semantics and breakout room null safety

  • Change _presenceUpdater conditions to a semantic boolean check (leftMeeting / enterMeetingPrescreen)
  • Replace _activeBreakoutRoomId with _presenceRoomId getter that falls back to currentBreakoutRoomId after Agora confirms connection, fixing breakout room grid cards showing 0 participants
  • Fix null checks and log messages for breakout room transition variable

Commits 2-4 -- 52e5fe5, 3768431, 18ed1d8: Breakout room transition timeout

  • Add a transition timer that logs a warning every 5 seconds while a breakout room join has not completed
  • Cancel stalled transitions after 10 seconds, return user to main meeting with localized error message
  • Separate _cachedJoinInfoRoomId from _inTransitionToBreakoutRoomId for clean state tracking
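
The timeout behavior described in commits 2-4 can be sketched as a pure decision function. This is a minimal illustration, not the PR's Dart code: `transition_action` and its return labels are hypothetical names, and treating exactly 10 seconds as the cancel point is an assumption about the boundary.

```python
WARN_INTERVAL_S = 5   # from the PR: warn while the join is stalled
CANCEL_AFTER_S = 10   # from the PR: cancel the transition after 10 seconds


def transition_action(elapsed_s: float, confirmed: bool) -> str:
    """Decide what a watchdog tick does (hypothetical helper)."""
    if confirmed:
        # Agora confirmed the breakout room connection; nothing to do.
        return "done"
    if elapsed_s >= CANCEL_AFTER_S:
        # Stalled too long: cancel and send the user back to the main meeting.
        return "cancel_and_return_to_main"
    if elapsed_s >= WARN_INTERVAL_S:
        # Still waiting: log a warning about a possible connection issue.
        return "warn"
    return "wait"
```

Keeping the decision separate from the timer makes the 5s and 10s boundaries testable without waiting on real time.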

Commit 5 -- 9682417: Composite Indexes

  • Add isPresent + mostRecentPresentTime index (collection group) for the cleanup query
  • Add currentBreakoutRoomId + isPresent index (collection) for filtered room queries
  • Deployed first so indexes are built before dependent code ships

Commit 6 -- 9a03aed: Filter breakoutRoomParticipantsStream by isPresent

  • Add isPresent == true to the Firestore query in breakoutRoomParticipantsStream()
  • All admin panel room counts and participant lists now exclude offline participants

Commit 7 -- 026def5: Filter VoteToKick participants by isPresent

  • Add isPresent == true to the participant population query in VoteToKick
  • Ghost participants no longer inflate the voting denominator
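
The effect on the voting population can be sketched with an in-memory list standing in for the Firestore query. The function name `present_voter_ids` and the `id` key are hypothetical; only `isPresent` and `currentBreakoutRoomId` come from this PR, and the actual kick threshold is not modeled here.

```python
from __future__ import annotations


def present_voter_ids(participants: list[dict], room_id: str) -> list[str]:
    """Return IDs of participants who are in the room and currently present."""
    return [
        p["id"]
        for p in participants
        # Mirrors the added filter: isPresent == true on top of the room match.
        if p.get("currentBreakoutRoomId") == room_id and p.get("isPresent") is True
    ]


room = [
    {"id": "alice", "currentBreakoutRoomId": "room-1", "isPresent": True},
    {"id": "bob", "currentBreakoutRoomId": "room-1", "isPresent": True},
    {"id": "ghost", "currentBreakoutRoomId": "room-1", "isPresent": False},
]
voters = present_voter_ids(room, "room-1")
```

With the ghost excluded, the denominator in this example is 2 rather than 3.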

Commit 8 -- d4f5f1d: Stale Participant Cleanup Function

  • Scheduled function runs every minute via Cloud Scheduler
  • Internally subdivides into 6 passes (roughly every 10s) for faster ghost detection
  • Queries WHERE isPresent == true AND mostRecentPresentTime < (now - 45s)
  • Transaction-based updates with a freshness guard (re-reads timestamp before writing)
  • Sets isPresent: false, preserves currentBreakoutRoomId
  • Timeout derived from heartbeat interval: 2 x 20s plus 5s buffer = 45s
  • Add CleanupStaleParticipants() to main.dart
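
The query and the freshness guard can be sketched against a plain dictionary standing in for Firestore. This is an assumption-laden model, not the function in this PR: `find_stale` and `mark_offline_if_still_stale` are hypothetical names, and the real implementation performs the re-read inside a Firestore transaction.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(seconds=20)
# 2 x 20s heartbeat plus 5s buffer = 45s, as stated in the PR.
STALE_THRESHOLD = 2 * HEARTBEAT_INTERVAL + timedelta(seconds=5)


def find_stale(store: dict, now: datetime) -> list:
    """Query step: isPresent == true AND mostRecentPresentTime < now - 45s."""
    cutoff = now - STALE_THRESHOLD
    return [
        pid
        for pid, doc in store.items()
        if doc["isPresent"] and doc["mostRecentPresentTime"] < cutoff
    ]


def mark_offline_if_still_stale(store: dict, pid: str, now: datetime) -> bool:
    """Write step: re-read the document and skip it if a heartbeat arrived."""
    doc = store[pid]
    if not doc["isPresent"] or doc["mostRecentPresentTime"] >= now - STALE_THRESHOLD:
        return False
    doc["isPresent"] = False  # currentBreakoutRoomId is deliberately left as-is
    return True
```

Splitting the query from the guarded write shows why the guard matters: a participant whose heartbeat lands between the two steps is left untouched.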

Commit 9 -- 71d098e: Unit Tests

  • Verify stale participants are marked offline and active ones are not

Commit 10 -- f4849a9: Java 21 in GH Workflow

Commit 11 -- 26e1e58: Fix firebase emulator port mismatch errors

  • Firestore rules tests had been silently broken since commit 36209 (Nov 20, 2025) because of a port mismatch (8080 to 8081) between the test harness and the firebase.json configuration. The tests have not run in CI since then because no PR modified files under firebase/firestore/**.

Commit 12 -- ba9381b: Fix count display on admin panel and event cards

  • Fix participant count display bug on admin panel and community page event cards

Commit 13 -- 2489f14: Fix participant estimates on out-of-meeting views

  • UpdateLiveStreamParticipantCount pre-check window changed from [now, now + 1 day] to [now - maxDuration x 1.5, now + 1 hour], derived from named constants
  • Per-event filter applies exact formula using actual durationInMinutes
  • EventProvider, EventParticipantsList, ParticipantsDialog, and EventButton now filter by isPresent when live participants exist
  • Creator no longer unconditionally shown in participant avatar prefix
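
The new pre-check window can be worked through with plain datetime arithmetic. This sketch assumes the per-event filter uses the same formula as the pre-check with the event's own duration substituted for the maximum, and that both bounds are inclusive; the function names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_EVENT_DURATION = timedelta(minutes=240)  # maxEventDuration in the PR
DURATION_MULTIPLIER = 1.5
PRE_EVENT_WINDOW = timedelta(hours=1)


def precheck_window(now: datetime) -> tuple:
    """Coarse window: [now - maxDuration x 1.5, now + 1 hour]."""
    return (now - MAX_EVENT_DURATION * DURATION_MULTIPLIER, now + PRE_EVENT_WINDOW)


def event_in_scope(scheduled_time: datetime, duration_minutes: int, now: datetime) -> bool:
    """Per-event filter using the event's actual duration (assumed formula)."""
    lower = now - timedelta(minutes=duration_minutes) * DURATION_MULTIPLIER
    return lower <= scheduled_time <= now + PRE_EVENT_WINDOW
```

Under the old window of [now, now + 1 day], any event with a scheduledTime in the past fell outside the range, which is why running events were never counted.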

Commit 14 -- 6184022: Reduce heartbeat frequency and tighten cleanup interval

  • Client heartbeat: 5s to 20s (75% fewer Firestore writes)
  • Add html.window.onOnline listener for immediate heartbeat on network reconnect
  • CleanupStaleParticipants: restructured to 6 passes per minute with 45s stale threshold
  • UpdateLiveStreamParticipantCount: update window widened from 19s to 40s to accommodate slower heartbeat

Commit 15: Add test coverage for VoteToKick isPresent filter and pre-check window fix

  • UpdateLiveStreamParticipantCount: add test for a currently-running event (scheduledTime in the past), which was the exact scenario broken by the old pre-check window
  • VoteToKick: add test verifying ghost participants (isPresent: false) are excluded from the kick vote denominator
  • event_test_utils: add currentBreakoutRoomId parameter to joinEvent()

Commit 16: Fix cleanup test failures and remove dead code from test utilities

  • All cleanup tests now call runCleanupPass(0) directly instead of action(MockEventContext()). The action() method schedules 6 passes over 50 seconds of real time, causing "fresh" timestamps set before the call to age past the 45s threshold by pass 5
  • Rename _runCleanupPass to runCleanupPass (public) so tests can invoke a single pass without the scheduling delay
  • Replace incorrect boundary test (tested at 59s, which is above the 45s threshold, with an expectation of isTrue) with two bracketing tests at 44s and 46s
  • Rename underscore-prefixed local helper functions per Dart lint rule no_leading_underscores_for_local_identifiers
  • Remove dead commented-out code in event_test_utils referencing services that are not available in the test utility class
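
The timing problem behind the test changes can be reproduced with simple arithmetic. This sketch assumes the six passes are evenly spaced 10 seconds apart and that a participant is stale when its age is strictly greater than 45 seconds, matching the `<` comparison in the cleanup query; the helper names are hypothetical.

```python
PASSES_PER_RUN = 6
PASS_INTERVAL_S = 10
STALE_THRESHOLD_S = 45


def pass_offsets() -> list:
    """Seconds after the start of a run at which each pass fires."""
    return [i * PASS_INTERVAL_S for i in range(PASSES_PER_RUN)]


def is_stale(age_s: float) -> bool:
    """A heartbeat is stale once it is older than the threshold."""
    return age_s > STALE_THRESHOLD_S


def first_pass_flagging(initial_age_s: float):
    """Index of the first pass that would flag a participant, or None."""
    for index, offset in enumerate(pass_offsets()):
        if is_stale(initial_age_s + offset):
            return index
    return None
```

A timestamp written just before a full run starts is therefore flagged on pass 5, which is why the tests call a single pass directly.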

Changes in the codebase

Presence timing constants

| Component | Value | Rationale |
| --- | --- | --- |
| Client heartbeat interval | 20s | 4x reduction from 5s |
| Stale threshold (cleanup) | 45s | 2 x 20s heartbeat plus 5s buffer |
| Cleanup check interval | roughly 10s | 6 passes per Cloud Scheduler minute |
| Livestream count update window | roughly 40s | 15s interval plus 25s heartbeat buffer |
| Pre-check lower bound | now minus 360min | maxEventDuration(240) x 1.5 |
| Pre-check upper bound | now plus 1hr | Pre-event window |
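
The values in this table follow from a few inputs, and the arithmetic can be checked directly. The variable names below are illustrative only. The detection latency range assumes it is measured from the participant's last heartbeat and that a pass may fire up to one pass interval after the threshold is crossed.

```python
OLD_HEARTBEAT_S = 5
HEARTBEAT_S = 20
BUFFER_S = 5
PASSES_PER_MINUTE = 6

# Stale threshold: two missed heartbeats plus a buffer.
stale_threshold_s = 2 * HEARTBEAT_S + BUFFER_S

# Cleanup check interval: one Cloud Scheduler minute split into passes.
pass_interval_s = 60 / PASSES_PER_MINUTE

# Fraction of heartbeat writes saved by moving from 5s to 20s.
write_reduction = 1 - OLD_HEARTBEAT_S / HEARTBEAT_S

# Best and worst case time from last heartbeat to being marked offline.
detection_latency_s = (stale_threshold_s, stale_threshold_s + pass_interval_s)

# Livestream count update window: 15s interval plus 25s heartbeat buffer.
livestream_window_s = 15 + 25

# Pre-check lower bound in minutes: maxEventDuration x 1.5.
precheck_lower_min = 240 * 1.5
```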

Interaction with existing cleanup (UpdatePresenceStatus)

The RTDB-based UpdatePresenceStatus continues to operate. It fires on RTDB /status/{uid} changes and sets isPresent: false AND currentBreakoutRoomId: null. The scheduled cleanup is a fallback for cases where:

  • RTDB presence detection is delayed (e.g., Firebase SDK reconnection window of 30-120s)
  • The device loses connectivity in a way that does not trigger the RTDB disconnect handler

The two mechanisms are complementary, not conflicting. If RTDB cleanup fires first, the scheduled cleanup skips the participant (already isPresent: false). If the scheduled cleanup fires first, RTDB cleanup will either skip it (via the lastUpdatedTime guard) or harmlessly re-set isPresent: false.

Key difference: UpdatePresenceStatus clears currentBreakoutRoomId to null. The scheduled cleanup preserves it. Both behaviors are safe because all room-based queries now filter by isPresent.
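
A small model shows why the order of the two cleanups does not matter for room queries. It is a simplification under stated assumptions: the three functions are hypothetical stand-ins, the lastUpdatedTime guard is omitted, and `visible_in_room` represents any room-based query that filters by isPresent.

```python
def rtdb_cleanup(doc: dict) -> None:
    """Stand-in for UpdatePresenceStatus: marks offline and clears the room."""
    doc["isPresent"] = False
    doc["currentBreakoutRoomId"] = None


def scheduled_cleanup(doc: dict) -> None:
    """Stand-in for the scheduled cleanup: marks offline, keeps the room."""
    if doc["isPresent"]:
        doc["isPresent"] = False


def visible_in_room(doc: dict, room_id: str) -> bool:
    """Room query with the isPresent filter applied."""
    return doc["isPresent"] and doc["currentBreakoutRoomId"] == room_id
```

In either order the participant ends up hidden from the room. The only difference is whether currentBreakoutRoomId survives, and the isPresent filter makes that difference invisible to room queries.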

Three-layer disconnect detection

| Layer | Mechanism | Latency | Covers |
| --- | --- | --- | --- |
| 1. Explicit write | onBeforeUnload / dispose() | Instant | Tab close, in-app navigation, explicit leave |
| 2. RTDB disconnect | onDisconnect() to UpdatePresenceStatus | 30-120s | Network loss, browser crash (when Firebase SDK detects) |
| 3. Scheduled cleanup | CleanupStaleParticipants (every 10s) | 45-55s | Everything else (crash, kill, network loss bypassing RTDB) |

Changes outside the codebase

None

Testing this PR

  • Unit tests: Mock Firestore, verify cleanup marks stale participants offline and skips fresh ones
  • Integration test: Join event, kill browser, confirm participant marked offline within 55s
  • Load test: 100+ participants, verify cleanup completes within scheduled interval
  • Regression: Verify breakoutRoomParticipantsStream and VoteToKick queries return only present participants
  • Edge case: Participant reconnects during cleanup window; verify onOnline heartbeat restores isPresent: true before cleanup fires
  • Edge case: Mobile browser backgrounded briefly (less than 45s); verify participant is not falsely cleaned up
  • Edge case: Breakout room transition stalls more than 10s; verify user is returned to main meeting with error toast
  • Edge case: Event running for 3+ hours; verify UpdateLiveStreamParticipantCount pre-check window still includes it

Future Work

  • Transition-only writes: Remove periodic heartbeat entirely; write isPresent only on join/leave/transition. Reduces Firestore writes by roughly 97%. Requires promoting CleanupStaleParticipants and RTDB disconnect to primary detection and increasing stale threshold. Estimated effort: roughly 2 days.
  • Composite index for livestream participant dialog: Add (status, isPresent, createdDate) index so ParticipantsDialog._buildLiveStreamEventParticipants can filter to present-only users
  • Metrics dashboard: Track cleanup frequency, stale participant rates, detection latency
  • checkAdvanceMeetingGuide isPresent filter: Currently mitigated by Agora presentIds cross-reference; add for consistency

…ean online indicator. Fix log messages and null checks for breakout room transition variable. Replaced the generic warning in _presenceRoomId with a descriptive message explaining the uncommon null state. Change isNullOrEmpty check to a strict null check.
…TransitionToBreakoutRoomId on successful Agora connection. Separate caching logic into _cachedJoinInfoRoomId. Add a timer that logs a warning every 5 seconds if a breakout room transition has not completed, indicating a possible connection issue.
…oes not confirm a breakout room connection within 10 seconds, cancel the transition, return the user to the main meeting, and display an error message.
firebase-tools now requires Java 21+, but the workflow relied on the
runner's default JDK (Java 17). Add actions/setup-java@v4 with Temurin
21 before the emulator steps to fix the version mismatch.

github-actions bot commented Feb 13, 2026

Visit the preview URL for this PR (updated for commit ab8f525):

https://gen-hls-bkc-7627--pr303-mw-fix-ghost-user-cl-p3yywb2i.web.app

(expires Thu, 05 Mar 2026 19:46:00 GMT)

mikewillems marked this pull request as draft February 13, 2026 20:17
mikewillems marked this pull request as ready for review February 14, 2026 02:22
epenn self-requested a review February 17, 2026 16:40

epenn commented Feb 17, 2026

Thanks Mike, I'll be able to review this within the next couple days.

mikewillems changed the title from "Mw/fix/ghost user cleanup" to "Ghost user cleanup" Feb 17, 2026
…s, streams now estimate in time-bounded range
mikewillems requested review from katherineqian and removed request for epenn February 20, 2026 02:46
katherineqian marked this pull request as draft March 5, 2026 21:01