Skip to content

Create online heartbeat & clarify user meeting state variables. #297

Open
mikewillems wants to merge 28 commits intostagingfrom
mw/fix/presence-count
Open

Create online heartbeat & clarify user meeting state variables. #297
mikewillems wants to merge 28 commits intostagingfrom
mw/fix/presence-count

Conversation

@mikewillems
Copy link
Collaborator

@mikewillems mikewillems commented Feb 6, 2026

What is in this PR?

Before:

  • Users remained "present" indefinitely after disconnection
  • Failed breakout room joins left users in limbo
  • No visibility into connection issues

After:

  • Continuous heartbeat every 5 seconds updates presence timestamp
  • Backend cleanup function can detect stale participants (no recent heartbeat), though it doesn't yet
  • Users only counted in rooms after Agora confirms connection
  • Failed transitions automatically recover after 10 seconds
  • Clear logging for debugging connection issues

Codebase Changes / Technical Details

Heartbeat Flow:

  1. Timer fires every 5 seconds (unconditional)
  2. Calls updateMeetingPresence() with currentBreakoutRoomId: _presenceRoomId
  3. Updates mostRecentPresentTime server timestamp
  4. Backend cleanup function queries for participants with stale timestamps

Transition Flow:

  1. User assigned to breakout room -> currentBreakoutRoomId becomes non-null
  2. getBreakoutRoomFuture() sets _inTransitionToBreakoutRoomId and starts timer
  3. Heartbeat continues updating presence (user still online, not in any specific room yet)
  4. Agora connects -> ConferenceRoom.onConnected() writes room membership and clears transition
  5. If step 4 doesn't happen within 10 seconds -> automatic recovery

Caching:

  • _cachedJoinInfoRoomId prevents re-fetching join info on rebuilds
  • Separate from _inTransitionToBreakoutRoomId which tracks actual connection state
  • Both cleared when leaving room or switching to main meeting

Changes outside the codebase

None

Future Considerations

Potential enhancements to consider:

  1. Adaptive timeout: Make the 10-second timeout configurable or adaptive based on network conditions
  2. Retry logic: Instead of immediately returning to main meeting, could retry the connection once
  3. Metrics: Track transition success/failure rates and duration statistics
  4. User feedback: Add visual indicator during the transition phase (loading spinner with elapsed time)
  5. Enhanced logging: Include network quality metrics in transition logs
  6. Breakout cleanup guard: The _onLiveMeetingChange cleanup (line 500) only fires if _inTransitionToBreakoutRoomId != null. After commit 2, this is cleared on connection, so the guard may need broadening to also check isInBreakout or _breakoutRoomOverride != null to catch all edge cases when breakouts end server-side.

Additional information

Problem

Users who disconnected (closed browser, lost connection, etc.) continued to appear as present participants, causing inflated and inconsistent participant counts. Two root causes:

  1. No continuous heartbeat: The presence timer only fired when breakout room state changed, not continuously, so mostRecentPresentTime became stale for static participants
  2. Premature room membership writes: Users were counted in breakout rooms before Agora confirmed connection, causing ghost participants during failed transitions

I don't fully fix these issues in this PR, but I at least correct the existing variables so that the full fix can be done correctly and semantically.

Commits

This PR contains four commits, separated for the ability to cherry pick if we don't want some of the functionality.

Commit 1: Unconditional Heartbeat & Variable Clarity

  • Changed presence timer from conditional (only on breakout changes) to unconditional 5-second heartbeat
  • Timer now fires for all UI states except leftMeeting and enterMeetingPrescreen
  • Uses new _presenceRoomId getter to determine which room to report (main, breakout, or waiting room)
  • Renamed _activeBreakoutRoomId -> _inTransitionToBreakoutRoomId for semantic clarity
  • Fixed log message in _presenceRoomId to describe the uncommon null state properly
  • Changed isNullOrEmpty() check to strict != null (empty string is a bug, not valid state)
  • Fixed Cloud Function to write null instead of empty string for currentBreakoutRoomId

Commit 2: Deferred Room Membership + Transition Monitoring

  • Moved Firestore room membership write from getBreakoutRoomFuture() to ConferenceRoom.onConnected()
  • Users are now only counted in a room after Agora confirms connection
  • Introduced _cachedJoinInfoRoomId to separate caching concern from transition tracking
  • _inTransitionToBreakoutRoomId now cleared on successful connection (transition complete)
  • Added _transitionTimer that logs every 5 seconds if transition hasn't completed
  • Provides visibility into stalled connections for debugging

Commit 3: Automatic Recovery from Failed Transitions

  • If transition timer reaches 10 seconds without Agora confirmation, automatically:
    • Cancel the transition by calling leaveBreakoutRoom()
    • Return user to main meeting
    • Show user-facing toast message
    • Log the timeout for debugging
  • Prevents users from being stuck in "joining..." state indefinitely

Commit 4: Localization

  • Added breakoutRoomTransitionTimeout key to all language files (en, es, zh, zh_Hant_TW)
  • Replaced hardcoded timeout message with localized reference
  • Enables proper translation of error messages

Testing

  • Verified all existing tests pass
  • Manual testing recommended for:
    • Normal breakout room transitions (should complete in <2 seconds)
    • Failed transitions (disconnect network, verify 10-second recovery)
    • Ghost participant cleanup (disconnect without leaving, verify removal after heartbeat timeout)
    • Participant count accuracy during transitions

To integration test manually:

  • Create a meeting room.
  • Observe that changing between meeting rooms still works.
  • The presence counts for the various meeting rooms should be fixed as long as a user hasn't logged off yet. Removing them after disconnection happens in a subsequent ticket.

Addition 1: Improvements to Fix All Linting Errors & Observed Bugs in Files Touched

These are separate from the core PR change.
I made smaller commits and continued the numbering from the first on this branch, so these run 5 to 24.

SECTION 1: Clarification & observability

#5 b96284b added comments to clarify variable roles
Added inline comments to existing timing and debounce constants in
live_meeting_provider.dart to document what each role does. No code change.

#6 2bacbf9 added analytics to log breakout room transition times
New: fires AnalyticsBreakoutRoomTransitionEvent with elapsed seconds whenever
a breakout room transition is cleared (success or timeout). Gives visibility
into how long Agora takes to connect users to breakout rooms in production.

#8 107f92d remove dead breakout countdown state machine from BreakoutRoomsDialog
BreakoutRoomsDialogState._state was declared 'final' and initialized to
_BreakoutRoomsDialogState.start, meaning the entire countdown/timer branch
was permanently unreachable before. Removed the state enum, two timer fields
(Timer + Stopwatch), and the dead countdown UI branch. The ActionButton
widget's built-in loading state already handles feedback. No behavior change
for users; paying down some code debt by removing dead code.

#21 c547be9 converted inappropriate print() to Debug.log()
A bare print('updated live meeting participants') inside onConnected() in
conference_room.dart was emitting unconditionally to stdout in release builds.
Replaced with Debug.log() using the existing ConferenceRoom.* namespace
convention, consistent with all other diagnostic logging in the file.

SECTION 2: Magic Numbers -> Named Constants

#7 591785b timing magic nums -> constants in live_meeting_provider.dart
#9 d6481e0 timing magic nums -> constants in meeting video & participation UI
(conference_room.dart, networking_status_model.dart)
#10 779919 timing magic nums -> constants in meeting flow & agenda setup
(meeting_agenda_provider.dart, meeting_guide_card_store.dart)
#11 7cf3f9b timing magic nums -> constants in event entry & user auth
(event_page.dart, event_page_provider.dart, user_service.dart)
#12 73ccbe5 timing magic nums -> constants in live stream playback
(url_video_widget.dart)
#13 bcade12 timing magic nums -> constants in Firestore event queries
(firestore_event_service.dart)

Replaced all inline numeric literals used as timeouts, debounce delays,
throttle intervals, and query lookback windows with private named constants
placed at the top of the enclosing class or file (across 10 files). Also made
the names semantic (e.g. _dominantSpeakerSilenceHoldDuration rather than
a bare 3000)

No behavior changes in any of these commits makes future tuning auditable via git blame and
prevents silent inconsistency if the same logical delay appears more than once.

SECTION 3: Dependency bugs

These are actual bugs because pub can resolve them differently across environments and cause
runtime breakage or silent query errors when the packages are referenced in production:

#15 7c7bd51 registered collection as an explicit dependency in pubspec.yaml
The 'collection' package was imported by 17+ files but was only listed under
dependency_overrides, not under dependencies. This caused the analyzer to
emit "The imported library isn't a dependency of the importing package" for
every usage. In practice pub still resolved it via the override, but the
listing was incorrect and the linter errors masked any real unused-import
warnings in those files.

#17 08a5190 corrected enum_to_string dependency type
firestore_event_service.dart imports enum_to_string directly and uses
EnumToString.convertToString() in three query predicates. The package was
only present as a transitive dependency (not listed under dependencies),
which means a future pub resolution could silently drop or version-shift it.
Adding it explicitly pins the contract. Same class of issue as collection.

SECTION 4: Fixing improper async handling

#20 a430485 fixed use_build_context_synchronously across async gaps

BuildContext (or navigatorState.context) was used after an await without
a mounted guard in four files:

  • breakout_rooms_dialog.dart: Navigator.of(context).pop() after awaiting
    a Firestore write for startBreakouts(). If the user dismissed the dialog
    while the write was in flight, the widget would be unmounted and Flutter
    would throw "Looking up a deactivated widget's ancestor is unsafe".

  • conference_room.dart: navigatorState.context passed to showAlert() after
    awaiting Permission.camera.request() and Permission.microphone.request(),
    and to AudioVideoErrorDialog after awaiting a ConfirmDialog result.

  • event_page_provider.dart: navigatorState.context passed to alertOnError()
    after a chain of awaits including a ConfirmDialog.

  • live_meeting_provider.dart: Navigator.of(context).pop(true) called inside
    an onConfirm callback after an inner await.

    SENTRY IMPACT: These are probably responsible for some of the Sentry errors.
    app.dart wires both FlutterError.onError and runZonedGuarded to reportError()
    -> Sentry.captureException(). The Flutter framework error thrown when accessing
    a deactivated widget's context ("Looking up a deactivated widget's ancestor
    is unsafe") goes through FlutterError.onError and would be captured. The
    breakout_rooms_dialog case is the most likely to have been hit in production,
    since starting breakouts involves a visible network round-trip during which
    a user could reasonably close the dialog. The conference_room cases are lower
    risk (permission prompts are UI-blocking and fast) but still technically
    reachable.

SECTION 5: Linter / code quality fixes

No behavior changes with any of these.

#14 568fa69 removed unused imports
Removed three unused imports (styles.dart from live_meeting_provider.dart,
localization_helper.dart from meeting_agenda_provider.dart and
event_page_provider.dart). These were likely left after refactors.

#16 689de50 removed invalid @override annotations from FakeParticipant stubs
FakeParticipant in conference_room.dart had @override on three getters
(audioTracks, state, videoTracks) that do not exist in AgoraParticipant or
any of its mixins. The getters themselves are valid stub implementations; only
the annotations were wrong.

#18 57a78fe removed unused top-level duplicate of _kHoursAfterEventToShow
event_page.dart declared the constant twice: once at file scope (line 46,
unused) and once as static const inside EventPageState (line 102, the one
actually used). The file-scope copy was an unused_element linter error.

#19 bc9b619 fixed library_private_types_in_public_api in touched files
createState() in five StatefulWidget classes (breakout_rooms_dialog.dart,
url_video_widget.dart x2, meeting_guide_card.dart x2) was annotated with
the private concrete State type as the return type. Changed to
State<WidgetClass> in all cases. This is a purely static annotation fix;
no runtime impact (Flutter's framework only calls createState() and receives
the result as State<T>).

#22 f69fc74 added trailing commas where missing in touched files
Applied require_trailing_commas lint rule (enabled in analysis_options.yaml)
across all files touched by this PR that were missing them. Ensures
consistent auto-formatting and avoids future lint noise.

#23 982ef92 replaced deprecated AppAsset() constructor usage
One call site in event_page.dart was using the positional AppAsset(path)
constructor, which is marked @Deprecated('Use dedicated constructor').
Added AppAsset.backgroundGif() named constructor to app_asset.dart following
the existing pattern, and updated the call site to use it.

#24 982ef92 replaced the rest of the deprecated constructor usage in the (now touched)
app_asset.dart file.

…ean online indicator. Fix log messages and null checks for breakout room transition variable. Replaced the generic warning in _presenceRoomId with a descriptive message explaining the uncommon null state. Change isNullOrEmpty check to a strict null check.
…TransitionToBreakoutRoomId on successful Agora connection. Separate caching logic into _cachedJoinInfoRoomId. Add a timer that logs a warning every 5 seconds if a breakout room transition has not completed, indicating a possible connection issue.
…oes not confirm a breakout room connection within 10 seconds, cancel the transition, return the user to the main meeting, and display an error message.
@github-actions
Copy link

github-actions bot commented Feb 6, 2026

Visit the preview URL for this PR (updated for commit 9437429):

https://gen-hls-bkc-7627--pr297-mw-fix-presence-coun-spk5tuvf.web.app

(expires Mon, 16 Mar 2026 23:14:50 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: eed668cca81618d491d024574a8f8a6003deaa8d

firebase-tools now requires Java 21+, but the workflow relied on the
runner's default JDK (Java 17). Add actions/setup-java@v4 with Temurin
21 before the emulator steps to fix the version mismatch.
@mikewillems mikewillems force-pushed the mw/fix/presence-count branch from cf25e26 to 2e14112 Compare February 13, 2026 18:18
@mikewillems mikewillems marked this pull request as ready for review February 13, 2026 18:33
@katherineqian katherineqian removed the request for review from epenn February 25, 2026 15:52
Copy link
Collaborator

@katherineqian katherineqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate your thorough commit messages, organized commits, and the detailed PR description - made it so much easier to review!

I was testing the staging preview for manual testing and am getting an error when trying to enter randomly assigned breakout rooms - "There was an error loading event details", as shown below. When trying to enter the same event on staging, my user is able to join the waiting room as expected. Steps to reproduce:

  • Create a hostless meeting.
  • Join the meeting with 2 separate users.
  • As admin, open the Participants panel and select "Breakouts" > "Randomly Assign".
  • Observe the error below for both users.
Image

Mainly merging the Java version commit (already made here).
…nstead of direct state var reference for clarity
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants