Skip to content

fix(phone-quickdrop): reliable WebSocket pipeline (heartbeat, resume, race fix, QR)#6

Open
codyk2 wants to merge 2 commits into
mainfrom
claude/upbeat-davinci-d923e2
Open

fix(phone-quickdrop): reliable WebSocket pipeline (heartbeat, resume, race fix, QR)#6
codyk2 wants to merge 2 commits into
mainfrom
claude/upbeat-davinci-d923e2

Conversation

@codyk2
Copy link
Copy Markdown
Collaborator

@codyk2 codyk2 commented Apr 19, 2026

Summary

Makes phone-quickdrop actually reliable end-to-end. Verified on a real iPhone over USB tether — two MP4s landed cleanly (ISO Media, MP4 Base Media v5) with no data loss.

Three real reliability bugs + one blocking UI issue fixed:

  • onstop racerecorder.onstop sent end to the server before the final chunk's await ev.data.arrayBuffer() resolved, so the last chunk arrived orphaned and was dropped as binary_without_session. Fixed by tracking in-flight promises and draining them before end.
  • No reconnect recovery — if the WebSocket died mid-recording, the phone's buffered chunks got flushed onto a fresh connection where role === null, and the server rejected them. Fixed with orphaned-session support: server keeps the file open 20s; phone re-sends start with {resume: true, sessionId} and keeps streaming into the same file.
  • No heartbeat — NAT-dropped WebSockets sat open as zombies. Added 20s ping/pong, terminate on missed pong.
  • Invisible QR — server rendered white modules on transparent, CSS then turned all white fills transparent → solid white card, unscannable. Fixed by switching to standard black-on-white and removing the CSS override.

Changes

File What
phone-quickdrop/server.js Heartbeat, orphan-session resume, QR color
phone-quickdrop/public/phone.html onstop drain, resume-on-reconnect, stash sessionMime/Ext
phone-quickdrop/public/desktop.html Drop CSS hack that blanked QR
phone-quickdrop/scripts/smoke-reconnect.js New regression test
phone-quickdrop/package.json npm run smoke:reconnect script

Test plan

  • npm run smoke — original round-trip still passes (1 MiB in ~190 ms)
  • npm run smoke:reconnect — new test: send 4 chunks → ws.terminate() → reconnect with resume:true → send 4 more → verify file equals byte-for-byte concatenation
  • Real iPhone over USB tether: recorded 4.6s (2.45 MB, 5 chunks) and 11.6s (7.56 MB, 11 chunks); both files are valid MP4
  • Server logs show orphaned → resumed → saved on simulated disconnect
  • Reviewer: try a real Wi-Fi dropout during a 30s+ recording to validate heartbeat + resume under natural conditions

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Recording sessions can now automatically resume if the connection is lost and restored within a time window.
  • Bug Fixes

    • Improved connection stability with liveness detection.
    • Updated QR code visual rendering for better clarity.

Three real reliability bugs plus a blocking UI issue that made the QR
unscannable. Verified end-to-end on a real iPhone over USB tether.

Server (server.js):
- WS heartbeat: ping every 20s, terminate sockets that miss a pong.
  Previously idle/NAT-dropped connections sat open forever.
- Resume across disconnect: on phone WS close mid-recording, keep the
  file open for 20s; a new WS with {type:"start", resume:true, sessionId}
  re-attaches and keeps appending to the same file. Prevents silent data
  loss when Wi-Fi blips.
- QR now renders black-on-white instead of white-on-transparent — the
  old combo collided with the CSS override and produced an invisible
  QR (solid white card).

Phone (public/phone.html):
- onstop race fix: track in-flight ev.data.arrayBuffer() promises and
  drain them before sending "end". Previously the final chunk could
  arrive after the server closed the session and be dropped as
  binary_without_session.
- On WS reconnect mid-recording, re-send start with resume:true +
  the original sessionId before flushing queued chunks, so buffered
  binary frames land in the original file.
- Stash sessionMime/sessionExt so the resume start carries them.

Dashboard (public/desktop.html):
- Remove the .qr-wrap svg[fill="#ffffff"] { fill: transparent } hack
  that was blanking out the QR modules.

Tests:
- scripts/smoke-reconnect.js: opens WS, sends half the chunks,
  terminate()s the socket, reconnects with resume:true, sends the
  rest, verifies on-disk bytes byte-for-byte match the concatenation.
  Runs via `npm run smoke:reconnect`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 19, 2026

📝 Walkthrough

Walkthrough

This PR implements session resumption on reconnection. Changes include server-side orphaned session tracking with heartbeat monitoring, client-side session state persistence with a pending chunk queue, updated QR code styling, and a new reconnection-focused smoke test script.

Changes

Cohort / File(s) Summary
Testing Infrastructure
package.json, scripts/smoke-reconnect.js
Added new smoke:reconnect npm script and comprehensive end-to-end test script that validates the full reconnection/resume flow: initial connection, partial chunk transmission, simulated disconnect, reconnection with resume flag, remaining chunk transmission, and on-disk file verification.
Server Session Management
server.js
Implemented orphaned session tracking with configurable timeouts and heartbeat monitoring; added resume logic to re-attach reconnecting clients to existing RecordingSession instances; modified close handling to defer finalization and allow resume windows; added WebSocket liveness heartbeat with ping/pong to detect stale connections.
Client Recording State
public/phone.html
Added sessionMime and sessionExt state persistence; introduced pendingSends queue to track in-flight chunk conversions; made recorder.onstop async with drain step to ensure all queued chunks are sent before finalizing; extended WebSocket reconnection handler to send start message with resume:true when reconnecting mid-recording.
UI Styling
public/desktop.html
Removed SVG-specific styling rules that forced transparent fill on QR code elements, allowing the QR to rely on default generated attributes and background container styling.

Sequence Diagram(s)

sequenceDiagram
    actor Phone as Phone Client
    participant Server
    
    Note over Phone,Server: Initial Connection & Transfer Phase 1
    Phone->>Server: WebSocket connect
    Phone->>Server: send start {sessionId, mime, ext}
    Server->>Phone: started {resumed: false}
    Phone->>Server: binary chunk 1-N (half)
    Phone->>Phone: Simulate disconnect (ws.terminate)
    
    Note over Phone,Server: Reconnection & Resume
    Phone->>Server: WebSocket reconnect
    Phone->>Server: send start {sessionId, resume: true, mime, ext}
    Server->>Phone: started {resumed: true, bytes: <sent amount>}
    
    Note over Phone,Server: Transfer Phase 2 & Finalization
    Phone->>Server: binary chunk N+1-M (remaining)
    Phone->>Server: send end {totalBytes}
    Server->>Phone: saved {success: true}
    Phone->>Server: close
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 wiggles nose
When sockets slip and signals drop,
We bounce right back and never stop!
With heartbeats, orphans, chunks to send—
A smoky reconnect to the end! 🌫️✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main changes: it identifies the core fix (reliable WebSocket pipeline) and lists the key technical improvements (heartbeat, resume, race fix, QR rendering).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/upbeat-davinci-d923e2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@phone-quickdrop/public/phone.html`:
- Around line 511-524: In recorder.onstop (the stop handler shown) don’t clear
session state (sessionId, sessionMime, sessionExt) immediately after sending the
"end" message; instead keep these variables intact and only null them when the
server sends a final acknowledgement type ("saved" or "aborted"). Update the ws
message handling/dispatch logic (the code that processes incoming socket
messages) to detect those ack messages and clear sessionId, sessionMime, and
sessionExt there; ensure the reconnect/reattach flow uses the preserved session*
values to reattach and flush pendingChunks if the socket was offline when "end"
was queued.

In `@phone-quickdrop/scripts/smoke-reconnect.js`:
- Around line 112-126: The fixed 200ms sleep before sending the resume "start"
makes the reconnect smoke test flaky; instead modify the reconnect phase in
smoke-reconnect.js (where connect(), collectMessages(ws) and ws.send(...) are
used) to poll/retry the resume handshake: after connecting, repeatedly send the
start resume message and await responses from collectMessages until you observe
a started message with started.resumed === true or a real timeout is reached,
then proceed or fail the test; ensure the retry loop has a reasonable backoff
and a clear timeout to avoid infinite waits.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 73899127-157f-4c7e-9dec-4f55b931fb36

📥 Commits

Reviewing files that changed from the base of the PR and between 3a7e1bd and 3c3653c.

📒 Files selected for processing (5)
  • phone-quickdrop/package.json
  • phone-quickdrop/public/desktop.html
  • phone-quickdrop/public/phone.html
  • phone-quickdrop/scripts/smoke-reconnect.js
  • phone-quickdrop/server.js
💤 Files with no reviewable changes (1)
  • phone-quickdrop/public/desktop.html

Comment on lines +339 to 355
// If we died mid-recording, ask the server to resume this sessionId
// before we flush queued chunks. WebSocket preserves order within a
// single connection, so chunks sent next will land after the resume.
if (recording && sessionId) {
try {
ws.send(JSON.stringify({
type: "start",
sessionId,
mime: sessionMime || "application/octet-stream",
ext: sessionExt || "bin",
resume: true,
ts: startedAt,
}));
} catch { /* will close and reconnect */ }
}
for (const buf of pendingChunks) ws.send(buf);
pendingChunks = [];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Track an unfinished upload, not recording, before sending resume.

recording && sessionId is the wrong gate here. On a cold connect it becomes true before the original queued "start" has reached the server, so this emits a second start and the queued one aborts/recreates the session in phone-quickdrop/server.js, Lines 401-413. After an offline stop it becomes false even though queued chunks and "end" still need the orphaned session to be reattached. Please drive resume off “session started but not yet saved/aborted” state instead.

Comment on lines +511 to +524
recorder.onstop = async () => {
// Drain any in-flight chunk conversions so the last chunk beats "end".
if (pendingSends.size) {
try { await Promise.allSettled([...pendingSends]); } catch {}
}
wsSend(JSON.stringify({
type: "end",
sessionId,
totalBytes: bytesSent,
chunks: chunkCount,
}));
sessionId = null;
sessionMime = null;
sessionExt = null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't drop the session metadata until the server confirms save/abort.

This clears sessionId, sessionMime, and sessionExt immediately after queueing "end". If the socket is still offline, the reconnect path no longer has enough state to reattach the orphaned session before flushing pendingChunks. Please keep this state until a "saved" or "aborted" ack arrives.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@phone-quickdrop/public/phone.html` around lines 511 - 524, In recorder.onstop
(the stop handler shown) don’t clear session state (sessionId, sessionMime,
sessionExt) immediately after sending the "end" message; instead keep these
variables intact and only null them when the server sends a final
acknowledgement type ("saved" or "aborted"). Update the ws message
handling/dispatch logic (the code that processes incoming socket messages) to
detect those ack messages and clear sessionId, sessionMime, and sessionExt
there; ensure the reconnect/reattach flow uses the preserved session* values to
reattach and flush pendingChunks if the socket was offline when "end" was
queued.

Comment on lines +112 to +126
// Wait briefly for server to register the close and orphan the session.
await sleep(200);

// ---- phase 2: reconnect with resume ----
ws = await connect();
msgs = collectMessages(ws);

ws.send(JSON.stringify({
type: "start",
sessionId,
mime: "video/mp4",
ext: "mp4",
resume: true,
ts: Date.now(),
}));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The reconnect sleep makes this smoke test flaky.

A fixed 200ms delay does not guarantee the server has already run its close handler and inserted the orphaned session. On a slower machine the resume "start" can arrive first and fail even though the feature works. Retrying the resume handshake until started.resumed === true or a real timeout would make this regression test deterministic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@phone-quickdrop/scripts/smoke-reconnect.js` around lines 112 - 126, The fixed
200ms sleep before sending the resume "start" makes the reconnect smoke test
flaky; instead modify the reconnect phase in smoke-reconnect.js (where
connect(), collectMessages(ws) and ws.send(...) are used) to poll/retry the
resume handshake: after connecting, repeatedly send the start resume message and
await responses from collectMessages until you observe a started message with
started.resumed === true or a real timeout is reached, then proceed or fail the
test; ensure the retry loop has a reasonable backoff and a clear timeout to
avoid infinite waits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant