fix(audit batch 3): 3 P1 from post-merge repository audit (SIG-A1, SOCK-01, TST-07)#534
Conversation
8-agent re-audit of develop @ 1e8cd36 (after #529 #532 #533 landed) surfaced these 3 critical bugs. Each verified against the actual code and spec before fixing. ## P1 — SIG-A1 — `extractIdentityFromPkmsg` reads wrong protobuf field libsignal.ts:222 was checking `fieldNumber === 5` for the identity key in a PreKeySignalMessage. Per the libsignal spec (WhisperTextProtocol.proto, `PreKeySignalMessage`): field 1: preKeyId (uint32, varint, wireType 0) field 2: baseKey (bytes 33, wireType 2) field 3: identityKey (bytes 33, wireType 2) ← what we wanted field 4: message (bytes, wireType 2) field 5: registrationId (uint32, varint, wireType 0) field 6: signedPreKeyId (uint32, varint, wireType 0) Field 5 is registrationId — a varint, not a 33-byte key. Combined with the wire-type filter (`wireType === 2`) the function NEVER matched anything and ALWAYS returned `undefined`. Direct consequence: identity-change detection in `decryptMessage`, the `identity.changed` event, and the reinstall-triggered session cleanup were all dead code in production. Anyone reinstalling WhatsApp could decrypt with the new identity key without us noticing. Fix: change to `fieldNumber === 3` and rewrite the docblock with the exact field assignments from the spec so this doesn't regress. ## P1 — SOCK-01 — `query()` returned `undefined` on timeout socket.ts: `waitForMessage` swallows the Boom timedOut and returns `undefined`. `query()` re-exported that contract to its callers, who mostly couldn't distinguish a timeout from a legitimate empty response. Concrete damage already in the codebase: - `uploadPreKeys` logged "uploaded pre-keys successfully" and reset `lastUploadTime` on timeout — silently exhausting the server-side pre-key pool; - `appPatch` bumped `app-state-sync-version` on timeout — drifting LTHash from the server's; - `fetchPrivacySettings` destructured `undefined` → opaque TypeError far from the source; - `assertSessions` set `didFetchNewSession = true` without injecting a single session. `query()` now converts the `undefined` sentinel into a typed Boom (`statusCode: DisconnectReason.timedOut`, `data: {msgId, timeoutMs}`). Callers that catch this can retry; callers that don't surface a clear failure instead of a downstream mystery. `waitForMessage` keeps its current shape — only `query()`'s contract tightens, which is the entrypoint the audit-flagged consumers use. ## P1 — TST-07 — H4 `it.failing` tests are dead trackers `signal.delete-migrate-race.test.ts:62,94` were `it.failing` suites authored before the H4 fix shipped (Stage 2, upstream WhiskeySockets#2572 — now at `libsignal.ts:777` and friends). They exercise a local `addTransactionCapability` mock with synthetic addresses, NOT the real `transactWith({ records: [{type:'session', id}] })` path the production fix uses. Like the H8/H9 placeholders we already demoted, they keep "failing as expected" no matter how good the production code gets, signalling a phantom open issue forever. Demoted to `it.skip` with a comment explaining the gap. The H4 fix remains live in production. ## SC-01 — yarn.lock checksums for 45 platform binaries The audit flagged 45 entries (`@esbuild/*` + `@unrs/resolver-binding-*`) without `checksum:` lines. Verified: `awk` count returns the missing entries. These are optional native deps for OS/arch combinations that yarn can't fetch on this Windows machine — so a local `yarn install` won't populate them. The fix is to run `yarn install` on a Linux runner (or via CI) so the lockfile gets the missing checksums and `--immutable` installs verify integrity properly. Not addressed in this PR; tracked as a CI-side follow-up. ## Validation - `tsc --noEmit -p tsconfig.json` — exit 0 - `eslint` on all touched files — exit 0 - Touched test suites pass; the lone failing suite locally is `signal-typed-backend` (better-sqlite3 native bindings on Windows toolchain, pre-existing) ## Customizations untouched Carousel / list / button / poll / view-once / biz quality_control paths intact. SIG-A1 only changes the field number that was already wrong; SOCK-01 hardens an existing contract without expanding it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Thanks for opening this pull request and contributing to the project! The next step is for the maintainers to review your changes. If everything looks good, it will be approved and merged into the main branch. In the meantime, anyone in the community is encouraged to test this pull request and provide feedback. ✅ How to confirm it worksIf you’ve tested this PR, please comment below with: This helps us speed up the review and merge process. 📦 To test this PR locally:If you encounter any issues or have feedback, feel free to comment as well. |
There was a problem hiding this comment.
Pull request overview
Addresses three P1 findings from the post-merge audit by restoring Signal identity-key extraction, making socket query() timeouts observable as typed errors, and demoting two non-actionable placeholder tests.
Changes:
- Fix Signal PreKeySignalMessage parsing by extracting
identityKeyfrom protobuf field 3 (not 5), restoring identity-change detection paths. - Tighten
query()timeout behavior by throwing a typedBoom(instead of returningundefined) whenwaitForMessagetimes out. - Convert two
it.failingtests toit.skipwith an explanatory note since they don’t exercise the production locking path.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Socket/socket.ts | Convert query() timeout sentinel (undefined) into a typed Boom timeout error. |
| src/Signal/libsignal.ts | Correct identity-key extraction to read protobuf field 3 and update the spec docblock. |
| src/tests/Utils/signal.delete-migrate-race.test.ts | Demote two placeholder H4 tests from it.failing to it.skip with rationale. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
Three P1 bugs surfaced by the 8-agent re-audit of `develop @ 1e8cd36` (after #529 #532 #533 landed). Each verified against the actual code + spec before fixing.
SIG-A1 — `extractIdentityFromPkmsg` reads the wrong protobuf field
libsignal.ts:222 was checking `fieldNumber === 5` for the identity key. Per libsignal spec (WhisperTextProtocol.proto):
```
field 1: preKeyId (uint32, varint, wireType 0)
field 2: baseKey (bytes 33, wireType 2)
field 3: identityKey (bytes 33, wireType 2) ← what we wanted
field 4: message (bytes, wireType 2)
field 5: registrationId (uint32, varint, wireType 0)
field 6: signedPreKeyId (uint32, varint, wireType 0)
```
Field 5 is registrationId — a varint, not a 33-byte key. Combined with the wire-type filter (`wireType === 2`) the function NEVER matched anything and ALWAYS returned `undefined`. Direct consequence: identity-change detection in `decryptMessage`, the `identity.changed` event, and reinstall-triggered session cleanup were all dead code in production.
Fix: `fieldNumber === 3` + rewrite the docblock with exact spec field assignments.
SOCK-01 — `query()` returns `undefined` on timeout
`waitForMessage` swallows the Boom timedOut and returns `undefined`. `query()` re-exported that contract, leaving callers unable to distinguish timeout from a legit empty response. Concrete bugs already in the codebase:
`query()` now converts the `undefined` sentinel into a typed Boom (`statusCode: DisconnectReason.timedOut`, `data: {msgId, timeoutMs}`). `waitForMessage` keeps its current shape — only `query()`'s contract tightens, which is the entrypoint the audit consumers use.
TST-07 — H4 `it.failing` tests are dead trackers
`signal.delete-migrate-race.test.ts:62,94` exercise a local `addTransactionCapability` mock — NOT the real `transactWith({ records })` path the H4 fix uses (libsignal.ts:777). Same pattern as H8/H9 placeholders we already demoted. Demoted to `it.skip` with comment; the H4 fix remains live.
SC-01 — yarn.lock checksums (NOT addressed in this PR)
Audit flagged 45 entries (`@esbuild/` + `@unrs/resolver-binding-`) without `checksum:` lines. These are optional native deps for OS/arch combinations yarn can't fetch on this Windows machine. Fix is to run `yarn install` on a Linux runner / CI so the lockfile gets the missing checksums and `--immutable` installs verify integrity properly. Tracked as CI-side follow-up.
Test plan
Customizations untouched
Carousel / list / button / poll / view-once / biz quality_control paths intact. SIG-A1 only changes a field number that was wrong; SOCK-01 tightens an existing contract without expanding it.
🤖 Generated with Claude Code
Summary by cubic
Fixes three P1 issues from the audit: restores identity-change detection by parsing the correct Signal field, surfaces query timeouts as typed errors, and skips two placeholder tests that don’t reflect production behavior.
Bug Fixes
extractIdentityFromPkmsgnow readsidentityKeyfrom field 3 (was 5). Restores identity-change detection and related cleanup paths.query()now throws a Boom timeout (statusCode:DisconnectReason.timedOut, data:{ msgId, timeoutMs }``) instead of returningundefined.it.failingtests toit.skipwith a note; they target a mock path, not the realtransactWith({ records }).Migration
query()returningundefinedfor timeouts, catch the Boom error (DisconnectReason.timedOut) instead.Written for commit 15e97b0. Summary will update on new commits.