feat: 2-of-3 threshold ECDSA signing (CGGMP21) #14
Slymeofthemonth wants to merge 26 commits into daydreamsai:master
Conversation
- saw-mpc: shared MPC core wrapping cggmp21 (keygen, presigning, signing, transport)
- saw-policy: policy agent binary (Share 2) with rule evaluation engine
- saw-cosigner: human cosigner binary (Share 3) for recovery + override
- All scaffolding with TODO markers at cggmp21 integration points
- Wired cggmp21 keygen, aux_info_gen, and signing into saw-mpc
- Built in-memory transport layer (Delivery impl for testing)
- Integration test: full 2-of-3 keygen → signing → verification
- Presignature pool structure ready for background generation
- All parties derive same ETH address from shared public key
Mode 1: Single-key (SAW as-is, no threshold)
Mode 2: Self-hosted threshold (separate machine)
Mode 3: TEE-hosted policy via Phala (future)
- WsConnection<S> generic over stream type (client + server)
- ws_connect() for client, ws_accept() for server
- into_delivery() bridges WS into cggmp21's Delivery trait
- WireMessage routing: MPC messages go through delivery, others filtered
- Integration test: full keygen → WS signing → verification passes
- Clean up unused import in keygen.rs
saw-daemon:
- ThresholdClient: connects to policy via WS, sends sign requests, participates in MPC signing if policy approves
- ThresholdError with PolicyUnavailable/PolicyDenied/Escalated/Mpc variants
- 5s timeout on policy decision (fail-fast as specced)

saw-policy:
- WebSocket server accepting persistent connections from saw-daemon
- Receives SignRequest → evaluates policy rules → responds with decision
- If approved, participates in MPC signing as party 1
- Sequential request processing per connection

Integration test (daemon_policy_flow):
- Full end-to-end: keygen → WS connect → sign request → policy approve → MPC signing → ECDSA verification — all passing
- config.rs: DaemonConfig with per-wallet SigningMode (single-key or threshold) loaded from config.yaml; defaults to single-key if no config file
- Server now holds config, tokio runtime, and ThresholdClient instances
- handle_sign_evm_tx branches on threshold mode:
  - Single-key: unchanged (local policy + local sign)
  - Threshold: compute sighash → ThresholdClient::sign() → reconstruct signed tx
- Extract compute_evm_tx_sighash() and build_signed_evm_tx() helpers
- Recovery id (v) computed by trying both parities via secp256k1 recovery
- Existing tests still pass (backward compatible)
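A hypothetical sketch of what the per-wallet mode switch in config.yaml might look like. The mode and key_share_path field names follow identifiers quoted in the review comments on this PR; wallet names, paths, and list layout are made up for illustration:

```yaml
# Hypothetical config.yaml sketch — not the shipped schema.
wallets:
  - name: main
    mode: threshold
    key_share_path: keys/threshold/main_key_share.json
  - name: hot
    mode: single-key   # the default when no config file is present
```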
- handle_sign_eip2612_permit now branches on threshold mode
- Derives owner address from key share's shared public key
- Verifies owner matches payload if provided
- Computes EIP-712 digest via extracted compute_permit_digest()
- Proper recovery id: compares recovered pubkey against known key share pubkey
- get_address returns address + public_key for threshold wallets (with mode: threshold)
- Fixed EVM tx threshold path to also compare against known pubkey for v recovery
- ThresholdClient::public_key() exposes the shared public key
- All existing tests pass
New components:
- saw-mpc/relay.rs: WebSocket relay server for multi-party MPC
  - run_relay(): accepts N parties, routes MPC messages (broadcast + P2P)
  - connect_to_relay(): connects as a party, returns Delivery-compatible channels
  - RelayEnvelope protocol: Join/Joined/AllJoined/Mpc/PhaseComplete
  - Phase-aware routing (aux, keygen, sign can use same relay)
- saw-cli/keygen_threshold.rs: CLI for threshold keygen ceremony
  - run_relay(): start relay server on specified address
  - run_party(): connect to relay, run full ceremony (primes → aux → keygen → save)
  - Saves key share (0600) + metadata JSON to keys/threshold/
- saw keygen-threshold CLI command:
  - --relay --listen <addr>: start relay
  - --party <0|1|2> --wallet <name> --connect <url>: join as party
- Integration test (keygen_ceremony.rs): relay-routed signing test (keygen via in-memory + signing via relay, verifies relay routing works)

Usage:
Machine A: saw keygen-threshold --relay --listen 0.0.0.0:9444
Machine B: saw keygen-threshold --party 0 --wallet main --connect ws://A:9444
Machine C: saw keygen-threshold --party 1 --wallet main --connect ws://A:9444
Machine D: saw keygen-threshold --party 2 --wallet main --connect ws://A:9444
- Add saw-mpc/encryption module: encrypt/decrypt with SAW1 binary format
- Argon2id KDF (memory-hard) derives 256-bit key from passphrase + salt
- ChaCha20-Poly1305 AEAD for authenticated encryption
- Backward compatible: auto-detects plaintext vs encrypted key shares
- SAW_PASSPHRASE env var controls encryption on keygen + decryption on load
- saw-cli keygen-threshold encrypts if SAW_PASSPHRASE is set, warns if not
- saw-daemon decrypts on startup, errors clearly if encrypted but no passphrase
- 5 unit tests covering round-trip, wrong passphrase, passthrough, etc.
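A minimal sketch of the plaintext-vs-encrypted auto-detection described above, assuming the SAW1 magic sits at the start of the file. The exact header layout (salt, nonce, ciphertext framing) is not shown and the names here are hypothetical; only the detection idea comes from the commit message:

```rust
// Assumed magic prefix for the encrypted "SAW1" container format.
const MAGIC: &[u8; 4] = b"SAW1";

#[derive(Debug, PartialEq)]
enum KeyShareFormat {
    /// File starts with the SAW1 magic; salt + nonce + AEAD ciphertext follow.
    Encrypted,
    /// Anything else is treated as legacy plaintext JSON.
    Plaintext,
}

fn detect_format(data: &[u8]) -> KeyShareFormat {
    if data.starts_with(MAGIC) {
        KeyShareFormat::Encrypted
    } else {
        KeyShareFormat::Plaintext
    }
}
```

Keying decryption off a magic prefix rather than a file extension is what makes the loader backward compatible: old plaintext shares simply fail the prefix check and fall through to JSON deserialization.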
Presignature pool enables sub-50ms signing latency by splitting MPC into:
- Offline phase (background): daemon + policy generate presignatures via MPC
- Online phase (fast): local partial sig + exchange, no MPC rounds needed

Changes:
- saw-mpc/protocol: add PresignRequest, PresignReady, PartialSignRequest, PartialSignResponse wire messages
- saw-mpc/signing: PresignaturePool now indexed by presig_index (BTreeMap) with reserve_index(), take(index), take_next() for coordinated consumption
- saw-daemon/threshold: background refill loop (10s interval, target 5, refill at 2), sign() tries fast path first then falls back to sign_full
- saw-policy/server: handles PresignRequest (MPC presign generation) and PartialSignRequest (policy eval + local partial sig + response)
- saw-daemon/lib: starts presign refill tasks on Server::new()

Signing paths:
1. Fast path: take presig from pool → issue_partial_signature locally → send PartialSignRequest to policy → receive policy's partial → combine_partial_signatures → done (no MPC rounds)
2. Slow path: full inline MPC signing (existing behavior, fallback)
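The index-coordination API described above can be sketched with plain std types. The method names (reserve_index, take, take_next) come from the commit message; the Presignature placeholder and everything else are assumptions standing in for the real cggmp21 presignature material:

```rust
use std::collections::BTreeMap;

// Placeholder for the cggmp21 presignature material held by each party.
struct Presignature;

#[derive(Default)]
struct PresignaturePool {
    next_index: u64,
    ready: BTreeMap<u64, Presignature>,
}

impl PresignaturePool {
    /// Reserve an index before background generation starts, so both parties
    /// agree on which slot the new presignature will fill.
    fn reserve_index(&mut self) -> u64 {
        let idx = self.next_index;
        self.next_index += 1;
        idx
    }

    fn insert(&mut self, index: u64, presig: Presignature) {
        self.ready.insert(index, presig);
    }

    /// Take a specific presignature (used when the peer names the index).
    fn take(&mut self, index: u64) -> Option<Presignature> {
        self.ready.remove(&index)
    }

    /// Take the lowest-indexed ready presignature for the fast path.
    fn take_next(&mut self) -> Option<(u64, Presignature)> {
        let idx = *self.ready.keys().next()?;
        self.ready.remove(&idx).map(|p| (idx, p))
    }

    fn len(&self) -> usize {
        self.ready.len()
    }
}
```

The BTreeMap keeps indices ordered, so take_next on both sides consumes slots in the same order, which is what lets the daemon and policy agree on which presignature backs a given partial signature.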
Add PersistentWs struct for the daemon's fast-path signing:
- Lazy connect on first use, auto-reconnect on failure
- Exponential backoff on consecutive connection failures (1s → 30s max)
- invalidate() marks connection dead so next call reconnects
- send()/recv()/recv_timeout() handle connection lifecycle

Signing strategy:
- Fast path (presignature): uses persistent WS → avoids handshake overhead on every sign request, enables sub-50ms latency
- Fast path failure: gracefully falls back to slow path instead of failing
- Slow path (full MPC): separate per-request connection (not latency-critical)
- Presign generation (background): separate connections (not latency-critical)
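The reconnect schedule above (1s doubling to a 30s cap) can be isolated as a small sketch. The struct name and field are hypothetical; only the delay policy comes from the commit message:

```rust
use std::time::Duration;

// Sketch of the PersistentWs backoff policy: 1s, 2s, 4s, ... capped at 30s.
struct Backoff {
    consecutive_failures: u32,
}

impl Backoff {
    fn new() -> Self {
        Self { consecutive_failures: 0 }
    }

    /// Delay before the next reconnect attempt, doubling per failure.
    fn next_delay(&mut self) -> Duration {
        let secs = 1u64
            .checked_shl(self.consecutive_failures) // avoids overflow on long outages
            .unwrap_or(u64::MAX)
            .min(30);
        self.consecutive_failures += 1;
        Duration::from_secs(secs)
    }

    /// A successful connect resets the schedule.
    fn reset(&mut self) {
        self.consecutive_failures = 0;
    }
}
```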
Add per-package opt-level=3 for heavy crypto dependencies in dev profile:
- rug, gmp-mpfr-sys (big integer arithmetic)
- paillier-zk (Paillier prime generation — the main bottleneck)
- cggmp21 (MPC protocol)
- generic-ec (elliptic curve ops)

Result: keygen_2of3_and_sign integration test completes in ~195s (was 39+ min in pure debug mode, killed without completing). Paillier prime generation: ~60-80s each (was 39+ min for first prime). This avoids needing --release for MPC tests while keeping fast compile times for non-crypto code.
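A sketch of what these per-package overrides look like in the workspace Cargo.toml; the crate list is from the commit message above, and exact placement in the real manifest may differ:

```toml
# Optimize only the heavy crypto crates in dev builds; the rest of the
# workspace keeps the default dev opt-level for fast compiles.
[profile.dev.package.rug]
opt-level = 3

[profile.dev.package.gmp-mpfr-sys]
opt-level = 3

[profile.dev.package.paillier-zk]
opt-level = 3

[profile.dev.package.cggmp21]
opt-level = 3

[profile.dev.package.generic-ec]
opt-level = 3
```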
New CLI command that generates all 3 key shares in one process using
in-memory MPC transport. Much simpler than the distributed ceremony
for cases where you generate locally and distribute shares after.
Usage: SAW_PASSPHRASE=secret saw keygen-local --wallet my-wallet
Outputs 3 encrypted key share files + metadata:
{wallet}_party0.json → saw-daemon
{wallet}_party1.json → saw-policy (Railway)
{wallet}_party2.json → cosigner (recovery)
- Update saw-policy main.rs: support KEY_SHARE_BASE64 and POLICY_YAML env vars for containerized deployments (no files needed)
- Support PORT env var (Railway-style) for listen address
- Add base64 dep for key share decoding
- Add Dockerfile (multi-stage: rust:1.85 builder + debian-slim runtime)
- Add railway.toml pointing to the Dockerfile
- Deployed to: saw-policy-production.up.railway.app
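A sketch of the multi-stage build described above. Paths, package flags, and the binary name follow details quoted in the review comments on this PR; everything else (including the non-root user the review asks for) is an assumption, not the shipped Dockerfile:

```dockerfile
# Builder: rust:1.85 with the native deps the crypto crates need.
FROM rust:1.85 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        m4 pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /build
COPY . .
RUN cargo build --release -p saw-policy

# Runtime: slim image, unprivileged user, /data for key share + policy.
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates libssl3 && rm -rf /var/lib/apt/lists/* \
    && useradd --system --uid 1000 saw
COPY --from=builder /build/target/release/saw-policy /usr/local/bin/saw-policy
WORKDIR /data
RUN chown saw /data
USER saw
CMD ["saw-policy"]
```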
start_presign_refill() called tokio::spawn synchronously from Server::new(), which runs outside the tokio runtime, so the spawn had no runtime context. Moved the refill loop to a standalone async function and spawned it via rt.spawn() so tokio::spawn has proper runtime context. Also added pool()/key_share_clone()/policy_url() accessors on ThresholdClient so the refill loop can be spawned externally.
The EVM transaction sighash is already Keccak256'd. MPC signing was additionally SHA-256 hashing it via DataToSign::from_digest, so the recovered signer address differed from the threshold wallet address. Changed to DataToSign::from_scalar with the raw bytes so ecrecover works correctly for EVM transactions.
- Replace verbose draft spec with concise documentation of what's built - Update recovery README with correct port and macOS/Linux base64 syntax - Add critical implementation note about DataToSign::from_scalar - Document all components, config, deployment, and recovery procedures
Add .gitignore patterns for key shares (.json.enc), private keys (.key, .pem), env files (.env), and SAW data directories (.saw/).
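The resulting .gitignore additions might look like the following; the exact entries are assumptions based on the description above:

```gitignore
# Key shares, private keys, secrets, and local SAW state
*.json.enc
*.key
*.pem
.env
.env.*
.saw/
```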
📝 Walkthrough

Adds full 2-of-3 threshold MPC support: a new saw-mpc library with transports, keygen, presignature pool, and signing; a saw-policy agent (policy evaluation + WS server); a saw-cosigner scaffold; CLI keygen (local and threshold); daemon integration for threshold signing; Dockerfiles, tests, and docs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Daemon as Daemon<br/>(Party 0)
    participant Relay
    participant Policy as Policy<br/>(Party 1)
    participant Cosigner as Cosigner<br/>(Party 2)

    Note over Daemon,Cosigner: Threshold Signing Flow (Presignature Fast Path)
    Client->>Daemon: Sign Request
    Daemon->>Daemon: Compute EVM sighash
    Daemon->>Policy: PartialSignRequest<br/>(presign available)
    rect rgba(173,216,230,0.5)
        Note over Policy: Fast Path - Presignature
        Policy->>Policy: Evaluate Policy Rules
        alt Policy Approved
            Policy->>Policy: Issue Partial Signature<br/>(from presignature pool)
            Policy->>Daemon: PartialSignResponse<br/>(with partial_signature)
        else Policy Denied/Escalated
            Policy->>Daemon: PartialSignResponse<br/>(decision without signature)
        end
    end
    alt Presignature Available & Approved
        Daemon->>Daemon: Combine Partial Signatures<br/>(local + policy)
        Daemon->>Client: Signed Transaction
    else Presignature Unavailable or Denied
        rect rgba(255,182,193,0.5)
            Note over Daemon,Cosigner: Slow Path - Full MPC Signing
            Daemon->>Relay: Connect (Party 0)
            Policy->>Relay: Connect (Party 1)
            Cosigner->>Relay: Connect (Party 2)
            Relay->>Relay: AllJoined signal
            Daemon->>Relay: MPC Sign Messages
            Policy->>Relay: MPC Sign Messages
            Cosigner->>Relay: MPC Sign Messages
            Relay->>Daemon: MPC Messages routed
            Relay->>Policy: MPC Messages routed
            Relay->>Cosigner: MPC Messages routed
            Daemon->>Daemon: Complete Signature<br/>(from MPC rounds)
            Daemon->>Client: Signed Transaction
        end
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~90+ minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 19
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@crates/saw-cli/src/keygen_local.rs`:
- Around line 48-51: The ExecutionId values built from deterministic strings
("local-{wallet}-aux" and "local-{wallet}-dkg") can be reused across runs;
update the eid construction in the aux and dkg tasks so they include a
run-unique random nonce or high-resolution timestamp before calling
cggmp21::ExecutionId::new (the variables eid_bytes / eid and the calls to
keygen::generate_aux_info and keygen::generate_dkg_info are the places to
change). Generate a cryptographically random nonce (or use a UUID or
chrono::Utc::now().timestamp_nanos()), append it to the existing string when
creating eid_bytes for both the aux and dkg branches, and keep the rest of the
async spawn and generate_* calls unchanged.
- Around line 105-110: The code currently uses assert_eq!(address,
output.address, ...) which panics on mismatch; change this to return an Err from
the surrounding function (which returns Result<String, String>) with a clear
message including both addresses. Specifically, in the block that compares
address and output.address (where address, public_key, output.address,
output.public_key are used), replace the assert_eq! with an early return like
Err(format!(...)) describing the "address mismatch between parties" and include
the conflicting values so the caller can handle the error instead of panicking.
In `@crates/saw-daemon/src/lib.rs`:
- Around line 540-577: The loop that tries both y-parities to recover v is
duplicated in handle_sign_evm_tx_threshold and
handle_sign_eip2612_permit_threshold—extract it into a single helper fn
recover_y_parity(r_bytes: &[u8], s_bytes: &[u8], msg_hash: &[u8;32],
expected_uncompressed_pk: &[u8]) -> u8 that creates Secp256k1 and Message,
builds the 64-byte compact sig for v=0..1, attempts RecoveryId::from_i32,
RecoverableSignature::from_compact and secp.recover_ecdsa, compares
recovered.serialize_uncompressed()[1..] to expected_uncompressed_pk[1..],
returning the matching v (or 0 fallback); then replace the duplicated loops in
handle_sign_evm_tx_threshold and handle_sign_eip2612_permit_threshold to call
this helper with the local r_bytes, s_bytes, &sighash (as [u8;32]) and
expected_pk.to_bytes(false).as_ref().
- Around line 161-165: The current code uses a single hardcoded default
"keys/threshold/key_share.json" for every wallet when wcfg.key_share_path is
None, causing different threshold wallets to share the same key share; update
the logic in the wallet setup (where wcfg, key_share_path, root and full_path
are used) to either (a) require wcfg.key_share_path when wcfg.mode ==
"threshold" and return an error if missing, or (b) compute a unique default per
wallet by incorporating a wallet identifier (e.g., wcfg.name or the wallet ID)
into the path (example: format!("keys/threshold/{}_key_share.json",
wallet_name)) before joining with root to form full_path; pick one approach and
ensure validation/error messages reference wcfg.key_share_path and the wallet
identifier.
In `@crates/saw-daemon/src/threshold.rs`:
- Around line 541-545: The PresignRequest is being sent with wallet:
String::new(), which can break routing; update the WireMessage::PresignRequest
construction to pass the actual wallet name used elsewhere (match the wallet
field used in the SignRequest path) instead of an empty string so the policy
server can look up the correct key share (modify the PresignRequest wallet value
to use the wallet variable/identifier used in your SignRequest flow).
In `@crates/saw-mpc/tests/daemon_policy_flow.rs`:
- Around line 289-295: The verification is double-hashing the message by using
DataToSign::from_digest with SHA-256; change the verification to mirror the
signing path by constructing the DataToSign from the raw message hash scalar
instead. Replace the DataToSign::from_digest(...) call used before
sig_daemon.verify(...) with
DataToSign::from_scalar(generic_ec::Scalar::from_be_bytes_mod_order(message_hash))
so verification uses the same scalar representation as the signing code in
signing.rs (the instances at DataToSign::from_scalar in that file).
In `@crates/saw-mpc/tests/keygen_and_sign.rs`:
- Around line 104-109: The test constructs DataToSign via
DataToSign::from_digest which double-hashes the message and so won't match the
signing path; change the construction to use
DataToSign::from_scalar(generic_ec::Scalar::from_be_bytes_mod_order(message_hash))
so verification matches how signatures are produced (see issue_partial_signature
and sign_full which use from_scalar), ensuring
signatures[0].verify(&key_shares[0].shared_public_key, &data) uses the same
raw-scalar DataToSign as the signer.
In `@crates/saw-mpc/tests/keygen_ceremony.rs`:
- Around line 81-90: listener2 is bound to addr2 and remains alive, causing
run_relay (which calls TcpListener::bind) to fail with EADDRINUSE; fix by
dropping or moving ownership of listener2 before spawning the relay (i.e.,
release the TcpListener held in listener2 after obtaining addr2 and before
calling tokio::spawn for relay_task) or modify relay::run_relay to accept an
existing TcpListener and pass listener2 directly so run_relay doesn't re-bind
the same address (referencing listener2, addr2, run_relay, and relay_task).
In `@crates/saw-mpc/tests/ws_transport.rs`:
- Around line 130-136: The test verifies the signature using
DataToSign::from_digest which re-hashes message_hash, but the signing code
(sign_full) used DataToSign::from_scalar with
generic_ec::Scalar::from_be_bytes_mod_order(message_hash); update the
verification to construct the same DataToSign via
DataToSign::from_scalar(generic_ec::Scalar::from_be_bytes_mod_order(&message_hash))
so sig_daemon.verify(&complete_shares[0].shared_public_key, &data) checks the
same value that was signed.
In `@crates/saw-policy/Dockerfile`:
- Around line 14-34: The Dockerfile currently runs the runtime stage as root;
modify the runtime stage to create a dedicated non-root user (e.g., group/user
like sawpolicy or use UID 1000), change ownership of the copied binary and /data
to that user, and add a USER directive before CMD so saw-policy runs
unprivileged; update references in the stage that use /data (WORKDIR, ENV like
SAW_ROOT, POLICY_PATH, KEY_SHARE_PATH) to preserve functionality under the new
user and ensure the binary at /usr/local/bin/saw-policy is executable by that
user (adjust ownership/permissions where needed).
In `@crates/saw-policy/src/policy.rs`:
- Around line 116-124: The tripped_breakers lookup in evaluate is dead because
CircuitBreaker configs (struct CircuitBreaker with fields condition, action,
cooldown_hours) are parsed but never evaluated or inserted into
state.tripped_breakers; fix by implementing circuit-breaker evaluation where
requests are processed (e.g., inside evaluate or the request pre-check flow):
iterate configured breakers, evaluate each breaker.condition against the current
request/context, and when a breaker fires insert an entry into
state.tripped_breakers (keyed by request.wallet) with the breaker name and
cooldown expiry (use cooldown_hours to compute expiry), then have evaluate
return a Deny (or the configured action) with matched_rule set to the breaker
name; alternatively if you intended scaffolding, remove the tripped_breakers
check from evaluate or add a clear TODO comment in evaluate referencing
CircuitBreaker, condition, action, and cooldown_hours so users aren’t misled.
- Around line 147-195: matches_rule currently ignores RuleConditions fields
max_value_usd, max_daily_spend_usd, and max_per_minute which lets rules that
specify only those conditions match everything; update matches_rule (in function
matches_rule) to fail-closed by returning false when any of
rule.conditions.max_value_usd, rule.conditions.max_daily_spend_usd, or
rule.conditions.max_per_minute is Some(...) and the corresponding enforcement is
not yet implemented, or alternatively implement the actual checks (compare tx
value in USD for max_value_usd, consult spend-tracking for max_daily_spend_usd,
and apply rate tracking for max_per_minute); ensure you reference the
RuleConditions fields (max_value_usd, max_daily_spend_usd, max_per_minute) and
add/adjust tests to cover the fail-closed behavior so a rule that only sets
these fields no longer matches by default.
In `@crates/saw-policy/src/server.rs`:
- Around line 465-475: The signing result is only logged and never sent to the
daemon; after awaiting sign_task (the block around sign_task.await and the match
on result) construct a WireMessage::SignResult (or equivalent) containing
sign_req.request_id and either success info or the error string, then send it
back over the same outbound websocket channel used elsewhere in this file (the
outbound/sender object used for out_task) and await/handle the send result
(log/send error if it fails); ensure you send the message before aborting or
dropping out_task so the daemon receives the explicit success/failure signal.
- Around line 43-60: The current loop uses listener.accept().await.map_err(|e|
MpcError::Transport(...)) which returns Err on any accept failure and kills the
server; change the accept logic in run() to match on listener.accept().await
instead of mapping the error: on Ok((tcp_stream, addr)) proceed as before (clone
key_share and policy_config and tokio::spawn calling handle_connection), and on
Err(e) log the failure (e.g., tracing::warn!(%e, "accept failed; continuing"))
and continue the loop instead of returning MpcError::Transport; this removes the
immediate propagation of accept errors while preserving the existing connection
handler behavior.
In `@docker/recovery/Dockerfile`:
- Around line 5-11: The Dockerfile currently relies on sensitive environment
variables (SAW_PASSPHRASE, KEY_SHARE_BASE64, POLICY_YAML) which can be leaked;
change the image and entrypoint so secrets are read from files (e.g.
/run/secrets/ or a mounted file) instead of env vars: update the Dockerfile to
document and expect docker secrets or a mounted secrets file (and remove
guidance to pass SAW_PASSPHRASE/KEY_SHARE_BASE64 via -e), and modify the
container startup/entrypoint logic to load SAW_PASSPHRASE, KEY_SHARE_BASE64 and
POLICY_YAML from secure file paths (and support an --env-file fallback only with
strict file permissions) so sensitive values are not passed as process
environment variables.
- Around line 21-30: Add a dedicated non-root user and switch to it in the
Dockerfile: create a system user/group (e.g., "saw") before copying or after
COPY, chown the binary /usr/local/bin/saw-policy and the data directory (/data)
to that user, set WORKDIR ownership, and add USER saw (or UID/GID) before the
CMD so the container does not run as root; ensure the user has no interactive
login (system user) and keep EXPOSE/CMD unchanged.
In `@Dockerfile.policy`:
- Around line 14-34: The runtime Docker stage runs as root; add a non-root user
and switch to it to reduce blast radius: in the stage that builds from
debian:bookworm-slim (the one that COPYs /build/target/release/saw-policy and
sets WORKDIR /data), create a dedicated unprivileged user (e.g., user "saw" or
UID 1000), change ownership of /data and the copied binary (saw-policy) to that
user, set HOME/ENV as needed, and add USER <username> before CMD so the
container runs the saw-policy binary as the non-root user instead of root.
Ensure directories and the binary remain readable/executable by that user.
In `@docs/threat-model-and-design-rationale.md`:
- Around line 60-65: The Accepted risks section is incomplete—add explicit
entries for missing attack vectors: document risks and mitigations for
Unauthenticated WebSocket/relay transport (MITM, message injection,
impersonation) and note design decision "Not included: mTLS/WS auth"; for Policy
server compromise (policy signer root compromise, bypassing rules, exfiltration
of signer share) reference "policy signer" and explain detection/response and
acceptance or hardening options; for Presignature pool attacks (exhaustion,
tampering, integrity loss) reference "presignature pool" and "presignature" and
describe rate-limits, authentication, integrity checks, and pool
monitoring/rehydration; for Denial-of-service on signing ceremonies (relay
flood, dropped party) reference MPC ceremonies and add timeouts, retries,
fallback policies; and for Key share encryption at rest (Argon2id +
ChaCha20-Poly1305) describe threats from disk theft/backups and mitigations like
stronger KDF params, hardware-backed keys, and key rotation/re-key procedures or
explicitly mark as accepted risk—update the Accepted risks list to include these
items and link to existing items like "2-of-2 mode" and "policy_unavailable"
where relevant.
In `@docs/threshold-signing-spec.md`:
- Around line 156-159: The Transport section currently documents an
unauthenticated plaintext WebSocket between "Daemon ↔ Policy" and lists message
types like the JSON-serialized WireMessage (SignRequest, PolicyDecision,
MpcWireMessage, PresignRequest) and “mTLS” as future — add a tracked task:
create a repository issue titled something like "Add WebSocket auth for
Daemon↔Policy (mTLS/token)" describing required options (mTLS with pinned certs,
token-based auth, and migration plan), assign owner and milestone, then update
the docs/threshold-signing-spec.md Transport paragraph to include a link to that
issue and a TODO note next to "mTLS" so reviewers can find the tracking item
before production; ensure the issue references WireMessage/PolicyDecision to
show impact on co-signing.
🟡 Minor comments (11)
docs/threat-model-and-design-rationale.md-31-31 (1)
31-31: ⚠️ Potential issue | 🟡 Minor — Clarify the complexity of share rotation.
Line 31 states "rotate the shares via a new keygen ceremony and the stolen share becomes useless," which makes recovery sound straightforward. However, line 65 reveals "No key refresh in v1" and that mitigation requires "a full re-keygen to a new wallet and on-chain fund transfer." This is a significant operational burden (new address, fund migration) that should be acknowledged here to avoid misleading readers about the ease of recovery.
📝 Suggested clarification

```diff
-With MPC, a compromise of one machine is recoverable — rotate the shares via a new keygen ceremony and the stolen share becomes useless.
+With MPC, a compromise of one machine is recoverable — generate a new wallet via a fresh keygen ceremony, transfer funds on-chain, and the stolen share becomes useless (note: in-place key refresh is not supported in v1).
```

docs/threat-model-and-design-rationale.md-56-56 (1)
56-56: ⚠️ Potential issue | 🟡 Minor — TEE/premium threshold mode is not yet implemented.
Line 56 presents "TEE/premium threshold" as mode 3, but the PR objectives explicitly list "TEE deployment" under "Not included / future work." This could mislead readers into thinking this deployment mode is currently available. Either mark this as planned/future work or remove it from the current deployment modes list.
📝 Suggested clarification

```diff
-3. **TEE/premium threshold** — Managed co-signing service running in a Trusted Execution Environment. Appropriate for agents handling real money where operators want professional-grade security without running their own infrastructure.
+3. **TEE/premium threshold (planned)** — Managed co-signing service running in a Trusted Execution Environment. Appropriate for agents handling real money where operators want professional-grade security without running their own infrastructure. *(Not yet implemented; see future work.)*
```

docs/threat-model-and-design-rationale.md-64-64 (1)
64-64: ⚠️ Potential issue | 🟡 Minor — The 2-of-2 mode is implemented but untested; clarify support status in documentation.
The 2-of-2 mode is actually implemented (a `new_2of2()` function exists in `crates/saw-mpc/src/types.rs`), not theoretical. However, it appears completely untested: there are no test cases for 2-of-2 in the provided test suite (all tests use 2-of-3). Claiming it is "Supported" in the documentation is misleading without evidence that 2-of-2 signing actually works. Either add integration tests for 2-of-2 or qualify the documentation to state it is "available as a configuration option" rather than actively "supported" until validated.

docs/threshold-signing-spec.md-70-70 (1)
70-70: ⚠️ Potential issue | 🟡 Minor — Typo: `mywallt` → `mywallet`.

```diff
-SAW_PASSPHRASE="secret" saw keygen-local --wallet mywallt --root ~/.saw
+SAW_PASSPHRASE="secret" saw keygen-local --wallet mywallet --root ~/.saw
```

crates/saw-policy/Dockerfile-5-5 (1)
5-5: ⚠️ Potential issue | 🟡 Minor — Add `--no-install-recommends` to `apt-get install` invocations.
Both `apt-get install` lines (builder Line 5, runtime Line 17) pull in recommended packages unnecessarily, increasing image size and attack surface.
Proposed fix

```diff
-RUN apt-get update && apt-get install -y m4 pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends m4 pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
```

```diff
-RUN apt-get update && apt-get install -y ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
```

Also applies to: 17-17
crates/saw-daemon/src/config.rs-48-59 (1)
48-59: ⚠️ Potential issue | 🟡 Minor — `Err(_)` silently swallows permission and other I/O errors.
Currently, any `read_to_string` failure (including permission denied, I/O errors) falls through to defaults without warning. Only a missing file should silently use defaults.
Proposed fix

```diff
 match std::fs::read_to_string(&config_path) {
     Ok(contents) => {
         serde_yaml::from_str(&contents).unwrap_or_else(|e| {
             eprintln!("warning: invalid config.yaml: {e}, using defaults");
             DaemonConfig::default()
         })
     }
-    Err(_) => DaemonConfig::default(),
+    Err(e) if e.kind() == std::io::ErrorKind::NotFound => DaemonConfig::default(),
+    Err(e) => {
+        eprintln!("warning: could not read config.yaml: {e}, using defaults");
+        DaemonConfig::default()
+    }
 }
```

crates/saw-policy/src/main.rs-125-135 (1)
125-135: ⚠️ Potential issue | 🟡 Minor — Environment variable silently overrides the `--config` CLI flag.
If `POLICY_YAML` is set, `load_policy` will use it regardless of whether `--config <path>` was explicitly passed. This could silently ignore the user's explicit CLI flag. Consider checking the env var only when no CLI path was provided, or at least logging which source was used.
♻️ Suggested approach: check env var before CLI resolution
Move the `POLICY_YAML` check into the config_path resolution block (lines 89-91) so that CLI flags take precedence:

```diff
 fn load_policy(path: &PathBuf) -> Result<policy::PolicyConfig, String> {
-    let contents = if let Ok(yaml) = std::env::var("POLICY_YAML") {
-        tracing::info!("loading policy from POLICY_YAML env var");
-        yaml
-    } else {
-        std::fs::read_to_string(path)
-            .map_err(|e| format!("read policy {}: {e}", path.display()))?
-    };
+    let contents = std::fs::read_to_string(path)
+        .map_err(|e| format!("read policy {}: {e}", path.display()))?;
     serde_yaml::from_str(&contents).map_err(|e| format!("parse policy: {e}"))
 }
```

And in the config_path resolution:

```diff
 let config_path = config_path.unwrap_or_else(|| {
-    PathBuf::from(std::env::var("POLICY_PATH").unwrap_or_else(|_| "policy.yaml".into()))
+    if let Ok(_yaml) = std::env::var("POLICY_YAML") {
+        // POLICY_YAML will be handled inline by load_policy
+        PathBuf::from("(inline)")
+    } else {
+        PathBuf::from(std::env::var("POLICY_PATH").unwrap_or_else(|_| "policy.yaml".into()))
+    }
 });
```

Alternatively, keep the current behavior but add a warning log when the CLI flag is overridden.

crates/saw-policy/src/server.rs-330-340 (1)
crates/saw-policy/src/server.rs-330-340 (1)
330-340:⚠️ Potential issue | 🟡 MinorInconsistent error handling for bad message hash between
PartialSignRequestandSignRequest.For
PartialSignRequest(Lines 266-284), an invalid hash returns a denial response to the daemon. ForSignRequest(Lines 331-337), an invalid hash propagates anMpcErrorthat terminates the connection. TheSignRequestpath should also send an error response andcontinueto remain consistent and avoid dropping the connection.crates/saw-mpc/src/keygen.rs-141-149 (1)
141-149: ⚠️ Potential issue | 🟡 Minor — Doc comment is misleading: no plaintext fallback implemented.
The doc says "If the data is not encrypted, falls back to plaintext deserialization", but the implementation unconditionally calls `crate::encryption::decrypt(data, passphrase)?`, which will fail on plaintext JSON input. Either update the doc to match the behavior or implement the fallback.
Option A: Fix the doc

```diff
-/// Decrypt and deserialize a KeyShare from storage.
-/// If the data is not encrypted, falls back to plaintext deserialization.
+/// Decrypt and deserialize a KeyShare from encrypted storage.
```

Option B: Implement the fallback

```diff
 pub fn deserialize_key_share_encrypted(
     data: &[u8],
     passphrase: &[u8],
 ) -> Result<cggmp21::KeyShare<Secp256k1>, MpcError> {
-    let plaintext = crate::encryption::decrypt(data, passphrase)?;
-    serde_json::from_slice(&plaintext).map_err(MpcError::Serde)
+    match crate::encryption::decrypt(data, passphrase) {
+        Ok(plaintext) => serde_json::from_slice(&plaintext).map_err(MpcError::Serde),
+        Err(_) => serde_json::from_slice(data).map_err(MpcError::Serde),
+    }
 }
```

crates/saw-daemon/src/threshold.rs-208-229 (1)
208-229: ⚠️ Potential issue | 🟡 Minor

Presignature is consumed even when the fast path fails — an expensive resource leak.

At lines 210-212, the presignature is removed from the pool via `take_next()`. If `sign_with_presignature` fails (line 221), the presignature is already consumed and cannot be returned to the pool. Since presignature generation requires multiple MPC rounds over the network, each lost presignature is expensive.

Consider returning the presignature to the pool on recoverable failures (e.g., transport errors), or at minimum logging at a higher severity level:

Suggested improvement

```diff
 match self
     .sign_with_presignature(wallet, action.clone(), tx_details.clone(), message_hash, presig_index, presignature)
     .await
 {
     Ok(result) => return Ok(result),
     Err(e) => {
-        // Fast path failed (e.g. presig index mismatch after reconnect).
-        // Fall through to slow path rather than losing the request.
-        eprintln!("fast path failed: {e}, falling back to slow path");
+        // Fast path failed — presignature is lost (cannot be safely reused
+        // after partial signature was issued). Fall through to slow path.
+        eprintln!("WARN: fast path failed, presignature {presig_index} lost: {e}");
     }
 }
```

Note: once `issue_partial_signature` has been called (line 248), the presignature must not be reused (security invariant), so recovery is only possible if the failure occurred before that point.

`crates/saw-mpc/src/signing.rs` (1)
183-183: ⚠️ Potential issue | 🟡 Minor

Remove unused import.

The `use sha2::Digest as _;` import at the end of the file is not required. The trait is not used anywhere in the code — `sha2::Sha256` appears only as a type parameter in generic bounds (lines 110, 160), not for trait method dispatch. No trait methods from `Digest` are invoked. Remove this import.
🧹 Nitpick comments (36)
docker/recovery/Dockerfile (2)
18-18: Add `--no-install-recommends` to both `apt-get install` calls.

Avoids pulling unnecessary recommended packages, reducing image size and attack surface (Trivy DS-0029).

Proposed fix

```diff
-RUN apt-get update && apt-get install -y m4 pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends m4 pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
```

```diff
-RUN apt-get update && apt-get install -y ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
```

Also applies to: 22-22

15-16: Add a `.dockerignore` file to exclude unnecessary files from the build context.

The `COPY . .` at lines 15-16 sends the entire repository to the Docker daemon. Without a `.dockerignore`, directories like `target/`, `.git/`, `node_modules/`, and local configuration files will be included, increasing build context size and risking accidental exposure of sensitive files in the builder layer.

Create `.dockerignore` at the repository root with at minimum:

```text
.git
target
node_modules
.env
.env.local
*.log
```

`crates/saw-daemon/Cargo.toml` (1)
6-27: Consider consolidating the `cggmp21` version via workspace dependencies.

Same observation as `saw-cli`: `cggmp21` appears as both a direct dependency and a transitive one through `saw-mpc`. With three crates all pinning `cggmp21 = "0.6"` independently, a `[workspace.dependencies]` entry would be the standard way to keep versions synchronized as the project evolves.

`Dockerfile.policy` (1)
5-5: Add `--no-install-recommends` to `apt-get install` in both stages.

Reduces image size and attack surface by skipping unnecessary recommended packages. Flagged by Trivy DS-0029.

Proposed fix

```diff
-RUN apt-get update && apt-get install -y m4 pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends m4 pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
```

```diff
-RUN apt-get update && apt-get install -y ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
```

Also applies to: 17-17
docs/threshold-signing-spec.md (1)
12-12: Add language identifiers to fenced code blocks.

The ASCII diagram blocks at lines 12 and 37 are missing language specifiers (flagged by markdownlint MD040). Using a `text` language specifier would silence the lint and improve rendering.

Also applies to: 37-37
crates/saw-cli/src/keygen_local.rs (1)
12-14: Unix-only import: `PermissionsExt` prevents compilation on Windows.

`std::os::unix::fs::PermissionsExt` (line 13) and `from_mode(0o600)` (line 126) are Unix-only. If cross-platform support is ever needed, this would need a `#[cfg(unix)]` gate. Fine if the target is strictly Linux/macOS.

`crates/saw-mpc/src/protocol.rs` (2)
22-25: `from_str` method shadows the `std::str::FromStr` trait convention.

Defining an inherent `from_str` on `SessionId` is confusing — it looks like a trait impl but isn't. Users expecting `"x".parse::<SessionId>()` will find it doesn't work. Consider implementing `From<&str>` or the `FromStr` trait instead.

Proposed alternatives

```diff
-    /// Create from a known string (e.g., received from coordinator).
-    pub fn from_str(s: &str) -> Self {
-        Self(s.to_string())
-    }
```

Then either:

```rust
impl From<&str> for SessionId {
    fn from(s: &str) -> Self {
        Self(s.to_string())
    }
}
```

or implement `std::str::FromStr`.
46-48: `message_hash` as a hex `String` is fragile — consider `[u8; 32]` with hex serde.

Passing the 32-byte hash as a hex-encoded `String` in `SignRequest` and `PartialSignRequest` pushes encoding/decoding responsibility to every call site and allows malformed values (odd-length hex, wrong length, non-hex chars) to slip through at runtime. A typed `[u8; 32]` with a hex serde adapter (e.g., `serde_with` or a custom module) catches errors at deserialization time.

Also applies to: 118-119
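A minimal, std-only sketch of the boundary validation (function name is hypothetical; a serde adapter as suggested above would move this check into deserialization itself):

```rust
// Hypothetical std-only validator: decode the 32-byte hash from hex once at
// the boundary, so malformed values are rejected at parse time instead of
// deep inside the signing path.
fn parse_message_hash(s: &str) -> Result<[u8; 32], String> {
    if !s.is_ascii() || s.len() != 64 {
        return Err(format!("expected 64 hex chars, got {} bytes", s.len()));
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        let pair = &s[2 * i..2 * i + 2];
        *byte = u8::from_str_radix(pair, 16)
            .map_err(|_| format!("invalid hex at offset {}", 2 * i))?;
    }
    Ok(out)
}
```

With this in place, every downstream consumer receives a well-formed `[u8; 32]` and the error surfaces once, with context, at the request boundary.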
crates/saw-policy/Dockerfile (1)
8-9: Docker layer caching is suboptimal — any source change rebuilds everything.

`COPY crates/ crates/` on line 9 copies all source files in a single layer, so any source change invalidates the `cargo build` cache and forces a full dependency rebuild. A common pattern is to first copy only the `Cargo.toml`/`Cargo.lock` files, run a dummy build to cache dependencies, then copy sources. However, with a workspace this is admittedly non-trivial to set up, so this is fine for now.

`crates/saw-mpc/tests/ws_transport.rs` (1)
101-102: `sleep(50ms)` for server readiness is a flaky-test risk.

Under heavy CI load, the server task may not have started accepting on the listener before the client connects. A synchronization primitive (e.g., a `tokio::sync::oneshot` signaled after `ws_accept` starts) or a connect-with-retry loop would be more robust.

Sketch: retry loop alternative

```rust
// Client side: retry connection
let conn = loop {
    match transport::ws_connect(&ws_url, PARTY_DAEMON, PARTY_POLICY, n).await {
        Ok(c) => break c,
        Err(_) => tokio::time::sleep(std::time::Duration::from_millis(10)).await,
    }
};
```

`crates/saw-mpc/tests/keygen_and_sign.rs` (2)
114-114: Move import to the top of the file with the other imports.

The `use sha2::Digest;` import is placed at the very end of the file (line 114), detached from the other imports at lines 4-6. This is unconventional and easy to miss.

Suggested move

```diff
 use saw_mpc::keygen;
 use saw_mpc::signing;
 use saw_mpc::transport;
+use sha2::Digest;
```

And remove line 114.

15-17: Prime generation is sequential despite the comment saying "do in parallel".

`pregenerate_primes()` is CPU-intensive and called sequentially in the `.map().collect()`. For a test this is likely acceptable, but the comment on line 16 is misleading.

`crates/saw-mpc/tests/keygen_ceremony.rs` (1)
19-21: First `listener` and `relay_url` are unused — dead code.

The `TcpListener` bound at line 19 and `relay_url` at line 21 are never used. The test performs keygen in-memory and only uses a second listener for the relay signing phase. This is likely a leftover from an earlier iteration.

Suggested cleanup

```diff
-    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
-    let addr = listener.local_addr().unwrap();
-    let relay_url = format!("ws://{addr}");
-
-    // We need the relay to accept connections for TWO phases (aux + keygen).
-    // The current relay accepts N parties then finishes.
-    // For multi-phase, we run the relay once per phase.
-    // Actually, let's test the connect_to_relay function directly
-    // with our existing in-memory approach for keygen, and just
-    // verify the relay routing works for a single phase.
-
-    // --- Test relay with a simple signing phase ---
-    // First, do keygen in-memory (proven), then test relay routing for signing.
-
```

`crates/saw-cli/src/lib.rs` (1)
611-627: `CliError::InvalidArg` is misused for non-argument errors.

Runtime creation failures (line 615), relay server errors (line 620), and party keygen errors (line 626) are all mapped to `CliError::InvalidArg`, which is semantically incorrect. This will produce misleading error messages like `"invalid argument: relay error: ..."`.

Consider adding a variant to `CliError` (e.g., `Keygen(String)`) for these operational errors.

♻️ Sketch

```diff
 pub enum CliError {
     MissingArg(&'static str),
     InvalidArg(String),
+    Threshold(String),
     GenKey(GenKeyError),
     // ...
 }
```

Then use `CliError::Threshold(e)` in the threshold/keygen command handlers.

`crates/saw-cosigner/src/main.rs` (1)
1-67: Scaffold looks reasonable.

The CLI structure follows the same pattern as `saw-policy`. Multiple TODO comments mark the unimplemented functionality clearly.

Would you like me to open an issue to track the `--join` implementation and the other TODO items (lines 51-53, 62-64)?

`crates/saw-policy/src/main.rs` (1)
125-125: Prefer `&Path` over `&PathBuf` for function parameters.

Both `load_policy` and `load_key_share` take `&PathBuf` as parameters. Idiomatic Rust prefers `&Path`, since `PathBuf` auto-derefs to `Path` and `&Path` is more general (it accepts both `&PathBuf` and `&Path`).

♻️ Proposed fix

```diff
-fn load_policy(path: &PathBuf) -> Result<policy::PolicyConfig, String> {
+fn load_policy(path: &Path) -> Result<policy::PolicyConfig, String> {
```

```diff
-fn load_key_share(path: &PathBuf) -> Result<saw_mpc::KeyShare<saw_mpc::Secp256k1>, String> {
+fn load_key_share(path: &Path) -> Result<saw_mpc::KeyShare<saw_mpc::Secp256k1>, String> {
```

Add `use std::path::Path;` alongside the existing `PathBuf` import.

Also applies to: 137-137
crates/saw-mpc/src/encryption.rs (1)
70-74: Silent passthrough for unencrypted data — all call sites properly guard against downgrade attacks.

The backward-compatible passthrough is intentionally documented for graceful fallback, but it does create a downgrade risk if callers don't validate. Verification confirms both discovered call sites (`saw-policy/src/main.rs:156` and `saw-daemon/src/lib.rs:175`) correctly check `is_encrypted()` before calling `deserialize_key_share_encrypted()`. Current usage is safe. Consider adding a prominent note in the function documentation emphasizing that callers expecting encrypted data must explicitly validate with `is_encrypted()` first.

`crates/saw-mpc/src/types.rs` (2)
62-63: Doc says "Ethereum address" but `Chain` can be `Sol`.

The doc comment on the `address` field says "Ethereum address derived from the combined public key", but `KeyShareData` embeds a `ThresholdConfig` that supports `Chain::Sol`. Consider generalizing the doc to "derived address" or making it chain-agnostic.

33-54: Constructors accept any `party_id` without a bounds check.

`new_2of3` and `new_2of2` don't validate `party_id < num_parties`. Callers like `keygen_threshold::run_party` do check before calling, but a defensive `debug_assert!` or explicit validation here would prevent misuse from other call sites.

`crates/saw-cli/src/keygen_threshold.rs` (2)
64-68: `pregenerate_primes` is synchronous and CPU-intensive — consider `spawn_blocking`.

`keygen::pregenerate_primes()` is described as taking ~1 minute. It's called directly on the async runtime, which will block a Tokio executor thread. Wrap it in `tokio::task::spawn_blocking` to avoid starving other tasks (e.g., the relay connection timeout).

Suggested fix

```diff
-    let primes = keygen::pregenerate_primes();
+    let primes = tokio::task::spawn_blocking(keygen::pregenerate_primes)
+        .await
+        .map_err(|e| format!("prime gen task: {e}"))?;
```
132-145: Metadata file not restricted to 0o600, and chain is hardcoded to `Evm`.

Two observations:

1. The key share file gets `chmod 0o600` (line 129), but the metadata file at `meta_path` (line 145) does not. While metadata doesn't contain secret key material, it does reveal the wallet address/public key — applying the same permissions would be consistent.
2. `Chain::Evm` is hardcoded at line 134. The `run_party` function signature doesn't accept a `chain` parameter, so Solana keygen would produce incorrect metadata. Consider adding `chain` as a parameter for forward-compatibility.

Suggested fix for permissions

```diff
     fs::write(&meta_path, meta_json).map_err(|e| format!("write meta: {e}"))?;
+    fs::set_permissions(&meta_path, fs::Permissions::from_mode(0o600))
+        .map_err(|e| format!("chmod meta: {e}"))?;
```

`crates/saw-mpc/tests/daemon_policy_flow.rs` (1)
183-184: 50ms sleep for server readiness is potentially flaky.

The 50ms sleep between starting the policy server task and connecting the daemon client may not be enough under load (e.g., CI). Consider using a readiness signal (e.g., a `tokio::sync::Notify` or a `oneshot` channel) after the listener is bound, or retry the connection with backoff.

`crates/saw-policy/src/server.rs` (2)
396-463: Busy-wait polling loop with 50ms timeout in production server code.

The MPC message feed loop uses `tokio::time::timeout(50ms, ws_rx.next())` as a poll, burning CPU cycles in a tight loop. This pattern is acceptable in tests but suboptimal for a production server.

Consider using `tokio::select!` to concurrently await both the WebSocket message and the sign task completion:

Sketch

```rust
loop {
    tokio::select! {
        result = &mut sign_task => {
            // sign_task completed
            break;
        }
        msg = ws_rx.next() => {
            // process incoming WS message
        }
    }
}
```
76-76: Hardcoded presignature pool parameters.

`PresignaturePool::new(10, 2)` uses magic numbers. These should come from configuration (e.g., `PolicyConfig` or a separate pool config) to allow tuning without code changes.

`crates/saw-mpc/src/lib.rs` (1)
21-27: Consider re-exporting `Chain` alongside `ThresholdConfig`.

`Chain` is a field of `ThresholdConfig` (which is re-exported) and is needed by consumers (e.g., `saw-cli/src/keygen_threshold.rs` imports `saw_mpc::types::Chain` directly). Re-exporting it at the crate root would be consistent with the other type re-exports.

```diff
-pub use types::{KeyShareData, PartyId, ThresholdConfig};
+pub use types::{Chain, KeyShareData, PartyId, ThresholdConfig};
```

`crates/saw-mpc/src/relay.rs` (3)
161-161: Dead code: `_n` is assigned but never used.

```diff
-    let _n = expected_parties;
```

297-335: Unnecessary `Arc<Mutex>` wrapping for `ws_tx` and `ws_rx` in `connect_to_relay`.

Both `ws_tx` (line 299) and `ws_rx` (line 329) are wrapped in `Arc<Mutex<>>` but are each consumed by exactly one spawned task. They could be moved directly into their respective tasks without the synchronization overhead.

Suggested simplification for `ws_tx`

```diff
-    let ws_tx = Arc::new(Mutex::new(ws_tx));
-    let ws_tx_out = ws_tx.clone();
     tokio::spawn(async move {
         use cggmp21::round_based::MessageDestination;
         while let Some(outgoing) = outgoing_rx.next().await {
             // ...
-            let mut tx = ws_tx_out.lock().await;
-            if tx.send(WsMessage::Text(json.into())).await.is_err() {
+            if ws_tx.send(WsMessage::Text(json.into())).await.is_err() {
                 break;
             }
         }
     });
```

83-86: WebSocket handshake failure kills the relay server.

If one party's WS handshake fails (e.g., TLS issue, protocol mismatch), the entire relay returns an error and all connected parties are dropped. For a ceremony relay this may be acceptable since all parties are required, but consider logging and waiting for a retry from that party instead.
crates/saw-mpc/src/transport.rs (1)
338-339: Minor: `out_rxs.remove(0)` in a loop is O(n²).

Each `remove(0)` shifts all remaining elements. For the typical n=3 this is negligible. For clarity, consider draining:

```diff
-    for sender_idx in 0..n {
-        let mut rx = out_rxs.remove(0);
+    for (sender_idx, mut rx) in (0..n).zip(out_rxs.drain(..)) {
```

`crates/saw-daemon/src/lib.rs` (2)
1247-1320: `sign_evm_tx` duplicates the sighash + build logic now extracted into helpers.

`compute_evm_tx_sighash` and `build_signed_evm_tx` were extracted for the threshold path, but `sign_evm_tx` (the single-key path) still has its own copy of the same RLP encoding logic. Consider refactoring `sign_evm_tx` to use these helpers and keep EIP-1559 encoding in one place.

Also applies to: 1322-1385

532-538: `block_on` in the synchronous request handler blocks the entire daemon.

While `rt.block_on()` is safe here (no nested runtime), the daemon's synchronous `serve_loop` processes one request at a time. During threshold signing — especially the slow MPC path, which involves multiple WS round-trips and up to 5-second timeouts — the daemon is completely unresponsive to all other requests. This may be acceptable for an MVP but should be noted as a scaling limitation.

`crates/saw-policy/src/policy.rs` (1)
78-96: `PolicyState` fields are never read or written.

`daily_spend`, `rate_history`, and `tripped_breakers` are initialized in `new()` but never populated or queried anywhere in this module (aside from the read-only `tripped_breakers` check in `evaluate`). Consider adding a doc comment noting these are scaffolding for the unimplemented TODO conditions, so future contributors know the state management is intentionally deferred.

`crates/saw-daemon/src/threshold.rs` (4)
263-280: Persistent WS lock held for up to 5 seconds during fast-path signing.

The `ws` mutex is acquired at line 263 and held through the 5-second timeout recv loop (line 267). While this is safe today (the daemon is single-threaded due to `block_on`), if the daemon is ever made concurrent, this would block all other WS operations for the timeout duration. Consider releasing the lock after `send` and re-acquiring for `recv`, or using a split-lock pattern.
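The split-lock idea can be illustrated with a std `Mutex` and a simplified stand-in for the connection (the `Ws` struct and both functions below are hypothetical, not the daemon's actual WS handle):

```rust
use std::sync::Mutex;

// Hypothetical stand-in for a shared connection handle.
struct Ws {
    outbox: Vec<String>,
    inbox: Vec<String>,
}

// Acquire the lock only for the send, release it, then re-acquire for the
// receive — so other tasks can use the connection during the (potentially
// long) wait between the two operations.
fn send_then_recv(ws: &Mutex<Ws>, msg: &str) -> Option<String> {
    {
        let mut guard = ws.lock().unwrap();
        guard.outbox.push(msg.to_string()); // lock held only for the send
    } // lock released here — other callers may interleave
    let mut guard = ws.lock().unwrap(); // re-acquire for the receive
    guard.inbox.pop()
}
```

The trade-off is that a message from another caller may interleave between send and recv, so responses must be matched by a request/session id rather than by ordering.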
382-470: ~180 lines of nearly identical MPC-over-WS dispatch code duplicated between `sign_full_mpc` and `generate_one_presignature`.

The outgoing forwarding task and the incoming message dispatch loop are structurally identical in both functions. Extract a shared helper, e.g.:

```rust
async fn run_mpc_over_ws<F, T>(
    ws_tx: Arc<Mutex<WsTx>>,
    ws_rx: &mut WsRx,
    session_id: SessionId,
    mpc_future: F,
) -> Result<T, ThresholdError>
where
    F: Future<Output = Result<T, MpcError>>,
```

This would eliminate the duplication and make future changes (e.g., adding message type filtering, better error handling) easier to maintain.

Also applies to: 558-645

249-251: Dead computation — `our_partial_bytes` serialized but immediately discarded.

Lines 249-251 serialize the partial signature into bytes and then immediately suppress the unused-variable warning. If this is scaffolding for a future optimization, a TODO comment with the planned usage would be more informative than the current pattern.

488-518: Presign refill loop has no graceful shutdown mechanism.

The `presign_refill_loop` runs an infinite `loop` (line 493) with no cancellation token or shutdown signal. It will run until the tokio runtime is dropped. While the runtime drop will abort the task, this means in-progress presignature generation (which involves WS connections and MPC rounds) will be abruptly terminated. Consider accepting a `CancellationToken` or `watch::Receiver` for clean shutdown.
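The clean-shutdown pattern can be sketched with std threads and an atomic flag (names are illustrative; a `CancellationToken` or `watch::Receiver` plays the same role in the async refill loop):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// The loop checks a shared flag each iteration and exits cleanly instead of
// being aborted mid-round when the runtime is dropped.
fn spawn_refill_loop(shutdown: Arc<AtomicBool>) -> thread::JoinHandle<u32> {
    thread::spawn(move || {
        let mut refills = 0u32;
        while !shutdown.load(Ordering::Relaxed) {
            refills += 1; // stand-in for generating one presignature
            thread::sleep(Duration::from_millis(5));
        }
        refills // clean exit: in-progress work finished, count reported
    })
}
```

The key property is that shutdown is observed only between iterations, so a presignature generation that has already started runs to completion rather than being torn down mid-MPC-round.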
```dockerfile
# Usage:
#   docker build -t saw-recovery .
#   docker run -p 8080:8080 \
#     -e SAW_PASSPHRASE="your-passphrase" \
#     -e KEY_SHARE_BASE64="<base64-encoded-party2-share>" \
#     -e POLICY_YAML="<base64-encoded-policy>" \
#     saw-recovery
```
Secrets passed as environment variables are easily leaked.
`SAW_PASSPHRASE` and `KEY_SHARE_BASE64` contain highly sensitive cryptographic material. Environment variables are visible via `docker inspect`, `/proc/*/environ`, orchestrator dashboards, and crash logs. For a recovery co-signer, this is a meaningful security exposure.

Prefer Docker secrets (`docker run --secret`), a mounted secrets file, or an environment file (`--env-file`) with restricted permissions, and read the values from files at runtime instead.
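A minimal POSIX-shell sketch of the file-based approach (helper name and paths are illustrative, not the image's actual entrypoint):

```shell
# Hypothetical entrypoint helper: prefer a mounted secrets file over an
# inherited environment variable, so the secret never has to be passed
# via `-e` and never appears in `docker inspect`.
load_secret() {
  var_name="$1"
  file_path="$2"
  if [ -f "$file_path" ]; then
    # Read the file contents into the named variable.
    eval "$var_name=\"\$(cat \"$file_path\")\""
  fi
}
```

An entrypoint would call e.g. `load_secret SAW_PASSPHRASE /run/secrets/saw_passphrase` before exec'ing the binary, keeping the env-var path only as an explicit fallback.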
🤖 Prompt for AI Agents
In `@docker/recovery/Dockerfile` around lines 5 - 11, The Dockerfile currently
relies on sensitive environment variables (SAW_PASSPHRASE, KEY_SHARE_BASE64,
POLICY_YAML) which can be leaked; change the image and entrypoint so secrets are
read from files (e.g. /run/secrets/ or a mounted file) instead of env vars:
update the Dockerfile to document and expect docker secrets or a mounted secrets
file (and remove guidance to pass SAW_PASSPHRASE/KEY_SHARE_BASE64 via -e), and
modify the container startup/entrypoint logic to load SAW_PASSPHRASE,
KEY_SHARE_BASE64 and POLICY_YAML from secure file paths (and support an
--env-file fallback only with strict file permissions) so sensitive values are
not passed as process environment variables.
> ### Accepted risks
>
> - **Root compromise + active policy signer**: An attacker with root on the agent machine can sign transactions as long as they pass policy rules and the policy signer is reachable. Mitigation: strict policy rules (allowlists, rate limits, circuit breakers).
> - **Policy signer downtime**: Signing fails fast (5s timeout) and returns `policy_unavailable`. The agent decides whether to retry, queue, or skip. No silent failures.
> - **2-of-2 mode (no recovery share)**: Supported but explicitly warned against. If either share is lost, the wallet is irrecoverable. Only appropriate when operators accept this tradeoff.
> - **No key refresh in v1**: The cggmp21 crate doesn't support threshold key refresh. If a share is suspected compromised, the mitigation is a full re-keygen to a new wallet and on-chain fund transfer. This is a known limitation we accept for v1.
Threat model is incomplete — several attack vectors are missing.

The "Accepted risks" section focuses primarily on the agent-compromise scenario, but several other threat vectors relevant to the MPC architecture are not addressed:

- **Unauthenticated WebSocket/relay transport**: PR objectives state "Not included: mTLS/WS auth," meaning the party-to-party communication during MPC ceremonies is unauthenticated. This opens risks of man-in-the-middle attacks, protocol message injection, or impersonation during keygen/signing.
- **Policy server compromise**: The doc discusses agent compromise but not policy co-signer compromise. What are the implications if an attacker gains root on the policy server? Can they bypass rules, exfiltrate their key share, or cause signing to fail?
- **Presignature pool attacks**: The PR adds a presignature pool for performance. What happens if an attacker can exhaust the pool, cause presignature generation to fail, or tamper with stored presignatures?
- **Denial-of-service on signing ceremonies**: MPC protocols require multi-party coordination. What happens if an attacker disrupts the relay, floods the WebSocket, or causes one party to drop during signing?
- **Key share encryption at rest**: The PR implements "key-share encryption using Argon2id + ChaCha20-Poly1305," but the threat model doesn't discuss an attacker who gains disk access without runtime access (e.g., stolen drive, backup compromise). What protections exist?
These should be documented with their mitigations or acknowledged as accepted risks. A threat model document should comprehensively cover the attack surface introduced by the architecture.
Would you like me to draft expanded threat scenarios covering these attack vectors?
🤖 Prompt for AI Agents
In `@docs/threat-model-and-design-rationale.md` around lines 60 - 65, The Accepted
risks section is incomplete—add explicit entries for missing attack vectors:
document risks and mitigations for Unauthenticated WebSocket/relay transport
(MITM, message injection, impersonation) and note design decision "Not included:
mTLS/WS auth"; for Policy server compromise (policy signer root compromise,
bypassing rules, exfiltration of signer share) reference "policy signer" and
explain detection/response and acceptance or hardening options; for Presignature
pool attacks (exhaustion, tampering, integrity loss) reference "presignature
pool" and "presignature" and describe rate-limits, authentication, integrity
checks, and pool monitoring/rehydration; for Denial-of-service on signing
ceremonies (relay flood, dropped party) reference MPC ceremonies and add
timeouts, retries, fallback policies; and for Key share encryption at rest
(Argon2id + ChaCha20-Poly1305) describe threats from disk theft/backups and
mitigations like stronger KDF params, hardware-backed keys, and key
rotation/re-key procedures or explicitly mark as accepted risk—update the
Accepted risks list to include these items and link to existing items like
"2-of-2 mode" and "policy_unavailable" where relevant.
> - **Daemon ↔ Policy:** WebSocket (plaintext WS over Railway's TLS termination)
> - **Message format:** JSON-serialized `WireMessage` enum (SignRequest, PolicyDecision, MpcWireMessage, PresignRequest, etc.)
> - **Reconnection:** Persistent WS with exponential backoff (1s → 30s)
> - **Future:** mTLS with pinned certificates
Transport section acknowledges no WS authentication — ensure this is tracked.
The doc notes that auth is "planned for production" and mTLS is future work. Given that the policy server co-signs transactions, an unauthenticated WebSocket is a significant gap. Consider linking to a tracking issue so this doesn't slip through the cracks.
Would you like me to open an issue to track adding WebSocket authentication (mTLS / token-based) before production deployment?
🤖 Prompt for AI Agents
In `@docs/threshold-signing-spec.md` around lines 156 - 159, The Transport section
currently documents an unauthenticated plaintext WebSocket between "Daemon ↔
Policy" and lists message types like the JSON-serialized WireMessage
(SignRequest, PolicyDecision, MpcWireMessage, PresignRequest) and “mTLS” as
future — add a tracked task: create a repository issue titled something like
"Add WebSocket auth for Daemon↔Policy (mTLS/token)" describing required options
(mTLS with pinned certs, token-based auth, and migration plan), assign owner and
milestone, then update the docs/threshold-signing-spec.md Transport paragraph to
include a link to that issue and a TODO note next to "mTLS" so reviewers can
find the tracking item before production; ensure the issue references
WireMessage/PolicyDecision to show impact on co-signing.
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@crates/saw-cli/src/keygen_local.rs`:
- Around line 136-138: The code currently does fs::write(&share_path,
&share_data) then fs::set_permissions(&share_path, ...), which temporarily
creates the file with the process umask; instead open and write the file with
explicit restrictive mode atomically: replace the fs::write +
fs::set_permissions sequence with creating the file via
OpenOptions::new().create(true).write(true).mode(0o600).open(&share_path), write
share_data into that handle (e.g., write_all), and handle errors similarly so
the file is never observable with default umask; update the error messages that
referenced fs::write and fs::set_permissions to reference the new open/write
operation and keep the same formatting (e.g., "write party {i}: {e}").
In `@crates/saw-daemon/src/lib.rs`:
- Around line 1281-1304: The function recover_y_parity currently falls back to
returning 0 on failure which yields an invalid v value; change its signature to
return a Result<u8, E> (e.g., Result<u8, anyhow::Error> or a custom error) and
on success return Ok(v_candidate), but if no recovery id matches return Err with
a descriptive message mentioning the input r_bytes/s_bytes/msg_hash/expected_pk
context; update callers to propagate or handle the error (or log and abort)
instead of assuming 0. Keep the existing recovery loop using Secp256k1::new(),
RecoveryId::from_i32, RecoverableSignature::from_compact, and
secp.recover_ecdsa; ensure the error path includes which stage failed (no
matching recovery id / recovery failed) to aid debugging.
In `@crates/saw-daemon/src/threshold.rs`:
- Around line 248-251: The code serializes the partial signature into
our_partial_bytes via serde_json::to_vec and then immediately discards it;
remove that dead serialization and the unused our_partial_bytes binding: keep
the call to signing::issue_partial_signature to produce our_partial, delete the
serde_json::to_vec(...).map_err(|e| ThresholdError::Mpc(format!("serialize
partial: {e}")))? expression and the let _ = our_partial_bytes; line so no
unused variable or unnecessary serialization remains.
In `@crates/saw-mpc/src/relay.rs`:
- Around line 269-286: The loop in connect_to_relay that awaits AllJoined via
ws_rx currently blocks indefinitely; wrap the waiting logic in a configurable
timeout (e.g., using tokio::time::timeout with a Duration pulled from
configuration or a new parameter) so that if AllJoined is not received within
the timeout you return a meaningful MpcError (e.g., MpcError::Transport or a new
MpcError::Timeout) instead of hanging; keep the existing envelope parsing
(RelayEnvelope) and error handling for Joined/Error cases but place the entire
receive-and-match loop under the timeout future and surface the timeout error to
the caller.
- Around line 90-103: The join read loop blocks forever when waiting for a
RelayEnvelope; wrap the receive logic (the loop that consumes ws_rx and
constructs join_msg) in a tokio::time::timeout (choose a 10–30s Duration, e.g.,
15s) so the await fails if no Join arrives; on timeout return an appropriate
crate::error::MpcError::Transport (e.g., "join timeout") instead of hanging.
Locate the loop that assigns join_msg, the ws_rx usage, and the RelayEnvelope
deserialization to apply the timeout wrapper and map the timeout error into
MpcError::Transport.
In `@crates/saw-policy/src/server.rs`:
- Line 246: The partial-sign path inconsistently swallows send_wire errors
(calls like let _ = send_wire(&ws_tx, &resp).await;) while one call uses ?; make
the error-handling consistent by either propagating errors from send_wire
(replace the ignore with send_wire(&ws_tx, &resp).await? so the connection
handler can clean up) or by explicitly logging and continuing (match
send_wire(...).await { Ok(_) => {}, Err(e) => tracing::error!("send_wire failed:
{:?}", e) }). Update all occurrences in the partial-sign flow that reference
send_wire, ws_tx, and resp to use the same chosen strategy.
- Around line 78-82: The outer message loop currently breaks unconditionally
when read_next_wire(&mut ws_rx).await returns Err, which disconnects the client
on any malformed frame; change the logic in that loop (the match around
read_next_wire in server.rs) to inspect the error variant returned by
read_next_wire: log and continue the loop for deserialization/parse errors so a
single bad frame is skipped, but only break/return on transport-level errors
(EOF/connection closed) so the session stays alive; use the existing logger to
record the parse error and which client/ws_rx message was dropped.
In `@docs/threshold-signing-spec.md`:
- Around line 67-73: Add a security note under the "Local keygen (current)"
section clarifying that the keygen-local mode (invoked via the keygen-local
command/SAW_PASSPHRASE example) generates all shares in one process and
therefore temporarily creates full signing capability on a single machine; mark
it explicitly as a convenience/testing mode and warn that it should not be used
in production where the threat model requires that no single party ever holds
all shares, and recommend using distributed key generation (distributed keygen)
for production deployments instead.
- Line 90: The spec currently says "detects plaintext shares and loads them
directly" which is a security risk; update the spec text to require a
mitigation: either emit a clear, high-severity warning when plaintext shares are
detected, require an explicit opt-in flag (e.g., --allow-plaintext-shares)
before loading plaintext shares, or mandate automatic encryption-on-first-load
(and refuse subsequent unencrypted loads) with details on behavior and failure
modes; refer to the existing phrase "detects plaintext shares and loads them
directly" when replacing it and add the chosen mitigation option and required
CLI/behavioral semantics.
🧹 Nitpick comments (17)
docs/threshold-signing-spec.md (2)
56-62: Document the rationale for memory-only presignatures.

The presignature pool is marked as "memory-only" and "not persisted to disk," which means they're regenerated on every restart. Consider adding a brief explanation of why this design choice was made (e.g., security trade-offs, performance considerations, or attack surface reduction) to help readers understand the availability implications.
12-12: Address markdown linting issues.

Three minor formatting issues flagged by markdownlint:

- Lines 12 & 37: Fenced code blocks (ASCII diagrams) should specify a language. Use `text` as the fence language for ASCII art.
- Line 122: The table at line 122 should be surrounded by blank lines (add a blank line before it).

✨ Proposed formatting fixes

- Fix 1: Add the `text` language specifier to the opening fence of the architecture diagram at line 12 (the `Agent Machine / Policy Server (Railway/VPS) / Human` diagram).
- Fix 2: Add the `text` language specifier to the opening fence of the signing-flow diagram at line 37 (`Agent ──signTx()──► saw-daemon ──SignRequest──► saw-policy`).
- Fix 3: Insert a blank line between the environment-variable table (`SAW_PASSPHRASE`, `PORT`) and the `## Deployment` heading (lines 121-122).

As per coding guidelines, markdownlint-cli2 flagged these formatting issues.

Also applies to: 37-37, 122-122
crates/saw-mpc/src/signing.rs (2)
46-52: Consider documenting thread-safety expectations.

`PresignaturePool` uses `&mut self` throughout, so callers (e.g., the daemon doing background refill alongside signing) must wrap it in a `Mutex` or similar. This is a reasonable design, but a brief doc comment on the struct noting it's not internally synchronized would help prevent misuse.
135-137: Extract `DataToSign` construction into a shared helper.

The `DataToSign::from_scalar(generic_ec::Scalar::from_be_bytes_mod_order(message_hash))` pattern is duplicated in `issue_partial_signature` (Line 137) and `sign_full` (Line 171). A small helper like `fn data_to_sign(hash: &[u8; 32]) -> DataToSign<Secp256k1>` would reduce duplication and centralize the "no double-hash" invariant.

♻️ Proposed helper

```diff
+/// Build a DataToSign from a pre-hashed message (no double-hashing).
+fn data_to_sign(message_hash: &[u8; 32]) -> DataToSign<Secp256k1> {
+    DataToSign::from_scalar(generic_ec::Scalar::from_be_bytes_mod_order(message_hash))
+}
+
 /// Issue a partial signature from a presignature (local, no network).
 pub fn issue_partial_signature(
     presig: cggmp21::Presignature<Secp256k1>,
     message_hash: &[u8; 32],
 ) -> PartialSignature<Secp256k1> {
-    let data = DataToSign::from_scalar(generic_ec::Scalar::from_be_bytes_mod_order(message_hash));
+    let data = data_to_sign(message_hash);
     presig.issue_partial_signature(data)
 }
```

And similarly in `sign_full`.

Also applies to: 169-171
crates/saw-mpc/tests/ws_transport.rs (1)
99-100: 50ms sleep is a race-prone synchronization mechanism.

While acceptable for tests, the `sleep` between server bind and client connect can fail on slow CI runners. A more robust approach is to use a `tokio::sync::Notify` or a `oneshot` channel signaled after `ws_accept` is ready.

crates/saw-cli/src/keygen_local.rs (1)
148-149: Metadata hardcodes `Chain::Evm` — future chains will need plumbing here.

`ThresholdConfig::new_2of3(0, wallet, Chain::Evm)` locks local keygen to EVM. Consider accepting a `--chain` CLI flag so this generalizes when other chains are supported.

crates/saw-mpc/src/protocol.rs (1)
22-25: `from_str` method name shadows the `FromStr` trait convention.

Naming an inherent method `from_str` is confusing because it shadows `std::str::FromStr::from_str`. If someone imports `FromStr` and calls `SessionId::from_str(s)`, the inherent method wins silently. Consider renaming to `from_string` or `from_id`, or implementing `From<&str>`/`FromStr` instead.

crates/saw-mpc/src/relay.rs (2)
296-329: Unnecessary `Arc<Mutex>` wrappers around `ws_tx` and `ws_rx`.

`ws_tx` (line 298) is only used by the outgoing task, and `ws_rx` (line 328) is only used by the incoming task. Each can be moved directly into its respective spawned task without `Arc<Mutex>` overhead.

Proposed fix (ws_tx):

```diff
-    let ws_tx = Arc::new(Mutex::new(ws_tx));
-    let ws_tx_out = ws_tx.clone();
     tokio::spawn(async move {
         use cggmp21::round_based::MessageDestination;
         while let Some(outgoing) = outgoing_rx.next().await {
             // ...
-            let mut tx = ws_tx_out.lock().await;
-            if tx.send(WsMessage::Text(json.into())).await.is_err() {
+            if ws_tx.send(WsMessage::Text(json.into())).await.is_err() {
                 break;
             }
         }
     });
```
308-311: Serialization/deserialization errors silently dropped.

At lines 308-311 and 359, failures to serialize outgoing messages or deserialize incoming `M` are silently ignored with `continue`. These should at least emit a `tracing::warn!` to aid debugging MPC protocol issues.

Also applies to: 359-374
crates/saw-daemon/src/threshold.rs (2)
371-470: Massive code duplication between `sign_full_mpc` and `generate_one_presignature`.

The outgoing task (lines 382–407 vs 560–585) and the WS-to-MPC bridging loop (lines 419–470 vs 596–647) are nearly identical. Extract a shared helper, e.g. `run_mpc_over_ws(ws_tx, ws_rx, session_id, incoming_tx) -> ...`, to reduce ~100 lines of duplication and prevent the two paths from diverging.

Also applies to: 550-647
419-470: Polling loop with 50ms timeout is less efficient than `tokio::select!`.

The loop at lines 419–470 polls `ws_rx.next()` with a 50ms timeout and checks `sign_task.is_finished()` each iteration. This introduces up to 50ms latency between task completion and detection. Using `tokio::select!` on both the WS stream and the sign task's `JoinHandle` would be more responsive and idiomatic.

crates/saw-daemon/src/lib.rs (2)
534-534: `rt.block_on()` will panic if called from within an existing tokio runtime.

`handle_sign_evm_tx_threshold` and `handle_sign_eip2612_permit_threshold` use `rt.block_on(...)` (lines 534, 846). If `serve_loop` is ever invoked from within a tokio context (e.g., integration tests or a future refactoring to async), this panics with "Cannot start a runtime from within a runtime". The current synchronous Unix socket server is safe, but document this constraint or consider `rt.spawn` + a blocking channel for resilience.

Also applies to: 846-846
303-326: Threshold address derivation duplicates logic from `derive_evm_address`.

Lines 303–326 re-implement Keccak256-based address derivation from an uncompressed public key. This duplicates the pattern in `derive_evm_address` (lines 1155–1179), using a different public key source. Consider extracting an `address_from_uncompressed_pubkey(pub_bytes: &[u8]) -> String` helper to keep the derivation logic in one place.

crates/saw-mpc/tests/daemon_policy_flow.rs (2)
182-182: Fixed sleep before connect is a test-fragility risk.

The 50ms sleep between server bind and client connect is a race. Under CI load, the listener may not be ready. Consider passing the bound `addr` to the daemon task and retrying the connect in a loop, or using a `tokio::sync::oneshot` to signal readiness.
130-180: Significant duplication of the MPC↔WS routing loop between daemon and policy sides.

The outgoing router (serialize → WS send) and incoming router (WS recv → deserialize → channel) are nearly identical between the two tasks, differing only in the `from` party ID. Consider extracting a helper to reduce ~90 lines of copy-paste. This is test code, so it's a nice-to-have.

Also applies to: 233-278
crates/saw-policy/src/server.rs (1)
86-216: Extract the MPC↔WS routing loop into a shared helper.

The presign flow (Lines 86–216) and sign flow (Lines 349–470) contain nearly identical MPC message routing: outgoing task (serialize → WS send), incoming loop (WS recv → deserialize → channel feed), and result collection. The only differences are the MPC operation called and the execution ID prefix.

Extracting a generic `run_mpc_over_ws(...)` helper that takes a future-producing closure would eliminate ~120 lines of duplication and make it easier to keep both paths consistent (e.g., error handling, timeout values).

Also applies to: 349-470
crates/saw-policy/Dockerfile (1)

8-9: Add a `.dockerignore` to exclude unnecessary build artifacts and sensitive files from the Docker build context.

Without a `.dockerignore`, running `docker build` from the repository root sends the entire repo — including `target/`, `.git/`, and any local key material — to the Docker daemon, unnecessarily increasing build context size and exposing potential secrets. Add a `.dockerignore` file at the repository root with entries for `target/`, `.git/`, `node_modules/`, and other build artifacts.
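A minimal `.dockerignore` along those lines might look like the following sketch; the `*.key` and `.env` patterns are illustrative additions for local key material, not entries named by the review:

```text
# Keep the Docker build context small and free of secrets
target/
.git/
node_modules/
**/*.key
.env
```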
```rust
fs::write(&share_path, &share_data).map_err(|e| format!("write party {i}: {e}"))?;
fs::set_permissions(&share_path, fs::Permissions::from_mode(0o600))
    .map_err(|e| format!("chmod party {i}: {e}"))?;
```
Key share written with default umask before set_permissions narrows it.
Between fs::write and fs::set_permissions, the file briefly exists with whatever the process umask allows (often 0o644). For secret key material, consider using OpenOptions::new().mode(0o600).write(true).create(true) to create the file with restrictive permissions atomically.
Proposed fix:

```diff
-    fs::write(&share_path, &share_data).map_err(|e| format!("write party {i}: {e}"))?;
-    fs::set_permissions(&share_path, fs::Permissions::from_mode(0o600))
-        .map_err(|e| format!("chmod party {i}: {e}"))?;
+    {
+        use std::fs::OpenOptions;
+        use std::io::Write;
+        use std::os::unix::fs::OpenOptionsExt;
+        let mut f = OpenOptions::new()
+            .write(true)
+            .create(true)
+            .truncate(true)
+            .mode(0o600)
+            .open(&share_path)
+            .map_err(|e| format!("write party {i}: {e}"))?;
+        f.write_all(&share_data)
+            .map_err(|e| format!("write party {i}: {e}"))?;
+    }
```

🤖 Prompt for AI Agents
In `@crates/saw-cli/src/keygen_local.rs` around lines 136 - 138, The code
currently does fs::write(&share_path, &share_data) then
fs::set_permissions(&share_path, ...), which temporarily creates the file with
the process umask; instead open and write the file with explicit restrictive
mode atomically: replace the fs::write + fs::set_permissions sequence with
creating the file via
OpenOptions::new().create(true).write(true).mode(0o600).open(&share_path), write
share_data into that handle (e.g., write_all), and handle errors similarly so
the file is never observable with default umask; update the error messages that
referenced fs::write and fs::set_permissions to reference the new open/write
operation and keep the same formatting (e.g., "write party {i}: {e}").
```rust
fn recover_y_parity(
    r_bytes: &[u8],
    s_bytes: &[u8],
    msg_hash: &[u8; 32],
    expected_pk: &[u8],
) -> u8 {
    let secp = Secp256k1::new();
    let msg = Message::from_digest_slice(msg_hash).expect("valid hash");
    let mut sig_bytes = [0u8; 64];
    sig_bytes[..32].copy_from_slice(r_bytes);
    sig_bytes[32..].copy_from_slice(s_bytes);
    for v_candidate in 0u8..2 {
        if let Ok(rid) = secp256k1::ecdsa::RecoveryId::from_i32(v_candidate as i32) {
            if let Ok(rec_sig) = RecoverableSignature::from_compact(&sig_bytes, rid) {
                if let Ok(recovered) = secp.recover_ecdsa(&msg, &rec_sig) {
                    if recovered.serialize_uncompressed()[1..] == expected_pk[1..] {
                        return v_candidate;
                    }
                }
            }
        }
    }
    0 // fallback
}
```
recover_y_parity silently returns 0 on failure — produces an invalid transaction.
If neither recovery ID matches the expected public key (e.g., due to a bug or key mismatch), the fallback 0 at line 1303 silently produces a transaction with the wrong v value. This would result in a valid-looking but unbroadcastable transaction, making the root cause hard to diagnose.
Consider returning a Result or at least logging a warning on fallback.
Proposed fix:

```diff
 fn recover_y_parity(
     r_bytes: &[u8],
     s_bytes: &[u8],
     msg_hash: &[u8; 32],
     expected_pk: &[u8],
-) -> u8 {
+) -> Result<u8, String> {
     let secp = Secp256k1::new();
     let msg = Message::from_digest_slice(msg_hash).expect("valid hash");
     let mut sig_bytes = [0u8; 64];
     sig_bytes[..32].copy_from_slice(r_bytes);
     sig_bytes[32..].copy_from_slice(s_bytes);
     for v_candidate in 0u8..2 {
         if let Ok(rid) = secp256k1::ecdsa::RecoveryId::from_i32(v_candidate as i32) {
             if let Ok(rec_sig) = RecoverableSignature::from_compact(&sig_bytes, rid) {
                 if let Ok(recovered) = secp.recover_ecdsa(&msg, &rec_sig) {
                     if recovered.serialize_uncompressed()[1..] == expected_pk[1..] {
-                        return v_candidate;
+                        return Ok(v_candidate);
                     }
                 }
             }
         }
     }
-    0 // fallback
+    Err("failed to recover y_parity: no recovery ID matched the expected public key".into())
 }
```
}🤖 Prompt for AI Agents
In `@crates/saw-daemon/src/lib.rs` around lines 1281 - 1304, The function
recover_y_parity currently falls back to returning 0 on failure which yields an
invalid v value; change its signature to return a Result<u8, E> (e.g.,
Result<u8, anyhow::Error> or a custom error) and on success return
Ok(v_candidate), but if no recovery id matches return Err with a descriptive
message mentioning the input r_bytes/s_bytes/msg_hash/expected_pk context;
update callers to propagate or handle the error (or log and abort) instead of
assuming 0. Keep the existing recovery loop using Secp256k1::new(),
RecoveryId::from_i32, RecoverableSignature::from_compact, and
secp.recover_ecdsa; ensure the error path includes which stage failed (no
matching recovery id / recovery failed) to aid debugging.
```rust
let our_partial = signing::issue_partial_signature(presignature, message_hash);
let our_partial_bytes = serde_json::to_vec(&our_partial)
    .map_err(|e| ThresholdError::Mpc(format!("serialize partial: {e}")))?;
let _ = our_partial_bytes; // used for potential future optimization
```
🛠️ Refactor suggestion | 🟠 Major
Dead code: `our_partial_bytes` is computed then immediately discarded.

`our_partial_bytes` is serialized on line 249 and then explicitly dropped on line 251. Remove the unused serialization.
Proposed fix:

```diff
 // Issue our partial signature locally (zero network!)
 let our_partial = signing::issue_partial_signature(presignature, message_hash);
-let our_partial_bytes = serde_json::to_vec(&our_partial)
-    .map_err(|e| ThresholdError::Mpc(format!("serialize partial: {e}")))?;
-let _ = our_partial_bytes; // used for potential future optimization
```
- let _ = our_partial_bytes; // used for potential future optimization📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let our_partial = signing::issue_partial_signature(presignature, message_hash); | |
| let our_partial_bytes = serde_json::to_vec(&our_partial) | |
| .map_err(|e| ThresholdError::Mpc(format!("serialize partial: {e}")))?; | |
| let _ = our_partial_bytes; // used for potential future optimization | |
| let our_partial = signing::issue_partial_signature(presignature, message_hash); |
🤖 Prompt for AI Agents
In `@crates/saw-daemon/src/threshold.rs` around lines 248 - 251, The code
serializes the partial signature into our_partial_bytes via serde_json::to_vec
and then immediately discards it; remove that dead serialization and the unused
our_partial_bytes binding: keep the call to signing::issue_partial_signature to
produce our_partial, delete the serde_json::to_vec(...).map_err(|e|
ThresholdError::Mpc(format!("serialize partial: {e}")))? expression and the let
_ = our_partial_bytes; line so no unused variable or unnecessary serialization
remains.
```rust
let join_msg = loop {
    match ws_rx.next().await {
        Some(Ok(WsMessage::Text(t))) => {
            break serde_json::from_str::<RelayEnvelope>(&t)
                .map_err(|e| crate::error::MpcError::Transport(format!("bad join: {e}")))?;
        }
        Some(Ok(WsMessage::Binary(d))) => {
            break serde_json::from_slice::<RelayEnvelope>(&d)
                .map_err(|e| crate::error::MpcError::Transport(format!("bad join: {e}")))?;
        }
        Some(Ok(_)) => continue,
        _ => return Err(crate::error::MpcError::Transport("connection lost before join".into())),
    }
};
```
No timeout on Join message read — a slow or malicious peer blocks the entire relay.
The relay loops indefinitely at line 90 waiting for a Join message from each incoming connection. A client that connects but never sends data will prevent additional parties from joining. Add a timeout (e.g., 10–30 seconds) around this read loop.
Proposed fix:

```diff
 // Read Join message
-    let join_msg = loop {
-        match ws_rx.next().await {
+    let join_msg = tokio::time::timeout(Duration::from_secs(30), async {
+        loop {
+            match ws_rx.next().await {
+                // ... existing arms ...
+            }
+        }
+    })
+    .await
+    .map_err(|_| crate::error::MpcError::Transport("join timeout".into()))??;
```
+ .map_err(|_| crate::error::MpcError::Transport("join timeout".into()))??;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let join_msg = loop { | |
| match ws_rx.next().await { | |
| Some(Ok(WsMessage::Text(t))) => { | |
| break serde_json::from_str::<RelayEnvelope>(&t) | |
| .map_err(|e| crate::error::MpcError::Transport(format!("bad join: {e}")))?; | |
| } | |
| Some(Ok(WsMessage::Binary(d))) => { | |
| break serde_json::from_slice::<RelayEnvelope>(&d) | |
| .map_err(|e| crate::error::MpcError::Transport(format!("bad join: {e}")))?; | |
| } | |
| Some(Ok(_)) => continue, | |
| _ => return Err(crate::error::MpcError::Transport("connection lost before join".into())), | |
| } | |
| }; | |
| let join_msg = tokio::time::timeout(Duration::from_secs(30), async { | |
| loop { | |
| match ws_rx.next().await { | |
| Some(Ok(WsMessage::Text(t))) => { | |
| break serde_json::from_str::<RelayEnvelope>(&t) | |
| .map_err(|e| crate::error::MpcError::Transport(format!("bad join: {e}")))?; | |
| } | |
| Some(Ok(WsMessage::Binary(d))) => { | |
| break serde_json::from_slice::<RelayEnvelope>(&d) | |
| .map_err(|e| crate::error::MpcError::Transport(format!("bad join: {e}")))?; | |
| } | |
| Some(Ok(_)) => continue, | |
| _ => return Err(crate::error::MpcError::Transport("connection lost before join".into())), | |
| } | |
| } | |
| }) | |
| .await | |
| .map_err(|_| crate::error::MpcError::Transport("join timeout".into()))??; |
🤖 Prompt for AI Agents
In `@crates/saw-mpc/src/relay.rs` around lines 90 - 103, The join read loop blocks
forever when waiting for a RelayEnvelope; wrap the receive logic (the loop that
consumes ws_rx and constructs join_msg) in a tokio::time::timeout (choose a
10–30s Duration, e.g., 15s) so the await fails if no Join arrives; on timeout
return an appropriate crate::error::MpcError::Transport (e.g., "join timeout")
instead of hanging. Locate the loop that assigns join_msg, the ws_rx usage, and
the RelayEnvelope deserialization to apply the timeout wrapper and map the
timeout error into MpcError::Transport.
```rust
loop {
    let raw = match ws_rx.next().await {
        Some(Ok(WsMessage::Text(t))) => t.into_bytes(),
        Some(Ok(WsMessage::Binary(d))) => d.to_vec(),
        Some(Ok(_)) => continue,
        _ => return Err(crate::error::MpcError::Transport("connection lost".into())),
    };
    let env: RelayEnvelope = serde_json::from_slice(&raw)
        .map_err(|e| crate::error::MpcError::Transport(format!("bad msg: {e}")))?;
    match env {
        RelayEnvelope::Joined { .. } => continue,
        RelayEnvelope::AllJoined { .. } => break,
        RelayEnvelope::Error { message } => {
            return Err(crate::error::MpcError::Transport(format!("relay error: {message}")));
        }
        _ => continue,
    }
}
```
No timeout on AllJoined wait — client blocks indefinitely if not all parties connect.
`connect_to_relay` loops forever at line 269 waiting for `AllJoined`. If the relay never sends it (e.g., a party is missing), the caller hangs with no recourse. Wrap in a configurable timeout.
🤖 Prompt for AI Agents
In `@crates/saw-mpc/src/relay.rs` around lines 269 - 286, The loop in
connect_to_relay that awaits AllJoined via ws_rx currently blocks indefinitely;
wrap the waiting logic in a configurable timeout (e.g., using
tokio::time::timeout with a Duration pulled from configuration or a new
parameter) so that if AllJoined is not received within the timeout you return a
meaningful MpcError (e.g., MpcError::Transport or a new MpcError::Timeout)
instead of hanging; keep the existing envelope parsing (RelayEnvelope) and error
handling for Joined/Error cases but place the entire receive-and-match loop
under the timeout future and surface the timeout error to the caller.
```rust
loop {
    let wire = match read_next_wire(&mut ws_rx).await {
        Ok(msg) => msg,
        Err(_) => break,
    };
```
Any deserialization failure on the outer message loop disconnects the client.
`read_next_wire` returns `Err` for malformed messages, and Line 81 unconditionally breaks the connection loop. A single garbled or unexpected frame from the daemon kills the entire session. Consider logging and continuing for parse errors, only breaking on transport-level errors (close/EOF).
Proposed approach:

```diff
 let wire = match read_next_wire(&mut ws_rx).await {
     Ok(msg) => msg,
-    Err(_) => break,
+    Err(MpcError::Transport(_)) => break,
+    Err(e) => {
+        tracing::warn!(error = %e, "ignoring malformed message");
+        continue;
+    }
 };
```
In `@crates/saw-policy/src/server.rs` around lines 78 - 82, The outer message loop
currently breaks unconditionally when read_next_wire(&mut ws_rx).await returns
Err, which disconnects the client on any malformed frame; change the logic in
that loop (the match around read_next_wire in server.rs) to inspect the error
variant returned by read_next_wire: log and continue the loop for
deserialization/parse errors so a single bad frame is skipped, but only
break/return on transport-level errors (EOF/connection closed) so the session
stays alive; use the existing logger to record the parse error and which
client/ws_rx message was dropped.
```rust
    reason: decision.reason,
    partial_signature: None,
});
let _ = send_wire(&ws_tx, &resp).await;
```
Inconsistent error handling on `send_wire` calls in the partial-sign path.

Lines 246, 261, and 281 discard send errors with `let _ =`, but Line 303 propagates with `?`. If the WebSocket is broken, the `let _ =` calls silently swallow the failure (the daemon never learns why its request was denied), while Line 303 kills the connection handler. Pick a consistent strategy — either always propagate (so the handler cleans up) or always log-and-continue.

Also applies to: 261-261, 281-281, 303-303
🤖 Prompt for AI Agents
In `@crates/saw-policy/src/server.rs` at line 246, The partial-sign path
inconsistently swallows send_wire errors (calls like let _ = send_wire(&ws_tx,
&resp).await;) while one call uses ?; make the error-handling consistent by
either propagating errors from send_wire (replace the ignore with
send_wire(&ws_tx, &resp).await? so the connection handler can clean up) or by
explicitly logging and continuing (match send_wire(...).await { Ok(_) => {},
Err(e) => tracing::error!("send_wire failed: {:?}", e) }). Update all
occurrences in the partial-sign flow that reference send_wire, ws_tx, and resp
to use the same chosen strategy.
````markdown
### Local keygen (current)
Generate all 3 shares in one process, then distribute:
```bash
SAW_PASSPHRASE="secret" saw keygen-local --wallet mywallet --root ~/.saw
```
Outputs 3 encrypted share files + metadata with the derived address.
````
Local keygen undermines threshold security during setup.
The local keygen mode generates all 3 shares in a single process before distribution. This means the full signing capability exists on one machine during keygen, which contradicts the core security property stated in the overview ("The full private key never exists on any single machine"). While shares are encrypted before distribution, an attacker with access to that machine during keygen could compromise all shares.
Consider adding a security note that local keygen is a convenience mode for testing/development and that distributed keygen should be used for production deployments where the threat model requires that no single party ever holds all shares.
🤖 Prompt for AI Agents
In `@docs/threshold-signing-spec.md` around lines 67 - 73, Add a security note
under the "Local keygen (current)" section clarifying that the keygen-local mode
(invoked via the keygen-local command/SAW_PASSPHRASE example) generates all
shares in one process and therefore temporarily creates full signing capability
on a single machine; mark it explicitly as a convenience/testing mode and warn
that it should not be used in production where the threat model requires that no
single party ever holds all shares, and recommend using distributed key
generation (distributed keygen) for production deployments instead.
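One possible wording for such a note, as an illustrative sketch rather than the spec's actual text:

```markdown
> **Security note:** `keygen-local` produces all three shares in a single
> process, so full signing capability briefly exists on one machine. Treat it
> as a convenience mode for testing and development only. For production,
> where the threat model requires that no single party ever holds all shares,
> use distributed key generation (`keygen-threshold`) instead.
```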
```markdown
- AEAD: ChaCha20-Poly1305
- Format: `SAW1` magic bytes + salt + nonce + ciphertext + tag
- Passphrase via `SAW_PASSPHRASE` env var
- Backward-compatible: detects plaintext shares and loads them directly
```
Backward compatibility with plaintext shares poses a security risk.
The system detects and loads plaintext shares directly. This creates a risk where shares might be accidentally stored or remain unencrypted without the user's knowledge, potentially exposing the key material. Consider one of the following mitigations:

- Emit a loud warning when plaintext shares are detected
- Require an explicit flag like `--allow-plaintext-shares` to load unencrypted shares
- Automatically encrypt plaintext shares on first load and refuse to load them subsequently
🤖 Prompt for AI Agents
In `@docs/threshold-signing-spec.md` at line 90, The spec currently says "detects
plaintext shares and loads them directly" which is a security risk; update the
spec text to require a mitigation: either emit a clear, high-severity warning
when plaintext shares are detected, require an explicit opt-in flag (e.g.,
--allow-plaintext-shares) before loading plaintext shares, or mandate automatic
encryption-on-first-load (and refuse subsequent unencrypted loads) with details
on behavior and failure modes; refer to the existing phrase "detects plaintext
shares and loads them directly" when replacing it and add the chosen mitigation
option and required CLI/behavioral semantics.
Adds threshold ECDSA signing to SAW using the CGGMP21 protocol. The full private key never exists on any single machine.
What's included
`keygen-local` (generate all 3 shares locally) and `keygen-threshold` (distributed relay-based ceremony)

Architecture
Normal ops: party 0 + party 1 auto-sign per policy rules. Any 2-of-3 can sign for recovery.
Tested
Not included (future work)
single-key mode unchanged.
Summary by CodeRabbit
New Features
Documentation
Chores