
Boltz Swap/VHTLC pkgs added #113

Open
sekulicd wants to merge 9 commits into arkade-os:master from sekulicd:2026-03-17-move-swap-boltz-vhtlc

Conversation

sekulicd (Contributor) commented Mar 19, 2026

This moves Boltz's Swap and VHTLC packages from fulmine to go-sdk.
It also adjusts the testing infrastructure and adds adapted tests (without the Fulmine client) for swaps/VHTLC.

Summary by CodeRabbit

  • New Features

    • Ark↔Bitcoin chain swaps, LN submarine/reverse swaps, Boltz REST/WebSocket client, BOLT12 invoice/offer parsing, MuSig2 cooperative signing, VHTLC/Taproot claim & refund tooling, swap monitoring/resume and recovery flows.
  • Infrastructure

    • Deterministic mock Boltz server, expanded regtest integration stack, provisioning scripts, Docker images, and a balances/reporting utility.
  • Tests

    • New comprehensive end-to-end swap suite; updated e2e helpers and fixtures.
  • Chores

    • CI Go toolchain bumped to 1.26.1; Makefile targets reorganized (up/down/balances and bump-* helpers).


coderabbitai Bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds Boltz chain-swap and Lightning swap support (REST + WS), MuSig2 signing and batch-session handlers, VHTLC/Taproot utilities, chain-swap orchestration/validation, PSBT/tapscript tooling, explorer integration, extensive infra and e2e tests, and CI/Makefile/tooling updates.

Changes

Cohort / File(s) — Summary

CI & Build
.github/workflows/integration.yaml, .github/workflows/unit.yaml, Makefile, go.mod, test/infra/arkd.Dockerfile, test/infra/wallet.Dockerfile
Bump Go toolchain to 1.26.1; CI integration job runs make up (replaces regtest); Makefile adds up/down/balances and bump targets; go.mod dependency version updates and replace removal; Dockerfile builder base bumps.

Boltz client & WS
swap/boltz/boltz.go, swap/boltz/types.go, swap/boltz/status.go, swap/boltz/utils.go, swap/boltz/ws.go
New Boltz API client (typed requests/responses), HTTP helpers with HTTPError, event enum mapping, Retry helper, and a reconnecting WebSocket client with subscribe/updates.

Swap core & utilities
swap/swap.go, swap/utils.go, swap/batch_handler.go, swap/btc_util.go, swap/explorer.go
New SwapHandler surface and orchestration (Pay/Get/Claim/Refund/Settle), PSBT/tapscript verification and intent/checkpoint logic, explorer-backed broadcast/fee/status, batch-session for PSBT forfeits.

MuSig & signing
swap/musig.go, swap/musig2_handler.go
MuSig2 cooperative signing utilities (nonce generation/aggregation, partial sig parsing/combination, tweaked output key) and batch-session Musig2 handler for nonce/signature exchange.

Chain-swap domain & handlers
swap/chainswap.go, swap/chainswap_monitor.go, swap/chainswap_resume.go, swap/chainswap_validate.go, swap/chainswap_arkt_btc_handler.go, swap/chainswap_btc_ark_handler.go
ChainSwap model, resume API, monitor loop dispatching Boltz WS events, HTLC/VHTLC validation, swap-tree merkle/Taproot derivation, and direction-specific handlers implementing cooperative/script-path claim and refund flows.

VHTLC package & tests
vhtlc/opts.go, vhtlc/utils.go, vhtlc/vhtlc.go, vhtlc/vhtlc_fixtures.json, vhtlc/vhtlc_test.go
Opts validation and closure factories, closure parsers, VHTLC taproot construction/address derivation, fixtures and comprehensive unit tests.

Explorer & fee/status
swap/explorer.go
HTTP-backed ExplorerClient implementation for tx broadcast, fee estimates, block height, tx status and raw tx retrieval.

Infra & Mock Boltz
test/infra/docker-compose.yml, test/infra/setup_infra.sh, test/infra/balances.sh, test/infra/mockboltz/*, test/infra/*Dockerfile
New integration docker-compose stack, infra setup script, balances script, mock-boltz server (HTTP/WS + admin/event injection, MuSig/refund endpoints), and supporting Dockerfiles.

E2E tests & utilities
test/e2e/swap_test.go, test/e2e/utils_test.go, other test/e2e/*_test.go
Comprehensive swap e2e suite (real and mock Boltz), test helpers (setupClient now returns ArkClient + privkey), mock-boltz admin helpers, regtest/LND helpers, and minor test call-site fixes to ignore extra return value.

Removed legacy infra
test/docker/docker-compose.yml, test/docker/setup.go
Removed legacy compose and setup CLI in favor of new test/infra/ tooling.

Misc tests
test/e2e/*_test.go
Minor test updates: ignore extra return value from setupClient in many tests and a small sleep increase in one test.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SwapHandler
    participant BoltzAPI as Boltz REST
    participant BoltzWS as Boltz WS
    participant Ark as Ark Client
    participant BTC as Bitcoin Network
    participant Explorer as ExplorerClient

    User->>SwapHandler: ChainSwapArkToBtc(amount, btcAddr)
    SwapHandler->>BoltzAPI: CreateChainSwap(request)
    BoltzAPI-->>SwapHandler: swapResp (swapId, swapTree, lockup)
    SwapHandler->>Ark: SendOffChain(vhtlcAddr)
    Ark-->>SwapHandler: userLockTxid
    SwapHandler->>BoltzWS: Subscribe(swapId)
    BoltzWS-->>SwapHandler: swap.created
    BoltzWS-->>SwapHandler: transaction.server.mempool
    SwapHandler->>SwapHandler: claimBtcLockup() (MuSig flow)
    SwapHandler->>BoltzAPI: GetChainSwapClaimDetails()
    BoltzAPI-->>SwapHandler: pubNonce, serverPartial
    SwapHandler->>SwapHandler: Combine partials -> finalSig
    SwapHandler->>BTC: Broadcast(claimTx)
    BTC-->>Explorer: tx confirmed
    Explorer-->>SwapHandler: confirmation
    SwapHandler->>Ark: Settle(ctx)
    Ark-->>SwapHandler: settledTxid
    User-->>SwapHandler: ChainSwapStatus = Claimed
sequenceDiagram
    participant User
    participant SwapHandler
    participant BoltzAPI
    participant BoltzWS
    participant Ark

    User->>SwapHandler: PayInvoice(invoice)
    SwapHandler->>BoltzAPI: CreateSwap(submarine)
    BoltzAPI-->>SwapHandler: boltInvoice
    SwapHandler->>Ark: SendOffChain(vhtlcAddr)
    BoltzWS-->>SwapHandler: invoice.settled
    SwapHandler->>SwapHandler: ClaimVHTLC(preimage)
    SwapHandler->>Ark: Submit PSBT/checkpoint, verify sigs
    Ark-->>SwapHandler: txid
    User-->>SwapHandler: SwapStatus = Success

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Suggested reviewers

  • louisinger
  • altafan


arkanaai Bot commented Mar 19, 2026

🔍 Arkana PR Review — arkade-os/go-sdk#113

PR: Boltz Swap/VHTLC packages moved from fulmine to go-sdk
Author: sekulicd | Size: +12,777 / -484


Summary

This PR moves the Boltz swap infrastructure (submarine swaps, reverse swaps, chain swaps) and the VHTLC (Virtual HTLC) package from fulmine into the go-sdk. It also overhauls the test infrastructure with a new test/infra/ setup including mock Boltz server, CLN, and Fulmine integration.


Security Review

✅ Positive Findings

  1. VHTLC Script Construction (vhtlc/vhtlc.go): The six-closure tapscript structure (claim, refund, refund-without-receiver, plus unilateral variants of each) correctly covers all exit paths. Unilateral exits have proper CSV timelocks, and the refund-without-receiver path uses CLTV — this is the right design for ensuring the sender can always recover funds even if the receiver vanishes.

  2. HTLC Validation (swap/chainswap_validate.go): Thorough byte-level script parsing of Boltz claim and refund leaf scripts. The validateClaimLeafScript and parseRefundHTLCScriptManually functions manually parse opcodes rather than trusting Boltz-provided data — good defensive practice. The minimum refund timeout check (MinRefundTimeout = 144 blocks ≈ 1 day) is a sensible safety floor.

  3. VHTLC Address Verification: validateVHTLC independently recomputes the VHTLC address from parameters and compares it to what Boltz provides — critical for preventing address substitution attacks.

  4. BTC Lockup Address Validation (validateBtcLockupAddress): Independently derives the P2TR address from MuSig2 key aggregation + swap tree merkle root — prevents Boltz from providing a lockup address they control.

  5. MuSig2 Implementation (swap/musig.go): Clean implementation with proper taproot tweak handling. Good use of WithTaprootSignTweak(merkleRoot) for signing and WithTaprootTweakedCombine for verification. The VerifyFinalSig debug gate is a good safety net.

  6. Forfeit Transaction Construction (swap/batch_handler.go): Proper handling of connector inputs, tapscript proof generation, and preimage injection for claim path. The separation of claimForfeitTxBuilder and refundForfeitTxBuilder is clean.

⚠️ Concerns & Observations

  1. crypto/rand usage in chainswap.go: Preimage generation uses crypto/rand — this is correct and critical. Just confirming it's not math/rand.

  2. WebSocket Reconnection (boltz/ws.go): The reconnection loop at the bottom of the Connect() goroutine runs forever without backoff or max retry count. If Boltz is persistently down, this will spin. Consider adding exponential backoff or a max retry limit.

  3. Panic Recovery in WS Goroutine: The recover() in the WS read goroutine silently swallows panics with only a log. The Updates channel is never closed in this path, which could leave consumers hanging. Consider closing the channel or signaling the error.

  4. HTTP Timeout (boltz/boltz.go): The defaultHTTPTimeout = 15s is shared across all API calls. Some operations (like FetchBolt12Invoice) might need longer timeouts depending on the Lightning network path. Not a bug, but worth noting if you see timeout failures in production.

  5. Partial Signature Scalar Parsing (musig.go): ParsePartialSignatureScalar32 correctly checks for scalar overflow via SetByteSlice — good. The comment about not using sig.Decode for Boltz format is important context for future maintainers.

  6. Mock Boltz Server (test/infra/mockboltz/server.go): At 1,347 lines, this is a substantial mock. While comprehensive mocks are good for testing, any drift from real Boltz API behavior could mask issues. Worth keeping in sync with Boltz API updates.

  7. Dependency Bumps: Major version bumps across btcd, btcwallet, lnd, and grpc. The btcec/v2 bump from v2.3.3 to v2.3.6 and lnd from v0.18.2 to v0.19.1 are significant — ensure these don't introduce MuSig2 behavioral changes. The replace directive for btcec/v2 was removed, suggesting the upstream fix is now incorporated.

🔒 Exit Path Integrity

The VHTLC design preserves the Ark protocol's core guarantee: users can always exit unilaterally. The three unilateral paths cover:

  • Claim (with preimage): Receiver can claim with CSV delay
  • Refund (with receiver): Both parties can cooperatively refund with CSV delay
  • Refund (without receiver): Sender can force refund solo with CSV delay

This matches the expected security model for Ark-based atomic swaps.


Cross-Repo Impact

  • fulmine: This is the source of the moved code. After merge, fulmine should import from go-sdk instead of maintaining its own copy. Check for duplicate definitions.
  • arkd: The ark-lib and client-lib dependency bumps to 20260319 commits suggest coordinated changes — verify those arkd commits are merged/stable.
  • ts-sdk / dotnet-sdk / rust-sdk: No direct impact, but the VHTLC interface defined here becomes the reference implementation other SDKs should match.

Verdict

Solid move of a critical subsystem. The security-sensitive code (script validation, MuSig2, VHTLC construction) is well-implemented with proper defensive validation. The main suggestions are operational robustness improvements (WS reconnection backoff, panic handling). No blocking issues found.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (26)
test/infra/mockboltz/server.go-589-608 (1)

589-608: ⚠️ Potential issue | 🟠 Major

Potential race condition: client.subs map modified without proper synchronization.

The subscription map client.subs is modified at lines 596-598 while only client.mu is used for WriteJSON. However, pushSwapUpdate (lines 945-948) reads client.subs under wsMu.RLock without holding client.mu, which could race with concurrent subscribe operations.

🔒 Proposed fix: Use client.mu consistently for subs access
 		if msg.Op == "subscribe" && msg.Channel == "swap.update" {
 			s.wsMu.RLock()
 			client := s.wsClients[conn]
 			s.wsMu.RUnlock()
 			if client == nil {
 				continue
 			}
+			client.mu.Lock()
 			for _, id := range msg.Args {
 				client.subs[id] = struct{}{}
 			}
+			client.mu.Unlock()

 			ack := map[string]any{

And in pushSwapUpdate:

 	for conn, client := range s.wsClients {
+		client.mu.Lock()
-		if _, ok := client.subs[id]; !ok {
+		_, subscribed := client.subs[id]
+		client.mu.Unlock()
+		if !subscribed {
 			continue
 		}
 		client.mu.Lock()
 		err := conn.WriteJSON(payload)
 		client.mu.Unlock()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/server.go` around lines 589 - 608, The subscribe handler
modifies client.subs without holding client.mu, which races with pushSwapUpdate
reading subs; update the subscribe branch to acquire client.mu before modifying
client.subs (and while writing the ack if desired) and release afterwards, and
modify pushSwapUpdate to acquire client.mu when accessing/iterating client.subs
(while still using s.wsMu.RLock to access s.wsClients if needed) so all
reads/writes to client.subs are protected by client.mu; refer to client.subs,
client.mu, the subscribe handler (conn.WriteJSON/ack block) and pushSwapUpdate
to implement the changes.
test/e2e/swap_test.go-48-53 (1)

48-53: ⚠️ Potential issue | 🟠 Major

Synchronize events access in the async callbacks.

eventCallback appends to events from handler goroutines while the test goroutine later reads the same slice without locking. Since integrationtest runs with -race, this pattern can fail the suite. A mutex or channel-backed collector is needed here, and in the repeated callback blocks below.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/swap_test.go` around lines 48 - 53, The test currently appends to
the shared slice "events" from the asynchronous "eventCallback" goroutines (type
swap.ChainSwapEvent) without synchronization; replace unsynchronized appends
with a thread-safe collector—either protect "events" with a sync.Mutex
(lock/unlock around reads/writes in eventCallback and any test reads) or switch
to a channel-backed collector that the test goroutine consumes and stores into
"events"—and update all callback occurrences that append to "events" to use the
chosen synchronization mechanism so the race reported by -race is eliminated.
swap/btc_util.go-213-236 (1)

213-236: ⚠️ Potential issue | 🟠 Major

Use the swap tree's actual leaf versions when building the control block.

computeSwapTreeMerkleRoot hashes swapTree.ClaimLeaf.Version and swapTree.RefundLeaf.Version, but this helper recomputes the sibling hash with NewBaseTapLeaf and hardcodes BaseLeafVersion in the control-block header. If Boltz returns a non-base leaf version, the control block will not match the merkle root above.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/btc_util.go` around lines 213 - 236, The control-block uses a hardcoded
txscript.BaseLeafVersion and rebuilds siblingLeaf with NewBaseTapLeaf, which can
mismatch computeSwapTreeMerkleRoot that uses swapTree.ClaimLeaf.Version and
swapTree.RefundLeaf.Version; update the logic that constructs siblingLeaf and
leafVersionByte so the leaf version comes from the corresponding swapTree leaf
(e.g., swapTree.ClaimLeaf.Version or swapTree.RefundLeaf.Version) instead of
txscript.BaseLeafVersion, and ensure the sibling hash (siblingLeaf.TapHash()) is
computed from a TapLeaf built with that same version so the controlBlock (the
bytes appended to controlBlock and the leafVersionByte) matches the merkle root
computed by computeSwapTreeMerkleRoot.
swap/utils.go-188-194 (1)

188-194: ⚠️ Potential issue | 🟠 Major

>= 512 is not a safe seconds-vs-block discriminator.

This converts every block delay of 512+ into LocktimeTypeSecond, so common CSV values like 2016 blocks will round-trip incorrectly. If locktime is BIP68-encoded, inspect the type bit; if it is already normalized, the original type has to be preserved instead of inferred from the magnitude.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/utils.go` around lines 188 - 194, parseLocktime currently infers seconds
vs blocks by magnitude (>=512) which misclassifies many BIP68-encoded values;
update parseLocktime to inspect the BIP68 "type" bit instead of using a numeric
threshold: detect the BIP68 type bit on the incoming uint32 and return
arklib.RelativeLocktime with Type set to arklib.LocktimeTypeSecond or
arklib.LocktimeTypeBlock based on that bit, and use the remaining encoded value
as Value; if the input is already a normalized relative locktime representation,
preserve its original type rather than re-inferring from magnitude. Ensure you
reference and use parseLocktime, arklib.RelativeLocktime, LocktimeTypeSecond and
LocktimeTypeBlock when making the change.
swap/utils.go-31-33 (1)

31-33: ⚠️ Potential issue | 🟠 Major

Don't ignore checkpoint tapscript decode errors.

A malformed cfg.CheckpointTapscript currently returns an empty or partial byte slice and pushes the real failure into later signing/verification code. Fail fast here instead.

💡 Fail fast on bad config
-func checkpointExitScript(cfg clientTypes.Config) []byte {
-	buf, _ := hex.DecodeString(cfg.CheckpointTapscript)
-	return buf
+func checkpointExitScript(cfg clientTypes.Config) ([]byte, error) {
+	buf, err := hex.DecodeString(cfg.CheckpointTapscript)
+	if err != nil {
+		return nil, fmt.Errorf("invalid checkpoint tapscript: %w", err)
+	}
+	return buf, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/utils.go` around lines 31 - 33, The function checkpointExitScript
currently ignores hex.DecodeString errors which hides malformed
cfg.CheckpointTapscript; change checkpointExitScript to return ([]byte, error)
instead of just []byte, check the error from
hex.DecodeString(cfg.CheckpointTapscript) and return it immediately on failure,
and update all callers to handle the error (propagate or handle appropriately)
so configuration decode failures fail fast and don't surface later during
signing/verification.
Makefile-17-20 (1)

17-20: ⚠️ Potential issue | 🟠 Major

Scope the catch-all rule to the bump commands only.

The global %: fallback makes any misspelled make target succeed as a no-op, which can silently skip CI or local automation. Keep the dummy target generation limited to the positional COMMIT argument instead of installing a repo-wide catch-all.

💡 Safer pattern
-# Allow positional commit arguments, e.g.:
-#   make bump-client-lib <git-sha-or-tag>
-%:
-	@:
+# Allow positional commit arguments only for the bump targets.
+ifneq ($(filter bump-client-lib bump-ark-lib,$(firstword $(MAKECMDGOALS))),)
+ifneq ($(COMMIT),)
+$(eval $(COMMIT):;@:)
+endif
+endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 17 - 20, Remove the global catch-all rule `%:` and
replace it with a scoped pattern for the bump commands only (e.g., use a
`bump-%:` pattern target or explicit `bump-client-lib:` target) so misspelled
make targets no longer silently succeed; ensure the new target accepts the
positional COMMIT variable (the existing usage like `make bump-client-lib
<git-sha-or-tag>` still works) and delete the original `%:` block from the
Makefile.
go.mod-3-3 (1)

3-3: ⚠️ Potential issue | 🟠 Major

Align the module go version with CI before merging.

The workflows are still pinned to go 1.26.0 while this PR updates go.mod to 1.26.1. This mismatch will cause go mod tidy to behave differently in CI than locally, breaking builds. Either update .github/workflows/unit.yaml and .github/workflows/integration.yaml to use go-version: '1.26.1', or revert the module directive until runners are updated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 3, go.mod was bumped to "go 1.26.1" but CI workflows are
still pinned to 1.26.0, causing inconsistent go mod behavior; either update the
CI workflow files named unit.yaml and integration.yaml to set go-version:
'1.26.1' so CI matches go.mod, or revert the module directive in go.mod back to
"go 1.26.0" to match current runners — pick one approach and apply it
consistently across go.mod and the two workflow files.
swap/utils.go-269-285 (1)

269-285: ⚠️ Potential issue | 🟠 Major

Honor context cancellation during the backoff sleep.

After fn returns (false, nil), the loop blocks on time.After(interval) and will not observe ctx.Done() until the entire interval elapses. That keeps retries alive past cancellation/timeouts.

💡 Context-aware sleep
-			<-time.After(interval)
+			timer := time.NewTimer(interval)
+			select {
+			case <-ctx.Done():
+				if !timer.Stop() {
+					<-timer.C
+				}
+				if errors.Is(ctx.Err(), context.DeadlineExceeded) {
+					return fmt.Errorf("timed out")
+				}
+				return ctx.Err()
+			case <-timer.C:
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/utils.go` around lines 269 - 285, The retry loop currently blocks on
<-time.After(interval) and doesn't observe ctx cancellation during the sleep;
change the sleep to be context-aware by selecting between a timer (or time.After
channel) and ctx.Done() so that when fn returns (false, nil) the code waits for
either the interval or context cancellation/timeout, returning ctx.Err() if
cancelled; update the loop around fn(ctx) and the sleep (the select after the fn
call that currently uses time.After) to use select { case
<-time.After(interval): continue; case <-ctx.Done(): return ctx.Err() } or an
equivalent timer-based select to avoid extra retries after cancellation.
swap/bolt12.go-115-118 (1)

115-118: ⚠️ Potential issue | 🟠 Major

Reject non-whole-satoshi amounts instead of truncating them.

Both paths currently do amount / 1000, which silently floors. For example, 1500 becomes 1 sat. That undercounts the requested amount instead of surfacing that it cannot be represented exactly in sats.

Suggested fix
-	invoiceData.AmountInSats, err = safecast.ToUint64(invoiceData.Amount / 1000)
+	if invoiceData.Amount%1000 != 0 {
+		return nil, errors.New("invoice amount must be a whole number of satoshis")
+	}
+	invoiceData.AmountInSats, err = safecast.ToUint64(invoiceData.Amount / 1000)
@@
-	offerData.AmountInSats, err = safecast.ToUint64(offerData.Amount / 1000)
+	if offerData.Amount%1000 != 0 {
+		return nil, errors.New("offer amount must be a whole number of satoshis")
+	}
+	offerData.AmountInSats, err = safecast.ToUint64(offerData.Amount / 1000)

Also applies to: 167-170

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/bolt12.go` around lines 115 - 118, The code currently computes sats by
doing invoiceData.Amount / 1000 which silently truncates fractional sats;
instead, detect and reject non-whole-satoshi amounts by checking that
invoiceData.Amount % 1000 == 0 before converting, and return a clear error if
there is a remainder. Replace the two spots that use invoiceData.Amount / 1000
(the block that sets invoiceData.AmountInSats and the similar code at the later
occurrence) with logic that verifies exact divisibility, then compute
safecast.ToUint64(invoiceData.Amount/1000) only after the divisibility check
succeeds (referencing invoiceData.Amount and invoiceData.AmountInSats to locate
the changes).
swap/chainswap_resume.go-133-171 (1)

133-171: ⚠️ Potential issue | 🟠 Major

Do not tie resumed swap monitoring to the caller's context.

ResumeChainSwap starts background goroutines, but both inherit the resume call's ctx. If the caller uses a timeout or request-scoped context, the resumed swap can stop monitoring immediately after this method returns and miss claim/refund transitions. Use a handler-owned lifecycle context instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap_resume.go` around lines 133 - 171, The background goroutines
in ResumeChainSwap are using the caller's ctx, which can be cancelled by the
caller and prematurely stop monitoring; change them to use a handler-owned
lifecycle context (e.g., h.lifecycleCtx or h.backgroundCtx) instead of the
passed-in ctx/params ctx so resumed swaps run independent of request-scoped
timeouts. Update the two goroutine invocations that call
monitorAndClaimArkToBtcSwap(...) and monitorBtcToArkChainSwap(...) to pass the
handler-owned context, and if the handler does not yet have such a context add
one to the handler (and ensure it is cancelled when the handler is shut down).
Ensure all other parameters (h.privateKey, preimage, &swapResp, swap,
params.EventCallback, params.UnilateralRefundCB, params.UserBtcAddress) remain
unchanged.
swap/batch_handler.go-398-413 (1)

398-413: ⚠️ Potential issue | 🟠 Major

Enforce the single-forfeit assumption in collaborative refunds.

This constructor accepts any number of vtxosToForfeit, but the finalization path only consumes one partialForfeitTx, uses connectors[0], and submits a single signed tx. If more than one VTXO needs a collaborative refund, the rest are dropped silently.

Suggested fix
 	if len(handler.vtxosToForfeit) > 0 && partialForfeitTx == "" {
 		return nil, fmt.Errorf("missing partial forfeit tx")
 	}
+	if len(handler.vtxosToForfeit) > 1 {
+		return nil, fmt.Errorf("collaborative refund supports exactly one forfeitable vtxo")
+	}
 	return &collabRefundBatchSessionHandler{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/batch_handler.go` around lines 398 - 413, The constructor accepts
multiple vtxosToForfeit but the collabRefundBatchSessionHandler finalization
only handles one partialForfeitTx and uses connectors[0], dropping extras; fix
by enforcing the single-forfeit assumption: after newBatchSessionHandler
returns, add a guard that if len(handler.vtxosToForfeit) > 1 then return an
error (e.g., "multiple vtxosToForfeit not supported") unless you also accept and
thread multiple partialForfeitTx values and connector handling; reference
newBatchSessionHandler, collabRefundBatchSessionHandler, vtxosToForfeit,
partialForfeitTx and connectors to locate where to add the check.
swap/boltz/types.go-189-213 (1)

189-213: ⚠️ Potential issue | 🟠 Major

GetSwapTree can panic when the selected leg has no BTC tree.

SwapLeg.SwapTree is a pointer and this same file marks it as BTC-specific, but GetSwapTree dereferences it unconditionally. A malformed/error response or wrong-leg selection will panic during resume/validation instead of returning a normal error.

Suggested fix
-func (c CreateChainSwapResponse) GetSwapTree(isArkToBtc bool) SwapTree {
-	if isArkToBtc {
-		return SwapTree{
-			ClaimLeaf: SwapTreeLeaf{
-				Version: c.ClaimDetails.SwapTree.ClaimLeaf.Version,
-				Output:  c.ClaimDetails.SwapTree.ClaimLeaf.Output,
-			},
-			RefundLeaf: SwapTreeLeaf{
-				Version: c.ClaimDetails.SwapTree.RefundLeaf.Version,
-				Output:  c.ClaimDetails.SwapTree.RefundLeaf.Output,
-			},
-		}
-	}
-
-	return SwapTree{
-		ClaimLeaf: SwapTreeLeaf{
-			Version: c.LockupDetails.SwapTree.ClaimLeaf.Version,
-			Output:  c.LockupDetails.SwapTree.ClaimLeaf.Output,
-		},
-		RefundLeaf: SwapTreeLeaf{
-			Version: c.LockupDetails.SwapTree.RefundLeaf.Version,
-			Output:  c.LockupDetails.SwapTree.RefundLeaf.Output,
-		},
-	}
+func (c CreateChainSwapResponse) GetSwapTree(isArkToBtc bool) (*SwapTree, error) {
+	var leg SwapLeg
+	if isArkToBtc {
+		leg = c.ClaimDetails
+	} else {
+		leg = c.LockupDetails
+	}
+	if leg.SwapTree == nil {
+		return nil, fmt.Errorf("missing BTC swap tree in response")
+	}
+	return leg.SwapTree, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/boltz/types.go` around lines 189 - 213, GetSwapTree currently
dereferences BTC-specific pointer fields unconditionally which can panic when
SwapLeg.SwapTree is nil; update CreateChainSwapResponse.GetSwapTree to nil-check
the pointer fields before accessing them (both c.ClaimDetails.SwapTree and
c.LockupDetails.SwapTree / their ClaimLeaf/RefundLeaf) and return a clear error
or zero-value SwapTree when the selected leg has no BTC tree, e.g., detect nil
and handle by returning an error (or a safe default) instead of dereferencing;
ensure the function signature and callers of GetSwapTree
(CreateChainSwapResponse.GetSwapTree) are updated to handle the error return if
you choose to return an error.
swap/batch_handler.go-148-151 (1)

148-151: ⚠️ Potential issue | 🟠 Major

Filter OnBatchFailed by batch session ID to avoid aborting on unrelated batch failures.

OnBatchFinalized already filters events by event.Id == h.batchSessionId, and OnBatchStarted only sets batchSessionId when the intent matches. OnBatchFailed must apply the same check—otherwise, failures for unrelated batches will abort this swap. Since batchSessionId starts empty, also guard against that.

Suggested fix
 func (h *batchSessionHandler) OnBatchFailed(
 	ctx context.Context, event client.BatchFailedEvent,
 ) error {
+	if h.batchSessionId == "" || event.Id != h.batchSessionId {
+		return nil
+	}
 	return fmt.Errorf("batch failed: %s", event.Reason)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/batch_handler.go` around lines 148 - 151, OnBatchFailed should ignore
failures for unrelated batches: add a guard in batchSessionHandler.OnBatchFailed
that first checks h.batchSessionId is non-empty and then returns nil unless
event.Id == h.batchSessionId; only format and return the error
(fmt.Errorf("batch failed: %s", event.Reason)) when the IDs match. This mirrors
the filtering used in OnBatchFinalized and prevents aborting the swap on
unrelated batch failures.
swap/explorer.go-111-121 (1)

111-121: ⚠️ Potential issue | 🟠 Major

Route every explorer call through e.client.

Three methods bypass the client configured in NewExplorerClient, so they also bypass its 10-second timeout and any future custom transport/retry settings. A stalled explorer call in GetCurrentBlockHeight, GetTransactionStatus, or GetTransaction can hang swap monitoring indefinitely instead of timing out as intended.

Suggested fix
-	resp, err := http.Get(endpoint)
+	resp, err := e.client.Get(endpoint)

Apply the same replacement at line 117 (GetCurrentBlockHeight), line 147 (GetTransactionStatus), and line 189 (GetTransaction).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/explorer.go` around lines 111 - 121, The three methods
GetCurrentBlockHeight, GetTransactionStatus, and GetTransaction currently call
the global http.Get and bypass the configured e.client (and its
timeout/transport), so replace each http.Get call with e.client.Get (keeping
endpoint construction via url.JoinPath and existing response handling/Close),
ensuring all explorer HTTP requests use the client initialized in
NewExplorerClient so they respect the 10s timeout and custom transport/retry
settings.
test/e2e/utils_test.go-243-259 (1)

243-259: ⚠️ Potential issue | 🟠 Major

Add timeouts to all mock-boltz HTTP helper functions.

All mock-boltz helpers (setMockBoltzConfig, resetMockBoltz, injectMockBoltzSwapEvent, getMockBoltzSwap, waitForMockBoltzHealth, injectMockBoltzSwapEventWithTx, getMockBoltzSwapTyped) use http.DefaultClient.Do() or http.Get() without timeouts. A stuck or slow mock-boltz container can hang the entire E2E test suite indefinitely. Create a package-level HTTP client with a timeout (e.g., var e2eHTTPClient = &http.Client{Timeout: 5 * time.Second}) and use it consistently across all these helpers to fail fast on stalled connections.

Suggested fix
+var e2eHTTPClient = &http.Client{Timeout: 5 * time.Second}
+
 func setMockBoltzConfig(t *testing.T, cfg map[string]any) {
 	t.Helper()
@@
-	resp, err := http.DefaultClient.Do(req)
+	resp, err := e2eHTTPClient.Do(req)
@@
-	resp, err := http.Get(url) //nolint:gosec
+	resp, err := e2eHTTPClient.Get(url)

Apply the same replacement throughout all mock-boltz helpers at lines 243–259, 262–272, 276–293, 296–307, 310–324, 430–451, and 463–474.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/utils_test.go` around lines 243 - 259, Create a package-level HTTP
client with a timeout (e.g., var e2eHTTPClient = &http.Client{Timeout: 5 *
time.Second}) and replace all uses of http.DefaultClient.Do(...) and
http.Get(...) in the mock-boltz helpers so they use e2eHTTPClient (for example,
in setMockBoltzConfig, resetMockBoltz, injectMockBoltzSwapEvent,
getMockBoltzSwap, waitForMockBoltzHealth, injectMockBoltzSwapEventWithTx, and
getMockBoltzSwapTyped); ensure request creation and resp handling stays the same
but calls use e2eHTTPClient.Do(req) or e2eHTTPClient.Get(url) to enforce the
timeout and fail fast on stalled connections.
vhtlc/vhtlc.go-88-105 (1)

88-105: ⚠️ Potential issue | 🟠 Major

Reject mismatched preimage hashes when rebuilding a VHTLC.

NewVhtlcScript builds preimageCondition from preimageHash, but never verifies that claimLeaf and unilateralClaimLeaf embed that same condition. That lets callers construct a VHTLCScript whose spend paths use one hash while Opts() reports another, so round-trips and downstream validation become unsound.

Suggested fix
 import (
+	"bytes"
 	"encoding/hex"
 	"fmt"
@@
 	unilateralClaimClosure, err := parseUnilateralClaimClosure(unilateralClaimLeaf)
 	if err != nil {
 		return nil, err
 	}
+	if !bytes.Equal(claimClosure.Condition, preimageCondition) {
+		return nil, fmt.Errorf("claim leaf preimage condition mismatch")
+	}
+	if !bytes.Equal(unilateralClaimClosure.Condition, preimageCondition) {
+		return nil, fmt.Errorf("unilateral claim leaf preimage condition mismatch")
+	}

Also applies to: 107-145

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vhtlc/vhtlc.go` around lines 88 - 105, NewVhtlcScript builds
preimageCondition from preimageHash (via makePreimageConditionScript) but does
not verify that claimLeaf and unilateralClaimLeaf actually embed that same
condition, allowing inconsistent scripts; update NewVhtlcScript to extract the
embedded preimage condition from claimLeaf and unilateralClaimLeaf (or parse
their script templates) and compare it to the computed preimageCondition,
returning an error if any mismatch is found; do the same verification in the
alternate construction path that handles the other branch (the similar code
later that also builds preimageCondition) so Opts() and the actual spend paths
remain consistent.
swap/boltz/ws.go-42-42 (1)

42-42: ⚠️ Potential issue | 🟠 Major

Make subscription acks non-blocking.

subscriptions is unbuffered, so if the ack arrives after the timeout path fires—or Boltz sends a duplicate ack—the reader goroutine blocks forever on the send and stops processing updates.

Suggested fix
-	subscriptions chan bool
+	subscriptions chan struct{}
@@
-		subscriptions: make(chan bool),
+		subscriptions: make(chan struct{}, 1),
@@
 				case "subscribe":
-					boltz.subscriptions <- true
+					select {
+					case boltz.subscriptions <- struct{}{}:
+					default:
+					}
 					continue

Also applies to: 67-67, 156-158, 199-204

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/boltz/ws.go` at line 42, The subscriptions channel currently declared as
"subscriptions chan bool" must be made non-blocking: change it to a buffered
channel (e.g., buffer size 1, or use chan struct{} with buffer 1) and update all
send sites that ack subscriptions (the reader goroutine and the places that send
subscription acks) to perform a non-blocking send using a select with a default
case so duplicate or late acks do not block (e.g., select { case subscriptions
<- true: default: }). Ensure receivers still read/drain the channel where
intended.
swap/chainswap_monitor.go-180-183 (1)

180-183: ⚠️ Potential issue | 🟠 Major

Don’t translate ctx.Done() into a swap failure.

Context cancellation is a local lifecycle event, not evidence that the on-chain swap failed. With the new resume flow, shutting down or restarting monitoring would incorrectly persist the swap as failed instead of leaving it resumable.

Suggested fix
 		case <-ctx.Done():
 			log.Infof("Context cancelled for swap %s monitoring", swapId)
-			chainSwapState.Swap.Fail("context cancelled")
 			return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap_monitor.go` around lines 180 - 183, The ctx.Done() case should
not mark the on-chain swap as failed; remove or stop calling
chainSwapState.Swap.Fail("context cancelled") in the select branch that handles
<-ctx.Done(). Instead, keep a clear log using swapId and chainSwapState (e.g.,
log.Infof("Context cancelled for swap %s monitoring", swapId)) and simply return
so the monitor exits without persisting a failure, leaving the swap resumable by
the resume flow.
swap/chainswap_btc_ark_handler.go-272-277 (1)

272-277: ⚠️ Potential issue | 🟠 Major

Validate the claim hash length before signing.

hex.DecodeString only guarantees valid hex. copy(msg[:], txHashBytes) silently zero-pads or truncates anything not exactly 32 bytes, so you can submit a partial signature over the wrong message.

Suggested fix
 	txHashBytes, err := hex.DecodeString(claimDetails.TransactionHash)
 	if err != nil {
 		return fmt.Errorf("decode transaction hash: %w", err)
 	}
+	if len(txHashBytes) != 32 {
+		return fmt.Errorf("transaction hash must be 32 bytes, got %d", len(txHashBytes))
+	}
 	var msg [32]byte
 	copy(msg[:], txHashBytes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap_btc_ark_handler.go` around lines 272 - 277, The code decodes
claimDetails.TransactionHash into txHashBytes and then blindly copies into msg
[32]byte, which can silently truncate or zero-pad; update the logic in the
handler (where txHashBytes, msg, and copy are used) to validate that
len(txHashBytes) == 32 after hex.DecodeString and return a clear error if not
(e.g., "invalid transaction hash length: expected 32 bytes, got X") before
performing the copy/signing step so you never sign a partial or padded message.
vhtlc/vhtlc.go-284-294 (1)

284-294: ⚠️ Potential issue | 🟠 Major

Return a copy of PreimageHash from Opts().

This currently exposes a subslice of internal state. For scripts built via NewVHTLCScriptFromOpts, mutating the returned PreimageHash also mutates the stored claim condition, so the same VHTLCScript instance can be corrupted after construction.

Suggested fix
 func (v *VHTLCScript) Opts() Opts {
+	preimageHash := append([]byte(nil), v.preimageConditionScript[2:2+hash160Len]...)
 	return Opts{
 		Sender:                               v.Sender,
 		Receiver:                             v.Receiver,
 		Server:                               v.Server,
-		PreimageHash:                         v.preimageConditionScript[2 : 2+hash160Len],
+		PreimageHash:                         preimageHash,
 		RefundLocktime:                       v.RefundWithoutReceiverClosure.Locktime,
 		UnilateralClaimDelay:                 v.UnilateralClaimClosure.Locktime,
 		UnilateralRefundDelay:                v.UnilateralRefundClosure.Locktime,
 		UnilateralRefundWithoutReceiverDelay: v.UnilateralRefundWithoutReceiverClosure.Locktime,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vhtlc/vhtlc.go` around lines 284 - 294, Opts() currently returns PreimageHash
as a subslice of v.preimageConditionScript exposing internal state; change it to
return a copy (e.g. use append([]byte(nil),
v.preimageConditionScript[2:2+hash160Len]) or allocate and copy) so modifying
the returned Opts.PreimageHash does not mutate
VHTLCScript.preimageConditionScript; update VHTLCScript.Opts() to produce that
copied byte slice.
swap/chainswap.go-920-943 (1)

920-943: ⚠️ Potential issue | 🟠 Major

Make the confirmation wait loop cancellable.

After broadcast, this loop sleeps/polls for up to two hours without checking ctx.Done(). That makes shutdown and request timeouts ineffective and can pin goroutines under load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap.go` around lines 920 - 943, The confirmation wait loop that
polls GetTransactionStatus for claimTxid (using startTime, maxWaitTime,
pollInterval) is not observing context cancellation; refactor the loop in the
method containing this code to use a select that listens on ctx.Done() as well
as a time.After/poll ticker so you can cancel immediately: call
h.explorerClient.GetTransactionStatus(ctx, ...) if that client supports context
or invoke it inside the select when ready, and when ctx is canceled return/exit
the function (or set an appropriate error) instead of continuing to sleep —
ensure confirmed handling (setting confirmed = true and breaking) remains
unchanged when a confirmation is received.
swap/chainswap.go-654-661 (1)

654-661: ⚠️ Potential issue | 🟠 Major

Emit the BTC funding address in the create event.

NewChainSwap fires CreateEvent immediately, but this call passes "" and only fills UserBtcLockupAddress afterward. Any listener persisting the create event will store an empty funding address for BTC→ARK swaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap.go` around lines 654 - 661, CreateEvent is fired inside
NewChainSwap but the BTC funding address is passed as an empty string, so
listeners persist an empty address; fix by supplying the real address
(swapResp.LockupDetails.LockupAddress) when constructing the swap instead of ""
(or move setting chainSwap.UserBtcLockupAddress to before NewChainSwap is
invoked). Update the NewChainSwap call at the call site (and its parameter list
usage) so the funding address is provided when CreateEvent is emitted.
swap/swap.go-821-827 (1)

821-827: ⚠️ Potential issue | 🟠 Major

Don't detach the collaborative refund from the swap context.

This failure path switches to context.Background() before calling RefundSwap. If Boltz or arkd stalls here, submarineSwap can block indefinitely and never reach the unilateral fallback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/swap.go` around lines 821 - 827, The RefundSwap call is using
context.Background() which detaches the collaborative refund from the swap’s
context and can block forever; change the call to pass the existing swap context
(e.g., ctx or the swap-specific context used in submarineSwap) into h.RefundSwap
so cancellation/timeout propagate and the unilateral fallback can proceed if
Boltz/arkd stalls; update the call site of h.RefundSwap (the RefundSwap
invocation within submarineSwap) to use the existing context and ensure that
context carries any deadlines/cancellation already applied to the swap.
swap/chainswap.go-47-77 (1)

47-77: 🛠️ Refactor suggestion | 🟠 Major

The exported mutable fields undermine the mutex.

Status, Error, the txids, and UserBtcLockupAddress are updated under mu, but callers can still read or write them directly without taking the lock. If ChainSwap is meant to be thread-safe, those fields need to be private behind getters and transition methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap.go` around lines 47 - 77, The exported mutable fields on
ChainSwap (Status, Error, UserBtcLockupAddress, UserLockTxid, ServerLockTxid,
ClaimTxid, RefundTxid) bypass the mutex; make these fields unexported (e.g.
status, err, userBtcLockupAddress, userLockTxid, serverLockTxid, claimTxid,
refundTxid) and provide thread-safe accessor and transition methods on ChainSwap
(GetStatus, SetStatus/TransitionToX, GetError/SetError, GetUserBtcLockupAddress,
GetUserLockTxid, etc.) that use mu (RLock/RUnlock for getters, Lock/Unlock for
setters/transitions); update all callers/tests to use the new accessors and any
swap-state transition logic to go through the ChainSwap methods so the mutex
actually protects those fields.
swap/chainswap.go-691-693 (1)

691-693: ⚠️ Potential issue | 🟠 Major

Use the provided context for collaborative refunds.

This method accepts ctx but detaches the actual RefundSwap call from it. A stuck Boltz/ARK request can't be canceled and delays the fallback path indefinitely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap.go` around lines 691 - 693, The RefundSwap call is using
context.Background(), detaching the operation from the caller's cancelation;
change the h.RefundSwap invocation in chainswap.go to pass the provided ctx
instead of context.Background() so the RefundSwap(SwapTypeChain, swapId, true,
vhtlcOpts) call can be cancelled when the outer context is cancelled, preserving
the existing arguments (swapId, SwapTypeChain, true, vhtlcOpts) and ensuring
cooperative cancellation for the Boltz/ARK request.
swap/swap.go-169-177 (1)

169-177: ⚠️ Potential issue | 🟠 Major

Don't silently drop extra VTXOs for the same script.

getVHTLCFunds can return multiple outputs, but these paths either take only vtxos[0] or sum all amounts while still passing just the first input into the batch session. That can strand funds or make the session amount no longer match its actual inputs.

Also applies to: 315-323, 1141-1164

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/swap.go` around lines 169 - 177, getVHTLCFunds can return multiple vtxos
but the code currently only uses vtxos[0] (e.g., the vtxo variable) while
elsewhere sometimes sums all amounts, which can strand funds or make batch
session amounts inconsistent; update the logic in functions referencing
getVHTLCFunds (examples around the vtxos handling at the vHTLC lookup and the
code paths at lines noted) to either (A) consume all returned vtxos by passing
every output as inputs into the batch/session creation and use the summed amount
consistently, or (B) select a single output deterministically and only use its
amount when creating the session; ensure ErrorNoVtxosFound is still returned
when len(vtxos)==0 and replace usages of vtxos[0] with a loop/collection of
inputs (or a deterministic selector) so the session inputs and session amount
always match the actual inputs.
🟡 Minor comments (3)
swap/boltz/status.go-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor

Typo in constant name: TransactionServerMempoool has an extra 'o'.

Should be TransactionServerMempool with two 'o's.

✏️ Proposed fix
-	TransactionServerMempoool
+	TransactionServerMempool

And update the map key reference on line 51:

-	"transaction.server.mempool":   TransactionServerMempoool,
+	"transaction.server.mempool":   TransactionServerMempool,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/boltz/status.go` at line 25, Rename the misspelled constant
TransactionServerMempoool to TransactionServerMempool (remove the extra 'o') and
update any usages that reference the old name—specifically update the map key
that currently uses TransactionServerMempoool to use TransactionServerMempool so
all references (constant declaration and map lookup) match the corrected
identifier.
swap/bolt12_test.go-42-49 (1)

42-49: ⚠️ Potential issue | 🟡 Minor

Arguments to require.Equal are in reversed order.

Same issue as above - the expected value should come before the actual value.

🐛 Proposed fix for correct argument order
 	decoded_invoice, err := swap.DecodeBolt12Invoice(invoice)
 	require.NoError(t, err)
-	require.Equal(t, decoded_invoice.AmountInSats, uint64(5000))
+	require.Equal(t, uint64(5000), decoded_invoice.AmountInSats)
 	require.Equal(
 		t,
-		hex.EncodeToString(decoded_invoice.PaymentHash160),
 		"c4f9f68a9a92d115970386c1e4bd7308941278fc",
+		hex.EncodeToString(decoded_invoice.PaymentHash160),
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/bolt12_test.go` around lines 42 - 49, The require.Equal assertions have
the expected and actual arguments reversed; update the two require.Equal calls
so the expected value comes first and the actual value second (i.e., call
require.Equal(t, uint64(5000), decoded_invoice.AmountInSats) and
require.Equal(t, "c4f9f68a9a92d115970386c1e4bd7308941278fc",
hex.EncodeToString(decoded_invoice.PaymentHash160))) to correct the order for
decoded_invoice.AmountInSats and
hex.EncodeToString(decoded_invoice.PaymentHash160).
swap/bolt12_test.go-27-35 (1)

27-35: ⚠️ Potential issue | 🟡 Minor

Arguments to require.Equal are in reversed order.

The testify convention is require.Equal(t, expected, actual), but the code passes actual first. This causes confusing failure messages when tests fail (showing "expected X but got Y" with X and Y swapped).

🐛 Proposed fix for correct argument order
 	decoded_offer, err := swap.DecodeBolt12Offer(offer)
 	require.NoError(t, err)
-	require.Equal(t, decoded_offer.AmountInSats, uint64(5000))
-	require.Equal(t, decoded_offer.DescriptionStr, "fun")
+	require.Equal(t, uint64(5000), decoded_offer.AmountInSats)
+	require.Equal(t, "fun", decoded_offer.DescriptionStr)
 	require.Equal(
 		t,
-		decoded_offer.ID,
 		"4435a4b44691f3e2164dae39814ba0ed13b5557ed1e11bc72f7f2e80e336d4c3",
+		decoded_offer.ID,
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/bolt12_test.go` around lines 27 - 35, The require.Equal calls in the
test using decoded_offer are passing actual before expected; change them to the
testify convention require.Equal(t, expected, actual). Specifically update the
assertions that reference decoded_offer.AmountInSats,
decoded_offer.DescriptionStr, and decoded_offer.ID in the Test that calls
swap.DecodeBolt12Offer(offer) so each call becomes require.Equal(t, <expected
literal>, decoded_offer.<field>) (leave require.NoError as-is).
🧹 Nitpick comments (7)
test/infra/mockboltz/server.go (1)

1151-1195: HTTP client lacks timeout in fetchArkSignerPubKey.

The http.Get call uses the default client with no timeout. While the retry loop limits total attempts, each individual request could hang indefinitely.

♻️ Add HTTP client timeout
 func fetchArkSignerPubKey(arkdURL string) (*btcec.PublicKey, error) {
 	if strings.TrimSpace(arkdURL) == "" {
 		return nil, fmt.Errorf("MOCK_BOLTZ_ARKD_URL is required")
 	}
 	url := strings.TrimRight(strings.TrimSpace(arkdURL), "/") + "/v1/info"
+	client := &http.Client{Timeout: 5 * time.Second}
 	var lastErr error
 	for i := 0; i < 30; i++ {
-		resp, err := http.Get(url)
+		resp, err := client.Get(url)
 		if err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/server.go` around lines 1151 - 1195,
fetchArkSignerPubKey currently calls http.Get (default client) which can hang;
replace the http.Get usage with an explicit http.Client that has a reasonable
Timeout (e.g., a few seconds) and use client.Get(url) inside the retry loop,
preserving existing retry logic and error handling (keep lastErr, url, payload
parsing, and btcec.ParsePubKey behavior) so each attempt times out instead of
potentially blocking indefinitely.
test/infra/setup_infra.sh (1)

42-53: Unused variable out in run_quiet function.

The out variable is assigned on success but never read anywhere in the script. Shellcheck correctly flags this (SC2034).

♻️ Remove unused variable or use it

If stdout capture isn't needed:

 run_quiet() {
     local output
     output=$("$@" 2>&1)
     status=$?
     if [ $status -eq 0 ]; then
-        out="$output"
         return 0
     else
         err="$output"
         return $status
     fi
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/setup_infra.sh` around lines 42 - 53, The run_quiet function
assigns the captured stdout to a variable named out that is never used; remove
the unused variable to satisfy Shellcheck SC2034 or, if the caller needs the
output, expose it (e.g., set a named global or echo the output) instead of
assigning to out. Specifically, in run_quiet keep capturing output into local
output and status, then on success either echo "$output" (or assign it to a
caller-visible variable) and return 0, or simply drop the assignment to out and
only set err on failure so err and status remain meaningful. Ensure you update
uses of run_quiet accordingly if you choose to return/echo the stdout.
swap/boltz/status.go (1)

58-60: ParseEvent returns zero value for unknown events, which equals SwapCreated.

When an unrecognized event string is passed, the map lookup returns the zero value 0, which is indistinguishable from SwapCreated. Consider returning a sentinel value or using the comma-ok idiom.

♻️ Proposed fix with explicit unknown handling
+const (
+	SwapUnknown SwapUpdateEvent = iota - 1
+	SwapCreated
+	// ... rest unchanged
+)
+
 func ParseEvent(event string) SwapUpdateEvent {
-	return swapUpdateEventStrings[event]
+	if e, ok := swapUpdateEventStrings[event]; ok {
+		return e
+	}
+	return SwapUnknown
 }

Alternatively, add a second return value:

func ParseEvent(event string) (SwapUpdateEvent, bool) {
	e, ok := swapUpdateEventStrings[event]
	return e, ok
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/boltz/status.go` around lines 58 - 60, ParseEvent currently returns
swapUpdateEventStrings[event] which yields the zero value on unknown keys (same
as SwapCreated), causing ambiguity; update handling in ParseEvent by using the
comma-ok idiom to check the map (e, ok := swapUpdateEventStrings[event]) and
return a second boolean (func ParseEvent(event string) (SwapUpdateEvent, bool))
or alternatively define an explicit sentinel like SwapUpdateEventUnknown and
return that when ok is false; ensure you update callers to handle the new return
or sentinel.
test/infra/fulmine.Dockerfile (1)

17-28: Consider running the container as a non-root user.

The container runs as root by default, which is flagged by security scanners. While this is a test infrastructure container, adding a non-root user is a good security practice.

🔒 Proposed fix to add non-root user
 # Final image
 FROM alpine:3.20

+RUN addgroup -S fulmine && adduser -S fulmine -G fulmine
+
 WORKDIR /app

 COPY --from=go-builder /app/bin/* /app/

 ENV PATH="/app:${PATH}"
 ENV FULMINE_DATADIR=/app/data

+RUN chown -R fulmine:fulmine /app
 VOLUME /app/data

+USER fulmine
+
 ENTRYPOINT [ "fulmine" ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/fulmine.Dockerfile` around lines 17 - 28, Add a non-root user and
switch to it before ENTRYPOINT: create a user/group (e.g., fulmine:fulmine),
create and chown the data directory and /app to that user (referencing WORKDIR
/app and ENV FULMINE_DATADIR), and set USER to that non-root user prior to
ENTRYPOINT ["fulmine"]; ensure permissions allow the process to write to
$FULMINE_DATADIR and that binaries in /app are executable by the new user.
swap/bolt12_test.go (1)

11-16: Consider using idiomatic Go naming conventions.

Go convention prefers camelCase for variable names. invalid_offer and invalid_invoice would be more idiomatic as invalidOffer and invalidInvoice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/bolt12_test.go` around lines 11 - 16, Rename the snake_case test
variables to idiomatic camelCase: change invalid_offer to invalidOffer and
invalid_invoice to invalidInvoice, then update every reference to these symbols
(invalid_offer, invalid_invoice) throughout the test file (e.g., in assertions
and helper calls) so the code compiles and follows Go naming conventions; leave
other variables (offer, invoice) as-is unless you want to similarly CamelCase
them.
swap/boltz/utils.go (1)

13-30: Context cancellation not detected during sleep interval.

The current implementation doesn't respond to context cancellation while waiting on time.After(interval). If the context is cancelled mid-sleep, the function won't exit until the full interval elapses and the next iteration begins.

♻️ Proposed fix to make sleep context-aware
 	for {
 		select {
 		case <-ctx.Done():
 			if errors.Is(ctx.Err(), context.DeadlineExceeded) {
 				return fmt.Errorf("timed out")
 			}
 			return ctx.Err()
 		default:
 			done, err := fn(ctx)
 			if err != nil {
 				return err
 			}
 			if done {
 				return nil
 			}
-			<-time.After(interval)
+			select {
+			case <-ctx.Done():
+				if errors.Is(ctx.Err(), context.DeadlineExceeded) {
+					return fmt.Errorf("timed out")
+				}
+				return ctx.Err()
+			case <-time.After(interval):
+			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/boltz/utils.go` around lines 13 - 30, The loop that repeatedly calls
fn(ctx) doesn't observe context cancellation during the sleep because it uses
<-time.After(interval); update the wait so it is context-aware by replacing the
plain sleep with a select that waits on either ctx.Done() or
time.After(interval) and returns ctx.Err() (or wraps a deadline message) if
context is canceled; locate the polling loop where fn(ctx) is invoked and the
current <-time.After(interval) is used and change that single wait to a select
on ctx.Done() and time.After(interval) so cancellation is handled immediately.
test/infra/docker-compose.yml (1)

268-270: Pin the Boltz image to a release tag or digest.

Using boltz/boltz:latest makes the happy-path stack non-reproducible, so upstream image changes can break CI/regtest without any repo change. Please pin this to a known-good tag or digest.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/docker-compose.yml` around lines 268 - 270, The docker-compose
service "boltz" is using the floating image tag "boltz/boltz:latest" which makes
tests non-reproducible; update the "boltz" service in docker-compose.yml to pin
the image to a known-good release tag or an immutable digest (e.g., replace
"boltz/boltz:latest" with a specific tag like "boltz/boltz:vX.Y.Z" or
"boltz/boltz@sha256:..."), and include a brief comment noting the chosen
known-good version so CI/regtest remains reproducible and easy to update in
future.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 877aa83a-8ebe-4292-ac35-56457d3b432e

📥 Commits

Reviewing files that changed from the base of the PR and between 84eccba and 592206c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (46)
  • .github/workflows/integration.yaml
  • Makefile
  • go.mod
  • swap/batch_handler.go
  • swap/bolt12.go
  • swap/bolt12_test.go
  • swap/boltz/boltz.go
  • swap/boltz/status.go
  • swap/boltz/types.go
  • swap/boltz/utils.go
  • swap/boltz/ws.go
  • swap/btc_util.go
  • swap/chainswap.go
  • swap/chainswap_arkt_btc_handler.go
  • swap/chainswap_btc_ark_handler.go
  • swap/chainswap_monitor.go
  • swap/chainswap_resume.go
  • swap/chainswap_validate.go
  • swap/explorer.go
  • swap/musig.go
  • swap/musig2_handler.go
  • swap/swap.go
  • swap/utils.go
  • test/docker/docker-compose.yml
  • test/docker/setup.go
  • test/e2e/asset_test.go
  • test/e2e/batch_session_test.go
  • test/e2e/exit_test.go
  • test/e2e/history_test.go
  • test/e2e/swap_test.go
  • test/e2e/transaction_test.go
  • test/e2e/utils_test.go
  • test/infra/arkd.Dockerfile
  • test/infra/balances.sh
  • test/infra/cln.Dockerfile
  • test/infra/docker-compose.yml
  • test/infra/fulmine.Dockerfile
  • test/infra/mockboltz/Dockerfile
  • test/infra/mockboltz/server.go
  • test/infra/setup_infra.sh
  • test/infra/wallet.Dockerfile
  • vhtlc/opts.go
  • vhtlc/utils.go
  • vhtlc/vhtlc.go
  • vhtlc/vhtlc_fixtures.json
  • vhtlc/vhtlc_test.go
💤 Files with no reviewable changes (2)
  • test/docker/docker-compose.yml
  • test/docker/setup.go

@arkanaai

arkanaai Bot commented Mar 20, 2026

🔍 Arkana PR Review — arkade-os/go-sdk#113 (follow-up)

PR: Boltz Swap/VHTLC packages added
Author: sekulicd | New commits: 4 (addressing previous review)


Changes Since Last Review

All four new commits are labeled "pr review" — addressing feedback from the previous review round. The changes are solid:

✅ WebSocket Thread Safety (boltz/ws.go)

Added sync.RWMutex to protect conn, closed, reconnect, and swapIds fields. All read and write paths now properly acquire locks. The subscribe() method also nil-checks conn before writing. This was the most important fix — the previous version had data races on these fields between the read goroutine, ping goroutine, and external callers.

✅ Nil Callback Guards (chainswap.go, handlers)

  • NewChainSwap() now wraps the initial eventCallback(CreateEvent{...}) in a nil check — prevents panic if no callback is provided.
  • RefundArkToBTCSwap nil-checks unilateralRefundCallback before calling — prevents panic on the refund-without-callback path.
  • Both handler files (chainswap_arkt_btc_handler.go, chainswap_btc_ark_handler.go) have similar defensive guards. Good.

✅ Fee Estimation Fix (btc_util.go)

  • Added cooperativeClaimWitnessVBytes (17) and refundClaimWitnessVBytes (56) constants.
  • New computeClaimVSize() accounts for both witness paths — prevents fee underestimation that could cause stuck transactions. The unsigned tx skeleton has no witness, so adding the expected witness vbytes is the right approach.
  • Fixed the integer underflow: now checks feeAmount >= params.LockupAmount before subtracting, preventing a uint64 wraparound that would have created a massively overpaying output. This was a real bug in the previous version — good catch.

✅ Bolt12 Cleanup (bolt12.go)

Minor: reduced 21 lines of redundant code to 5. No behavioral change.


Remaining Minor Observations

  1. WS reconnection still lacks backoff: The reconnection loop in Connect() uses a fixed reconnectInterval (15s). Not a blocker, but exponential backoff would be more resilient against prolonged Boltz outages. Can be a follow-up.

  2. time.Sleep(5s) in handleBtcToArkServerLocked: Still present (line ~180 in chainswap_btc_ark_handler.go). This is the delay before cooperative signing for Boltz. It's non-blocking for the user (claim already completed), but a TODO to replace with event-driven timing would be cleaner.


Verdict

All critical issues from the previous review have been addressed. The WebSocket race conditions are fixed, the fee underflow bug is resolved, and nil-callback panics are guarded. LGTM — ready for human approval.

@arkanaai

arkanaai Bot commented Mar 20, 2026

Review (iterative — 2 new commits since last review)

Cleanup/polish round. Changes:

swap/boltz/ws.go — RLock/RUnlock reordering: reads boltz.closed under lock and unlocks before each return path. Functionally equivalent to the previous pattern (both paths returned anyway), but now avoids the temp variable. Fine as-is — both unlock paths are covered.

test/infra/mockboltz/server.go — pushSwapUpdate error now logged instead of silently dropped. Good catch. Import ordering fixed (goimports style).

test/e2e/swap_test.go — if/else→switch/case refactor. Cleaner.

test/e2e/utils_test.go — Removed unused waitForMockBoltzHealth. Dead code cleanup. 👍

Dockerfiles — Go 1.26.0 → 1.26.1 across arkd, fulmine, mockboltz, wallet Dockerfiles. Consistent toolchain bump.

Formatting — Line wrapping in chainswap_arkt_btc_handler.go, chainswap_btc_ark_handler.go, swap.go. No logic changes.

No concerns with this round.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@swap/boltz/ws.go`:
- Around line 43-49: The subscriptions channel is currently unbuffered (field
subscriptions chan bool) causing the reader goroutine to block if an ack arrives
when no goroutine is waiting; change subscriptions to a buffered channel (e.g.,
make(chan bool, 1)) and update all sends in the reader (the code path that
handles the "subscribe" ack) to perform non-blocking sends using a select with a
default case so an unexpected/delayed ack is dropped rather than blocking; apply
the same non-blocking send pattern wherever subscriptions is written (including
the subscribe()/ack handling and any reconnect code paths referenced around the
reader and reconnect logic) so reader/reconnect goroutines cannot deadlock on a
stalled receiver.

In `@swap/btc_util.go`:
- Around line 177-191: The control block builder must preserve the original leaf
versions from the swap tree instead of always writing BaseLeafVersion: update
createControlBlockFromSwapTree to use tree.ClaimLeaf.Version and
tree.RefundLeaf.Version when constructing the tap leaves (instead of calling
NewBaseTapLeaf/unconditionally inserting BaseLeafVersion), so the version bytes
written into the control block match the versions used by
computeSwapTreeMerkleRoot (which hashes tree.ClaimLeaf.Version and
tree.RefundLeaf.Version via tapLeafHash); ensure any helper used to build the
leaf or control block accepts/propagates the explicit Version value so the
control block and merkle root remain consistent.

In `@swap/chainswap_arkt_btc_handler.go`:
- Around line 596-604: The findLockupOutput function currently casts
wire.TxOut.Value to uint64 without validating it; add a guard in
findLockupOutput that checks each TxOut.Value from lockupTx.TxOut and rejects
non-positive values (Value <= 0) by returning an error (include txid/vout
context) before creating lockupTxOutput so you never cast negative or zero
values to uint64; update any callers that rely on the error to handle the
failure path if necessary.

In `@swap/chainswap_btc_ark_handler.go`:
- Around line 285-290: Check that claimDetails.TransactionHash decodes to
exactly 32 bytes before copying into msg: after hex.DecodeString into
txHashBytes, validate len(txHashBytes) == 32 and return a clear error (including
the invalid length and the original TransactionHash) if not; keep the fixed-size
buffer declaration var msg [32]byte and only perform copy(msg[:], txHashBytes)
once the length check passes. This change affects the transaction hash handling
around txHashBytes, claimDetails.TransactionHash and the msg [32]byte buffer
used for signing (from GetChainSwapClaimDetails flow).

In `@swap/swap.go`:
- Around line 169-177: ClaimVHTLC and RefundSwap currently only use the first
vtxo (vtxos[0]) returned by getVHTLCFunds while getBatchSessionArgs sums all
matching VTXOs into totalAmount, causing leftover funds and batch IO imbalance;
modify ClaimVHTLC, RefundSwap, and getBatchSessionArgs to iterate over the full
vtxos slice (not just vtxos[0]), aggregate their amounts into totalAmount, and
build transaction inputs/intents for every vtxo returned by getVHTLCFunds so all
matching outputs are consumed (update any use of the single variable vtxo to use
the full vtxos collection and ensure downstream functions that expect a single
input accept a list or are adjusted to process multiple inputs).
- Around line 774-777: The Boltz websocket created in submarineSwap via
h.boltzSvc.NewWebsocket() and ws.ConnectAndSubscribe(...) is never closed;
ensure you call ws.Close() so the reader/ping goroutines and remote subscription
are torn down: after a successful ConnectAndSubscribe, immediately defer
ws.Close() inside submarineSwap so it runs on every return, and if
ConnectAndSubscribe returns an error call ws.Close() before returning the error
(or ensure Close is invoked in the error path); reference NewWebsocket(),
ws.ConnectAndSubscribe(...), ws.Close(), and the submarineSwap function when
making the change.
- Around line 69-88: NewSwapHandler currently dereferences potentially nil
inputs and results (calling arkClient.GetConfigData, using *cfg and
privateKey.PubKey) which can panic; update NewSwapHandler to validate inputs
before dereferencing: return an error if arkClient or privateKey is nil, call
arkClient.GetConfigData only after checking arkClient and handle a nil/invalid
cfg result instead of dereferencing *cfg (or copy cfg safely), and call
privateKey.PubKey() only after confirming privateKey is non-nil; ensure all
failures return descriptive errors from NewSwapHandler (e.g., "nil arkClient",
"failed to get config data: %w", "nil privateKey") rather than allowing a panic.

In `@swap/utils.go`:
- Around line 244-256: In decodeInvoice, detect and reject invoices where
bolt11.MSatoshi is not an exact multiple of 1000 (i.e. bolt11.MSatoshi % 1000 !=
0) so non-whole-satoshi invoices fail instead of being floored; update
decodeInvoice to return an error when that condition is met (preserving the
existing return types) before computing amount := uint64(bolt11.MSatoshi / 1000)
and the preimage hash (this prevents reverseSwap from receiving an incorrect sat
amount).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: da599945-3720-4dc8-957f-4b8dbdb6468d

📥 Commits

Reviewing files that changed from the base of the PR and between 592206c and d6e4278.

📒 Files selected for processing (10)
  • .github/workflows/integration.yaml
  • .github/workflows/unit.yaml
  • swap/bolt12.go
  • swap/boltz/ws.go
  • swap/btc_util.go
  • swap/chainswap.go
  • swap/chainswap_arkt_btc_handler.go
  • swap/chainswap_btc_ark_handler.go
  • swap/swap.go
  • swap/utils.go
✅ Files skipped from review due to trivial changes (2)
  • .github/workflows/unit.yaml
  • .github/workflows/integration.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • swap/bolt12.go
  • swap/chainswap.go

Comment thread swap/boltz/ws.go
Comment on lines +43 to +49
subscriptions chan bool
mu sync.RWMutex
conn *websocket.Conn
closed bool
reconnect bool
dialer *websocket.Dialer
swapIds []string

⚠️ Potential issue | 🟠 Major

Make subscription acknowledgements non-blocking.

subscriptions is an unbuffered shared channel. If the "subscribe" ack arrives after the 5-second timeout—or when no subscribe() call is currently waiting—the send in the reader goroutine blocks forever, and no more updates or reconnect handling can run.

Suggested fix
 type Websocket struct {
 	Updates chan SwapUpdate
 
 	apiUrl        string
-	subscriptions chan bool
+	subscriptions chan struct{}
 	mu            sync.RWMutex
 	conn          *websocket.Conn
@@
 	return &Websocket{
 		apiUrl:        boltz.WSURL,
-		subscriptions: make(chan bool),
+		subscriptions: make(chan struct{}, 1),
 		dialer:        &dialer,
 		Updates:       make(chan SwapUpdate),
 	}
 }
@@
 				case "subscribe":
-					boltz.subscriptions <- true
+					select {
+					case boltz.subscriptions <- struct{}{}:
+					default:
+					}
 					continue

Also applies to: 166-168, 226-230


Comment thread swap/btc_util.go
Comment on lines +177 to +191
func computeSwapTreeMerkleRoot(tree boltz.SwapTree) ([]byte, error) {
claimScript, err := hex.DecodeString(tree.ClaimLeaf.Output)
if err != nil {
return nil, fmt.Errorf("decode claim leaf script: %w", err)
}
refundScript, err := hex.DecodeString(tree.RefundLeaf.Output)
if err != nil {
return nil, fmt.Errorf("decode refund leaf script: %w", err)
}

claimLeafHash := tapLeafHash(tree.ClaimLeaf.Version, claimScript)
refundLeafHash := tapLeafHash(tree.RefundLeaf.Version, refundScript)

h := computeMerkleRoot(claimLeafHash[:], refundLeafHash[:])
return h[:], nil

⚠️ Potential issue | 🟠 Major

Preserve the swap-tree leaf versions when building the control block.

computeSwapTreeMerkleRoot hashes swapTree.ClaimLeaf.Version and swapTree.RefundLeaf.Version, but createControlBlockFromSwapTree recreates both leaves with NewBaseTapLeaf and always writes BaseLeafVersion into the control block. If Boltz ever returns a non-base leaf version, the control block no longer matches the merkle root you tweaked against, and the script-path claim will fail.

Also applies to: 230-253

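
The review's point follows directly from BIP-341: the leaf version byte is the first input to the TapLeaf tagged hash, so a control block that always writes BaseLeafVersion while the merkle root was computed over a different version commits to a different tree. A stdlib-only sketch of that dependency (this is not the package's own tapLeafHash; single-byte script lengths are assumed):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// taggedHash is the BIP-340 tagged hash:
// SHA256(SHA256(tag) || SHA256(tag) || data).
func taggedHash(tag string, data []byte) [32]byte {
	tagHash := sha256.Sum256([]byte(tag))
	h := sha256.New()
	h.Write(tagHash[:])
	h.Write(tagHash[:])
	h.Write(data)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// leafHash hashes (version || compact_size(script) || script) with the
// "TapLeaf" tag; scripts under 253 bytes keep the compact size to one byte.
func leafHash(version byte, script []byte) [32]byte {
	buf := append([]byte{version, byte(len(script))}, script...)
	return taggedHash("TapLeaf", buf)
}

func main() {
	script := []byte{0x51} // OP_TRUE placeholder
	// Changing only the version byte changes the leaf hash, hence the root.
	fmt.Println(leafHash(0xc0, script) != leafHash(0xc2, script)) // true
}
```

Because the version participates in the hash, the control block must carry the exact version used when the root was computed, or script-path spends fail validation.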

Comment on lines +596 to +604
func findLockupOutput(lockupTx *wire.MsgTx, expectedPkScript []byte) (*lockupTxOutput, error) {
for vout, out := range lockupTx.TxOut {
if bytes.Equal(out.PkScript, expectedPkScript) {
return &lockupTxOutput{
txid: lockupTx.TxHash().String(),
vout: uint32(vout),
amount: uint64(out.Value),
pkScript: out.PkScript,
}, nil

⚠️ Potential issue | 🟠 Major

Reject non-positive lockup outputs before casting to uint64.

wire.TxOut.Value is signed. If a malformed lockup transaction contains Value <= 0, the cast to uint64 wraps negatives into a huge amount and the claim builder works with nonsense funds instead of failing cleanly.

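
The shape of the requested guard, with simplified stand-in types (txOut here mimics wire.TxOut's signed Value field; it is not the btcd type):

```go
package main

import (
	"bytes"
	"fmt"
)

// txOut is a simplified stand-in for wire.TxOut: Value is signed in btcd.
type txOut struct {
	Value    int64
	PkScript []byte
}

// findLockupAmount rejects non-positive values before the uint64 cast,
// so a malformed transaction fails cleanly instead of a negative value
// wrapping into a huge amount.
func findLockupAmount(outs []txOut, expectedPkScript []byte) (uint64, error) {
	for vout, out := range outs {
		if !bytes.Equal(out.PkScript, expectedPkScript) {
			continue
		}
		if out.Value <= 0 {
			return 0, fmt.Errorf(
				"lockup output %d has non-positive value %d", vout, out.Value,
			)
		}
		return uint64(out.Value), nil
	}
	return 0, fmt.Errorf("no output matches the expected script")
}

func main() {
	script := []byte{0x51}
	amt, err := findLockupAmount([]txOut{{Value: 1000, PkScript: script}}, script)
	fmt.Println(amt, err) // 1000 <nil>
	_, err = findLockupAmount([]txOut{{Value: -1, PkScript: script}}, script)
	fmt.Println(err != nil) // true
}
```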

Comment on lines +285 to +290
txHashBytes, err := hex.DecodeString(claimDetails.TransactionHash)
if err != nil {
return fmt.Errorf("decode transaction hash: %w", err)
}
var msg [32]byte
copy(msg[:], txHashBytes)

⚠️ Potential issue | 🟠 Major

Validate TransactionHash length before copying it into the signing buffer.

copy(msg[:], txHashBytes) silently zero-pads short hashes and truncates long ones. A malformed GetChainSwapClaimDetails response would still produce a partial signature, just over the wrong message.


Comment thread swap/swap.go
Comment on lines +69 to +88
func NewSwapHandler(
arkClient arksdk.ArkClient,
boltzSvc *boltz.Api,
esploraURL string,
privateKey *btcec.PrivateKey,
timeout uint32,
) (*SwapHandler, error) {
cfg, err := arkClient.GetConfigData(context.Background())
if err != nil {
return nil, fmt.Errorf("failed to get config data: %w", err)
}
return &SwapHandler{
arkClient: arkClient,
boltzSvc: boltzSvc,
explorerClient: NewExplorerClient(esploraURL),
privateKey: privateKey,
publicKey: privateKey.PubKey(),
timeout: timeout,
config: *cfg,
}, nil

⚠️ Potential issue | 🔴 Critical

Validate constructor dependencies before dereferencing them.

arkClient.GetConfigData(...), privateKey.PubKey(), and *cfg can all panic here. Since NewSwapHandler already returns an error, bad inputs should fail fast with a regular error instead of crashing the process.

Suggested fix
 func NewSwapHandler(
 	arkClient arksdk.ArkClient,
 	boltzSvc *boltz.Api,
 	esploraURL string,
 	privateKey *btcec.PrivateKey,
 	timeout uint32,
 ) (*SwapHandler, error) {
+	if arkClient == nil {
+		return nil, errors.New("nil ark client")
+	}
+	if boltzSvc == nil {
+		return nil, errors.New("nil boltz service")
+	}
+	if privateKey == nil {
+		return nil, errors.New("nil private key")
+	}
+
 	cfg, err := arkClient.GetConfigData(context.Background())
 	if err != nil {
 		return nil, fmt.Errorf("failed to get config data: %w", err)
 	}
+	if cfg == nil {
+		return nil, errors.New("nil config data")
+	}
 	return &SwapHandler{
 		arkClient:      arkClient,
 		boltzSvc:       boltzSvc,

Comment thread swap/swap.go
Comment on lines +169 to +177
vtxos, err := h.getVHTLCFunds(ctx, []*vhtlc.VHTLCScript{vHTLC})
if err != nil {
return "", err
}
if len(vtxos) == 0 {
return "", ErrorNoVtxosFound
}

vtxo := &vtxos[0]

⚠️ Potential issue | 🟠 Major

Multi-VTXO swaps are only partially handled.

ClaimVHTLC and RefundSwap pick vtxos[0], while getBatchSessionArgs sums every matching VTXO into totalAmount but still only forwards vtxos[0]. If a swap script accumulates more than one VTXO, the direct paths leave funds behind and the batch-session intents can become input/output imbalanced.

Also applies to: 315-323, 1158-1188

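
The shape of the fix is an aggregation loop over the full slice rather than a single element. A stand-in sketch (vtxo here is a simplified placeholder for the SDK type, keeping only the amount):

```go
package main

import "fmt"

// vtxo is a simplified placeholder for the SDK's VTXO type.
type vtxo struct{ Amount uint64 }

// collectInputs builds one input index per VTXO and sums the total, so
// every matching output is consumed instead of only vtxos[0].
func collectInputs(vtxos []vtxo) (uint64, []int) {
	var total uint64
	inputs := make([]int, 0, len(vtxos))
	for i, v := range vtxos {
		total += v.Amount
		inputs = append(inputs, i)
	}
	return total, inputs
}

func main() {
	total, inputs := collectInputs([]vtxo{{500}, {300}, {200}})
	fmt.Println(total, len(inputs)) // 1000 3
}
```

With this shape, the total forwarded to the batch session always matches the inputs actually spent, which is the imbalance the review flags.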

Comment thread swap/swap.go
Comment on lines +774 to +777
ws := h.boltzSvc.NewWebsocket()
if err := ws.ConnectAndSubscribe(ctx, []string{swap.Id}, 5*time.Second); err != nil {
return nil, err
}

⚠️ Potential issue | 🟠 Major

Always close the Boltz websocket opened for submarineSwap.

This function can return from several later success/failure branches, but the websocket created here is never closed. That leaves the reader/ping goroutines and the remote subscription alive after the swap finishes.


Comment thread swap/utils.go
Comment on lines +244 to +256
func decodeInvoice(invoice string) (uint64, []byte, error) {
bolt11, err := decodepay.Decodepay(invoice)
if err != nil {
return 0, nil, err
}

amount := uint64(bolt11.MSatoshi / 1000)
preimageHash, err := hex.DecodeString(bolt11.PaymentHash)
if err != nil {
return 0, nil, err
}

return amount, input.Ripemd160H(preimageHash), nil

⚠️ Potential issue | 🟠 Major

Reject invoices that are not a whole number of sats.

bolt11.MSatoshi / 1000 floors any leftover millisats. That means a non-sat invoice can be treated as the wrong sat amount in reverseSwap instead of failing cleanly.

Suggested fix
 func decodeInvoice(invoice string) (uint64, []byte, error) {
 	bolt11, err := decodepay.Decodepay(invoice)
 	if err != nil {
 		return 0, nil, err
 	}
 
+	if bolt11.MSatoshi%1000 != 0 {
+		return 0, nil, fmt.Errorf("invoice amount %d msat is not a whole sat", bolt11.MSatoshi)
+	}
 	amount := uint64(bolt11.MSatoshi / 1000)
 	preimageHash, err := hex.DecodeString(bolt11.PaymentHash)
 	if err != nil {
 		return 0, nil, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func decodeInvoice(invoice string) (uint64, []byte, error) {
 	bolt11, err := decodepay.Decodepay(invoice)
 	if err != nil {
 		return 0, nil, err
 	}
 
+	if bolt11.MSatoshi%1000 != 0 {
+		return 0, nil, fmt.Errorf("invoice amount %d msat is not a whole sat", bolt11.MSatoshi)
+	}
 	amount := uint64(bolt11.MSatoshi / 1000)
 	preimageHash, err := hex.DecodeString(bolt11.PaymentHash)
 	if err != nil {
 		return 0, nil, err
 	}
 
 	return amount, input.Ripemd160H(preimageHash), nil
 }


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

♻️ Duplicate comments (2)
swap/chainswap_btc_ark_handler.go (1)

286-291: ⚠️ Potential issue | 🟠 Major

Reject non-32-byte claim hashes before signing.

copy(msg[:], txHashBytes) silently zero-pads or truncates malformed TransactionHash values, so this path can produce a partial signature over the wrong message.

Suggested fix
 	txHashBytes, err := hex.DecodeString(claimDetails.TransactionHash)
 	if err != nil {
 		return fmt.Errorf("decode transaction hash: %w", err)
 	}
+	if len(txHashBytes) != 32 {
+		return fmt.Errorf(
+			"invalid transaction hash length %d for %q",
+			len(txHashBytes),
+			claimDetails.TransactionHash,
+		)
+	}
 	var msg [32]byte
 	copy(msg[:], txHashBytes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/chainswap_btc_ark_handler.go` around lines 286 - 291, The code currently
decodes claimDetails.TransactionHash into txHashBytes and blindly does
copy(msg[:], txHashBytes) which will silently zero-pad or truncate non-32-byte
inputs; instead validate the decoded length first and return an error if
len(txHashBytes) != 32 before constructing msg and signing. Specifically, in the
block that handles txHashBytes (use the symbols claimDetails.TransactionHash,
txHashBytes, msg [32]byte and the copy call), add a length check and fail fast
with a clear error when the decoded hash is not exactly 32 bytes so you never
sign a partially padded or truncated message.
swap/boltz/ws.go (1)

42-43: ⚠️ Potential issue | 🟠 Major

Buffer subscription acks so the reader can’t deadlock.

subscriptions is still unbuffered. If a "subscribe" ack arrives after the 5-second wait has already timed out—or when no subscribe() call is currently waiting—the reader goroutine blocks forever on the send, and no more updates or reconnect handling can run.

Suggested fix
 type Websocket struct {
 	Updates chan SwapUpdate
 
 	apiUrl        string
-	subscriptions chan bool
+	subscriptions chan struct{}
 	mu            sync.RWMutex
 	conn          *websocket.Conn
 	closed        bool
@@
 	return &Websocket{
 		apiUrl:        boltz.WSURL,
-		subscriptions: make(chan bool),
+		subscriptions: make(chan struct{}, 1),
 		dialer:        &dialer,
 		Updates:       make(chan SwapUpdate),
 	}
 }
@@
 				case "subscribe":
-					boltz.subscriptions <- true
+					select {
+					case boltz.subscriptions <- struct{}{}:
+					default:
+					}
 					continue

Also applies to: 67-70, 166-168, 226-230

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/boltz/ws.go` around lines 42 - 43, The subscriptions channel is
unbuffered which lets the reader goroutine block forever when it tries to send
an ack if no subscribe() caller is waiting; change the channel to a buffered
channel (e.g., capacity 1) where it is declared/initialized (the subscriptions
variable and any make(chan bool) instances referenced in the file) so the reader
can send an ack without blocking, and apply the same buffer change to the other
similar channels found in the file (the other channel
declarations/initializations mentioned around the other ranges) to prevent
deadlocks in the reader goroutine.
🧹 Nitpick comments (1)
test/infra/fulmine.Dockerfile (1)

13-14: Consider additional build optimizations for smaller, more secure images.

The build configuration is solid, but you could consider these optional enhancements:

  • Add -trimpath to remove local path information from the binary
  • Add -s -w to strip debug symbols and reduce binary size
  • Add LABEL directives for image metadata (version, commit, build date)
  • Add HEALTHCHECK for better container orchestration
♻️ Optional build and metadata improvements
 # Build without web assets (not needed for boltz-fulmine sidecar)
 RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} \
-    go build -ldflags="-X 'main.version=${VERSION}'" -o bin/fulmine cmd/fulmine/main.go
+    go build -trimpath -ldflags="-s -w -X 'main.version=${VERSION}'" -o bin/fulmine cmd/fulmine/main.go

 # Final image
 FROM alpine:3.20

+LABEL org.opencontainers.image.version="${VERSION}" \
+      org.opencontainers.image.source="https://github.com/ArkLabsHQ/fulmine" \
+      org.opencontainers.image.title="Boltz Fulmine Sidecar"
+
 WORKDIR /app

 COPY --from=go-builder /app/bin/* /app/

 ENV PATH="/app:${PATH}"
 ENV FULMINE_DATADIR=/app/data

 VOLUME /app/data

+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD fulmine healthcheck || exit 1
+
 ENTRYPOINT [ "fulmine" ]

Note: Verify that the fulmine binary supports a healthcheck subcommand before adding the HEALTHCHECK directive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/fulmine.Dockerfile` around lines 13 - 14, Update the Dockerfile
RUN build step that compiles cmd/fulmine/main.go to add Go linker flags and
build flags for smaller binaries (add -trimpath and -ldflags="-s -w -X
'main.version=${VERSION}'" or incorporate -s -w into the existing -ldflags) and
keep CGO_ENABLED/GOOS/GOARCH usage as-is; also add image LABEL directives
exposing version/commit/build date (use ${VERSION}, commit hash and build date
build args) and, if the fulmine binary implements a healthcheck subcommand, add
a HEALTHCHECK directive invoking the binary (fallback to a simple exit 0/1 check
if not available) to improve metadata and runtime observability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@swap/boltz/ws.go`:
- Around line 75-79: Connect() and the reader/reconnect loops reopen the
websocket even after Close() sets boltz.closed = true; before any dial or retry
sleep return if boltz.closed is set. Modify Websocket.Connect(), the reader
goroutine's reconnect/backoff code, and any retry loops to check boltz.closed
(under the same mutex/atomic used by Close()) before attempting url.Parse/dial
or sleeping for backoff, and bail out cleanly instead of retrying; use the
existing Close(), boltz.closed and the same synchronization primitive used
elsewhere to avoid races.

In `@test/e2e/swap_test.go`:
- Around line 48-53: The shared events slice is mutated by the async
swap-monitor goroutine via eventCallback (func(event swap.ChainSwapEvent)) and
later read by the test goroutine, causing a data race; protect access by
serializing modifies/reads (e.g., add a sync.Mutex or use a channel) around the
events slice: wrap all append operations in eventCallback and all test-side
reads/assertions with the same mutex.lock/unlock (or send events to a buffered
channel and collect from it in the test), and apply the same protection to the
other similar callback sites mentioned so all accesses to the events slice are
synchronized.

In `@test/e2e/utils_test.go`:
- Around line 243-259: The helper setMockBoltzConfig (and the other mock-boltz
helpers in this file) use http.DefaultClient/NewRequest with no context/timeout
and proceed to read/parse responses without first verifying StatusCode; update
setMockBoltzConfig to create a context with a short timeout (e.g.,
context.WithTimeout), build the request with http.NewRequestWithContext, use an
http.Client with a timeout or the context, perform the request, immediately
check resp.StatusCode and return/fail with the resp body included in the error
message if it's not 200 OK, then only read/parse the body on success and always
defer resp.Body.Close(); apply the same pattern to the other helpers referenced
(the GET/POST mock-boltz helpers in the file).

In `@test/infra/fulmine.Dockerfile`:
- Around line 4-7: Add the missing build args to the boltz-fulmine service so
VERSION, TARGETOS, TARGETARCH and FULMINE_BRANCH are passed into the Docker
build; specifically update the service's build block in docker-compose.yml to
provide args for VERSION (e.g. "test-infra" or a real version string), TARGETOS
(e.g. "linux"), TARGETARCH (e.g. "amd64") and FULMINE_BRANCH (pin to a
branch/tag instead of defaulting to "master") so the ARGs declared in the
fulmine.Dockerfile (VERSION, TARGETOS, TARGETARCH, FULMINE_BRANCH) are populated
during image build and the binary contains version info.
- Around line 17-28: The image currently runs as root; create a dedicated
non-root user and switch to it: add steps to create a user/group (e.g. fulmine
user), ensure ownership of the FULMINE_DATADIR (/app/data) and any copied
binaries is set to that user, and add a USER directive before the ENTRYPOINT so
the container runs unprivileged; update references around ENV FULMINE_DATADIR
and ENTRYPOINT "fulmine" to ensure the new user can execute the binary and write
to the data directory.
- Around line 7-10: The Dockerfile currently uses ARG FULMINE_BRANCH and runs
"git clone ... --branch ${FULMINE_BRANCH}" which leads to non-reproducible
builds; change the build arg to something like FULMINE_COMMIT and update the RUN
step that performs the clone (the RUN git clone ... line) so it fetches and
checks out a specific commit/tag (use a minimal fetch and git checkout of
${FULMINE_COMMIT} or otherwise fetch by commit hash) instead of cloning a moving
branch; also update any consumers (e.g., docker-compose build args) to pass the
pinned FULMINE_COMMIT value.

In `@test/infra/mockboltz/Dockerfile`:
- Around line 20-25: The final Docker stage runs as root; create and switch to a
non-root user and ensure the binary is owned and executable by that user before
switching. In the Dockerfile change the runtime stage that contains EXPOSE 9001
and ENTRYPOINT ["/app/mock-boltz"] to add a user (e.g., addgroup/adduser or
adduser -D), chown /app/mock-boltz and any required dirs to that user, set
appropriate file permissions, and add a USER <username> line before the
ENTRYPOINT so the container no longer runs as root.

In `@test/infra/mockboltz/server.go`:
- Around line 591-609: handleWS is modifying client.subs without locking while
pushSwapUpdate reads it, causing a concurrent map panic; wrap all writes to
client.subs with the client's mutex (client.mu) the same way reads are
protected. Specifically, in the subscribe branch where you iterate msg.Args and
set client.subs[id] = struct{}{} (and the similar block at the other location
mentioned), acquire client.mu.Lock() before mutating client.subs and defer or
unlock after the updates, keeping the existing conn.WriteJSON ack calls
protected as needed to avoid races between subscription changes and
pushSwapUpdate.
- Around line 276-280: The goroutine uses an undefined logger reference
s.log.Printf which breaks compilation; replace that call with the mock server's
test logger (s.t.Logf) so the error is logged via the existing testing.T field,
i.e. in the anonymous func that calls s.pushSwapUpdate change s.log.Printf(...)
to s.t.Logf(...); if the Server struct actually uses a different logger field
(e.g. s.logger), use that field's Printf method instead.

---

Duplicate comments:
In `@swap/boltz/ws.go`:
- Around line 42-43: The subscriptions channel is unbuffered which lets the
reader goroutine block forever when it tries to send an ack if no subscribe()
caller is waiting; change the channel to a buffered channel (e.g., capacity 1)
where it is declared/initialized (the subscriptions variable and any make(chan
bool) instances referenced in the file) so the reader can send an ack without
blocking, and apply the same buffer change to the other similar channels found
in the file (the other channel declarations/initializations mentioned around the
other ranges) to prevent deadlocks in the reader goroutine.

In `@swap/chainswap_btc_ark_handler.go`:
- Around line 286-291: The code currently decodes claimDetails.TransactionHash
into txHashBytes and blindly does copy(msg[:], txHashBytes) which will silently
zero-pad or truncate non-32-byte inputs; instead validate the decoded length
first and return an error if len(txHashBytes) != 32 before constructing msg and
signing. Specifically, in the block that handles txHashBytes (use the symbols
claimDetails.TransactionHash, txHashBytes, msg [32]byte and the copy call), add
a length check and fail fast with a clear error when the decoded hash is not
exactly 32 bytes so you never sign a partially padded or truncated message.

---

Nitpick comments:
In `@test/infra/fulmine.Dockerfile`:
- Around line 13-14: Update the Dockerfile RUN build step that compiles
cmd/fulmine/main.go to add Go linker flags and build flags for smaller binaries
(add -trimpath and -ldflags="-s -w -X 'main.version=${VERSION}'" or incorporate
-s -w into the existing -ldflags) and keep CGO_ENABLED/GOOS/GOARCH usage as-is;
also add image LABEL directives exposing version/commit/build date (use
${VERSION}, commit hash and build date build args) and, if the fulmine binary
implements a healthcheck subcommand, add a HEALTHCHECK directive invoking the
binary (fallback to a simple exit 0/1 check if not available) to improve
metadata and runtime observability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b1a26417-a352-4c40-92ef-e191a130e007

📥 Commits

Reviewing files that changed from the base of the PR and between d6e4278 and 07e1b9f.

📒 Files selected for processing (11)
  • swap/boltz/ws.go
  • swap/chainswap_arkt_btc_handler.go
  • swap/chainswap_btc_ark_handler.go
  • swap/swap.go
  • test/e2e/swap_test.go
  • test/e2e/utils_test.go
  • test/infra/arkd.Dockerfile
  • test/infra/fulmine.Dockerfile
  • test/infra/mockboltz/Dockerfile
  • test/infra/mockboltz/server.go
  • test/infra/wallet.Dockerfile
✅ Files skipped from review due to trivial changes (2)
  • test/infra/arkd.Dockerfile
  • test/infra/wallet.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (2)
  • swap/chainswap_arkt_btc_handler.go
  • swap/swap.go

Comment thread swap/boltz/ws.go
Comment on lines +75 to +79
func (boltz *Websocket) Connect() error {
wsUrl, err := url.Parse(boltz.apiUrl)
if err != nil {
return err
}

⚠️ Potential issue | 🟠 Major

Stop reconnecting once the websocket has been closed.

Close() flips closed = true, but neither Connect() nor the reconnect loop checks that flag before dialing again. If Close() happens while the reader goroutine is already in the backoff path, this code will reopen a client the caller explicitly shut down.

Suggested fix
 func (boltz *Websocket) Connect() error {
+	boltz.mu.RLock()
+	closed := boltz.closed
+	boltz.mu.RUnlock()
+	if closed {
+		return errors.New("websocket is closed")
+	}
+
 	wsUrl, err := url.Parse(boltz.apiUrl)
 	if err != nil {
 		return err
 	}
@@
 		for {
 			pingTicker.Stop()
 			boltz.mu.Lock()
+			closed := boltz.closed
 			reconnect := boltz.reconnect
 			if reconnect {
 				boltz.reconnect = false
 			}
 			boltz.mu.Unlock()
-			if reconnect {
+			if closed || reconnect {
 				return
 			} else {
 				time.Sleep(reconnectInterval)
 			}
 			err := boltz.Connect()

Also applies to: 173-189, 258-281

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@swap/boltz/ws.go` around lines 75 - 79, Connect() and the reader/reconnect
loops reopen the websocket even after Close() sets boltz.closed = true; before
any dial or retry sleep return if boltz.closed is set. Modify
Websocket.Connect(), the reader goroutine's reconnect/backoff code, and any
retry loops to check boltz.closed (under the same mutex/atomic used by Close())
before attempting url.Parse/dial or sleeping for backoff, and bail out cleanly
instead of retrying; use the existing Close(), boltz.closed and the same
synchronization primitive used elsewhere to avoid races.

Comment thread test/e2e/swap_test.go
Comment on lines +48 to +53
// Track swap events to verify the correct state machine transitions
var events []swap.ChainSwapEvent
eventCallback := func(event swap.ChainSwapEvent) {
events = append(events, event)
t.Logf("ChainSwap event: %T", event)
}

⚠️ Potential issue | 🟠 Major

Protect the shared events slices.

Each of these callbacks appends to a slice from an async swap-monitor goroutine, while the test goroutine later reads the same slice for assertions. That is a real data race.

Suggested fix
-	var events []swap.ChainSwapEvent
+	var (
+		eventsMu sync.Mutex
+		events   []swap.ChainSwapEvent
+	)
 	eventCallback := func(event swap.ChainSwapEvent) {
+		eventsMu.Lock()
 		events = append(events, event)
+		eventsMu.Unlock()
 		t.Logf("ChainSwap event: %T", event)
 	}
@@
-				for _, e := range events {
+				eventsMu.Lock()
+				snapshot := append([]swap.ChainSwapEvent(nil), events...)
+				eventsMu.Unlock()
+				for _, e := range snapshot {
 					switch e.(type) {

Also applies to: 148-152, 368-372, 457-460, 558-561, 657-660, 761-764, 860-863

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/swap_test.go` around lines 48 - 53, The shared events slice is
mutated by the async swap-monitor goroutine via eventCallback (func(event
swap.ChainSwapEvent)) and later read by the test goroutine, causing a data race;
protect access by serializing modifies/reads (e.g., add a sync.Mutex or use a
channel) around the events slice: wrap all append operations in eventCallback
and all test-side reads/assertions with the same mutex.lock/unlock (or send
events to a buffered channel and collect from it in the test), and apply the
same protection to the other similar callback sites mentioned so all accesses to
the events slice are synchronized.

Comment thread test/e2e/utils_test.go
Comment on lines +243 to +259
func setMockBoltzConfig(t *testing.T, cfg map[string]any) {
t.Helper()
body, err := json.Marshal(cfg)
require.NoError(t, err)

req, err := http.NewRequest("POST", mockBoltzAdminURL+"/admin/config", bytes.NewReader(body))
require.NoError(t, err)
req.Header.Set("Content-Type", "application/json")

resp, err := http.DefaultClient.Do(req)
require.NoError(t, err)
defer func() { _ = resp.Body.Close() }()

respBody, _ := io.ReadAll(resp.Body)
require.Equalf(t, http.StatusOK, resp.StatusCode,
"setMockBoltzConfig failed: %s", string(respBody))
}

⚠️ Potential issue | 🟠 Major

Bound and validate the mock-boltz admin requests.

These helpers use http.DefaultClient/http.Get with no timeout or request context, and the read helpers decode bodies without checking for 200 OK. A stalled or erroring mock service can hang the suite or return zero-value state that looks valid to the caller.

Suggested fix
 const mockBoltzAdminURL = "http://127.0.0.1:9101"
+
+var mockBoltzHTTPClient = &http.Client{Timeout: 5 * time.Second}
@@
-	req, err := http.NewRequest("POST", mockBoltzAdminURL+"/admin/config", bytes.NewReader(body))
+	req, err := http.NewRequestWithContext(
+		t.Context(),
+		http.MethodPost,
+		mockBoltzAdminURL+"/admin/config",
+		bytes.NewReader(body),
+	)
 	require.NoError(t, err)
@@
-	resp, err := http.DefaultClient.Do(req)
+	resp, err := mockBoltzHTTPClient.Do(req)
 	require.NoError(t, err)
@@
-	url := fmt.Sprintf("%s/admin/swaps/%s", mockBoltzAdminURL, swapID)
-	resp, err := http.Get(url) //nolint:gosec
+	url := fmt.Sprintf("%s/admin/swaps/%s", mockBoltzAdminURL, swapID)
+	req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, url, nil)
+	require.NoError(t, err)
+	resp, err := mockBoltzHTTPClient.Do(req)
 	require.NoError(t, err)
 	defer func() { _ = resp.Body.Close() }()
+	require.Equalf(t, http.StatusOK, resp.StatusCode, "getMockBoltzSwap(%s) failed", swapID)

Also applies to: 262-272, 276-293, 296-307, 413-434, 446-457

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/utils_test.go` around lines 243 - 259, The helper setMockBoltzConfig
(and the other mock-boltz helpers in this file) use
http.DefaultClient/NewRequest with no context/timeout and proceed to read/parse
responses without first verifying StatusCode; update setMockBoltzConfig to
create a context with a short timeout (e.g., context.WithTimeout), build the
request with http.NewRequestWithContext, use an http.Client with a timeout or
the context, perform the request, immediately check resp.StatusCode and
return/fail with the resp body included in the error message if it's not 200 OK,
then only read/parse the body on success and always defer resp.Body.Close();
apply the same pattern to the other helpers referenced (the GET/POST mock-boltz
helpers in the file).

Comment on lines +4 to +7
ARG VERSION
ARG TARGETOS
ARG TARGETARCH
ARG FULMINE_BRANCH=master

⚠️ Potential issue | 🟡 Minor

Missing build arguments in docker-compose.yml results in empty VERSION and default values.

The docker-compose.yml configuration does not pass any of the declared build args (VERSION, TARGETOS, TARGETARCH, FULMINE_BRANCH). This means:

  • VERSION will be empty, resulting in no version information in the binary (line 14)
  • FULMINE_BRANCH will default to master
  • TARGETOS and TARGETARCH will be unset (Docker will use host platform defaults)

The empty VERSION impacts observability and debugging since the binary won't report its version.

🔧 Proposed fix to add build args in docker-compose.yml

In test/infra/docker-compose.yml, update the boltz-fulmine service to include build args:

boltz-fulmine:
    build:
        context: .
        dockerfile: fulmine.Dockerfile
        args:
            VERSION: "test-infra"
            TARGETOS: linux
            TARGETARCH: amd64
            FULMINE_BRANCH: master  # or pin to a specific commit/tag
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/fulmine.Dockerfile` around lines 4 - 7, Add the missing build args
to the boltz-fulmine service so VERSION, TARGETOS, TARGETARCH and FULMINE_BRANCH
are passed into the Docker build; specifically update the service's build block
in docker-compose.yml to provide args for VERSION (e.g. "test-infra" or a real
version string), TARGETOS (e.g. "linux"), TARGETARCH (e.g. "amd64") and
FULMINE_BRANCH (pin to a branch/tag instead of defaulting to "master") so the
ARGs declared in the fulmine.Dockerfile (VERSION, TARGETOS, TARGETARCH,
FULMINE_BRANCH) are populated during image build and the binary contains version
info.

Comment on lines +7 to +10
ARG FULMINE_BRANCH=master

WORKDIR /app
RUN git clone --branch ${FULMINE_BRANCH} --single-branch --depth 1 https://github.com/ArkLabsHQ/fulmine.git .

⚠️ Potential issue | 🟠 Major

Non-reproducible builds due to default branch reference.

The Dockerfile defaults to cloning the master branch, which means each build can pull different code as the upstream repository evolves. This creates several risks:

  • Build reproducibility: Integration tests may break unexpectedly when upstream changes
  • Supply chain security: No verification of source integrity (commit hash, signatures)
  • Debugging difficulty: Hard to correlate test failures with specific fulmine versions

For test infrastructure that requires stability, consider pinning to a specific commit hash or tagged version.

🔒 Proposed fix to pin to a specific commit
-ARG FULMINE_BRANCH=master
+ARG FULMINE_COMMIT=abc123def456  # Replace with actual commit hash

 WORKDIR /app
-RUN git clone --branch ${FULMINE_BRANCH} --single-branch --depth 1 https://github.com/ArkLabsHQ/fulmine.git .
+RUN git clone https://github.com/ArkLabsHQ/fulmine.git . && \
+    git checkout ${FULMINE_COMMIT}

Then in docker-compose.yml:

build:
    args:
        FULMINE_COMMIT: "abc123def456"  # Pin to tested version
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/fulmine.Dockerfile` around lines 7 - 10, The Dockerfile currently
uses ARG FULMINE_BRANCH and runs "git clone ... --branch ${FULMINE_BRANCH}"
which leads to non-reproducible builds; change the build arg to something like
FULMINE_COMMIT and update the RUN step that performs the clone (the RUN git
clone ... line) so it fetches and checks out a specific commit/tag (use a
minimal fetch and git checkout of ${FULMINE_COMMIT} or otherwise fetch by commit
hash) instead of cloning a moving branch; also update any consumers (e.g.,
docker-compose build args) to pass the pinned FULMINE_COMMIT value.

Comment on lines +17 to +28
FROM alpine:3.20

WORKDIR /app

COPY --from=go-builder /app/bin/* /app/

ENV PATH="/app:${PATH}"
ENV FULMINE_DATADIR=/app/data

VOLUME /app/data

ENTRYPOINT [ "fulmine" ]

⚠️ Potential issue | 🟠 Major

Container runs as root user.

The final image does not specify a non-root user, so the container will run as root. While this is test infrastructure, following the principle of least privilege improves security posture.

🛡️ Proposed fix to add non-root user
 FROM alpine:3.20

+RUN addgroup -g 1000 fulmine && \
+    adduser -D -u 1000 -G fulmine fulmine && \
+    mkdir -p /app/data && \
+    chown -R fulmine:fulmine /app
+
 WORKDIR /app

 COPY --from=go-builder /app/bin/* /app/

 ENV PATH="/app:${PATH}"
 ENV FULMINE_DATADIR=/app/data

 VOLUME /app/data

+USER fulmine
+
 ENTRYPOINT [ "fulmine" ]

As per static analysis: Trivy DS-0002 flags that image user should not be 'root'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/fulmine.Dockerfile` around lines 17 - 28, The image currently runs
as root; create a dedicated non-root user and switch to it: add steps to create
a user/group (e.g. fulmine user), ensure ownership of the FULMINE_DATADIR
(/app/data) and any copied binaries is set to that user, and add a USER
directive before the ENTRYPOINT so the container runs unprivileged; update
references around ENV FULMINE_DATADIR and ENTRYPOINT "fulmine" to ensure the new
user can execute the binary and write to the data directory.

Comment on lines +20 to +25
FROM alpine:3.21
RUN apk add --no-cache ca-certificates tzdata
WORKDIR /app
COPY --from=builder /out/mock-boltz /app/mock-boltz
EXPOSE 9001
ENTRYPOINT ["/app/mock-boltz"]

⚠️ Potential issue | 🟠 Major

Drop root from the runtime image.

The final stage still runs as root. This service only binds to port 9001 and reads a single binary, so keeping root here is an unnecessary privilege expansion.

Suggested fix
 FROM alpine:3.21
-RUN apk add --no-cache ca-certificates tzdata
+RUN apk add --no-cache ca-certificates tzdata \
+    && addgroup -S mockboltz \
+    && adduser -S -G mockboltz mockboltz
 WORKDIR /app
 COPY --from=builder /out/mock-boltz /app/mock-boltz
+USER mockboltz
 EXPOSE 9001
 ENTRYPOINT ["/app/mock-boltz"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/Dockerfile` around lines 20 - 25, The final Docker stage
runs as root; create and switch to a non-root user and ensure the binary is
owned and executable by that user before switching. In the Dockerfile change the
runtime stage that contains EXPOSE 9001 and ENTRYPOINT ["/app/mock-boltz"] to
add a user (e.g., addgroup/adduser or adduser -D), chown /app/mock-boltz and any
required dirs to that user, set appropriate file permissions, and add a USER
<username> line before the ENTRYPOINT so the container no longer runs as root.

Comment thread test/infra/mockboltz/server.go
Comment on lines +591 to +609
if msg.Op == "subscribe" && msg.Channel == "swap.update" {
s.wsMu.RLock()
client := s.wsClients[conn]
s.wsMu.RUnlock()
if client == nil {
continue
}
for _, id := range msg.Args {
client.subs[id] = struct{}{}
}

ack := map[string]any{
"event": "subscribe",
"channel": "swap.update",
"args": msg.Args,
}
client.mu.Lock()
_ = conn.WriteJSON(ack)
client.mu.Unlock()

⚠️ Potential issue | 🔴 Critical

Guard client.subs with the client mutex.

handleWS() writes client.subs without locking, while pushSwapUpdate() reads the same map concurrently. That can panic with a concurrent map read and map write when a subscription races with a pushed update.

Suggested fix
 		if msg.Op == "subscribe" && msg.Channel == "swap.update" {
 			s.wsMu.RLock()
 			client := s.wsClients[conn]
 			s.wsMu.RUnlock()
 			if client == nil {
 				continue
 			}
+			client.mu.Lock()
 			for _, id := range msg.Args {
 				client.subs[id] = struct{}{}
 			}
 
 			ack := map[string]any{
 				"event":   "subscribe",
 				"channel": "swap.update",
 				"args":    msg.Args,
 			}
-			client.mu.Lock()
 			_ = conn.WriteJSON(ack)
 			client.mu.Unlock()
 		}
 	}
@@
 	for conn, client := range s.wsClients {
-		if _, ok := client.subs[id]; !ok {
-			continue
-		}
 		client.mu.Lock()
-		err := conn.WriteJSON(payload)
+		_, subscribed := client.subs[id]
+		var err error
+		if subscribed {
+			err = conn.WriteJSON(payload)
+		}
 		client.mu.Unlock()
 		if err != nil {
 			log.WithError(err).Warn("failed to push ws event")
 		}
 	}

Also applies to: 947-953

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/server.go` around lines 591 - 609, handleWS is modifying
client.subs without locking while pushSwapUpdate reads it, causing a concurrent
map panic; wrap all writes to client.subs with the client's mutex (client.mu)
the same way reads are protected. Specifically, in the subscribe branch where
you iterate msg.Args and set client.subs[id] = struct{}{} (and the similar block
at the other location mentioned), acquire client.mu.Lock() before mutating
client.subs and defer or unlock after the updates, keeping the existing
conn.WriteJSON ack calls protected as needed to avoid races between subscription
changes and pushSwapUpdate.

@arkanaai

arkanaai Bot commented Mar 20, 2026

Followup review (commit d0209b8)

Single-line linting fix in test/infra/mockboltz/server.go: switched from s.log.Printf to log.WithError(err).Warnf for consistent structured logging. ✅ No concerns.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
test/infra/mockboltz/server.go (2)

1108-1113: Document the 512-threshold heuristic for locktime type.

The cutoff at 512 is a simplification. Real BIP-68 uses a flag bit, not a value threshold. Consider adding a brief comment so future readers understand the mock's behavior.

Suggested improvement
+// parseRelativeLocktime uses a simple heuristic: values >= 512 are treated as
+// seconds, otherwise as blocks. This is a mock simplification; real BIP-68
+// uses a flag bit to distinguish the two types.
 func parseRelativeLocktime(locktime uint32) arklib.RelativeLocktime {
 	if locktime >= 512 {
 		return arklib.RelativeLocktime{Type: arklib.LocktimeTypeSecond, Value: locktime}
 	}
 	return arklib.RelativeLocktime{Type: arklib.LocktimeTypeBlock, Value: locktime}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/server.go` around lines 1108 - 1113, The function
parseRelativeLocktime uses a simplified heuristic (locktime >= 512 =>
second-based) which differs from BIP-68 that inspects the high bit to determine
time-vs-block semantics; add a concise comment above parseRelativeLocktime
explaining this: state that the 512 cutoff is an intentional mock simplification
of BIP-68's flag-bit behavior, mention the real rule (use the high bit to select
between LocktimeTypeSecond and LocktimeTypeBlock), and note that the function
returns arklib.RelativeLocktime with Type set to arklib.LocktimeTypeSecond or
arklib.LocktimeTypeBlock accordingly so future readers understand the deviation.

449-454: Line exceeds golines max length.

Static analysis reports this line is not properly formatted per golines rules.

Suggested fix
 		filtered := make([]*psbt.TaprootScriptSpendSig, 0, len(in.TaprootScriptSpendSig)+1)
 		for _, existing := range in.TaprootScriptSpendSig {
-			if bytes.Equal(existing.XOnlyPubKey, xOnly) && bytes.Equal(existing.LeafHash, leafHash[:]) {
+			sameKey := bytes.Equal(existing.XOnlyPubKey, xOnly)
+			sameLeaf := bytes.Equal(existing.LeafHash, leafHash[:])
+			if sameKey && sameLeaf {
 				continue
 			}
 			filtered = append(filtered, existing)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/server.go` around lines 449 - 454, The line creating
filtered is too long; split the make call so it fits golines (e.g., compute the
capacity into a short variable or use a var declaration) — replace "filtered :=
make([]*psbt.TaprootScriptSpendSig, 0, len(in.TaprootScriptSpendSig)+1)" with a
multi-line form such as "cap := len(in.TaprootScriptSpendSig) + 1" followed by
"filtered := make([]*psbt.TaprootScriptSpendSig, 0, cap)" or "var filtered
[]*psbt.TaprootScriptSpendSig; filtered = make([]*psbt.TaprootScriptSpendSig, 0,
len(in.TaprootScriptSpendSig)+1)" ensuring identifiers filtered,
psbt.TaprootScriptSpendSig and in.TaprootScriptSpendSig are used as shown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/infra/mockboltz/server.go`:
- Around line 1108-1113: The function parseRelativeLocktime uses a simplified
heuristic (locktime >= 512 => second-based) which differs from BIP-68 that
inspects the high bit to determine time-vs-block semantics; add a concise
comment above parseRelativeLocktime explaining this: state that the 512 cutoff
is an intentional mock simplification of BIP-68's flag-bit behavior, mention the
real rule (use the high bit to select between LocktimeTypeSecond and
LocktimeTypeBlock), and note that the function returns arklib.RelativeLocktime
with Type set to arklib.LocktimeTypeSecond or arklib.LocktimeTypeBlock
accordingly so future readers understand the deviation.
- Around line 449-454: The line creating filtered is too long; split the make
call so it fits golines (e.g., compute the capacity into a short variable or use
a var declaration) — replace "filtered := make([]*psbt.TaprootScriptSpendSig, 0,
len(in.TaprootScriptSpendSig)+1)" with a multi-line form such as "cap :=
len(in.TaprootScriptSpendSig) + 1" followed by "filtered :=
make([]*psbt.TaprootScriptSpendSig, 0, cap)" or "var filtered
[]*psbt.TaprootScriptSpendSig; filtered = make([]*psbt.TaprootScriptSpendSig, 0,
len(in.TaprootScriptSpendSig)+1)" ensuring identifiers filtered,
psbt.TaprootScriptSpendSig and in.TaprootScriptSpendSig are used as shown.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4e71f37-badf-446f-a586-b7469c8e9448

📥 Commits

Reviewing files that changed from the base of the PR and between 07e1b9f and d0209b8.

📒 Files selected for processing (1)
  • test/infra/mockboltz/server.go

@arkanaai

arkanaai Bot commented Mar 20, 2026

🔍 Arkana PR Review — go-sdk#113 (Boltz Swap/VHTLC packages)

Scope: Moving Boltz swap & VHTLC packages from Fulmine to go-sdk. 13k+ additions across 48 files.

✅ Security Strengths

  • VHTLC address validation (chainswap_validate.go): Properly reconstructs expected VHTLC address from Boltz parameters and compares before sending funds — prevents scam attacks where Boltz returns a malicious lockup address.
  • BTC lockup address validation (validateBtcLockupAddress): Derives expected P2TR address from swap tree + MuSig2 aggregated key and verifies match.
  • Claim/refund script parsing: Byte-level manual parsing of tapscript leaves with strict length and opcode validation. Good defense-in-depth.
  • Preimage generation: Uses crypto/rand — correct.
  • Dual claim paths: Cooperative MuSig2 key-path claim with automatic fallback to script-path — good resilience.
  • Thread safety: ChainSwap struct uses sync.RWMutex for mutable fields; callbacks fire outside the lock to avoid deadlocks.
  • Refund timeout floor: MinRefundTimeout = 144 blocks sanity check prevents accepting dangerously short CLTV timeouts.

⚠️ Security Concerns

1. MuSig2 Nonce Binding (swap/musig.go:64-69)

nonces, err := musig2.GenNonces(
    musig2.WithPublicKey(c.publicKey),
)

The nonce is only bound to the public key, not to the message being signed. For production safety, nonces should also be bound to the signing message (btcec's musig2 package exposes a nonce-generation option, WithNonceMessageHash, that takes the 32-byte message or transaction hash for this). MuSig2 nonce reuse across different messages can leak private keys. This is mitigated by the fact that GenerateNonce is called fresh per claim, but a defense-in-depth binding would be safer.

2. Inconsistent HTTP clients (swap/explorer.go)
GetCurrentBlockHeight() (line ~130) and GetTransactionStatus() (line ~148) use the global http.Get() instead of e.client (which has a 10s timeout). This means these calls have no timeout protection and could hang indefinitely. Should use e.client.Get() consistently.

3. WebSocket reconnection loop (swap/boltz/ws.go:147-158)

for {
    // ...reconnect forever without limit
    time.Sleep(reconnectInterval)
    err := boltz.Connect()
    if err == nil { return }
}

No maximum retry count or total timeout. A permanently unreachable Boltz server causes an infinite goroutine. Consider adding a max retry count or context cancellation check.

4. Potential panic in WebSocket handler (swap/boltz/ws.go:107)

defer func() {
    if r := recover(); r != nil {
        log.Errorf("panic in boltz ws connection: %v", r)
    }
}()

Recovery is logged but the goroutine silently dies — the Updates channel is never closed in this path, so consumers block forever. The reconnect loop below may or may not trigger depending on timing.

📝 Code Quality Notes

  • Good test coverage: Mock Boltz server, integration Docker compose, comprehensive swap e2e tests.
  • Clean separation: Handler pattern (ChainSwapEventHandler interface) with strategy for Ark→BTC vs BTC→Ark is well structured.
  • Event system: Typed domain events (UserLockEvent, ClaimEvent, etc.) — clean and extensible.
  • Quote mechanism: Handles amount mismatch gracefully via Boltz quote API with dedup guard (quoteAccepted flag).

🔗 Cross-Repo Impact

This moves core swap logic from Fulmine into the shared go-sdk. Any downstream consumers of Fulmine's swap packages will need to update imports. The vhtlc package is also included, which is used by arkd for VTXO validation.

Verdict: Solid architecture for a complex feature. The MuSig2 nonce binding and HTTP client inconsistencies should be addressed before merge. The WebSocket reconnection loop is lower priority but worth a follow-up issue.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
test/infra/mockboltz/server.go (1)

592-601: ⚠️ Potential issue | 🔴 Critical

Lock client.subs on both the subscribe and broadcast paths.

At Lines 599-600 handleWS() writes client.subs without client.mu, while pushSwapUpdate() reads the same map at Lines 977-978. That can still panic with concurrent map read and map write when a subscribe races with an injected update.

Suggested fix
 		if msg.Op == "subscribe" && msg.Channel == "swap.update" {
 			s.wsMu.RLock()
 			client := s.wsClients[conn]
 			s.wsMu.RUnlock()
 			if client == nil {
 				continue
 			}
+			client.mu.Lock()
 			for _, id := range msg.Args {
 				client.subs[id] = struct{}{}
 			}
 
 			ack := map[string]any{
 				"event":   "subscribe",
 				"channel": "swap.update",
 				"args":    msg.Args,
 			}
-			client.mu.Lock()
 			_ = conn.WriteJSON(ack)
 			client.mu.Unlock()
 		}
@@
 	for conn, client := range s.wsClients {
-		if _, ok := client.subs[id]; !ok {
-			continue
-		}
 		client.mu.Lock()
-		err := conn.WriteJSON(payload)
+		_, subscribed := client.subs[id]
+		var err error
+		if subscribed {
+			err = conn.WriteJSON(payload)
+		}
 		client.mu.Unlock()
 		if err != nil {
 			log.WithError(err).Warn("failed to push ws event")
 		}
 	}

Also applies to: 976-982

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/server.go` around lines 592 - 601, handleWS() is writing
to the client's subscription map (client.subs) without taking client.mu while
pushSwapUpdate() reads that map, causing potential concurrent map read/write
panics; update handleWS() (the subscribe branch where msg.Op=="subscribe" &&
msg.Channel=="swap.update") to acquire the per-client mutex (client.mu) before
modifying client.subs and release it after, and likewise ensure pushSwapUpdate()
wraps its reads of client.subs in the same client.mu; use the existing s.wsMu
only for accessing s.wsClients[conn] and do not replace the per-client locking
for subs access.
test/e2e/swap_test.go (1)

48-53: ⚠️ Potential issue | 🟠 Major

Synchronize each shared events collector.

At Lines 50-51, 157-158, 413-414, 509-510, 616-617, 723-724, and 853-854 the callback appends from an async goroutine, while the test goroutine later iterates/asserts on the same slice. That is a real data race and will make this e2e suite flaky under -race.

Suggested fix
-	var events []swap.ChainSwapEvent
+	var (
+		eventsMu sync.Mutex
+		events   []swap.ChainSwapEvent
+	)
 	eventCallback := func(event swap.ChainSwapEvent) {
+		eventsMu.Lock()
 		events = append(events, event)
+		eventsMu.Unlock()
 		t.Logf("ChainSwap event: %T", event)
 	}
@@
-				for _, e := range events {
+				eventsMu.Lock()
+				snapshot := append([]swap.ChainSwapEvent(nil), events...)
+				eventsMu.Unlock()
+				for _, e := range snapshot {

Apply the same pattern to the other events slices in this file.

Also applies to: 156-160, 412-416, 508-512, 615-619, 722-726, 852-856

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/swap_test.go` around lines 48 - 53, The shared slice events (type
swap.ChainSwapEvent) is appended from async callbacks (eventCallback) and later
read from the test goroutine, causing a data race; protect access by introducing
a sync.Mutex (e.g., eventsMu) next to each events slice and lock/unlock around
every append in the callback and around any iteration/assertion reads (or make a
locked copy before iterating), and apply this same mutex-guard pattern to the
other events slices/callbacks in this file (the other occurrences referenced in
the review).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ba7acdb-109c-4d4e-bc0e-c7c7311ee06f

📥 Commits

Reviewing files that changed from the base of the PR and between d0209b8 and 759fc7c.

📒 Files selected for processing (3)
  • test/e2e/swap_test.go
  • test/e2e/utils_test.go
  • test/infra/mockboltz/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/utils_test.go

Comment thread test/e2e/swap_test.go
Comment on lines +549 to +567
// Wait for the unilateral refund callback to fire
select {
case <-unilateralRefundCalled:
	t.Logf("Unilateral refund callback confirmed for swap %s", chainSwap.Id)
	require.NotEmpty(t, events, "should have received events")
case <-time.After(60 * time.Second):
	switch chainSwap.GetStatus() {
	case swap.ChainSwapRefundedUnilaterally, swap.ChainSwapFailed, swap.ChainSwapRefundFailed:
		t.Logf(
			"Swap reached terminal state %d (valid for unilateral refund test)",
			chainSwap.GetStatus(),
		)
	default:
		t.Fatalf(
			"timed out waiting for unilateral refund, swap status: %d",
			chainSwap.GetStatus(),
		)
	}
}

⚠️ Potential issue | 🟠 Major

Keep the unilateral-refund callback assertion strict.

The fallback at Lines 554-566 currently passes once the swap reaches a terminal refund status, even if unilateralRefundCalled never fires. That means this test can succeed without proving the callback path it is named after.

Suggested fix
 	select {
 	case <-unilateralRefundCalled:
 		t.Logf("Unilateral refund callback confirmed for swap %s", chainSwap.Id)
 		require.NotEmpty(t, events, "should have received events")
 	case <-time.After(60 * time.Second):
-		switch chainSwap.GetStatus() {
-		case swap.ChainSwapRefundedUnilaterally, swap.ChainSwapFailed, swap.ChainSwapRefundFailed:
-			t.Logf(
-				"Swap reached terminal state %d (valid for unilateral refund test)",
-				chainSwap.GetStatus(),
-			)
-		default:
-			t.Fatalf(
-				"timed out waiting for unilateral refund, swap status: %d",
-				chainSwap.GetStatus(),
-			)
-		}
+		t.Fatalf(
+			"timed out waiting for unilateral refund callback, swap status: %d",
+			chainSwap.GetStatus(),
+		)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/swap_test.go` around lines 549 - 567, The test currently treats
reaching a terminal refund state as success even if the unilateralRefundCalled
channel never fired; change the logic in the select that waits on
unilateralRefundCalled so the test fails if unilateralRefundCalled is not
received within the timeout instead of accepting terminal statuses as a
fallback. Locate the select using the unilateralRefundCalled channel and
chainSwap.GetStatus() and remove or tighten the switch that treats
swap.ChainSwapRefundedUnilaterally / swap.ChainSwapFailed /
swap.ChainSwapRefundFailed as success, ensuring the test explicitly requires the
unilateralRefundCalled case to occur (or fails on timeout) to prove the callback
path fires.

Comment on lines +1197 to +1208
for i := 0; i < 30; i++ {
	resp, err := http.Get(url)
	if err != nil {
		lastErr = err
		time.Sleep(500 * time.Millisecond)
		continue
	}
	body, _ := io.ReadAll(resp.Body)
	_ = resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		lastErr = fmt.Errorf("http %d: %s", resp.StatusCode, strings.TrimSpace(string(body)))
		time.Sleep(500 * time.Millisecond)

⚠️ Potential issue | 🟠 Major


Add request deadline to Ark signer discovery HTTP calls.

http.Get at line 1198 uses the default client with no timeout. A stalled arkd connection will block New() indefinitely and hang the mock stack despite the retry loop.

Suggested fix
 func fetchArkSignerPubKey(arkdURL string) (*btcec.PublicKey, error) {
 	if strings.TrimSpace(arkdURL) == "" {
 		return nil, fmt.Errorf("MOCK_BOLTZ_ARKD_URL is required")
 	}
 	url := strings.TrimRight(strings.TrimSpace(arkdURL), "/") + "/v1/info"
+	client := &http.Client{Timeout: 5 * time.Second}
 	var lastErr error
 	for i := 0; i < 30; i++ {
-		resp, err := http.Get(url)
+		resp, err := client.Get(url)
 		if err != nil {
 			lastErr = err
 			time.Sleep(500 * time.Millisecond)
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/mockboltz/server.go` around lines 1197 - 1208, The http.Get call
used during Ark signer discovery (in the New() flow of the mock server) has no
timeout and can hang; replace the plain http.Get(url) with a request that
enforces a deadline—create a context with timeout (e.g., context.WithTimeout) or
use an http.Client with a Timeout and perform client.Do(req) (or
http.NewRequestWithContext) so each retry is bounded, propagate/close the
response body as before, and surface the context/client error into lastErr for
the retry loop.
