134 changes: 134 additions & 0 deletions in-progress/0045-tx-retrieval/design.md
@@ -0,0 +1,134 @@
| | |
| -------------------- | --------------------------------------------------------------------------------------------- |
| Owners | @PhilWindle |
| Approvers | @just-mitch @alexghr @spalladino @Maddiaa0 |
| Target Approval Date | 2025-06-10 |


## Executive Summary

This design attempts to define a more effective protocol for the retrieval of missing transactions from the network.


## Introduction

An outstanding problem with the block building protocol is that we have no data availability solution. Nodes' mempools will naturally diverge and it is likely, particularly at higher transaction throughputs, that validators and provers won't have access to the transactions included in blocks proposed by the block proposer.

The codebase currently contains a request/response mechanism for retrieving these transactions but it has not proven very effective. This design aims to improve on it.

## Transaction Lifecycle

For the purpose of network participants, transactions have a rough lifecycle.

1. The transaction is stored as pending within the local mempool. During this time it may be subject to eviction, based on rules local to the node. We will refer to this state as PENDING.
2. The transaction is included in a block proposal. At this point the transaction should not be evicted as it may be required for block validation. We will refer to this state as PROPOSED.
3. The transaction is included in a mined block. All mined transactions are stored for a period of time. We will refer to this state as MINED.
4. The transaction's block is proven or pruned. We will refer to this state as EXPIRED.

Block proposals follow a similar lifecycle, with the exception that there is no such thing as a PENDING state for proposals.

If a block proposal does not result in a mined block, the transactions within it will revert to PENDING.
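
The lifecycle above can be read as a small state machine. The following is a minimal sketch of it, not the pool's actual implementation: the names `TxState`, `TxEvent`, `TRANSITIONS` and `nextState` are hypothetical, and the table covers only the transitions listed in this section.

```typescript
// Hypothetical names for illustration only; none of these exist in the codebase.
type TxState = "PENDING" | "PROPOSED" | "MINED" | "EXPIRED";

type TxEvent =
  | "INCLUDED_IN_PROPOSAL" // the tx appears in a block proposal
  | "PROPOSAL_NOT_MINED" // the proposal did not result in a mined block
  | "BLOCK_MINED" // the tx's block was mined
  | "BLOCK_PROVEN_OR_PRUNED"; // the tx's block was proven or pruned

// Only the transitions described in the lifecycle list are included.
const TRANSITIONS: Record<TxState, Partial<Record<TxEvent, TxState>>> = {
  PENDING: { INCLUDED_IN_PROPOSAL: "PROPOSED" },
  PROPOSED: { PROPOSAL_NOT_MINED: "PENDING", BLOCK_MINED: "MINED" },
  MINED: { BLOCK_PROVEN_OR_PRUNED: "EXPIRED" },
  EXPIRED: {},
};

// Returns the next state, or the current state if the event does not apply.
function nextState(state: TxState, event: TxEvent): TxState {
  return TRANSITIONS[state][event] ?? state;
}
```

For example, `nextState("PROPOSED", "PROPOSAL_NOT_MINED")` models the reversion to PENDING described above.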

## Requirements

Validators require that transactions can be retrieved quickly: the transactions must be retrieved and the block re-executed in time for an attestation to be produced. Provers also require transactions for re-execution, but their timeliness requirements are less strict, as they essentially have 1 - 2 epochs to produce the required proofs.


## Current Approach

Every node on the network subscribes to block proposals. Upon receiving a block proposal, the node will instruct its transaction pool to mark the transaction hashes as PROPOSED, non-evictable. The node will then make an attempt to request any missing transactions from its peers on the network. All PROPOSED hashes are removed from the pool when any block is mined.

Review comment (Contributor): "All PROPOSED hashes are removed from the pool when any block is mined." Aren't they removed when the block is proven, not mined, so the tx remains available for provers?

Reply (Contributor Author): Not currently, no. The set of hashes I have described is just used to prevent pending tx eviction. Mined txs remain until their block is finalised.

The node makes a number of 'rounds' of requesting transactions. In each round it selects a random subset of peers and asks each peer for a subset of the missing transactions. Timeouts are specified for each round and globally. The peer that propagated the proposal to the node (note: not the proposer of the block) is always included in the peer subset.

The timeout values are arbitrarily set to 2 seconds and 8 seconds currently. The max number of peers selected for each round is a function of the number of transactions required.

A request for a transaction is singular, 1 tx at a time. Requests to a given peer are performed serially so at any given time a single peer is only asked for a single transaction. Each request is a unique dial, request/response and hang-up operation.

## Proposed Approach

As before, every node on the network subscribes to block proposals and marks transaction hashes so as not to evict those transactions from the pool. Transaction hashes remain marked until the end of the slot after the slot in which they were PROPOSED. This avoids race conditions where a proposal for slot n + 1 arrives before a node has synced the block for slot n. Currently, syncing that block would remove the protection for transactions in the new block proposal.

The `TxCollector` module will be modified to become a longer-running task that can be thought of as permanently attempting to retrieve transactions from the network. It will dial and hold connections/streams to all connected peers. As stream/connection events occur, it will re-attempt to establish connectivity and maintain available streams.

Upon receipt of a block proposal, the `TxCollector` will be notified of the proposal and the transactions that need to be retrieved. It will continue to perform a series of message exchanges with all peers until the transactions are no longer required.

Reasons for the transactions no longer being required are:

1. The proposal never made it into a mined block and the following slot has passed. The transactions transition back to PENDING.
2. The block and its transactions become EXPIRED.
3. The transactions have been retrieved.
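
The three stop conditions above could be checked with a predicate like the one below. This is a sketch under assumptions: `TrackedProposal`, its fields and `stillRequired` are hypothetical names, and "the following slot has passed" is interpreted as the current slot being greater than `slotNumber + 1`.

```typescript
// Hypothetical bookkeeping record kept per proposal by the collector.
type ProposalStatus = "PROPOSED" | "MINED" | "EXPIRED";

interface TrackedProposal {
  slotNumber: number; // slot the proposal was made for
  status: ProposalStatus; // lifecycle state of the proposal
  missingTxCount: number; // txs from the proposal not yet held locally
}

// Returns true while the collector should keep trying to retrieve txs.
function stillRequired(p: TrackedProposal, currentSlot: number): boolean {
  // Reason 3: everything has been retrieved.
  if (p.missingTxCount === 0) return false;
  // Reason 2: the block and its transactions are EXPIRED.
  if (p.status === "EXPIRED") return false;
  // Reason 1: never mined and the following slot has passed (assumed interpretation).
  if (p.status === "PROPOSED" && currentSlot > p.slotNumber + 1) return false;
  return true;
}
```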

### Block Tx Request/Response

Instead of randomly selecting peers to query with random tx requests, the node will make frequent message exchanges with all of its peers. These messages will be small and sent over previously established streams, reducing latency.

Review comment (Contributor): "all of its peers". Is it safe to blast all peers so frequently?

Reply (Contributor Author): Perhaps not. Maybe we should stagger it. We could send to 1/4 of peers every 300ms or something, rotating of course. Would need to think about it.

We introduce two intervals, the `proposedRequestInterval` and the `minedRequestInterval`, typically around 500ms and 2000ms respectively.

Every `proposedRequestInterval`, the node evaluates which block proposals it still requires transactions for and when it last enquired about each proposal. Proposals that are PROPOSED are queried at this interval (provided txs are still required); proposals that are MINED are queried at the less frequent `minedRequestInterval`.
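
To make the two-interval evaluation concrete, here is a minimal sketch of the per-tick selection. `QueryState`, `RequestIntervals` and `proposalsDueForQuery` are hypothetical names, and tracking the last query time per proposal (rather than per peer) is an assumption.

```typescript
// Hypothetical per-proposal bookkeeping; field names are illustrative.
interface QueryState {
  status: "PROPOSED" | "MINED";
  missingTxCount: number; // txs still required for this proposal
  lastQueriedAtMs: number | undefined; // undefined if never queried
}

interface RequestIntervals {
  proposedRequestIntervalMs: number; // e.g. 500
  minedRequestIntervalMs: number; // e.g. 2000
}

// Called every proposedRequestInterval; returns the proposals to enquire about now.
function proposalsDueForQuery<T extends QueryState>(
  tracked: T[],
  nowMs: number,
  intervals: RequestIntervals,
): T[] {
  return tracked.filter((p) => {
    if (p.missingTxCount === 0) return false; // nothing left to ask for
    const interval =
      p.status === "PROPOSED"
        ? intervals.proposedRequestIntervalMs
        : intervals.minedRequestIntervalMs;
    return p.lastQueriedAtMs === undefined || nowMs - p.lastQueriedAtMs >= interval;
  });
}
```

The caller would update `lastQueriedAtMs` for each returned proposal once the corresponding `BlockTxRequests` message has been sent.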

Peers are queried using `BlockTxRequests` messages.

```
type BlockTxRequest = {
  slotNumber: number,
  blockHash: Buffer, // 32 byte hash of the proposed block header
}

type BlockTxRequests = {
  requests: BlockTxRequest[]
}
```

Upon receipt of a `BlockTxRequests` the peer will respond with a `BlockTxResponses`.

```
type BlockTxResponse = {
  slotNumber: number,
  blockHash: Buffer, // 32 byte hash of the proposed block
  blockAvailable: boolean, // Whether the peer has the block available
  txIndices: Buffer, // BitVector indicating which txs from the proposal are available at the peer
}

type BlockTxResponses = {
  responses: BlockTxResponse[]
}
```

Review comment (Contributor), on lines +87 to +88: Is there a reliable way to know which msg the peer is replying to? If so, I'd remove these two fields, just to make messages smaller.

Reply (Contributor Author): Hmm, not sure. To be honest though, even just a 32 bit message ID on a per connection basis would suffice.

The frequent exchange of these messages enables nodes to build up mappings of where in their sets of peers transactions are available. These mappings will change rapidly as peers also implement the same transaction retrieval protocol.
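
The `txIndices` bit vector is not specified further in this document. The sketch below shows one possible encoding, assuming bit `i` (least-significant bit first within each byte) represents the tx at position `i` in the proposal. The helper names are hypothetical, and `Uint8Array` is used so the example is self-contained (a Node.js `Buffer` is a `Uint8Array` subclass).

```typescript
// Assumed layout: tx i lives in byte (i >> 3), bit (i & 7), LSB first.
// The design does not fix a bit order; this is one illustrative choice.
function encodeTxIndices(indices: number[], txCount: number): Uint8Array {
  const bits = new Uint8Array(Math.ceil(txCount / 8));
  for (const i of indices) {
    if (!Number.isInteger(i) || i < 0 || i >= txCount) {
      throw new RangeError(`tx index ${i} out of range for ${txCount} txs`);
    }
    bits[i >> 3] = (bits[i >> 3] ?? 0) | (1 << (i & 7));
  }
  return bits;
}

// Returns the positions (within the proposal) whose bits are set.
function decodeTxIndices(bits: Uint8Array, txCount: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < txCount; i++) {
    const byte = bits[i >> 3] ?? 0;
    if ((byte & (1 << (i & 7))) !== 0) out.push(i);
  }
  return out;
}
```

With this layout a proposal of `n` transactions needs `ceil(n / 8)` bytes per response, which keeps the frequent exchanges small.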

### Tx Request/Response

Transactions are requested in batches using the `TxRequest` message. Additionally, `TxRequests` messages are used to request transactions from multiple blocks in a single request.

```
type TxRequest = {
  slotNumber: number,
  blockHash: Buffer, // 32 byte hash of the proposed block
  txIndices: Buffer, // BitVector indicating which txs from the proposal are requested
}
```

Review comment (Contributor), on lines +104 to +108: Should we allow for batching multiple TxRequests, in case a node needs txs from more than a single block?

Reply (Contributor Author): Yes we should, will update.

```
type TxRequests = {
  requests: TxRequest[]
}
```

Using the mapping built from the Block Tx Request/Response message exchange, we intelligently request transactions from selected peers.

1. Only make a single request to a peer at a time.
2. Limit the number of transactions requested in a single request to a configurable `batchSize`.
3. Allocate txs to peers so that all txs are retrieved in the minimum number of requests while asking for the minimum number of txs from any given peer.

Only making a single request to a peer with a limited number of transactions prevents a node from simply requesting all available transactions from the first peer to respond. Instead we should aim to spread the load as much as possible.
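
One way to approximate these rules is a greedy allocation: handle the rarest transactions first and give each one to the least-loaded idle peer that holds it. The sketch below is a heuristic under assumptions, not a provably optimal allocator and not the design's chosen algorithm. `allocateTxRequests`, the string peer IDs and the shape of the availability map are all hypothetical.

```typescript
// availability: peerId -> set of tx positions the peer reported as available.
// busyPeers: peers with a request already in flight (rule 1).
// Returns peerId -> tx positions to request from that peer in this pass.
function allocateTxRequests(
  missing: number[],
  availability: Map<string, Set<number>>,
  busyPeers: Set<string>,
  batchSize: number,
): Map<string, number[]> {
  const idlePeers = Array.from(availability.keys()).filter((p) => !busyPeers.has(p));
  const holdersOf = (tx: number): string[] =>
    idlePeers.filter((p) => availability.get(p)?.has(tx) ?? false);

  // Rarest first: txs with few holders get their pick before peers fill up.
  const byRarity = missing
    .map((tx) => ({ tx, holders: holdersOf(tx) }))
    .filter((e) => e.holders.length > 0)
    .sort((a, b) => a.holders.length - b.holders.length);

  const plan = new Map<string, number[]>();
  for (const { tx, holders } of byRarity) {
    let best: string | undefined;
    for (const peer of holders) {
      const load = plan.get(peer)?.length ?? 0;
      if (load >= batchSize) continue; // rule 2: respect batchSize
      if (best === undefined || load < (plan.get(best)?.length ?? 0)) best = peer;
    }
    if (best === undefined) continue; // all holders are full; retry on a later pass
    const batch = plan.get(best) ?? [];
    batch.push(tx);
    plan.set(best, batch);
  }
  return plan;
}
```

Transactions that cannot be placed in this pass (no idle holder, or all holders at `batchSize`) are simply left for the next pass, by which time the availability mapping may have changed.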

## Protections

Nodes will wish to protect themselves from malicious or faulty peers through peer-scoring. Punishments should be applied for:

1. Providing invalid transactions.
2. Providing transactions that do not match the requested block hash.
3. Making `BlockTxRequest` requests for block proposals that do not exist, accounting for the fact that you may be slightly behind/ahead of the peer.
4. Making too many requests in a given period of time.
5. Making duplicate requests for transactions.
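
As a rough illustration of how these punishments might feed a peer score, the sketch below maps each offence to a penalty. Everything here is hypothetical: the `Offence` names, the penalty values (placeholders, not tuned), the in-memory score map, and the slot-tolerance rule used to allow for a peer being slightly ahead or behind.

```typescript
// One entry per punishable behaviour listed above; names are illustrative.
type Offence =
  | "INVALID_TX"
  | "TX_BLOCK_MISMATCH"
  | "UNKNOWN_PROPOSAL"
  | "RATE_LIMIT_EXCEEDED"
  | "DUPLICATE_REQUEST";

// Placeholder penalty values; real values would need tuning.
const PENALTIES: Record<Offence, number> = {
  INVALID_TX: 10,
  TX_BLOCK_MISMATCH: 10,
  UNKNOWN_PROPOSAL: 2,
  RATE_LIMIT_EXCEEDED: 5,
  DUPLICATE_REQUEST: 1,
};

const peerScores = new Map<string, number>();

// Applies the penalty and returns the peer's new score (lower is worse).
function penalise(peerId: string, offence: Offence): number {
  const score = (peerScores.get(peerId) ?? 0) - PENALTIES[offence];
  peerScores.set(peerId, score);
  return score;
}

// Assumed rule: only treat an unknown proposal as an offence when the requested
// slot is further from our own slot than the configured tolerance.
function isUnknownProposalOffence(
  proposalKnown: boolean,
  requestedSlot: number,
  localSlot: number,
  slotTolerance: number,
): boolean {
  return !proposalKnown && Math.abs(requestedSlot - localSlot) > slotTolerance;
}
```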