GossipSub v1.4: Message preamble + IMReceiving notification to considerably reduce bandwidth & latency for large messages #654
base: master
Shouldn't this be v1.3?
Actually, there was an open PR with v1.3; the idea was to set an appropriate number once it's considered ready for merge.
OK, fair enough.
The purpose of the preamble is to allow receivers to instantly learn about the incoming message. The preamble must include the message ID and length, providing receivers with immediate access to critical information about the incoming message.
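As a rough illustration of what a receiver might keep after a preamble arrives, here is a minimal sketch assuming the preamble carries only the message ID and the announced length, as required above. The class and field names (`Preamble`, `PendingTransfers`, `msg_length`) are hypothetical and not part of the draft's protobuf schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Preamble:
    """Hypothetical preamble contents: the message ID and the announced length."""
    msg_id: bytes
    msg_length: int  # announced size of the full message, in bytes


@dataclass
class PendingTransfers:
    """Messages announced by a preamble whose body has not arrived yet."""
    expected: dict = field(default_factory=dict)  # msg_id -> announced length

    def on_preamble(self, p: Preamble) -> bool:
        """Record the announcement; True only for the first preamble for this ID."""
        if p.msg_id in self.expected:
            return False
        self.expected[p.msg_id] = p.msg_length
        return True

    def on_message(self, msg_id: bytes, body: bytes) -> bool:
        """Finish a transfer; False if the body length contradicts the preamble.

        Messages that were never announced (the plain v1.2 path) are accepted.
        """
        announced = self.expected.pop(msg_id, None)
        return announced is None or announced == len(body)
```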
One issue is that, as the protobuf schema is designed, you will have to download the whole message in order to access the preamble. Look at how control messages are represented in the RPC message:
https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md#protobuf
https://github.com/libp2p/specs/tree/master/pubsub#the-rpc
The `control` field is numbered after the full published messages, so you would have to download the whole message before you could access the preamble.
Never mind, I misunderstood this. The preamble is an RPC message sent separately, beforehand.
### IMReceiving Message
The IMReceiving message serves a distinct purpose compared to the IDONTWANT message. An IDONTWANT can only be transmitted after receiving the entire message.
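To make the distinction concrete, here is a sketch of the forwarding decision a sender might make, assuming it keeps per-peer sets of message IDs seen in IDONTWANT and IMRECEIVING messages. Because IMRECEIVING can arrive while the body is still in flight, it can suppress a redundant send earlier than IDONTWANT can. All names are illustrative, not taken from any implementation.

```python
from collections import defaultdict


class PeerSignals:
    """Per-peer message IDs that mesh peers have asked us not to send."""

    def __init__(self):
        self.dont_want = defaultdict(set)  # peer -> IDs from IDONTWANT (body fully received)
        self.receiving = defaultdict(set)  # peer -> IDs from IMRECEIVING (body in flight)

    def on_idontwant(self, peer, msg_id):
        self.dont_want[peer].add(msg_id)

    def on_imreceiving(self, peer, msg_id):
        self.receiving[peer].add(msg_id)

    def forward_targets(self, mesh_peers, msg_id, source):
        """Mesh peers that should still receive a copy of msg_id."""
        return [
            p for p in mesh_peers
            if p != source
            and msg_id not in self.dont_want[p]
            and msg_id not in self.receiving[p]
        ]
```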
While I understand the distinction here of IMReceiving compared with IDONTWANT, and having this broadcast earlier, how effective would this be in practice? One issue we have seen is that a control message takes a while to be processed by the gossip router even after it has been received, due to head-of-line (HOL) blocking. So by the time you process the control message, your mesh peers might already have sent the actual message.
For the scenarios we tested, sending IMReceiving significantly increases the probability that mesh peers can stop unnecessary message sends, since enough IMReceiving messages get through in time.
Still, this is definitely something we will look out for in our experiments, and we will check for scenarios where HOL blocking might have a severe impact. We also have Ethereum-focused tests and analyses on the roadmap.
With QUIC as the transport and multiplexer, we can further reduce the HOL impact.
> For the scenarios we tested, sending IMReceiving significantly increases the probability of mesh peers being able to stop unnecessary message sends since enough IMReceiving go through in time.

Is there more information on the scenarios tested? For example, how many different topics nodes were subscribed to, and how many messages were being published per second on these topics.
> While I understand the distinction here of IMReceiving compared with IDONTWANT and having this broadcasted earlier, how effective would this be in practice? One issue we have seen is that an actual control message takes a while to be processed by the gossip router even after it has been received due to HOL blocking. So by the time you process the control message, the actual message might already be sent by your mesh peers.

Yes, that is why we still see duplicates, averaging around 1.8 per peer in the network. Proper prioritization of preamble/IDONTWANT messages should further lower the number of duplicates.
> Is there more information on the scenarios tested? Ex: How many different topics nodes were subscribed to along with how many messages were being published per second on these topics.

All 1,500 peers were subscribed to a single topic. Twelve messages were introduced, each by a different publisher, with each publisher waiting 3 seconds before sending the next message. Messages larger than 600 KB take more time to reach all peers, building up outgoing message queues at many peers.
The problem with this is that there is no limit on the number of IWANTs you can send for the same message. Thus, you send an IWANT to each of the nodes that sends an IHAVE with a message ID you haven't received (yet). This should be limited to an …
Let me suggest this alternative, using IDONTWANT instead of introducing the new IAMRECEIVING message:
This requires another IWANT in case a message is not delivered.
Yes, that is one big issue!
Yes, this is part of the solution, but it also requires that replying to IWANT requests be made mandatory (at least for large messages), and the preamble can further limit IWANT requests!
It is already "mandatory": not replying to a received IWANT message penalizes your score.
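One way to bound IWANTs per message is sketched below, assuming a `max_iwant_requests`-style limit (the draft's tentative default is 1) and a retry to the next IHAVE announcer when a reply times out. The class name, timeout value, and retry order are hypothetical choices, not part of the draft.

```python
class IWantLimiter:
    """Allows at most `max_requests` outstanding IWANTs per message ID."""

    def __init__(self, max_requests=1, timeout=1.0):
        self.max_requests = max_requests
        self.timeout = timeout
        self.outstanding = {}  # msg_id -> {peer: deadline}
        self.candidates = {}   # msg_id -> IHAVE announcers not asked yet, in arrival order

    def on_ihave(self, peer, msg_id, now):
        """Return the peer to send an IWANT to now, or None if the limit is reached."""
        pending = self.outstanding.setdefault(msg_id, {})
        if peer in pending:
            return None
        if len(pending) < self.max_requests:
            pending[peer] = now + self.timeout
            return peer
        self.candidates.setdefault(msg_id, []).append(peer)
        return None

    def on_tick(self, now):
        """Expire timed-out requests and retry with the next announcer."""
        retries = []
        for msg_id, pending in self.outstanding.items():
            for peer in [p for p, deadline in pending.items() if deadline <= now]:
                del pending[peer]
            queue = self.candidates.get(msg_id, [])
            while queue and len(pending) < self.max_requests:
                nxt = queue.pop(0)
                pending[nxt] = now + self.timeout
                retries.append((nxt, msg_id))
        return retries

    def on_message(self, msg_id):
        """Body arrived: drop all bookkeeping for this ID."""
        self.outstanding.pop(msg_id, None)
        self.candidates.pop(msg_id, None)
```

The timeout-driven retry is what covers the "another IWANT in case a message is not delivered" case while still keeping only one request in flight.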
I'd be keen to have some small upgrades like this one before jumping into something bigger.
While the fundamental purpose of IDONTWANT messages is: …
However, the use of IDONTWANT messages can be tailored to serve either of the following two purposes:
This is an interesting feature. We were doing experiments in a similar direction with @Menduist (here he mentions this potential feature). For comparison, here are my simulation results for three options:
The results are pretty close to yours, and they are really impressive. However, there were some security concerns regarding this feature. If I remember correctly, the major concern was a kind of amplification attack: by sending a single preamble message, an attacker would cause sending many
Thank you @Nashatyrev for your encouraging feedback. It's great to see that both results are similar and indeed promising. Yes, using "Preamble+IMRECEIVING" messages is similar to using "Notify+IDONTWANT_on_Notify". However, the use of IMRECEIVING can help achieve additional resilience against amplification attacks. Here's how it works:
Thanks for taking the time to draft this out and simulate it. I have reviewed the spec, and I have a couple of concerns:
I think all our existing control messages are about behaviors between two peers. This preamble is the first message that affects the behavior between the remote peer and the remote peer's mesh. This makes me uneasy as it may be a new attack surface.
Here are just two attacks that I could think of that exploit this:
Remember that any peer can push a message to you. This is how flood publishing works.
- A kind of eclipse attack:
  1. A malicious peer (not necessarily in your mesh) sends you a preamble with an important msg id that is time-sensitive.
     - The malicious peer can also set a large (max) msg size here to force you to delay even more.
  2. You send your peers IMRECEIVING.
  3. The malicious peer doesn't give you the data.
  4. Right as your estimated download timer expires, a different malicious peer sends you a preamble with the important msg id.
  5. You repeat from step 2. Each iteration causes a delay, until the message has expired.
- There may also be an amplification attack, depending on the exact semantics of IMRECEIVING and IDONTWANT:
  1. A malicious peer (not necessarily in your mesh) sends you a preamble with a large enough msg size.
  2. You send your peers IMRECEIVING.
  3. The malicious peer lied about the message size, but the id is correct for the given message.
  4. Do you send IDONTWANT to your peers?
     - You have the message referenced by the message id, so it's true you don't want them to send you that message id.
     - If you do send IDONTWANT, then this message caused 2 control messages per peer.
     - And what happens if this message is really small?
     - I can force you to send (D-1)*2 control messages per message.
  5. Would downscoring help if I lied about the message size?
     - Maybe, but what if there were many malicious peers? This only costs them 1 small message.
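The (D-1)*2 figure can be checked with a one-line estimate. This sketch assumes the fake preamble comes from a mesh peer (so the other D-1 mesh peers each receive one IMRECEIVING and one IDONTWANT) and ignores batching of several control messages into a single RPC; the function name is hypothetical.

```python
def fake_preamble_cost(d: int, fake_preambles: int = 1) -> int:
    """Control messages a victim emits for `fake_preambles` lying preambles."""
    per_preamble = 2 * (d - 1)  # one IMRECEIVING + one IDONTWANT per other mesh peer
    return per_preamble * fake_preambles


# With D = 8, each small fake preamble triggers 14 control messages.
print(fake_preamble_cost(8))
```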
You may be able to mitigate these somewhat by only accepting preambles from peers in your mesh.
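A minimal sketch of that mitigation, combined with a cap on unfinished transfers per sender in the spirit of the draft's tentative `peer_preamble_announcements` parameter (default 1). The `PreambleGate` class and its policy are hypothetical, not the draft's specified behaviour.

```python
class PreambleGate:
    """Decides whether to act on an incoming preamble (illustrative policy)."""

    def __init__(self, mesh_peers, per_peer_limit=1):
        self.mesh_peers = set(mesh_peers)
        self.per_peer_limit = per_peer_limit  # cf. peer_preamble_announcements
        self.unfinished = {}                  # peer -> msg IDs announced but not yet received

    def accept(self, peer, msg_id) -> bool:
        if peer not in self.mesh_peers:
            return False  # ignore preambles from peers outside the mesh
        pending = self.unfinished.setdefault(peer, set())
        if msg_id in pending or len(pending) >= self.per_peer_limit:
            return False  # duplicate, or too many unfinished transfers from this peer
        pending.add(msg_id)
        return True

    def finished(self, peer, msg_id):
        """Call when the body arrives (or the transfer is abandoned)."""
        self.unfinished.get(peer, set()).discard(msg_id)
```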
I think figuring out the proper timeouts might be tricky. Too lenient and you amplify the problems above. Too strict and you may unnecessarily penalize honest peers. The proper timeout for your peers seems trickier. They need to estimate the bandwidth between you and some other unknown peer, and the RTT between them and you.
Hello @MarcoPolo, many thanks for reviewing this draft. Yes, we have already considered and addressed the highlighted concerns in this proposal. The draft remains open for any further suggestions or revisions. To clarify the workflow outlined in this proposal: the proposed changes apply only to large messages. For smaller messages, we use standard GossipSub v1.2 operation (no preamble/IMRECEIVING needed).
Message forwarding for large messages
IHAVE/IWANT processing for large messages
Regarding floodpublish for large messages
Yes, this, along with negative scoring, `outstanding_preamble_limit`, and `defer_interval`, mitigates most of the problems.
Talking about message count, we have approximately
For a 1MB message, we noticed up to D duplicates (approximately two of them coming from IWANT replies). The proposed approach reduces duplicates to under 2, as the IMRECEIVING message almost eliminates message contention time. Talking about preamble/IMReceiving transmission counts, we have approximately
We can even use IDONTWANT messages as a preamble (provided that the IDONTWANT announcement is immediately followed by the message transmission).
Thanks for pointing this out. IMO, the initial preamble announcements are typically made by high-resource peers, so adjusting timeouts based on the usual network averages can be sufficient. A per-topic lenient/aggressive strategy could also be a viable option.
Yes, that would require changing the IDONTWANT semantics into a promise. That is why a separate preamble is used here.
Another concern that came to mind is that an adversary could slow down message reception for a remote peer without score penalization: after sending a preamble, the adversary may just slow down the subsequent message transfer, which is basically acceptable behaviour (the remote peer's outbound bandwidth may simply be saturated).
pubsub/gossipsub/gossipsub-v1.4.md
| Parameter | Purpose | Default |
| --- | --- | --- |
| `peer_preamble_announcements` | The maximum number of preamble announcements for unfinished transfers per peer | 1??? |
| `mesh_preamble_announcements` | The maximum number of preamble announcements to accept for unfinished transfers per topic per heartbeat interval | 3??? |
Do you think we need separate parameters for these limits? I would treat skipping a message body or supplying a message of a different length as a regular protocol violation (tracked with `behaviourPenaltyWeight`). That should downscore a bad peer pretty quickly.
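A sketch of treating a length mismatch as a behavioural violation, modelled on the GossipSub v1.1 P7 score component (a decaying counter, squared above a threshold, times a negative `behaviourPenaltyWeight`). The threshold mirrors what some implementations expose; the weight, decay, and `check_body` helper are illustrative assumptions, not recommended values.

```python
class BehaviourPenalty:
    """Decaying misbehaviour counter scored like GossipSub v1.1's P7."""

    def __init__(self, weight=-10.0, decay=0.9, threshold=0.0):
        if weight >= 0:
            raise ValueError("behaviour penalty weight must be negative")
        self.weight = weight
        self.decay = decay
        self.threshold = threshold
        self.counter = 0.0

    def on_violation(self):
        self.counter += 1.0

    def decay_tick(self):
        """Apply once per decay interval."""
        self.counter *= self.decay

    def score(self) -> float:
        excess = self.counter - self.threshold
        return self.weight * excess * excess if excess > 0 else 0.0


def check_body(announced_len: int, body: bytes, penalty: BehaviourPenalty) -> bool:
    """Hypothetical hook: penalize a body that contradicts the preamble's length."""
    if announced_len != len(body):
        penalty.on_violation()
        return False
    return True
```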
pubsub/gossipsub/gossipsub-v1.4.md
| Parameter | Purpose | Default |
| --- | --- | --- |
| `peer_preamble_announcements` | The maximum number of preamble announcements for unfinished transfers per peer | 1??? |
| `mesh_preamble_announcements` | The maximum number of preamble announcements to accept for unfinished transfers per topic per heartbeat interval | 3??? |
| `max_iwant_requests` | The maximum number of simultaneous IWANT requests for a message | 1??? |
| `preamble_threshold` | The smallest message size to use message preamble | 200KB??? |
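If these stay implementation decisions, they could be carried as a plain config object. The defaults below only copy the draft's tentative values (marked `???` in the table) and are not recommendations; reading "200KB" as 200,000 bytes is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreambleConfig:
    """Hypothetical config mirroring the draft's tentative parameter table."""
    peer_preamble_announcements: int = 1
    mesh_preamble_announcements: int = 3
    max_iwant_requests: int = 1
    preamble_threshold: int = 200_000  # bytes; "200KB" read as 1 KB = 1000 bytes

    def use_preamble(self, msg_size: int) -> bool:
        """Large messages get a preamble; smaller ones follow plain v1.2 behaviour."""
        return msg_size >= self.preamble_threshold
```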
Do we really need this to be parametrized in the spec? Could it be just an implementation decision?
> Do you think we need separate parameters for these limits? I would treat skipping a message body or supplying a message of a different length as a regular protocol violation (tracked with `behaviourPenaltyWeight`). That should downscore a bad peer pretty quickly.
Yes, that suffices with behavior penalties.
Limiting the maximum number of unfinished (ongoing) transfers by one sender (`peer_preamble_announcements`) can provide added protection in some cases, especially when message counts per heartbeat interval are high. But such peers will get swapped out soon.
`mesh_preamble_announcements` is probably not needed (will remove in the next commit).
> Do we really need this to be parametrized in the spec? Could it be just an implementation decision?
Just for feedback on reasonable defaults.
In general, this idea makes a lot of sense. You can send a small control message to let peers know you're in the process of receiving a large message, and please don't send me more copies. This might even benefit the "wait" strategies discussed here: https://ethresear.ch/t/pppt-fighting-the-gossipsub-overhead-with-push-pull-phase-transition/22118, since you can wait a much shorter time to see whether a peer sends you an IMRECEIVING message. The biggest problems I see are:
But here are the things I like about this proposal:
In an effort to improve the proposal and not just naysay, here is a rough outline of some small modifications that could fix the problems while maintaining the benefits. On simplifying IMRECEIVING handling:
On setting timeouts and message length: there's an easy way to do this today without any spec changes. You simply encode the message length as part of the message ID. This prevents peers from lying about the size of a message. Implementations SHOULD provide this message ID to a function that returns the timeout for receiving a given message. Implementations SHOULD also provide the remote peer's ID to this function. This timeout function MAY be implemented by the user and MAY use a node's estimated bandwidth along with other connection information related to the peer. Importantly, this does not have to be a part of the protocol spec. It is just a recommendation to implementors.
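Such a user-supplied timeout function could look like the sketch below. It assumes a message ID laid out as a u64 big-endian length followed by a 32-byte hash, per-peer throughput and RTT estimates maintained elsewhere, and a slack factor of 2; all of these, and the function names, are illustrative assumptions.

```python
import struct


def announced_length(msg_id: bytes) -> int:
    """Read the message length from the first 8 bytes of the ID (assumed layout)."""
    (length,) = struct.unpack(">Q", msg_id[:8])
    return length


def receive_timeout(
    msg_id: bytes,
    peer_id: str,
    bytes_per_s: dict,
    rtt_s: dict,
    slack: float = 2.0,
    default_bytes_per_s: float = 1_000_000.0,
    default_rtt_s: float = 0.1,
) -> float:
    """Seconds to wait for a body from peer_id before giving up on its preamble."""
    rate = bytes_per_s.get(peer_id, default_bytes_per_s)  # estimated throughput from this peer
    rtt = rtt_s.get(peer_id, default_rtt_s)
    return rtt + slack * announced_length(msg_id) / rate
```

A larger `slack` tolerates honest peers with temporarily saturated uplinks, at the cost of giving a stalling peer more time.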
Thank you @MarcoPolo, for bringing this up. I believe preamble/IMReceiving is a perfect fit for DAS. We are already doing some experiments on this. The transmission time for preamble/IMReceiving is nearly negligible. So, link latency is the only contention time we face. This gives us two distinct advantages:
So, we can achieve extremely low duplicate counts for huge messages without compromising latency.
Yes, this is the most challenging part to handle.
I guess we can mitigate this issue. For instance, other mesh members may notice false-length advertisements from the IMReceiving announcement and fall back to GossipSub v1.2 for that message.
This is a good suggestion. Thank you for taking the time to propose this improvement.
We still receive IDONTWANT messages from peers, so there's no strict need to send IHAVEs in response to IMReceiving. In this proposal, we can clarify that peers can send IWANT directly to mesh members.
We can leave the push-pull transition choice to the application. One possible approach is to include a flag in the IMReceiving message to indicate mode selection (omitting the length field could also indicate pull-mode selection).
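Both signalling options mentioned above can be sketched in one message type: an explicit flag, or an omitted length read as a pull-mode request. The `pull` field and the `Mode` enum are hypothetical; the draft does not define them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    PUSH = "push"  # default IMReceiving behaviour in the draft
    PULL = "pull"  # the announcing peer asks to switch to pull (IWANT-driven) delivery


@dataclass(frozen=True)
class IMReceiving:
    msg_id: bytes
    length: Optional[int] = None  # omitted length is read as a pull request here
    pull: bool = False            # explicit mode flag (hypothetical field)

    @property
    def mode(self) -> Mode:
        return Mode.PULL if self.pull or self.length is None else Mode.PUSH
```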
Just to clarify my intent, a peer does not send an …
Looks like a good option to me! Another option is to add the
This sounds overcomplicated to me, and I don't see how it prevents sending an invalid size. An adversary may still send an ID with an invalid size encoded.
Yes, but they will be giving you an ID for something else. They can never give you an ID for a real message with a wrong size. For example, say your ID function is simply msg_len_u64 + 32 bytes from sha256 (in reality you may want to use varint encoding for length). A message of length 512 bytes that hashes to … A malicious peer can lie about the message size, but that will result in a different ID. If they lie and send a preamble for … Contrast this with not including the message length as part of the ID. A malicious peer can lie about the message size for …
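The binding argument can be sketched with the ID function described above: an 8-byte big-endian length followed by the SHA-256 digest (fixed-width here for simplicity, rather than a varint). Helper names are hypothetical. A preamble whose size claim disagrees with the ID is detectably inconsistent, and an ID carrying a forged length never matches the real body.

```python
import hashlib
import struct


def message_id(data: bytes) -> bytes:
    """u64 big-endian length || SHA-256(data): 40 bytes in total."""
    return struct.pack(">Q", len(data)) + hashlib.sha256(data).digest()


def preamble_consistent(msg_id: bytes, announced_len: int) -> bool:
    """A preamble's size claim must match the length baked into the ID."""
    return struct.unpack(">Q", msg_id[:8])[0] == announced_len


def body_matches(msg_id: bytes, data: bytes) -> bool:
    """The delivered body must reproduce the ID exactly."""
    return message_id(data) == msg_id
```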
@MarcoPolo I think I got you.
I'm more concerned about an attack where a malicious node sends a correct preamble and then slows down the actual message transfer. A single malicious node in your mesh is enough to slow down every message. And it's not that complicated to 'infect' the meshes of the majority of nodes in the network and significantly slow down message propagation globally.
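One partial heuristic is to watch the transfer rate after a preamble and, once it drops below a floor, stop waiting and request the message elsewhere. This is only a sketch: the floor, the grace period, and the class name are assumptions, and, as noted later in the thread, any such threshold risks penalizing honest peers whose uplink is temporarily saturated.

```python
class TransferMonitor:
    """Flags a transfer whose average rate falls below a floor (illustrative policy)."""

    def __init__(self, expected_len, min_rate_bps, grace_s, start):
        self.expected_len = expected_len  # length announced in the preamble
        self.min_rate = min_rate_bps      # minimum acceptable bytes/second
        self.grace = grace_s              # no judgement during the first grace_s seconds
        self.start = start
        self.received = 0

    def on_bytes(self, n):
        self.received += n

    def too_slow(self, now) -> bool:
        elapsed = now - self.start
        if elapsed <= self.grace or self.received >= self.expected_len:
            return False
        return self.received / elapsed < self.min_rate
```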
… and IMReceiving message
…gurability, added safety strategy
1b537be to 22d4da8
I have updated the PR based on the feedback received. Key improvements include:
Many thanks for highlighting this important concern. We have tried to address it in the revised draft, and additional due diligence can further strengthen the defense mechanism. For example, PREAMBLE comes from early message recipients. Typically, they are faster peers, or at least better than the mesh average. Since there is no benefit in processing a PREAMBLE received from a slower peer, we can use observed peer performance to accept PREAMBLE only from peers that are better than the mesh average (mesh average profiling is trivial with PREAMBLE).
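A minimal sketch of that acceptance rule, assuming each mesh peer is profiled by an average delivery delay (lower is better) maintained elsewhere; the metric, the function name, and the choice to accept from unprofiled peers are all hypothetical.

```python
from statistics import mean


def accept_preamble_from(peer, delivery_delay_s: dict) -> bool:
    """Accept a PREAMBLE only if this peer is at least as fast as the mesh average."""
    if peer not in delivery_delay_s or len(delivery_delay_s) < 2:
        return True  # no usable profile yet: fall back to accepting
    return delivery_delay_s[peer] <= mean(delivery_delay_s.values())
```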
@ufarooqstatus tbh I still see no reliable mechanism to address slow transfer after a preamble. All the solutions look pretty fuzzy to me. You would either stay vulnerable or penalize good peers with occasionally saturated outbound bandwidth. To move the changes in this PR forward, it may be better to:
This extension considerably reduces bandwidth utilization and network-wide message dissemination time for large messages.
Problem with existing approach (GossipSub v1.2):
Solution (Proposed extension):
More context available here