fix: robust Modbus Receive with desync detection and resync ability #2362

shaunco · 2025-11-25T05:45:09Z

In a rather large deployment that includes ~30 Global Control 5 iSMA-B-MIX38-IP gateways that each have 10-30 Conto D6 energy meters on the RTU bus behind them, and PLC4Go running on a Yocto device, we're seeing two issues:

1. Somewhat randomly, the iSMA leaks RTU CRC16 values onto the TCP stream but doesn't include them in the MBAP length field (I suppose it is technically "correct" not to include them in the length, haha?). This causes MessageCodec to consume the MBAP+PDU properly, but then leave the CRC16 in the buffer for the next .Receive() call to deal with. The old code obviously didn't like this.
2. When the TCP connection to the iSMA is sitting unused in the connection cache, TCP keep-alives are happening. These keep-alives are <60 bytes, so the iSMA pads the Ethernet frame. That padding length is properly listed in the Ethernet header, and the NUL bytes correctly follow the TCP header (+0 byte TCP payload) as they should. The NIC and/or kernel should discard these extra NUL bytes, but they seem to randomly appear in our receive buffer. (I will separately dig into why this is happening on our specific hardware+Yocto 🫠)

To deal with both of these, I've updated MessageCodec to try its best to detect these scenarios where the stream becomes desynchronized and to burn bytes in an attempt to resynchronize the stream.
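To make the idea concrete, here is a minimal sketch of desync detection and byte burning. It is not the code in this PR: `plausibleMBAP` and `bytesToDiscard` are hypothetical names, and the sanity rules (protocol id 0, length field between 2 and 254) are assumptions taken from the Modbus TCP framing rules rather than from the MessageCodec changes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// mbapHeaderLen is the size of a Modbus TCP header: transaction id (2),
// protocol id (2), length (2), unit id (1).
const mbapHeaderLen = 7

// plausibleMBAP reports whether buf starts with something that looks like an
// MBAP header. Assumed rules: protocol id is 0, and the length field (unit id
// plus PDU) is between 2 and 254.
func plausibleMBAP(buf []byte) bool {
	if len(buf) < mbapHeaderLen {
		return false
	}
	protocolID := binary.BigEndian.Uint16(buf[2:4])
	length := binary.BigEndian.Uint16(buf[4:6])
	return protocolID == 0 && length >= 2 && length <= 254
}

// bytesToDiscard returns how many leading bytes to burn so that buf starts at
// the next plausible header. If no candidate is found, it keeps the last
// mbapHeaderLen-1 bytes, since they may be the start of a header that has not
// fully arrived yet (a choice made for this sketch).
func bytesToDiscard(buf []byte) int {
	for i := 0; i+mbapHeaderLen <= len(buf); i++ {
		if plausibleMBAP(buf[i:]) {
			return i
		}
	}
	if keep := mbapHeaderLen - 1; len(buf) > keep {
		return len(buf) - keep
	}
	return 0
}

func main() {
	// Two leaked bytes followed by a well-formed read request.
	stream := []byte{0xC5, 0xCD, 0x00, 0x02, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03, 0x00, 0x00, 0x00, 0x0A}
	n := bytesToDiscard(stream)
	fmt.Printf("discard %d byte(s), resume at % X\n", n, stream[n:])
}
```

A header check this loose can still lock onto a false positive inside garbage, so a real codec would want additional checks (for example, matching the transaction id against outstanding requests).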

One related additional issue, which I'll submit a fix for in a separate PR:

If Receive() is interrupted by a context cancellation/timeout and the connection lease is returned to the cache by the PLC4X user, the connection potentially has unprocessed bytes on it, and more data can arrive from slow responses while it sits unused in the cache. This means the Receive() stream may be polluted when the next lease is obtained, even though MessageCodec is unaware of the cache and is expecting an empty receive buffer. Depending on where the prior user left off, this also causes a desync. This PR corrects most of this dirty buffer issue, but the log is a bit noisy. The proper fix is to have the connection cache flush the receive buffer prior to handing out a lease... that is: the prior lessee didn't want the bytes, and the new lessee isn't expecting them. Someone has to clean up.
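As a rough illustration of what "flush before handing out a lease" could look like, here is a sketch that drains a `net.Conn` until it has been quiet for a short window. `drainStale` and `demoDrain` are hypothetical names, not part of the PLC4Go connection cache, and the quiet-window approach is just one assumed way to do it.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// drainStale reads and discards whatever is pending on conn until no data
// arrives for the given quiet window. It returns the number of bytes thrown
// away. A non-nil error (for example io.EOF) means the connection should not
// be leased out again.
func drainStale(conn net.Conn, quiet time.Duration) (int, error) {
	defer conn.SetReadDeadline(time.Time{}) // remove the deadline again
	buf := make([]byte, 512)
	total := 0
	for {
		if err := conn.SetReadDeadline(time.Now().Add(quiet)); err != nil {
			return total, err
		}
		n, err := conn.Read(buf)
		total += n
		if err != nil {
			var ne net.Error
			if errors.As(err, &ne) && ne.Timeout() {
				return total, nil // nothing more pending: the buffer is clean
			}
			return total, err
		}
	}
}

// demoDrain simulates a late response showing up on an idle connection.
func demoDrain() (int, error) {
	client, gateway := net.Pipe()
	defer client.Close()
	defer gateway.Close()
	go gateway.Write([]byte{0x00, 0x00, 0x00, 0xC5, 0xCD})
	return drainStale(client, 50*time.Millisecond)
}

func main() {
	n, err := demoDrain()
	fmt.Println("discarded bytes:", n, "error:", err)
}
```

The quiet window adds latency to every lease, so it would have to be kept small, and it cannot protect a lessee that is already holding the connection when stray bytes arrive.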

(EDIT: looks like @sruehl beat me to this next one)

If you look through all my changes (sorry, please squash), I started with just dealing with the CRC16 issue by calculating CRC16 and then peeking the next two bytes... but then ran into the second issue and realized the desync/sync should be more generic. That said, the next PR will catch the padding leak issue when the connection is sitting in cache, but it won't help with padding leaks if a lessee is holding the connection lease but making requests spaced enough that a keep-alive sneaks in.
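For reference, the first approach (calculate the CRC16, then peek the next two bytes) can be sketched roughly as below. The function names are hypothetical, and the sketch assumes the gateway leaks the RTU CRC unchanged, meaning it covers unit id + PDU and arrives low byte first.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// crc16Modbus computes CRC-16/MODBUS: initial value 0xFFFF, reflected
// polynomial 0xA001.
func crc16Modbus(data []byte) uint16 {
	crc := uint16(0xFFFF)
	for _, b := range data {
		crc ^= uint16(b)
		for i := 0; i < 8; i++ {
			if crc&1 != 0 {
				crc = (crc >> 1) ^ 0xA001
			} else {
				crc >>= 1
			}
		}
	}
	return crc
}

// hasTrailingCRC reports whether the two bytes following a complete Modbus
// TCP frame (MBAP + PDU) equal the RTU CRC of that frame. Assumption: the CRC
// covers the unit id and PDU, i.e. frame[6:], as on the RTU bus.
func hasTrailingCRC(frame, next []byte) bool {
	if len(frame) < 8 || len(next) < 2 {
		return false
	}
	return binary.LittleEndian.Uint16(next[:2]) == crc16Modbus(frame[6:])
}

func main() {
	// MBAP (7 bytes) + PDU: read 10 holding registers from unit 1.
	frame := []byte{0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03, 0x00, 0x00, 0x00, 0x0A}
	crc := crc16Modbus(frame[6:])
	leaked := []byte{byte(crc), byte(crc >> 8)} // low byte first, as on the RTU wire
	fmt.Println("trailing CRC detected:", hasTrailingCRC(frame, leaked))
}
```

This only recognizes the CRC leak; it does nothing for padding bytes, which is why the more generic desync/resync handling replaced it.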

chrisdutz · 2025-11-25T05:59:26Z

Could you please add a bit more information on what the problem is, that you're trying to solve? Possibly this is also interesting for other languages.

shaunco · 2025-11-25T17:03:04Z

> Could you please add a bit more information on what the problem is, that you're trying to solve? Possibly this is also interesting for other languages.

🤦‍♂️ yeah, I should have done that. I added details above.

chrisdutz · 2025-11-25T20:44:49Z

Geee ... Modbus seems to be the "Standard" with so many "mandatory" things that nobody really seems to care about. I know in PLC4J we have some code to try to recover from situations like this (however, we have that in the SPI; no idea whether we're using it in the Modbus driver) ... if you want, I can have a look after I return from the industry fair I'm currently at (would be Thursday) ... if you're in a hurry, I have no objections to merging this PR ... I'll just try and have a look if anything needs porting to other languages.

shaunco · 2025-11-25T21:26:12Z

I had considered whether there was some way to have the mspec protocol generators add new protocol functions like CheckSync() that could peek bytes from the receive buffer, using mspec knowledge, to declare the stream sane or not ... and if not, something like Resync() that, again using mspec knowledge, could return how many bytes to discard from the stream to attempt to become sane again. It seemed silly to only have this on Modbus (Ethernet padding leak could hit any protocol) and similarly silly to reimplement MBAP parsing/validation in the MessageCodec when most of that is sprinkled throughout the existing generated protocol code.
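Roughly, the shape of that idea could be something like the sketch below. Everything here is hypothetical: `StreamSynchronizer`, `realign`, and the toy `magicFramed` protocol (frames starting with a fixed two-byte marker) are invented for illustration, and nothing like this exists in mspec or the generators today.

```go
package main

import (
	"bytes"
	"fmt"
)

// StreamSynchronizer is a hypothetical per-protocol hook that generated code
// could implement using its knowledge of the frame layout.
type StreamSynchronizer interface {
	// CheckSync reports whether the peeked bytes start with a sane frame.
	CheckSync(peeked []byte) bool
	// Resync returns how many leading bytes to discard to try to get sane again.
	Resync(peeked []byte) int
}

// magicFramed is a toy protocol whose frames start with a fixed marker.
type magicFramed struct{ magic []byte }

func (m magicFramed) CheckSync(peeked []byte) bool {
	return bytes.HasPrefix(peeked, m.magic)
}

func (m magicFramed) Resync(peeked []byte) int {
	if i := bytes.Index(peeked, m.magic); i >= 0 {
		return i
	}
	// No marker found: keep a tail that could be a partially received marker.
	if keep := len(m.magic) - 1; len(peeked) > keep {
		return len(peeked) - keep
	}
	return 0
}

// realign is what a protocol-agnostic codec could do before parsing.
func realign(s StreamSynchronizer, buffered []byte) []byte {
	if s.CheckSync(buffered) {
		return buffered
	}
	return buffered[s.Resync(buffered):]
}

func main() {
	s := magicFramed{magic: []byte{0xAA, 0x55}}
	fmt.Printf("% X\n", realign(s, []byte{0x00, 0x00, 0x00, 0xAA, 0x55, 0x01}))
}
```

The codec side stays generic, and only the two methods would need protocol knowledge.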

The issue was I don't know nearly enough about mspec or the generators, and I only had this one live environment with Modbus to test against.

I'm not in a rush if you want to hold on merging this for a better or more generalized method of handling this.

shaunco and others added 11 commits November 22, 2025 15:45

fix: harden Modbus Receive against truncated/extended frames

f2b91fe

- retry parsing with full buffered data when initial MBAP length is wrong
- treat EOF as incomplete data and avoid dropping partial frames
- keep discarding truly unparsable packets with diagnostic logging

fix: watch for and discard trailing CRCs from misbehaving gateways

77ef19d

fix: attempt to resynchronize the read stream if desynchronized

fix: additional sanity checks on the MBAP

9a256e4

fix: io.EOF is a trap, we checked fragmentation above

fix: deal with TCP keep-alive padding that leaks from the kernel

f829886

refactor: attempting to simply logic while still covering desync handling

dfaedca

refactor: more robust consistency checks

3f48021

fix: Final consistency check case should discard all available bytes

8de94d7

refactor: reduce log spam

6735c7d

fix: handleDesync more robustly handles the padding leak issue

6727c48

fix: keep the last 5 bytes to avoid breaking fragmentation

92cc909

Merge branch 'apache:develop' into modbus-robustReceive

dce8726
