
@shaunco
Contributor

@shaunco shaunco commented Nov 25, 2025

In a rather large deployment that includes ~30 Global Control 5 iSMA-B-MIX38-IP gateways that each have 10-30 Conto D6 energy meters on the RTU bus behind them, and PLC4Go running on a Yocto device, we're seeing two issues:

  1. Somewhat randomly, the iSMA leaks RTU CRC16 values onto the TCP stream but doesn't include them in the MBAP length field (I suppose it is technically "correct" not to include them in the length, haha?). MessageCodec consumes the MBAP+PDU properly, but then leaves the CRC16 bytes in the buffer for the next .Receive() call to deal with. The old code obviously didn't like this.
  2. When the TCP connection to the iSMA is sitting unused in the connection cache, TCP keep-alives are exchanged. These keep-alive frames are under 60 bytes, so the iSMA pads the Ethernet frame up to the minimum size. The headers correctly account for that padding, and the NIL bytes follow the TCP header (+0-byte TCP payload) as they should. The NIC and/or kernel should discard these extra NIL bytes, but they seem to randomly appear in our receive buffer. (I will separately dig into why this is happening on our specific hardware+Yocto 🫠)
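For reference, the CRC16 peek for the first issue can be sketched in Go roughly like this. It's a minimal illustration, not the code in this PR: `modbusCRC16` and `isLeakedRTUCRC` are hypothetical names, and it assumes the gateway leaks the CRC exactly as it appeared on the RTU bus (computed over unit ID + PDU, low byte first) and that the MBAP unit ID equals the RTU slave address.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// modbusCRC16 computes the Modbus RTU CRC (CRC-16/MODBUS: reflected
// polynomial 0xA001, initial value 0xFFFF, no final XOR).
func modbusCRC16(data []byte) uint16 {
	crc := uint16(0xFFFF)
	for _, b := range data {
		crc ^= uint16(b)
		for i := 0; i < 8; i++ {
			if crc&1 != 0 {
				crc = (crc >> 1) ^ 0xA001
			} else {
				crc >>= 1
			}
		}
	}
	return crc
}

// isLeakedRTUCRC (hypothetical) reports whether the two bytes peeked after a
// complete Modbus TCP ADU look like the RTU CRC16 of that frame.
// Assumption: the CRC covers unit ID + PDU and is sent low byte first.
func isLeakedRTUCRC(adu, next []byte) bool {
	const mbapHeaderLen = 7 // transaction ID(2) + protocol ID(2) + length(2) + unit ID(1)
	if len(adu) < mbapHeaderLen+1 || len(next) < 2 {
		return false
	}
	// The RTU frame is unit ID + PDU: the ADU minus the first 6 MBAP bytes.
	rtuFrame := adu[6:]
	return binary.LittleEndian.Uint16(next[:2]) == modbusCRC16(rtuFrame)
}

func main() {
	// MBAP (tx 1, proto 0, len 6, unit 1) + PDU: read 10 holding registers from address 0.
	adu := []byte{0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03, 0x00, 0x00, 0x00, 0x0A}
	crc := modbusCRC16(adu[6:])
	leaked := []byte{byte(crc), byte(crc >> 8)} // low byte first, as on the RTU wire
	fmt.Println(isLeakedRTUCRC(adu, leaked))    // true
}
```

A 16-bit match can still happen by chance (roughly 1 in 65,536 for arbitrary bytes), so a match is a strong hint rather than proof.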

To deal with both of these, I've updated MessageCodec to do its best to detect when the stream becomes desynchronized and to burn bytes in an attempt to resynchronize it.
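The generic version boils down to scanning for the next plausible MBAP header. A minimal sketch, assuming only the fixed MBAP fields are checked (protocol ID 0 and a length of 2 to 254, i.e. unit ID plus a PDU of at most 253 bytes); `plausibleMBAP` and `resyncOffset` are hypothetical names, and this is not the PR's MessageCodec logic:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	mbapHeaderLen = 7   // transaction ID(2) + protocol ID(2) + length(2) + unit ID(1)
	minMBAPLength = 2   // unit ID + at least a function code
	maxMBAPLength = 254 // unit ID + 253-byte maximum PDU
)

// plausibleMBAP (hypothetical) reports whether b starts with something that
// looks like an MBAP header. It is a heuristic and cannot prove a frame starts here.
func plausibleMBAP(b []byte) bool {
	if len(b) < mbapHeaderLen {
		return false
	}
	protocolID := binary.BigEndian.Uint16(b[2:4])
	length := binary.BigEndian.Uint16(b[4:6])
	return protocolID == 0 && length >= minMBAPLength && length <= maxMBAPLength
}

// resyncOffset (hypothetical) returns how many leading bytes of buf to discard
// so that buf starts at the next plausible MBAP header. If none is found, it
// keeps the last mbapHeaderLen-1 bytes, which may be the start of a header
// whose remainder has not arrived yet.
func resyncOffset(buf []byte) int {
	for i := 0; i+mbapHeaderLen <= len(buf); i++ {
		if plausibleMBAP(buf[i:]) {
			return i
		}
	}
	if len(buf) < mbapHeaderLen {
		return 0
	}
	return len(buf) - (mbapHeaderLen - 1)
}

func main() {
	// Four NIL padding bytes followed by a read-holding-registers request.
	padded := []byte{0, 0, 0, 0, 0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x11, 0x03, 0x00, 0x6B, 0x00, 0x03}
	fmt.Println(resyncOffset(padded)) // 4
}
```

Binary data can still look like a valid header, so checking the transaction ID against the outstanding request would tighten this. In this sketch a leaked two-byte CRC in front of a well-formed frame is skipped too, because at both of those offsets the length field reads as zero.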

One related additional issue, which I'll submit a fix for in a separate PR:

If Receive() is interrupted by a context cancellation/timeout and the PLC4X user returns the connection lease to the cache, the connection may still have unprocessed bytes on it, and more data can arrive from slow responses while it sits unused in the cache. The Receive() stream can therefore be polluted when the next lease is obtained, even though MessageCodec is unaware of the cache and expects an empty receive buffer. Depending on where the prior user left off, this also causes a desync. This PR corrects most of this dirty-buffer issue, but the log is a bit noisy. The proper fix is to have the connection cache flush the receive buffer before handing out a lease: the prior lessee didn't want the bytes, and the new lessee isn't expecting them. Someone has to clean up.

(EDIT: looks like @sruehl beat me to this next one)

If you look through all my changes (sorry, please squash), you'll see I started by handling only the CRC16 issue, calculating the CRC16 and then peeking at the next two bytes... but then I ran into the second issue and realized the desync/resync handling should be more generic. That said, the next PR will catch the padding leak when the connection is sitting in the cache, but it won't help with padding leaks if a lessee holds the connection lease while making requests spaced far enough apart that a keep-alive sneaks in.

@chrisdutz
Contributor

Could you please add a bit more information on the problem you're trying to solve? Possibly this is also interesting for other languages.

@shaunco
Contributor Author

shaunco commented Nov 25, 2025

> Could you please add a bit more information on the problem you're trying to solve? Possibly this is also interesting for other languages.

🤦‍♂️ yeah, I should have done that. I added details above.

@chrisdutz
Contributor

Geee ... Modbus seems to be the "standard" with so many "mandatory" things that nobody really seems to care about. I know in PLC4J we have some code to try to recover from situations like this (however, we have that in the SPI; no idea if we're using it in the Modbus driver) ... if you want, I can have a look after I return from the industry fair I'm currently at (that would be Thursday) .... if you're in a hurry, I have no objections to merging this PR ... I'll just have a look at whether anything needs porting to other languages.

@shaunco
Contributor Author

shaunco commented Nov 25, 2025

I had considered whether there was some way to have the mspec protocol generators add new protocol functions like CheckSync() that could peek bytes from the receive buffer and, using mspec knowledge, declare the stream sane or not ... and, if not, something like Resync() that, again using mspec knowledge, could return how many bytes to discard from the stream to try to become sane again. It seemed silly to have this only for Modbus (an Ethernet padding leak could hit any protocol), and similarly silly to reimplement MBAP parsing/validation in the MessageCodec when most of that is sprinkled throughout the existing generated protocol code.
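Roughly, such generated hooks might have this shape. Everything here is hypothetical: only the CheckSync()/Resync() names come from the idea above, while `StreamSyncer`, `mbapSyncer` and `alignStream` are made-up names, and the MBAP checks are hand-written stand-ins for what a generator would derive from the mspec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// StreamSyncer is a hypothetical per-protocol interface a code generator could
// emit. Nothing like it exists in PLC4X today.
type StreamSyncer interface {
	// CheckSync reports whether buf starts at a plausible frame boundary.
	CheckSync(buf []byte) bool
	// Resync returns how many leading bytes of buf to discard to reach the
	// next plausible frame boundary.
	Resync(buf []byte) int
}

// mbapSyncer hand-codes what a generator might derive for Modbus TCP from the
// MBAP header's fixed fields (7-byte header, protocol ID 0, length 2..254).
type mbapSyncer struct{}

func (mbapSyncer) CheckSync(buf []byte) bool {
	if len(buf) < 7 {
		return false
	}
	length := binary.BigEndian.Uint16(buf[4:6])
	return binary.BigEndian.Uint16(buf[2:4]) == 0 && length >= 2 && length <= 254
}

func (s mbapSyncer) Resync(buf []byte) int {
	for i := 0; i+7 <= len(buf); i++ {
		if s.CheckSync(buf[i:]) {
			return i
		}
	}
	if len(buf) < 7 {
		return 0 // too short to judge; wait for more bytes
	}
	return len(buf) - 6 // keep a possible partial header at the tail
}

// alignStream is protocol-agnostic: a message codec could call it before each
// parse attempt for any protocol that supplies a StreamSyncer.
func alignStream(s StreamSyncer, buf []byte) ([]byte, int) {
	if s.CheckSync(buf) {
		return buf, 0
	}
	n := s.Resync(buf)
	return buf[n:], n
}

func main() {
	// Two stray bytes (e.g. a leaked CRC16) in front of a well-formed frame.
	buf := []byte{0xC5, 0xCD, 0x00, 0x02, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03, 0x00, 0x00, 0x00, 0x0A}
	aligned, discarded := alignStream(mbapSyncer{}, buf)
	fmt.Println(discarded, len(aligned)) // 2 12
}
```

The appeal of this split is that the codec loop stays identical for every protocol; only the generated CheckSync/Resync differ, which fits the point that a padding leak could hit any protocol.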

The issue was that I don't know nearly enough about mspec or the generators, and I only had this one live environment with Modbus to test against.

I'm not in a rush if you want to hold off on merging this for a better or more generalized method of handling this.
