MQTT publish seems to prevent detection of connection loss #802
-
I'm using the Aws::Crt::Mqtt::MqttConnection::Publish method (aws-iot-device-sdk-cpp-v2 v1.32.0) to send data to the AWS MQTT backend approximately every second. Both OnConnectionInterrupted and OnDisconnect callbacks of MqttConnectionHandlers are registered to handle connection events. The connection is configured with the following settings:
The issue arises when the device enters suspend mode (e.g., system sleep). While suspended, the application is frozen and unable to perform any MQTT/TCP activity. During this period, the backend closes the connection due to timeout. However, because the application is frozen, it misses the disconnection. Upon resume, the application continues calling Publish() as usual. At this point:
This state persists for up to 30 minutes, until a socket timeout finally occurs. At that point, the SDK emits a connection interrupted event and reconnects. I would expect the keep-alive ping to be sent even during publish operations to allow timely detection of a dead connection. A log excerpt is provided below showing repeated publish failures, ping postponements, and eventual disconnection due to a socket timeout.
Replies: 2 comments 2 replies
-
There's no way to control this behavior, but I agree that it is undesirable. Ultimately, what's broken is that the ping push-out should only occur on receipt of acks for submitted operations (and the push-out should only be measured against the timestamp at which the acknowledged operation was sent). In your case, there would never be a push-out because nothing ever gets acked. The MQTT5 client does not have this behavior as far as I can tell, and given that it is also a superior protocol (and client implementation), I would switch to it.
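To make the difference concrete, here is a minimal sketch of the two push-out policies. It is a toy model, not the aws-c-mqtt code: `PingScheduler`, `Policy`, and `ping_fires_on_dead_link` are hypothetical names, the 30-second keep-alive is an assumed value, and "push out on every send" is only my reading of the behavior described in the report.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <unordered_map>

using Clock = std::chrono::steady_clock;
using TimePoint = Clock::time_point;
using Seconds = std::chrono::seconds;

// Hypothetical model of keep-alive ping scheduling (not the SDK's implementation).
class PingScheduler {
public:
    enum class Policy {
        PushOutOnSend,  // assumed old behavior: any outgoing operation delays the ping
        PushOutOnAck    // described fix: only an acknowledged operation delays the ping
    };

    PingScheduler(Policy policy, Seconds keep_alive, TimePoint start)
        : policy_(policy), keep_alive_(keep_alive), next_ping_(start + keep_alive) {
        assert(keep_alive.count() > 0);
    }

    // Record an outgoing operation (for example a PUBLISH) and its send time.
    void on_send(std::uint16_t packet_id, TimePoint sent_at) {
        sent_at_[packet_id] = sent_at;
        if (policy_ == Policy::PushOutOnSend) {
            push_out(sent_at);
        }
    }

    // Record an ack. The push-out is measured from the time the acked
    // operation was sent, not from the time the ack arrived.
    void on_ack(std::uint16_t packet_id) {
        auto it = sent_at_.find(packet_id);
        if (it == sent_at_.end()) {
            return;  // unknown or already acknowledged packet id
        }
        if (policy_ == Policy::PushOutOnAck) {
            push_out(it->second);
        }
        sent_at_.erase(it);
    }

    bool ping_due(TimePoint now) const { return now >= next_ping_; }

private:
    // Move the ping deadline later, never earlier.
    void push_out(TimePoint from) {
        TimePoint candidate = from + keep_alive_;
        if (candidate > next_ping_) {
            next_ping_ = candidate;
        }
    }

    Policy policy_;
    Seconds keep_alive_;
    TimePoint next_ping_;
    std::unordered_map<std::uint16_t, TimePoint> sent_at_;
};

// Simulates a dead link: one publish per second for `duration`, no acks ever.
// Returns true if the ping deadline is reached at any point.
bool ping_fires_on_dead_link(PingScheduler::Policy policy, Seconds keep_alive, Seconds duration) {
    TimePoint t0{};  // arbitrary epoch; the simulation needs no real clock
    PingScheduler scheduler(policy, keep_alive, t0);
    for (Seconds t{0}; t <= duration; t += Seconds{1}) {
        TimePoint now = t0 + t;
        if (scheduler.ping_due(now)) {
            return true;
        }
        scheduler.on_send(static_cast<std::uint16_t>(t.count() % 65536), now);
    }
    return false;
}
```

In this model, publishing once per second with no acks keeps moving the deadline under `PushOutOnSend`, so the ping never comes due within the simulated 30 minutes. Under `PushOutOnAck` the deadline stays put and the ping comes due once the keep-alive interval has elapsed.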
-
We addressed the ping request scheduling logic last year, and with the current implementation (assuming the fix is working as expected), ping requests should only be scheduled after receiving successful ACKs. Based on that, the behavior you're seeing shouldn't happen, or it may be unrelated to the publish operation itself. I ran a local test using a mock server that never sends PUBACKs or PINGRESPs, and in that case the client was still able to initiate pings while publishes were ongoing. One area worth checking is whether the device clock might have been affected by a system suspend/resume event. The client uses timestamps to determine when to send a ping (code link: https://github.com/awslabs/aws-c-mqtt/blob/main/source/client_channel_handler.c#L96). (That said, it still doesn't explain why the ping task ran ~7 seconds after a publish operation.) To investigate further, could you please share the log with TRACE level enabled? That would give us more insight into what's happening.
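If it helps with the clock question, below is a small Linux-only sketch for checking whether a suspend happened between two points in the application. It relies on the documented Linux behavior that `CLOCK_MONOTONIC` does not advance while the system is suspended and `CLOCK_BOOTTIME` does, so the gap between the two grows by roughly the time spent suspended. The helper names (`read_clock_ns`, `suspended_ns`, `suspend_occurred`) are hypothetical, and I'm not claiming here which clock the client itself reads.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>  // clock_gettime, CLOCK_MONOTONIC, CLOCK_BOOTTIME (Linux)

// Reads a POSIX clock as nanoseconds. Returns false if the clock is unavailable.
bool read_clock_ns(clockid_t id, std::int64_t& out_ns) {
    timespec ts{};
    if (clock_gettime(id, &ts) != 0) {
        return false;
    }
    out_ns = static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
    return true;
}

// Approximate total time spent suspended since boot, in nanoseconds,
// computed as CLOCK_BOOTTIME minus CLOCK_MONOTONIC.
bool suspended_ns(std::int64_t& out_ns) {
    std::int64_t mono = 0;
    std::int64_t boot = 0;
    if (!read_clock_ns(CLOCK_MONOTONIC, mono) || !read_clock_ns(CLOCK_BOOTTIME, boot)) {
        return false;
    }
    out_ns = boot - mono;
    return true;
}

// Compares two suspended_ns() samples taken at different times. A growth
// larger than the tolerance suggests the system was suspended in between.
bool suspend_occurred(std::int64_t before_ns, std::int64_t after_ns, std::int64_t tolerance_ns) {
    assert(tolerance_ns >= 0);
    return after_ns - before_ns > tolerance_ns;
}
```

Sampling `suspended_ns()` before each publish and logging when `suspend_occurred()` returns true would show whether the stalled period lines up with a resume from suspend.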
The issue was fixed in April 2024, in this release: https://github.com/awslabs/aws-c-mqtt/releases/tag/v0.10.4
So if you're using 1.32.0 (which is from January 2024), you have the old, bad behavior; updating should fix it.