Support for long running message consumer #18456

shubham-Shole4ever · 2019-07-31T07:39:37Z

Is your feature request related to a problem? Please describe.
The ackTimeout is set at the consumer level and applies to every message that consumer handles. We have a case where consuming a message takes an unpredictable amount of time, ranging from 10 minutes to a couple of hours. We also don't want to set the ackTimeout to the maximum possible processing time (which could be half a day or more).
Can we have a feature where the consumer sends a signal back to the broker, indicating that it has not failed but is still working on the received message, so that the broker extends the ackTimeout for that message?

Describe the solution you'd like
A way for the consumer to notify the broker that it is still working on the received message. On receiving this signal, the broker can extend the ackTimeout for that particular message (probably by refreshing it).

Describe alternatives you've considered
Currently, there is no way to modify the ackTimeout for a particular message. The ackTimeout is set at the consumer level and cannot be modified for any message.

codelipenghui (Collaborator) · 2019-07-31T07:59:47Z

Please take a look at this document which may help you. http://pulsar.apache.org/docs/en/concepts-messaging/#negative-acknowledgement

shubham-Shole4ever (Author) · 2019-07-31T08:58:48Z

@codelipenghui I had a look at negative acknowledgement. This will still not work if my ackTimeout is set to 10 minutes and the message I am consuming takes, say, 30 minutes. The broker will resurface the message after 10 minutes, even though one of the consumers is still working on it.
I want to avoid this scenario. My proposal is to have something like a "working(messageId)" call on the consumer, which tells the broker not to time out (and resurface) the message, but rather to extend/refresh the ackTimeout for that messageId.
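
To make the proposed semantics concrete, here is a minimal in-memory sketch of a per-message ack deadline that a heartbeat can refresh. It is a toy model, not Pulsar code: `AckLeaseTracker`, `working`, and the injectable `clock` are hypothetical names chosen for illustration, and no such call is claimed to exist in the Pulsar client.

```python
import time


class AckLeaseTracker:
    """Toy, in-memory model of a per-message ack deadline (not a Pulsar API)."""

    def __init__(self, ack_timeout_s, clock=time.monotonic):
        self.ack_timeout_s = ack_timeout_s
        self.clock = clock
        self.deadlines = {}  # message_id -> time at which it would be redelivered

    def deliver(self, message_id):
        # Delivery starts the ack timeout for this message.
        self.deadlines[message_id] = self.clock() + self.ack_timeout_s

    def working(self, message_id):
        # Hypothetical heartbeat: refresh the deadline of an in-flight message.
        if message_id not in self.deadlines:
            raise KeyError(f"{message_id} is not in flight")
        self.deadlines[message_id] = self.clock() + self.ack_timeout_s

    def ack(self, message_id):
        # Acknowledged messages are no longer tracked.
        self.deadlines.pop(message_id, None)

    def expired(self):
        # Messages whose deadline passed without an ack or a heartbeat.
        now = self.clock()
        return sorted(m for m, d in self.deadlines.items() if d <= now)
```

With a 10-minute timeout, a consumer that calls `working("m1")` every few minutes keeps `m1` out of `expired()` for as long as it is alive, while a crashed consumer stops sending heartbeats and the message becomes eligible for redelivery once the last refreshed deadline passes.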

codelipenghui (Collaborator) · 2019-07-31T09:16:26Z

@shubham-Shole4ever
You can disable the ack timeout and just use ack/negative ack. A negative ack explicitly tells the broker that processing failed, and the broker then redelivers the message. While a message is still in progress, there is no need to ack or negative ack it.

shubham-Shole4ever (Author) · 2019-07-31T09:39:10Z

@codelipenghui
But if my application crashes while it is processing the message, it will never be able to ack or negative ack that message. As a result, the message will never be retried. This is exactly why I cannot drop the ackTimeout either.

merlimat (Collaborator) · 2019-07-31T13:07:21Z

@shubham-Shole4ever When a consumer crashes, or the TCP connection is broken, the messages that were delivered to this consumer and not acked will be replayed to another available consumer (in the case of shared subscriptions) or the next time the consumer reconnects.

You don't need ack timeout for that.

sijie (Collaborator) · 2019-08-05T07:46:24Z

@shubham-Shole4ever does Matteo's comment make sense to you?

shubham-Shole4ever (Author) · 2019-08-06T06:29:48Z

@sijie I can make do with the workaround suggested by @codelipenghui and @merlimat for the time being. However, as mentioned, it will not work if I also need ackTimeout.
I would ask the community to consider a feature that handles such cases in the future.

Thanks @codelipenghui and @merlimat for all the help. :)

harissecic · 2021-08-17T20:51:12Z

I stumbled upon this while looking for another answer, but I'll leave a comment in case it helps anyone. For such cases I guess it's possible to combine a DLQ with ackTimeout. The default value I use is 3 redeliveries. Although I don't use auto-ack, I guess it will work the same way: if a message times out 3 times (in this case), it automatically goes to the Dead Letter Queue. This prevents it from looping endlessly between services.

My example is a module I'm building for a framework. There I don't set ackTimeout but let users set it if they want to. However, I do set 3 retries before the DLQ by default. The reason is personal experience: a test message looped endlessly between the shared consumers, I got error logs all the time, and I couldn't figure out why. Then I realised the message was simply getting a negativeAck from each consumer and being redelivered every time. The funny thing is that it was a malformed JSON message, so the consumers were doomed to crash (validation failed for a previously nullable Kotlin field that I had changed to non-null).

Once I set the DLQ limit to 3, some messages failed and were then re-read because of the ack timeout. But by combining DLQ, ackTimeout, and shared consumers, I think you can set the timeout pretty low if processing takes less time and you ack manually as soon as it's done.
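
The redelivery cap described above can be modelled in a few lines. This is a toy sketch of the counting logic only, assuming a limit of 3; `route_after_failure` and its arguments are hypothetical names, and the exact point at which Pulsar itself moves a message to the DLQ may differ by one from this model.

```python
from collections import defaultdict


def route_after_failure(message_id, failure_counts, max_redeliver_count=3):
    """Decide what happens to a message that timed out or was negatively acked.

    Returns "redeliver" while the redelivery budget lasts, then "dlq".
    """
    failure_counts[message_id] += 1
    if failure_counts[message_id] > max_redeliver_count:
        return "dlq"
    return "redeliver"


# A message that always fails (e.g. malformed JSON) stops looping after
# the budget is used up instead of bouncing between consumers forever.
counts = defaultdict(int)
outcomes = [route_after_failure("bad-json", counts) for _ in range(4)]
```

Here `outcomes` ends with a single `"dlq"` entry after three `"redeliver"` entries, which is the behaviour that stops a poison message from circulating indefinitely.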

benbro · 2022-11-14T07:20:30Z

There is still no good solution for retrying long running jobs.

harissecic · 2022-11-14T08:28:27Z

"There is still no good solution for retrying long running jobs."

I think there are plenty of good-enough workarounds, but I agree there should be one optimal out-of-the-box solution for long running consumers. Just to list a few:

1. Using message properties on consumers with reconsumeLater. I'm not sure which version started to support this feature, but adding properties such as processing=true and later isDone=true to the message would require just a little extra code to check them before even trying to consume the message. If isDone is true, simply ack the message and move on to the next.
2. Using readers with a similar approach, where message metadata/properties are read. In some cases consumers are not needed and a reader is simpler, but in others we really do want the consumer, so this is not really a workaround in the context of this case.
3. Combining DLQ with negAck and later processing the DLQ with extra custom code that checks whether something was already done. Setting max redelivery to 1 would send the message directly to the DLQ on the next retry after a timeout. This of course requires a local concurrent cache that keeps the IDs of messages being processed in runtime memory and checks them on message arrival, so you can simply negAck a message if it's still processing. After processing the actual message, the consumer can then trigger "removing" the message from the DLQ. This would support both ackTimeout and manual handling of timeouts.
4. Caching everything in a DB or similar and tracking messageIds, processing start times, allowed timeouts, and so on. Upon receiving a message, check this list and determine whether the message is still being processed or whether it failed and this is a consumer restart.
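
The local concurrent cache mentioned in the third workaround could look roughly like the sketch below. It only covers the decision made when a message arrives; `InFlightRegistry`, `Action`, and the method names are hypothetical, and wiring the returned action to the real client's ack/negative-ack calls is left out on purpose.

```python
import threading
from enum import Enum


class Action(Enum):
    PROCESS = "process"   # first arrival: start working on the message
    NEG_ACK = "negAck"    # redelivered while this consumer is still working on it
    ACK = "ack"           # redelivered after the work already finished


class InFlightRegistry:
    """Thread-safe record of message IDs this process is handling or has handled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()
        self._done = set()

    def on_arrival(self, message_id):
        # Decide how to react to a (re)delivered message.
        with self._lock:
            if message_id in self._done:
                return Action.ACK
            if message_id in self._in_progress:
                return Action.NEG_ACK
            self._in_progress.add(message_id)
            return Action.PROCESS

    def on_finished(self, message_id):
        # Called by the worker once processing has completed successfully.
        with self._lock:
            self._in_progress.discard(message_id)
            self._done.add(message_id)
```

Because the registry lives only in process memory, it is empty after a crash and restart, so a redelivered message is treated as new and processed again. Persisting the same state in a database, as in the fourth workaround, would change that trade-off.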

I assume some variant of option 3 would be good to have out-of-the-box. Best of course would be something like an LRQ (long running queue, for lack of creativity on my side): when the ackTimeout triggers a retry, the consumer has the option to tell the broker "still processing", and the broker moves the message to this queue. Pulsar would then track the connection. If TCP dies, it pushes the messages back to the normal queue and retries; if TCP is alive, it lets the consumer say when the message should be removed. Using the DLQ for this is also possible, but it mixes messages that were retried too many times with ones the consumer knows are simply taking too long.

0 replies

tisonkun (Collaborator) · 2022-11-14T08:33:51Z

I'm moving this discussion to the Discussions forum since it's an open-ended discussion instead of an actionable task :)
