Namespace bundle ownership during kubernetes node upgrades #23873
Unanswered
ahabel-wob
asked this question in
Q&A
We run Apache Pulsar in Kubernetes, and in some cases it does not survive the node upgrade process. We are currently on Pulsar 4.0.0 and will update to 4.0.2 soon.
We are running Pulsar on 4 nodes, and the pods are the following:
In the logs we see this error (for all kinds of topics):
Failed to create consumer: consumerId=23954, Namespace bundle for topic (persistent://app/platform-prod/__transaction_buffer_snapshot-partition-4) not served by this instance:app-pulsar-prd-broker-1.app-pulsar-prd-broker.apache-pulsar.svc.cluster.local:8080. Please redo the lookup. Request is denied: namespace=app/platform-prod
The error is logged for traffic between the proxy and the brokers, and a new lookup does not succeed. A broker can stay stuck in this state for hours (for example, 6 hours). Only a manual broker restart resolves the issue.
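To narrow down which bundles are affected, it may help to extract the topic and the rejecting broker from each error line and compare that with what a fresh lookup returns. Below is a minimal Python sketch of that idea. The function names are hypothetical. The REST path `/lookup/v2/topic/...` is my understanding of the broker's HTTP lookup endpoint and should be checked against the admin REST docs for your version.

```python
import re
from urllib.parse import quote

# Matches the "not served by this instance" error quoted above.
NOT_SERVED = re.compile(
    r"Namespace bundle for topic \((?P<topic>[^)]+)\) "
    r"not served by this instance:(?P<broker>\S+?)\. Please redo the lookup"
)


def parse_not_served(line):
    """Return the topic and the rejecting broker from an error line, or None."""
    match = NOT_SERVED.search(line)
    if match is None:
        return None
    return {"topic": match.group("topic"), "broker": match.group("broker")}


def lookup_url(http_base, topic):
    """Build a lookup URL for a topic (assumed path: /lookup/v2/topic/...)."""
    domain, rest = topic.split("://", 1)
    tenant, namespace, name = rest.split("/", 2)
    return (
        f"{http_base.rstrip('/')}/lookup/v2/topic/"
        f"{domain}/{tenant}/{namespace}/{quote(name, safe='')}"
    )
```

If the broker returned by a fresh lookup is the same one that keeps rejecting the consumer, that would suggest the ownership metadata and the broker's in-memory view of its bundles disagree. This is a guess about the failure mode, not a confirmed diagnosis.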
We will try to reproduce this in our staging environment by draining a node while the system is under load.
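For the reproduction, a small helper along these lines could drive the drain from a script. It is only a sketch: the function names and the node name in the usage example are hypothetical, and it assumes `kubectl` is on the PATH and pointed at the staging cluster. The helper defaults to a dry run that only prints the commands.

```python
import subprocess


def drain_commands(node, timeout_s=300):
    """Build the kubectl commands to drain a node and later uncordon it."""
    return [
        [
            "kubectl", "drain", node,
            "--ignore-daemonsets",       # daemonset pods cannot be evicted
            "--delete-emptydir-data",    # allow evicting pods that use emptyDir
            f"--timeout={timeout_s}s",
        ],
        ["kubectl", "uncordon", node],
    ]


def drain_and_restore(node, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in drain_commands(node):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `drain_and_restore("stage-node-1")` prints the two commands without touching the cluster. Passing `dry_run=False` runs them for real.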
What causes this, and how can we configure our Pulsar cluster to be more resilient?
The AI assistant in the Pulsar Slack gave the following response:
The issue you're experiencing seems related to the broker's inability to correctly handle namespace bundle ownership during node upgrades or restarts. This can lead to lookup failures and prolonged unavailability of certain topics. According to the Pulsar documentation on load balancing, topics are dynamically assigned to brokers based on load conditions, and issues can arise if the ownership of namespace bundles is not correctly managed during broker restarts.