Core issue:
Pinot is unable to safely ingest or serve queries from the remaining replicas for a prolonged period, because some sort of retry logic degrades controller functionality.
Background:
Recently we saw a controller struggling to process ZKEvents as fast as they were created. This began happening after a server failed to start due to a deadlock condition and was left in that state for a few days. Controller CPU was elevated during this period, and eventually the throughput of callbacks/events became too high for processing to keep up.
It looks like the slow event processing was due to resource starvation, with Helix's ZKEventThread presumably struggling to get scheduled. From our metrics, we see a huge increase in ZK transaction volume (the metric is transaction log size, which is flushed every hour).
Looking at a snapshot of the cluster during this time, it seems likely that the transactions were under the dead server's MESSAGES znode.
For reference, other servers in this cluster have cversions of roughly 200-300k. However, when looking at the messages themselves, I see that the message znodes were created long ago and have not been modified since; it is not yet clear to me which child znodes are actually being modified.
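For anyone who wants to check the same thing on their cluster, here is a minimal sketch of how I inspected this, using the plain ZooKeeper Java client. It assumes Helix's usual per-instance layout of /<cluster>/INSTANCES/<instance>/MESSAGES; the ZK address, cluster name, and instance name below are placeholders, not values from our environment:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class MessagesZnodeInspector {
  public static void main(String[] args) throws Exception {
    // Placeholders: adjust to your ZK address, cluster name, and the dead server's instance name.
    String zkAddress = "localhost:2181";
    String messagesPath = "/PinotCluster/INSTANCES/Server_pinot-server-0_8098/MESSAGES";

    CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(zkAddress, 30_000, event -> {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
      }
    });
    connected.await();

    // cversion on the parent increments for every child create/delete ever made under it,
    // so a very large cversion here indicates heavy message churn for this instance.
    Stat parentStat = zk.exists(messagesPath, false);
    System.out.println("MESSAGES cversion = " + parentStat.getCversion()
        + ", numChildren = " + parentStat.getNumChildren());

    // For each pending message znode, print when it was created and last modified.
    List<String> children = zk.getChildren(messagesPath, false);
    long now = System.currentTimeMillis();
    for (String child : children) {
      Stat stat = new Stat();
      zk.getData(messagesPath + "/" + child, false, stat);
      System.out.printf("%s ctime=%d (age %d min) mtime=%d%n",
          child, stat.getCtime(), (now - stat.getCtime()) / 60_000, stat.getMtime());
    }
    zk.close();
  }
}
```

This is what showed me the pattern above: the parent's cversion kept climbing while the individual message znodes had old ctimes and unchanged mtimes.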
Another way to phrase the issue: failed messages continue to put load on the controller and ZK even after they have failed.
One note about the cluster/table setup: we use minion for upsert compaction, which generates many more messages than is typical for a realtime table of this size.
Has anyone seen something similar? I haven't yet walked through the relevant Helix code. The end goal of raising this issue is to understand how we can prevent a dead server from causing such a large load increase on the controller and ZK.