Unavailable Server Causes Controller Load Issues #14930

Open
itschrispeck opened this issue Jan 28, 2025 · 0 comments

Comments

itschrispeck (Collaborator) commented Jan 28, 2025

Core issue:
Pinot is unable to safely ingest/serve queries from the remaining replicas for a prolonged period, because some form of retry logic degrades controller functionality.

Background:
Recently we saw a controller struggling to process ZKEvents as fast as they were created. This began after a server failed to start due to a deadlock condition and was left in that state for a few days. Controller CPU was elevated during this period, and eventually the throughput of callbacks/events became too high for processing to keep up:
[Image: controller CPU and callback/event throughput during the incident]

It looks like the slow event processing was due to resource starvation, with Helix's ZKEventThread presumably struggling to be scheduled. From our metrics, we see a huge increase in ZK transaction volume (metric is tx log size, which is flushed every 1h):
[Image: ZK transaction log size over time]

Looking at a snapshot of the cluster during this time, it seems likely that the transactions were under the dead server's MESSAGES znode:

(CONNECTED [localhost:55179]) /pinot/pinot-<redacted>/<redacted>-cluster/INSTANCES/eb92c571-ca4e-4035-8bf0-fc09a9c40e4b> stat MESSAGES
Stat(
  czxid=0x20000098a
  mzxid=0x20000098a
  ctime=1731131435565
  mtime=1731131435565
  version=0
  cversion=92797988
  aversion=0
  ephemeralOwner=0x0
  dataLength=0
  numChildren=3348
  pzxid=0x30f152e04
)

For reference, other servers in this cluster have a cversion of roughly 200-300k. However, when looking at the messages themselves, I see the message znodes were created long ago and have been left unmodified - it is not yet clear to me which child znodes are being modified.
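
For anyone who wants to run the same kind of audit, here is a rough sketch using the plain ZooKeeper Java client (the ZK address and cluster path are placeholders, not our real values). It compares MESSAGES stats across instances and flags message znodes whose mtime differs from their ctime, which is the open question above:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class MessagesZnodeAudit {
  public static void main(String[] args) throws Exception {
    // Placeholder connection string and cluster path; adjust for the actual cluster.
    String zkAddr = "localhost:2181";
    String instancesPath = "/pinot/<cluster-name>/INSTANCES";

    CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(zkAddr, 30_000, event -> {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
      }
    });
    connected.await();
    try {
      for (String instance : zk.getChildren(instancesPath, false)) {
        String messagesPath = instancesPath + "/" + instance + "/MESSAGES";
        Stat stat = zk.exists(messagesPath, false);
        if (stat == null) {
          continue;
        }
        // cversion counts child creates/deletes over time; numChildren is the current backlog.
        System.out.printf("%s: numChildren=%d cversion=%d%n",
            instance, stat.getNumChildren(), stat.getCversion());

        // If a message znode's mtime differs from its ctime, the child itself is being
        // rewritten; otherwise the churn is purely creates/deletes under MESSAGES.
        List<String> pending = zk.getChildren(messagesPath, false);
        for (String msg : pending) {
          Stat msgStat = zk.exists(messagesPath + "/" + msg, false);
          if (msgStat != null && msgStat.getMtime() != msgStat.getCtime()) {
            System.out.printf("  rewritten after creation: %s (ctime=%d mtime=%d)%n",
                msg, msgStat.getCtime(), msgStat.getMtime());
          }
        }
      }
    } finally {
      zk.close();
    }
  }
}
```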

Another way to phrase the issue: messages that have already failed continue to put load on the controller/ZK:
[Image: controller/ZK load attributed to failed messages]
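
To check that theory, a sketch like the one below could dump the state and age of each pending message under the dead instance. This assumes the message znodes are Helix's JSON-serialized ZNRecords with MSG_STATE / CREATE_TIMESTAMP simple fields; the address and path are placeholders:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PendingMessageDump {
  public static void main(String[] args) throws Exception {
    // Placeholder address/path; point these at the dead instance's MESSAGES znode.
    String zkAddr = "localhost:2181";
    String messagesPath = "/pinot/<cluster-name>/INSTANCES/<instance-id>/MESSAGES";

    CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(zkAddr, 30_000, event -> {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
      }
    });
    connected.await();
    ObjectMapper mapper = new ObjectMapper();
    try {
      for (String child : zk.getChildren(messagesPath, false)) {
        byte[] data = zk.getData(messagesPath + "/" + child, false, null);
        if (data == null || data.length == 0) {
          continue;
        }
        // Assumption: payload is a JSON-serialized Helix ZNRecord whose simpleFields
        // carry MSG_STATE and CREATE_TIMESTAMP.
        JsonNode simpleFields = mapper.readTree(data).path("simpleFields");
        String state = simpleFields.path("MSG_STATE").asText("?");
        long created = simpleFields.path("CREATE_TIMESTAMP").asLong(0L);
        long ageHours = created > 0 ? (System.currentTimeMillis() - created) / 3_600_000L : -1;
        System.out.printf("%s state=%s ageHours=%d%n", child, state, ageHours);
      }
    } finally {
      zk.close();
    }
  }
}
```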

One note about the cluster/table setup: we use minion for upsert compaction, which generates many more messages than is typical for a realtime table of this size.

Has anyone seen something similar? I haven't yet walked through the relevant Helix code. The end goal of raising this issue is to understand how we can prevent a dead server from causing such a large load increase on the controller and ZK.
