CC caused "disk is full" for one of logDirs for broker during rebalancing. #1590

Open
emelyanovtv opened this issue Jun 16, 2021 · 4 comments

Comments

@emelyanovtv

emelyanovtv commented Jun 16, 2021

Description:

We hit a "disk is full" error during rebalancing. Each broker has 2 log dirs of the same size, but while the rebalance was running we noticed that only one disk (log dir) on the broker was being filled with new data. The disk capacity (described below) for the log dir /var/dirs/kafka/data/topics was set to 3584000 MB, but after the rebalance failed with the out-of-disk-space error and we increased the disk size for this specific log dir, its size grew to 3644675 MB. Why can this happen? Can you help us understand this error more clearly?

Our main assumption is that this happened because we moved almost 11 TB of data between brokers, which took 2 days. Perhaps that is the root cause of this error.

The sequence of events:

  • Started the rebalance.
  • Hit the limit on one broker: one of its log dirs filled up. The size of that log dir (let's say broker-1, /var/dirs/kafka/data/topics) was 3584000 MB.
  • Increased the size of this specific log dir.
  • Restarted the broker.
  • Everything came back up successfully, and the size of the same log dir (broker-1, /var/dirs/kafka/data/topics) became 3644675 MB.
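
The overshoot described in the steps above can be quantified with a quick calculation. This is only an arithmetic sketch using the two figures quoted in this report; the variable names are illustrative.

```python
# Figures quoted in this report, both in MB.
configured_capacity_mb = 3_584_000  # DISK capacity of the log dir in capacity.json
observed_size_mb = 3_644_675        # size of the same log dir after the restart

# How far the log dir grew past the capacity Cruise Control was told about.
overshoot_mb = observed_size_mb - configured_capacity_mb
overshoot_pct = 100 * overshoot_mb / configured_capacity_mb

print(f"overshoot: {overshoot_mb} MB ({overshoot_pct:.2f}% above configured capacity)")
```

So the log dir ended up 60675 MB larger than the capacity configured for it.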

Question:

  • Has this happened to anyone before, and how can I avoid it?

Current setup

  • CC: 2.0.168
  • kafka: 2.3.0
  • the execution was triggered with POST /kafkacruisecontrol/rebalance?json=true&dryrun=false&concurrent_partition_movements_per_broker=4&concurrent_leader_movements=10
  • total duration before it failed was almost 50 hours (179776 sec)
  • each broker has 2 log dirs.
  • settings

cruisecontrol.properties:

capacity.config.file=config/capacity.json

capacity.json (all brokers have the same settings as below):

{
  "brokerId": "N",
  "capacity": {
    "DISK": {
      "/var/dirs/kafka/data/topics": "3584000",
      "/var/dirs/kafka/data1/topics": "3584000"
    },
    "CPU": "100",
    "NW_IN": "10000",
    "NW_OUT": "10000"
  },
  "doc": "Capacity unit used for disk is in MB, cpu is in percentage, network throughput is in KB."
}
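
As a sanity check on this configuration, the per-log-dir capacities can be summed to get the broker-level disk capacity. The sketch below parses a copy of the broker entry shown above (with brokerId left as the placeholder "N") and assumes the values are MB given as strings, as the doc field states.

```python
import json

# Copy of the broker entry from capacity.json above; "N" is the placeholder broker id.
entry = json.loads("""
{
  "brokerId": "N",
  "capacity": {
    "DISK": {
      "/var/dirs/kafka/data/topics": "3584000",
      "/var/dirs/kafka/data1/topics": "3584000"
    },
    "CPU": "100",
    "NW_IN": "10000",
    "NW_OUT": "10000"
  }
}
""")

# Capacities are stored as strings, so convert them before summing.
per_dir_mb = {path: float(mb) for path, mb in entry["capacity"]["DISK"].items()}
total_mb = sum(per_dir_mb.values())

for path, mb in per_dir_mb.items():
    print(f"{path}: {mb:.0f} MB")
print(f"broker total: {total_mb:.0f} MB")
```

The total is 7168000 MB, which matches the DISK_CAP(MB) value in the load output posted later in this thread.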

If you need more information, I'm happy to share it.

@emelyanovtv
Author

Can anybody help with some hypotheses? I'm running out of ideas.

@efeg
Collaborator

efeg commented Jul 1, 2021

@emelyanovtv I am curious what the load endpoint of Cruise Control shows with populate_disk_info=true.
I wonder if the same disk is shared with other services, so that its capacity is consumed not only by Kafka.
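
For reference, a request URL for the load endpoint with that parameter could be built as follows. This is a sketch only: the host and port are hypothetical placeholders, and the /kafkacruisecontrol path prefix is taken from the rebalance request quoted earlier in this issue.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical address of the Cruise Control instance; replace with your own.
CC_HOST = "cruise-control.example.com:9090"


def load_url(host: str = CC_HOST, as_json: bool = False) -> str:
    """Build the load endpoint URL with per-log-dir disk info enabled."""
    params = {"populate_disk_info": "true"}
    if as_json:
        # Same json=true flag that the rebalance request in this issue uses.
        params["json"] = "true"
    return urlunsplit(("http", host, "/kafkacruisecontrol/load", urlencode(params), ""))


print(load_url())
```

The resulting URL can be fetched with any HTTP client, for example with a plain GET from curl.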

@emelyanovtv
Author

Those disks are dedicated to the Kafka brokers.
BTW, after we got this error I rebalanced manually. Here is part of the load data for the brokers; I checked it and everything looks valid to me.

HOST            BROKER  RACK    LOGDIR                        DISK_CAP(MB)  DISK(MB)/_(%)_     CORE_NUM  CPU(%)  NW_IN_CAP(KB/s)  LEADER_NW_IN(KB/s)  FOLLOWER_NW_IN(KB/s)  NW_OUT_CAP(KB/s)  NW_OUT(KB/s)  PNW_OUT(KB/s)  LEADERS/REPLICAS
kafka-0.broker  200     rack-b                                7168000.000   5019532.000/70.03  1         71.517  10000.000        223.701             463.693               10000.000         1221.138      3717.967       366/1128
                                /var/dirs/kafka/data/topics                 3005975.379/83.87                                                                                                                              320/992
                                /var/dirs/kafka/data1/topics                2013565.704/56.18                                                                                                                              46/136
kafka-1.broker  201     rack-c                                7168000.000   4524581.500/63.12  1         77.026  10000.000        225.302             519.606               10000.000         1261.675      4125.801       358/1121
                                /var/dirs/kafka/data/topics                 2928987.955/81.72                                                                                                                              326/1040
                                /var/dirs/kafka/data1/topics                1595596.534/44.52                                                                                                                              32/81
kafka-2.broker  202     rack-a                                7168000.000   4736796.500/66.08  1         66.809  10000.000        233.720             509.744               10000.000         1282.256      4090.291       419/1097
                                /var/dirs/kafka/data/topics                 2971493.640/82.91                                                                                                                              388/986
                                /var/dirs/kafka/data1/topics                1765304.770/49.26                                                                                                                              31/111
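
The imbalance between the two log dirs is easier to see when the utilization is recomputed from the numbers above. The sketch below hard-codes the per-log-dir DISK(MB) values from this output and assumes the 3584000 MB per-log-dir capacity from capacity.json; the data layout and helper name are just for illustration.

```python
CAPACITY_MB = 3_584_000  # per-log-dir capacity, as configured in capacity.json

DATA = "/var/dirs/kafka/data/topics"
DATA1 = "/var/dirs/kafka/data1/topics"

# broker id -> {log dir: used MB}, copied from the load output above
usage_mb = {
    200: {DATA: 3005975.379, DATA1: 2013565.704},
    201: {DATA: 2928987.955, DATA1: 1595596.534},
    202: {DATA: 2971493.640, DATA1: 1765304.770},
}


def utilization_pct(used_mb: float, capacity_mb: float = CAPACITY_MB) -> float:
    """Return disk utilization as a percentage of the configured capacity."""
    return 100 * used_mb / capacity_mb


for broker, dirs in usage_mb.items():
    pct = {path: utilization_pct(used) for path, used in dirs.items()}
    # Difference between the fullest and emptiest log dir on this broker.
    gap = max(pct.values()) - min(pct.values())
    for path, value in pct.items():
        print(f"broker {broker} {path}: {value:.2f}%")
    print(f"broker {broker} gap between log dirs: {gap:.2f} percentage points")
```

On every broker the data log dir is above 80% while data1 sits between roughly 44% and 57%, so the first log dir has far less headroom.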

@emelyanovtv
Author

@efeg any ideas?
