Unable to configure and run cruisecontrol perfectly. #1607

Open
rahu7624 opened this issue Jul 1, 2021 · 7 comments
@rahu7624

rahu7624 commented Jul 1, 2021

Hi Team,

We are getting the errors below when checking the cruisecontrol status. Can you please check and suggest a fix?

[root@kafka-0 ~]# systemctl status cruisecontrol -l
● cruisecontrol.service - Zookeeper
Loaded: loaded (/etc/systemd/system/cruisecontrol.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-07-01 14:08:30 UTC; 3min 59s ago
Main PID: 29352 (cc.sh)
CGroup: /system.slice/cruisecontrol.service
├─29352 /bin/bash /usr/local/bin/cc.sh
└─29354 java -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=./logs -Dlog4j.configurationFile=file:./config/log4j.properties -cp ./cruise-control/build/dependant-libs/:./cruise-control/build/libs/:./cruise-control-metrics-reporter/build/libs/* com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain config/cruisecontrol.properties

Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,861] WARN Skipping goal violation detection for ReplicaCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,861] WARN Skipping goal violation detection for DiskCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,861] WARN Skipping goal violation detection for NetworkInboundCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,862] WARN Skipping goal violation detection for NetworkOutboundCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:41 kafka-0 cc.sh[29352]: [2021-07-01 14:11:41,862] WARN Skipping goal violation detection for CpuCapacityGoal because load completeness requirement is not met. (com.linkedin.kafka.cruisecontrol.detector.GoalViolationDetector)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,398] INFO Start to detect topic replication factor anomaly. (com.linkedin.kafka.cruisecontrol.detector.TopicAnomalyFinder)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,399] WARN TOPIC_ANOMALY detected {Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}. Self healing start time 2021-07-01T14:11:45Z. (com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,400] WARN Self-healing has been triggered. (com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier)
Jul 01 14:11:45 kafka-0 cc.sh[29352]: [2021-07-01 14:11:45,472] WARN Skipping TOPIC_ANOMALY fix because load completeness requirement is not met for goals. (com.linkedin.kafka.cruisecontrol.detector.AnomalyDetectorManager)
Jul 01 14:12:11 kafka-0 cc.sh[29352]: [2021-07-01 14:12:11,598] INFO Skipping proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[root@kafka-0 ~]#

[root@kafka-0 kafka]# curl 'http://localhost:9090/kafkacruisecontrol/state'
MonitorState: {state: RUNNING(0.000% trained), NumValidWindows: (0/0) (NaN%), NumValidPartitions: 0/0 (0.000%), flawedPartitions: 0}
ExecutorState: {state: NO_TASK_IN_PROGRESS}
AnalyzerState: {isProposalReady: false, readyGoals: []}
AnomalyDetectorState: {selfHealingEnabled:[BROKER_FAILURE, DISK_FAILURE, METRIC_ANOMALY, GOAL_VIOLATION, TOPIC_ANOMALY, MAINTENANCE_EVENT], selfHealingDisabled:[], selfHealingEnabledRatio:{BROKER_FAILURE=1.0, DISK_FAILURE=1.0, METRIC_ANOMALY=1.0, GOAL_VIOLATION=1.0, TOPIC_ANOMALY=1.0, MAINTENANCE_EVENT=1.0}, recentGoalViolations:[], recentBrokerFailures:[], recentMetricAnomalies:[], recentDiskFailures:[], recentTopicAnomalies:[{description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=c3044efe-1176-461e-bd21-9b16418bc815, detectionDate=2021-07-01T14:11:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:11:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=20958eec-b7fa-4fc4-8c6a-38f000a20b09, detectionDate=2021-07-01T14:09:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:09:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=a61584a9-0d44-472c-b2b1-b8740a3c6ced, detectionDate=2021-07-01T14:13:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:13:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=2e5612d6-0c3c-4e38-a478-ca06b7eeb265, detectionDate=2021-07-01T14:15:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-01T14:15:45Z}], recentMaintenanceEvents:[], metrics:{meanTimeBetweenAnomalies:{GOAL_VIOLATION:0.00 milliseconds, BROKER_FAILURE:0.00 milliseconds, METRIC_ANOMALY:0.00 milliseconds, DISK_FAILURE:0.00 milliseconds, TOPIC_ANOMALY:8.88 milliseconds}, meanTimeToStartFix:0.00 milliseconds, numSelfHealingStarted:0, numSelfHealingFailedToStart:0, ongoingAnomalyDuration=6.31 minutes}, ongoingSelfHealingAnomaly:None, balancednessScore:100.000}

[root@kafka-0 kafka]#

rahu7624 changed the title from "Unable to configure cruisecontrol perfectly." to "Unable to operate cruisecontrol perfectly." on Jul 1, 2021
rahu7624 changed the title from "Unable to operate cruisecontrol perfectly." to "Unable to configure and run cruisecontrol perfectly." on Jul 1, 2021
@efeg
Collaborator

efeg commented Jul 1, 2021

COMPLETENESS_NOT_READY means that Cruise Control (CC) was unable to collect sufficient samples from Kafka to generate the cluster model on which it performs goal-based cluster maintenance operations. This could be due to either of two things: (1) you have just started CC, so it hasn't had time to collect samples yet (give it some time and see whether the CC logs show that it was able to collect samples and that a new window has rolled), or (2) there is a problem collecting samples from Kafka. Can you verify that you configured the metrics reporter correctly on the Kafka side? Did you follow the quick-start tutorial on the CC GitHub page to set up the metrics reporter? Does your metrics reporter topic get any data from Kafka?
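
To watch for the first case resolving on its own, the MonitorState line returned by the state endpoint can be checked over time. Below is a minimal Python sketch that parses that line in the format shown in this thread. The function names are hypothetical, and treating "at least one valid window" as the sign of progress is an assumption for illustration, not an official readiness rule.

```python
import re

# Matches the counters in the MonitorState line as printed in this thread.
MONITOR_RE = re.compile(
    r"NumValidWindows: \((\d+)/(\d+)\).*?NumValidPartitions: (\d+)/(\d+)"
)


def parse_monitor_state(line):
    """Return (valid_windows, total_windows, valid_partitions, total_partitions)."""
    match = MONITOR_RE.search(line)
    if match is None:
        raise ValueError("no MonitorState counters found in line")
    return tuple(int(group) for group in match.groups())


def has_valid_window(line):
    """Heuristic (assumption): progress means at least one valid window exists."""
    valid_windows, _, _, _ = parse_monitor_state(line)
    return valid_windows > 0


# Sample copied from the output posted above.
sample = (
    "MonitorState: {state: RUNNING(0.000% trained), NumValidWindows: (0/0) (NaN%), "
    "NumValidPartitions: 0/0 (0.000%), flawedPartitions: 0}"
)
print(parse_monitor_state(sample))
print(has_valid_window(sample))
```

If the counters stay at 0/0 long after startup, as in the output above, the second case (a sampling problem) becomes the more likely explanation.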

efeg self-assigned this on Jul 1, 2021
efeg added the "question" label (A code or meta question about the project.) on Jul 1, 2021
@rahu7624
Author

rahu7624 commented Jul 2, 2021

Hi Adem,

Thanks for looking into it. It's a test setup with 3 nodes and just one test topic, and currently no data is flowing in or out.
I followed the quick-start tutorial and configured all 3 nodes the same way.
Please see the Kafka-side configs below and let us know if any changes are required.

[rahul@kafka-0 ~]$ cat /usr/local/share/kafka/config/server.properties | grep -i cruise
metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
cruise.control.metrics.topic.auto.create=true
cruise.control.metrics.topic.num.partitions=1
cruise.control.metrics.topic.replication.factor=1

Thanks in advance.

@rahu7624
Author

rahu7624 commented Jul 2, 2021

However, the situation is still the same even after 18 hours.

[rahul@kafka-0 ~]$ curl -X GET "http://localhost:9090/kafkacruisecontrol/state"
MonitorState: {state: RUNNING(0.000% trained), NumValidWindows: (0/0) (NaN%), NumValidPartitions: 0/0 (0.000%), flawedPartitions: 0}
ExecutorState: {state: NO_TASK_IN_PROGRESS}
AnalyzerState: {isProposalReady: false, readyGoals: []}
AnomalyDetectorState: {selfHealingEnabled:[BROKER_FAILURE, DISK_FAILURE, METRIC_ANOMALY, GOAL_VIOLATION, TOPIC_ANOMALY, MAINTENANCE_EVENT], selfHealingDisabled:[], selfHealingEnabledRatio:{BROKER_FAILURE=1.0, DISK_FAILURE=1.0, METRIC_ANOMALY=1.0, GOAL_VIOLATION=1.0, TOPIC_ANOMALY=1.0, MAINTENANCE_EVENT=1.0}, recentGoalViolations:[], recentBrokerFailures:[], recentMetricAnomalies:[], recentDiskFailures:[], recentTopicAnomalies:[{description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=b5852ac0-9ce9-4721-81bd-a6d89df6e7f5, detectionDate=2021-07-02T08:19:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:19:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=297d6fdd-ee77-4375-b787-f3e8fa39996b, detectionDate=2021-07-02T08:23:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:23:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=b6d70146-0c31-45a5-9d7e-f8f4f9c1c4a1, detectionDate=2021-07-02T08:27:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:27:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=f7344f75-1ccb-4215-8ae7-e0ca9347f2da, detectionDate=2021-07-02T08:35:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:35:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=7d61a3b9-e056-47bc-89a5-7a69fb4e414a, detectionDate=2021-07-02T08:21:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:21:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=d53e99a2-102c-452d-b3fd-c13741a4241c, detectionDate=2021-07-02T08:31:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:31:45Z}, {description={Topics with replication factor 
violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=517c91d0-9c10-4c8b-9666-ea95c2ebb490, detectionDate=2021-07-02T08:25:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:25:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=33e18a5e-d05f-46ee-b63e-51dfa3ba44e3, detectionDate=2021-07-02T08:29:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:29:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=0fb9a33e-95c1-442f-8aa3-b8eff0814315, detectionDate=2021-07-02T08:37:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:37:45Z}, {description={Topics with replication factor violations: [{With desired RF 2: [{test(100.00)}]}]}, anomalyId=980cd90d-0f36-4463-988a-f6da8c68df09, detectionDate=2021-07-02T08:33:45Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-02T08:33:45Z}], recentMaintenanceEvents:[], metrics:{meanTimeBetweenAnomalies:{GOAL_VIOLATION:0.00 milliseconds, BROKER_FAILURE:0.00 milliseconds, METRIC_ANOMALY:0.00 milliseconds, DISK_FAILURE:0.00 milliseconds, TOPIC_ANOMALY:8.33 milliseconds}, meanTimeToStartFix:0.00 milliseconds, numSelfHealingStarted:0, numSelfHealingFailedToStart:0, ongoingAnomalyDuration=18.49 hours}, ongoingSelfHealingAnomaly:None, balancednessScore:100.000}

[rahul@kafka-0 ~]$

@efeg
Collaborator

efeg commented Jul 2, 2021

@rahu7624 Do you see any data going into the __CruiseControlMetrics topic -- i.e. does it grow in size? If not, this is an issue with the Kafka-side configs. Here is a checklist that might help:

  • Did you run ./gradlew jar to generate ./cruise-control-metrics-reporter/build/libs/cruise-control-metrics-reporter-A.B.C.jar (Where A.B.C is the version of the Cruise Control) (Note: This project requires Java 11)?
  • Did you copy ./cruise-control-metrics-reporter/build/libs/cruise-control-metrics-reporter-A.B.C.jar (Where A.B.C is the version of the Cruise Control) to the correct Kafka server dependency jar folder for each Kafka broker you are running? For Apache Kafka, the folder would be core/build/dependant-libs-SCALA_VERSION/ (for a Kafka source checkout) or libs/ (for a Kafka release download). When you start Kafka, do you see logs generated by Cruise Control Metrics Reporter (you should)?
  • Did you modify Kafka server configuration to set metric.reporters to com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter for each Kafka broker you are running? For Apache Kafka, server properties are located at ./config/server.properties.
  • If SSL is enabled, did you ensure that the relevant client configurations are properly set for all brokers in ./config/server.properties? Note that CruiseControlMetricsReporter takes all configurations for vanilla KafkaProducer with a prefix of cruise.control.metrics.reporter. -- e.g. cruise.control.metrics.reporter.ssl.truststore.password.
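
The metric.reporters item in the checklist above can be spot-checked on each broker with a small script. The following is a rough Python sketch, not part of Cruise Control: the function names are hypothetical, and the parser is a simplification that only handles plain key=value lines (the full Java properties format allows more).

```python
# Reporter class name as given in the checklist above.
REPORTER_CLASS = (
    "com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter"
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def reporter_configured(text):
    """True if metric.reporters lists the Cruise Control metrics reporter."""
    reporters = parse_properties(text).get("metric.reporters", "")
    return REPORTER_CLASS in [name.strip() for name in reporters.split(",")]


# Sample copied from the server.properties excerpt posted earlier in this thread.
sample_properties = """\
metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
cruise.control.metrics.topic.auto.create=true
cruise.control.metrics.topic.num.partitions=1
cruise.control.metrics.topic.replication.factor=1
"""
print(reporter_configured(sample_properties))
```

A passing check only shows that the setting is present in the file. It does not show that the reporter jar is on the broker's classpath or that the broker was restarted afterwards, so the other checklist items still apply.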

@rahu7624
Author

rahu7624 commented Jul 5, 2021

I tried reconfiguring it the way you advised. It seems to have started collecting some metrics, but it is still giving some errors.

[root@kafka-2 kafka]# systemctl status cruisecontrol -l
● cruisecontrol.service - Zookeeper
Loaded: loaded (/etc/systemd/system/cruisecontrol.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-07-05 10:46:46 UTC; 44min ago
Main PID: 13241 (cc.sh)
CGroup: /system.slice/cruisecontrol.service
├─13241 /bin/bash /usr/local/bin/cc.sh
└─13243 java -Xmx1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=./logs -Dlog4j.configurationFile=file:./config/log4j.properties -cp ./cruise-control/build/dependant-libs/:./cruise-control/build/libs/:./cruise-control-metrics-reporter/build/libs/* com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain config/cruisecontrol.properties

Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,248] INFO Finished sampling from topic __CruiseControlMetrics for partitions [0] in time range [1625484537241,1625484657241]. Collected 526 metrics. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,248] WARN Broker 2 is missing 4/4 topics metrics and 39/39 leader partition metrics. Missing leader topics: [__KafkaCruiseControlPartitionMetricSamples, test, __KafkaCruiseControlModelTrainingSamples, __consumer_offsets]. (com.linkedin.kafka.cruisecontrol.monitor.sampling.holder.BrokerLoad)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,248] WARN Skip generating metric sample for broker 2 because the following required metrics are missing [BROKER_PRODUCE_LOCAL_TIME_MS_MAX, BROKER_PRODUCE_REQUEST_QUEUE_TIME_MS_MEAN, BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MEAN, ALL_TOPIC_PRODUCE_REQUEST_RATE, ALL_TOPIC_MESSAGES_IN_PER_SEC, BROKER_PRODUCE_TOTAL_TIME_MS_MEAN, ALL_TOPIC_FETCH_REQUEST_RATE, BROKER_FOLLOWER_FETCH_REQUEST_RATE, ALL_TOPIC_REPLICATION_BYTES_OUT, BROKER_PRODUCE_TOTAL_TIME_MS_MAX, ALL_TOPIC_REPLICATION_BYTES_IN, BROKER_CONSUMER_FETCH_REQUEST_QUEUE_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_REQUEST_QUEUE_TIME_MS_MAX, BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MAX, ALL_TOPIC_BYTES_IN, BROKER_FOLLOWER_FETCH_TOTAL_TIME_MS_MAX, BROKER_CONSUMER_FETCH_REQUEST_QUEUE_TIME_MS_MEAN, BROKER_PRODUCE_REQUEST_QUEUE_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_TOTAL_TIME_MS_MEAN, BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MEAN, ALL_TOPIC_BYTES_OUT, BROKER_CONSUMER_FETCH_TOTAL_TIME_MS_MEAN, BROKER_REQUEST_QUEUE_SIZE, BROKER_CONSUMER_FETCH_TOTAL_TIME_MS_MAX, BROKER_RESPONSE_QUEUE_SIZE, BROKER_PRODUCE_LOCAL_TIME_MS_MEAN, BROKER_REQUEST_HANDLER_AVG_IDLE_PERCENT, BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_REQUEST_QUEUE_TIME_MS_MEAN]. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingUtils)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO Generated 79(39 skipped by broker {2=39}) partition metric samples and 2(1 skipped) broker metric samples for timestamp 1625484656792. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsProcessor)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO PARTITION Aggregator rolled out 1 new windows, reset 1 windows, current window range [1625484600000, 1625484900000], abandon 237 samples. (com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO Collected 79 partition metric samples for 79 partitions. Total partition assigned: 118. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,249] INFO BROKER Aggregator rolled out 1 new windows, reset 1 windows, current window range [1625478900000, 1625484900000], abandon 0 samples. (com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,255] INFO Collected 2 broker metric samples for 2 brokers. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
Jul 05 11:30:57 kafka-2 cc.sh[13241]: [2021-07-05 11:30:57,267] INFO Finished sampling in 26 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
Jul 05 11:30:58 kafka-2 cc.sh[13241]: [2021-07-05 11:30:58,408] INFO Skipping proposal precomputing because load monitor does not have enough snapshots. (com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer)
[root@kafka-2 kafka]#

@rahu7624
Author

rahu7624 commented Jul 5, 2021

Also, it shows an RF anomaly for the Cruise Control topics.

[root@kafka-2 kafka]# curl 'http://localhost:9090/kafkacruisecontrol/state'
MonitorState: {state: RUNNING(11.600% trained), NumValidWindows: (0/1) (0.000%), NumValidPartitions: 79/118 (66.949%), flawedPartitions: 0}
ExecutorState: {state: NO_TASK_IN_PROGRESS}
AnalyzerState: {isProposalReady: false, readyGoals: [ReplicaDistributionGoal, RackAwareGoal, TopicReplicaDistributionGoal, LeaderReplicaDistributionGoal, ReplicaCapacityGoal]}
AnomalyDetectorState: {selfHealingEnabled:[BROKER_FAILURE, DISK_FAILURE, GOAL_VIOLATION, METRIC_ANOMALY, TOPIC_ANOMALY, MAINTENANCE_EVENT], selfHealingDisabled:[], selfHealingEnabledRatio:{BROKER_FAILURE=1.0, DISK_FAILURE=1.0, GOAL_VIOLATION=1.0, METRIC_ANOMALY=1.0, TOPIC_ANOMALY=1.0, MAINTENANCE_EVENT=1.0}, recentGoalViolations:[], recentBrokerFailures:[], recentMetricAnomalies:[], recentDiskFailures:[], recentTopicAnomalies:[{description={Topics with replication factor violations: [{With desired RF 3: [{__KafkaCruiseControlModelTrainingSamples(100.00)}, {__CruiseControlMetrics(100.00)}, {__consumer_offsets(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=e8ee6abe-cfb5-42c7-9daa-1e9293e49692, detectionDate=2021-07-05T11:20:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:20:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__KafkaCruiseControlModelTrainingSamples(100.00)}, {__consumer_offsets(100.00)}, {__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=cb615220-3a97-4670-a320-5a7e66612879, detectionDate=2021-07-05T11:16:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:16:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}, {__KafkaCruiseControlModelTrainingSamples(100.00)}, {__consumer_offsets(100.00)}]}]}, anomalyId=35dc9570-8750-4baa-a2f3-4c2c641b51e0, detectionDate=2021-07-05T11:32:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:32:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlModelTrainingSamples(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}, {__consumer_offsets(100.00)}]}]}, anomalyId=3320092c-2c2b-471e-949f-f7137b580de4, detectionDate=2021-07-05T11:28:04Z, 
status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:28:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}, {__KafkaCruiseControlModelTrainingSamples(100.00)}, {__consumer_offsets(100.00)}]}]}, anomalyId=07ac8022-e5cf-4d1c-9a99-2f47cfa8b476, detectionDate=2021-07-05T11:30:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:30:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__consumer_offsets(100.00)}, {__KafkaCruiseControlModelTrainingSamples(100.00)}, {__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=28339b49-77db-4ed3-9ba2-31920954b398, detectionDate=2021-07-05T11:34:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:34:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlModelTrainingSamples(100.00)}, {__consumer_offsets(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}, anomalyId=05bda5cc-693e-4072-a91b-05294bbb5e58, detectionDate=2021-07-05T11:22:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:22:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__KafkaCruiseControlModelTrainingSamples(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}, {__CruiseControlMetrics(100.00)}, {__consumer_offsets(100.00)}]}]}, anomalyId=334304d1-1181-4702-827e-ff37a91cd436, detectionDate=2021-07-05T11:18:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:18:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__consumer_offsets(100.00)}, {__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}, {__KafkaCruiseControlModelTrainingSamples(100.00)}]}]}, 
anomalyId=1be80a1d-5124-494c-81e3-ed4c038991aa, detectionDate=2021-07-05T11:24:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:24:04Z}, {description={Topics with replication factor violations: [{With desired RF 3: [{__KafkaCruiseControlPartitionMetricSamples(100.00)}, {__CruiseControlMetrics(100.00)}, {__KafkaCruiseControlModelTrainingSamples(100.00)}, {__consumer_offsets(100.00)}]}]}, anomalyId=b82cd436-d121-4bb1-ac12-1d907939c92a, detectionDate=2021-07-05T11:26:04Z, status=COMPLETENESS_NOT_READY, statusUpdateDate=2021-07-05T11:26:04Z}], recentMaintenanceEvents:[], metrics:{meanTimeBetweenAnomalies:{GOAL_VIOLATION:0.00 milliseconds, BROKER_FAILURE:0.00 milliseconds, METRIC_ANOMALY:0.00 milliseconds, DISK_FAILURE:0.00 milliseconds, TOPIC_ANOMALY:8.29 milliseconds}, meanTimeToStartFix:0.00 milliseconds, numSelfHealingStarted:0, numSelfHealingFailedToStart:0, ongoingAnomalyDuration=47.01 minutes}, ongoingSelfHealingAnomaly:None, balancednessScore:100.000}

[root@kafka-2 kafka]#

@efeg
Collaborator

efeg commented Jul 15, 2021

WARN Broker 2 is missing 4/4 topics metrics and 39/39 leader partition metrics. Missing leader topics: [__KafkaCruiseControlPartitionMetricSamples, test, __KafkaCruiseControlModelTrainingSamples, __consumer_offsets].

and then

INFO Generated 79(39 skipped by broker {2=39}) partition metric samples and 2(1 skipped) broker metric samples for timestamp 1625484656792.

implies that broker 2 was not configured properly. If broker 2 is configured later, then eventually CC will be able to collect samples from all brokers and will roll out a window -- i.e. MonitorState will show NumValidWindows: (1/1).
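
To find which brokers are affected across a longer log, the "is missing ... metrics" warnings can be collected per broker. This is a minimal Python sketch that assumes the warning keeps the exact wording shown in the log above. The function name is hypothetical.

```python
import re

# Matches the BrokerLoad warning format quoted above.
MISSING_RE = re.compile(
    r"Broker (\d+) is missing (\d+)/(\d+) topics metrics "
    r"and (\d+)/(\d+) leader partition metrics"
)


def brokers_missing_metrics(log_text):
    """Map broker id -> missing/total counts for topic and partition metrics."""
    result = {}
    for match in MISSING_RE.finditer(log_text):
        broker, miss_t, total_t, miss_p, total_p = (int(g) for g in match.groups())
        # A later warning for the same broker overwrites the earlier one.
        result[broker] = {
            "topics": (miss_t, total_t),
            "partitions": (miss_p, total_p),
        }
    return result


# Sample copied from the log line quoted above.
sample_log = (
    "WARN Broker 2 is missing 4/4 topics metrics and 39/39 leader partition metrics. "
    "Missing leader topics: [__KafkaCruiseControlPartitionMetricSamples, test, "
    "__KafkaCruiseControlModelTrainingSamples, __consumer_offsets]."
)
print(brokers_missing_metrics(sample_log))
```

A broker that reports every topic and partition metric as missing, like broker 2 here, is a candidate for rechecking the metrics reporter setup on that host.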

Also it show RF anomaly for cruisecontrol topics.

This is independent of the issue we discussed above. It says that the "desired replication factor" config is set to 3, but the listed topics have an RF different from the desired RF. You can set the desired replication factor for the cluster using the self.healing.target.topic.replication.factor config.
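
To see which desired RF and which topics an anomaly entry refers to, the description string from the state output can be parsed. Below is a small Python sketch that assumes the description keeps the textual format shown in this thread. The function name is hypothetical, and the parenthesized number after each topic name is left uninterpreted.

```python
import re

# One "With desired RF N: [...]" group inside an anomaly description.
RF_GROUP_RE = re.compile(r"With desired RF (\d+): \[(.*?)\]\}")
# A topic entry such as "{__consumer_offsets(100.00)}".
TOPIC_RE = re.compile(r"\{([^{}()]+)\(")


def desired_rf_violations(description):
    """Map desired RF -> sorted list of topics reported as violating it."""
    violations = {}
    for rf, inner in RF_GROUP_RE.findall(description):
        violations.setdefault(int(rf), set()).update(TOPIC_RE.findall(inner))
    return {rf: sorted(topics) for rf, topics in violations.items()}


# Sample copied from the state output posted above.
sample_description = (
    "{Topics with replication factor violations: [{With desired RF 3: "
    "[{__KafkaCruiseControlModelTrainingSamples(100.00)}, {__CruiseControlMetrics(100.00)}, "
    "{__consumer_offsets(100.00)}, {__KafkaCruiseControlPartitionMetricSamples(100.00)}]}]}"
)
print(desired_rf_violations(sample_description))
```

Comparing the reported desired RF with the value configured for self.healing.target.topic.replication.factor, and with the actual RF of the listed topics, shows whether the config or the topics need to change.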
