Datadog kafka_consumer check spends lot of time on closing consumers #19564

Open
arshdeeptinna opened this issue Feb 6, 2025 · 0 comments

Hi,
We have been running an older version of kafka_consumer (2.16.4) because we had some issues with metrics when we tried to upgrade before. We finally tried again and rolled out 6.1.0, and immediately noticed that the check takes much longer than it did with the previous version.

    kafka_consumer (2.16.4)
    -----------------------
      Instance ID: kafka_consumer:571393c725b533f7 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
      Total Runs: 6,899
      Metric Samples: Last Run: 1,536, Total: 10,564,681
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 145ms
      Last Execution Date : 2025-02-04 20:28:53 UTC (1738700933000)
      Last Successful Execution Date : 2025-02-04 20:28:53 UTC (1738700933000)
      metadata:
    kafka_consumer (6.1.0)
    ----------------------
      Instance ID: kafka_consumer:bd5fc7437860fb15 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
      Total Runs: 1,297
      Metric Samples: Last Run: 826, Total: 1,071,322
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 16.344s
      Last Execution Date : 2025-02-04 20:28:55 UTC (1738700935000)
      Last Successful Execution Date : 2025-02-04 20:28:55 UTC (1738700935000)

On investigating, we noticed that the check spends most of its time opening and closing consumers (roughly 11 of the 16 seconds) in the get_highwater_offsets method. I made a quick fix that tracks the current consumer and only opens a new one if it differs from the current consumer, which brought the execution time down to 5-6 seconds.
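To illustrate the idea, here is a minimal sketch of the reuse pattern. It is not the actual patch and does not use the integration's real classes: `StubConsumer`, `collect_offsets`, and the `(group, topic, partition)` work list are hypothetical stand-ins, and the stub only counts how often a consumer is opened and closed.

```python
from typing import Dict, Iterable, Optional, Tuple


class StubConsumer:
    """Hypothetical stand-in for a real Kafka consumer; counts open/close calls."""

    opened = 0
    closed = 0

    def __init__(self, group: str) -> None:
        self.group = group
        StubConsumer.opened += 1

    def highwater_offset(self, topic: str, partition: int) -> int:
        # A real client would query the broker here; the stub returns a dummy value.
        return 0

    def close(self) -> None:
        StubConsumer.closed += 1


def collect_offsets(
    work: Iterable[Tuple[str, str, int]],
) -> Dict[Tuple[str, int], int]:
    """Walk (group, topic, partition) items, reusing the consumer while the group is unchanged."""
    offsets: Dict[Tuple[str, int], int] = {}
    current: Optional[StubConsumer] = None
    try:
        for group, topic, partition in work:
            # Only pay the open/close cost when the group actually changes.
            if current is None or current.group != group:
                if current is not None:
                    current.close()
                current = StubConsumer(group)
            offsets[(topic, partition)] = current.highwater_offset(topic, partition)
    finally:
        # Close whichever consumer is still open, even if a lookup raised.
        if current is not None:
            current.close()
    return offsets


work = [("g1", "t1", 0), ("g1", "t1", 1), ("g1", "t2", 0), ("g2", "t1", 0)]
result = collect_offsets(work)
```

With this approach the number of open/close cycles depends on how often the group changes between consecutive items, so it helps most when the work is ordered by consumer group.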

Modified

    kafka_consumer (6.1.0)
    ----------------------
      Instance ID: kafka_consumer:bd5fc7437860fb15 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kafka_consumer.d/conf.yaml
      Total Runs: 3
      Metric Samples: Last Run: 826, Total: 2,478
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 4.638s
      Last Execution Date : 2025-02-06 15:21:46 UTC (1738855306000)
      Last Successful Execution Date : 2025-02-06 15:21:46 UTC (1738855306000)

There may well be a better way to bring down the execution time, but if you are OK with it, I can create a PR with the quick fix. In its current form we can't run this check in our production environment, as each check run might take a few minutes.
