
Conversation

Member

@k-raina k-raina commented Aug 21, 2025

What

Key Features:

  • MetricsCollector: Real-time performance metrics collection with latency tracking, memory monitoring, and throughput analysis (a usage sketch follows this list)
  • MetricsBounds: Configurable performance thresholds with automatic validation
  • Enhanced Tests: All existing ducktape tests now include integrated benchmark metrics
  • Rich Reporting: Detailed performance reports with P50/P95/P99 latencies, memory usage, and batch efficiency
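
To make the moving parts concrete, here is a minimal, self-contained sketch of the collector pattern this PR introduces. The real implementation lives in tests/ducktape/benchmark_metrics.py; the class and method names below are illustrative assumptions, not the PR's actual API.

import time

class TinyMetricsCollector:
    """Illustrative stand-in for the PR's MetricsCollector (names are hypothetical)."""

    def __init__(self):
        self.start = time.time()
        self.send_times = {}          # msg_key -> timestamp recorded at produce() time
        self.delivery_latencies = []  # per-message delivery latency in ms
        self.sent = 0
        self.delivered = 0

    def record_send(self, msg_key):
        self.send_times[msg_key] = time.time()
        self.sent += 1

    def record_delivery(self, msg_key):
        # Latency is only recorded when the matching send timestamp is still known
        if msg_key in self.send_times:
            self.delivery_latencies.append(
                (time.time() - self.send_times.pop(msg_key)) * 1000)
        self.delivered += 1

    def summary(self):
        elapsed = time.time() - self.start
        return {
            "sent": self.sent,
            "delivered": self.delivered,
            "send_rate_msg_per_s": self.sent / elapsed if elapsed > 0 else 0.0,
        }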

Metrics Collected:

  • Throughput: Send/delivery rates (msg/s, MB/s) with realistic bounds (1k+ msg/s)
  • Latency: P50/P95/P99 percentiles using Python's statistics.quantiles() (see the sketch after this list)
  • Memory: Peak usage and growth tracking via psutil
  • Efficiency: Messages per poll, buffer utilization, per-topic/partition breakdowns
  • Reliability: Success/error rates with comprehensive validation
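
The percentile math needs only the standard library, as the Latency bullet notes. A minimal sketch with invented sample data: statistics.quantiles(data, n=100) returns the 99 cut points P1..P99, so P50/P95/P99 are at indices 49, 94, and 98.

import statistics

delivery_latencies = [5.2, 7.9, 6.1, 12.4, 8.8, 6.7, 9.3, 5.5, 7.1, 11.0]  # ms, example data

cuts = statistics.quantiles(delivery_latencies, n=100)  # 99 cut points: P1..P99
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50:.1f} ms  P95={p95:.1f} ms  P99={p99:.1f} ms")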

Files Added:

  • tests/ducktape/benchmark_metrics.py - Complete benchmark framework

Files Modified:

  • tests/ducktape/test_producer.py - Enhanced all tests with integrated metrics
  • tests/ducktape/README.md - Updated documentation

Checklist

  • Contains customer-facing changes? Including API/behavior changes
    • No breaking changes - all existing tests enhanced with metrics, not replaced
  • Did you add sufficient unit test and/or integration test coverage for this PR?
    • Yes - all existing ducktape tests now include comprehensive metrics validation
    • Validated with 348k+ msg/s throughput and sub-100ms P95 latency

References

Test & Review

# Run enhanced ducktape tests with integrated benchmarks
./tests/ducktape/run_ducktape_test.py

@Copilot Copilot AI review requested due to automatic review settings August 21, 2025 12:44
@k-raina k-raina requested review from MSeal and a team as code owners August 21, 2025 12:44
@confluent-cla-assistant

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.


@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a comprehensive benchmark framework for Kafka producer testing in the ducktape test suite. The framework provides real-time performance metrics collection, validation against configurable bounds, and detailed reporting capabilities.

  • Implements a complete MetricsCollector system with latency tracking, memory monitoring, and throughput analysis
  • Enhances all existing ducktape tests with integrated benchmark metrics without breaking changes
  • Adds configurable performance bounds validation with realistic thresholds (1k+ msg/s throughput)

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

  • tests/ducktape/benchmark_metrics.py - New comprehensive benchmark framework with MetricsCollector, MetricsBounds, and reporting utilities
  • tests/ducktape/test_producer.py - Enhanced all producer tests with integrated metrics collection and validation
  • tests/ducktape/README.md - Updated documentation to reflect new metrics capabilities and the additional psutil dependency



# Use quantiles for P95, P99 (more accurate than custom implementation)
try:
    quantiles = statistics.quantiles(self.delivery_latencies, n=100)

Copilot AI commented Aug 21, 2025


Computing quantiles with n=100 for every summary is computationally expensive. Consider using a more efficient approach like numpy.percentile or caching the sorted data.
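
For reference, the suggested alternative could look like the sketch below, assuming numpy were added as a dependency (which this PR does not do); a single call computes all three percentiles.

import numpy as np

latencies = [5.2, 7.9, 6.1, 12.4, 8.8]  # per-message delivery latencies (ms), example data
p50, p95, p99 = np.percentile(latencies, [50, 95, 99])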



Contributor

@MSeal MSeal left a comment


Minor comments. I was debating whether we should use something like locust for this; it might be worth switching to down the road, but you have to hack it to do any non-RESTful testing patterns, e.g. https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/kafka_ex.py

except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
    # Handle edge cases where the process might not exist or be accessible
    return None
except Exception:
Contributor


I would not catch generic Exception here and just let it boil up to be remediated
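
A sketch of the narrower handling being suggested, with only the expected psutil errors caught and everything else left to propagate (the function name here is hypothetical):

import psutil

def get_process_memory_mb(pid):
    try:
        return psutil.Process(pid).memory_info().rss / (1024 * 1024)
    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
        # Expected edge cases: process gone or inaccessible; treat as "no sample"
        return None
    # No bare `except Exception`: unexpected failures should surface in the test run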

    return None


class MetricsBounds:
Contributor


Maybe add a TODO: load from config file?

Member Author


Implemented in commit eb493bb

    latency_ms = (time.time() - send_times[msg_key]) * 1000
    del send_times[msg_key]  # Clean up
else:
    latency_ms = 5.0  # Default latency if timing info not available
Contributor


Maybe better to just set it to 0 or None.
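
A sketch of the None-based alternative: drop the sample rather than inventing a 5.0 ms default that would skew the percentiles (the callback name is hypothetical):

import time

send_times = {}          # msg_key -> send timestamp, populated at produce() time
delivery_latencies = []  # only real samples end up here

def on_delivery(msg_key):
    if msg_key in send_times:
        latency_ms = (time.time() - send_times.pop(msg_key)) * 1000
        delivery_latencies.append(latency_ms)
    # else: no timing info for this message; skip it instead of recording
    # a fabricated default latency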

@MSeal
Contributor

MSeal commented Aug 24, 2025

Let's touch up the small things, get a merge, then iterate and change things if we want later. I want to get this into the history so we can build abstractions on top of it for simpler test definitions, swap the implementation details as needed, and avoid conflicts on future PRs.

@k-raina k-raina requested a review from MSeal August 25, 2025 16:34

Member

@fangnx fangnx left a comment


Great work :) Just left some questions

}

return {
    # Basic metrics
Member


Is the basic vs enhanced classification coming from some other source (e.g. some client benchmarking guides)? The list LGTM but just curious :)

Member Author


There were no guidelines I referred to; segregating basic metrics from enhanced metrics just made sense for ease of review.

Later on we can further divide these metrics in code/comments into "latency", "throughput", "message delivery", etc.

Member


Makes sense!

## Configuration

Performance bounds are loaded from a JSON config file. By default, it loads `benchmark_bounds.json`, but you can override this with the `BENCHMARK_BOUNDS_CONFIG` environment variable:
Member


Is the default benchmark_bounds.json going to be added in the next PR?

Member Author


Added bounds in latest commit d0f4793
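
A minimal sketch of the loading behavior the README change describes; only the file name and the environment variable come from the PR, while the JSON key below is a hypothetical placeholder:

import json
import os

config_path = os.environ.get("BENCHMARK_BOUNDS_CONFIG", "benchmark_bounds.json")
with open(config_path) as f:
    bounds = json.load(f)

min_send_rate = bounds.get("min_send_rate_msg_per_s", 1000)  # hypothetical key

Overriding the bounds for a run would then be, e.g., BENCHMARK_BOUNDS_CONFIG=custom_bounds.json ./tests/ducktape/run_ducktape_test.py.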


@k-raina k-raina requested a review from fangnx August 26, 2025 08:45
@sonarqube-confluent

Passed

Analysis Details

5 Issues

  • Bugs: 0
  • Vulnerabilities: 0
  • Code Smells: 5

Coverage and Duplications

  • Coverage: no coverage information (66.40% estimated after merge)
  • Duplications: no duplication information (5.60% estimated after merge)

Project ID: confluent-kafka-python

View in SonarQube

Contributor

@MSeal MSeal left a comment


Thanks for addressing the minor comments. Let's move anything additional that's not a fix of a glaring issue to future PRs to unblock the history.

@k-raina k-raina merged commit 858e77c into master Aug 27, 2025
3 checks passed
@k-raina k-raina deleted the kraina-add-benchmark-famework branch August 27, 2025 10:43