Add Benchmark Framework for ducktape #2030
Conversation
🎉 All Contributor License Agreements have been signed. Ready to merge.
Pull Request Overview
This PR adds a comprehensive benchmark framework for Kafka producer testing in the ducktape test suite. The framework provides real-time performance metrics collection, validation against configurable bounds, and detailed reporting capabilities.
- Implements a complete MetricsCollector system with latency tracking, memory monitoring, and throughput analysis
- Enhances all existing ducktape tests with integrated benchmark metrics without breaking changes
- Adds configurable performance bounds validation with realistic thresholds (1k+ msg/s throughput)
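For orientation, here is a rough sketch of how such a collector might be wired into a producer test. Method names like `record_delivery`, `get_summary`, and `validate` are assumptions for illustration, not the PR's confirmed API:

```python
# Hypothetical usage sketch -- names and arguments are illustrative, not the PR's exact API.
from benchmark_metrics import MetricsCollector, MetricsBounds

collector = MetricsCollector()
bounds = MetricsBounds()  # e.g. requires >= 1000 msg/s throughput

collector.start()
for _ in range(10_000):
    # produce a message; the delivery callback would report per-message latency
    collector.record_delivery(latency_ms=1.5)
collector.stop()

summary = collector.get_summary()              # throughput, P95/P99 latency, memory usage
passed, violations = bounds.validate(summary)  # compare against configured thresholds
assert passed, violations
```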
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
File | Description
---|---
tests/ducktape/benchmark_metrics.py | New comprehensive benchmark framework with MetricsCollector, MetricsBounds, and reporting utilities
tests/ducktape/test_producer.py | Enhanced all producer tests with integrated metrics collection and validation
tests/ducktape/README.md | Updated documentation to reflect new metrics capabilities and additional psutil dependency
```python
# Use quantiles for P95, P99 (more accurate than custom implementation)
try:
    quantiles = statistics.quantiles(self.delivery_latencies, n=100)
```
Computing quantiles with n=100 for every summary is computationally expensive. Consider using a more efficient approach like numpy.percentile or caching the sorted data.
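If this does become a hotspot, the suggested alternative might look roughly like the following. This is a sketch, not code from the PR, and it assumes numpy is available in the test environment:

```python
import numpy as np

def latency_percentiles(delivery_latencies):
    """Compute P95/P99 in one vectorized call instead of statistics.quantiles(n=100)."""
    arr = np.asarray(delivery_latencies, dtype=float)
    p95, p99 = np.percentile(arr, [95, 99])
    return p95, p99
```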
Minor comments. I was debating whether we should use something like locust for this... might be worth switching to it down the road, but you kind of have to hack it to do any non-RESTful testing patterns, e.g. https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/kafka_ex.py
tests/ducktape/benchmark_metrics.py (outdated):

```python
except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
    # Handle edge cases where process might not exist or be accessible
    return None
except Exception:
    return None
```
I would not catch generic Exception here and just let it boil up to be remediated
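Applied to the hunk above, letting unexpected errors propagate might look roughly like this. It is only a sketch; the function name is a placeholder, not the PR's actual helper:

```python
import psutil

def process_rss_mb(pid):
    """Return the process RSS in MB, or None if the process is gone or inaccessible.

    Only psutil-specific failures are swallowed; anything unexpected propagates
    so real bugs surface instead of silently becoming None.
    """
    try:
        return psutil.Process(pid).memory_info().rss / (1024 * 1024)
    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
        return None
```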
```python
class MetricsBounds:
```
Maybe add a TODO: load from config file?
Implemented in commit eb493bb
tests/ducktape/test_producer.py (outdated):

```python
    latency_ms = (time.time() - send_times[msg_key]) * 1000
    del send_times[msg_key]  # Clean up
else:
    latency_ms = 5.0  # Default latency if timing info not available
```
maybe better to just set to 0 or None
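Recording `None` and skipping it in the statistics could look roughly like this (a sketch only; `send_times` is the dict from the diff, the helper name is invented):

```python
import time

def measured_latency_ms(msg_key, send_times):
    """Return the delivery latency in ms, or None when no send timestamp was recorded."""
    sent_at = send_times.pop(msg_key, None)  # pop also cleans up the entry
    if sent_at is None:
        return None  # no fabricated default; callers skip None when aggregating
    return (time.time() - sent_at) * 1000
```

Callers would then filter out `None` values before feeding latencies into the percentile calculation.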
Let's touch up the small things, get a merge, then iterate / change things if we want later. I want to get this into the history so we can build abstractions on top for simpler test definitions, swap the implementation details as needed, and avoid conflicts on future PRs.
Great work :) Just left some questions
```python
}

return {
    # Basic metrics
```
Is the basic vs enhanced classification coming from some other source (e.g. some client benchmarking guides)? The list LGTM but just curious :)
There were no guidelines I referred to; it made sense to me to keep basic metrics and enhanced metrics segregated for the reviewer's ease.
Later on we can further divide these metrics in code/comments into "latency", "throughput", "message delivery", etc.
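For illustration, a summary grouped that way could eventually look something like this (hypothetical keys; the real field names live in `benchmark_metrics.py`):

```python
# Hypothetical grouping -- illustrative keys and values only.
summary = {
    "throughput": {"messages_per_sec": 12_500, "mb_per_sec": 11.9},
    "latency": {"avg_ms": 1.2, "p95_ms": 3.4, "p99_ms": 7.8},
    "delivery": {"sent": 100_000, "delivered": 100_000, "failed": 0},
    "resources": {"peak_rss_mb": 182.5},
}
```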
Makes sense!
```markdown
## Configuration

Performance bounds are loaded from a JSON config file. By default, it loads `benchmark_bounds.json`, but you can override this with the `BENCHMARK_BOUNDS_CONFIG` environment variable:
```
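As a sketch of the mechanism described above (the example key in the comment is a guess, not the actual bounds schema):

```python
import json
import os

# Resolve the bounds file, honoring the BENCHMARK_BOUNDS_CONFIG override.
path = os.environ.get("BENCHMARK_BOUNDS_CONFIG", "benchmark_bounds.json")
with open(path) as f:
    bounds = json.load(f)  # e.g. {"min_throughput_msgs_per_sec": 1000, ...}
```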
Is the default `benchmark_bounds.json` going to be added in the next PR?
Added bounds in latest commit d0f4793
Thanks for addressing the minor comments. Let's move anything additional that's not a fix of a glaring issue to future PRs to unblock the history.
What

Adds a benchmark framework for Kafka producer testing in the ducktape test suite, with real-time metrics collection, validation against configurable bounds, and detailed reporting.

Key Features:

Metrics Collected:
- Latency percentiles (P95/P99) via `statistics.quantiles()`
- Process memory usage via `psutil`

Files Added:
- `tests/ducktape/benchmark_metrics.py` - Complete benchmark framework

Files Modified:
- `tests/ducktape/test_producer.py` - Enhanced all tests with integrated metrics
- `tests/ducktape/README.md` - Updated documentation

Checklist

References

Test & Review

```bash
# Run enhanced ducktape tests with integrated benchmarks
./tests/ducktape/run_ducktape_test.py
```