[feat](scan) Adaptive scan concurrency #44690
0bf7245 to 990b804
run buildall
run buildall
7878089 to 1af6167
run buildall
f7693f0 to 27963c8
run buildall
run buildall
run buildall
5b91373 to 5bc8071
run buildall
clang-tidy made some suggestions
MetricUnit::NANOSECONDS);

SimplifiedScanScheduler::SimplifiedScanScheduler(std::string sched_name,
                                                 std::shared_ptr<CgroupCpuCtl> cg_cpu_ctl)
warning: pass by value and use std::move [modernize-pass-by-value]
be/src/vec/exec/scan/scanner_scheduler.cpp:399:
- : _is_stop(false), _cgroup_cpu_ctl(cg_cpu_ctl), _sched_name(sched_name) {
+ : _is_stop(false), _cgroup_cpu_ctl(cg_cpu_ctl), _sched_name(std::move(sched_name)) {
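The `modernize-pass-by-value` suggestion above can be illustrated with a small, self-contained sketch. `SchedulerSketch` and the empty `CgroupCpuCtl` struct are hypothetical stand-ins, not the Doris classes, and moving the `shared_ptr` as well assumes the member is also a `std::shared_ptr`, which the excerpt does not confirm.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the real CgroupCpuCtl type.
struct CgroupCpuCtl {};

class SchedulerSketch {
public:
    // Sink parameters are taken by value and then moved into the members,
    // so an rvalue argument is moved (never copied) and an lvalue argument
    // is copied exactly once, at the call site.
    SchedulerSketch(std::string sched_name, std::shared_ptr<CgroupCpuCtl> cg_cpu_ctl)
            : _is_stop(false),
              _cgroup_cpu_ctl(std::move(cg_cpu_ctl)),  // assumption: member is a shared_ptr
              _sched_name(std::move(sched_name)) {}

    const std::string& name() const { return _sched_name; }
    bool has_cgroup_ctl() const { return _cgroup_cpu_ctl != nullptr; }
    bool is_stopped() const { return _is_stop; }

private:
    bool _is_stop;
    std::shared_ptr<CgroupCpuCtl> _cgroup_cpu_ctl;
    std::string _sched_name;
};
```

Without `std::move`, the by-value parameter would be copied a second time into the member, which is the extra copy the warning points at.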
9a06917 to 041a17b
clang-tidy made some suggestions
MetricUnit::NANOSECONDS);

SimplifiedScanScheduler::SimplifiedScanScheduler(std::string sched_name,
                                                 std::shared_ptr<CgroupCpuCtl> cg_cpu_ctl)
warning: pass by value and use std::move [modernize-pass-by-value]
be/src/vec/exec/scan/scanner_scheduler.cpp:397:
- _sched_name(sched_name) {
+ _sched_name(std::move(sched_name)) {
run buildall
TPC-H: Total hot run time: 40800 ms
TPC-DS: Total hot run time: 199031 ms
ClickBench: Total hot run time: 33.68 s
run buildall
e6b5b6c to 01a41c7
run buildall
TPC-H: Total hot run time: 40342 ms
TeamCity be ut coverage result:
TPC-DS: Total hot run time: 197062 ms
run build all
run buildall
TPC-H: Total hot run time: 31671 ms
TPC-DS: Total hot run time: 191141 ms
TeamCity be ut coverage result:
TPC-H: Total hot run time: 31795 ms
TPC-DS: Total hot run time: 183480 ms
ClickBench: Total hot run time: 30.81 s
…scan
run buildall
TPC-H: Total hot run time: 31516 ms
TPC-DS: Total hot run time: 191029 ms
ClickBench: Total hot run time: 31.12 s
TeamCity be ut coverage result:
run cloud_p0
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
### What problem does this PR solve?
Unit tests for scanner scheduling. Adaptive scan scheduling was introduced by #44690.
* ScannerContext::init
* ScannerContext::_push_back_scan_task
* ScannerContext::_get_margin
* ScannerContext::_pull_next_scan_task
* ScannerContext::_schedule_scan_task
* Additional test for the scan operator, to make sure `adaptive_pipeline_task_serial_read_on_limit` works correctly.
* ScannerContext::get_free_block
* ScannerContext::return_free_block
* ScannerContext::get_block_from_queue
What problem does this PR solve?
Implementation of adaptive scan concurrency.
Previously, all scanners were submitted to the scheduler at once, which caused several problems:
1. Scan tasks are not executed evenly across different queries.
2. Scan tasks cause memory peaks.
3. When the running scan tasks consume all free blocks, the remaining scanners become bubbles in the scan scheduler.
We therefore adjust the scanner concurrency of each scan operator by monitoring system scan pressure, with two goals:
1. Make full use of resources even when there is only one query.
2. Distribute resources among all scanners.
Two new session variables are introduced:
1. min_scanner_concurrency: each scan operator keeps at least min_scanner_concurrency scanners running.
2. min_scan_scheduler_concurrency: the minimum number of active threads in the scan scheduler.
The original num_scanner_threads is now used as max_scanner_concurrency.
Release note: None
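To make the relationship between the three limits concrete, here is a minimal sketch of how a per-operator submission count could be derived from them. It is not the algorithm implemented in this PR: the `ScanPressure` struct, the `scanners_to_submit` function, and the way "pressure" is measured (active threads plus queued tasks) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical snapshot of scheduler load; not a Doris type.
struct ScanPressure {
    int active_threads;             // scheduler threads currently running scan tasks
    int queued_tasks;               // scan tasks waiting in the scheduler queue
    int min_scheduler_concurrency;  // plays the role of min_scan_scheduler_concurrency
};

// Returns how many additional scanners one scan operator may submit now.
// Hypothetical helper: it only illustrates the floor/ceiling roles described above.
inline int scanners_to_submit(int running, int min_scanner_concurrency,
                              int max_scanner_concurrency, int remaining_scanners,
                              const ScanPressure& pressure) {
    // Floor: every operator is topped up to min_scanner_concurrency.
    const int floor_gap = std::max(0, min_scanner_concurrency - running);
    // Spare capacity: how far the scheduler is below its minimum active load.
    const int spare = std::max(0, pressure.min_scheduler_concurrency -
                                          pressure.active_threads - pressure.queued_tasks);
    const int wanted = std::max(floor_gap, spare);
    // Ceiling: never exceed max_scanner_concurrency (the old num_scanner_threads).
    const int headroom = std::max(0, max_scanner_concurrency - running);
    return std::min({wanted, headroom, remaining_scanners});
}
```

Under these assumptions, a busy scheduler yields only the guaranteed floor, while an idle one lets a single query grow up to its ceiling, which matches the two goals stated in the description.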