branch-3.0: [Bug](join) return eof when join build sink awakend by downstream source #47380 #47455

Open
wants to merge 1 commit into
base: branch-3.0

github-actions[bot]
Contributor

Cherry-picked from #47380

…rce (#47380)

### What problem does this PR solve?
1. Return EOF when the join build sink is awakened by the downstream source, so that
`HashJoinBuildSinkLocalState::close` does not hit an error.

![QQ_1737641060365](https://github.com/user-attachments/assets/8b8ddc15-7616-45ca-8afa-8895df21b52c)
2. Add `WakeUpEarlyReason` to the profile.
3. Add the debug point `Pipeline::make_all_runnable.sleep` to reproduce
the problem in a regression test.
```text
Exception in inverted_index_p0/ssb_unique_sql_zstd/sql/q4.3.sql:
java.lang.IllegalStateException: exceptions : exception : errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]rf process meet error: [E6] bf not inited and not ignored/disabled, rf: RuntimeFilter: (id = 0, type = bloomfilter, is_broadcast: true, ignored: false, disabled: false, build_bf_cardinality: true, dependency: none, synced_size: -1, has_local_target: true, has_remote_target: false, error_msg: []
  0#  doris::Exception::Exception(int, std::basic_string_view<char, std::char_traits<char> > const&) at /root/doris/be/src/common/exception.cpp:29
  1#  doris::Exception::Exception<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(int, std::basic_string_view<char, std::char_traits<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
  2#  doris::IRuntimeFilter::signal() at /root/doris/be/src/exprs/runtime_filter.cpp:610
  3#  doris::IRuntimeFilter::publish(doris::RuntimeState*, bool)::$_1::operator()(std::shared_ptr<doris::RuntimePredicateWrapper>, bool, unsigned long) const at /root/doris/be/src/exprs/runtime_filter.cpp:0
  4#  doris::IRuntimeFilter::publish(doris::RuntimeState*, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
  5#  doris::VRuntimeFilterSlots::publish(doris::RuntimeState*, bool) at /root/doris/be/src/exprs/runtime_filter_slots.h:0
  6#  doris::pipeline::HashJoinBuildSinkLocalState::close(doris::RuntimeState*, doris::Status) at /root/doris/be/src/pipeline/exec/hashjoin_build_sink.cpp:173
  7#  doris::pipeline::DataSinkOperatorXBase::close(doris::RuntimeState*, doris::Status) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
  8#  doris::pipeline::PipelineTask::close(doris::Status, bool) at /root/doris/be/src/common/status.h:390
  9#  doris::pipeline::_close_task(doris::pipeline::PipelineTask*, doris::Status) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
  10# doris::pipeline::TaskScheduler::_do_work(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
  11# doris::ThreadPool::dispatch_thread() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/move.h:206
  12# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:563
  13# ?
  14# __clone
```
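
For readers unfamiliar with the failure, the following is a minimal sketch of the scenario described in item 1, not the actual patch. All types and names here (`Status`, `BloomFilterSlot`, `BuildSink`, `run_task`, the `guard` flag) are hypothetical stand-ins and do not correspond to the real Doris classes. It assumes that the build sink can be woken before the build side finishes, that `close` publishes the runtime filter whenever the task status is OK, and that returning EOF from the sink is what lets `close` skip the publish step.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; not the real Doris types.
enum class Code { OK, END_OF_FILE, INTERNAL_ERROR };

struct Status {
    Code code;
    std::string msg;
    static Status ok_status() { return Status{Code::OK, ""}; }
    bool ok() const { return code == Code::OK; }
};

struct BloomFilterSlot {
    bool inited = false;
    // Mirrors the "[E6] bf not inited" failure: publishing before init is an error.
    Status publish() const {
        if (!inited) {
            return Status{Code::INTERNAL_ERROR, "bf not inited and not ignored/disabled"};
        }
        return Status::ok_status();
    }
};

struct BuildSink {
    BloomFilterSlot filter;
    bool woken_early = false;  // assumed to be set when the downstream source finishes first

    // With the guard enabled, an early wake-up short-circuits with END_OF_FILE
    // instead of reporting OK for a build that never completed.
    Status sink(bool guard) {
        if (woken_early) {
            return guard ? Status{Code::END_OF_FILE, "wake up early"} : Status::ok_status();
        }
        filter.inited = true;  // normal path: build finished, filter is ready
        return Status::ok_status();
    }

    // Publishes the filter only when the task finished normally.
    Status close(const Status& exec_status) {
        if (!exec_status.ok()) {
            return Status::ok_status();  // skip publishing on EOF or error
        }
        return filter.publish();
    }
};

// Runs sink() then close() in sequence and returns the result of close().
Status run_task(bool woken_early, bool guard) {
    BuildSink s;
    s.woken_early = woken_early;
    Status exec = s.sink(guard);
    return s.close(exec);
}

// Self-check of the three scenarios.
void check_examples() {
    assert(run_task(true, false).code == Code::INTERNAL_ERROR);  // early wake-up, no guard
    assert(run_task(true, true).ok());                           // early wake-up, EOF returned
    assert(run_task(false, true).ok());                          // normal completion
}
```

In this sketch, only the unguarded early wake-up reaches `publish()` with an uninitialized filter, which corresponds to the `close` frame in the trace above.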
@github-actions github-actions bot requested a review from dataroaring as a code owner January 26, 2025 07:07
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Jan 26, 2025
@hello-stephen
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 41061 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 87b9998f8a8ebe1948c311f51f19ddde17ebc9b2, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17579	7470	7303	7303
q2	2078	182	173	173
q3	10551	1142	1147	1142
q4	10220	771	714	714
q5	7738	2904	2790	2790
q6	240	145	144	144
q7	991	615	600	600
q8	9372	1961	2016	1961
q9	6579	6437	6437	6437
q10	7015	2318	2316	2316
q11	471	265	264	264
q12	405	208	202	202
q13	17774	2993	3052	2993
q14	266	217	207	207
q15	572	517	526	517
q16	666	603	587	587
q17	988	575	576	575
q18	7398	6633	6789	6633
q19	1406	1139	1063	1063
q20	474	205	195	195
q21	3985	3245	3268	3245
q22	1103	1000	1029	1000
Total cold run time: 107871 ms
Total hot run time: 41061 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7263	7230	7213	7213
q2	327	235	227	227
q3	2924	2928	2925	2925
q4	2035	1848	1801	1801
q5	5728	5761	5812	5761
q6	216	136	140	136
q7	2275	1870	1801	1801
q8	3254	3500	3520	3500
q9	8983	8908	8853	8853
q10	3637	3578	3661	3578
q11	590	494	499	494
q12	801	602	604	602
q13	10129	3189	3141	3141
q14	301	277	268	268
q15	580	517	543	517
q16	699	658	655	655
q17	1852	1636	1610	1610
q18	8343	7737	7492	7492
q19	1711	1586	1636	1586
q20	2135	1866	1879	1866
q21	5556	5393	5330	5330
q22	1154	1084	1051	1051
Total cold run time: 70493 ms
Total hot run time: 60407 ms

@doris-robot

TPC-DS: Total hot run time: 197714 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 87b9998f8a8ebe1948c311f51f19ddde17ebc9b2, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1321	937	921	921
query2	6229	1991	2054	1991
query3	10946	4446	4301	4301
query4	66012	29191	23536	23536
query5	4989	456	447	447
query6	415	188	170	170
query7	5639	320	310	310
query8	308	233	228	228
query9	9194	2684	2675	2675
query10	468	271	255	255
query11	17815	15104	15657	15104
query12	159	102	104	102
query13	1551	449	459	449
query14	10179	7601	7294	7294
query15	206	184	177	177
query16	7204	441	513	441
query17	1076	593	607	593
query18	1935	361	345	345
query19	234	176	176	176
query20	133	120	117	117
query21	200	111	109	109
query22	4935	4388	4580	4388
query23	34496	34150	34307	34150
query24	6182	2981	2962	2962
query25	537	428	425	425
query26	669	170	171	170
query27	2054	353	359	353
query28	4252	2493	2435	2435
query29	702	467	429	429
query30	239	161	166	161
query31	1010	842	864	842
query32	73	56	64	56
query33	406	305	294	294
query34	915	511	520	511
query35	841	742	731	731
query36	1088	968	984	968
query37	122	76	75	75
query38	4088	4102	4138	4102
query39	1456	1434	1476	1434
query40	196	101	96	96
query41	48	45	47	45
query42	114	100	100	100
query43	535	507	490	490
query44	1157	837	808	808
query45	182	162	164	162
query46	1105	730	707	707
query47	1934	1837	1845	1837
query48	456	393	398	393
query49	725	403	386	386
query50	813	402	411	402
query51	7301	7140	7075	7075
query52	103	91	89	89
query53	258	182	181	181
query54	551	450	474	450
query55	76	75	78	75
query56	266	240	240	240
query57	1208	1106	1105	1105
query58	214	211	210	210
query59	3306	2860	3083	2860
query60	271	251	250	250
query61	107	104	109	104
query62	857	774	742	742
query63	217	191	187	187
query64	1359	677	638	638
query65	3299	3217	3194	3194
query66	724	306	297	297
query67	15844	15670	15645	15645
query68	3635	585	578	578
query69	425	277	275	275
query70	1225	1164	1134	1134
query71	363	254	256	254
query72	6222	3967	3932	3932
query73	755	343	344	343
query74	10324	8947	9032	8947
query75	3343	2649	2632	2632
query76	1852	1043	1141	1043
query77	485	276	274	274
query78	10578	9585	9618	9585
query79	1432	592	592	592
query80	866	424	443	424
query81	507	247	244	244
query82	1266	120	117	117
query83	232	151	149	149
query84	285	85	72	72
query85	893	319	310	310
query86	341	305	303	303
query87	4516	4292	4465	4292
query88	3545	2396	2352	2352
query89	418	293	298	293
query90	2019	186	185	185
query91	184	155	150	150
query92	65	49	52	49
query93	1737	559	573	559
query94	838	296	301	296
query95	362	264	256	256
query96	614	278	281	278
query97	3323	3192	3179	3179
query98	217	203	194	194
query99	1679	1438	1428	1428
Total cold run time: 318050 ms
Total hot run time: 197714 ms

@doris-robot

ClickBench: Total hot run time: 32.71 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 87b9998f8a8ebe1948c311f51f19ddde17ebc9b2, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.04	0.03
query2	0.06	0.03	0.03
query3	0.23	0.07	0.07
query4	1.63	0.10	0.10
query5	0.53	0.50	0.51
query6	1.13	0.73	0.73
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.56	0.51	0.51
query10	0.55	0.54	0.54
query11	0.14	0.10	0.10
query12	0.14	0.12	0.10
query13	0.60	0.60	0.59
query14	2.76	2.85	2.77
query15	0.90	0.83	0.82
query16	0.38	0.38	0.39
query17	1.06	1.05	1.02
query18	0.24	0.22	0.21
query19	1.86	1.70	2.01
query20	0.02	0.02	0.01
query21	15.37	0.60	0.59
query22	2.40	2.84	1.88
query23	17.45	0.89	0.88
query24	3.24	0.90	1.56
query25	0.23	0.14	0.04
query26	0.57	0.13	0.12
query27	0.04	0.05	0.04
query28	10.08	1.10	1.08
query29	12.57	3.17	3.21
query30	0.24	0.07	0.06
query31	2.86	0.38	0.38
query32	3.27	0.46	0.46
query33	2.99	2.99	3.03
query34	16.98	4.54	4.57
query35	4.59	4.55	4.54
query36	0.67	0.47	0.48
query37	0.10	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.03	0.02
query40	0.15	0.13	0.12
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.91 s
Total hot run time: 32.71 s

4 participants