Description
Drill version:
1.22.0-SNAPSHOT
Describe the bug
the mergejoin memery leak.
To Reproduce
- Configure direct memory, In the drill-env.sh file in the conf directory add:
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"1G"} - drill hive storage plugin configuration:
{
"type": "hive",
"configProps": {
"hive.metastore.sasl.enabled": "false",
"fs.default.name": "hdfs://xxx:9000",
"datanucleus.schema.autoCreateAll": "true",
"javax.jdo.option.ConnectionURL": "jdbc:mysql://xxx:3306/tpch?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8&useSSL=false",
"javax.jdo.option.ConnectionUserName": "root",
"javax.jdo.option.ConnectionPassword": "",
"hive.metastore.uris": "thrift://xxx:9083",
"hive.metastore.schema.verification": "false",
"hive.metastore.warehouse.dir": "/user/hive/warehouse",
"hive.jdbc.driver": "org.apache.hive.jdbc.HiveDriver",
"hive.jdbc.url": "jdbc:hive2://xxx:10000",
"hive.jdbc.user": "root",
"hive.jdbc.password": "",
"hive.dml.mode": "jdbc",
"writable": "true"
},
"enabled": true,
"authMode": "SHARED_USER"
} - Create eight tpch tables, use tpch tool to generate 1s data, and import data into hive. See the attachment for details.
- Write the execution script, run the tpche sql statements in 5 concurrency:
shell script:
[root@jupiterdev memeryleak]# cat para_power_test.sh
#!/bin/bash
drill_home=/opt/apache-drill-1.22.0-SNAPSHOT/bin
random_sql(){
while true
do
$drill_home/sqlline -u "jdbc:drill:zk=localhost:2181" -f tpch_sql8.sql >> log/tpch1s_sql8.log 2>&1
done
}
main(){
unset HADOOP_CLASSPATH
#sleep 2h
#TPCH power test
for i in seq 1 5
do
random_sql &
done
}
main $@
the sql as follows:
select
o_year,
sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share
from (
select
extract(year from o_orderdate) as o_year,
l_extendedprice * 1.0 as volume,
n2.n_name as nation
from hive.tpch.part, hive.tpch.supplier, hive.tpch.lineitem, hive.tpch.orders, hive.tpch.customer, hive.tpch.nation n1, hive.tpch.nation n2, hive.tpch.region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01'
and date '1996-12-31'
and p_type = 'LARGE BRUSHED BRASS') as all_nations
group by o_year
order by o_year;
5. Execute the script and view the sqlline running status:
[root@jupiterdev memeryleak]# ./para_power_test.sh
[root@jupiterdev memeryleak]# jps | grep SqlLine
5638 SqlLine
5799 SqlLine
5705 SqlLine
5709 SqlLine
5584 SqlLine
- View drill's direct memory through the 8047 page:


- After running for a while, an sql statement error is detected, and then pkill -f para_power_test.sh
- Check again that there is no sqlline execution, bug the direct memery is not fully release.

Expected behavior
Expected that the direct memery can fully release.
Additional context
update the planner.enable_hashjoin option to false through the 8047 page
hive.txt