[Bug] Disks in one BE can not balance correctly. max: 95%+, min: 40+% #47308

DA1OOO · 2025-01-22T08:01:36Z

Search before asking

I had searched in the issues and found no similar issues.

Version

doris 2.0.8

What's Wrong?

One SSD on a BE has reached 95%+ usage, while the other SSDs on the same BE are below 50%.

Disk usage of the SSDs on this BE:
/dev/nvme0n1 3.4T 1.5T 2.0T 43% /ssd1
/dev/nvme1n1 3.4T 3.3T 100G 98% /ssd2
/dev/nvme2n1 3.4T 1.7T 1.8T 49% /ssd3
/dev/nvme3n1 3.4T 1.7T 1.8T 48% /ssd4
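
To put a number on the imbalance, the usage lines above can be parsed with a short script. This is only a sketch: the helper names `parse_usage` and `usage_spread` are made up for illustration and are not part of Doris, and the input is assumed to be six-column df-style output as pasted in this report.

```python
# df-style lines copied from the report above
DF_LINES = """\
/dev/nvme0n1 3.4T 1.5T 2.0T 43% /ssd1
/dev/nvme1n1 3.4T 3.3T 100G 98% /ssd2
/dev/nvme2n1 3.4T 1.7T 1.8T 49% /ssd3
/dev/nvme3n1 3.4T 1.7T 1.8T 48% /ssd4
"""


def parse_usage(df_text):
    """Return {mount_point: used_percent} from six-column df-style lines."""
    usage = {}
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip the df header or malformed lines
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


def usage_spread(usage):
    """Return (fullest_mount, emptiest_mount, spread_in_percentage_points)."""
    fullest = max(usage, key=usage.get)
    emptiest = min(usage, key=usage.get)
    return fullest, emptiest, usage[fullest] - usage[emptiest]


usage = parse_usage(DF_LINES)
print(usage_spread(usage))  # ('/ssd2', '/ssd1', 55)
```

On these figures the fullest disk (/ssd2) is 55 percentage points above the emptiest one (/ssd1).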

After that, I executed the SQL statements ADMIN CLEAN TRASH and ADMIN REBALANCE DISK ON (""); to delete trash and rebalance the disks within that backend.

However, this seemed to have no effect. After an hour, the disk usage had not decreased significantly.

Then, I found some errors:

SHOW PROC '/cluster_balance/history_tablets';
BE log:
2025-01-22 14:26:10,783 WARN (thrift-server-pool-21473|815893) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:10.252.158.45, be_port:9060, http_port:8040), task_type:STORAGE_MEDIUM_MIGRATE, signature:33701465, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(10.252.158.45)[INTERNAL_ERROR]tablet is already on specified storage medium 0]))

What You Expected?

Is this a bug? Is there a quick way to solve it?

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct


DA1OOO · 2025-01-23T01:19:04Z

Every other machine shows a similarly large gap between disks: up to 90% usage on the fullest disk and as low as 30% on the emptiest. It seems that data is always written to the same disk, while usage on the other disks barely changes.

DA1OOO · 2025-01-23T03:58:29Z

I wanted to modify the FE config to stop loads onto the high-usage SSD, but I found that all LOAD tasks failed, because they still try to write to the same SSD (the disk with the high usage percentage).
ADMIN SET FRONTEND CONFIG ("storage_flood_stage_usage_percent" = "95");
ADMIN SET FRONTEND CONFIG ("storage_flood_stage_left_capacity_bytes" = "186916976721");

Errors from 3 attempts:
error msg: type:ETL_RUN_FAIL; msg:errCode = 2, detailMessage = disk /ssd1/doris-data on backend 12290 exceed limit usage, path hash: -8807881924367419279.
error msg: type:ETL_RUN_FAIL; msg:errCode = 2, detailMessage = disk /ssd1/doris-data on backend 12290 exceed limit usage, path hash: -8807881924367419279.
error msg: type:ETL_RUN_FAIL; msg:errCode = 2, detailMessage = disk /ssd1/doris-data on backend 12290 exceed limit usage, path hash: -8807881924367419279.
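
As a rough illustration of why a nearly full disk would trip the "exceed limit usage" error with the settings above, the sketch below models the flood-stage check. It assumes that a disk is rejected for loads only when both thresholds are crossed (used percentage above storage_flood_stage_usage_percent and free space below storage_flood_stage_left_capacity_bytes). That AND rule is an assumption, not something confirmed in this thread, and should be checked against the Doris source for the version in use. The function name `exceeds_flood_stage` is hypothetical, and the disk figures are the /ssd2 and /ssd1 values from the usage listing in the original report, with df sizes assumed to be binary units.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Values set in the ADMIN SET FRONTEND CONFIG statements above
FLOOD_STAGE_USAGE_PERCENT = 95
FLOOD_STAGE_LEFT_CAPACITY_BYTES = 186916976721


def exceeds_flood_stage(used_pct, avail_bytes,
                        usage_percent_limit=FLOOD_STAGE_USAGE_PERCENT,
                        left_capacity_limit=FLOOD_STAGE_LEFT_CAPACITY_BYTES):
    """Assumed rule: a disk is over the flood stage only if BOTH limits are crossed."""
    return used_pct > usage_percent_limit and avail_bytes < left_capacity_limit


# 98% used with 100G available: both limits crossed
print(exceeds_flood_stage(98, 100 * GIB))  # True
# 43% used with 2.0T available: neither limit crossed
print(exceeds_flood_stage(43, 2 * TIB))    # False
```

Under that assumption, a disk at 98% with about 100G free would reject loads, while the disks below 50% would not.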

DA1OOO changed the title ~~[Bug] BE disks can not balance correctly. max: 95%+, min: 40+%~~ [Bug] Disks in one BE can not balance correctly. max: 95%+, min: 40+% Jan 22, 2025
