[Bug] Disks in one BE can not balance correctly. max: 95%+, min: 40+% #47308

Open
2 of 3 tasks
DA1OOO opened this issue Jan 22, 2025 · 2 comments
Comments

@DA1OOO

DA1OOO commented Jan 22, 2025

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris 2.0.8

What's Wrong?

One SSD on this BE has reached 95%+ usage, while the other SSDs sit below 50%:
/dev/nvme0n1 3.4T 1.5T 2.0T 43% /ssd1
/dev/nvme1n1 3.4T 3.3T 100G 98% /ssd2
/dev/nvme2n1 3.4T 1.7T 1.8T 49% /ssd3
/dev/nvme3n1 3.4T 1.7T 1.8T 48% /ssd4
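
To put a number on the imbalance, the `df` rows above can be parsed and compared. This is only a minimal sketch: the helper name `parse_use_percent` is made up for illustration, and it assumes the rows are plain `df -h` data lines with six whitespace-separated fields and no header.

```python
# Rows copied from the df output in this report (header row omitted).
DF_ROWS = """\
/dev/nvme0n1 3.4T 1.5T 2.0T 43% /ssd1
/dev/nvme1n1 3.4T 3.3T 100G 98% /ssd2
/dev/nvme2n1 3.4T 1.7T 1.8T 49% /ssd3
/dev/nvme3n1 3.4T 1.7T 1.8T 48% /ssd4
"""


def parse_use_percent(df_text):
    """Map mount point -> Use% for each six-field df data row (hypothetical helper)."""
    usage = {}
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


usage = parse_use_percent(DF_ROWS)
fullest = max(usage, key=usage.get)
emptiest = min(usage, key=usage.get)
spread = usage[fullest] - usage[emptiest]

print(f"fullest: {fullest} ({usage[fullest]}%)")
print(f"emptiest: {emptiest} ({usage[emptiest]}%)")
print(f"spread: {spread} percentage points")
```

On this BE the gap between the fullest and emptiest disk is 55 percentage points.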

After that, I executed ADMIN CLEAN TRASH and ADMIN REBALANCE DISK ON (""); to delete the trash and rebalance the disks within this backend.

However, this had no visible effect. An hour later, the disk usage had not decreased noticeably.

Then, I found some errors:

  • show /proc/cluster_balance/history_tablets
    [Image: output of /cluster_balance/history_tablets]

  • be log:
    2025-01-22 14:26:10,783 WARN (thrift-server-pool-21473|815893) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:10.252.158.45, be_port:9060, http_port:8040), task_type:STORAGE_MEDIUM_MIGRATE, signature:33701465, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(10.252.158.45)[INTERNAL_ERROR]tablet is already on specified storage medium 0]))

What You Expected?

Is this a bug? Is there a quick way to solve it?

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@DA1OOO DA1OOO changed the title from "[Bug] BE disks can not balance correctly. max: 95%+, min: 40+%" to "[Bug] Disks in one BE can not balance correctly. max: 95%+, min: 40+%" on Jan 22, 2025
@DA1OOO
Author

DA1OOO commented Jan 23, 2025

The other machines show the same large gap: up to 90% usage on one disk and as low as 30% on another. It seems that data is always written to the same disk, while usage on the other disks barely changes.


@DA1OOO
Author

DA1OOO commented Jan 23, 2025

I wanted to modify the FE config to stop loads from writing to the high-usage SSD, but I found that all LOAD tasks failed, because they still try to write to the same SSD (the one with the high usage percentage).
ADMIN SET FRONTEND CONFIG ("storage_flood_stage_usage_percent" = "95");
ADMIN SET FRONTEND CONFIG ("storage_flood_stage_left_capacity_bytes" = "186916976721");

All 3 attempts failed with the same error:
error msg: type:ETL_RUN_FAIL; msg:errCode = 2, detailMessage = disk /ssd1/doris-data on backend 12290 exceed limit usage, path hash: -8807881924367419279.
error msg: type:ETL_RUN_FAIL; msg:errCode = 2, detailMessage = disk /ssd1/doris-data on backend 12290 exceed limit usage, path hash: -8807881924367419279.
error msg: type:ETL_RUN_FAIL; msg:errCode = 2, detailMessage = disk /ssd1/doris-data on backend 12290 exceed limit usage, path hash: -8807881924367419279.
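
For reference, here is a rough sketch of how the two flood-stage settings above might combine. It assumes a disk is rejected for loads only when its used percentage is above `storage_flood_stage_usage_percent` AND its free space is below `storage_flood_stage_left_capacity_bytes`; that AND rule is my reading of the config description, not something verified against the 2.0.8 source. The function name `exceeds_flood_stage` is hypothetical, and `100G` from `df -h` is treated as 100 GiB.

```python
GIB = 1024 ** 3

# Values set via ADMIN SET FRONTEND CONFIG in this comment.
FLOOD_STAGE_USAGE_PERCENT = 95
FLOOD_STAGE_LEFT_CAPACITY_BYTES = 186916976721  # roughly 174 GiB


def exceeds_flood_stage(used_pct, avail_bytes,
                        flood_pct=FLOOD_STAGE_USAGE_PERCENT,
                        flood_left_bytes=FLOOD_STAGE_LEFT_CAPACITY_BYTES):
    """Assumed rule: reject loads only if BOTH thresholds are crossed."""
    return used_pct > flood_pct and avail_bytes < flood_left_bytes


# /ssd2 from the first df output: 98% used, about 100 GiB free.
print(exceeds_flood_stage(98, 100 * GIB))   # True
# /ssd1 from the first df output: 43% used, about 2.0 TiB free.
print(exceeds_flood_stage(43, 2048 * GIB))  # False
```

Under that assumption, a disk like /ssd2 in the first df output stays over the limit with these values, so any load routed to such a disk would keep failing until it is rebalanced or cleaned up.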
