
Zpool free space reservation tunable #17024

Open
sempervictus opened this issue Feb 4, 2025 · 4 comments
Labels
Type: Feature Feature request or new feature

Comments

@sempervictus
Contributor

Describe the feature you would like to see added to OpenZFS

The level at which ZFS reports ENOSPC should be tunable by users: the hard-coded 96% threshold wastes a massive amount of space:

NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
somepool  3.62T  3.50T   130G        -         -    45%    96%  1.00x    ONLINE  -
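
If I understand the current behavior correctly, this appears to come from the slop reservation (about 1/32 of capacity held back with the default spa_slop_shift of 5). A rough back-of-envelope sketch, assuming that default:

# assuming spa_slop_shift=5 (default): reserved slop ~= pool size / 32
# 3.62 TiB ~= 3707 GiB; 3707 / 32 ~= 115 GiB, in line with the ~130G shown as FREE at 96% CAP
echo $(( 3707 / 32 ))   # ~115 (GiB)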

How will this feature improve OpenZFS?

Reduce the amount of money users waste on (especially fast) storage capacity and extend the utility of the capacity they already own or lease.

Additional context

A heuristic calculation of the minimum free space required to remove snapshots (so that more space can be freed and operations unblocked) would be great here, but a pool-wide setting along the lines of min_mb_free would probably suffice; a sketch of what that might look like is below.
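
Purely as an illustration of the requested interface (min_mb_free is not an existing pool property; the syntax below is hypothetical):

# hypothetical usage, not a real zpool property today
zpool set min_mb_free=8192 somepool    # reserve a flat 8 GiB instead of a fixed fraction of the pool
zpool get min_mb_free somepool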

@sempervictus sempervictus added the Type: Feature Feature request or new feature label Feb 4, 2025
@amotin
Member

amotin commented Feb 4, 2025

It can be discussed how much we should reserve for administrative purposes, but even with the current value it can be difficult to promise that some arbitrary deletion will free enough (or any) space on a pool to recover and not get stuck. I am not sure it is worth the effort to fight for those few percent. Developers' time costs more.

Meanwhile, note that the 130GB you have free is already at 45% fragmentation, which means the average free segment is about 192KB. That may already be a performance problem, and very soon ZFS will have to make heavy use of ganging to write large blocks at all. Don't go there!
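
If you want to see where that free space sits, the per-vdev FRAG column and (if I remember the zdb flags correctly) the per-metaslab detail are worth a look:

zpool list -v somepool    # FRAG per vdev
zdb -mm somepool          # per-metaslab space maps / fragmentation (read-only, can take a while on a full pool)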

@sempervictus
Contributor Author

Thanks @amotin - unfortunately we haven't had the resources to dedicate to our ZFS TODO list, which includes figuring out a defragmentation strategy, even if via something like zed-driven internal-send shenanigans to create new contiguous trees. This is after only about nine months of use, too :)

130G of wasted space is pretty brutal though, and while development time for the heuristic part might not be worth it, a tunable that lets users reserve a rational amount of free space (enough to retain the ability to free up more) seems reasonable. On a 32T NVMe drive we will lose over a TB, and drives are only going to keep getting bigger, so a static percentage will "keep getting worse" relative to overall vdev capacities.

@amotin
Member

amotin commented Feb 5, 2025

With a copy-on-write design fragmentation is expected, simply because even if you never delete or overwrite data, some metadata is deleted and overwritten on every transaction, leaving free holes behind. By the time your pool is that full, all the free space you have consists of those holes, and all we can do is try to minimize the effect - for example with the introduction of the embedded log allocation class and other optimizations on the developer side, or by keeping more free space on the user side.

@shodanshok
Contributor

Don't we already have spa_slop_shift (default 5) to tune how much free space should be reserved?
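
On Linux that is exposed as a runtime module parameter, so (assuming a reasonably current OpenZFS; newer releases also clamp the reservation, via spa_min_slop and spa_max_slop if I recall correctly) adjusting it looks roughly like:

cat /sys/module/zfs/parameters/spa_slop_shift       # 5 -> reserve about 1/32 of the pool
echo 6 > /sys/module/zfs/parameters/spa_slop_shift  # reserve about 1/64 instead (as root)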
