
Zpool free space reservation tunable #17024

Open
sempervictus opened this issue Feb 4, 2025 · 4 comments
Labels
Type: Feature Feature request or new feature

Comments

@sempervictus
Contributor

Describe the feature you would like to see added to OpenZFS

The level at which ZFS reports ENOSPC should be tunable by users: the hard-coded 96% threshold wastes a massive amount of space:

NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
somepool  3.62T  3.50T   130G        -         -    45%    96%  1.00x    ONLINE  -
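
If I understand the current behavior correctly, this appears to come from the slop reservation (about 1/32 of capacity held back with the default spa_slop_shift of 5). A rough back-of-envelope sketch, assuming that default:

# assuming spa_slop_shift=5 (default): reserved slop ~= pool size / 32
# 3.62 TiB ~= 3707 GiB; 3707 / 32 ~= 115 GiB, in line with the ~130G shown as FREE at 96% CAP
echo $(( 3707 / 32 ))   # ~115 (GiB)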

How will this feature improve OpenZFS?

Reduce the amount of money users waste on (especially fast) storage capacity and extend the utility of the capacity they already own or lease.

Additional context

A heuristic calculation of the minimum free space required to remove snapshots (so that more space can be freed and operations unblocked) would be great here, but a pool-wide setting along the lines of min_mb_free would probably suffice; a sketch of what that might look like is below.
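
Purely as an illustration of the requested interface (min_mb_free is not an existing pool property; the syntax below is hypothetical):

# hypothetical usage, not a real zpool property today
zpool set min_mb_free=8192 somepool    # reserve a flat 8 GiB instead of a fixed fraction of the pool
zpool get min_mb_free somepool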

@sempervictus sempervictus added the Type: Feature Feature request or new feature label Feb 4, 2025
@amotin
Member

amotin commented Feb 4, 2025

It can be discussed how much we should reserve for administrative purposes, but even with the current value it can be difficult to promise that some arbitrary deletion will free enough (or any) space on a pool to recover and not get stuck. I am not sure it is worth the effort to fight for those few percent. Developers' time costs more.

Meanwhile, note that the 130GB you have free is already at 45% fragmentation, which means the average free segment is about 192KB. That may already be a performance problem, and very soon ZFS will have to make heavy use of ganging to write large blocks at all. Don't go there!
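
If you want to see where that free space sits, the per-vdev FRAG column and (if I remember the zdb flags correctly) the per-metaslab detail are worth a look:

zpool list -v somepool    # FRAG per vdev
zdb -mm somepool          # per-metaslab space maps / fragmentation (read-only, can take a while on a full pool)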

@sempervictus
Contributor Author

Thanks @amotin - unfortunately we haven't had the resources to dedicate to our ZFS TODO list, which includes figuring out a defragmentation strategy, even if via something like zed-driven internal-send shenanigans to create new contiguous trees. This is after only about nine months of use, too :)

130G of wasted space is pretty brutal though, and while development time for the heuristic part might not be worth it, a tunable that lets users reserve a rational amount of free space (enough to retain the ability to free up more) seems reasonable. On a 32T NVMe drive we will lose over a TB, and drives are only going to keep getting bigger, so a static percentage will "keep getting worse" relative to overall vdev capacities.

@amotin
Member

amotin commented Feb 5, 2025

With a copy-on-write design fragmentation is expected, simply because even if you never delete or overwrite data, some metadata is deleted and overwritten on every transaction, leaving free holes behind. By the time your pool is that full, all the free space you have consists of those holes, and all we can do is try to minimize the effect - for example with the introduction of the embedded log allocation class and other optimizations on the developer side, or by keeping more free space on the user side.

@shodanshok
Contributor

Don't we already have spa_slop_shift (default 5) to tune how much free space should be reserved?
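
On Linux that is exposed as a runtime module parameter, so (assuming a reasonably current OpenZFS; newer releases also clamp the reservation, via spa_min_slop and spa_max_slop if I recall correctly) adjusting it looks roughly like:

cat /sys/module/zfs/parameters/spa_slop_shift       # 5 -> reserve about 1/32 of the pool
echo 6 > /sys/module/zfs/parameters/spa_slop_shift  # reserve about 1/64 instead (as root)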
