Kubernetes: introduce first prod-ready on-premise PV #917

Closed
5 of 6 tasks
YuryHrytsuk opened this issue Jan 3, 2025 · 3 comments · Fixed by #979

Comments

@YuryHrytsuk
Collaborator

YuryHrytsuk commented Jan 3, 2025

Right now we have S3-backed PV management for on-premise Kubernetes. It is unlikely to be a good solution for the Postgres PV. The focus of this issue is to introduce PV management that can be used for Postgres.

Important

Critical services (e.g. PostgreSQL) shall avoid Longhorn, use local disks directly, and implement HA in another way (e.g. application-level HA such as a Postgres HA / cluster setup), as sketched below.
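
To make the "local disks + application-level HA" idea concrete, here is a minimal sketch of a statically provisioned local PV for Postgres. The node name, mount path and size are placeholders, not decisions made in this issue:

```yaml
# Sketch only: a no-provisioner StorageClass plus a statically created local PV.
# "node-a" and /mnt/disks/pg-data are placeholders for the real node/disk layout.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-postgres
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-data-node-a
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-postgres
  local:
    path: /mnt/disks/pg-data   # placeholder path to a dedicated local disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node-a"]  # placeholder node
```

Failover then relies on the Postgres cluster (e.g. streaming replication between nodes), not on the storage layer.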

Options considered

  • Ceph (via Rook Ceph CSI)
    An open‑source distributed storage platform that supports block, file, and object storage.
    https://rook.io/
    -
    Ceph is complicated to manage. It is [probably] the most mature and robust solution, but it requires expertise to run smoothly / robustly on PROD. Since we will rely on application-level HA (e.g. a Postgres cluster) or other tools (e.g. S3 directly), I would stick to Longhorn as it is easier to manage and seems robust enough (certainly enough for non-critical services like Portainer)

    • Ask O+P
    • Configuring Rook to use an "external" Ceph is already complicated + requires lots of future collaboration with O+P
  • NetApp ONTAP/Trident
    Enterprise‑grade storage integrated with Kubernetes through the NetApp Trident CSI driver.
    https://www.netapp.com/solutions/kubernetes/
    -
    Does not seem to be very popular. Focus on other options

  • OpenEBS
    Container‑attached storage offering flexibility and ease‑of‑deployment for on‑prem clusters.
    https://openebs.io/
    -
    Incubating stage. Let's pick something more mature according to CNCF stages

  • --> Longhorn <--
    A lightweight, distributed block storage system that leverages local disks with replication for high availability.
    https://longhorn.io/
    -
    Rook Ceph seems to be more reliable https://www.reddit.com/r/kubernetes/comments/1cbggo8/longhorn_is_unreliable/
    BUT https://www.reddit.com/r/kubernetes/comments/1j02w70/is_usedproperly_longhorn_productionready_in_2025/
    YH picks Longhorn (see the StorageClass sketch after this list)

  • StorageOS
    A software‑defined storage solution designed for container environments with enterprise‑grade features.
    https://www.storageos.com/
    -
    Does not seem to be very popular. Focus on other options

  • GlusterFS
    An open‑source, scalable network filesystem that’s been used as a persistent storage option in many on‑prem deployments.
    https://gluster.org/
    -
    Not actively maintained --> Is the project still alive? gluster/glusterfs#4324
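
For the non-critical workloads that will use Longhorn, a minimal StorageClass sketch; the replica count and timeout values are illustrative, not final settings:

```yaml
# Sketch only: Longhorn StorageClass with 3 replicas spread across nodes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"        # synchronous replicas; ties into the network notes below
  staleReplicaTimeout: "2880"  # minutes before a failed replica is considered stale
```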

Resources

NFS:

Longhorn:

Comparisons:

S3 CSI (e.g. from yandex):

@YuryHrytsuk
Collaborator Author

YuryHrytsuk commented Feb 27, 2025

Comparison. OpenEBS vs Longhorn vs Rook Ceph as of Feb 27, 2025

CNCF Stage

  • Rook --> Graduated
  • OpenEBS --> Sandbox
  • Longhorn --> Incubating

Start date & Founder

  • Longhorn --> SUSE's Rancher Labs in 2017
  • OpenEBS --> MayaData in 2016
  • Rook Ceph --> started as a community open-source project in 2016

GitHub stats

  • Longhorn
    • 1.6k issues Open | 6732 Closed
    • 6.4k Stars
  • OpenEBS
    • 35 issues Open | 2179 Closed
    • 9.2k Stars
  • Rook Ceph
    • 88 issues Open | 5459 Closed
    • 12.6k Stars

@YuryHrytsuk
Collaborator Author

YuryHrytsuk commented Mar 3, 2025

Longhorn resource requirements

Do we satisfy the requirements?

  • RAM --> yes
  • CPU --> yes
  • Network --> we meet the minimum requirement, but we need to do better

Further insights into Longhorn requirements

Because the volume replication is synchronized, and because of network latency, it is hard to do cross-region replication. The backupstore is also used as a medium to address this problem.

https://longhorn.io/docs/1.9.0/concepts/#245-crash-consistency
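
If we use the backupstore for this, it is configured through Longhorn settings; a hedged sketch of the corresponding Helm chart values (bucket, region and secret names are placeholders, and the exact value keys should be checked against the chart version we deploy):

```yaml
# Sketch only: point Longhorn's backupstore at an S3 bucket via chart values.
defaultSettings:
  backupTarget: "s3://longhorn-backups@us-east-1/"        # placeholder bucket/region
  backupTargetCredentialSecret: "longhorn-backup-secret"  # secret with AWS-style credentials in longhorn-system
```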

The network bandwidth is not sufficient. Normally 1Gbps network will only be able to serve 3 volumes if all of those volumes are running a high intensive workload.

https://longhorn.io/kb/troubleshooting-volume-readonly-or-io-error/#root-causes

The network latency is relatively high. If there are multiple volumes r/w simultaneously on a node, it’s better to guarantee that the latency is less than 20ms.

https://longhorn.io/kb/troubleshooting-volume-readonly-or-io-error/#root-causes

We don’t recommend using low IOPS disks

https://longhorn.io/kb/troubleshooting-volume-readonly-or-io-error/#root-causes

@YuryHrytsuk
Collaborator Author

YuryHrytsuk commented Apr 17, 2025

Reusing values across helm charts in helmfile helmfile/helmfile#2011
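
One pattern that already works today is helmfile's `templates:` key combined with YAML anchors; a minimal sketch (release and chart names are placeholders, and the linked issue tracks a more general mechanism):

```yaml
# Sketch only: share a common values file across releases via a YAML anchor.
templates:
  common: &common
    values:
      - values/common.yaml   # placeholder path to the shared values

releases:
  - name: postgres
    chart: bitnami/postgresql
    <<: *common
  - name: portainer
    chart: portainer/portainer
    <<: *common
```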
