Create cloud repo specific job #4205

Open: dsessler7 wants to merge 2 commits into main from create-cloud-repo-specific-job

Conversation

dsessler7 (Contributor):

Checklist:

  • Have you added an explanation of what your changes do and why you'd like them to be included?
  • Have you updated or added documentation for the change, as applicable?
  • Have you tested your changes on all related environments with successful results, as applicable?
    • Have you added automated tests?

Type of Changes:

  • New feature
  • Bug fix
  • Documentation
  • Testing enhancement
  • Other

What is the current behavior (link to any open issues here)?

Currently, backups to cloud repos are performed by exec'ing into the pgBackRest sidecar of the Postgres instance pod and running the pgbackrest backup command there.
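
For context, that pre-change flow looks roughly like the following sketch using client-go's exec subresource; the function name, container name, and pgbackrest flags are illustrative assumptions, not the operator's exact code.

```go
package example

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// execBackup (hypothetical name) runs "pgbackrest backup" inside the
// instance pod's pgbackrest sidecar via the Kubernetes exec subresource.
func execBackup(ctx context.Context, config *rest.Config, client kubernetes.Interface,
	namespace, pod string) error {
	req := client.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(namespace).Name(pod).SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: "pgbackrest", // sidecar container name (illustrative)
			Command:   []string{"pgbackrest", "backup", "--stanza=db", "--repo=2"},
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		return err
	}
	return executor.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	})
}
```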

What is the new behavior (if this is a feature change)?

  • Breaking change (fix or feature that would cause existing functionality to change)

Backups to cloud repos will now be done in the backup job itself. If no local volume repo exists, a dedicated repo host will not be created.
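
A minimal sketch of the repo-host decision, assuming the v1beta1 repo types (the helper name is hypothetical):

```go
package example

import (
	"github.com/crunchydata/postgres-operator/pkg/apis/postgres-operator.crunchydata.com/v1beta1"
)

// repoHostVolumeDefined: a dedicated repo host is only needed when at
// least one repo is backed by a PVC; cloud repos (S3, GCS, Azure)
// leave Volume nil.
func repoHostVolumeDefined(cluster *v1beta1.PostgresCluster) bool {
	for _, repo := range cluster.Spec.Backups.PGBackRest.Repos {
		if repo.Volume != nil {
			return true
		}
	}
	return false
}
```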

Other Information:

@dsessler7 dsessler7 requested review from benjaminjb and cbandy July 15, 2025 16:37

benjaminjb (Contributor) left a comment:

OK, let me make sure I've got this right:

  • we don't want the repohost as a single point of failure for backup/archiving
  • one way to remove that SPOF is to undo the always-on repohost strategy for cloud backups
  • but to do that we need to always have a pgbackrest server on the instance pod
  • and we need to add the server configs, but not the certs (that's still dedicated repo host only)
  • and we add all the configs to a configmap for convenience
  • and we alter the job pod command depending on whether this is a PVC or a cloud repo (see the sketch below)

Do I have that right and/or did I miss any steps?
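
For illustration, the last step above roughly amounts to the following sketch; the function name and flags are illustrative, not the PR's exact code:

```go
package example

import (
	"github.com/crunchydata/postgres-operator/pkg/apis/postgres-operator.crunchydata.com/v1beta1"
)

// backupCommand selects the backup job's command by repo type.
func backupCommand(repo v1beta1.PGBackRestRepo, pgHostFQDN string) []string {
	if repo.Volume != nil {
		// PVC repo: the job drives the backup through the dedicated
		// repo host, as before.
		return []string{"pgbackrest", "backup", "--stanza=db"}
	}
	// Cloud repo: the job runs pgbackrest itself and dials the
	// instance's pgbackrest TLS server as the pg host.
	return []string{"pgbackrest", "backup", "--stanza=db", "--pg1-host=" + pgHostFQDN}
}
```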

"k8s.io/apimachinery/pkg/util/rand"

"github.com/crunchydata/postgres-operator/pkg/apis/postgres-operator.crunchydata.com/v1beta1"
)

var tmpDirSizeLimit = resource.MustParse("16Mi")

benjaminjb (Contributor):

Is this size also copied over from postgrescluster?

dsessler7 (Contributor Author):

Yes... I don't know if we ever want the size to vary?

benjaminjb (Contributor):

I'm not sure if we want the size to vary, or if it's worth commenting on "why" we chose this size.

dsessler7 (Contributor Author):

I'm not sure of the answer, so I added a TODO comment to provide an explanation for the size.
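
Presumably the result looks something like this sketch (not the exact comment wording):

```go
import "k8s.io/apimachinery/pkg/api/resource"

// TODO: document why 16Mi was chosen; the value mirrors the temp-dir
// size limit used for the PostgresCluster pods.
var tmpDirSizeLimit = resource.MustParse("16Mi")
```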

@@ -111,17 +112,9 @@ func AddConfigToInstancePod(
// volumes stay valid and Kubernetes propagates their contents to those pods.
secret := corev1.VolumeProjection{Secret: &corev1.SecretProjection{}}
secret.Secret.Name = naming.PGBackRestSecret(cluster).Name
secret.Secret.Items = append(secret.Secret.Items, clientCertificates()...)

benjaminjb (Contributor):

Why do we need this here?

dsessler7 (Contributor Author), Jul 16, 2025:

We used to create the pgbackrest server in the instance only when a repo host was present. Now we always create it (assuming backups are enabled, of course) because both the repo host and the cloud backup jobs need to talk to it. The server needs not only a server cert and key (which are added in the AddServerToInstancePod function) but also the client CA cert, which is provided via the clientCertificates function. It's a little confusing because the pieces the server needs are installed across both AddServerToInstancePod and AddConfigToInstancePod; it could probably be refactored so that the server setup is handled entirely in the former.
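
For readers following along, a hypothetical rendering of clientCertificates(); the secret keys and projection paths are assumptions for illustration, not the repository's exact constants:

```go
package example

import corev1 "k8s.io/api/core/v1"

// clientCertificates projects the client CA and client credentials
// from the pgBackRest secret into the configuration volume.
func clientCertificates() []corev1.KeyToPath {
	mode := int32(0o600)
	return []corev1.KeyToPath{
		{Key: "pgbackrest.ca-roots", Path: "~postgres-operator/tls-ca.crt"},
		{Key: "pgbackrest-client.crt", Path: "~postgres-operator/client-tls.crt"},
		{Key: "pgbackrest-client.key", Path: "~postgres-operator/client-tls.key", Mode: &mode},
	}
}
```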

@@ -1510,7 +1508,7 @@ func (r *Reconciler) reconcileInstanceCertificates(
root.Certificate, leafCert.Certificate,
leafCert.PrivateKey, instanceCerts)
}
-	if err == nil {
+	if err == nil && backupsSpecFound {

benjaminjb (Contributor):

Just double-checking: why don't we need this if using a local volume? Because these certs have to do with talking between repo/instance?

dsessler7 (Contributor Author):

I'm not entirely sure what you're referring to, but we always need the instance certificates now: both the cloud backup run from the backup job and the local volume backup run from the repo host need to be able to talk to the pgbackrest server in the instance pod.

@dsessler7 dsessler7 requested a review from benjaminjb July 16, 2025 06:36

dsessler7 (Contributor Author):

> and we need to add the server configs, but not the certs (that's still dedicated repo host only)

No, we always need both the server config and the certs regardless of what kind of repos are defined.

The rest of your statements/steps sound correct.

Comment on lines +550 to +553
// TODO(cbandy): pass a FQDN in already.
pgHostFQDN := pgHost + "-0." +
serviceName + "." + serviceNamespace + ".svc." +
naming.KubernetesClusterDomain(context.Background())
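
For illustration, with pgHost "hippo-instance1-abcd", serviceName "hippo-pods", serviceNamespace "postgres", and the default cluster domain, this yields "hippo-instance1-abcd-0.hippo-pods.postgres.svc.cluster.local"; the "-0" suffix targets the first pod of the instance StatefulSet. The names here are hypothetical.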

cbandy (Member):

👍🏻 Agreed!

benjaminjb (Contributor) left a comment:

Thanks for the walkthru and now also double-checking: the postgres instance always has the pgbackrest sidecar, so if OTel is enabled, we always want to rotate those, yeah?

dsessler7 (Contributor Author):

> Thanks for the walkthru and now also double-checking: the postgres instance always has the pgbackrest sidecar, so if OTel is enabled, we always want to rotate those, yeah?

Yes: if backups are enabled at all and OTel logging is enabled, we want to add log rotation configuration for the pgbackrest logs in the postgres instance pod, and that statement has always been true. However, the code previously did not check that backups were explicitly enabled; it checked for repo host volumes. Until my recent change, starting in 5.7 we always had a repo host whenever backups were enabled, so checking for a repo host was effectively equivalent to checking that backups were enabled. Now that a repo host is only created when a local volume repo is requested in the spec, backups can be enabled without any repo host volumes, so we must explicitly check that backups are enabled when adding the log rotation config. That is the change I made in the last commit.
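
A minimal sketch of the resulting condition, with hypothetical helper and variable names:

```go
package example

// shouldRotatePGBackRestLogs illustrates the new check: rotation is
// configured whenever OTel logging and backups are both enabled, since
// the pgbackrest sidecar exists whenever backups are enabled, with or
// without a repo host. Previously this keyed off repo host volumes.
func shouldRotatePGBackRestLogs(otelLogsEnabled, backupsSpecFound bool) bool {
	return otelLogsEnabled && backupsSpecFound
}
```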

@dsessler7 dsessler7 force-pushed the create-cloud-repo-specific-job branch from 6a7aa82 to c80957c Compare July 17, 2025 19:37