@shaneboulden

Currently the cluster etcd operator creates 0755 permissions for /var/lib/etcd by default:

$ oc debug node/ip-10-0-22-132.ap-southeast-2.compute.internal
sh-5.1# ls -l /host/var/lib/ | grep etcd
drwxr-xr-x.  3 root root     41 Jul  4 03:09 etcd

Upstream etcd uses 0700: https://github.com/etcd-io/etcd/blob/866bc0717c0b56579514c7363f3f47f6cd4109c6/client/pkg/fileutil/fileutil.go#L47

And the OpenShift CIS benchmark requires that the /var/lib/etcd data dir is configured with 0700 file permissions: https://github.com/ComplianceAsCode/content/blob/24dba6b94757881c8d8e0dd21390fa733a7c3b70/applications/openshift/master/file_permissions_var_lib_etcd/rule.yml#L28

This change aligns the cluster etcd operator with upstream and the CIS benchmark for OpenShift.

@openshift-ci openshift-ci bot requested review from Elbehery and jaypoulz July 4, 2025 03:18

openshift-ci bot commented Jul 4, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shaneboulden
Once this PR has been reviewed and has the lgtm label, please assign dusk125 for approval. For more information see the Code Review Process.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 4, 2025

openshift-ci bot commented Jul 4, 2025

Hi @shaneboulden. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


  // Creating the cert dir recursively will create the base path too
- err = os.MkdirAll(certDir, 0755)
+ err = os.MkdirAll(certDir, 0700)
Contributor

are you sure that this changes the dataDir?

Author

This is the only location I could find in the codebase where the /var/lib/etcd dir is created, via the memberDir.

Contributor

I'm also curious about the permissions of "/var/lib/etcd", which are set to 0755,

but I'm afraid you've changed the wrong place.

Maybe you could change the directory permissions in the following place, in the same way as:

mkdir -p /var/log/etcd && chmod 0600 /var/log/etcd

Contributor

I think the reason is that the entire folder structure is set to 0755:

sh-5.1# stat --format '%a' /var/    
755
sh-5.1# stat --format '%a' /var/lib/etcd/
755 

same with other openshift folders:

sh-5.1# stat --format '%a' /var/lib/ovn-ic/                 
755
sh-5.1# stat --format '%a' /var/lib/kubelet/
755

so I would rather advocate to change the CIS rule.

Author

@lance5890 you're right, I'm mistaken. I will take a look at an initContainers approach. Should I turn this into a draft / WIP?

@tjungblu
I think you're right.

I couldn't find the /var/lib/etcd dir explicitly created by the operator. But, doing some digging through older issues, it sounds like the kubelet will create the dir with 0755 if it doesn't exist (not sure how much has changed since 2018...)

kubernetes/kubeadm#1308 (comment)

> so I would rather advocate to change the CIS rule.

Looks like 0700 is also part of the STIG: https://github.com/ComplianceAsCode/content/blob/49189d32c6039a5e4ca68f6e4a04de1f719e47b2/products/ocp4/profiles/stig-node-v1r1.profile#L62

Contributor

@shaneboulden are you sure that the data dir is part of the STIG? https://stigviewer.com/stigs/kubernetes
For the data files I agree and understand, but applying it to the data dir itself seems pretty strange to me.

Author

Contributor

I assume that the STIG is generated from the git repo ;)
Oz hasn't been around for a few years, so we won't be able to ask him. I remember my old team was writing a few of those rules too.

Tracking this back to 2020: ComplianceAsCode/content#6341 and ComplianceAsCode/content#6341 (comment)

Not sure whether this ever worked; it also seems to be an additional check that is not listed in the initial ticket.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 6, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2025
@openshift-merge-robot
Contributor

PR needs rebase.

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 5, 2025

coderabbitai bot commented Nov 5, 2025

Walkthrough

Directory creation permissions are restricted from 0755 to 0700 across three files: certDir and tlsDir in render.go, and RestoreAsset functions in two bindata files. No control flow, error handling, or other functional behavior is modified.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Permission tightening in certificate/TLS directories: pkg/cmd/render/render.go | Directory creation mode for certDir and tlsDir restricted from 0755 to 0700, limiting access to owner only |
| Permission tightening in asset restoration: pkg/operator/etcd_assets/bindata.go, pkg/tnf/assets/bindata.go | Directory creation mode in RestoreAsset functions changed from 0755 to 0700, reducing permissions scope for dynamically created directories |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • All changes are consistent, non-functional permission adjustments with identical logic across files
  • No error handling, control flow, or exported API modifications
  • Direct one-to-one permission value replacements with no conditional logic or side effects
Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between d5259e4 and fb2a38b.

📒 Files selected for processing (3)
  • pkg/cmd/render/render.go (1 hunks)
  • pkg/operator/etcd_assets/bindata.go (1 hunks)
  • pkg/tnf/assets/bindata.go (1 hunks)
🔇 Additional comments (2)
pkg/operator/etcd_assets/bindata.go (1)

2629-2629: Security improvement for asset restoration directories.

Restricting permissions to 0700 when creating directories during asset restoration is appropriate hardening. This affects directories created when extracting embedded etcd manifests and scripts.

Note: This is generated code (go-bindata), so the change reflects an update to the generation tool or its configuration.

pkg/tnf/assets/bindata.go (1)

588-588: Security improvement for TNF asset restoration directories.

Restricting permissions to 0700 for directories created during TNF (Two-Node-Fencing) asset restoration is appropriate security hardening.

Note: This is generated code (go-bindata).

Comment on lines +309 to +315
+ err = os.MkdirAll(certDir, 0700)
  if err != nil {
      return fmt.Errorf("failed to create directory %s: %w", memberDir, err)
  }
  // tlsDir contains the ca bundle and client cert pair for bootkube.sh and the bootstrap apiserver
  tlsDir := filepath.Join(r.assetOutputDir, "tls")
- err = os.MkdirAll(tlsDir, 0755)
+ err = os.MkdirAll(tlsDir, 0700)
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

This doesn't change /var/lib/etcd permissions.

The permission changes on Lines 309 and 315 tighten security for certificate directories (certDir and tlsDir), which is good. However, these paths are:

  • certDir: /etc/kubernetes/static-pod-resources/etcd-member/etcd-all-certs
  • tlsDir: <assetOutputDir>/tls

Neither is the etcd data directory /var/lib/etcd mentioned in the PR title and description. As discussed in the past review comments and acknowledged by the PR author, /var/lib/etcd is a hostPath volume (see bindata/etcd/pod.yaml line 1773) that's created by the kubelet with default 0755 permissions, not by this operator code.

Impact: This PR won't satisfy the CIS benchmark or STIG requirement for /var/lib/etcd to have 0700 permissions. The certificate directory permission changes are beneficial but don't achieve the stated compliance goal.

Consider using an initContainer in the etcd pod spec to set /var/lib/etcd permissions, or addressing this at the node/kubelet level as suggested in the past review thread.



---

🏁 Script executed:

```shell
#!/bin/bash
# Verify the actual code at lines 309-315 in render.go
echo "=== Checking render.go lines 309-315 ==="
sed -n '309,315p' pkg/cmd/render/render.go

echo -e "\n=== Checking context around these lines (lines 300-320) ==="
sed -n '300,320p' pkg/cmd/render/render.go

echo -e "\n=== Searching for certDir variable assignment ==="
rg -n 'certDir\s*:=' pkg/cmd/render/render.go -B2 -A2

echo -e "\n=== Searching for tlsDir variable assignment ==="
rg -n 'tlsDir\s*:=' pkg/cmd/render/render.go -B2 -A2

echo -e "\n=== Checking if /var/lib/etcd is modified anywhere in render.go ==="
rg -n 'var/lib/etcd' pkg/cmd/render/render.go

echo -e "\n=== Checking bindata/etcd/pod.yaml for hostPath reference ==="
grep -n 'hostPath' bindata/etcd/pod.yaml | head -20
```

Length of output: 2140


🏁 Script executed:

#!/bin/bash
# Check what the permission values were changed from (if git history available)
echo "=== Checking git diff for permission changes ==="
git diff HEAD~1 pkg/cmd/render/render.go | grep -A3 -B3 'MkdirAll' | head -40

echo -e "\n=== PR title and description context ==="
git log --oneline -1

Length of output: 764


🏁 Script executed:

#!/bin/bash
# Verify broader context - check if /var/lib/etcd permissions are set elsewhere
echo "=== Searching for /var/lib/etcd in all code ==="
rg -n '/var/lib/etcd' --type go --type yaml

echo -e "\n=== Searching for etcd-mount or similar bootstrap scripts ==="
rg -n 'chmod.*etcd|etcd.*chmod' --type bash --type sh

Length of output: 6706


🏁 Script executed:

#!/bin/bash
# Check line 1773 in pod.yaml
echo "=== Checking pod.yaml lines 1760-1790 ==="
sed -n '1760,1790p' bindata/etcd/pod.yaml

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Search for any initContainer that might set permissions
echo "=== Checking for initContainers in pod.yaml ==="
grep -n 'initContainer' bindata/etcd/pod.yaml
grep -n -i 'init' bindata/etcd/pod.yaml | head -20

Length of output: 562


🏁 Script executed:

#!/bin/bash
# Get full context of OS permission changes in this commit
echo "=== All os.MkdirAll calls in render.go with permissions ==="
rg -n 'os.MkdirAll' pkg/cmd/render/render.go -A1

Length of output: 163


🏁 Script executed:

#!/bin/bash
# Verify the exact paths for certDir
echo "=== Full path construction for certDir ==="
rg -n 'certDir' pkg/cmd/render/render.go -B5 -A1 | head -50

echo -e "\n=== Full path construction for tlsDir ==="
rg -n 'tlsDir' pkg/cmd/render/render.go -B2 -A2 | head -40

Length of output: 1671


🏁 Script executed:

#!/bin/bash
# Check PR branch changes vs main
echo "=== Summary of file changes in this PR ==="
git diff --name-only HEAD~1

Length of output: 129



---

🏁 Script executed:

```shell
#!/bin/bash
# Check if pod.gotpl.yaml template contains any chmod for etcd data dir
echo "=== Checking pod.gotpl.yaml for chmod commands ==="
grep -n 'chmod' bindata/etcd/pod.gotpl.yaml

echo -e "\n=== Looking for any script that might set /var/lib/etcd permissions ==="
grep -rn 'chmod.*0700' bindata/etcd/ --include='*.yaml' --include='*.gotpl.yaml'

echo -e "\n=== Checking etcKubernetesDir to understand asset output structure ==="
sed -n '295,310p' pkg/cmd/render/render.go
```

Length of output: 910


🏁 Script executed:

#!/bin/bash
# Verify the error message on line 312 - it says "memberDir" not "certDir"
echo "=== Checking error message context (lines 310-313) ==="
sed -n '310,313p' pkg/cmd/render/render.go

echo -e "\n=== Check what variable should be in error message ==="
sed -n '306,313p' pkg/cmd/render/render.go

Length of output: 652


🏁 Script executed:

#!/bin/bash
# Confirm no other os.MkdirAll calls with 0700 in this file
echo "=== All MkdirAll calls in render.go context ==="
grep -n 'MkdirAll' pkg/cmd/render/render.go

Length of output: 122


🏁 Script executed:

#!/bin/bash
# Check assetOutputDir to understand the directory structure
echo "=== Looking for assetOutputDir definition and usage ==="
rg -n 'assetOutputDir' pkg/cmd/render/render.go -B2 -A2 | head -60

Length of output: 1910


🏁 Script executed:

#!/bin/bash
# Verify if these cert/tls directories end up in /var/lib/etcd somehow
echo "=== Check if render output goes to /var/lib/etcd or elsewhere ==="
rg -n 'assetOutputDir.*=' pkg/cmd/render/render.go | head -10

Length of output: 101


This doesn't change /var/lib/etcd permissions, and there's a bug in the error message.

The permission changes on lines 309 and 315 tighten security for certificate directories (certDir and tlsDir), which is beneficial. However, these paths are:

  • certDir: <assetOutputDir>/etc-kubernetes/static-pod-resources/etcd-member/etcd-all-certs
  • tlsDir: <assetOutputDir>/tls

Neither is the etcd data directory /var/lib/etcd referenced in the PR title and description. The pod specification defines /var/lib/etcd as a hostPath volume (line 359 in bindata/etcd/pod.yaml), which is created by the kubelet with default permissions—not by this operator code. No chmod command exists in the pod templates for /var/lib/etcd.

Additionally, line 312 has a bug: the error message for the certDir creation failure incorrectly references memberDir instead of certDir.

This PR won't satisfy CIS benchmark or STIG requirements for /var/lib/etcd to have 0700 permissions. Consider using an initContainer in the etcd pod spec to set /var/lib/etcd permissions, or addressing this at the node/kubelet level.

Fix needed: Line 312—change error message from memberDir to certDir.

🤖 Prompt for AI Agents
In pkg/cmd/render/render.go around lines 309 to 315, the error returned when
creating certDir mistakenly references memberDir; update the fmt.Errorf call to
reference certDir instead of memberDir. Do not attempt to change /var/lib/etcd
here (this code only creates asset dirs under assetOutputDir); if /var/lib/etcd
permissions must be enforced, handle that separately (e.g., initContainer in the
etcd pod or node/kubelet configuration).

@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 6, 2025

openshift-ci bot commented Dec 6, 2025

@openshift-bot: Closed this PR.

Labels

lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

5 participants