@clobrano

This change configures the podman-etcd resource to allow restarts in case of a start failure by setting start-failure-is-fatal=false. This is a prerequisite for the resource-agent to attempt restarts, improving the resilience of the etcd cluster. The actual restart logic is handled by the resource-agent itself.

@openshift-ci-robot

openshift-ci-robot commented Nov 17, 2025

@clobrano: This pull request references OCPEDGE-2231 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

@openshift-ci

openshift-ci bot commented Nov 17, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) Nov 17, 2025
openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Nov 17, 2025
clobrano changed the title from "OCPEDGE-2231: feat: Allow podman-etcd resource-agent to restart on start failure" to "OCPEDGE-2231: [TNF] feat: Allow podman-etcd resource-agent to restart on start failure" Nov 17, 2025
@coderabbitai

coderabbitai bot commented Nov 17, 2025

Walkthrough

Two pcs resource creation commands were modified. The etcd and kubelet resources now each carry a migration-threshold=5 meta attribute, and cluster.go adds a separate pcs property command that sets the cluster-wide property start-failure-is-fatal=false. No control flow or error handling logic was altered.

Changes

| Cohort / File(s) | Summary |
|---|---|
| PCS resource configuration updates: pkg/tnf/pkg/pcs/cluster.go, pkg/tnf/pkg/pcs/etcd.go | Added migration-threshold=5 meta attribute to resource creation commands. In cluster.go, also added a new PCS property command to set start-failure-is-fatal=false for cluster startup failure handling. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Changes follow a consistent pattern of parameter additions to existing resource creation commands
  • No modifications to error handling, control flow, or validation logic
  • Changes are isolated to resource configuration sections with straightforward semantics

📥 Commits

Reviewing files that changed from the base of the PR and between f371609 and db85da4.

📒 Files selected for processing (2)
  • pkg/tnf/pkg/pcs/cluster.go (1 hunks)
  • pkg/tnf/pkg/pcs/etcd.go (1 hunks)
🔇 Additional comments (2)
pkg/tnf/pkg/pcs/etcd.go (1)

23-31: Etcd migration-threshold=5 meta looks correct; confirm desired failure tolerance

The added meta migration-threshold=5 on the etcd clone resource is syntactically correct for pcs and consistent with the kubelet change. From a code perspective this is fine and keeps the command readable. Please just confirm that a threshold of 5 failures before migration/banning matches the intended restart/move policy for this test HA cluster.
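
As a rough illustration of the point above, the following Go sketch shows one way a meta attribute such as migration-threshold=5 could be appended to a pcs resource create command string. The helper name buildResourceCreate, the resource ID etcd, and the agent name ocf:heartbeat:podman-etcd are assumptions made for this example and are not taken from etcd.go; the actual command in the PR carries further options that are omitted here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildResourceCreate is a hypothetical helper, not the function used in the PR.
// It assembles a "pcs resource create" command line and appends any meta
// attributes after the "meta" keyword, sorted by key so the output is stable.
func buildResourceCreate(id, agent string, meta map[string]string) string {
	parts := []string{"pcs", "resource", "create", id, agent}
	if len(meta) > 0 {
		keys := make([]string, 0, len(meta))
		for k := range meta {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		parts = append(parts, "meta")
		for _, k := range keys {
			parts = append(parts, k+"="+meta[k])
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	// Resource ID and agent name are illustrative assumptions.
	cmd := buildResourceCreate("etcd", "ocf:heartbeat:podman-etcd",
		map[string]string{"migration-threshold": "5"})
	fmt.Println(cmd)
}
```

Keeping the meta attributes in a map makes it easy to apply the same threshold to both the etcd and kubelet commands from a single constant, should the value need tuning later.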

pkg/tnf/pkg/pcs/cluster.go (1)

30-40: Kubelet migration threshold and global start-failure-is-fatal=false look correct; check cluster-wide impact

The updated kubelet resource create command and the added pcs property set start-failure-is-fatal=false step are both syntactically sound and fit cleanly into the existing command sequence. Since start-failure-is-fatal is a cluster-wide property, please verify that making all start failures non-fatal is acceptable for every resource this helper will manage, not just podman-etcd/kubelet, and that the chosen migration-threshold=5 aligns with the desired restart vs. migrate behavior.
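
To make the restart versus migrate trade-off concrete, here is a simplified Go model of how the two settings interact. It assumes Pacemaker's documented behavior: with start-failure-is-fatal=true a single start failure makes the node ineligible for that resource, while with false the node stays eligible until the resource's fail count reaches migration-threshold. The function name canStartOnNode and the treatment of a threshold of 0 as "disabled" are assumptions of this sketch, which is not Pacemaker's scheduler code.

```go
package main

import "fmt"

// canStartOnNode is a simplified, hypothetical model of node eligibility.
// It reports whether a resource may still be started on a node after
// failCount start failures have been recorded there.
func canStartOnNode(failCount, migrationThreshold int, startFailureIsFatal bool) bool {
	if failCount == 0 {
		return true // no failures recorded, nothing to decide
	}
	if startFailureIsFatal {
		return false // any start failure bans the node immediately
	}
	if migrationThreshold <= 0 {
		return true // assumption of this sketch: 0 disables the threshold
	}
	return failCount < migrationThreshold
}

func main() {
	for failures := 1; failures <= 5; failures++ {
		fmt.Printf("failures=%d fatal=false threshold=5 -> retry allowed: %t\n",
			failures, canStartOnNode(failures, 5, false))
	}
	fmt.Printf("failures=1 fatal=true threshold=5 -> retry allowed: %t\n",
		canStartOnNode(1, 5, true))
}
```

Under this model, the settings in the PR would permit up to four failed starts on a node before the fifth makes it ineligible, whereas the default fatal behavior would stop after the first.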

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.

@openshift-ci

openshift-ci bot commented Nov 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano

openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Nov 17, 2025
clobrano force-pushed the enhancement/podman-etcd-resource-create-with-start-failure-is-fatal-false branch 3 times, most recently from 99f9ae3 to a3a171f, November 18, 2025 11:10
This change configures the TNF cluster to allow restarts in case of a
start failure by setting the attribute `start-failure-is-fatal=false`.

This is a prerequisite for the resource-agents to attempt restarts upon
failures during their start action.
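
The cluster-wide step described in this commit message could be sketched in Go as follows. The helper name propertySetCommand is hypothetical and the surrounding command sequence in cluster.go is not reproduced; only the pcs property set start-failure-is-fatal=false command itself is taken from the review discussion.

```go
package main

import "fmt"

// propertySetCommand is a hypothetical helper that formats a cluster-wide
// "pcs property set" command for a single boolean property.
func propertySetCommand(name string, value bool) string {
	return fmt.Sprintf("pcs property set %s=%t", name, value)
}

func main() {
	// Only the added step is shown; the resource creation commands that
	// precede it in the real sequence are omitted.
	fmt.Println(propertySetCommand("start-failure-is-fatal", false))
}
```

Because this is a cluster property rather than a resource meta attribute, it applies to every resource in the cluster, not only to podman-etcd and kubelet.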
clobrano force-pushed the enhancement/podman-etcd-resource-create-with-start-failure-is-fatal-false branch from a3a171f to db85da4, November 21, 2025 13:25
clobrano marked this pull request as ready for review November 28, 2025 09:55
openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Nov 28, 2025
openshift-ci bot requested review from jaypoulz and slintes November 28, 2025 09:57
@clobrano

clobrano commented Dec 2, 2025

/retest-required

@openshift-ci

openshift-ci bot commented Dec 2, 2025

@clobrano: all tests passed!
