martinkennelly
Contributor

@martinkennelly martinkennelly commented Jun 14, 2025

…et when syncd

Clear ovn-remote on startup to prevent ovn-controller connecting to a stale OVN southbound database. OVN Kube Controller may not have sync'd yet. ovn-remote will be set by OVNKube controller.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress and do-not-merge/hold labels Jun 14, 2025
Contributor

openshift-ci bot commented Jun 14, 2025

@martinkennelly: GitHub didn't allow me to request PR reviews from the following users: martinkennelly.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

…et when syncd

/hold
/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@martinkennelly martinkennelly changed the title from "WIP: Networking: reset ovn-remote config and allow ovnkube controller to s…" to "OCPBUGS-42303: Networking: reset ovn-remote config and allow ovnkube controller to s…" Jun 15, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Jun 15, 2025
@openshift-ci-robot openshift-ci-robot added the jira/severity-important, jira/valid-reference, and jira/invalid-bug labels Jun 15, 2025
@openshift-ci-robot
Contributor

@martinkennelly: This pull request references Jira Issue OCPBUGS-42303, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

…et when syncd

/hold
/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@martinkennelly
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label Jun 15, 2025
@openshift-ci-robot
Contributor

@martinkennelly: This pull request references Jira Issue OCPBUGS-42303, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huiran0826

In response to this:

/jira refresh

@openshift-ci-robot
Contributor

@martinkennelly: This pull request references Jira Issue OCPBUGS-42303. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

…et when syncd

/hold
/cc

@martinkennelly
Contributor Author

Closing because there's a cost to pausing and resuming. Trying another approach.

@openshift-ci-robot
Contributor

@martinkennelly: This pull request references Jira Issue OCPBUGS-42303, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huiran0826

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

…et when syncd

/hold
/cc

…et it

This fixes the issue where ovn-remote is set
prior to reboot and when boot occurs, ovn-controller
syncs quickly with a stale SB DB.

This PR is part of the EIP GARP issue fix.
It's required because when the ovnkube-controller and
ovn-controller containers start on boot, there
is no guaranteed order in which they start,
and we don't want ovn-controller to connect to SB DB
before ovnkube controller has added the drop flows.

Ideally, we would only allow ovn-controller to sync
with SB DB when ovnkube controller has concluded
syncing and the changes are available in SB DB.
That may be future work.

Signed-off-by: Martin Kennelly <[email protected]>
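
For readers following along, here is a minimal sketch of the boot-time reset this commit message describes. It assumes ovn-remote is stored under external_ids in the Open_vSwitch table (where ovn-controller reads it) and that ovs-vsctl is on PATH. The function names and the injectable `run` hook are illustrative only and are not taken from this PR's diff.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# ovn-controller reads its southbound DB address from this external_ids key.
OVN_REMOTE_KEY = "ovn-remote"


def clear_ovn_remote_cmd() -> List[str]:
    """Build the ovs-vsctl invocation that removes external_ids:ovn-remote."""
    return ["ovs-vsctl", "remove", "Open_vSwitch", ".", "external_ids", OVN_REMOTE_KEY]


def clear_ovn_remote(run: Optional[Callable[[Sequence[str]], None]] = None) -> List[str]:
    """Clear ovn-remote so ovn-controller has no SB DB address until the key is set again.

    `run` is a hypothetical hook so the sketch can be exercised on a machine
    without Open vSwitch installed.
    """
    cmd = clear_ovn_remote_cmd()
    if run is None:
        # Real execution path: raises if ovs-vsctl exits non-zero.
        subprocess.run(cmd, check=True)
    else:
        run(cmd)
    return cmd
```

Passing a recorder such as `clear_ovn_remote(run=print)` shows the command without touching the local OVS database.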
@martinkennelly
Contributor Author

The great Ben suggested moving this before the nmstate config check, since nmstate may be used to configure br-ex.
No other changes.

@martinkennelly
Contributor Author

/retest

@pliurh
Contributor

pliurh commented Sep 11, 2025

This patch only resets ovn-remote when a node reboots. Wouldn't the same issue happen during an ovnkube-node pod restart?

@martinkennelly
Contributor Author

martinkennelly commented Sep 11, 2025

This patch only resets ovn-remote when a node reboots. Wouldn't the same issue happen during an ovnkube-node pod restart?

Yep! We don't want to clear it during an ovnkube-controller container restart because ovn-controller would do a full sync with the SB DB again, and that's very costly. We already cover this scenario with drop flows.

@martinkennelly
Contributor Author

So pod/container restart is covered by the GARP drop flows added to the ext bridge, and this PR covers node reboot. We need this because the drop flows added to the ext bridge when ovnkube controller shuts down do not persist across a reboot.
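
To make the split between the two cases explicit, here is a small sketch of the decision described above. The enum, the function, and the returned labels are hypothetical names chosen for illustration. The actual change lives in boot-time configuration, not in a function like this.

```python
from enum import Enum


class StartEvent(Enum):
    NODE_REBOOT = "node-reboot"
    CONTAINER_RESTART = "container-restart"


def startup_safeguard(event: StartEvent) -> str:
    """Return which safeguard applies for the given start event."""
    if event is StartEvent.NODE_REBOOT:
        # Drop flows on the ext bridge do not persist across a reboot, so
        # ovn-remote is cleared and ovn-controller waits until it is set again.
        return "clear-ovn-remote"
    # Drop flows added when ovnkube controller shut down are still in place;
    # clearing ovn-remote here would force a costly full resync with SB DB.
    return "rely-on-drop-flows"
```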

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 905b7e5 and 1 for PR HEAD 4d91920 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 733131d and 0 for PR HEAD 4d91920 in total

@openshift-ci-robot
Contributor

/hold

Revision 4d91920 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold label Sep 25, 2025
@martinkennelly
Contributor Author

CI is borked. There are infra issues and it also cannot find the nmstate version. Unrelated to this PR.

@martinkennelly
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Sep 25, 2025
@martinkennelly
Contributor Author

/retest

maybe CI is back..

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 733131d and 2 for PR HEAD 4d91920 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 4f14943 and 1 for PR HEAD 4d91920 in total

@martinkennelly
Contributor Author

/test e2e-gcp-op

CI is still borked.

@martinkennelly
Contributor Author

/unhold

Trying again.

@martinkennelly
Contributor Author

/cherry-pick release-4.20

@openshift-cherrypick-robot

@martinkennelly: once the present PR merges, I will cherry-pick it on top of release-4.20 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.20

@martinkennelly
Contributor Author

/retest

@huiran0826

/test e2e-aws-ovn

@tssurya
Contributor

tssurya commented Sep 29, 2025

@martinkennelly
Contributor Author

There are a few issues with deprovisioning, I think. The one I see most on this PR is rate limiting in AWS, which should be fixed by openshift/installer#9958. I was unable to find a bug for it.

I've also seen another deprovisioning issue that didn't mention rate limiting in AWS. I don't have a bug for it.
Let's see if a retry fixes this.

Contributor

openshift-ci bot commented Sep 29, 2025

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-gcp-op-techpreview | dd6e137 | link | false | /test e2e-gcp-op-techpreview
ci/prow/e2e-gcp-mco-disruptive | 4d91920 | link | false | /test e2e-gcp-mco-disruptive
ci/prow/e2e-gcp-op | 4d91920 | link | false | /test e2e-gcp-op
ci/prow/e2e-gcp-op-ocl | 4d91920 | link | false | /test e2e-gcp-op-ocl
ci/prow/e2e-azure-ovn-upgrade-out-of-change | 4d91920 | link | false | /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/e2e-aws-mco-disruptive | 4d91920 | link | false | /test e2e-aws-mco-disruptive
ci/prow/bootstrap-unit | 4d91920 | link | false | /test bootstrap-unit

Full PR test history. Your PR dashboard.

@martinkennelly
Contributor Author

It seems there's a new deprovisioning bug: "Custom IAM endpoint not found, using default endpoint". I'm trying to find a bug for it and I'll look for an override.

@martinkennelly
Contributor Author

Cannot find a bug. I am engaging with installer team.

@martinkennelly
Contributor Author

At least the other de-provision bug looks solved :)

@martinkennelly
Contributor Author

@yuqi-zhang can you please override the e2e-aws-ovn job? I cannot wait for a fix to the CI as my issue has been escalated. See the attached bug.

@yuqi-zhang
Contributor

/override ci/prow/e2e-aws-ovn

Failure should be unrelated

Contributor

openshift-ci bot commented Sep 30, 2025

@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-aws-ovn

In response to this:

/override ci/prow/e2e-aws-ovn

Failure should be unrelated

@openshift-merge-bot openshift-merge-bot bot merged commit 4c5e822 into openshift:main Sep 30, 2025
17 of 23 checks passed
@openshift-ci-robot
Contributor

@martinkennelly: Jira Issue Verification Checks: Jira Issue OCPBUGS-42303
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-42303 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

In response to this:

…et when syncd

Clear ovn-remote on startup to prevent ovn-controller connecting to a stale OVN southbound database. OVN Kube Controller may not have sync'd yet. ovn-remote will be set by OVNKube controller.

@openshift-cherrypick-robot

@martinkennelly: new pull request created: #5317

In response to this:

/cherry-pick release-4.20

Labels
approved, jira/severity-important, jira/valid-bug, jira/valid-reference, lgtm, verified
9 participants