Upgrade failure 1.4 -> 1.8: not valid router found #1911

Open
hash-d opened this issue Jan 23, 2025 · 5 comments

Comments

@hash-d
Member

hash-d commented Jan 23, 2025

Describe the bug

After upgrading a 1.4 site to 1.8, the command skupper network status fails with the error below:

$ skupper --platform kubernetes --namespace pub-0-d1e--frontend status
Error: not valid router found

How To Reproduce

  • Create two 1.4 sites
  • Link them (token create / link create)
  • Run skupper network version and skupper status as a baseline
  • Upgrade one of them to 1.8
  • Run skupper network version and skupper status again; both fail
  • Upgrade the other site and run the commands above again, which will also fail

Expected behavior

The status commands should succeed without the error above.

Environment details

  • Skupper CLI: 1.4.7, 1.8.3
  • Skupper Operator (if applicable): N/A
  • Platform: OpenShift

Additional context

A quick check indicates that the problem is a missing (empty) site.Site.Platform value in pkg/network/network.go, DisplayableRouter().

Upgrades from 1.5.5 and from 1.8.2 into 1.8.3 worked fine. Tomorrow I'll retest 1.4 -> 1.5 to check whether that happens there as well.
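To illustrate the failure mode described above, here is a minimal, hypothetical Go sketch. The Site struct, its fields and the displayableRouter function are simplified stand-ins invented for this example; they are not the real site.Site type or DisplayableRouter() from pkg/network/network.go. The sketch only shows how a site entry with an empty Platform would make a platform-dependent display step return the error seen in the report.

```go
package main

import (
	"errors"
	"fmt"
)

// Site is a hypothetical, trimmed-down stand-in for the site entry the CLI
// reads from the skupper-network-status configmap. Field names are
// assumptions for illustration, not the real site.Site definition.
type Site struct {
	Name     string
	Platform string // left empty when the record came from a 1.4.x controller
	Version  string
}

// errNoValidRouter mirrors the message printed by the CLI in this report.
var errNoValidRouter = errors.New("not valid router found")

// displayableRouter is a hypothetical reduction of the failing check:
// the router can only be rendered when the site's platform is known.
func displayableRouter(s Site) (string, error) {
	if s.Platform == "" {
		return "", errNoValidRouter
	}
	return fmt.Sprintf("%s (%s, %s)", s.Name, s.Platform, s.Version), nil
}

func main() {
	sites := []Site{
		{Name: "frontend", Platform: "kubernetes", Version: "1.8.3"},
		{Name: "backend"}, // stale entry: no platform or version recorded
	}
	for _, s := range sites {
		out, err := displayableRouter(s)
		if err != nil {
			fmt.Printf("%s: error: %v\n", s.Name, err)
			continue
		}
		fmt.Println(out)
	}
}
```

In this sketch, a single stale entry is enough to produce the error, which matches the report that the status commands fail outright rather than showing partial output.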

@hash-d
Member Author

hash-d commented Jan 23, 2025

Please note that the issue may be intermittent: in some test runs, it does not reproduce.

@hash-d
Member Author

hash-d commented Jan 23, 2025

I have run the 1.4 -> 1.5 upgrade test three times, with no issues.

@fgiorgetti
Member

This problem happens when the sites of a connected VAN are updated one at a time.

For example, after one site is updated, config-sync is restarted and detects the router and controller beacons from the outdated site, which is still running 1.4.7. The Site record sent by the controller running the old version does not provide the platform value.

After the other site is updated, its controller starts and sends the new SiteRecord, but because that site record is already stored by the collector lite in the first namespace, the new information is discarded.

Therefore, the site that was updated first never gets the platform and version values, leaving the site entry in the skupper-network-status configmap outdated and causing the CLI to fail, as no platform is set.

@c-kruse @ajssmith Let me know if you think fix #1918 is appropriate here.
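The sequence described in the comment above can be modelled with a small, hypothetical Go sketch. SiteRecord, store, addFirstWins and addMerge are invented names, not the collector lite's real types or functions. addFirstWins reproduces the described behaviour (a record already stored is never updated), while addMerge shows one possible alternative, filling in fields that the stored record lacks. That alternative is an assumption for illustration and is not necessarily what fix #1918 does.

```go
package main

import "fmt"

// SiteRecord is a hypothetical, simplified record carried in controller
// beacons; field names are assumptions, not the real collector types.
type SiteRecord struct {
	ID       string
	Platform string // not sent by 1.4.x controllers
	Version  string // not sent by 1.4.x controllers
}

// store is a hypothetical stand-in for the collector lite's site cache.
type store map[string]SiteRecord

// addFirstWins models the reported behaviour: once a site ID is stored,
// later records for the same site are discarded.
func (s store) addFirstWins(r SiteRecord) {
	if _, ok := s[r.ID]; ok {
		return
	}
	s[r.ID] = r
}

// addMerge is one possible alternative (an assumption): newer records
// fill in fields that the stored record is missing.
func (s store) addMerge(r SiteRecord) {
	cur, ok := s[r.ID]
	if !ok {
		s[r.ID] = r
		return
	}
	if cur.Platform == "" {
		cur.Platform = r.Platform
	}
	if cur.Version == "" {
		cur.Version = r.Version
	}
	s[r.ID] = cur
}

func main() {
	// Record from the outdated site, seen while it still runs 1.4.7.
	old := SiteRecord{ID: "site-b"}
	// Record sent by the same site's controller after its upgrade.
	upgraded := SiteRecord{ID: "site-b", Platform: "kubernetes", Version: "1.8.3"}

	a := store{}
	a.addFirstWins(old)
	a.addFirstWins(upgraded)
	fmt.Printf("first-wins: %+v\n", a["site-b"])

	b := store{}
	b.addMerge(old)
	b.addMerge(upgraded)
	fmt.Printf("merge:      %+v\n", b["site-b"])
}
```

With addFirstWins, the stored entry keeps its empty platform after the second site is upgraded; with addMerge, the entry picks up the platform and version from the newer record.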

@hash-d
Member Author

hash-d commented Jan 23, 2025

The issue also happens with 1.4.7 -> 1.8.2, so it is not new in 1.8.3.

@c-kruse
Contributor

c-kruse commented Jan 23, 2025

@fgiorgetti excellent find and summary! +1 to go ahead with it for 1.8.3.
