Redundant VPC routers stuck in BACKUP; cannot add default route, interface remains down #10281
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
/etc/cloudstack/monitorservice.json
{
  "config": "[ssh]:processname=sshd:servicename=ssh:pidfile=/var/run/sshd.pid:,",
  "excluded_health_checks": "gateways_check.py",
  "health_checks_advanced_run_interval": 10,
  "health_checks_basic_run_interval": 3,
  "health_checks_config": {
    "gateways": "gatewaysIps=169.254.0.1 x.x.x.222 ",
    "haproxyData": "",
    "portForwarding": "",
    "routerVersion": "templateVersion=Cloudstack Release 4.20.0.0 Fri Sep 27 01:02:15 PM UTC 2024,scriptsVersion=14829be797d19c6d38e8c77efff9aeea\n",
    "systemThresholds": "minDiskNeeded=100.0,maxCpuUsage=100.0,maxMemoryUsage=100.0;",
    "virtualMachines": ""
  },
  "health_checks_enabled": true,
  "id": "monitorservice"
}
/etc/cloudstack/ips.json
ip a show
/etc/cloudstack/cmdline.json
{
  "type": "cmdline",
  "cmd_line": {
    "vpccidr": "10.x.x.x/16",
    "domain": "mycloud.internal",
    "publicMtu": "1500",
    "dns1": "1.1.1.1",
    "dns2": "8.8.8.8",
    "template": "domP",
    "name": "r-29-VM",
    "authorized_key": "ZWNkc2Etc2hhMi1uaXN0cDI1NiBBQUFBRTJWalpITmhMWE...",
    "eth0ip": "169.254.x.x",
    "eth0mask": "255.255.0.0",
    "redundant_router": "1",
    "advert_int": "1",
    "router_id": "7",
    "router_password": "12149581094199833879[...]65351083090438",
    "redundant_state": "PRIMARY",
    "type": "vpcrouter",
    "disable_rp_filter": "true",
    "baremetalnotificationsecuritykey": "Hm9nj6Y3zYPPwj2saTdeRZNInwXpSPDbY7rjbj4Xkes-tSENX6O33uZ...",
    "baremetalnotificationapikey": "WAJz4lQ6Dx3ARNQnjVVafhjzS1CqYdFhOI0EZjVb5LGt1WOWufmi...",
    "host": "x.x.x.193",
    "port": "8080",
    "logrotatefrequency": "*:00:00",
    "source_nat_ip": "x.x.x.196"
  }
}
/var/log/cloud.log
@Rid in my testing, when the VM is Running, the VRs become PRIMARY/BACKUP, which looks fine.
@weizhouapache both VMs are "Running", the VPC state is "Enabled", and both VPC routers are in the "FAULT" redundant state.
The VPC VRs will be brought to PRIMARY/BACKUP by keepalived. Can you check the state of the network/VPC tier? Does
That's part of the issue, there's no
There's no eth2:
That is the main issue. Have you created a VPC tier? If yes, what is the state of the VPC tier?
If I create a vpc tier (guest network), it stays in the "Allocated" state. No virtual routers are listed. If I try to create an instance I get the error:
It looks like the VPC tier cannot be implemented due to insufficient VLANs. Can you check Zone -> Physical Networks -> choose the physical network with the Guest traffic type -> Update physical network to see if VLAN/VNI is set? Are all VLANs in use?
Thanks @weizhouapache, it turns out that the initial wizard configuration didn't complete successfully because it queried too many resources in the browser. We set up the network configuration manually and didn't enter a VLAN range. Once we added the VLAN range and were able to add a compute instance, eth2 was added and eth1 came up correctly. I wasn't aware that a compute instance was necessary to bring up the redundant VPC routers; perhaps that should be added to the documentation?
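For anyone hitting the same symptom, the missing VLAN range can be checked before creating a tier. The following is a minimal sketch, not CloudStack code: it assumes a physical network record shaped like a dict with an optional `vlan` string such as "100-109" (the shape of the record, and the helper names `parse_vlan_ranges` and `guest_vlan_capacity`, are assumptions made for illustration).

```python
def parse_vlan_ranges(vlan):
    """Parse a string such as "100-200,300" into a list of (start, end) tuples."""
    ranges = []
    for part in (vlan or "").split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        # A single ID such as "300" is treated as the range (300, 300).
        ranges.append((int(start), int(end or start)))
    return ranges


def guest_vlan_capacity(physical_network):
    """Count the VLAN IDs configured on a (hypothetical) physical network record."""
    vlan = physical_network.get("vlan")
    return sum(end - start + 1 for start, end in parse_vlan_ranges(vlan))


# A record without a VLAN range, as in the manual setup described above.
print(guest_vlan_capacity({"name": "guest-net"}))                     # 0
# The same record after a range has been added.
print(guest_vlan_capacity({"name": "guest-net", "vlan": "100-109"}))  # 10
```

A capacity of 0 on the Guest physical network would be consistent with a tier that stays in "Allocated".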
good.
If a VM instance is created but not running, the VPC VRs should be BACKUP/BACKUP or UNKNOWN/UNKNOWN.
problem
We have a new CloudStack 4.20.0.0 environment with redundant VPC routers, but they never transition to MASTER. Instead:
…but this fails with exit code 2 (“Nexthop has invalid gateway”). We believe it fails as the interface remains in the DOWN state.
The VR script then tears eth1 down, inserts a “throw x.x.x.0/27” route in Table_eth1, and marks the router as BACKUP or FAULT.
Keepalived never starts because the script believes routing is broken. Thus no VRRP negotiation occurs, and no router becomes MASTER.
We can manually bring eth1 up (ip link set eth1 up) and add a default route to the main or custom table, and it works fine. However, CloudStack’s scripts immediately revert the interface to DOWN again and keep the router in BACKUP.
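The link between the DOWN interface and the "Nexthop has invalid gateway" error can be illustrated with a small model. This is a sketch under stated assumptions, not the kernel's actual route validation: it assumes a gateway is only accepted when it falls inside the subnet of an interface that is up. The addresses come from the 192.0.2.0/24 documentation range because the report redacts the real ones, and `gateway_is_on_link` is a hypothetical helper name.

```python
import ipaddress


def gateway_is_on_link(gateway, interfaces):
    """Return the name of an UP interface whose subnet contains the gateway, else None.

    `interfaces` maps an interface name to (address/prefix, is_up).
    """
    gw = ipaddress.ip_address(gateway)
    for name, (cidr, is_up) in interfaces.items():
        network = ipaddress.ip_interface(cidr).network
        # A down interface contributes no connected route, so it cannot reach the gateway.
        if is_up and gw in network:
            return name
    return None


# Hypothetical /27 standing in for the redacted x.x.x.0/27 network.
eth1_down = {"eth1": ("192.0.2.4/27", False)}
eth1_up = {"eth1": ("192.0.2.4/27", True)}

print(gateway_is_on_link("192.0.2.1", eth1_down))  # None
print(gateway_is_on_link("192.0.2.1", eth1_up))    # eth1
```

Under this model, adding the default route succeeds only after eth1 is up, which matches the manual workaround described above.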
Key details:
VR logs show repeated attempts to configure the default route via x.x.x.x inside Table_eth1, followed by throw x.x.x.0/27.
Even if we remove the throw route, the script tries to add a route while eth1 is still down, fails, and resets to BACKUP.
Because of this cycle, we never see /etc/keepalived/keepalived.conf generated or keepalived started.
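The two symptoms above (eth1 down, no keepalived.conf) can be captured in one snapshot when collecting logs. This is a minimal diagnostic sketch, assuming a Linux VR where the link state is exposed at /sys/class/net/<iface>/operstate; the function name `vr_redundancy_snapshot` and the `root` parameter (which exists only so the check can be pointed at a test directory) are introduced here for illustration.

```python
from pathlib import Path


def vr_redundancy_snapshot(iface="eth1", root="/"):
    """Report the interface's operstate and whether keepalived.conf exists."""
    base = Path(root)
    operstate_file = base / "sys/class/net" / iface / "operstate"
    keepalived_conf = base / "etc/keepalived/keepalived.conf"
    # "missing" means the interface is not present at all (for example, no eth2).
    if operstate_file.exists():
        operstate = operstate_file.read_text().strip()
    else:
        operstate = "missing"
    return {
        "interface": iface,
        "operstate": operstate,
        "keepalived_conf_present": keepalived_conf.exists(),
    }


if __name__ == "__main__":
    for name in ("eth1", "eth2"):
        print(vr_redundancy_snapshot(name))
```

Running it on each router before and after a tier is implemented should show whether the interface state or the missing configuration file changes first.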
versions
Apache CloudStack: 4.20.0.0
System VM template: Debian GNU/Linux 12
Hypervisor: KVM
Networking: Advanced networking with VLAN trunking, rp_filter disabled
We modified the systemvm template to add a static route which our setup needs. We added /etc/network/if-up.d/91-add-route:
We do not believe this is related to the issue.
The steps to reproduce the bug
What to do about it?
Ideally, the VR script should:
If you need more logs or specifics, we can provide full VR logs and examples of the failing ip route commands. Let us know if you have any questions or potential workarounds—thanks!