Redundant VPC routers stuck in BACKUP; cannot add default route, interface remains down #10281

Open
Rid opened this issue Jan 27, 2025 · 11 comments

@Rid

Rid commented Jan 27, 2025

problem

We have a new CloudStack 4.20.0.0 environment with redundant VPC routers, but they never transition to MASTER. Instead:

  1. Each VR tries to bring up the public interface (eth1) and add a default route, e.g.:
ip route add default via x.x.x.x table Table_eth1 proto static

…but this fails with exit code 2 (“Nexthop has invalid gateway”). We believe it fails as the interface remains in the DOWN state.

  2. The VR script then tears eth1 down, inserts a “throw x.x.x.0/27” route in Table_eth1, and marks the router as BACKUP or FAULT.

  3. Keepalived never starts because the script believes routing is broken. Thus no VRRP negotiation occurs, and no router becomes MASTER.

We can manually bring eth1 up (ip link set eth1 up) and add a default route to the main or custom table, and it works fine. However, CloudStack’s scripts immediately revert the interface to DOWN again and keep the router in BACKUP.

Key details:

VR logs show repeated attempts to configure the default route via x.x.x.x inside Table_eth1, followed by throw x.x.x.0/27.
Even if we remove the throw route, the script tries to add a route while eth1 is still down, fails, and resets to BACKUP.
Because of this cycle, we never see /etc/keepalived/keepalived.conf generated or keepalived started.

versions

Apache CloudStack: 4.20.0.0
System VM template: Debian GNU/Linux 12
Hypervisor: KVM
Networking: Advanced networking with VLAN trunking, rp_filter disabled

We modified the systemvm template to add a static route which our setup needs. We added /etc/network/if-up.d/91-add-route:

#!/bin/sh
#
# /etc/network/if-up.d/91-add-route
#
# This script is automatically invoked by ifup each time
# an interface is brought up. The environment variable $IFACE
# contains the interface name (e.g., eth0, ens3, etc.).

[ "$IFACE" = "lo" ] && exit 0

# Gather *all* IPv4 addresses (CIDR format) on this interface
IP_CIDR_LIST=$(ip -o -4 addr show dev "$IFACE" | awk '{print $4}')
[ -z "$IP_CIDR_LIST" ] && exit 0  # no IPv4 addresses on $IFACE, so exit

# Loop through each IPv4 address on this interface
for IP_CIDR in $IP_CIDR_LIST
do
  # Extract the actual IP address (without /mask)
  IP_ADDR=$(echo "$IP_CIDR" | cut -d '/' -f 1)

  # Check if IP is in x.x.x.x/27
  if echo "$IP_ADDR" | grep -Eq '^-redacted-$'; then
    echo "Interface $IFACE has IP $IP_ADDR in x.x.x.x/27; adding route..."
    ip route add x.x.x.x/27 dev "$IFACE" scope link src "$IP_ADDR" 2>/dev/null || true

    # Once we've added the route for the first matching IP, we're done.
    exit 0
  fi
done

exit 0

We do not believe this is related to the issue.
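
For readers who want to check the hook's matching logic without the redacted regex, here is a minimal Python sketch of the same idea: take the interface's IPv4 addresses in CIDR form and return the first one that falls inside a given /27. The function name is hypothetical, and 203.0.113.192/27 is a documentation range standing in for the redacted subnet; this is not the script we actually deployed.

```python
import ipaddress

def first_matching_address(cidrs, subnet):
    """Return the first address (without mask) in cidrs that lies inside subnet, else None."""
    net = ipaddress.ip_network(subnet)
    for cidr in cidrs:
        addr = ipaddress.ip_interface(cidr).ip  # strip the /mask part
        if addr in net:
            return str(addr)
    return None

# Assumed example values; the real addresses are redacted in this issue.
print(first_matching_address(["169.254.10.5/16", "203.0.113.196/27"], "203.0.113.192/27"))
```

A subnet membership test like this avoids maintaining a regex per address range, at the cost of needing Python in the hook.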

The steps to reproduce the bug

  1. Install or upgrade to CloudStack 4.20.0.0 with advanced networking.
  2. Create a VPC offering that uses redundant VR.
  3. Deploy a VPC that picks up two VRs.
  4. Observe in /var/log/cloud.log (and the VR’s cloud.log) that each router fails to add its default route via x.x.x.x, then tears down eth1 and remains BACKUP/FAULT indefinitely.

What to do about it?

Ideally, the VR script should:

  1. Ensure eth1 is brought up before adding the default route in the policy routing table (Table_eth1).
  2. Avoid placing a “throw” route for x.x.x.0/27 on the router that’s intended to be MASTER.
  3. Generate and start keepalived once the router is designated MASTER (or “PRIMARY” per the cmdline), so it can finalize the interface config instead of reverting to BACKUP.
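
The ordering suggested in item 1 can be sketched as follows. This is only an illustration of the sequence, not CloudStack's code: the function name is hypothetical, the gateway is a documentation address standing in for the redacted one, and the commands are printed rather than executed. Whether the link should be raised at this point on a router that is meant to stay BACKUP is not settled here.

```python
def default_route_commands(device, gateway, table):
    """Build the ip commands in the suggested order: link up first, then the default route."""
    return [
        ["ip", "link", "set", device, "up"],
        ["ip", "route", "add", "default", "via", gateway, "table", table, "proto", "static"],
    ]

# Assumed example values; print the commands instead of running them.
for cmd in default_route_commands("eth1", "203.0.113.222", "Table_eth1"):
    print(" ".join(cmd))
```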

If you need more logs or specifics, we can provide full VR logs and examples of the failing ip route commands. Let us know if you have any questions or potential workarounds—thanks!


boring-cyborg bot commented Jan 27, 2025

Thanks for opening your first issue here! Be sure to follow the issue template!

@Rid
Author

Rid commented Jan 27, 2025

/etc/cloudstack/monitorservice.json

{
  "config": "[ssh]:processname=sshd:servicename=ssh:pidfile=/var/run/sshd.pid:,",
  "excluded_health_checks": "gateways_check.py",
  "health_checks_advanced_run_interval": 10,
  "health_checks_basic_run_interval": 3,
  "health_checks_config": {
    "gateways": "gatewaysIps=169.254.0.1 x.x.x.222 ",
    "haproxyData": "",
    "portForwarding": "",
    "routerVersion": "templateVersion=Cloudstack Release 4.20.0.0 Fri Sep 27 01:02:15 PM UTC 2024,scriptsVersion=14829be797d19c6d38e8c77efff9aeea\n",
    "systemThresholds": "minDiskNeeded=100.0,maxCpuUsage=100.0,maxMemoryUsage=100.0;",
    "virtualMachines": ""
  },
  "health_checks_enabled": true,
  "id": "monitorservice"
}

/etc/cloudstack/ips.json

{
  "eth0": [
    {
      "add": true,
      "broadcast": "169.254.x.255",
      "cidr": "169.254.x.x/16",
      "device": "eth0",
      "gateway": "",
      "netmask": "255.255.0.0",
      "network": "169.254.0.0/16",
      "nic_dev_id": "0",
      "nw_type": "control",
      "one_to_one_nat": false,
      "public_ip": "169.254.x.x",
      "size": "16",
      "source_nat": false
    }
  ],
  "eth1": [
    {
      "add": true,
      "broadcast": "x.x.x.223",
      "cidr": "x.x.x.196/27",
      "device": "eth1",
      "first_i_p": true,
      "gateway": "x.x.x.222",
      "is_private_gateway": false,
      "mtu": "1500",
      "netmask": "255.255.255.224",
      "network": "x.x.x.192/27",
      "new_nic": false,
      "nic_dev_id": 1,
      "nw_type": "public",
      "one_to_one_nat": false,
      "public_ip": "x.x.x.196",
      "size": "27",
      "source_nat": true,
      "vif_mac_address": "1e:00:f7:00:ff:00"
    }
  ],
  "id": "ips"
}
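
As a sanity check on the eth1 values above, the network, netmask, and broadcast can be derived from the CIDR, and the gateway can be tested for membership in that subnet. This is a small sketch using Python's ipaddress module; the 203.0.113 prefix is an assumption standing in for the redacted octets, and the function name is hypothetical. If the gateway is on-link like this, the failing route add is less likely to be caused by a wrong gateway address.

```python
import ipaddress

def describe_interface(cidr, gateway):
    """Derive subnet fields from an interface CIDR and check the gateway is on-link."""
    iface = ipaddress.ip_interface(cidr)
    net = iface.network
    return {
        "network": str(net),
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        "gateway_in_subnet": ipaddress.ip_address(gateway) in net,
    }

# Assumed stand-ins for x.x.x.196/27 and x.x.x.222.
print(describe_interface("203.0.113.196/27", "203.0.113.222"))
```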

ip a show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 0e:00:a9:fe:9c:bd brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 169.254.x.x/16 brd 169.254.x.255 scope global eth0
       valid_lft forever preferred_lft forever

3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 1e:00:f7:00:ff:00 brd ff:ff:ff:ff:ff:ff
    altname enp0s8
    altname ens8
    inet x.x.x.196/27 brd x.x.x.223 scope global eth1
       valid_lft forever preferred_lft forever

/etc/cloudstack/cmdline.json

{
  "type": "cmdline",
  "cmd_line": {
    "vpccidr": "10.x.x.x/16",
    "domain": "mycloud.internal",
    "publicMtu": "1500",
    "dns1": "1.1.1.1",
    "dns2": "8.8.8.8",
    "template": "domP",
    "name": "r-29-VM",
    "authorized_key": "ZWNkc2Etc2hhMi1uaXN0cDI1NiBBQUFBRTJWalpITmhMWE...",
    "eth0ip": "169.254.x.x",
    "eth0mask": "255.255.0.0",
    "redundant_router": "1",
    "advert_int": "1",
    "router_id": "7",
    "router_password": "12149581094199833879[...]65351083090438",
    "redundant_state": "PRIMARY",
    "type": "vpcrouter",
    "disable_rp_filter": "true",
    "baremetalnotificationsecuritykey": "Hm9nj6Y3zYPPwj2saTdeRZNInwXpSPDbY7rjbj4Xkes-tSENX6O33uZ...",
    "baremetalnotificationapikey": "WAJz4lQ6Dx3ARNQnjVVafhjzS1CqYdFhOI0EZjVb5LGt1WOWufmi...",
    "host": "x.x.x.193",
    "port": "8080",
    "logrotatefrequency": "*:00:00",
    "source_nat_ip": "x.x.x.196"
  }
}

/var/log/cloud.log

Mon Jan 27 12:01:01 PM UTC 2025 Starting guest services for kvm
Mon Jan 27 12:01:06 PM UTC 2025 acpiphp and pci_hotplug module already compiled in
Mon Jan 27 12:01:07 PM UTC 2025 Received a new non-empty cmdline file from qemu-guest-agent
Mon Jan 27 12:01:07 PM UTC 2025 Booting from cloudstack, remove old configuration files in /etc/cloudstack/
Mon Jan 27 12:01:18 PM UTC 2025 Applying iptables rules
Mon Jan 27 12:01:18 PM UTC 2025 Setting up interface: eth0
Mon Jan 27 12:01:18 PM UTC 2025 Set up route for management network:  via local gateway:  for device eth0 for hypervisor: kvm
Mon Jan 27 12:01:19 PM UTC 2025 Executing cloud-early-config
Mon Jan 27 12:01:19 PM UTC 2025 Scripts checksum detected: oldmd5=c4e800567ec1a366252816b4c4f386e6 newmd5=c4e800567ec1a366252816b4c4f386e6
Mon Jan 27 12:01:21 PM UTC 2025 Could not find patch file, retrying
Mon Jan 27 12:01:23 PM UTC 2025 Could not find patch file, retrying
Mon Jan 27 12:01:31 PM UTC 2025 Could not find patch file, retrying
Mon Jan 27 12:01:33 PM UTC 2025 Could not find patch file, retrying
Mon Jan 27 12:01:35 PM UTC 2025 Could not find patch file, retrying
Mon Jan 27 12:01:35 PM UTC 2025 Scripts checksum detected: oldmd5=c4e800567ec1a366252816b4c4f386e6 newmd5=14829be797d19c6d38e8c77efff9aeea
Mon Jan 27 12:01:35 PM UTC 2025 Patched scripts using /var/cache/cloud/cloud-scripts.tgz
Mon Jan 27 12:01:36 PM UTC 2025 Bootstrapping systemvm appliance
Mon Jan 27 12:01:38 PM UTC 2025 Configuring systemvm type=vpcrouter
Mon Jan 27 12:01:38 PM UTC 2025 Setting up VPC virtual router system vm
Mon Jan 27 12:01:38 PM UTC 2025 Set up route for management network:  via local gateway:  for device eth0 for hypervisor:
Mon Jan 27 12:01:38 PM UTC 2025 Setting up apache web server for VPC
Mon Jan 27 12:01:39 PM UTC 2025 Processors = 10  Enable service  = 1
Mon Jan 27 12:01:39 PM UTC 2025 cloud: disable rp_filter
Mon Jan 27 12:01:39 PM UTC 2025 disable rpfilter
Mon Jan 27 12:01:39 PM UTC 2025 cloud: enable_fwding = 1
Mon Jan 27 12:01:39 PM UTC 2025 enable_fwding = 1
Mon Jan 27 12:01:39 PM UTC 2025 cloud: enabling passive FTP for guest VMs
Mon Jan 27 12:01:40 PM UTC 2025 Unziping /var/cache/cloud/agent.zip
Mon Jan 27 12:01:45 PM UTC 2025 Adding PubkeyAcceptedAlgorithms=+ssh-rsa to sshd_config
Mon Jan 27 12:01:45 PM UTC 2025 Skipped the installation of package python-is-python3 on Debian 12 as it can only be installed on Debian 11.
Mon Jan 27 12:01:46 PM UTC 2025 Skipped the installation of package python3-netaddr on Debian 12 as it can only be installed on Debian 11.
Mon Jan 27 12:01:46 PM UTC 2025 Finished setting up systemvm
Mon Jan 27 12:01:46 PM UTC 2025 Finished setting up systemvm
2025-01-27 12:01:47,103 INFO     update_config.py :: Processing incoming file => cmd_line.json
2025-01-27 12:01:47,103 INFO     Processing JSON file cmd_line.json
2025-01-27 12:01:47,103 INFO     Continuing with the processing of file '/var/cache/cloud/cmd_line.json'
2025-01-27 12:01:47,104 INFO     Command of type cmdline received
2025-01-27 12:01:47,104 INFO     Command of type ips received
2025-01-27 12:01:47,105 INFO     Executing: ip addr show dev eth0
2025-01-27 12:01:47,111 INFO     Executing: ip addr show dev eth0
2025-01-27 12:01:47,114 INFO     Address found in DataBag ==> {'add': True, 'broadcast': '169.254.x.255', 'cidr': '169.254.x.x/16', 'device': 'eth0', 'gateway': '', 'netmask': '255.255.0.0', 'network': '169.254.0.0/16', 'nic_dev_id': '0', 'nw_type': 'control', 'one_to_one_nat': False, 'public_ip': '169.254.x.x', 'size': '16', 'source_nat': False}
2025-01-27 12:01:47,114 INFO     Address 169.254.x.x/16 on device eth0 already configured
2025-01-27 12:01:47,116 INFO     Nothing to commit. The /etc/radvd.conf.new file did not change
2025-01-27 12:01:47,116 INFO     Processing CsBgpPeers file ==> {'id': 'bgppeers'}
2025-01-27 12:01:47,120 INFO     Wrote edited file /etc/frr/daemons
2025-01-27 12:01:47,120 INFO     Updated file in-cache configuration
2025-01-27 12:01:47,122 INFO     Wrote edited file /etc/frr/frr.conf
2025-01-27 12:01:47,122 INFO     Updated file in-cache configuration
2025-01-27 12:01:47,122 INFO     Executing: systemctl enable frr
2025-01-27 12:01:47,855 INFO     Executing: systemctl restart frr
2025-01-27 12:01:48,228 INFO     Executing: ip addr show |grep -v secondary
2025-01-27 12:01:48,235 INFO     Wrote edited file /etc/dnsmasq.d/cloud.conf
2025-01-27 12:01:48,235 INFO     Updated file in-cache configuration
2025-01-27 12:01:48,235 INFO     Nothing to commit. The /etc/dhcphosts.txt file did not change
2025-01-27 12:01:48,235 INFO     Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2025-01-27 12:01:48,236 INFO     Nothing to commit. The /etc/dhcpopts.txt file did not change
2025-01-27 12:01:48,236 INFO     Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2025-01-27 12:01:48,236 ERROR    Caught error while trying to delete entries from dnsmasq.leases file: [Errno 2] No such file or directory: '/etc/dhcphosts.txt'
2025-01-27 12:01:48,236 INFO     Wrote edited file /etc/hosts
2025-01-27 12:01:48,236 INFO     Updated file in-cache configuration
2025-01-27 12:01:48,236 INFO     Updated hosts file
2025-01-27 12:01:48,236 INFO     Executing: systemctl restart dnsmasq
2025-01-27 12:01:48,427 INFO     Service dnsmasq restart
2025-01-27 12:01:48,427 INFO     Executing: ip addr show |grep -v secondary
2025-01-27 12:01:48,432 INFO     Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2025-01-27 12:01:48,432 INFO     Nothing to commit. The /etc/dhcphosts.txt file did not change
2025-01-27 12:01:48,432 INFO     Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2025-01-27 12:01:48,432 INFO     Nothing to commit. The /etc/dhcpopts.txt file did not change
2025-01-27 12:01:48,432 INFO     Executing2: systemctl is-active dnsmasq
2025-01-27 12:01:48,440 INFO     Executing: systemctl reload dnsmasq
2025-01-27 12:01:48,486 INFO     Service dnsmasq reload
2025-01-27 12:01:48,487 INFO     Wrote edited file /etc/cron.d/process
2025-01-27 12:01:48,487 INFO     Updated file in-cache configuration
2025-01-27 12:01:48,487 INFO     Flush all IPv6 ACL rules
2025-01-27 12:01:48,487 INFO     Executing: nft list tables ip6 | grep ip6_acl
2025-01-27 12:01:48,503 ERROR    Command 'nft list tables ip6 | grep ip6_acl' returned non-zero exit status 1.
2025-01-27 12:01:48,503 INFO     Executing: iptables-save | grep '^:FW_EGRESS_RULES' || iptables -t filter -N FW_EGRESS_RULES
2025-01-27 12:01:48,509 INFO     Executing: iptables-save | grep '^-A FW_EGRESS_RULES -j ACCEPT$' | sed 's/^-A/iptables -t filter -D/g' | bash
2025-01-27 12:01:48,513 INFO     Executing: iptables -F FW_EGRESS_RULES
2025-01-27 12:01:48,516 INFO     Executing: ipset -L | grep Name:  | awk {'print $2'} | ipset flush
2025-01-27 12:01:48,534 INFO     Executing: ipset -L | grep Name:  | awk {'print $2'} | ipset destroy
2025-01-27 12:01:48,538 INFO     Flush all IPv6 firewall rules
2025-01-27 12:01:48,538 INFO     Executing: nft list tables ip6 | grep ip6_firewall
2025-01-27 12:01:48,543 ERROR    Command 'nft list tables ip6 | grep ip6_firewall' returned non-zero exit status 1.
2025-01-27 12:01:48,543 INFO     Processing IPv6 firewall rules {'id': 'ipv6firewallrules'}; []
2025-01-27 12:01:48,544 INFO     Executing: iptables-save
2025-01-27 12:01:48,548 INFO     Configuring nftables IPv4 firewall rules []
2025-01-27 12:01:48,548 INFO     Executing: iptables-save
2025-01-27 12:01:48,552 INFO     Configuring nftables IPv4 ACL rules []
2025-01-27 12:01:48,552 INFO     Executing: iptables-save
2025-01-27 12:01:48,555 INFO     Configuring nftables IPv6 ACL rules []
2025-01-27 12:01:48,555 INFO     Executing: iptables-save
2025-01-27 12:01:48,559 INFO     Configuring nftables IPv6 firewall rules []
2025-01-27 12:01:48,559 INFO     Executing: iptables-save
2025-01-27 12:01:48,567 INFO     Executing: iptables-save
2025-01-27 12:01:48,631 INFO     Executing: ip6tables-save
2025-01-27 12:01:48,635 INFO     Executing: nft list ruleset
2025-01-27 12:01:48,643 INFO     Executing: systemctl stop conntrackd
2025-01-27 12:01:48,662 INFO     Service conntrackd stop
2025-01-27 12:01:48,662 INFO     Executing: systemctl stop ipsec
2025-01-27 12:01:48,676 INFO     Service ipsec stop
2025-01-27 12:01:48,676 INFO     Executing: systemctl stop xl2tpd
2025-01-27 12:01:48,692 INFO     Service xl2tpd stop
2025-01-27 12:01:48,692 INFO     Executing: systemctl stop dnsmasq
2025-01-27 12:01:48,768 INFO     Service dnsmasq stop
2025-01-27 12:01:48,768 INFO     Router switched to backup mode
2025-01-27 12:01:48,768 INFO     Executing: systemctl stop conntrackd
2025-01-27 12:01:48,779 INFO     Service conntrackd stop
2025-01-27 12:01:48,779 INFO     Executing: systemctl stop keepalived
2025-01-27 12:01:48,791 INFO     Service keepalived stop
2025-01-27 12:01:48,791 INFO     Executing: mount
2025-01-27 12:01:55,495 INFO     update_config.py :: Processing incoming file => ip_associations.json.d95e85ef-a0bb-4d55-9cbd-dc40ff5f089f
2025-01-27 12:01:55,495 INFO     Processing JSON file ip_associations.json.d95e85ef-a0bb-4d55-9cbd-dc40ff5f089f
2025-01-27 12:01:55,495 INFO     Continuing with the processing of file '/var/cache/cloud/ip_associations.json.d95e85ef-a0bb-4d55-9cbd-dc40ff5f089f'
2025-01-27 12:01:55,496 INFO     Command of type ips received
2025-01-27 12:01:55,496 INFO     Executing: ip addr show dev eth0
2025-01-27 12:01:55,500 INFO     Executing: ip addr show dev eth1
2025-01-27 12:01:55,503 INFO     Executing: ip addr show dev eth0
2025-01-27 12:01:55,506 INFO     Address found in DataBag ==> {'add': True, 'broadcast': '169.254.x.255', 'cidr': '169.254.x.x/16', 'device': 'eth0', 'gateway': '', 'netmask': '255.255.0.0', 'network': '169.254.0.0/16', 'nic_dev_id': '0', 'nw_type': 'control', 'one_to_one_nat': False, 'public_ip': '169.254.x.x', 'size': '16', 'source_nat': False}
2025-01-27 12:01:55,506 INFO     Address 169.254.x.x/16 on device eth0 already configured
2025-01-27 12:01:55,506 INFO     Executing: ip addr show dev eth1
2025-01-27 12:01:55,510 INFO     Address found in DataBag ==> {'add': True, 'broadcast': 'x.x.x.223', 'cidr': 'x.x.x.196/27', 'device': 'eth1', 'first_i_p': True, 'gateway': 'x.x.x.222', 'is_private_gateway': False, 'mtu': '1500', 'netmask': '255.255.255.224', 'network': 'x.x.x.192/27', 'new_nic': False, 'nic_dev_id': 1, 'nw_type': 'public', 'one_to_one_nat': False, 'public_ip': 'x.x.x.196', 'size': '27', 'source_nat': True, 'vif_mac_address': '1e:00:f7:00:ff:00'}
2025-01-27 12:01:55,510 INFO     Address x.x.x.196/27 on device eth1 not configured
2025-01-27 12:01:55,510 INFO     Configuring address x.x.x.196/27 on device eth1
2025-01-27 12:01:55,511 INFO     Executing: ip addr add dev eth1 x.x.x.196/27 brd +
2025-01-27 12:01:55,515 INFO     {'add': True, 'broadcast': 'x.x.x.223', 'cidr': 'x.x.x.196/27', 'device': 'eth1', 'first_i_p': True, 'gateway': 'x.x.x.222', 'is_private_gateway': False, 'mtu': '1500', 'netmask': '255.255.255.224', 'network': 'x.x.x.192/27', 'new_nic': False, 'nic_dev_id': 1, 'nw_type': 'public', 'one_to_one_nat': False, 'public_ip': 'x.x.x.196', 'size': '27', 'source_nat': True, 'vif_mac_address': '1e:00:f7:00:ff:00'}
2025-01-27 12:01:55,515 INFO     Executing: ifconfig eth1 mtu 1500
2025-01-27 12:01:55,520 INFO     Adding route table: 101 Table_eth1 to /etc/iproute2/rt_tables if not present
2025-01-27 12:01:55,521 INFO     Executing: ip rule show
2025-01-27 12:01:55,524 INFO     Executing: ip rule show
2025-01-27 12:01:55,527 INFO     Executing: ip link show eth1 | grep ' state '
2025-01-27 12:01:55,530 INFO     Check state command => ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:55,530 INFO     Executing: ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:55,537 INFO     Route state => BACKUP
2025-01-27 12:01:55,538 INFO     Executing2: arping -c 1 -I eth1 -A -U -s x.x.x.196 x.x.x.222
2025-01-27 12:01:55,540 INFO     Adding route: dev eth1 table: Table_eth1 network: x.x.x.222 if not present
2025-01-27 12:01:55,540 INFO     Executing: ip route show default via x.x.x.222 table Table_eth1 proto static
2025-01-27 12:01:55,549 ERROR    Command 'ip route show default via x.x.x.222 table Table_eth1 proto static' returned non-zero exit status 2.
2025-01-27 12:01:55,549 INFO     Add default via x.x.x.222 table Table_eth1 proto static
2025-01-27 12:01:55,550 INFO     Executing: ip route add default via x.x.x.222 table Table_eth1 proto static
2025-01-27 12:01:55,553 ERROR    Command 'ip route add default via x.x.x.222 table Table_eth1 proto static' returned non-zero exit status 2.
2025-01-27 12:01:55,553 INFO     Executing: ip rule show
2025-01-27 12:01:55,556 INFO     Executing: ip rule add from x.x.x.192/27 table Table_eth1
2025-01-27 12:01:55,559 INFO     Added rule ip rule add from x.x.x.192/27 table Table_eth1 for Table_eth1
2025-01-27 12:01:55,559 INFO     Adding route: dev eth1 table: Table_eth1 network: x.x.x.192/27 if not present
2025-01-27 12:01:55,559 INFO     Executing: ip route show  x.x.x.192/27 table Table_eth1 proto static
2025-01-27 12:01:55,563 INFO     Add throw x.x.x.192/27 table Table_eth1 proto static
2025-01-27 12:01:55,563 INFO     Executing: ip route add throw x.x.x.192/27 table Table_eth1 proto static
2025-01-27 12:01:55,566 INFO     Executing: sudo ip route flush cache
2025-01-27 12:01:55,575 INFO     Checking if default ipv4 route is present
2025-01-27 12:01:55,575 INFO     Executing: ip -4 route list 0/0
2025-01-27 12:01:55,578 WARNING  No default route found!
2025-01-27 12:01:55,578 INFO     Adding default route
2025-01-27 12:01:55,578 INFO     Executing: ip route show default via x.x.x.222
2025-01-27 12:01:55,581 INFO     Add default via x.x.x.222
2025-01-27 12:01:55,581 INFO     Executing: ip route add default via x.x.x.222
2025-01-27 12:01:55,583 ERROR    Command 'ip route add default via x.x.x.222' returned non-zero exit status 2.
2025-01-27 12:01:55,584 WARNING  Unable to find and process databag for file: ip_associations.json.d95e85ef-a0bb-4d55-9cbd-dc40ff5f089f, for json type=ip_associations
2025-01-27 12:01:55,584 INFO     Bringing public interface eth1 down
2025-01-27 12:01:55,584 INFO     Executing: ip link set eth1 down
2025-01-27 12:01:55,587 INFO     Executing: systemctl stop conntrackd
2025-01-27 12:01:55,599 INFO     Service conntrackd stop
2025-01-27 12:01:55,599 INFO     Executing: systemctl stop ipsec
2025-01-27 12:01:55,609 INFO     Service ipsec stop
2025-01-27 12:01:55,610 INFO     Executing: systemctl stop xl2tpd
2025-01-27 12:01:55,622 INFO     Service xl2tpd stop
2025-01-27 12:01:55,622 INFO     Executing: systemctl stop dnsmasq
2025-01-27 12:01:55,634 INFO     Service dnsmasq stop
2025-01-27 12:01:55,635 INFO     Executing: ip link show eth1 | grep ' state '
2025-01-27 12:01:55,639 INFO     Check state command => ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:55,639 INFO     Executing: ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:55,645 INFO     Route state => BACKUP
2025-01-27 12:01:55,645 INFO     Router switched to backup mode
2025-01-27 12:01:55,645 INFO     Executing: systemctl stop conntrackd
2025-01-27 12:01:55,657 INFO     Service conntrackd stop
2025-01-27 12:01:55,657 INFO     Executing: systemctl stop keepalived
2025-01-27 12:01:55,666 INFO     Service keepalived stop
2025-01-27 12:01:55,666 INFO     Executing: mount
2025-01-27 12:01:56,336 INFO     update_config.py :: Processing incoming file => monitor_service.json.c524434e-ea31-4c51-951b-dc81357938ad
2025-01-27 12:01:56,336 INFO     Processing JSON file monitor_service.json.c524434e-ea31-4c51-951b-dc81357938ad
2025-01-27 12:01:56,336 INFO     Continuing with the processing of file '/var/cache/cloud/monitor_service.json.c524434e-ea31-4c51-951b-dc81357938ad'
2025-01-27 12:01:56,409 INFO     Command of type monitorservice received
2025-01-27 12:01:56,410 INFO     Executing: ip addr show dev eth0
2025-01-27 12:01:56,414 INFO     Executing: ip addr show dev eth1
2025-01-27 12:01:56,417 INFO     Executing: ip addr show dev eth0
2025-01-27 12:01:56,428 INFO     Address found in DataBag ==> {'add': True, 'broadcast': '169.254.x.255', 'cidr': '169.254.x.x/16', 'device': 'eth0', 'gateway': '', 'netmask': '255.255.0.0', 'network': '169.254.0.0/16', 'nic_dev_id': '0', 'nw_type': 'control', 'one_to_one_nat': False, 'public_ip': '169.254.x.x', 'size': '16', 'source_nat': False}
2025-01-27 12:01:56,428 INFO     Address 169.254.x.x/16 on device eth0 already configured
2025-01-27 12:01:56,428 INFO     Executing: ip addr show dev eth1
2025-01-27 12:01:56,431 INFO     Address found in DataBag ==> {'add': True, 'broadcast': 'x.x.x.223', 'cidr': 'x.x.x.196/27', 'device': 'eth1', 'first_i_p': True, 'gateway': 'x.x.x.222', 'is_private_gateway': False, 'mtu': '1500', 'netmask': '255.255.255.224', 'network': 'x.x.x.192/27', 'new_nic': False, 'nic_dev_id': 1, 'nw_type': 'public', 'one_to_one_nat': False, 'public_ip': 'x.x.x.196', 'size': '27', 'source_nat': True, 'vif_mac_address': '1e:00:f7:00:ff:00'}
2025-01-27 12:01:56,431 INFO     Address x.x.x.196/27 on device eth1 already configured
2025-01-27 12:01:56,431 INFO     Adding route table: 101 Table_eth1 to /etc/iproute2/rt_tables if not present
2025-01-27 12:01:56,431 INFO     Executing: ip rule show
2025-01-27 12:01:56,435 INFO     Executing: ip rule show
2025-01-27 12:01:56,439 INFO     Executing: ip link show eth1 | grep ' state '
2025-01-27 12:01:56,442 INFO     Check state command => ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:56,442 INFO     Executing: ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:56,448 INFO     Route state => BACKUP
2025-01-27 12:01:56,448 INFO     Executing2: arping -c 1 -I eth1 -A -U -s x.x.x.196 x.x.x.222
2025-01-27 12:01:56,449 INFO     Adding route: dev eth1 table: Table_eth1 network: x.x.x.222 if not present
2025-01-27 12:01:56,449 INFO     Executing: ip route show default via x.x.x.222 table Table_eth1 proto static
2025-01-27 12:01:56,452 INFO     Add default via x.x.x.222 table Table_eth1 proto static
2025-01-27 12:01:56,452 INFO     Executing: ip route add default via x.x.x.222 table Table_eth1 proto static
2025-01-27 12:01:56,456 ERROR    Command 'ip route add default via x.x.x.222 table Table_eth1 proto static' returned non-zero exit status 2.
2025-01-27 12:01:56,456 INFO     Executing: ip rule show
2025-01-27 12:01:56,460 INFO     Adding route: dev eth1 table: Table_eth1 network: x.x.x.192/27 if not present
2025-01-27 12:01:56,460 INFO     Executing: ip route show  x.x.x.192/27 table Table_eth1 proto static
2025-01-27 12:01:56,463 INFO     Executing: sudo ip route flush cache
2025-01-27 12:01:56,471 INFO     Checking if default ipv4 route is present
2025-01-27 12:01:56,471 INFO     Executing: ip -4 route list 0/0
2025-01-27 12:01:56,474 WARNING  No default route found!
2025-01-27 12:01:56,474 INFO     Adding default route
2025-01-27 12:01:56,474 INFO     Executing: ip route show default via x.x.x.222
2025-01-27 12:01:56,478 INFO     Add default via x.x.x.222
2025-01-27 12:01:56,478 INFO     Executing: ip route add default via x.x.x.222
2025-01-27 12:01:56,481 ERROR    Command 'ip route add default via x.x.x.222' returned non-zero exit status 2.
2025-01-27 12:01:56,482 INFO     Wrote edited file /etc/monitor.conf
2025-01-27 12:01:56,482 INFO     Updated file in-cache configuration
2025-01-27 12:01:56,483 INFO     Wrote edited file /etc/cron.d/process
2025-01-27 12:01:56,483 INFO     Updated file in-cache configuration
2025-01-27 12:01:56,483 INFO     Bringing public interface eth1 down
2025-01-27 12:01:56,483 INFO     Executing: ip link set eth1 down
2025-01-27 12:01:56,486 INFO     Executing: systemctl stop conntrackd
2025-01-27 12:01:56,497 INFO     Service conntrackd stop
2025-01-27 12:01:56,497 INFO     Executing: systemctl stop ipsec
2025-01-27 12:01:56,509 INFO     Service ipsec stop
2025-01-27 12:01:56,509 INFO     Executing: systemctl stop xl2tpd
2025-01-27 12:01:56,524 INFO     Service xl2tpd stop
2025-01-27 12:01:56,524 INFO     Executing: systemctl stop dnsmasq
2025-01-27 12:01:56,535 INFO     Service dnsmasq stop
2025-01-27 12:01:56,535 INFO     Executing: ip link show eth1 | grep ' state '
2025-01-27 12:01:56,539 INFO     Check state command => ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:56,539 INFO     Executing: ip addr show dev eth1 | grep state | awk '{print $9;}' | xargs bash -c 'if [ $0 == "UP" ]; then echo "PRIMARY"; else echo "BACKUP"; fi'
2025-01-27 12:01:56,546 INFO     Route state => BACKUP
2025-01-27 12:01:56,546 INFO     Router switched to backup mode
2025-01-27 12:01:56,546 INFO     Executing: systemctl stop conntrackd
2025-01-27 12:01:56,557 INFO     Service conntrackd stop
2025-01-27 12:01:56,557 INFO     Executing: systemctl stop keepalived
2025-01-27 12:01:56,566 INFO     Service keepalived stop
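
The "Check state command" logged above appears to take the ninth whitespace-separated field of the `ip addr show dev eth1` line containing "state" and report PRIMARY only when it is UP. A minimal Python sketch of that parsing, written from the logged shell pipeline and not from the CloudStack source (the function name is hypothetical):

```python
def redundant_state(ip_addr_output):
    """Mimic: grep state | awk '{print $9}', then UP -> PRIMARY, anything else -> BACKUP."""
    for line in ip_addr_output.splitlines():
        if "state" in line:
            fields = line.split()
            ninth = fields[8] if len(fields) >= 9 else ""
            return "PRIMARY" if ninth == "UP" else "BACKUP"
    return "BACKUP"  # no matching line: the shell comparison would also fall through to BACKUP

# The eth1 line from the `ip a show` output earlier in this comment.
eth1 = "3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000"
print(redundant_state(eth1))
```

Under this reading, a router whose eth1 is DOWN is always reported as BACKUP, which matches the "Route state => BACKUP" lines in the log.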

@weizhouapache
Member

@Rid
What is the state of the VM and the network (VPC tier) when the VPC VRs go into the FAULT state?

In my testing, when the VM is Running, the VRs become PRIMARY/BACKUP, which looks fine.

@Rid
Author

Rid commented Jan 27, 2025

@weizhouapache both VMs are "Running", VPC state is "Enabled", both VPC routers are in "FAULT" Redundant state.

@weizhouapache
Member

@weizhouapache both VMs are "Running", VPC state is "Enabled", both VPC routers are in "FAULT" Redundant state.

The VPC VRs will be brought to PRIMARY/BACKUP by keepalived.

Can you check the status of the keepalived service and its configuration, /etc/keepalived/keepalived.conf?

What's the state of the network/VPC tier? Does eth2 exist in the VPC VRs?

@Rid
Author

Rid commented Jan 27, 2025

That's part of the issue, there's no /etc/keepalived/keepalived.conf created.

○ keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Mon 2025-01-27 12:03:01 UTC; 1h 55min ago
       Docs: man:keepalived(8)
             man:keepalived.conf(5)
             man:genhash(1)
             https://keepalived.org

Jan 27 12:03:01 r-30-VM systemd[1]: keepalived.service - Keepalive Daemon (LVS and VRRP) was skipped because of an unmet condition check (ConditionFileNotEmpty=/etc/keepalived/keepalived.conf).
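
The skipped start follows from the unit's ConditionFileNotEmpty check: systemd only starts the service if the path is a regular file with non-zero size. A rough Python equivalent of that check, useful for confirming what systemd sees on the VR (the function name is hypothetical, and edge cases such as symlink handling may differ from systemd's own implementation):

```python
import os

def file_not_empty(path):
    """True only if path is a regular file with a non-zero size."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

# On the affected routers this would be expected to print False.
print(file_not_empty("/etc/keepalived/keepalived.conf"))
```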

There's no eth2:

# ip a show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 0e:00:a9:fe:17:8b brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 169.254.23.139/16 brd 169.254.255.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 1e:00:f7:00:ff:00 brd ff:ff:ff:ff:ff:ff
    altname enp0s8
    altname ens8
    inet x.x.x.x/27 brd x.x.x.x scope global eth1
       valid_lft forever preferred_lft forever

@weizhouapache
Member

That's part of the issue, there's no /etc/keepalived/keepalived.conf created.
There's no eth2:

That is the main issue.

Have you created a VPC tier? If yes, what's the state of the VPC tier?

@Rid
Author

Rid commented Jan 27, 2025

If I create a vpc tier (guest network), it stays in the "Allocated" state. No virtual routers are listed.

If I try to create an instance I get the error:

2025-01-27 14:13:25,996 DEBUG [c.c.n.e.VpcVirtualRouterElement] (Work-Job-Executor-12:[ctx-c397d200, job-782/job-792, ctx-a5ef6537]) (logid:afefbbec) Router VM instance {"id":29,"instanceName":"r-29-VM","type":"DomainRouter","uuid":"11f0259c-87ef-4ebd-9773-1d0bc62f45e0"} is not a part the network Network {"id": 205, "name": "Kubernetes network", "uuid": "dc5288c7-dace-466e-9c61-205d5d072199", "networkofferingid": 72}
2025-01-27 14:13:25,996 DEBUG [c.c.n.e.VpcVirtualRouterElement] (Work-Job-Executor-12:[ctx-c397d200, job-782/job-792, ctx-a5ef6537]) (logid:afefbbec) Router VM instance {"id":30,"instanceName":"r-30-VM","type":"DomainRouter","uuid":"0aa9440d-2df3-421e-bde6-2b1a0cb8a3d5"} is not a part the network Network {"id": 205, "name": "Kubernetes network", "uuid": "dc5288c7-dace-466e-9c61-205d5d072199", "networkofferingid": 72}
2025-01-27 14:13:25,996 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-12:[ctx-c397d200, job-782/job-792, ctx-a5ef6537]) (logid:afefbbec) Network id=205 is shutdown successfully, cleaning up corresponding resources now.
2025-01-27 14:13:26,008 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-12:[ctx-c397d200, job-782/job-792, ctx-a5ef6537]) (logid:afefbbec) Lock is released for network Network {"id": 205, "name": "Kubernetes network", "uuid": "dc5288c7-dace-466e-9c61-205d5d072199", "networkofferingid": 72} as a part of network shutdown
2025-01-27 14:13:26,012 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-12:[ctx-c397d200, job-782/job-792, ctx-a5ef6537]) (logid:afefbbec) Lock is released for network id 205 as a part of network implement
2025-01-27 14:13:26,012 WARN  [c.c.v.ClusteredVirtualMachineManagerImpl] (Work-Job-Executor-12:[ctx-c397d200, job-782/job-792, ctx-a5ef6537]) (logid:afefbbec) Insufficient capacity com.cloud.exception.InsufficientVirtualNetworkCapacityException: Unable to allocate vnet as a part of network Network {"id": 205, "name": "Kubernetes network", "uuid": "dc5288c7-dace-466e-9c61-205d5d072199", "networkofferingid": 72} implement Scope=interface com.cloud.dc.DataCenter; id=2
	at com.cloud.network.guru.GuestNetworkGuru.allocateVnet(GuestNetworkGuru.java:355)
	at com.cloud.network.guru.GuestNetworkGuru.implement(GuestNetworkGuru.java:384)
	at com.cloud.network.guru.ExternalGuestNetworkGuru.implement(ExternalGuestNetworkGuru.java:159)
	at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetwork(NetworkOrchestrator.java:1565)
	at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetwork(NetworkOrchestrator.java:1427)
	at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.prepare(NetworkOrchestrator.java:2129)
	at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1262)
	at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:5467)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:106)
	at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5591)
	at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:99)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:652)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:600)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
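
The key line in the trace above is the InsufficientVirtualNetworkCapacityException. A rough way to list which network IDs hit it is to filter the management server log; this sketch assumes the message format shown above, and vnet_failures is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the IDs of networks for which vnet allocation failed.
# vnet_failures is a hypothetical helper; $1 is the path to a management
# server log (its location on your system is an assumption you must supply).
vnet_failures() {
    grep 'InsufficientVirtualNetworkCapacityException' "$1" |
        sed -n 's/.*Unable to allocate vnet as a part of network Network {"id": \([0-9]*\),.*/\1/p' |
        sort -u
}
```

Run against the log excerpt above, this would print 205, the ID of the "Kubernetes network" tier.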

@weizhouapache
Member

If I create a vpc tier (guest network), it stays in the "Allocated" state. No virtual routers are listed.

If I try to create an instance I get the error:


It looks like the VPC tier cannot be implemented due to insufficient VLANs.

Can you check Zone -> Physical Networks, choose the physical network with the Guest traffic type, and open Update physical network to see whether VLAN/VNI is set? Are all VLANs in use?
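
As a quick sanity check on whatever value is shown there, the sketch below counts how many IDs a range string covers and fails if the string is empty. It assumes the range is written as start-end, optionally as several comma-separated ranges; that format and the helper name vlan_range_size are assumptions of this example, so verify them against your CloudStack version.

```shell
#!/bin/sh
# Sketch: count the VLAN IDs covered by a range string such as "100-200"
# or "100-200,300-309". Fails (non-zero exit) on an empty or malformed value.
# vlan_range_size is a hypothetical helper, and the "start-end" format is assumed.
vlan_range_size() {
    printf '%s\n' "$1" | awk -F, '
        NF == 0 { print "no VLAN range set" > "/dev/stderr"; exit 1 }
        {
            total = 0
            for (i = 1; i <= NF; i++) {
                n = split($i, b, "-")
                if (n != 2 || b[1] !~ /^[0-9]+$/ || b[2] !~ /^[0-9]+$/ || b[1] + 0 > b[2] + 0) {
                    print "malformed range: " $i > "/dev/stderr"
                    exit 1
                }
                total += b[2] - b[1] + 1     # both ends are inclusive
            }
            print total
        }'
}
```

For example, `vlan_range_size 100-200` prints 101, while an empty value fails, which corresponds to the situation where no vnet can be allocated for a new tier.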

@Rid
Author

Rid commented Jan 27, 2025

Thanks @weizhouapache. It turns out that the initial wizard configuration didn't complete successfully as it queried too many resources in the browser. We set up the network configuration manually and didn't enter a VLAN range.

Once we added the VLAN range and were able to add a compute instance, eth2 was added and eth1 came up correctly.

I wasn't aware a compute instance was necessary to bring up the redundant VPC routers. Perhaps that should be added to the documentation?

@weizhouapache
Member

Thanks @weizhouapache. It turns out that the initial wizard configuration didn't complete successfully as it queried too many resources in the browser. We set up the network configuration manually and didn't enter a VLAN range.

Once we added the VLAN range and were able to add a compute instance, eth2 was added and eth1 came up correctly.

good.

I wasn't aware a compute instance was necessary to bring up the redundant VPC routers. Perhaps that should be added to the documentation?

If a VM instance is created but not running, the VPC VRs should be BACKUP/BACKUP or UNKNOWN/UNKNOWN.
I think we can consider FAULT/FAULT a minor bug.

@DaanHoogland added this to the 4.20.1 milestone on Jan 29, 2025