Skip to content

🐛 BUG: Unable to reconnect after server crash #1125

@Cyberes

Description

@Cyberes

What version of nebula are you using? (nebula -version)

1.8.2

What operating system are you using?

Linux

Describe the Bug

One of my servers has been unstable and will sometimes become unresponsive, requiring a hard reset (this is host A, 172.0.3.109). When it comes back up, it's unable to establish a connection to my node that is a nginx reverse proxy (this is host B, 172.0.2.120). If I manually restart the Nebula service on host A, it fixes the issue and host A can connect to host B.

Logs from affected hosts

Lighthouse
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:31 cloud-proxy nebula[689]: time="2024-04-22T12:13:31-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:32 cloud-proxy nebula[689]: time="2024-04-22T12:13:32-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:33 cloud-proxy nebula[689]: time="2024-04-22T12:13:33-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:33 cloud-proxy nebula[689]: time="2024-04-22T12:13:33-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:34 cloud-proxy nebula[689]: time="2024-04-22T12:13:34-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Apr 22 12:13:34 cloud-proxy nebula[689]: time="2024-04-22T12:13:34-00:00" level=debug msg="Recv error sent" index=0 udpAddr="<host's external IP>:5694"
Host A
Apr 22 12:12:25 host-a nebula[811]: time="2024-04-22T12:12:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Tunnel status" certName=host-b localIndex=804429705 remoteIndex=3094684430 tunnelCheck="map[method:active state:dead]" v>
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:26 host-a nebula[811]: time="2024-04-22T12:12:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:27 host-a nebula[811]: time="2024-04-22T12:12:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=2480024129 rela>
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:28 host-a nebula[811]: time="2024-04-22T12:12:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Taking new handshake" certName=host-b localIndex=2480024129 remoteIndex=108412485 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=817659693 relay>
Apr 22 12:12:31 host-a nebula[811]: time="2024-04-22T12:12:31-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:23 host-a nebula[811]: time="2024-04-22T12:10:23-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:24 host-a nebula[811]: time="2024-04-22T12:10:24-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Tunnel status" certName=host-b localIndex=1362300847 remoteIndex=2069131475 tunnelCheck="map[method:active state:dead]" >
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:25 host-a nebula[811]: time="2024-04-22T12:10:25-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:26 host-a nebula[811]: time="2024-04-22T12:10:26-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:27 host-a nebula[811]: time="2024-04-22T12:10:27-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="sendNoMetrics failed to find HostInfo" certName=host-b error="unable to find host with relay" localIndex=2903481744 rela>
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:28 host-a nebula[811]: time="2024-04-22T12:10:28-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Taking new handshake" certName=host-b localIndex=2903481744 remoteIndex=3824716788 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message received" certName=host-b fingerprint=...>
Apr 22 12:10:30 host-a nebula[811]: time="2024-04-22T12:10:30-00:00" level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" relay=172.0.0.2 vpnIp=172.0.2.120

This pattern repeats.

Host B
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Attempt to relay through hosts" localIndex=1813799298 relays="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" remoteIndex=0 vpnIp=1>
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:25 host-b nebula[111]: time="2024-04-22T12:06:25-00:00" level=info msg="Send handshake via relay" localIndex=1813799298 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:26 host-b nebula[111]: time="2024-04-22T12:06:26-00:00" level=info msg="Handshake timed out" durationNs=6152868332 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1813799298 localIndex=1813799>
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3495069400 localIndex=3495069400 remoteIndex=0 u>
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Attempt to relay through hosts" localIndex=3495069400 relays="[172.0.0.2 172.0.0.3 172.0.0.2 172.0.0.3]" remoteIndex=0 vpnIp=1>
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Send handshake via relay" localIndex=3495069400 relay=172.0.0.2 remoteIndex=0 vpnIp=172.0.3.109
Apr 22 12:06:30 host-b nebula[111]: time="2024-04-22T12:06:30-00:00" level=info msg="Send handshake via relay" localIndex=3495069400 relay=172.0.0.3 remoteIndex=0 vpnIp=172.0.3.109

This pattern repeats

Config files from affected hosts

Host A
firewall:
  conntrack:
    default_timeout:    10m
    max_connections:    100000
    tcp_timeout:        12m
    udp_timeout:        3m
  inbound:
  - host:               any
    port:               any
    proto:              icmp
  - host:               any
    port:               any
    proto:              any
  outbound:
  - host:               any
    port:               any
    proto:              any
lighthouse:
  am_lighthouse:        false
  hosts:
  - 172.0.0.2
  - 172.0.0.3
  interval:             60
listen:
  batch:                128
  host:                 0.0.0.0
  port:                 0
  read_buffer:          10485760
  write_buffer:         10485760
logging:
  format:               text
  level:                info
pki:
  ca:                   |
  cert:                 |
  key:                  |
punchy:
  delay:                1s
  punch:                true
  punch_back:           true
  respond:              true
relay:
  am_relay:             false
  relays:
  - 172.0.0.2
  - 172.0.0.3
  use_relays:           true
static_host_map:
  172.0.0.2:
  - ....:4242
  172.0.0.3:
  - ....:4242
tun:
  dev:                  nebula1
  disabled:             false
  drop_local_broadcast: false
  drop_multicast:       false
  routes:               null
  tx_queue:             5000
  unsafe_routes:        null
Host B
firewall:
  conntrack:
    default_timeout:    10m
    max_connections:    100000
    tcp_timeout:        12m
    udp_timeout:        3m
  inbound:
  - host:               any
    port:               any
    proto:              icmp
  - host:               any
    port:               any
    proto:              any
  outbound:
  - host:               any
    port:               any
    proto:              any
lighthouse:
  am_lighthouse:        false
  hosts:
  - 172.0.0.2
  - 172.0.0.3
  interval:             60
listen:
  batch:                128
  host:                 0.0.0.0
  port:                 0
  read_buffer:          10485760
  write_buffer:         10485760
logging:
  format:               text
  level:                info
pki:
  ca:                   |
  cert:                 |
  key:                  |
preferred_ranges:
- 10.0.0.0/24
punchy:
  delay:                1s
  punch:                true
  punch_back:           true
  respond:              true
relay:
  am_relay:             false
  relays:
  - 172.0.0.2
  - 172.0.0.3
  use_relays:           true
static_host_map:
  172.0.0.2:
  - ....:4242
  172.0.0.3:
  - ....:4242
tun:
  dev:                  nebula1
  disabled:             false
  drop_local_broadcast: false
  drop_multicast:       false
  routes:               null
  tx_queue:             5000
  unsafe_routes:        null
Lighthouse
firewall:
  conntrack:
    default_timeout: 10m
    max_connections: 100000
    tcp_timeout: 12m
    udp_timeout: 3m
  inbound:
  - host: any
    port: any
    proto: icmp
  - host: any
    port: any
    proto: any
  outbound:
  - host: any
    port: any
    proto: any
lighthouse:
  am_lighthouse: true
  interval: 60
listen:
  batch: 128
  host: 0.0.0.0
  port: 4242
logging:
  format: text
  level: debug
pki:
  ca: |
  cert: |
  key: |
punchy:
  delay: 1s
  punch: true
  punch_back: true
  respond: true
relay:
  am_relay: true
  use_relays: false
stats:
  host: ...:2003
  interval: 10s
  lighthouse_metrics: true
  message_metrics: true
  prefix: nebula.cloud_proxy
  protocol: tcp
  type: graphite
tun:
  dev: nebula1
  disabled: false
  drop_local_broadcast: false
  drop_multicast: false
  routes: null
  tx_queue: 5000
  unsafe_routes: null

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions