
sync route cache on link state changes #475


Draft: KanjiMonster wants to merge 3 commits into main from jogo_sync_routes_take_two

KanjiMonster (Contributor) commented Apr 8, 2025

This is a retry of #469 with additional fixes.

Work around missing route removal notifications from the kernel on link
state changes for routes whose nexthops are on that link: force a route
cache resync on link state change events to catch removed or re-added
routes.

When a link goes down, the kernel will delete all nexthops on that link
without sending out netlink notifications.

When a nexthop gets deleted, the kernel will delete all IPv4 routes that
use the nexthop without sending out netlink notifications.

Combined, these mean that when using nexthop objects, as FRR does by
default, route deletions on link loss are never seen by libnl, and
consequently never seen by baseboxd.

This causes flow table entries for routes to persist even though the
routes were deleted in the kernel.

For IPv6, the routes seem to be only hidden, and get re-added on link
up. This re-add does not trigger a kernel notification either, so if we
disabled the route on link down, we will miss the re-add on link up.

To work around this, use nl_cache_resync_v2() to trigger a resync of all
routes when a link goes up or down.
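
Conceptually, a resync compares the cached routes against a fresh dump from the kernel and reports the differences, which is what lets it catch routes that vanished or reappeared without a notification. The sketch below models only that comparison in plain C++; `RouteKey`, `RouteEvent` and `diff_routes()` are hypothetical names, not the libnl or baseboxd API, and the sketch ignores in-place attribute changes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical route identity: destination prefix plus routing table id.
struct RouteKey {
  std::string dst;  // e.g. "10.0.0.0/24"
  int table;
  bool operator<(const RouteKey &o) const {
    return dst != o.dst ? dst < o.dst : table < o.table;
  }
};

enum class Action { Add, Del };

struct RouteEvent {
  Action action;
  RouteKey key;
};

// Compare the cached view against a fresh kernel dump and emit the events a
// resync would surface: routes only in the cache were removed silently,
// routes only in the dump were (re-)added silently.
std::vector<RouteEvent> diff_routes(const std::set<RouteKey> &cached,
                                    const std::set<RouteKey> &dumped) {
  std::vector<RouteEvent> events;
  for (const auto &k : cached)
    if (!dumped.count(k))
      events.push_back({Action::Del, k});
  for (const auto &k : dumped)
    if (!cached.count(k))
      events.push_back({Action::Add, k});
  return events;
}
```

In this model, IPv4 routes deleted together with their nexthop show up as `Del` events, and IPv6 routes silently restored on link up show up as `Add` events.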

Changes from #469:

  • ignore updates on link local routes
  • sync on both up -> down and down -> up transitions
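
Resyncing on both transitions boils down to remembering each link's last operational state and resyncing whenever it flips. A minimal sketch in plain C++; `LinkStateTracker` and its methods are hypothetical, not baseboxd code, and the actual resync call is left to the caller.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-link bookkeeping: remembers the last seen operational
// state per interface index and reports whether a route resync is needed.
class LinkStateTracker {
 public:
  // Returns true on an up -> down or down -> up transition of a known link.
  // The first notification for a link only records a baseline, and repeated
  // notifications with an unchanged state return false.
  bool needs_resync(int ifindex, bool oper_up) {
    auto it = last_state_.find(ifindex);
    if (it == last_state_.end()) {
      last_state_[ifindex] = oper_up;
      return false;
    }
    bool changed = it->second != oper_up;
    it->second = oper_up;
    return changed;
  }

  // Forget a link, e.g. when it is deleted.
  void remove(int ifindex) { last_state_.erase(ifindex); }

 private:
  std::map<int, bool> last_state_;
};
```

Treating the first notification as a baseline rather than as a transition is an assumption of this sketch; filtering unchanged states avoids a full route dump on every unrelated link attribute update.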

Syncing routes on link down may trigger updates on link local routes due
to route flag changes, but baseboxd is not prepared to handle these,
which causes all L3 neighbors on that link to be treated as unroutable.

IPv6 routes with nexthops seem to behave differently from IPv4 routes
and reappear silently on link up. Since the kernel again sends no
notification, we miss the reappearance of the route.

Due to the amount of breakage, let's revert this for now until we have
addressed these two issues.

This reverts commit 9a58b5d.

Signed-off-by: Jonas Gorski
KanjiMonster changed the title from "Jogo sync routes take two" to "sync route cache on link state changes" on Apr 8, 2025
When we get an update for a route, we first "add" the route again, then
"delete" the old version in update route mode, which leaves the flow in
place but updates everything else.

This works fine for routes with IP nexthops, but for link local routes,
we would first mark all L3 neighbors as routable, then, on handling the
delete, mark them all unroutable, breaking any routes via nexthops on
that interface.

So ignore updates for link local routes, since we cannot do
anything meaningful with the update anyway.

Signed-off-by: Jonas Gorski
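
The update path described in this commit can be modelled as "add the new version, then delete the old one in update mode", with link local routes short-circuited before either step. Everything in this sketch is hypothetical: `Route`, `RouteHandler`, the `link_local` flag (how baseboxd classifies such routes is not modelled) and the per-interface routable flag stand in for baseboxd internals. The `skip_link_local` parameter exists only to contrast the old and fixed behavior.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified route: only what this sketch needs.
struct Route {
  std::string dst;
  int ifindex;
  bool link_local;
};

struct RouteHandler {
  std::map<int, bool> neighbors_routable;  // ifindex -> neighbors routable?
  std::set<std::string> flows;             // installed route flows by dst

  void add(const Route &r) {
    if (r.link_local)
      neighbors_routable[r.ifindex] = true;
    flows.insert(r.dst);  // re-adding an existing flow is a no-op
  }

  // In update mode the flow is kept, but the rest is still torn down,
  // including the neighbor state tied to a link local route.
  void del(const Route &r, bool update_mode) {
    if (r.link_local)
      neighbors_routable[r.ifindex] = false;
    if (!update_mode)
      flows.erase(r.dst);
  }

  // Update: "add" the new version, then "delete" the old one in update mode.
  void update(const Route &old_r, const Route &new_r, bool skip_link_local) {
    if (skip_link_local && new_r.link_local)
      return;  // nothing meaningful to do with a link local route update
    add(new_r);
    del(old_r, /*update_mode=*/true);
  }
};
```

Without the skip, the final update-mode delete leaves every neighbor on the interface unroutable even though the route still exists; with it, the neighbor state set by the original add survives.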
KanjiMonster force-pushed the jogo_sync_routes_take_two branch from d92e7fc to 7ef5c29 on April 8, 2025 12:14