Description
Background Context
Today's Gossip -> DB stack
As seen in the diagram, today we have the following flow for gossip messages:
Gossiper
- An `lnwire` (LN protocol) message comes in from either our peers or from us into our `gossiper`.
- The `gossiper` does some read calls via the `graph.Builder` to the DB to ensure basic DoS protection (i.e. should we bother continuing with this announcement at all?).
- The `gossiper` then does protocol-level checks on the gossip such as: is the signature valid? (and other checks mentioned in BOLT 7). NB: today, the `gossiper` does not do funding transaction validation when it gets a `channel_announcement`, as this is currently done further down the stack. This should be changed.
- Once validated, the `gossiper` converts the `lnwire` message to our internal `models` representation (** this is probably the wrong place for this conversion) and calls the `graph.Builder`'s Add/Update methods.
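To make the hand-off concrete, here is a minimal Go sketch of the flow described above. All type and method names (`builder`, `IsKnownEdge`, `AddEdge`, the stand-in structs) are hypothetical and only illustrate the shape of the calls, not lnd's actual APIs:

```go
package gossipsketch

import "context"

// lnwireChanAnn stands in for an lnwire channel announcement (simplified).
type lnwireChanAnn struct{ ShortChannelID uint64 }

// modelsChannelEdge stands in for the internal models representation.
type modelsChannelEdge struct{ ChannelID uint64 }

// builder mimics the role graph.Builder plays for the gossiper: a cheap
// read path for DoS protection plus the Add/Update write path.
type builder interface {
	// IsKnownEdge is the kind of read call used for basic DoS
	// protection before any expensive validation.
	IsKnownEdge(chanID uint64) bool

	// AddEdge persists a validated channel announcement.
	AddEdge(ctx context.Context, edge *modelsChannelEdge) error
}

// handleChanAnn sketches the gossiper's handling of a channel_announcement.
func handleChanAnn(ctx context.Context, b builder,
	ann *lnwireChanAnn) error {

	// 1) Cheap DB read via the Builder: is this announcement worth
	// processing at all?
	if b.IsKnownEdge(ann.ShortChannelID) {
		return nil
	}

	// 2) Protocol-level (BOLT 7) checks such as signature validation
	// happen here. Funding tx validation currently happens further
	// down the stack, which this issue aims to change.

	// 3) Convert lnwire -> models (arguably the wrong layer for this)
	// and hand off to the Builder.
	return b.AddEdge(ctx, &modelsChannelEdge{ChannelID: ann.ShortChannelID})
}
```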
graph.Builder
- The Builder does a few checks of the incoming messages before passing them on to the `ChannelGraph`:
  - `node_announcement`: again checks freshness (** repeats the check that the gossiper already did...)
  - `channel_update`: checks that we do know about the channel & that the update is fresh (** again repeating what was done in the gossiper)
  - `channel_announcement`: does funding tx validation (** wrong place!!) along with checks like ensuring we don't already know of this channel (again, already done in the gossiper).
- The Builder also does a couple of other maintenance tasks:
  1) it is responsible for pruning closed channels & marking channels as zombies
  2) it provides a topology change subscription, since it knows when we actually persist a new update & need to notify clients.
ChannelGraph
- This is our CRUD layer. It has a direct connection to a backing `kvdb.Backend` & does all our persistence logic.
- It also constructs and maintains the `graphCache`, which is a cache that holds the info required by the router.
- Many parts of the code-base currently have direct access to the `graphdb.ChannelGraph`.
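Roughly, the current coupling looks like this (a sketch only; the field names are hypothetical and the real `graphdb.ChannelGraph` is far larger):

```go
package graphdbsketch

// kvdbBackend and graphCache are stand-ins for kvdb.Backend and the real
// in-memory cache used by pathfinding.
type kvdbBackend interface{}

type graphCache struct{}

// ChannelGraph sketches today's shape: one struct that is both the CRUD
// layer (direct kvdb access) and the owner of the in-memory graphCache,
// which is the coupling this issue wants to break apart.
type ChannelGraph struct {
	db    kvdbBackend // direct connection to the backing KV store
	cache *graphCache // built and kept up to date on every write
}
```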
Remote Graph Vision
If many LND nodes are owned by the same entity, there is really no need for them all to sync their own gossip on init. A node can instead persist just its own gossip updates and rely on a remote graph source to populate the rest of its `graphCache` for the purposes of pathfinding.
Here is a diagram showing an example configuration: one Graph Source that 2 clients depend on:
To get to this vision, however, there are a few things we need to consider and change on the way from the current architecture to this ideal one:
- In the remote graph client set-up, the `graphCache` will be populated both via our local updates and via updates from the remote. So it makes sense to lift this cache out of the CRUD layer. This will also lend itself to the gossip v1.75 changes (see later on). Steps here include (see the sketch after this list):
  - Rename the current `ChannelGraph` to the more descriptive `KVDBStore` (or `V1Store`, see the gossip updates later).
  - Create a new `ChannelGraph` struct which is responsible for creating the `graphCache`.
  - The `KVDBStore` then only defines CRUD logic, which is a cleaner separation anyway.
- All read-calls should go through the new `graphdb.ChannelGraph` instead of going directly to the CRUD layer. This is needed so that these calls can correctly query the graph-cache/remote graph where needed.
- Topology subscriptions/management needs to move out of the `graph.Builder` and into the new `graphdb.ChannelGraph`, since that is where management of the remote source will happen; if we want our topology subscription clients to be up to date with changes in the remote source (and not only updates from our own node), then it makes sense for this to be done in the `ChannelGraph`.
- All calls to the `ChannelGraph`, both reads and writes, should take a `context.Context` for 2 reasons:
  1) to prepare for any remote gRPC calls which need a context
  2) to prepare for a SQL DB backend which will also take a context.
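Putting the above together, here is a rough Go sketch of the proposed layering. The names (`V1Store`, `GraphSource`, `ChannelEdge`) and method signatures are assumptions made for illustration, not the final API:

```go
package graphsketch

import (
	"context"
	"errors"
)

// ChannelEdge stands in for the internal models channel type.
type ChannelEdge struct{ ChanID uint64 }

// V1Store is the renamed CRUD-only layer (today's ChannelGraph). It takes
// contexts to prepare for a SQL backend.
type V1Store interface {
	AddChannelEdge(ctx context.Context, edge *ChannelEdge) error
	FetchChannelEdge(ctx context.Context, chanID uint64) (*ChannelEdge, error)
}

// GraphSource is a remote graph (e.g. another node reached over gRPC) used
// to answer queries about channels we did not gossip ourselves.
type GraphSource interface {
	FetchChannelEdge(ctx context.Context, chanID uint64) (*ChannelEdge, error)
}

// graphCache stands in for the in-memory cache used by pathfinding.
type graphCache struct{}

// ChannelGraph is the new layer: it owns the graphCache, fronts the local
// store and the optional remote source, and is where topology
// subscriptions would live.
type ChannelGraph struct {
	local  V1Store
	remote GraphSource // nil when running without a remote graph
	cache  *graphCache
}

var errEdgeNotFound = errors.New("edge not found")

// FetchChannelEdge shows why reads must go through ChannelGraph: only this
// layer knows whether to answer from the local store or the remote source.
func (g *ChannelGraph) FetchChannelEdge(ctx context.Context,
	chanID uint64) (*ChannelEdge, error) {

	if edge, err := g.local.FetchChannelEdge(ctx, chanID); err == nil {
		return edge, nil
	}
	if g.remote != nil {
		return g.remote.FetchChannelEdge(ctx, chanID)
	}
	return nil, errEdgeNotFound
}
```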
Gossip v1.75 support vision
A couple of things to keep in mind for the gossip v1.75 support:
- We will be supporting 2 disjoint protocols. So there will be 2 distinct DBs, and we should not need to check things about a given node/channel across the 2 protocols (except for a few edge cases).
- The 2 separate DBs (i.e. 2 separate CRUD layers) are another good reason for getting the `graphCache` out of the current `V1Store` layer and into the new `ChannelGraph` layer.
- Given that we will have 2 DBs, we will need a layer to mux things: notice that the `V1Store` CRUD will deal with `*models.Channel1/Node1/Update1` struct types and the `V2Store` CRUD will deal with `*models.Channel2/Node2/Update2`. So our `ChannelGraph` layer will also deal with providing Read interface methods to the rest of the code base via new `models.Channel/Node/Update` interfaces (sketched below). This is good to keep in mind from the start, since there will be some time during which some read calls to `ChannelGraph` are just forwarded directly to the current CRUD layer. So it might be confusing as to why we have that extra layer - but the reason is to allow for this future where we want to mux results.
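A minimal sketch of that muxing idea, again with hypothetical type names rather than the final ones:

```go
package muxsketch

import "context"

// Channel is the protocol-agnostic read interface exposed to the rest of
// the code base.
type Channel interface {
	ChanID() uint64
}

// channel1 and channel2 stand in for the V1 and V2 gossip channel structs.
type channel1 struct{ id uint64 }

func (c *channel1) ChanID() uint64 { return c.id }

type channel2 struct{ id uint64 }

func (c *channel2) ChanID() uint64 { return c.id }

// v1Store and v2Store are the two disjoint CRUD layers, each speaking only
// its own protocol's types.
type v1Store interface {
	FetchChannel(ctx context.Context, id uint64) (*channel1, error)
}

type v2Store interface {
	FetchChannel(ctx context.Context, id uint64) (*channel2, error)
}

// ChannelGraph muxes the two stores behind the shared Channel interface.
type ChannelGraph struct {
	v1 v1Store
	v2 v2Store
}

// FetchChannel tries the V1 store first and falls back to V2; either way
// the caller only ever sees the protocol-agnostic interface.
func (g *ChannelGraph) FetchChannel(ctx context.Context,
	id uint64) (Channel, error) {

	if ch, err := g.v1.FetchChannel(ctx, id); err == nil {
		return ch, nil
	}
	return g.v2.FetchChannel(ctx, id)
}
```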
Given the detailed vision and context regarding other associated projects outlined above, let's narrow down the initial goals of this ticket, which is focused just on Graph Query Abstraction & general clean-up and separation of concerns.
This is what we are aiming for:

Here are the initial high level steps to completion. Along the way during review, small things will probably be added that are worth addressing.
Steps to completion (not necessarily in order)
- Move funding transaction verification logic from the `graph.Builder` to the `gossiper` (discovery+graph: move funding tx validation to the gossiper #9478)
- For each sub-system in LND that currently has access to a direct pointer to the DB, let it define interfaces instead and let those interface methods be implemented by the new `ChannelGraph` (see the interface sketch after this list):
  - autopilot server: graph+autopilot: remove `autopilot` access to raw `graphdb.ChannelGraph` #9480
  - invoices rpc server: invoicesrpc: remove direct access to ChannelGraph pointer #9516
  - graph session/pathfinding: graph+routing: refactor to remove `graphsession` #9513
- Move the `graphCache` out of the CRUD layer.
- Move the topology management/subscriptions from the `Builder` to the new `ChannelGraph`.
- Let the gossiper only deal with `lnwire` types. The `Builder` should then be responsible for converting to our internal `models` types.
- For each write and read method exposed via the `ChannelGraph`, update it to take a context. This will involve ensuring that any calling sub-systems actually have a context to thread through, so quite a few PRs will be dedicated to just threading contexts through.
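As an illustration of the per-sub-system interface pattern mentioned in the second step above, here is a hypothetical sketch; the real interfaces live in the linked PRs and will differ:

```go
package autopilotsketch

import "context"

// GraphNode and GraphChannel stand in for whatever view of the graph the
// sub-system actually needs.
type GraphNode struct{ PubKey [33]byte }

type GraphChannel struct{ ChanID uint64 }

// Graph is defined by the consumer (e.g. the autopilot agent) and lists
// only the read methods it needs, instead of the sub-system holding a raw
// *graphdb.ChannelGraph pointer.
type Graph interface {
	ForEachNode(ctx context.Context, cb func(GraphNode) error) error
	ForEachNodeChannel(ctx context.Context, node GraphNode,
		cb func(GraphChannel) error) error
}

// Agent depends only on the small interface, so the new
// graphdb.ChannelGraph (cache-backed, local or remote) can satisfy it
// without the agent knowing the difference.
type Agent struct {
	graph Graph
}
```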
Additional goals added on during review:
- Address this comment: once a context is passed through the Builder's AddEdge/AddNode/UpdateEdge methods, we can thread these contexts through & ensure that persistence happens before exiting the call.
- Address this comment: just use the existing kvdb `View` function instead of manually creating and committing a DB transaction.
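For the last point, a small sketch of what that looks like, assuming the three-argument `kvdb.View(db, f, reset)` helper from lnd's `kvdb` package (the exact signature should be confirmed against the code base):

```go
package dbsketch

import "github.com/lightningnetwork/lnd/kvdb"

// fetchSomething sketches replacing a manually managed read transaction
// with kvdb.View, which opens the transaction, runs the closure and
// handles rollback. The final argument is a reset closure invoked before
// each attempt so any partial results can be cleared.
func fetchSomething(db kvdb.Backend) error {
	return kvdb.View(db, func(tx kvdb.RTx) error {
		// ... perform reads via tx ...
		_ = tx
		return nil
	}, func() {})
}
```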
Associated Issues:
- Completion of this issue will also close: channeldb+lnd: eliminate direct graph access via *channeldb.DB pointer, enable custom graph db implementations #6294
- Lots of overlap with (it might even replace) this issue: [epic]: ChannelDB, Graph, Gossiper and Router separation #8833