Create Live Snapshot without shutting down node #2816
base: master
Conversation
cmd/nitro/nitro.go
Outdated
@@ -641,6 +643,11 @@ func mainImpl() int {
        deferFuncs = []func(){func() { currentNode.StopAndWait() }}
    }

    // Live db snapshot creation is only supported on archive nodes
    if nodeConfig.Execution.Caching.Archive {
        go liveDBSnapshotter(ctx, chainDb, arbDb, execNode.ExecEngine.CreateBlocksMutex(), func() string { return liveNodeConfig.Get().SnapshotDir })
I think that:
- we should use the stopwaiter pattern for the liveDBSnapshotter
- it might be nice to have a config option to disable (not start) the snapshotter, e.g. if we are running a sequencer, to be extra safe
- we should be able to also support full nodes (non-archive); I describe that in more detail in the comment for liveDBSnapshotter
Want a config option to enable the snapshotter, off by default.
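A minimal sketch of how a stopwaiter-based snapshotter with an off-by-default enable flag could fit together, assuming nitro's util/stopwaiter API; the config shape, field names, snapshot callback, and the ticker-based trigger below are illustrative placeholders, not the PR's code:

package gethexec

import (
    "context"
    "time"

    "github.com/offchainlabs/nitro/util/stopwaiter"
)

// Illustrative config shape; the real flag names in the PR may differ.
type LiveDBSnapshotterConfig struct {
    Enable      bool   `koanf:"enable"`
    SnapshotDir string `koanf:"snapshot-dir"`
}

// LiveDBSnapshotter wraps the snapshot loop in a StopWaiter so it starts and
// stops together with the rest of the node's components.
type LiveDBSnapshotter struct {
    stopwaiter.StopWaiter
    config   func() *LiveDBSnapshotterConfig
    snapshot func(ctx context.Context, dir string) error // hypothetical callback doing the actual db copy
}

func (s *LiveDBSnapshotter) Start(ctx context.Context) {
    if !s.config().Enable {
        return // off by default: never start the loop
    }
    s.StopWaiter.Start(ctx, s)
    s.LaunchThread(func(ctx context.Context) {
        // Placeholder trigger: poll periodically; a real implementation would
        // wait for an explicit trigger (signal, RPC, ...) instead.
        ticker := time.NewTicker(time.Minute)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done(): // StopAndWait cancels this context
                return
            case <-ticker.C:
                _ = s.snapshot(ctx, s.config().SnapshotDir) // error handling elided in this sketch
            }
        }
    })
}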
cmd/nitro/nitro.go
Outdated
        continue
    }

    createBlocksMutex.Lock()
I think that if we rearranged the order of things a bit and patched geth a bit, we could also support a non-archive node (which I believe would be the main use case for db snapshotting).
- Instead of triggering the snapshot here, we could schedule a snapshot after the next block is created. We could call e.g. execNode.ScheduleDBSnapshot, so we wouldn't need access to createBlockMutex or any internals of ExecutionNode (that will probably be especially important for the execution split).
- In ExecutionEngine.appendBlock, if a snapshot was scheduled, we could trigger the snapshot after s.bc.WriteBlockAndSetHeadWithTime.
To support full nodes (non-archive) we need to make sure that the state for the block written with WriteBlockAndSetHeadWithTime is committed to disk. To do that we need to force-commit the state. It could be done e.g. with a ForceTriedbCommitHook that I added in the snap sync draft: https://github.com/OffchainLabs/go-ethereum/pull/280/files#diff-53d5f4b8a536ec2a8c8c92bf70b8268f1d77ad77e9f316e6f68a2bcae5303215
The hook would be set to a function created in gethexec scope that would have access to ExecutionEngine, something like:
hook := func() bool {
    return execEngine.shouldForceCommitState()
}

func (e *ExecutionEngine) shouldForceCommitState() bool {
    return e.forceCommitState
}

func (e *ExecutionEngine) ScheduleDBSnapshot() {
    e.dbSnapshotScheduled.Store(true)
}

func (e *ExecutionEngine) appendBlock() error {
    // ...
    snapshotScheduled := e.dbSnapshotScheduled.Load()
    if snapshotScheduled {
        // force the state of the next written block to be committed to disk
        e.forceCommitState = true
    }
    status, err := e.bc.WriteBlockAndSetHeadWithTime(...)
    if err != nil {
        return err
    }
    // ...
    if snapshotScheduled {
        e.forceCommitState = false
        chainDb.CreateDBSnapshot(snapshotDir)
    }
    // ...
}
Setting the hook can be done similarly to the SnapHelper PR draft: https://github.com/OffchainLabs/nitro/pull/2122/files#diff-19d6494fe5ff01c95bfdd1e4af6d31d75207d21743af80f57f0cf93848a32e3e
Having written that, I am no longer sure it's as straightforward as I thought when starting this comment 😓 but it should be doable :)
If we want to go that way, I can split out a simplified ForceTriedbCommitHook from my draft PRs so it can be merged earlier and used here.
Thank you, this seems like a great idea; I will see if it works out for snapshotting a full node.
Sorry for the delay!
As I think of it now, since you're already acquiring the block creation mutex, you should be able to commit state similarly to how ExecutionEngine.Maintenance does it - it calls the newly added go-ethereum/core.BlockChain.FlushTrieDB.
FlushTrieDB doesn't commit the state root for the head block and the snapshot root, but rather for the oldest block in BlockChain.triegc (the next one to be garbage collected). A node started from such a snapshot might be able to recover, but that would need a little more attention.
What might be the best option is to add a new method to core.BlockChain that persists the same roots as when the blockchain is stopped and "Ensure[s] that the entirety of the state snapshot is journaled to disk" by calling bc.snaps.Journal: https://github.com/OffchainLabs/go-ethereum/blob/e6c8bea35d519098cf7cc9c0d3765aef9ab72cbb/core/blockchain.go#L1202-L1257
Then the live snapshot should have a database state similar to the one after stopping the blockchain, but without actually doing so :)
What do you think?
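For concreteness, a sketch of what such a method inside go-ethereum's core package might look like, loosely modeled on the commit/journal steps in BlockChain.Stop; the method name is made up, and whether the fork exposes exactly these fields and helpers is an assumption:

// Sketch only, not the actual geth patch: persist roughly the same state that
// BlockChain.Stop would persist, without stopping the chain.
func (bc *BlockChain) CommitStateForLiveSnapshot() error {
    var snapBase common.Hash
    if bc.snaps != nil {
        // "Ensure that the entirety of the state snapshot is journaled to disk."
        var err error
        snapBase, err = bc.snaps.Journal(bc.CurrentBlock().Root)
        if err != nil {
            return err
        }
    }
    // Commit the recent trie roots that Stop flushes on shutdown
    // (head, head-1, and the oldest in-memory trie, 127 back by default).
    head := bc.CurrentBlock().Number.Uint64()
    for _, offset := range []uint64{0, 1, 127} {
        if head > offset {
            recent := bc.GetBlockByNumber(head - offset)
            if err := bc.triedb.Commit(recent.Root(), true); err != nil {
                return err
            }
        }
    }
    // Also persist the journal's base layer root, as Stop does.
    if snapBase != (common.Hash{}) {
        if err := bc.triedb.Commit(snapBase, true); err != nil {
            return err
        }
    }
    return nil
}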
This makes sense to me. I'm currently testing your last idea, i.e. mimicking what the blockchain does during a stop without clearing the underlying data the way a stop would.
imo you don't have to worry about the side effects of committing tries from triedb (hashdb) to disk:
- the recently written trie nodes will be cached in the cleans cache
- the hashdb referencing/dereferencing mechanism should work just fine with the commits
The main area to be careful about and explore seems to be the operations around snapshots, e.g. bc.snaps.Journal, as the method is meant to be used during shutdown, according to the comment:
// Journal commits an entire diff hierarchy to disk into a single journal entry.
// This is meant to be used during shutdown to persist the snapshot without
// flattening everything down (bad for reorgs).
//
// The method returns the root hash of the base layer that needs to be persisted
// to disk as a trie too to allow continuing any pending generation op.
func (t *Tree) Journal(root common.Hash) (common.Hash, error) {
The approach worked for database sizes from a thousand blocks to about 3.5 million! Marking the PR as ready for review and for further testing with up-to-date arb1 and arbsepolia databases.
One thing that I think might need looking into more is the unclean shutdown log line, e.g. from an arb1 node started from the snapshot:
WARN [04-14|13:31:56.376] Unclean shutdown detected booted=2025-04-14T13:28:39+0530 age=3m17s
I think that's because on a normal shutdown the latest unclean shutdown marker is removed from the db, and since we don't stop the node, the copy still has the unclean shutdown marker. The boot time from the marker seems to match the time when the original node might have been started.
Here's how the shutdown tracker works: https://github.com/OffchainLabs/go-ethereum/blob/master/internal/shutdowncheck/shutdown_tracker.go
- MarkStartup is called on startup: it checks if there are unclean shutdown markers from previous boots and pushes the current boot timestamp onto the unclean shutdown markers list
- Stop is called after the node is fully stopped: it pops the latest timestamp from the markers
We probably need to somehow rawdb.PopUncleanShutdownMarker from the snapshot.
That might be tricky. The simplest solution would be to pop the marker and then push it back after the live snapshot. That requires a bit of thought about what happens if the node crashes while doing the live snapshot.
My initial thinking is that if we first save everything needed to disk and then pop the marker, we should be o.k., unless crashing during a pebble Checkpoint corrupts the db. We might need to figure out another way of cleaning up the marker if that's not safe enough.
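For illustration, the pop-then-push idea being weighed here could look roughly like this, using go-ethereum's existing rawdb marker helpers; createDBSnapshot is a stand-in for the actual pebble-checkpoint copy, and the crash window discussed above is exactly the code between the pop and the deferred push:

import (
    "github.com/ethereum/go-ethereum/core/rawdb"
    "github.com/ethereum/go-ethereum/ethdb"
    "github.com/ethereum/go-ethereum/log"
)

// snapshotWithoutUncleanMarker removes the running node's unclean-shutdown
// marker before copying the db, so the copy does not look uncleanly shut down,
// then restores the marker on the running node.
func snapshotWithoutUncleanMarker(db ethdb.Database, snapshotDir string, createDBSnapshot func(string) error) error {
    // Pop the marker that the shutdown tracker's MarkStartup pushed for this boot.
    rawdb.PopUncleanShutdownMarker(db)
    // Push it back even if the snapshot fails. If the node crashes inside
    // createDBSnapshot, the marker is lost - the risk discussed above.
    defer func() {
        if _, _, err := rawdb.PushUncleanShutdownMarker(db); err != nil {
            log.Warn("Failed to restore unclean shutdown marker", "err", err)
        }
    }()
    return createDBSnapshot(snapshotDir)
}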
This seems very dicey; as you pointed out, a node crashing during checkpointing would definitely want to have this marker, so we can't pop it before snapshotting just to keep it out of our copied database.
I created [...]. The idea is to have two [...]. Currently, to allow for correct generation of snapshots (i.e. first chainDB and then arbDB), we've introduced [...].
Nice!
This feature is quite useful 😄
I didn't review the code thoroughly yet since I think we should redesign its architecture a little bit by moving the snapshot triggering API from Execution to Consensus.
I added a comment questioning that, explaining a bit of the reasoning.
We can discuss it in the team's standup or through a Slack thread too :)
// TODO: after the consensus-execution split, remove this code block and modify LiveDBSnapshotter to
// directly read from sigur2. Currently execution will trigger snapshot creation of consensus side once
// its own snapshotting is complete.
var executionDBSnapShotTrigger, consensusDBSnapShotTrigger chan struct{}
nitpick:
- var executionDBSnapShotTrigger, consensusDBSnapShotTrigger chan struct{}
+ var executionDBSnapshotTrigger, consensusDBSnapshotTrigger chan struct{}
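As a rough illustration of the ordering the TODO describes (the channel names come from the diff; coordinateLiveSnapshots and snapshotExecutionDBs are made-up helpers, not the PR's code):

import (
    "context"

    "github.com/ethereum/go-ethereum/log"
)

// coordinateLiveSnapshots waits for an execution snapshot trigger, snapshots the
// execution databases first, and only then signals the consensus side, so the
// copied arbDB is always at a block height >= the copied chainDB's.
func coordinateLiveSnapshots(
    ctx context.Context,
    executionDBSnapshotTrigger <-chan struct{},
    consensusDBSnapshotTrigger chan<- struct{},
    snapshotExecutionDBs func() error, // hypothetical: pebble checkpoints of l2chaindata/ancient/wasm
) {
    for {
        select {
        case <-ctx.Done():
            return
        case <-executionDBSnapshotTrigger:
            if err := snapshotExecutionDBs(); err != nil {
                log.Error("live snapshot of execution databases failed", "err", err)
                continue
            }
            // chainDB is snapshotted; now it is safe to snapshot arbDB.
            select {
            case consensusDBSnapshotTrigger <- struct{}{}:
            case <-ctx.Done():
                return
            }
        }
    }
}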
@@ -531,6 +557,8 @@ func mainImpl() int {
        l1Client,
        func() *gethexec.Config { return &liveNodeConfig.Get().Execution },
        liveNodeConfig.Get().Node.TransactionStreamer.SyncTillBlock,
        executionDBSnapShotTrigger,
        consensusDBSnapShotTrigger, // execution will invoke conensus's db snapshotting
Today we have a circular call dependency between Consensus and Execution, but ideally we would like to get rid of that, and we are moving in that direction.
In the current model, Consensus calls Execution; e.g., Consensus retrieves messages from different sources and then calls Execution to generate blocks for those messages.
We want to avoid the need for Execution to call Consensus.
This can also enable us, in the future, to have a single Consensus node controlling multiple Execution nodes.
That said, I think we should move the snapshot trigger API from Execution to Consensus, so Consensus triggers Execution's DB snapshot.
How about that?
We already have some RPC APIs on Consensus, so we can reuse the API framework that is already implemented; examples are the BlockValidatorAPI and the MaintenanceAPI.
Also, in the near future we intend to split Consensus and Execution into different processes.
So Execution communicating with Consensus directly through a channel can make the splitting a little bit less straightforward.
Today we rely on having functions on the ConsensusSequencer interface which Execution can use to call Consensus.
If we decide to keep this snapshot trigger API on Execution, we could use this interface to enable Execution -> Consensus communication here too, as sketched below.
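If the trigger were kept on the Execution side, the interface route could look something like this (a hypothetical extension; the actual ConsensusSequencer interface and any method added to it may differ):

import "context"

// Hypothetical Execution -> Consensus extension; method name and signature are
// illustrative only.
type ConsensusSequencer interface {
    // ... existing methods Execution already uses to call into Consensus ...

    // TriggerArbDBSnapshot asks Consensus to snapshot arbitrumdata once
    // Execution has finished snapshotting its own databases.
    TriggerArbDBSnapshot(ctx context.Context, snapshotDir string) error
}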
Thanks for the context! But for a working snapshot to be generated we need to first snapshot Execution's dbs and then Consensus's db, i.e. block height in chainDB <= arbDB.
I understand your suggestion that Execution shouldn't call Consensus actions, but this is merely a signal transfer to start db snapshotting. Moreover, it is implemented in a way that is easy to remove after the split, with the idea that Consensus and Execution will each have a snapshotting API that snapshots their respective dbs.
Making Consensus trigger Execution's database snapshot is essentially the reverse of what we currently have, just moving code to the Consensus side, code that will need to be removed after the split anyway.
If this is really a priority then I can completely decouple it so that liveDBSnapshotter returns a success signal after db creation and nitro.go handles the signals entirely; let me know.
I will ping you so we can chat 🙂, it will be easier to give you more context, but here goes a summary.

> But for a working snapshot to be generated we need to first snapshot Execution's dbs and then Consensus's db, i.e. block height in chainDB <= arbDB

There is a common misconception that the procedure which coordinates an action between multiple components should be placed where the action starts to happen.
The analogy in computer networking and distributed systems is that the control plane doesn't need to live where the data plane is.
Nitro, in some parts, follows this colocation pattern today, which has some issues.
One example is that the procedure that triggers regular sequencing is placed where the sequencing takes place, Execution.
PR #3111 moves away from that pattern and moves sequencing triggering to Consensus; there are several benefits to that.
So, with that PR, sequencing triggering coordination will no longer be placed where sequencing takes place.
That said, live snapshot coordination can be placed in Consensus, and we will still be able to snapshot Execution's DB before Consensus' DB.
We could even place it in another component, like a CLI similar to dbconv.

> Moreover, it is implemented in a way that is easy to remove after the split, with the idea that Consensus and Execution will each have a snapshotting API that snapshots their respective dbs.
> Making Consensus trigger Execution's database snapshot is essentially the reverse of what we currently have, just moving code to the Consensus side, code that will need to be removed after the split anyway.

I don't fully understand that 😕
Why will code related to live snapshots be removed after the split?
If I inferred correctly, you are saying that only after the split will we have an API in Consensus that triggers a snapshot of Consensus's DB, so a human node operator will be able to first trigger Execution's DB snapshot and after that trigger Consensus' DB snapshot, right?
I am more inclined towards a UX in which the human node operator triggers a snapshot only once, and the software is responsible for coordinating the snapshots of the Execution and Consensus sides in the correct order.
It is simpler for human operators and less error prone too 🙂
Also, we should move forward in a way that doesn't require redoing a big chunk of this feature to make it ready for the Consensus/Execution split, which will happen soon 🙂; we want this feature to be fully compliant with the Consensus and Execution split from the start.
Also, we intend to have other Execution clients, such as one based on reth, but we do not intend to have multiple Consensus clients, at least in the near future.
If we keep live snapshot coordination on the Execution side, we unnecessarily increase the effort to develop new Execution clients, which would need to implement that coordination behavior.
Also, to snapshot Consensus' DB we will need to hold insertionMutex, right?
After discussing with @diegoximenes regarding implementation specifics to stay in sync with the execution split - I think this PR needs to wait until the communication between consensus and execution is more concrete, but I can make it more in line with the consensus-execution logic now as well if this PR is a priority. For context: we first need to snapshot chainDB (execution) before arbDB (consensus), but we want consensus to trigger this, which means consensus should invoke a call to execution to start the snapshot process or to mark it as scheduled. Obviously we cannot keep that call open and wait until the snapshot is generated to return a result to consensus, so we would need another function that consensus can ping to get the status of snapshot creation. What do you think? @joshuacolvin0 @diegoximenes
It is OK to wait until this PR #3111 is merged before you continue working on live snapshots 🙂 Considering that snapshotting can take a while, you can implement something like this:
Also, we could lock insertionMutex only after Execution's snapshot completes, to allow Consensus's DB to move forward while Execution's snapshotting is happening. Does that make sense to you? How long does it take to generate a snapshot for Arb1?
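For illustration only, a sketch of the schedule-then-poll flow described above; every name here (the snapshotCoordinator type, ScheduleDBSnapshot, DBSnapshotStatus, the status values) is hypothetical:

package arbnode // placement is hypothetical

import (
    "context"
    "sync"
    "time"
)

type execSnapshotStatus int

const (
    execSnapshotPending execSnapshotStatus = iota
    execSnapshotDone
)

// executionSnapshotAPI is a hypothetical view of what Consensus needs from Execution.
type executionSnapshotAPI interface {
    ScheduleDBSnapshot(ctx context.Context, dir string) error         // returns immediately
    DBSnapshotStatus(ctx context.Context) (execSnapshotStatus, error) // polled for completion
}

type snapshotCoordinator struct {
    exec           executionSnapshotAPI
    insertionMutex *sync.Mutex            // the mutex guarding message insertion on the consensus side
    snapshotArbDB  func(dir string) error // hypothetical arbitrumdata snapshot
}

func (c *snapshotCoordinator) createLiveSnapshot(ctx context.Context, dir string) error {
    // 1. Ask Execution to schedule a snapshot of its databases.
    if err := c.exec.ScheduleDBSnapshot(ctx, dir); err != nil {
        return err
    }
    // 2. Poll until Execution finishes, instead of holding one call open for hours.
    for {
        status, err := c.exec.DBSnapshotStatus(ctx)
        if err != nil {
            return err
        }
        if status == execSnapshotDone {
            break
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(10 * time.Second):
        }
    }
    // 3. Only now take insertionMutex, so consensus keeps progressing while
    //    Execution snapshots; then snapshot arbDB.
    c.insertionMutex.Lock()
    defer c.insertionMutex.Unlock()
    return c.snapshotArbDB(dir)
}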
Yeah, I agree with this.
Not sure; it took 5 hours for Sepolia (as tested by SRE), so Arb1 would take way longer than that. Marking the PR as draft.
This PR enables creation of live snapshots of the databases (arbitrumdata, l2chaindata, ancient and wasm) without having to shut down the node. The feature is disabled by default and can be enabled using --execution.live-db-snapshotter.enable for the execution databases (l2chaindata, ancient and wasm) and --node.live-db-snapshotter.enable for the consensus database (arbitrumdata).
Note: This feature is only available for archive nodes running on pebble databases
Sample usage-
Testing
Triggered snapshot creation and successfully reused it to run a new node in multiple scenarios: small and large db sizes for arb1 and arb-sepolia nodes.
- ArbSepolia: a new node ran from the snapshot taken at block=34216 without any issues.
- Arb1: new nodes ran from the snapshots taken at block=3,625,282 and block=3,609,438 without any issues.
Pulls in geth PR OffchainLabs/go-ethereum#380
Resolves NIT-2658