Background
Blueprints propagate some information to the database using rendezvous tables, as described in https://rfd.shared.oxide.computer/rfd/541.
This is implemented in nexus/src/app/background/tasks/blueprint_rendezvous.rs as a background task, which invokes reconcile_blueprint_rendezvous_tables.
In short, this operation is:
- Read latest blueprint
- Read latest inventory
- "Do Reconciliation", which means many calls to `INSERT INTO my_rendezvous_table`, ignoring any conflict errors (either implicitly or explicitly).
As implemented today (1/5/2026) we are not performing "hard deletion" of any of these rendezvous table rows.
The Issue
If we ever want to perform hard deletion of any rendezvous table rows - suppose they're extremely old, no longer relevant, taking up space, etc. - it's hard to guarantee that the background task won't re-insert old data.
For example, suppose we have the following sequence of events:
- Nexus A reads blueprint @ generation 1, gets ready to do reconciliation (which will insert into a rendezvous table with id = `foo-bar-baz`)
- Nexus A hangs unexpectedly. We'll treat this as a long sleep
- Nexus B creates, reads, and executes many more blueprints. The database row with id = `foo-bar-baz` is created, and later we want to hard delete it.
- At some point in the future, Nexus A will wake back up and resume reconciliation. It'll try to insert the entry with id = `foo-bar-baz`. If this row is hard-deleted, the later `INSERT` operation would succeed (which would be a bug - this would be a "zombie record" coming back to life unexpectedly after deletion).
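To make the race concrete, here is a minimal in-memory sketch of the sequence above. It is not the Nexus code: `RendezvousTable`, `insert_ignoring_conflict`, `hard_delete`, and `replay_zombie_scenario` are hypothetical names, and a `BTreeSet` of ids stands in for the real database table.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a rendezvous table: just the set of row ids present.
#[derive(Default)]
struct RendezvousTable {
    rows: BTreeSet<String>,
}

impl RendezvousTable {
    /// Models an INSERT whose conflict errors are ignored.
    /// Returns true only if a new row was actually created.
    fn insert_ignoring_conflict(&mut self, id: &str) -> bool {
        self.rows.insert(id.to_string())
    }

    /// Models a hard delete: the row is removed and no tombstone is left.
    fn hard_delete(&mut self, id: &str) -> bool {
        self.rows.remove(id)
    }
}

/// Replays the sequence of events; returns true if the deleted row comes back.
fn replay_zombie_scenario() -> bool {
    let mut table = RendezvousTable::default();

    // Nexus A read blueprint generation 1 and planned this insert, then hung.
    let stale_planned_insert = "foo-bar-baz";

    // Nexus B executes newer blueprints: the row is created, then hard-deleted.
    table.insert_ignoring_conflict("foo-bar-baz");
    table.hard_delete("foo-bar-baz");

    // Nexus A wakes up and resumes. Nothing in the table records the deletion,
    // so the stale insert sees no conflict and succeeds.
    let revived = table.insert_ignoring_conflict(stale_planned_insert);
    revived && table.rows.contains("foo-bar-baz")
}
```

Calling `replay_zombie_scenario()` returns `true`: with no tombstone and no generation check, the table cannot tell a stale insert from a fresh one.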
Resolution
There are a few ways we could mitigate this:
- Guard the INSERT. Using a generation number stored somewhere, convert the `INSERT INTO` operations into a transaction or CTE, which is effectively `INSERT INTO ... + ONLY DO IT WHILE THE GENERATION NUMBER OF RENDEZVOUS IS LATEST`. This should cause all old rendezvous operations to be "locked out".
- Track ongoing rendezvous operations, guard the DELETE. Basically: if we know what rendezvous operations are going on, we could prevent the "hard deletion" from happening until we know the row won't be revived.
- Rely on timeouts. Use timeouts to justify the assumption that "no really old Nexuses exist". E.g., rendezvous operations have a timeout of 30 sec, but we only perform hard deletions after 24 hours. (Note: this has huge issues for e.g. interactions with mupdate, time sync, so we'd prefer to avoid it, but listing it for completeness.)
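As a rough illustration of the first option, the sketch below models a generation-guarded insert in memory. Everything here is an assumption for illustration: `Db`, `current_generation`, `guarded_insert`, and `InsertOutcome` are hypothetical names, and in the real database the check and the insert would have to happen atomically (a transaction or CTE), which this single-threaded toy gets for free.

```rust
use std::collections::BTreeSet;

/// Toy database: a generation counter plus the ids in one rendezvous table.
struct Db {
    /// Hypothetical: the latest generation allowed to do rendezvous inserts.
    current_generation: u64,
    rows: BTreeSet<String>,
}

#[derive(Debug, PartialEq)]
enum InsertOutcome {
    Inserted,
    AlreadyPresent,
    StaleGeneration,
}

impl Db {
    /// Insert only if the caller's generation is still the latest.
    /// The check and the insert are one step here; a real implementation
    /// would need them in the same transaction or CTE.
    fn guarded_insert(&mut self, id: &str, caller_generation: u64) -> InsertOutcome {
        if caller_generation < self.current_generation {
            return InsertOutcome::StaleGeneration;
        }
        if self.rows.insert(id.to_string()) {
            InsertOutcome::Inserted
        } else {
            InsertOutcome::AlreadyPresent
        }
    }
}

/// Replays the zombie scenario with the guard in place.
/// Returns the outcome of Nexus A's late insert and whether the row exists.
fn replay_with_guard() -> (InsertOutcome, bool) {
    let mut db = Db { current_generation: 1, rows: BTreeSet::new() };

    // Nexus A read generation 1 and then hung before inserting.
    let nexus_a_generation = 1;

    // Nexus B inserts the row, the generation moves on, the row is hard-deleted.
    db.guarded_insert("foo-bar-baz", 1);
    db.current_generation = 2;
    db.rows.remove("foo-bar-baz");

    // Nexus A wakes up; its insert is rejected because generation 1 is stale.
    let outcome = db.guarded_insert("foo-bar-baz", nexus_a_generation);
    (outcome, db.rows.contains("foo-bar-baz"))
}
```

In this toy, `replay_with_guard()` yields `(InsertOutcome::StaleGeneration, false)`: the late insert is locked out and the deleted row stays deleted. Note that this only holds if hard deletion happens after the generation has advanced past every reconciler that could still insert the row.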
(My personal preference is for option 1, guarding the INSERT: it does have slightly more boilerplate, but it seems the most "clearly correct", and it strictly limits how long old rendezvous operations can keep running.)
With the ongoing work in #9552, this is relevant to fault management reconciliation as well. Frankly, it might be more relevant there, because alerts are probably going to churn more frequently than our blueprint rendezvous tables (e.g. dataset configurations).