Proper handling of malformed snapshots #1708

@jasagredo

When a node is presented with a malformed snapshot, there are two possible situations:

  1. The snapshot is corrupted
  2. The snapshot is in a different format

In case 2, we have to decide whether to delete the snapshot directly or to ignore it. For uniformity, the reasonable choice seems to be to delete it and continue with the next one (or start from Genesis), but that would be annoying if someone switches backends, forgets that the snapshot has to be converted to the new format, and the node simply deletes a big snapshot.

From the user's perspective it is probably better to emit a warning, something like "I cannot read this snapshot, but perhaps you have to convert it", and to ignore the snapshot rather than delete it.
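
To make the proposal concrete, here is a minimal sketch of the decision as a pure function. All names (`SnapshotFailure`, `SnapshotAction`, `decide`) are hypothetical and are not part of ouroboros-consensus. The sketch also assumes that a corrupted snapshot (case 1) keeps being deleted, which this issue implies but does not state outright.

```haskell
module Main (main) where

-- | Why a snapshot could not be read. Hypothetical type, for illustration only.
data SnapshotFailure
  = Corrupted        -- ^ case 1: the contents cannot be decoded
  | DifferentFormat  -- ^ case 2: written for another backend or format
  deriving (Eq, Show)

-- | What to do with the unreadable snapshot. Hypothetical type.
data SnapshotAction
  = DeleteAndTryNext        -- ^ remove it, then try the next snapshot
  | WarnAndTryNext String   -- ^ keep it on disk, emit the warning, try the next one
  deriving (Eq, Show)

-- | Assumed policy: delete corrupted snapshots, but only warn about
-- snapshots that might still be convertible by the user.
decide :: FilePath -> SnapshotFailure -> SnapshotAction
decide _    Corrupted       = DeleteAndTryNext
decide path DifferentFormat =
  WarnAndTryNext $
    "I cannot read the snapshot at " ++ path
      ++ ", but perhaps you have to convert it. Ignoring it."

main :: IO ()
main = mapM_ (print . decide "db/ledger/1773") [Corrupted, DifferentFormat]
```

In both branches the node moves on to the next snapshot (or to Genesis); the only difference is whether the file survives so that the user can still convert it.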

In any case, the node should not crash, and @karknu found that an LMDB node starting with a legacy snapshot crashes with:

FsError {fsErrorType = FsResourceInappropriateType, fsErrorPath = /opt/cardano-node/data/db/ledger/1773/state, fsErrorString = "Not a directory", fsErrorNo = Just (Errno 20), fsErrorStack = CallStack (from HasCallStack):
  prettyCallStack, called at src/System/FS/API/Types.hs:393:23 in fs-api-0.3.0.0-b27d6f59fa31939aa5f5b1416d8875569b20290ab7de3324e006eb3042a59710:System.FS.API.Types
  ioToFsError, called at src/System/FS/IO.hs:113:41 in fs-api-0.3.0.0-b27d6f59fa31939aa5f5b1416d8875569b20290ab7de3324e006eb3042a59710:System.FS.IO
  handleError, called at src/System/FS/IO.hs:109:23 in fs-api-0.3.0.0-b27d6f59fa31939aa5f5b1416d8875569b20290ab7de3324e006eb3042a59710:System.FS.IO
  rethrowFsError, called at src/System/FS/IO.hs:50:21 in fs-api-0.3.0.0-b27d6f59fa31939aa5f5b1416d8875569b20290ab7de3324e006eb3042a59710:System.FS.IO
  hOpen, called at src/System/FS/API.hs:216:43 in fs-api-0.3.0.0-b27d6f59fa31939aa5f5b1416d8875569b20290ab7de3324e006eb3042a59710:System.FS.API
  withFile, called at src/ouroboros-consensus/Ouroboros/Consensus/Util/CBOR.hs:193:5 in ouroboros-consensus-0.27.0.0-611a7e2ecc968707b1c18e6effb4a7b0199543b019e9d7b5cb4529b702b8b0c3:Ouroboros.Consensus.Util.CBOR, fsLimitation = False}

In reality, this means that /opt/cardano-node/data/db/ledger/1773 is a file, not a directory. This case should be checked before attempting to open the state file (or the exception should be caught).
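
A minimal sketch of such a check, assuming the expected layout is a snapshot directory containing a `state` file, as the path in the error suggests. It uses `System.Directory` as a stand-in; the node itself goes through the fs-api abstraction, so a real fix would use that instead. `SnapshotCheck` and `classifySnapshot` are hypothetical names.

```haskell
module Main (main) where

import System.Directory (createDirectoryIfMissing, doesDirectoryExist,
                         doesFileExist, getTemporaryDirectory,
                         removePathForcibly)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- | Result of inspecting a snapshot path before opening anything.
-- Hypothetical type, not the consensus API.
data SnapshotCheck
  = SnapshotLooksValid    -- ^ a directory that contains a @state@ file
  | SnapshotIsPlainFile   -- ^ a single file, e.g. a legacy-format snapshot
  | SnapshotMissingState  -- ^ a directory without a @state@ file
  | SnapshotNotFound      -- ^ nothing exists at the path
  deriving (Eq, Show)

-- | Inspect the path without opening @path/state@, so a plain file is
-- reported as a value instead of surfacing as a "Not a directory" error.
classifySnapshot :: FilePath -> IO SnapshotCheck
classifySnapshot path = do
  isDir <- doesDirectoryExist path
  if isDir
    then do
      hasState <- doesFileExist (path </> "state")
      pure (if hasState then SnapshotLooksValid else SnapshotMissingState)
    else do
      isFile <- doesFileExist path
      pure (if isFile then SnapshotIsPlainFile else SnapshotNotFound)

-- | Classify every path given on the command line.
main :: IO ()
main = do
  paths <- getArgs
  mapM_ (\p -> classifySnapshot p >>= \c -> putStrLn (p ++ ": " ++ show c)) paths
```

A caller could map `SnapshotIsPlainFile` to a warning and skip the snapshot. Catching the exception around the open call is the alternative mentioned above, and it would also cover the case where the path changes between the check and the open.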
