Db-sync Hang Causing Midnight Node Outages #1949

Open
@Fentonhaslam

We have observed multiple incidents in which db-sync hangs and the Midnight node then fails to import blocks, causing an outage. The issue is intermittent. Pod restarts have resolved it temporarily, but we need a more robust solution that removes the need for manual intervention.

Problem:
When db-sync hangs, the node fails to import blocks and reports the following error:

sync: :broken_heart: Verification failed for block 0xd0c306e29f09841635ae13ade6f9dce33e6ed2b3b565eac826154e23a89d475a received from (12D3KooWQF1x9ffPo73DRK8XKPw1Ev9BnJhNQc6QBke1tLnssumX): "Main chain state d23e68ee90dcc4677b2f67152daf8e08ebb3cf9507b9a587120882c280ed0c05 referenced in imported block at slot 290165258 with timestamp 1740991548000 not found"
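To correlate this error with db-sync lag, the slot and timestamp can be pulled out of the message. The sketch below is only an illustration: the regular expression is derived from the single log line above, the helper name `parse_missing_state` is hypothetical, and it assumes the timestamp is Unix time in milliseconds.

```python
import re
from datetime import datetime, timezone

# Pattern derived from the one error line quoted in this issue;
# other node versions may word the message differently.
MISSING_STATE = re.compile(
    r"Main chain state (?P<state>[0-9a-f]{64}) referenced in imported block "
    r"at slot (?P<slot>\d+) with timestamp (?P<timestamp_ms>\d+) not found"
)


def parse_missing_state(line):
    """Return the state hash, slot, and block time from the error, or None."""
    match = MISSING_STATE.search(line)
    if match is None:
        return None
    # Assumption: the timestamp is Unix time in milliseconds.
    block_time = datetime.fromtimestamp(
        int(match["timestamp_ms"]) / 1000, tz=timezone.utc
    )
    return {
        "state": match["state"],
        "slot": int(match["slot"]),
        "block_time": block_time,
    }
```

Comparing the resulting `block_time` with the time of db-sync's latest block shows how far behind db-sync was when the node rejected the block.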

db-sync was 3 hours behind the Cardano tip when the issue was discovered. The Cardano node was synced and importing blocks, which indicates that the issue is isolated to db-sync.
One db-sync pod entered this stuck state, requiring a manual restart to recover.
Another db-sync pod self-recovered without intervention, though logs suggest it may have undergone an automatic pod refresh.
We need a root cause analysis to determine why db-sync enters this state and a fix that eliminates the need for manual restarts.

Logs & Observations:
The last log message from db-sync showed a successful tip import before pausing indefinitely.
If db-sync stops receiving blocks, it does not necessarily throw an error, making detection and recovery more difficult.
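
Because a hang produces no error, one possible way to detect it is to watch how far the latest db-sync block trails the wall clock. The following is a minimal sketch, not a proposed fix: the function names and the 15-minute threshold are hypothetical, and it assumes the latest block time is obtained separately, for example with a query such as `SELECT max(time) FROM block` against the db-sync database.

```python
from datetime import datetime, timedelta, timezone


def sync_lag(latest_block_time, now):
    """Return how far the latest db-sync block trails the current time."""
    return now - latest_block_time


def is_stalled(latest_block_time, now, max_lag=timedelta(minutes=15)):
    """Report a stall when the lag exceeds max_lag.

    max_lag is an illustrative threshold, not a recommended value.
    Both datetimes must be timezone-aware (or both naive) to be comparable.
    """
    return sync_lag(latest_block_time, now) > max_lag


if __name__ == "__main__":
    # Hypothetical values mirroring the 3-hour lag described above.
    latest = datetime(2025, 3, 3, 8, 45, 48, tzinfo=timezone.utc)
    now = latest + timedelta(hours=3)
    print(is_stalled(latest, now))
```

A check of this kind could back an alert or a pod liveness probe so that a stalled db-sync is restarted without manual intervention, although that would only mitigate the hang and would not explain its root cause.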
