Db-sync Hang Causing Midnight Node Outages #1949

We have observed multiple incidents in which db-sync hangs and the Midnight node then fails to import blocks, causing an outage. The issue occurs intermittently and has so far been resolved temporarily by restarting pods, but a more robust solution is needed to remove the need for manual intervention.

Problem:
When db-sync hangs, the node fails to import blocks and logs the following error:

`sync: :broken_heart: Verification failed for block 0xd0c306e29f09841635ae13ade6f9dce33e6ed2b3b565eac826154e23a89d475a received from (12D3KooWQF1x9ffPo73DRK8XKPw1Ev9BnJhNQc6QBke1tLnssumX): "Main chain state d23e68ee90dcc4677b2f67152daf8e08ebb3cf9507b9a587120882c280ed0c05 referenced in imported block at slot 290165258 with timestamp 1740991548000 not found"`

![Image](https://github.com/user-attachments/assets/9c49bcfe-8c37-4fb2-9ad0-83879476093b)

db-sync was 3 hours behind the Cardano tip when the issue was discovered. The Cardano node itself was synced and importing blocks, which indicates that the issue is isolated to db-sync.
One db-sync pod entered this stuck state, requiring a manual restart to recover.
Another db-sync pod self-recovered without intervention, though logs suggest it may have undergone an automatic pod refresh.
We need a root cause analysis to determine why db-sync enters this state and a fix that eliminates the need for manual restarts.

Logs & Observations:
The last log message from db-sync before it stalled reported a successful tip import; it then paused indefinitely.
db-sync does not necessarily raise an error when it stops receiving blocks, which makes detection and recovery more difficult.
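
Because the hang produces no error, one way to detect it is to watch whether the db-sync tip keeps advancing. The following is a minimal sketch of that idea, not an existing tool: the `StallDetector` class and its thresholds are hypothetical, and it assumes the caller periodically reads the latest block number and block time from the db-sync PostgreSQL database (for example with a query along the lines of `SELECT block_no, time FROM block ORDER BY id DESC LIMIT 1`). A stall is reported only when the tip is both far behind wall-clock time and has not moved for a while, so a pod that is lagging but still catching up is not restarted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StallDetector:
    """Flags db-sync as stalled when its tip is old AND has stopped advancing."""

    # Hypothetical thresholds; tune them for the deployment.
    max_lag: timedelta = timedelta(minutes=10)
    max_no_progress: timedelta = timedelta(minutes=5)
    _last_block_no: Optional[int] = field(default=None, init=False)
    _last_progress_at: Optional[datetime] = field(default=None, init=False)

    def observe(self, block_no: int, block_time: datetime, now: datetime) -> bool:
        """Record one sample of the db-sync tip and return True if it looks stalled."""
        # Remember the moment at which the tip last moved forward.
        if self._last_block_no is None or block_no > self._last_block_no:
            self._last_block_no = block_no
            self._last_progress_at = now

        lagging = (now - block_time) > self.max_lag
        frozen = (now - self._last_progress_at) > self.max_no_progress
        return lagging and frozen


if __name__ == "__main__":
    # Illustrative samples only; real values would come from the database.
    detector = StallDetector()
    start = datetime(2025, 3, 3, 8, 0, tzinfo=timezone.utc)
    print(detector.observe(100, start - timedelta(seconds=20), start))
    print(detector.observe(100, start - timedelta(seconds=20), start + timedelta(hours=3)))
```

A check like this could back a liveness probe or an alert so that a stuck pod is restarted automatically. That would only mitigate the symptom; the root cause analysis requested above is still needed.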
