fix(eth): eth client timeout while watching #553
Draft
+95
−31
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Still failing in watch_eth.go, but I have a little bit more information with additional logging turned on.
Last output for watch_eth.go working on stuff is:
After that we just have this:
Which looks to me like a lock-up here since either return path should log something:
curio/tasks/message/watch_eth.go
Line 114 in 289b2db
At the same time, Lotus shows us:
That path suggests that we hit the moment in the Lotus ChainIndexer where it had a new chain head but it wasn't quite indexed, so it had to wait a little while to see it be indexed, which should be pretty quick but there's a 5 second timeout on the wait. That particular timeout isn't what's returned here though, it's a timeout from somewhere up the stack, I believe go-jsonrpc, maybe a deadline in the websocket handler, although I can't figure out exactly where. I think it might be the generic timeout that go-jsonrpc sets up for listening for responses which is 30 seconds, but that seems far too long to be relevant here. The timing says Curio asked Lotus at
13:11:00.928
for aneth_getTransactionReceipt
and Lotus experienced a timeout error at13:11:00.929
!The fact that we don't progress further than this might suggest that go-ethereum/ethclient doesn't know how to deal with the error that comes out of Lotus for this event, or maybe one of the websocket handlers (on either side) doesn't know. So my strategy here is to put a 1s timeout per transaction in the processing loop and move on if we don't get a response after that. Hopefully whatever part of the stack is pausing will respond to a context kick.