fix: snowcap-api: Workaround lua-http#241#448
fix: snowcap-api: Workaround lua-http#241#448Ph4ntomas wants to merge 1 commit intopinnacle-comp:mainfrom
Conversation
problem: When a big payload is sent while another one is polling, cqueues.poll sometime does a spurious wakeup and returns nil. This makes stream:each_chunk() exit early, which in term closes streams. When a stream exits in this state, the server may still send some event, causing the config to crash. solution: We workaround this bug by implementing our own each_chunk function which will ignore timeout (in this context, a timeout should never happen, so it's as safe as can be). We don't need to implement that for regular call to stream:get_next_chunk as they are all done with a timeout, which would already loop correctly.
bfd3186 to
8cafee0
Compare
|
Some note about this bug:
pinnacle/snowcap/src/server.rs Line 76 in 822a537 EDIT: After more troubleshooting, the issue at play isn't in hyper, but in lua-http (daurnimator/lua-http#242). Unfortunately, increasing the connection window-size while it does work doesn't avoid the race-condition, and the bug will eventually happen. We could monkey-patch a fix while we wait for the fix to be available on lua-http. We could (and should) avoid sending big blobs in a loop. I think simply exchanging image blobs with an icon identifier first, then using that instead of the raw rgba handle could be enough to avoid the issue altogether. I wouldn't avoid some issue on absurdly big payload, but sending a few big payload once in a while should leave plenty connection flow credit, and let Hyper recover on its own.
This no-longer holds true. I've run into the same issue. Right now the best "fix" is to avoid big payload as that reduces the risk of triggering the bug. |
problem: When a big payload is sent while other stream are being polled,
cqueues.pollsometime does a spurious wake-up and returns nil. This makes stream:each_chunk() exit early, which in term closes streams. When a stream exits in this state, the server may still send some event, causing the config to crash.solution: We workaround this bug by implementing our own each_chunk function which will ignore timeout (in this context, a timeout should never happen, so it's as safe as can be). We don't need to implement that for regular call to stream:get_next_chunk as they are all done with a timeout, which would already loop correctly.