Fix GrpcWorkerChannel.StartWorkerProcessAsync timeout #10937
await _rpcWorkerProcess.StartProcessAsync();
_state = _state | RpcWorkerChannelState.Initializing;
await _workerInitTask.Task;
await _rpcWorkerProcess.StartProcessAsync(cancellationToken);
This is the important change: we now wait until either the worker is fully initialized (gRPC connection established) or the worker exits (in which case we re-throw any failure the worker experienced).
looks good -- would just like a test added
/azp run host.integration-tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run host.public
Azure Pipelines successfully started running 1 pipeline(s).
@jviau @mattchenderson I was just investigating this yesterday! Turns out this is the same reason that we are stuck for 60s when trying to exit the CLI during startup (Azure/azure-functions-core-tools#4355). Glad to see there is already a fix before I opened one :) I'm going to test your changes out with core tools and see if it addresses the issue like I expect. Looks like it works?
Before this change, it would hang for 60 seconds waiting on the timeout from PendingItem.
For my version of the fix, I was considering passing a CT to PendingItem and registering an
Edit: more logs. This is outside of my debug session. It takes 5 seconds to shut down after we cancel:
For core tools at least, I would love to reduce this time, but I can probably just inject options in the webhost to reduce the flush logs timer to 1 second or something like that. Anywho, not related to this PR.
Issue describing the changes in this PR
resolves #issue_for_this_pr
Pull request checklist
IMPORTANT: Currently, changes must be backported to the in-proc branch to be included in Core Tools and non-Flex deployments.
in-proc branch is not required
release_notes.md -- TODO
Additional information
This PR improves the ScriptHost startup experience with a bad worker. Today, if a worker crashes or exits immediately after startup, then GrpcWorkerChannel.StartWorkerProcessAsync will block on _workerInitTask.Task until it eventually times out. This tends to fault the entire host (at least during debugging).
To address this, a WorkerProcess.WaitForExitAsync is added and GrpcWorkerChannel.StartWorkerProcessAsync will also wait on that, improving the responsiveness to a worker exiting before connecting gRPC events.
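The wait-on-either behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class WorkerStartupSketch, the method WaitForInitOrExitAsync, and its parameters are hypothetical names, and it assumes .NET 6 or later for Task.WaitAsync.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WorkerStartupSketch
{
    // Hypothetical helper (not part of the host codebase): returns once the
    // worker has initialized, or throws if the worker process exits first.
    public static async Task WaitForInitOrExitAsync(
        Task workerInitTask,
        Task processExitTask,
        CancellationToken cancellationToken)
    {
        // Task.WhenAny yields whichever task finishes first, whether it
        // succeeded or faulted. WaitAsync lets the caller cancel the wait.
        Task completed = await Task.WhenAny(workerInitTask, processExitTask)
            .WaitAsync(cancellationToken);

        if (completed == processExitTask)
        {
            // If the exit task faulted, awaiting it re-throws the worker's failure.
            await processExitTask;

            // A clean exit before initialization is still a startup failure.
            throw new InvalidOperationException(
                "Worker process exited before initialization completed.");
        }

        // Propagate any initialization failure (or complete normally).
        await workerInitTask;
    }
}
```

The point of the pattern is that a worker that dies early surfaces its error immediately, instead of leaving the caller blocked until the initialization timeout fires.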