
Pull Request

Description

Close the inference pool after the request servers are stopped.

We have seen a couple of 500 errors from MLServer after a shutdown has started. The stack trace is as follows:

```
INFO: x.x.x.x:39300 - "GET /v2/health/ready HTTP/1.1" 200 OK
2025-11-25 19:03:53,292 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
INFO: Shutting down
INFO: 10.130.91.121:47068 - "POST /invocations HTTP/1.1" 200 OK
2025-11-25 19:03:54,318 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2025-11-25 19:03:54,318 [mlserver.grpc] INFO - Waiting for gRPC server shutdown
INFO: x.x.x.x:39310 - "GET /v2/health/ready HTTP/1.1" 200 OK
2025-11-25 19:03:54,319 [mlserver.grpc] INFO - gRPC server shutdown complete
INFO: x.x.x.x:39308 - "POST /invocations HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/app/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/app/.venv/lib/python3.11/site-packages/starlette_exporter/middleware.py", line 499, in __call__
    raise exception
  File "/app/.venv/lib/python3.11/site-packages/starlette_exporter/middleware.py", line 405, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/app/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/app/.venv/lib/python3.11/site-packages/starlette/routing.py", line 714, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/routing.py", line 734, in app
    await route.handle(scope, receive, send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/app/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/app/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/app/.venv/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/mlserver/rest/app.py", line 47, in custom_route_handler
    return await original_route_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/mlserver/parallel/model.py", line 37, in _inner
    return await self._send(method.__name__, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/mlserver/parallel/model.py", line 71, in _send
    response_message = await self._dispatcher.dispatch_request(req_message)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/mlserver/parallel/dispatcher.py", line 212, in dispatch_request
    worker, wpid = self._get_worker()
                   ^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/mlserver/parallel/dispatcher.py", line 223, in _get_worker
    return self._workers[worker_pid], worker_pid
           ~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 17
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1]
2025-11-25 19:03:54,413 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2025-11-25 19:03:54,413 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2025-11-25 19:03:54,413 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2025-11-25 19:03:54,413 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2025-11-25 19:03:54,413 [mlserver.grpc] INFO - Waiting for gRPC server shutdown
2025-11-25 19:03:54,413 [mlserver.grpc] INFO - gRPC server shutdown complete
2025-11-25 19:03:54,413 [mlserver.grpc] INFO - Waiting for gRPC server shutdown
2025-11-25 19:03:54,413 [mlserver.grpc] INFO - gRPC server shutdown complete
INFO: Waiting for connections to close. (CTRL+C to force quit)
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1]
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2025-11-25 19:03:54,714 [mlserver.parallel] INFO - Shutdown of default inference pool complete
```

Based on the above, the dispatcher attempts to route a request to worker 17, but that worker no longer exists: the inference pool was shut down while the REST server was still accepting requests. We should stop the servers for all request protocols (and drain any in-flight requests) before closing the inference workers. A minimal sketch of the failure mode follows.
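To make the race concrete, here is a minimal, self-contained sketch. The names mirror the traceback (`_workers`, `_get_worker`), but this is illustrative code only, not MLServer's actual dispatcher:

```python
import itertools


class Dispatcher:
    """Round-robins requests across worker PIDs (illustrative only)."""

    def __init__(self, workers: dict[int, str]):
        self._workers = workers  # worker PID -> worker handle
        self._worker_pids = itertools.cycle(list(workers))

    def _get_worker(self) -> tuple[str, int]:
        worker_pid = next(self._worker_pids)
        # Raises KeyError when the pool has already removed this worker.
        return self._workers[worker_pid], worker_pid


dispatcher = Dispatcher({17: "worker-17"})

# Shutdown closes the inference pool first, removing every worker...
dispatcher._workers.clear()

# ...while the REST server is still serving /invocations, so the next
# dispatch targets a PID that is gone:
try:
    dispatcher._get_worker()
except KeyError as exc:
    print(f"KeyError: {exc}")  # -> KeyError: 17, as in the traceback
```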

Changes Made

Moves the inference pool close call so that it runs after the Kafka, gRPC, and REST servers have stopped; a sketch of the reordered shutdown is shown below.
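Roughly, the new ordering looks like the sketch below. The `stop_server` and `close_inference_pool` coroutines are hypothetical stand-ins, not MLServer's real method names:

```python
import asyncio


# Hypothetical stand-ins for the real server and pool objects.
async def stop_server(name: str) -> None:
    print(f"{name} server stopped, in-flight requests drained")


async def close_inference_pool() -> None:
    print("inference pool closed")


async def shutdown() -> None:
    # Before this change, the pool was closed first, so a request accepted
    # during the drain window could be dispatched to a dead worker (the
    # KeyError above). Now every request server stops and drains first.
    for server in ("kafka", "grpc", "rest"):
        await stop_server(server)
    await close_inference_pool()


asyncio.run(shutdown())
```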

Related Issues

N/A

Screenshots (if applicable)

N/A

Checklist

  • Code follows the project's style guidelines
  • All tests related to the changes pass successfully
  • Documentation is updated (if necessary)
  • Code is reviewed by at least one other team member
  • Any breaking changes are communicated and documented

Additional Notes
