(feat) Ingestor shutdown via pod annotation #560

Open

vpatelsj wants to merge 7 commits into main from vapa/shutdownhook

+253 −0

Contributor

vpatelsj commented Feb 5, 2025

This adds a service shutdown endpoint to the ingestor service. It stops the service from accepting new requests by returning a 503 and waits until all writes have completed. The ingestor scaler can eventually use this to scale down service pods safely.


          Adding a service shutdown endpoint

b167318

jwilder requested changes


ingestor/service.go Outdated

               		}
               	}()
+              	if !s.isOpen {

Collaborator

jwilder Feb 5, 2025

I'd use DisableWrites(), which will shut down the HTTP server altogether and close the Store. That will prevent new writes from all paths.

isOpen actually creates a race condition because HandleTransfer reads it and Close writes it in separate goroutines. If you add a mutex to protect it, it creates lock contention and slows write throughput unless done carefully.

Contributor Author

vpatelsj Feb 6, 2025

In DisableWrites(), I only see Close called on the store and the metrics service. How does it shut down the HTTP server?

Collaborator

jwilder Feb 6, 2025

My mistake. It looks like the HTTP shutdown is done in main just before DisableWrites is called. We could either refactor the Service to include the HTTP server so that DisableWrites can shut it down, or keep the isOpen approach. I'd suggest keeping isOpen for now but switching it to an atomic instead of a member variable guarded by a mutex (even though it's still racy). We can then refactor the server/service separately and switch to that later.
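A minimal sketch of the atomic variant suggested above, assuming Go 1.19+ for `atomic.Bool`. The `Service` type here is a hypothetical stand-in that models only the open/closed flag, and `statusFor` is an illustrative helper, not code from this PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Service is a hypothetical stand-in for the ingestor service; only the
// open/closed flag is modeled here.
type Service struct {
	// open is read by request handlers and written by Close, so it is an
	// atomic.Bool rather than a plain bool guarded by a mutex.
	open atomic.Bool
}

// NewService returns a Service that accepts transfers.
func NewService() *Service {
	s := &Service{}
	s.open.Store(true)
	return s
}

// Close stops the service from accepting new transfers.
func (s *Service) Close() { s.open.Store(false) }

// HandleTransfer rejects requests with a 503 once the service is closed.
func (s *Service) HandleTransfer(w http.ResponseWriter, r *http.Request) {
	if !s.open.Load() {
		http.Error(w, "service is shutting down", http.StatusServiceUnavailable)
		return
	}
	// A request that passed the check above can still be in flight when
	// Close runs; this is the remaining raciness noted in the review.
	w.WriteHeader(http.StatusAccepted)
}

// statusFor sends one request to the handler and returns the status code.
func statusFor(s *Service) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/transfer", nil)
	s.HandleTransfer(rec, req)
	return rec.Code
}

func main() {
	s := NewService()
	fmt.Println("before Close:", statusFor(s))
	s.Close()
	fmt.Println("after Close:", statusFor(s))
}
```

The atomic load avoids lock contention on the write path, but it only removes the data race on the flag itself. It does not wait for requests that were already past the check.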

Contributor Author

vpatelsj Feb 7, 2025

Reverted to using annotation-based invocation on the ingestor pod. I hope this mitigates the denial-of-service attack concerns.


          address code comments

aeb7494

vpatelsj marked this pull request as ready for review

February 6, 2025 16:17


          Merge branch 'main' into vapa/shutdownhook

9ff653c

jwilder requested changes


cmd/ingestor/main.go Outdated

@@ -365,6 +365,7 @@ func realMain(ctx *cli.Context) error {
               	mux := http.NewServeMux()
               	mux.HandleFunc("/transfer", svc.HandleTransfer)
               	mux.HandleFunc("/readyz", svc.HandleReady)
+              	mux.HandleFunc("/shutdown", svc.HandleShutdown)

Collaborator

jwilder Feb 6, 2025

We can't put this on the exposed API. You could run a denial of service attack with it.

Contributor Author

vpatelsj Feb 7, 2025

No longer exposing an endpoint. Thanks


          Invoke ingestor shutdown via pod annotation

8a639fc

vpatelsj changed the title ~~Adding a service shutdown endpoint~~ (feat) Ingestor shutdown via pod annotation

vpatelsj added 2 commits

February 6, 2025 19:53


          Invoke ingestor shutdown via pod annotation

43dfc02


          Merge branch 'main' into vapa/shutdownhook

a62f977

vpatelsj requested a review from jwilder

February 7, 2025 00:54


          Invoke ingestor shutdown via pod annotation

fdc189e

jwilder requested changes


ingestor/runner/shutdown/shutdown.go

+              	//get ingestor pod in which this runner is running
+              	pod, err := r.k8sClient.CoreV1().Pods(namespace).Get(ctx, os.Getenv("HOSTNAME"), metav1.GetOptions{})
+              	if err != nil {
+              		logger.Errorf("failed to get pod annotations: %v", err)

Collaborator

jwilder Feb 7, 2025

Just return fmt.Errorf("...: %w", err) since the caller already logs the error.
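A short sketch of the pattern being asked for: wrap and return the error, and leave logging to the caller. `getPod`, `checkShutdown`, and `errNotFound` are hypothetical names used only for illustration; `getPod` stands in for the client-go `Get` call.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a Kubernetes API error.
var errNotFound = errors.New("pod not found")

// getPod is a hypothetical stand-in for the client-go Pods(...).Get call.
func getPod(name string) error {
	if name == "" {
		return errNotFound
	}
	return nil
}

// checkShutdown returns a wrapped error instead of logging it, so the
// error is reported once, by the caller.
func checkShutdown(podName string) error {
	if err := getPod(podName); err != nil {
		return fmt.Errorf("failed to get pod %q: %w", podName, err)
	}
	return nil
}

func main() {
	if err := checkShutdown(""); err != nil {
		// The caller is the single place that logs.
		fmt.Println("runner error:", err)
		// %w keeps the original error reachable through errors.Is.
		fmt.Println("is not-found:", errors.Is(err, errNotFound))
	}
}
```

Using `%w` rather than `%v` keeps the underlying error inspectable with `errors.Is` and `errors.As` further up the call chain.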

ingestor/runner/shutdown/shutdown.go

+              	if _, ok := pod.Annotations[SHUTDOWN_REQUESTED]; ok {
+              		logger.Infof("shutting down the service")
+              		if err := r.httpServer.Close(); err != nil {
+              			logger.Errorf("failed to close http server: %v", err)

Collaborator

jwilder Feb 7, 2025

Return wrapped error

ingestor/runner/shutdown/shutdown.go

+              		}
+              		if err := r.service.Shutdown(); err != nil {
+              			logger.Errorf("failed to shutdown the service: %v", err)

Collaborator

jwilder Feb 7, 2025

Return wrapped error

ingestor/runner/shutdown/shutdown.go

+              			logger.Errorf("failed to shutdown the service: %v", err)
+              			return err
+              		}
+              		logger.Infof("service shutdown completed")

Collaborator

jwilder Feb 7, 2025

Capitalize Service for log messages to keep consistent with rest of the code base.

ingestor/runner/shutdown/shutdown.go

+              		//set shutdown-completed annotation
+              		pod.Annotations[SHUTDOWN_COMPLETED] = "true"
+              		if _, err := r.k8sClient.CoreV1().Pods(namespace).Update(ctx, pod, metav1.UpdateOptions{}); err != nil {
+              			logger.Errorf("failed to set shutdown-completed annotation: %v", err)

Collaborator

jwilder Feb 7, 2025

Return the wrapped error

ingestor/runner/shutdown/shutdown.go

+              	//check if shutdown-completed annotation is set
+              	if _, ok := pod.Annotations[SHUTDOWN_COMPLETED]; ok {
+              		logger.Infof("shutdown already completed on the pod, skipping shutting down")

Collaborator

jwilder Feb 7, 2025

Capital Shutdown

ingestor/runner/shutdown/shutdown.go

+              	//shutdown the service
+              	if _, ok := pod.Annotations[SHUTDOWN_REQUESTED]; ok {
+              		logger.Infof("shutting down the service")

Collaborator

jwilder Feb 7, 2025

Capitalize first word

ingestor/runner/shutdown/shutdown.go

+              		}
+              		logger.Infof("service shutdown completed")
+              		//set shutdown-completed annotation
+              		pod.Annotations[SHUTDOWN_COMPLETED] = "true"

Collaborator

jwilder Feb 7, 2025

I think this should only get set after all uploads have completed?

ingestor/runner/shutdown/shutdown.go

+              	//shutdown the service
+              	if _, ok := pod.Annotations[SHUTDOWN_REQUESTED]; ok {
+              		logger.Infof("shutting down the service")
+              		if err := r.httpServer.Close(); err != nil {

Collaborator

jwilder Feb 7, 2025

What happens if a shutdown is requested and the pod just exits unexpectedly? How can we make this SHUTDOWN_REQUESTED phase idempotent and resumable?
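One possible shape for an idempotent, resumable phase, sketched under assumptions: annotations are modeled as a plain map, the annotation keys are hypothetical (the values of SHUTDOWN_REQUESTED and SHUTDOWN_COMPLETED are not shown in this PR), and `drain` stands in for closing the HTTP server and finishing uploads. The idea is that the completed marker is written only after the drain succeeds, so a pod that exits mid-shutdown still carries the requested annotation and retries on restart.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical annotation keys, for illustration only.
const (
	shutdownRequested = "example.com/shutdown-requested"
	shutdownCompleted = "example.com/shutdown-completed"
)

// errSimulated models a failure or crash partway through the drain.
var errSimulated = errors.New("simulated failure mid-drain")

// reconcile is safe to call on every runner tick and after a pod restart.
// drain must itself be idempotent: repeating it after a partial run must
// not fail. It returns true when this call marked shutdown as completed.
func reconcile(annotations map[string]string, drain func() error) (bool, error) {
	if _, done := annotations[shutdownCompleted]; done {
		return false, nil // already finished; nothing to redo
	}
	if _, requested := annotations[shutdownRequested]; !requested {
		return false, nil // no shutdown requested
	}
	if err := drain(); err != nil {
		// Leave the annotations untouched so the next call retries.
		return false, fmt.Errorf("drain before shutdown: %w", err)
	}
	annotations[shutdownCompleted] = "true"
	return true, nil
}

func main() {
	ann := map[string]string{shutdownRequested: "true"}
	attempts := 0
	drain := func() error {
		attempts++
		if attempts == 1 {
			return errSimulated
		}
		return nil
	}
	for i := 0; i < 3; i++ {
		marked, err := reconcile(ann, drain)
		fmt.Printf("attempt %d: marked=%v err=%v\n", i+1, marked, err)
	}
}
```

In a real runner the final write would be a pod update through the Kubernetes API, which can itself fail or conflict. Treating that update as part of the retried step keeps the phase resumable.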

ingestor/service.go

+              		return err
+              	}
+              	if err := s.Close(); err != nil {

Collaborator

jwilder Feb 7, 2025

I don't think you want to Close here, because then the pod will just restart and be able to take new writes. I think it would be better to have the instance stay in this disabled state and signal back that shutdown is complete here. The operator can then see that the pod has signaled that shutdown is completed and can delete the pod/scale down the statefulset.

That reminds me that when we scale down, we have to do it in reverse order because of the way statefulsets work.
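To illustrate the ordering constraint: a StatefulSet names its pods `<name>-<ordinal>` and removes the highest ordinal first when scaling down, so a scaler would need to drain that pod first. The sketch below picks the next pod to drain from a list of names. `ordinal` and `nextToDrain` are hypothetical helpers and the pod names are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinal extracts the trailing index from a StatefulSet pod name such as
// "ingestor-2".
func ordinal(podName string) (int, error) {
	i := strings.LastIndex(podName, "-")
	if i < 0 {
		return 0, fmt.Errorf("pod name %q has no ordinal suffix", podName)
	}
	n, err := strconv.Atoi(podName[i+1:])
	if err != nil {
		return 0, fmt.Errorf("pod name %q: %w", podName, err)
	}
	return n, nil
}

// nextToDrain returns the pod with the highest ordinal, which is the one a
// StatefulSet removes first when its replica count is reduced.
func nextToDrain(pods []string) (string, error) {
	best, bestOrd := "", -1
	for _, p := range pods {
		n, err := ordinal(p)
		if err != nil {
			return "", err
		}
		if n > bestOrd {
			best, bestOrd = p, n
		}
	}
	if best == "" {
		return "", fmt.Errorf("no pods to drain")
	}
	return best, nil
}

func main() {
	pod, err := nextToDrain([]string{"ingestor-0", "ingestor-10", "ingestor-2"})
	fmt.Println(pod, err)
}
```

The ordinal is compared numerically, so `ingestor-10` sorts after `ingestor-2`; a plain string sort would get this wrong.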
