Graceful Shutdown of SpringBoot Application with Istio in Kubernetes

Anup Dubey · FAUN — Developer Community 🐾 · Jul 19, 2023

Introduction:

In the world of microservices, graceful shutdown is a critical aspect of maintaining high availability and preventing disruption to the user experience during application updates or maintenance activities. Kubernetes, with its powerful orchestration capabilities, combined with Istio, a feature-rich service mesh, provides an excellent platform for achieving graceful shutdowns. In this blog, we will explore the importance of graceful shutdowns, discuss how Istio can help with this process, and provide practical steps for implementing graceful shutdowns in Kubernetes with Istio.

Why Graceful Shutdowns Matter:
Graceful shutdowns allow applications to complete ongoing requests and perform necessary cleanup tasks before being taken offline. This process helps prevent data loss, maintain user sessions, and ensure a smooth transition during updates or maintenance activities. Without graceful shutdowns, abrupt termination of services can lead to lost data, a disrupted user experience, and potential inconsistencies across the system.

To achieve a graceful shutdown in a Kubernetes cluster with Istio, you can follow these steps:

Prepare your application:
Make sure your application is designed to handle termination signals gracefully. Let’s take a Spring Boot application as an example. Since version 2.3, Spring Boot supports graceful shutdown for all four embedded web servers (Tomcat, Jetty, Undertow, and Netty) on both the servlet and reactive platforms.
To enable it, all we have to do is set the server.shutdown property to graceful in the application.properties file:

server.shutdown=graceful

For more details, use this link.
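
The same configuration can also be expressed in application.yml, and Spring Boot additionally exposes spring.lifecycle.timeout-per-shutdown-phase to cap how long in-flight requests may take to finish (it defaults to 30 seconds). A minimal sketch, assuming you want the drain window to match the 60-second budget used later in this post:

# application.yml - enable graceful shutdown and cap the drain window
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 60s   # illustrative; align with terminationGracePeriodSeconds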

Enable graceful shutdown in Istio:
After enabling graceful shutdown in our application, we expected existing connections to be allowed to complete when a pod was shut down. However, Istio prematurely terminated those connections, and clients received the error “UNAVAILABLE: upstream connect error or disconnect/reset before headers”, leading to failed responses.

Upon investigation, we discovered that the Istio sidecar container was shutting down much earlier than our application container. Because our application needed some time to complete its shutdown, we had set terminationGracePeriodSeconds to 60 seconds in the Kubernetes Deployment. It turned out that the Istio sidecar has its own, independent graceful shutdown setting, which was not aligned with the application container.

By default, Istio drains connections for only 5 seconds when no explicit setting is defined. Consequently, while our application was still shutting down, external connectivity was cut off after 5 seconds, resulting in errors. To resolve this, we adjusted the Istio sidecar’s drain period to match the terminationGracePeriodSeconds value of the application pod. To set the Istio container’s graceful shutdown period to, for example, 60 seconds, the following steps were taken:

The drain period is controlled by the terminationDrainDuration option in ProxyConfig, which defines variables for individual istio-proxy instances. It can be set as a mesh-wide default in the defaultConfig section of meshConfig:

meshConfig:
  defaultConfig:
    terminationDrainDuration: 60s
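
If the mesh is installed through the IstioOperator API, the same mesh-wide default could be carried in the operator spec. A minimal sketch, assuming an operator-based install (the resource name is illustrative):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-controlplane            # hypothetical name
spec:
  meshConfig:
    defaultConfig:
      terminationDrainDuration: 60s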

Alternatively, to configure the graceful shutdown period on a per-workload basis, add the proxy.istio.io/config annotation to the pod, specifying the desired termination drain duration. For example:

annotations:
  proxy.istio.io/config: |
    terminationDrainDuration: 60s
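
In a Deployment, this annotation belongs on the pod template so that every replica picks it up. A minimal sketch, where the workload name, labels, and image are hypothetical placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-spring-app                 # hypothetical workload name
spec:
  selector:
    matchLabels:
      app: my-spring-app
  template:
    metadata:
      labels:
        app: my-spring-app
      annotations:
        proxy.istio.io/config: |
          terminationDrainDuration: 60s
    spec:
      containers:
        - name: my-spring-app         # application container
          image: my-spring-app:latest # hypothetical image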

By applying either of these configuration changes, the issue was resolved, and the application and Istio containers were able to shut down gracefully within the defined duration of 60 seconds.
For more details, use this link.

Adding a preStop hook:
After implementing the changes mentioned above, the original issue was resolved. However, occasional errors still occurred during deployments or when multiple pods went down at once, and these errors seemed random in nature.

Because deleting a pod and removing its reference from the Endpoints object in Kubernetes happen concurrently, a pod may be terminated before its reference is fully removed. Since updating network rules and deleting pods occur simultaneously, there is no guarantee that the network rules will be updated before the pod is gone. This lack of synchronization can result in failed requests on the client side.

To address this problem, we introduced a preStop hook with a 15-second sleep, which runs before the container receives the termination signal. This allowed sufficient time for the endpoint and network rules to be updated before the pod actually began shutting down, thus avoiding failures in request processing.

spec:
  containers:
    - lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 15"]

With the addition of the preStop hook and the implemented delay, we successfully resolved the issue encountered during deployment and instances of multiple pods going down.
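
For completeness, here is a minimal sketch of how the preStop sleep and the grace period could sit together in the pod spec; the container name and image are hypothetical, and terminationGracePeriodSeconds must be long enough to cover the sleep plus the application's drain time:

spec:
  terminationGracePeriodSeconds: 60         # must cover the 15s sleep plus drain time
  containers:
    - name: my-spring-app                   # hypothetical container name
      image: my-spring-app:latest           # hypothetical image
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 15"]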

Conclusion:
Graceful shutdowns are essential for maintaining high availability and providing a seamless user experience during application updates or maintenance activities. With Kubernetes and Istio, you can implement graceful shutdowns by aligning the application’s shutdown behavior, the pod’s termination grace period, and the sidecar’s drain duration. By following the steps outlined in this blog, you can ensure that your applications gracefully handle termination signals, complete ongoing requests, and perform necessary cleanup tasks, leading to a smooth transition during maintenance activities. Embracing graceful shutdowns with Istio empowers you to build robust and resilient microservices architectures.

I hope you found this blog useful and informative. Your feedback, suggestions, and claps are greatly appreciated. If you have any further queries or comments, please feel free to leave them below.

Don't forget to check out my other posts for more insightful content!

  1. Kubernetes application logging using Fluentd
  2. Optimizing ElasticSearch and Reducing Costs: A Strategic Approach
