Kubernetes Pod Failover


I am toying around with Kubernetes and have managed to deploy a stateful application (a Jenkins instance) to a single node. It uses a PVC to make sure that I can persist my Jenkins data (jobs, plugins, etc.).

Now I would like to experiment with failover.

My cluster has 2 DigitalOcean droplets.

Currently my Jenkins pod is running on just one node. When that node goes down, Jenkins becomes unavailable.

I am now looking into how to accomplish failover, in the sense that, when the Jenkins pod goes down on one node, it will spin up on the other node (so a short downtime during this process is ok).

Of course it has to use the same PVC, so that my data remains intact.

I believe, from what I have read, that a StatefulSet can be used for this?

Any pointers are much appreciated!

Best regards


Digital Ocean's Kubernetes service only supports ReadWriteOnce access modes for PVCs (see here). This means the volume can only be attached to one node at a time.

I came across this blog post which, while focused on Jenkins on Azure, describes the same situation of only supporting ReadWriteOnce. The author states:

the drawback for me though lies in the fact that the access mode for Azure Disk persistent volumes is ReadWriteOnce. This means that an Azure disk can be attached to only one cluster node at a time. In the event of a node failure or update, it could take anywhere between 1-5 minutes for the Azure disk to get detached and attached to the next available node.

Note that Pod failure and node failure are different things. Since DO only supports ReadWriteOnce, there's no benefit to trying anything more sophisticated than what you have right now in terms of tolerance to node failure. Because the volume is ReadWriteOnce, it will need to be unmounted from the failing node and re-mounted to the new node before a new Pod can start on the new node. Kubernetes will do this for you, and there's not much you can do to optimize it.
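For reference, the claim you already have probably looks something like this (names and sizes here are assumptions; `do-block-storage` is the default StorageClass on DigitalOcean's managed Kubernetes):

```yaml
# PVC for the Jenkins home directory on DigitalOcean block storage.
# ReadWriteOnce means the underlying volume can be attached to
# only one node at a time.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: do-block-storage   # default StorageClass on DOKS
  resources:
    requests:
      storage: 10Gi
```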

For Pod failure, you could use a Deployment rather than a StatefulSet: since you want all replicas to read and write the same data, you don't want different PVs attached to the different replicas. There may be very limited benefit to this, though, because all the replicas of the Pod will be running on the same node, so it depends on whether the Jenkins process can scale horizontally at all while writing to a shared volume (as opposed to simply scaling vertically by increasing memory or CPU requests).
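A sketch of that setup, with assumed names, would be a Deployment whose replicas all mount the same pre-existing PVC. Note that with ReadWriteOnce this only works while every replica is scheduled onto the node the volume is attached to:

```yaml
# Sketch: multiple replicas sharing one existing ReadWriteOnce PVC.
# All replicas must land on the node the volume is attached to.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
        - name: jenkins
          image: jenkins/jenkins:lts
          volumeMounts:
            - name: jenkins-home
              mountPath: /var/jenkins_home
      volumes:
        - name: jenkins-home
          persistentVolumeClaim:
            claimName: jenkins-home   # assumed existing PVC
```

Whether Jenkins itself tolerates two processes on the same JENKINS_HOME is a separate question, as noted above.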

If you really want to achieve higher availability in the face of node and/or Pod failures, and the Jenkins workload you're deploying has a hard requirement on local volumes for persistent state, you will need to consider an alternative volume plugin like NFS, or moving to a different cloud provider like GKE.
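As an illustration of the NFS route: an NFS-backed PersistentVolume supports ReadWriteMany, so it can be mounted from whichever node the replacement Pod lands on. The server address and export path below are placeholders:

```yaml
# Sketch: an NFS PersistentVolume with ReadWriteMany access,
# mountable from any node in the cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.10        # hypothetical NFS server address
    path: /exports/jenkins   # hypothetical export path
```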



Yes, you would use a Deployment or StatefulSet depending on the use case. For Jenkins, a StatefulSet would be appropriate. If the running Pod becomes unavailable, the StatefulSet controller will notice and spawn a new one.
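A minimal StatefulSet sketch (names assumed, and it presumes a headless Service called `jenkins` exists) would look like this. `volumeClaimTemplates` creates one PVC per replica, so with `replicas: 1` you get a single claim that follows the Pod identity `jenkins-0`:

```yaml
# Minimal StatefulSet sketch for Jenkins.
# volumeClaimTemplates creates a PVC per replica
# (here: jenkins-home-jenkins-0).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jenkins
spec:
  serviceName: jenkins   # assumes a headless Service named "jenkins"
  replicas: 1
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
        - name: jenkins
          image: jenkins/jenkins:lts
          volumeMounts:
            - name: jenkins-home
              mountPath: /var/jenkins_home
  volumeClaimTemplates:
    - metadata:
        name: jenkins-home
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```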



What you are describing is the default behaviour of Kubernetes for Pods that are managed by a controller, such as a Deployment.

You should deploy any application as a Deployment (or another controller) even if it consists of just a single Pod. You never really deploy Pods directly to Kubernetes. So, in this case, there's nothing special you need to do to get this behaviour.

When one of your nodes dies, the Pod dies too. This is detected by the Deployment controller, which creates a new Pod. This is in turn detected by the scheduler, which assigns the new Pod to a node. Since one of the nodes is down, it will assign the Pod to the other node that is still running. Once the Pod is assigned to that node, its kubelet will run the Pod's container(s).
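Nothing beyond a plain single-replica Deployment is needed for this; a sketch with assumed names:

```yaml
# Single-replica Deployment: if the Pod or its node dies, the
# Deployment controller creates a replacement and the scheduler
# places it on a healthy node.
# strategy: Recreate kills the old Pod before starting the new one,
# so two Pods never contend for the same ReadWriteOnce volume
# during a rollout.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
        - name: jenkins
          image: jenkins/jenkins:lts
          volumeMounts:
            - name: jenkins-home
              mountPath: /var/jenkins_home
      volumes:
        - name: jenkins-home
          persistentVolumeClaim:
            claimName: jenkins-home   # assumed existing PVC
```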



Ok, let me try to answer my own question here. I think Amit Kumar Gupta came the closest to what I believe is going on here.

Since I am using a Deployment and my PVC is ReadWriteOnce, I am basically stuck with one pod, running Jenkins, on one node.

weibeld's answer made me realise that I was asking questions about a concept that Kubernetes handles by default. If my pod goes down (in my case I am shutting down a node on purpose by doing a hard power down to simulate a failure), the cluster (controller) will detect this and spawn a new pod on another node.

All is fine so far, but then I noticed that my new pod was stuck in the ContainerCreating state.

Running a describe on my new pod (the one in ContainerCreating state) showed this:

Warning  FailedAttachVolume  16m                attachdetach-controller  Multi-Attach error for volume "pvc-cb772fdb-492b-4ef5-a63e-4e483b8798fd" Volume is already used by pod(s) jenkins-deployment-6ddd796846-dgpnm
Warning  FailedMount         70s (x7 over 14m)  kubelet, cc-pool-bg6u    Unable to mount volumes for pod "jenkins-deployment-6ddd796846-wjbkl_default(93747d74-b208-421c-afa4-8d467e717649)": timeout expired waiting for volumes to attach or mount for pod "default"/"jenkins-deployment-6ddd796846-wjbkl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[jenkins-home default-token-wd6p7]

Then it started to hit me: this makes sense. It's a pity, but it makes sense.

Since I did a hard power down on the node, the PV went down with it. So now the controller tries to start a new pod on a new node, but it can't transfer the PV, since the volume is still attached to the previous, now unreachable, node.

As I read more on this, I learned that DigitalOcean only supports ReadWriteOnce, which now leaves me wondering: how the hell can I achieve a simple failover for a stateful application on a Kubernetes cluster on DigitalOcean that consists of just a couple of simple droplets?
