Stop Messing with Kubernetes Finalizers

We’ve all been there - it’s frustrating to watch the deletion of a Kubernetes resource get stuck, hang, or take a very long time. You might have “solved” this using the terrible advice of removing finalizers or running kubectl delete ... --force --grace-period=0 to force immediate deletion. 99% of the time this is a horrible idea, and in this article I will show you why.

Finalizers

Before we get into why force-deletion is a bad idea, we first need to talk about finalizers.

Finalizers are values in resource metadata that signal required pre-delete operations - they tell the resource’s controller what operations need to be performed before the object can be deleted.

The most common one would be:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  finalizers:
  - kubernetes.io/pvc-protection
...

Their purpose is to stop a resource from being deleted while a controller or Kubernetes Operator cleanly and gracefully cleans up any dependent objects, such as underlying storage devices.

When you delete an object that has a finalizer, a deletionTimestamp is added to the resource’s metadata, making the object read-only. The only exception to the read-only rule is that finalizers can be removed. Once all finalizers are gone, the object is queued for deletion.
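
A minimal sketch of this mechanism in action, assuming the my-pvc claim from above is still mounted by a Pod:

kubectl delete pvc my-pvc --wait=false  # returns immediately, but the object lingers

# deletionTimestamp is now set, making the object read-only...
kubectl get pvc my-pvc -o jsonpath='{.metadata.deletionTimestamp}'

# ...yet the object sticks around, because the finalizer is still present:
kubectl get pvc my-pvc -o jsonpath='{.metadata.finalizers}'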

It’s important to understand that finalizers are just items/keys in resource metadata. Finalizers don’t specify the code to execute. They have to be added/removed by the resource controller.

Also, don’t confuse finalizers with Owner References. The .metadata.ownerReferences field specifies parent/child relations between objects, such as Deployment -> ReplicaSet -> Pod. When you delete an object such as a Deployment, the whole tree of child objects can be deleted. This deletion is automatic, unlike with finalizers, where the controller needs to take some action and remove the finalizer entry first.
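
For illustration, this is roughly what an owner reference looks like on a Pod created by a ReplicaSet (the names and UID below are made up):

apiVersion: v1
kind: Pod
metadata:
  name: my-app-5d7c9f-abcde
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: my-app-5d7c9f
    uid: d9607e19-f88f-11e6-a518-42010a800195
    controller: true
    blockOwnerDeletion: true
...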

What Could Go Wrong?

As mentioned earlier, the most common finalizers you might encounter are the ones attached to a Persistent Volume (PV) or Persistent Volume Claim (PVC). These finalizers protect the storage from being deleted while it’s in use by a Pod. Therefore, if a PV or PVC refuses to delete, it most likely means that it’s still mounted by a Pod. If you decide to force-delete a PV, be aware that the backing storage in the Cloud or other infrastructure might not get deleted with it, so you might leave behind a dangling resource which still costs you money.
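
Instead of reaching for force-deletion, you can first check which Pod is still holding the claim, using the my-pvc example from earlier:

kubectl describe pvc my-pvc
# Look at the "Used By" field in the output - it lists any Pods still mounting the claim.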

Another example is a Namespace that gets stuck in the Terminating state because resources still exist in it that the namespace controller is unable to remove. Forcing deletion of a namespace can leave dangling resources in your cluster, including, for example, a Cloud provider’s load balancer which might be very hard to track down later.
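
Rather than forcing it, you can list what’s actually left in the stuck namespace - my-namespace here is a placeholder name:

# Iterate over all namespaced resource types and print whatever remains:
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n my-namespace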

While not necessarily related to finalizers, it’s worth mentioning that resources can get stuck for many reasons other than waiting for finalizers:

The simplest example would be a Pod stuck in the Terminating state, which usually signals an issue with the Node on which the Pod runs. “Solving” this with kubectl delete pod --grace-period=0 --force ... will remove the Pod from the API server (etcd), but it might still be running on the Node, which is definitely not desirable.
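
Before even considering --force, it’s worth checking the Node itself - my-pod here is a placeholder name:

kubectl get pod my-pod -o wide   # shows which Node the Pod is scheduled on
kubectl get nodes                # a NotReady Node usually explains a stuck Terminating Pod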

Another example would be a StatefulSet, where force-deleting a Pod can create problems because Pods have fixed identities (pod-0, pod-1). A distributed system might depend on these names/identities - if the Pod is force-deleted but still running on the node, you can end up with 2 Pods with the same identity when the StatefulSet controller replaces the original “deleted” Pod. These 2 Pods might then attempt to access the same storage, which can lead to corrupted data. More on this in the docs.

Finalizers in The Wild

We now know that we shouldn’t mess with resources that have finalizers attached to them, but which resources are these?

The 3 most common ones you will encounter in “vanilla” Kubernetes are kubernetes.io/pv-protection and kubernetes.io/pvc-protection, related to Persistent Volumes and Persistent Volume Claims respectively (plus a couple more introduced in v1.23), as well as the kubernetes finalizer present on Namespaces. The last one, however, isn’t in the .metadata.finalizers field but rather in .spec.finalizers - this special case is described in the architecture document.
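
This is what that special case looks like on a Namespace - note the finalizer sitting under .spec rather than .metadata (the name is a placeholder):

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
spec:
  finalizers:
  - kubernetes
...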

Besides these “vanilla” finalizers, you might encounter many more if you install Kubernetes Operators, which often perform pre-deletion logic on their custom resources. A quick search through the code of some popular projects turns up the following:

Istio - istio-finalizer.install.istio.io

Cert Manager - finalizer.acme.cert-manager.io

Strimzi (Kafka) - service.kubernetes.io/load-balancer-cleanup

Quay - quay-operator/finalizer

Ceph/Rook - ceph.rook.io/disaster-protection

ArgoCD - argoproj.io/finalizer

Litmus Chaos - chaosengine.litmuschaos.io/finalizer

Written on September 16, 2022