OpenEBS cStor volumes stuck in Offline

I run a Kubernetes cluster with OpenEBS cStor as the primary provisioning engine. It is there for two reasons: first, it enables dynamic volume provisioning, and second, it can replicate Persistent Volumes to other nodes in the cluster. This brings a bit of resiliency to the cluster, as we don't have to worry when something really bad happens to a node.

Some time ago I had to update the hardware on one of the nodes. I removed the node from the cluster, performed the disk replacement, and then added the node back. You can read about it in the following post.

Removing worker and control-plane nodes from the Kubernetes cluster

However, after adding the new pool back to the cStor pool cluster, Pods got stuck waiting for their volumes.
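
A quick way to confirm that the volumes are the culprit is to look at the Pending Pods and the events on their PersistentVolumeClaims. A minimal check could look like the following; the PVC name is just a placeholder.

❯ kubectl get pods --all-namespaces --field-selector=status.phase=Pending
❯ kubectl describe pvc <pvc-name>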

Down the rabbit hole

So what is the issue with the cStor pool clusters?

❯ kubectl get cstorpoolclusters.cstor.openebs.io
NAME                      HEALTHYINSTANCES   PROVISIONEDINSTANCES   DESIREDINSTANCES   AGE
openebs-cstor-disk-pool   3                  3                      3                  35d

Printing cstorpoolclusters showed that we desire 3 instances, and all 3 of them are provisioned and healthy. So far so good. But printing cstorpoolinstances showed something different.

❯ kubectl get cstorpoolinstances.cstor.openebs.io
NAME                           HOSTNAME   FREE     CAPACITY   READONLY   PROVISIONEDREPLICAS   HEALTHYREPLICAS   STATUS   AGE
openebs-cstor-disk-pool-9k98   vm0302     89500M   90132M     false      4                     1                 ONLINE   17h
openebs-cstor-disk-pool-d5rg   vm0102     87300M   90070M     false      4                     1                 ONLINE   35d
openebs-cstor-disk-pool-rvl2   vm0202     87300M   90070M     false      4                     1                 ONLINE   34d

Each pool instance has only one healthy replica out of four provisioned, and that's pretty odd! Getting cstorvolumes confirmed that there is an issue with the volumes.

❯ kubectl get cstorvolumes.cstor.openebs.io
NAME                                       CAPACITY   STATUS    AGE
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c   4Gi        Healthy   34d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f   8Gi        Offline   34d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb   4Gi        Offline   34d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b   4Gi        Offline   34d

It turned out that the cStor volumes got stuck in an Offline state. Listing cstorvolumereplicas showed which replicas were affected.

❯ kubectl get cstorvolumereplicas.cstor.openebs.io
NAME                                                                    ALLOCATED   USED    STATUS               AGE
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-d5rg   671M        1.66G   Healthy              33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-rvl2   671M        1.66G   Healthy              33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-9k98   671M        1.66G   Healthy              33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-d5rg   209M        741M    Healthy              33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-rvl2   209M        241M    Degraded             33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-9k98   209M        1M      NewReplicaDegraded   33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-d5rg   874M        2.37G   Healthy              33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-rvl2   874M        1.37G   Degraded             33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-9k98   874M        1M      NewReplicaDegraded   33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-d5rg   917M        2.53G   Healthy              33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-rvl2   917M        1.13G   Degraded             33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-9k98   917M        1M      NewReplicaDegraded   33d

Pods got stuck because they needed volumes, and the volumes were not ready...
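
Describing one of the Offline cstorvolumes gives a closer look at its conditions and recent events. I'm omitting the output here, as the exact messages vary between OpenEBS releases.

❯ kubectl describe cstorvolumes.cstor.openebs.io pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f
❯ kubectl get cstorvolumes.cstor.openebs.io pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f -o yaml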

cStor Target Pods

When a cStor volume is provisioned, it creates a new cStor target Pod that is responsible for exposing the iSCSI LUN. The target Pod receives data from the workloads and passes it on to the respective cStor volume replicas (on the cStor pools). It also handles the synchronous replication and quorum management of its replicas. And those target Pods required a restart.
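
The target Pods are easy to spot, as their names contain the -target- part. Assuming OpenEBS is installed in the openebs namespace (adjust if yours differs), listing them all, or finding the one that serves a specific volume, could look like this:

❯ kubectl get pods -n openebs | grep -- '-target-'
❯ kubectl get pods -n openebs | grep pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f

Deleting one of them is then a single command.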

❯ kubectl delete pod pvc-5ea36f92-daec-4ba8-a650-456b1b97b17a-target-5b7646677c2w4vq
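
Since several volumes were affected, all target Pods can also be restarted in one go. This is just a small sketch, again assuming the openebs namespace and the -target- naming shown above:

❯ kubectl get pods -n openebs -o name | grep -- '-target-' | xargs kubectl delete -n openebs

Kubernetes recreates each target Pod from its Deployment, and the fresh target reconnects to its replicas.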

Deleting each target Pod forced replicas to rebuild.

❯ kubectl get cstorvolumereplicas.cstor.openebs.io
NAME                                                                    ALLOCATED   USED    STATUS                     AGE
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-d5rg   671M        1.66G   Healthy                    33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-rvl2   671M        1.66G   Healthy                    33d
pvc-48b1eaaa-8874-4d56-9c3a-1240e30d861c-openebs-cstor-disk-pool-9k98   671M        1.66G   Healthy                    33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-d5rg   209M        741M    Healthy                    33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-rvl2   209M        241M    Rebuilding                 33d
pvc-5f57319d-a744-44fa-9aa1-33cfcadb649f-openebs-cstor-disk-pool-9k98   209M        1M      ReconstructingNewReplica   33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-d5rg   874M        2.37G   Healthy                    33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-rvl2   874M        1.37G   Rebuilding                 33d
pvc-863aa630-5368-4400-9c01-e965d17c5aeb-openebs-cstor-disk-pool-9k98   874M        322M    ReconstructingNewReplica   33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-d5rg   917M        2.53G   Healthy                    33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-rvl2   917M        1.53G   Rebuilding                 33d
pvc-8d8cb8cb-9e13-4a3a-8f12-d5a19b4afb8b-openebs-cstor-disk-pool-9k98   917M        571M    ReconstructingNewReplica   33d
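
The rebuild can take a while, depending on how much data has to be resynced, so a simple way to follow the progress is to watch the replicas until everything reports Healthy again:

❯ kubectl get cstorvolumereplicas.cstor.openebs.io -w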

Once the reconstruction finished, all replicas became healthy again.

More useful OpenEBS cStor troubleshooting guides can be found at the following link.

Troubleshooting OpenEBS - cStor | OpenEBS Docs
This page contains a list of cStor related troubleshooting information.