This is the final installment of a five-part series on Kubernetes Fundamentals.

There are many advantages to using containers to run applications. However, ease of storage is certainly not one of them. To do its job, a container must have a temporary file system. But when a container shuts down, any changes made to its file system are lost. A side effect of easily fungible containers is that they lack an inherent concept of persistence.

While Docker addresses this issue with mount points from the host, Kubernetes poses additional difficulties. The smallest deployable unit of computing in Kubernetes is a Pod. Multiple instances of a Pod may be hosted on multiple physical machines. Even worse, different containers running in the same Pod might need access to the same storage.

In this post, we’ll discuss two tools Kubernetes offers to help solve storage issues: volumes and persistent volumes. We’ll cover how and why you’d use each.

About Kubernetes volumes

Volumes offer storage shared between all containers in a Pod. This allows you to reliably use the same mounted file system with multiple services running in the same Pod. This is, however, not automatic. Containers that want to use a volume have to specify which volume they want to use, and where to mount it in the container’s file system.

Additionally, volumes come with a clearly defined lifetime. They are bound to the lifecycle of the Pod they belong to. As long as the Pod exists, the volume is there, too, even if an individual container in the Pod restarts. However, when the Pod is deleted and recreated, the volume gets reset. If this is not what you want, you should either use persistent volumes (discussed in the next section) or change your application's logic to accommodate this behavior appropriately.

While Kubernetes only cares about the formal definition of a volume, you also need to have a real (physical) file system allocated somewhere. This is where Kubernetes goes beyond what Docker offers. While Docker only maps a path from the host to the container, Kubernetes allows essentially anything as long as there is a proper provider for the storage.

You could use cloud options such as Amazon Elastic Block Store (EBS) or Azure Blob Storage, or open-source solutions such as Ceph. Using something as simple and generic as NFS is possible, too. If you want to use something similar to Docker’s mount path, you can fall back to the hostPath volume type.

So how do you create these volumes? You do so in the Pod definition.
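To make the Docker-like option concrete, here is a minimal sketch of a Pod definition that uses the hostPath volume type. The Pod name hostpathexample, the volume name hostdata, and both paths are hypothetical and chosen only for illustration. Because hostPath ties the data to a single node, treat this as a single-node experiment rather than a recommended setup.

```yaml
# Sketch: a Pod that mounts a directory from the node, similar to a Docker host mount.
# All names and paths below are illustrative assumptions.
kind: Pod
apiVersion: v1
metadata:
  name: hostpathexample
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      # Path inside the container
      - name: hostdata
        mountPath: "/tmp/hostdata"
  volumes:
  - name: hostdata
    hostPath:
      # Path on the node that runs the Pod
      path: "/tmp/data"
      # Create the directory on the node if it does not exist yet
      type: DirectoryOrCreate
```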

Working with volumes

For example, consider creating a new Pod called sharedvolumeexample using two containers—both just sleeping. Using the volumes key, you can describe your volumes to be used within the containers.

kind: Pod
apiVersion: v1
metadata:
  name: sharedvolumeexample
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      # c1 sees the shared volume at /tmp/xchange
      - name: xchange
        mountPath: "/tmp/xchange"
  - name: c2
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      # c2 sees the same volume at /tmp/data
      - name: xchange
        mountPath: "/tmp/data"
  # volumes is part of the Pod spec, at the same level as containers
  volumes:
  - name: xchange
    emptyDir: {}

To use a volume in a container, you need to specify volumeMounts as shown above. The name key selects one of the volumes defined in the Pod, and the mountPath key sets the path inside the container at which that volume is mounted.
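If one of the containers should only consume the shared data, its mount can be marked read-only. The following is a sketch of a variant of the Pod above in which c2 mounts the volume with the readOnly flag. The Pod name sharedvolumeexample-ro is a hypothetical name used to keep it apart from the original example.

```yaml
# Sketch: same layout as sharedvolumeexample, but c2 may only read the shared data.
kind: Pod
apiVersion: v1
metadata:
  name: sharedvolumeexample-ro   # hypothetical name
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      # c1 can read and write
      - name: xchange
        mountPath: "/tmp/xchange"
  - name: c2
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      # c2 can read, but attempts to write under /tmp/data fail
      - name: xchange
        mountPath: "/tmp/data"
        readOnly: true
  volumes:
  - name: xchange
    emptyDir: {}
```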

To demonstrate how this shares the volume between the two containers, let’s run a little test. First, you should create the Pod from the spec (for example, sharedvolumeexample.yml):

kubectl apply -f sharedvolumeexample.yml

Then, you can access the terminal on the first container, c1, using kubectl:

kubectl exec -it sharedvolumeexample -c c1 -- bash

Next, write some data into a file under the /tmp/xchange mount point:

echo 'some data' > /tmp/xchange/file.txt

Let’s open another terminal, connecting to the container called c2.

kubectl exec -it sharedvolumeexample -c c2 -- bash

The difference is that this time you read from its mounted storage at /tmp/data:

cat /tmp/data/file.txt

This yields “some data,” as expected. Now you can remove the Pod:

kubectl delete pod/sharedvolumeexample

Working with persistent volumes

When (regular) volumes don’t meet your needs, you can switch to a persistent volume.

A persistent volume is a storage object that lives at the cluster level. As a result, its lifetime isn’t tied to that of a single Pod, but rather to the cluster itself.

One advantage of a persistent volume is that it can be shared not only between containers of a single Pod but also among multiple Pods. In addition, persistent volumes can be scaled by expanding their size, provided the storage provider supports it. Reducing size, however, is not possible.

A persistent volume offers the same options for selecting the physical provider as a regular volume. Provisioning, however, is a bit different.

There are two ways to provision a persistent volume:

  • Statically: You have already allocated everything on the storage side, so there is nothing more to be done. The physical storage behind the volume always stays the same.
  • Dynamically: Storage is allocated on demand, which helps when you want to extend the available space as demand grows. Demand is expressed through a volume claim resource, which we’ll discuss in a bit. To enable dynamic storage provisioning, you have to enable the DefaultStorageClass admission controller on the Kubernetes API server.

For growing systems whose increasing demand is backed by scalable resources, dynamic provisioning makes more sense. Otherwise, we recommend staying with the simpler static provisioning.
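Dynamic provisioning is driven by a StorageClass resource, which tells Kubernetes which provisioner to ask for new storage. The following is a minimal sketch, assuming a cluster on AWS with the EBS CSI driver installed (provisioner ebs.csi.aws.com). The class name example-dynamic is hypothetical, and other providers need a different provisioner value.

```yaml
# Sketch: a StorageClass for dynamically provisioned volumes.
# Assumption: the AWS EBS CSI driver is installed in the cluster.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: example-dynamic   # hypothetical name
provisioner: ebs.csi.aws.com
# Allow claims that use this class to grow later
allowVolumeExpansion: true
# Remove the backing storage once the claim is deleted
reclaimPolicy: Delete
# Delay provisioning until a Pod actually uses the claim
volumeBindingMode: WaitForFirstConsumer
```

A volume claim would then select this class through its storageClassName field.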

Let’s create a persistent volume backed by hostPath storage. Note that kind is now set to PersistentVolume instead of Pod:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/data"

As with Pods, these resources are created using the kubectl tool:

kubectl apply -f persvolumeexample.yml

In the example above, we created a new persistent volume named persvolumeexample, with a storage capacity of 10 GiB (the Gi suffix denotes gibibytes). As for the different access modes, you could specify ReadWriteOnce, ReadOnlyMany, and ReadWriteMany, though not all of these modes are available for every storage provider. For instance, AWS EBS only supports ReadWriteOnce.
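For comparison, here is a sketch of a persistent volume on a provider that can offer ReadWriteMany, in this case NFS. The volume name, the server address nfs.example.internal, and the export path are hypothetical placeholders and would have to match an NFS server you actually run.

```yaml
# Sketch: an NFS-backed persistent volume that many nodes can mount read-write.
# The server and path are illustrative assumptions.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: nfsvolumeexample   # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.internal
    path: "/exports/data"
```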

You can use the created persistent volume via another resource: PersistentVolumeClaim. The claim requests a certain amount of storage, and Kubernetes tries to satisfy that request from the available persistent volumes. The request may fail, even if Kubernetes actively tries to allocate more space through dynamic provisioning.

Let’s create a claim requesting 3 GiB:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

As before, you create the claim with kubectl:

kubectl apply -f myclaim-1.yml

When you run this command, Kubernetes looks for a persistent volume that matches the claim, meaning one with enough capacity and a compatible access mode. Using the claim in a Pod is simple:

kind: Pod
apiVersion: v1
metadata:
  name: volumeexample
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/xchange"
  - name: c2
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/data"
  # The volume is now backed by the claim instead of emptyDir
  volumes:
  - name: xchange
    persistentVolumeClaim:
      claimName: myclaim-1

If you compare this example with the previous one, you’ll see that only the volumes section has changed, nothing else.
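Because the claim exists independently of any Pod, a second Pod can reference it as well, which is how data ends up shared between Pods. The sketch below uses a hypothetical Pod named volumeexample2 with a single container. Note that ReadWriteOnce allows the volume to be mounted read-write by a single node only, so this assumes both Pods are scheduled onto the same node.

```yaml
# Sketch: a second Pod that mounts the same claim, myclaim-1.
# Assumption: this Pod runs on the same node as volumeexample (ReadWriteOnce).
kind: Pod
apiVersion: v1
metadata:
  name: volumeexample2   # hypothetical name
spec:
  containers:
  - name: reader
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      # Files written by volumeexample show up here
      - name: xchange
        mountPath: "/tmp/shared"
  volumes:
  - name: xchange
    persistentVolumeClaim:
      claimName: myclaim-1
```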

A persistent volume is bound to one claim at a time, and the claim gets the whole volume even if it requested less than the volume's capacity. To release the volume, you’d have to delete the claim. The reclaim policy for a persistent volume tells Kubernetes what to do with the volume after it has been released from its claim. The options are Retain, Recycle (deprecated in favor of dynamic provisioning), and Delete.

To set the reclaim policy, you need to define the persistentVolumeReclaimPolicy option in the spec section of the PersistentVolume config. For instance, in the previous config this would look like:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  # Keep the volume and its data after the claim is deleted
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/tmp/data"

Wrapping up

Both volumes and persistent volumes allow you to add data storage that survives container restarts. While volumes are bound to the lifecycle of the Pod, persistent volumes can be defined independently of a specific Pod. They can then be used in any Pod.

The one you choose depends on your needs. A volume is deleted when the containing Pod shuts down, yet it is perfect when you need to share data between containers running in a Pod.

Since persistent volumes outlive individual Pods, they’re ideal when you have data that must survive Pod restarts or has to be shared between Pods.

Both types of storage are easy to set up and use in a cluster. Happy orchestrating!
