This is the final installment of a five-part series on Kubernetes Fundamentals.

There are many advantages to using containers to run applications. However, ease of storage is certainly not one of them. To do its job, a container must have a temporary file system. But when a container shuts down, any changes made to its file system are lost. A side effect of easily fungible containers is that they lack an inherent concept of persistence.

Docker addresses this issue with mount points from the host, but Kubernetes adds complications. The smallest deployable unit of computing in Kubernetes is a Pod. Multiple instances of a Pod may be hosted on multiple physical machines. On top of that, several containers might run in the same Pod and need access to the same storage.

In this post, we’ll discuss two tools Kubernetes offers to help solve storage issues: volumes and persistent volumes. We’ll cover how and why you’d use each.

 


About Kubernetes volumes

Volumes offer storage shared between all containers in a Pod. This allows you to reliably use the same mounted file system with multiple services running in the same Pod. This is, however, not automatic. Containers that want to use a volume have to specify which volume they want to use, and where to mount it in the container’s file system.

Kubernetes volumes vs volumeMounts

In the Kubernetes ecosystem, the terms "volumes" and "volumeMounts" often come up, and while they're closely related, they serve distinct purposes.

Kubernetes "volumes" define the storage medium or source itself. Think of them as the actual hard drive or storage device. They can be local storage, cloud storage, or other types.

On the other hand, "volumeMounts" dictate how these volumes are mounted within the containers in a pod. It's analogous to picking a specific folder on that hard drive to access your files. When you define a volumeMount, you specify the mount path (where in the container you'd like to see the volume) and often reference a named volume defined in the same pod. Essentially, while "volumes" provide the mechanism for storage, "volumeMounts" determine how and where that storage is accessible from within the containers.
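The split between the two keys can be sketched in a single manifest. This is a minimal illustration, not taken from the original series: the Pod name, container names, and paths are hypothetical. It declares one volume and mounts it in two containers, once in full and once restricted to a subdirectory and marked read-only.

```yaml
# Hypothetical Pod; all names and paths are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: mountexample
spec:
  containers:
    - name: writer
      image: centos:7
      command: ["/bin/bash", "-c", "sleep 10000"]
      volumeMounts:
        - name: scratch          # must match a name under spec.volumes
          mountPath: /work       # where the volume appears in this container
    - name: reader
      image: centos:7
      command: ["/bin/bash", "-c", "sleep 10000"]
      volumeMounts:
        - name: scratch
          mountPath: /input
          subPath: reports       # expose only the "reports" subdirectory
          readOnly: true         # this container cannot write to the mount
  volumes:
    - name: scratch              # the storage source itself
      emptyDir: {}
```

The volume is declared once at the Pod level, while each container decides independently where (and whether) to mount it.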

Additionally, volumes come with a clearly defined lifetime. They are bound to the lifecycle of the Pod they belong to. As long as the Pod exists, the volume is there, too. However, when the Pod is deleted and recreated, an ephemeral volume starts out empty again. If this is not what you want, you should either use persistent volumes (discussed in the next section) or change your application's logic to accommodate this behavior appropriately.

While Kubernetes only cares about the formal definition of a volume, you also need to have a real (physical) file system allocated somewhere. This is where Kubernetes goes beyond what Docker offers. While Docker only maps a path from the host to the container, Kubernetes allows essentially anything as long as there is a proper provider for the storage. Before we dive into working with volumes, let’s review the types of volumes for your Kubernetes storage needs.

Types of Kubernetes volumes

There are many types of Kubernetes volumes designed to suit different storage needs. Let’s walk through them here.

  • emptyDir: A temporary directory that exists as long as the pod hosting it is running. When the pod is removed from its node, the data in its emptyDir is deleted; a container crash alone does not clear it.
  • hostPath: Allows a pod to access a file or directory located on the host node's filesystem, often used for single-node testing.
  • NFS (Network File System): Allows mounting an NFS share, making it accessible to the pod. It's suitable for multiple pod reads and writes on the same shared storage.
  • PersistentVolume (PV) and PersistentVolumeClaim (PVC): PV is a piece of storage in the cluster, and PVC is a request for that storage. This abstraction helps users to provision storage with different underlying capabilities, like performance or backup policies, without needing to know the specifics of the storage provider.
  • Cloud Provider Volumes: Major cloud providers like AWS, Azure, and GCP offer their block and file storage solutions, like EBS, Azure Disk, and GCE Persistent Disk, which are integrated into Kubernetes as volume types.
  • ConfigMap and Secret: Used to inject configuration data or secrets into pods. They aren't primarily used for storing data, but rather configuration settings or sensitive data.
  • CSI (Container Storage Interface): An industry standard that allows storage providers to develop plugins that expose their storage systems to containerized workloads in a standardized manner.
  • GlusterFS, CephFS, and others: Distributed file systems that can be integrated directly into Kubernetes to provide shared and scalable storage solutions.

So how do you create these volumes? You do so in the Pod definition.
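As a rough sketch of how the volume type is selected in the Pod definition, the manifest below mounts two of the types listed above. It assumes a ConfigMap named app-settings already exists in the same namespace; that name, the Pod name, and the mount paths are hypothetical.

```yaml
# Hypothetical Pod; the ConfigMap "app-settings" is assumed to exist already.
apiVersion: v1
kind: Pod
metadata:
  name: volumetypesexample
spec:
  containers:
    - name: c1
      image: centos:7
      command: ["/bin/bash", "-c", "sleep 10000"]
      volumeMounts:
        - name: settings
          mountPath: /etc/app    # each ConfigMap key becomes a file here
          readOnly: true
        - name: node-logs
          mountPath: /host/log
          readOnly: true
  volumes:
    - name: settings
      configMap:
        name: app-settings       # assumed, not defined in this article
    - name: node-logs
      hostPath:
        path: /var/log           # directory on the node's filesystem
        type: Directory          # fail if the path is not an existing directory
```

Only the key under each volume entry (configMap, hostPath, emptyDir, and so on) changes between types; the volumeMounts side stays the same.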

Working with volumes

For example, consider creating a new Pod called sharedvolumeexample using two containers—both just sleeping. Using the volumes key, you can describe your volumes to be used within the containers.

kind: Pod
apiVersion: v1
metadata:
  name: sharedvolumeexample
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange              # refers to the volume declared under spec.volumes
        mountPath: "/tmp/xchange"
  - name: c2
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange              # same volume, different path in this container
        mountPath: "/tmp/data"
  volumes:                         # belongs to the Pod spec, shared by both containers
  - name: xchange
    emptyDir: {}

To use a volume in a container, you need to specify volumeMounts as shown above. The mountPath key describes the volume access path.

To demonstrate how this shares the volume between the two containers, let’s run a little test. First, you should create the Pod from the spec (for example, sharedvolumeexample.yml):

kubectl apply -f sharedvolumeexample.yml

Then, you can access the terminal on the first container, c1, using kubectl:

kubectl exec -it sharedvolumeexample -c c1 -- bash

Next, write some data into a file under the /tmp/xchange mount point:

echo 'some data' > /tmp/xchange/file.txt

Let’s open another terminal, connecting to the container called c2.

kubectl exec -it sharedvolumeexample -c c2 -- bash

The difference is that this time you read from its mounted storage at /tmp/data:

cat /tmp/data/file.txt

This yields “some data,” as expected. Now you can remove the Pod:

kubectl delete pod/sharedvolumeexample

Working with persistent volumes

When (regular) volumes don’t meet your needs, you can switch to a persistent volume.

A persistent volume is a storage object that lives at the cluster level. As a result, its lifetime isn’t tied to that of a single Pod, but rather to the cluster itself.

One advantage of a persistent volume is that it can be shared not only between containers of a single Pod but also, given a suitable access mode, among multiple Pods. Persistent volumes can also be scaled by expanding their size, provided the underlying storage supports it. Reducing size, however, is not possible.

A persistent volume offers the same options for selecting the physical provider as a regular volume. Provisioning, however, is a bit different.

There are two ways to provision a persistent volume:

  • Statically: The storage has already been allocated on the storage side, and the persistent volumes that point to it are created up front. The physical storage behind each volume always stays the same.
  • Dynamically: When no existing persistent volume matches a request, the cluster creates one on demand, which helps when the demand for storage grows. The request is expressed via a volume claim resource, which we’ll discuss in a bit. To enable dynamic storage provisioning based on a default storage class, you have to enable the DefaultStorageClass admission controller on the Kubernetes API server.

For growing systems whose storage demand is backed by scalable resources, dynamic provisioning makes more sense. Otherwise, we recommend staying with the simpler static provisioning.
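Dynamic provisioning is driven by a StorageClass that names a provisioner. The following is a sketch under stated assumptions rather than a recommended setup: it assumes a cluster where the AWS EBS CSI driver is installed, and the class name, claim name, and requested size are hypothetical.

```yaml
# Assumes the AWS EBS CSI driver is installed; names and sizes are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-example
provisioner: ebs.csi.aws.com            # the driver that creates volumes on demand
parameters:
  type: gp3                             # driver-specific parameter (EBS volume type)
reclaimPolicy: Delete                   # provisioned volumes are removed with their claim
allowVolumeExpansion: true              # lets a claim be grown later
volumeBindingMode: WaitForFirstConsumer # provision only once a Pod uses the claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamicclaim-example
spec:
  storageClassName: fast-example        # selects the class above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

With a class like this in place, no PersistentVolume has to be written by hand: the provisioner creates one that satisfies the claim.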

Let’s create a persistent volume backed by hostPath storage. Note that kind is set to PersistentVolume instead of Pod:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/data"

As with Pods, these resources are created using the kubectl tool:

kubectl apply -f persvolumeexample.yml

In the example above, we created a new persistent volume named persvolumeexample with a storage capacity of 10 GiB (the Gi suffix denotes gibibytes). As for access modes, you can specify ReadWriteOnce, ReadOnlyMany, or ReadWriteMany, though not every storage provider supports every mode. For instance, AWS EBS only supports ReadWriteOnce.
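For contrast, a backend such as NFS can typically be mounted read-write by many nodes at once. Here is a minimal sketch, assuming a reachable NFS server; the volume name, server address, and export path are hypothetical.

```yaml
# Hypothetical NFS-backed volume; server and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfsvolumeexample
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                  # many nodes may mount it read-write
  nfs:
    server: nfs.example.internal     # assumed NFS server address
    path: /exports/shared            # assumed exported directory
```

The access mode listed here describes what the volume is able to offer; a claim then asks for the mode it needs.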

You can use the created persistent volume via another resource: PersistentVolumeClaim. A claim requests a certain amount of storage with a certain access mode, and Kubernetes binds it to a persistent volume that can satisfy the request. Binding may still fail, even with dynamic provisioning, if no suitable volume exists and none can be created.

Let’s create a claim for provisioning 3 GiB:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

Create the claim with kubectl:

kubectl apply -f myclaim-1.yml

When you run this command, Kubernetes looks for a persistent volume that matches the claim. Using the claim is simple:

kind: Pod
apiVersion: v1
metadata:
  name: volumeexample
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/xchange"
  - name: c2
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/data"
  volumes:
  - name: xchange
    persistentVolumeClaim:
      claimName: myclaim-1         # the claim created in the previous step

If you compare this example with the previous one, you’ll see that only the volumes section has changed, nothing else.
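One caveat when trying this on a cluster that has a default StorageClass: a claim without a storageClassName is usually assigned the default class and provisioned dynamically, so it may never bind to the static persvolumeexample volume. A possible variant of the claim that pins it to that volume is sketched below; whether you need it depends on your cluster's configuration.

```yaml
# Variant of myclaim-1 that opts out of dynamic provisioning.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim-1
spec:
  storageClassName: ""             # empty string: request a volume with no class
  volumeName: persvolumeexample    # bind to this specific persistent volume
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```

The empty storageClassName matches the persistent volume defined earlier, which does not set a class either.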

A claim binds to a whole persistent volume, even if it requests less than the volume's capacity: here, the 3 GiB claim occupies the entire 10 GiB volume. To release the volume, you’d have to delete the claim. The reclaim policy for a persistent volume tells Kubernetes what to do with the volume after it has been released of its claim. The options are Retain, Recycle (deprecated in favor of dynamic provisioning), and Delete.

To set the reclaim policy, you need to define the persistentVolumeReclaimPolicy option in the spec section of the PersistentVolume config. For instance, in the previous config this would look like:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/tmp/data"

Wrapping up

Both volumes and persistent volumes allow you to add data storage that survives container restarts. While volumes are bound to the lifecycle of the Pod, persistent volumes can be defined independently of a specific Pod. They can then be used in any Pod.

The one you choose depends on your needs. A volume is deleted when the containing Pod shuts down, yet it is perfect when you need to share data between containers running in a Pod.

Since persistent volumes outlive individual Pods, they’re ideal when you have data that must survive Pod restarts or has to be shared between Pods.

Both types of storage are easy to set up and use in a cluster. Happy orchestrating!

Ready for a deep dive into Kubernetes monitoring? Check out A Complete Introduction to Monitoring Kubernetes with New Relic.