This is the final installment of a five-part series on Kubernetes Fundamentals.
There are many advantages to using containers to run applications. However, ease of storage is certainly not one of them. To do its job, a container must have a temporary file system. But when a container shuts down, any changes made to its file system are lost. A side effect of easily fungible containers is that they lack an inherent concept of persistence.
While Docker solves this issue with mount points from the host, Kubernetes adds more difficulties. The smallest deployable unit of computing in Kubernetes is a Pod. Multiple instances of a Pod may be hosted on multiple physical machines. On top of that, several containers may run in the same Pod and need access to the same storage.
In this post, we’ll discuss two tools Kubernetes offers to help solve storage issues: volumes and persistent volumes. We’ll cover how and why you’d use each.
About Kubernetes volumes
Volumes offer storage shared between all containers in a Pod. This allows you to reliably use the same mounted file system with multiple services running in the same Pod. This is, however, not automatic. Containers that want to use a volume have to specify which volume they want to use, and where to mount it in the container’s file system.
Additionally, volumes come with a clearly defined lifetime: they are bound to the lifecycle of the Pod they belong to. As long as the Pod exists, the volume exists too, and its data survives container restarts within that Pod. However, once the Pod is deleted or replaced, an ephemeral volume such as emptyDir is deleted along with it. If this is not what you want, you should either use persistent volumes (discussed in the next section) or change your application's logic to accommodate this behavior appropriately.
While Kubernetes only cares about the formal definition of a volume, you also need to have a real (physical) file system allocated somewhere. This is where Kubernetes goes beyond what Docker offers. While Docker only maps a path from the host to the container, Kubernetes allows essentially anything as long as there is a proper provider for the storage.
You could use cloud options such as Amazon Elastic Block Store (EBS) or Azure Blob Storage, or open-source solutions such as Ceph. Using something as simple and generic as NFS is possible, too. If you want to use something similar to Docker’s mount path, you can fall back to the hostPath volume type.
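To make this concrete, here is a sketch of how the volumes section of a Pod spec might declare two of these backends. The node path, the NFS server name and the export path are hypothetical placeholders. Keep in mind that hostPath ties the data to whichever node the Pod happens to run on.

```yaml
volumes:
  # Directory on the node's own file system, similar to a Docker bind mount.
  - name: node-data
    hostPath:
      path: /var/local/data      # hypothetical path on the node
      type: DirectoryOrCreate    # create the directory if it does not exist yet
  # NFS export mounted over the network (server and path are placeholders).
  - name: shared-nfs
    nfs:
      server: nfs.example.com
      path: /exports/data
```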
So how do you create these volumes? You do so in the Pod definition.
Working with volumes
For example, consider a new Pod called sharedvolumeexample with two containers, both of which just sleep. Using the volumes key, you can describe the volumes that the containers can use.
```yaml
kind: Pod
apiVersion: v1
metadata:
  name: sharedvolumeexample
spec:
  containers:
    - name: c1
      image: centos:7
      command:
        - "/bin/bash"
        - "-c"
        - "sleep 10000"
      volumeMounts:
        - name: xchange            # refers to the volume declared below
          mountPath: "/tmp/xchange"
    - name: c2
      image: centos:7
      command:
        - "/bin/bash"
        - "-c"
        - "sleep 10000"
      volumeMounts:
        - name: xchange            # same volume, different path in this container
          mountPath: "/tmp/data"
  volumes:
    - name: xchange
      emptyDir: {}                 # empty scratch space that lives as long as the Pod
```
To use a volume in a container, you list it under volumeMounts, as shown above. The name must match an entry under volumes, and the mountPath key sets where the volume appears in the container’s file system.
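The example uses emptyDir, which starts out empty when the Pod is assigned to a node. The fragment below sketches a few optional settings on both sides, with illustrative values. emptyDir accepts a medium and a sizeLimit, and a volumeMount can be marked readOnly. Note that a memory-backed emptyDir counts against the container’s memory limit.

```yaml
spec:
  containers:
    - name: c2
      image: centos:7
      command: ["/bin/bash", "-c", "sleep 10000"]
      volumeMounts:
        - name: xchange
          mountPath: "/tmp/data"
          readOnly: true         # c2 can read the shared files but not modify them
  volumes:
    - name: xchange
      emptyDir:
        medium: Memory           # back the volume with tmpfs (RAM) instead of node disk
        sizeLimit: 64Mi          # illustrative cap on the volume's size
```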
To demonstrate how this shares the volume between the two containers, let’s run a little test. First, create the Pod from the spec (saved, for example, as sharedvolumeexample.yml):
kubectl apply -f sharedvolumeexample.yml
Then, you can access the terminal on the first container, c1, using kubectl:
kubectl exec -it sharedvolumeexample -c c1 -- bash
Next, write some data into a file under the /tmp/xchange mount point:
echo 'some data' > /tmp/xchange/file.txt
Now open another terminal and connect to the container called c2:
kubectl exec -it sharedvolumeexample -c c2 -- bash
This time, read the file through c2’s own mount point, /tmp/data:
cat /tmp/data/file.txt
This yields “some data,” as expected. Now you can remove the Pod:
kubectl delete pod/sharedvolumeexample
Working with persistent volumes
When (regular) volumes don’t meet your needs, you can switch to a persistent volume.
A persistent volume is a storage object that lives at the cluster level. As a result, its lifetime isn’t tied to that of a single Pod, but rather to the cluster itself. A persistent volume makes it possible to share data between Pods.
Whether Pods can actually share a persistent volume depends on its access mode, which we’ll look at below. Persistent volumes can also grow: if their storage class allows volume expansion, you can increase the size of a bound claim. Reducing the size, however, is not possible.
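You grow a volume through its claim rather than by editing the volume directly. Here is a minimal sketch, assuming a hypothetical claim named growing-claim that uses a hypothetical StorageClass called expandable. The class is assumed to set allowVolumeExpansion: true and to use a provisioner that supports resizing. You raise the storage request and reapply the manifest.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: growing-claim            # hypothetical claim, originally created with 3Gi
spec:
  storageClassName: expandable   # hypothetical class with allowVolumeExpansion: true
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi               # raised from 3Gi; a smaller value would be rejected
```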
A persistent volume offers the same options for selecting the physical provider as a regular volume. Provisioning, however, is a bit different.
There are two ways to provision a persistent volume:
- Statically: A cluster administrator creates the persistent volumes up front, backed by storage that is already allocated. Nothing else needs to be done, and the physical storage behind a volume always stays the same.
- Dynamically: Storage is allocated on demand, for example when the demand grows. The demand is expressed through a volume claim resource, which we’ll discuss in a bit, and a StorageClass determines how the matching storage gets created. To enable dynamic storage provisioning, you have to enable the DefaultStorageClass admission controller on the Kubernetes API server.
For growing systems whose storage demand keeps increasing and that are backed by scalable resources, dynamic provisioning makes more sense. Otherwise, we recommend sticking with the simpler static provisioning.
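For comparison with the static example that follows, here is a sketch of dynamic provisioning. A StorageClass describes how volumes are created, and a claim that names the class triggers creation. The names fast-ssd and dynamic-claim are hypothetical. The provisioner assumes the AWS EBS CSI driver is installed in the cluster; other environments use different provisioners and parameters.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-ssd                          # hypothetical class name
provisioner: ebs.csi.aws.com              # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3                               # EBS volume type, interpreted by the provisioner
reclaimPolicy: Delete                     # created volumes are removed when their claim is deleted
volumeBindingMode: WaitForFirstConsumer   # provision only once a Pod uses the claim
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dynamic-claim                     # hypothetical claim name
spec:
  storageClassName: fast-ssd              # request a new volume from the class above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```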
Let’s create a persistent volume backed by hostPath storage. Note that this time, kind is set to PersistentVolume instead of Pod:
```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/data"
```
Like Pods, these resources are created using the kubectl tool:
kubectl apply -f persvolumeexample.yml
In the example above, we created a new persistent volume named persvolumeexample with a storage capacity of 10 GiB (10Gi). For the access mode, you can specify ReadWriteOnce (mounted read-write by a single node), ReadOnlyMany (mounted read-only by many nodes), or ReadWriteMany (mounted read-write by many nodes). Not every storage provider supports all of these modes. For instance, AWS EBS only supports ReadWriteOnce.
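Storage that supports concurrent access from several nodes, such as NFS, can list more than one access mode. Below is a minimal sketch of such a volume. The volume name, server name and export path are hypothetical.

```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: nfs-shared               # hypothetical volume name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany              # Pods on many nodes may mount it read-write
    - ReadOnlyMany               # ...or read-only
  nfs:
    server: nfs.example.com      # hypothetical NFS server
    path: /exports/shared        # hypothetical export path
```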
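You can use the created persistent volume via another resource: PersistentVolumeClaim. A claim requests a certain amount of storage with specific access modes, and Kubernetes binds it to a persistent volume that satisfies the request. If no matching volume is available, the claim stays pending. With dynamic provisioning, Kubernetes tries to create a suitable volume instead, but that can fail too, for example when the storage backend runs out of capacity.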
Let’s create a claim requesting 3 GiB:
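```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-1
spec:
  # An empty class restricts the claim to pre-created volumes without a class
  # (such as persvolumeexample) instead of the cluster's default StorageClass.
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```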
Like the volume, the claim is created with kubectl:
kubectl apply -f myclaim-1.yml
When you run this command, Kubernetes looks for a persistent volume that matches the claim. Using the claim is simple:
```yaml
kind: Pod
apiVersion: v1
metadata:
  name: volumeexample
spec:
  containers:
    - name: c1
      image: centos:7
      command:
        - "/bin/bash"
        - "-c"
        - "sleep 10000"
      volumeMounts:
        - name: xchange
          mountPath: "/tmp/xchange"
    - name: c2
      image: centos:7
      command:
        - "/bin/bash"
        - "-c"
        - "sleep 10000"
      volumeMounts:
        - name: xchange
          mountPath: "/tmp/data"
  volumes:
    - name: xchange
      persistentVolumeClaim:
        claimName: myclaim-1       # use the storage bound to this claim
```
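If you compare this example with the previous one, you’ll see that apart from the Pod’s name, only the volumes section has changed. Instead of emptyDir, the xchange volume now refers to the claim via persistentVolumeClaim.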
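Because the claim is a cluster-level object, other Pods can reference it too. The sketch below defines a hypothetical second Pod, volumeexample2, which mounts myclaim-1 read-only. With the ReadWriteOnce access mode, the volume can be mounted by only one node at a time. This setup therefore works only if both Pods are scheduled onto the same node, which a node-local hostPath volume requires anyway.

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: volumeexample2           # hypothetical second Pod
spec:
  containers:
    - name: reader
      image: centos:7
      command: ["/bin/bash", "-c", "sleep 10000"]
      volumeMounts:
        - name: xchange
          mountPath: "/tmp/shared"
          readOnly: true         # this Pod only reads the shared data
  volumes:
    - name: xchange
      persistentVolumeClaim:
        claimName: myclaim-1     # same claim as the first Pod
```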
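A claim binds to a whole persistent volume, not just to the amount of storage it requests. For example, when the 3Gi claim above is bound to persvolumeexample, the entire 10Gi volume belongs to that claim. To release the volume, you delete the claim. The reclaim policy of a persistent volume tells Kubernetes what to do with the volume once its claim has been released. The options are Retain (keep the volume and its data for manual cleanup), Recycle (scrub the data and make the volume available again; deprecated in favor of dynamic provisioning), and Delete (remove the volume together with its backing storage).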
To set the reclaim policy, you need to define the persistentVolumeReclaimPolicy option in the spec section of the PersistentVolume config. For instance, the previous config would look like this:
```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the volume and its data after the claim is deleted
  hostPath:
    path: "/tmp/data"
```
Wrapping up
Both volumes and persistent volumes allow you to add data storage that survives container restarts. While volumes are bound to the lifecycle of the Pod, persistent volumes can be defined independently of a specific Pod. They can then be used in any Pod.
The one you choose depends on your needs. An ephemeral volume such as emptyDir is deleted along with its Pod, but it is a good fit when you need to share data between containers running in the same Pod.
Since persistent volumes outlive individual Pods, they’re ideal when you have data that must survive Pod restarts or has to be shared between Pods.
Both types of storage are easy to set up and use in a cluster. Happy orchestrating!
Ready for a deep dive into Kubernetes monitoring? Check out A Complete Introduction to Monitoring Kubernetes with New Relic.
The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.