Kubernetes Fundamentals, Part 5: Working with Kubernetes Volumes

There are many advantages to using containers to run applications. However, ease of storage is certainly not one of them. To do its job, a container must have a temporary file system. But when a container shuts down, any changes made to its file system are lost. A side effect of easily fungible containers is that they lack an inherent concept of persistence.

While Docker addresses this issue with mount points from the host, Kubernetes adds complications of its own. The smallest deployable unit of computing in Kubernetes is a Pod. Multiple instances of a Pod may be hosted on multiple physical machines. On top of that, several containers might run in the same Pod and need access to the same storage.

In this post, we’ll discuss two tools Kubernetes offers to help solve storage issues: volumes and persistent volumes. We’ll cover how and why you’d use each.

About Kubernetes volumes

Volumes offer storage shared between all containers in a Pod. This allows you to reliably use the same mounted file system with multiple services running in the same Pod. This is, however, not automatic. Containers that want to use a volume have to specify which volume they want to use, and where to mount it in the container’s file system.

Additionally, volumes come with a clearly defined lifetime. They are bound to the lifecycle of the Pod they belong to. As long as the Pod exists, the volume is there, too, even if an individual container restarts. However, when the Pod is removed and recreated, the volume gets reset. If this is not what you want, you should either use persistent volumes (discussed in a later section) or change your application’s logic to accommodate this behavior appropriately.

While Kubernetes only cares about the formal definition of a volume, you also need to have a real (physical) file system allocated somewhere. This is where Kubernetes goes beyond what Docker offers. While Docker only maps a path from the host to the container, Kubernetes allows essentially anything as long as there is a proper provider for the storage.

You could use cloud options such as Amazon Elastic Block Store (EBS) or Azure Blob Storage, or open-source solutions such as Ceph. Using something as simple and generic as NFS is possible, too. If you want to use something similar to Docker’s mount path, you can fall back to the hostPath volume type.
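
To make the choice of provider more concrete, here is a rough sketch of how two of these backends could be declared. The fragment belongs in the volumes list of a Pod spec. The volume names, the host directory, and the NFS server address are placeholders invented for this illustration, not values from a real cluster.

```yaml
# Fragment of a Pod spec: only the volumes list is shown.
# All names, paths, and addresses below are assumptions for illustration.
volumes:
- name: hostdata
  hostPath:
    path: /var/local/data        # directory on the node running the Pod
    type: DirectoryOrCreate      # create the directory if it is missing
- name: nfsdata
  nfs:
    server: nfs.example.internal # hypothetical NFS server
    path: /exports/data          # hypothetical exported directory
```

Note that hostPath data lives on a single node, so a Pod that is rescheduled to a different node will not see it.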

So how do you create these volumes? You do so in the Pod definition.

Working with volumes

For example, consider creating a new Pod called sharedvolumeexample using two containers—both just sleeping. Using the volumes key, you can describe your volumes to be used within the containers.

				
kind: Pod
apiVersion: v1
metadata:
  name: sharedvolumeexample
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/xchange"
  - name: c2
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/data"
  # volumes is part of the Pod spec, at the same level as containers
  volumes:
  - name: xchange
    emptyDir: {}
				
			

To use a volume in a container, you need to specify volumeMounts as shown above. The mountPath key describes the volume access path.
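
A mount can also restrict access. As a sketch that goes beyond the example above, the fragment below could replace the volumeMounts of c2 so that the container can read the shared data but not modify it. The readOnly field is a standard part of a volume mount; applying it to c2 is only an illustration.

```yaml
# Fragment: volumeMounts entry for container c2 (illustrative variation).
volumeMounts:
  - name: xchange            # must match the name of a volume in the Pod spec
    mountPath: "/tmp/data"   # path inside the container
    readOnly: true           # c2 can read from the mount but not write to it
```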

To demonstrate how this shares the volume between the two containers, let’s run a little test. First, you should create the Pod from the spec (for example, sharedvolumeexample.yml):

				
					kubectl apply -f sharedvolumeexample.yml
				
			

Then, you can access the terminal on the first container, c1, using kubectl:

				
					kubectl exec -it sharedvolumeexample -c c1 -- bash
				
			

Next, write some data into a file under the /tmp/xchange mount point:

				
					echo 'some data' > /tmp/xchange/file.txt
				
			

Let’s open another terminal, connecting to the container called c2.

				
					kubectl exec -it sharedvolumeexample -c c2 -- bash
				
			

The difference is that this time you read from its mounted storage at /tmp/data:

				
					cat /tmp/data/file.txt
				
			

This yields “some data,” as expected. Now you can remove the Pod:

				
					kubectl delete pod/sharedvolumeexample
				
			

Working with persistent volumes

When (regular) volumes don’t meet your needs, you can switch to a persistent volume.

A persistent volume is a storage object that lives at the cluster level. As a result, its lifetime isn’t tied to that of a single Pod, but rather to the cluster itself.

One advantage of a persistent volume is that it can be shared not only between containers of a single Pod but also among multiple Pods, subject to its access mode. Persistent volumes can also be scaled by expanding their size, provided the storage provider supports it. Reducing size, however, is not possible.

A persistent volume offers the same options for selecting the physical provider as a regular volume. Provisioning, however, is a bit different.

There are two ways to provision a persistent volume:

  • Statically: You have already allocated everything on the storage side and create the persistent volumes up front. The physical storage behind a volume will always be the same.
  • Dynamically: When no existing volume satisfies a request, the cluster provisions storage on demand, which lets the available space grow with demand. The demand is expressed via a volume claim resource, which we’ll discuss in a bit. To enable dynamic storage provisioning based on storage classes, you have to enable the DefaultStorageClass admission controller on the Kubernetes API server.

For growing systems with demand increase backed by scalable resources, dynamic provisioning makes more sense. Otherwise, we recommend staying with the simpler static provisioning.
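
Dynamic provisioning is driven by a StorageClass resource that names the provisioner to call. The following is a minimal sketch, assuming a cluster on AWS with the EBS CSI driver installed; the class name example-dynamic is made up for this illustration, and the provisioner and parameters would differ on other platforms.

```yaml
# Hypothetical StorageClass; assumes the AWS EBS CSI driver is installed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-dynamic                  # assumed name
provisioner: ebs.csi.aws.com             # replace with your cluster's provisioner
parameters:
  type: gp3                              # EBS volume type requested from the driver
reclaimPolicy: Delete                    # delete the backing disk once the claim is deleted
allowVolumeExpansion: true               # allow bound claims to grow later
volumeBindingMode: WaitForFirstConsumer  # provision only when a Pod needs the volume
```

A claim that sets storageClassName to example-dynamic would then trigger the provisioner instead of waiting for a pre-created volume.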

Let’s try to create a persistent volume for hostPath-backed storage. Note that instead of setting kind to Pod, we set it to PersistentVolume:

				
kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/data"
				
			

As with Pods, these resources are created using the kubectl tool:

				
					kubectl apply -f persvolumeexample.yml
				
			

In the example above, we created a new persistent volume named persvolumeexample with a storage capacity of 10 GiB. As for the different access modes, you could specify ReadWriteOnce, ReadOnlyMany, and ReadWriteMany, though not all of these modes are available for every storage provider. For instance, AWS EBS only supports ReadWriteOnce.
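
For comparison, here is a sketch of a persistent volume on a provider that can serve several nodes at once. It assumes an NFS export is available; the volume name, server address, and export path are placeholders rather than real values.

```yaml
# Hypothetical NFS-backed persistent volume that many nodes may mount read-write.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: nfsvolumeexample           # assumed name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany                # NFS supports concurrent read-write mounts
  nfs:
    server: nfs.example.internal   # hypothetical NFS server
    path: /exports/shared          # hypothetical exported directory
```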

You can use the created persistent volume via another resource: PersistentVolumeClaim. A claim requests storage of a given size and access mode, and Kubernetes binds it to a volume with enough space available. The claim may still fail to bind even if, during dynamic provisioning, Kubernetes actively tries to allocate more space.

Let’s create a claim for provisioning 3 GiB:

				
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
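
On a cluster that has a default StorageClass, a claim like this one may be satisfied by dynamic provisioning rather than by persvolumeexample. One way to pin a claim to a specific pre-created volume is sketched below; the claim name pinnedclaimexample is invented for this illustration, while storageClassName and volumeName are standard claim fields.

```yaml
# Hypothetical variant of the claim that binds only to persvolumeexample.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pinnedclaimexample        # assumed name
spec:
  storageClassName: ""            # empty string: do not use the default class
  volumeName: persvolumeexample   # bind to this persistent volume only
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```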
				
			

As before, you create the claim with kubectl:

				
					kubectl apply -f myclaim-1.yml
				
			

When you run this command, Kubernetes looks for a persistent volume that matches the claim. Using the claim is simple:

				
kind: Pod
apiVersion: v1
metadata:
  name: volumeexample
spec:
  containers:
  - name: c1
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/xchange"
  - name: c2
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/data"
  # volumes is part of the Pod spec, at the same level as containers
  volumes:
  - name: xchange
    persistentVolumeClaim:
      claimName: myclaim-1
				
			

If you compare this example with the previous one, you’ll see that, apart from the Pod name, only the volumes section has changed.
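
Because the claim exists independently of any Pod, another Pod can refer to it as well. The following sketch shows a hypothetical second Pod, volumereaderexample, that mounts the same claim read-only. Keep in mind that ReadWriteOnce limits the volume to a single node, so this assumes both Pods are scheduled on the same node.

```yaml
# Hypothetical second Pod that reuses the existing claim myclaim-1.
kind: Pod
apiVersion: v1
metadata:
  name: volumereaderexample    # assumed name
spec:
  containers:
  - name: reader
    image: centos:7
    command:
      - "/bin/bash"
      - "-c"
      - "sleep 10000"
    volumeMounts:
      - name: xchange
        mountPath: "/tmp/data"
        readOnly: true         # this Pod only reads the shared data
  volumes:
  - name: xchange
    persistentVolumeClaim:
      claimName: myclaim-1     # same claim as in the Pod above
```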

A claim binds to an entire persistent volume, even when it requests less than the volume’s capacity, and a bound volume cannot serve another claim. To release the volume, you’d have to delete the claim. The reclaim policy for a persistent volume tells Kubernetes what to do with the volume after it has been released of its claim. The options are Retain, Recycle (deprecated in favor of dynamic provisioning), and Delete.

To set the reclaim policy, you need to define the persistentVolumeReclaimPolicy option in the spec section of the PersistentVolume config. For instance, in the previous config this would look like:

				
kind: PersistentVolume
apiVersion: v1
metadata:
  name: persvolumeexample
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/tmp/data"
				
			

 

Wrapping up

Both volumes and persistent volumes allow you to add data storage that survives container restarts. While volumes are bound to the lifecycle of the Pod, persistent volumes can be defined independently of a specific Pod. They can then be used in any Pod.

The one you choose depends on your needs. A volume is deleted when the containing Pod shuts down, yet it is perfect when you need to share data between containers running in a Pod.

Since persistent volumes outlive individual Pods, they’re ideal when you have data that must survive Pod restarts or has to be shared between Pods.

Both types of storage are easy to set up and use in a cluster. Happy orchestrating!

This article was originally published on New Relic’s blog.
