Before we dive into Kubernetes resources, let’s clarify what the term “resource” refers to here. Anything we create in a Kubernetes cluster is considered a resource: deployments, pods, services, and more. In the narrower sense of compute resources, however, the primary ones are CPU and RAM. Kubernetes also supports some other resource types, like ephemeral storage and extended resources.
One aspect of cluster management is to assign these resources automatically to containers running in pods so that, ideally, each container has as many resources as needed, but no more.
In this article, we’ll highlight hardware resources that we provide to containers running on a cluster. We’ll break down four of the most common Kubernetes resources developers work with on a daily basis: CPU, RAM, ephemeral storage, and extended resources. For each resource, we’ll explore how it’s measured within Kubernetes, review how to monitor each particular resource, and highlight some best practices for optimizing resource use.
Understanding Kubernetes Resource Types
Let’s explore each primary Kubernetes resource type in-depth. Then let’s see these resource types in action with some code samples.
A Kubernetes cluster typically runs on multiple machines, each with multiple CPU cores. They sum up to a total number of available cores, like 4 machines times 4 cores for a total of 16.
We don’t need to work with whole numbers of cores. We can specify any fraction of a CPU core in 1/1000th increments (for example, half a core is 500 milli-CPU).
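As a small illustration of this notation, the fragment below sketches a container's resources stanza. The values are illustrative assumptions, not recommendations. Kubernetes accepts either the milli-CPU form or a decimal fraction, so the two spellings in each comment describe the same amount.

```yaml
# Illustrative container resources stanza (values are assumptions).
resources:
  requests:
    cpu: "500m"   # half a core; the same as writing cpu: "0.5"
  limits:
    cpu: "1"      # one full core; the same as writing cpu: "1000m"
```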
Kubernetes containers run on the Linux kernel, which allows specifying cgroups to limit resources. The Linux scheduler compares the CPU time used (defined by internal time slices) with the defined limit to decide whether or not to run a container in the next time slice. We can query CPU resources with the kubectl top command, invoking it for a pod or node.
We can optimize our use of processor time by making the program running in a container more efficient, either through improved algorithms and coding or by compiler optimization. The cluster user doesn’t have much influence on the speed or efficiency of pre-compiled containers.
The machines in a Kubernetes cluster also each have RAM, which again sums up to a cluster total. For example, 4 machines times 32 GiB is 128 GiB.
The kernel controls main memory with cgroups, much as it does CPU time. If a process in a container tries to allocate memory beyond a hard limit, the kernel raises an out-of-memory (OOM) condition and terminates the process.
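To make the units concrete, here is a minimal sketch of a memory request and hard limit. The numbers are assumptions chosen for illustration. Memory is measured in bytes, and the suffix matters: Mi is a binary unit, while M is a decimal one.

```yaml
# Illustrative memory settings for one container (values are assumptions).
resources:
  requests:
    memory: "128Mi"   # 128 * 1024 * 1024 bytes; "128M" would mean 128,000,000 bytes
  limits:
    memory: "256Mi"   # hard limit; allocating beyond it triggers the OOM handling described above
```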
Optimizing memory use is largely up to the application’s developers. One step is to tune garbage collection so that a runtime with a managed heap doesn’t grow beyond a hard limit. Again, the kubectl top command can provide information about memory use.
Exploring CPU and RAM
As our first in-depth example, let’s deploy three replicated containers of the popular web server Nginx to a local Kubernetes installation. We’re running a one-node “cluster” on our laptop, which only has two cores and 2 GiB of memory.
The code below defines such a deployment and grants each of the three Nginx containers one-tenth of a core (100 milli-CPU) and 100 MiB of main memory. It also limits their use to double the requested values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
            limits:
              cpu: "200m"
              memory: "200Mi"
          ports:
            - containerPort: 80
We can deploy into the default namespace like this:
kubectl apply -f nginx.yaml
The local cluster only has a single node. Use this command to return detailed information about it:
kubectl describe nodes docker-desktop
After clipping most of the output, we can examine some information about resource use:
[...]
  Namespace  Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------  ----                               ------------  ----------  ---------------  -------------  ---
  default    nginx-deployment-585bd9cc5f-djql8  100m (5%)     200m (10%)  100Mi (5%)       200Mi (10%)    66s
  default    nginx-deployment-585bd9cc5f-gz98r  100m (5%)     200m (10%)  100Mi (5%)       200Mi (10%)    66s
  default    nginx-deployment-585bd9cc5f-vmdnc  100m (5%)     200m (10%)  100Mi (5%)       200Mi (10%)    66s
[...]
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1150m (57%)  600m (30%)
  memory             540Mi (28%)  940Mi (49%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
[...]
This information shows the CPU and memory use requests and limits, just as our deployment object specified. It also displays the values as a percentage of the maximum possible allotment.
Next are the current totals for this node, again listed as absolute values and percentages. These numbers include some other containers running in the kube-system namespace that we haven’t shown here, so the totals can exceed the sum of the three rows shown above.
The above snippet’s last three lines indicate other types of resources beyond CPU and memory, which don’t have set requests or limits in this example.
One additional Kubernetes resource type is ephemeral storage. This is local hard drive (HDD) or SSD storage on the node that doesn’t outlive the pod. Applications often use ephemeral storage for caching or logs, but it should never hold important data, like user records. We can request or limit ephemeral storage just like main memory, although it’s often less scarce because large drives can supply it.
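As a sketch of how such a request might look, the pod below asks for ephemeral storage and mounts a scratch volume for cache files. The pod name, mount path, and sizes are hypothetical values chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo                   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          ephemeral-storage: "1Gi"   # used by the scheduler to pick a node with enough local disk
        limits:
          ephemeral-storage: "2Gi"   # the kubelet can evict the pod if usage exceeds this
      volumeMounts:
        - name: scratch
          mountPath: /tmp/cache      # hypothetical path
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: "500Mi"           # cap for this scratch volume; its contents vanish with the pod
```

An emptyDir volume fits the caching and logging use cases mentioned above because it is created when the pod starts and deleted when the pod goes away.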
So what does hugepages-2Mi mean in the output above? Huge pages are a memory feature of the Linux kernel that allocates large main memory pages of configurable size to processes. Fewer, larger pages can make memory access more efficient for applications that work with large amounts of memory.
Kubernetes supports assigning such large pages to containers. These form a resource type per page size that we can request separately.
When specifying a request or limit, we set the total amount of memory, not the number of pages.
limits:
  hugepages-2Mi: "100Mi"
  hugepages-1Gi: "2Gi"
Here, we limit the number of 2 MiB pages to 50 and the number of 1 GiB pages to 2.
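For context, a complete pod that consumes 2 MiB huge pages could look roughly like the sketch below. It assumes the node has pre-allocated huge pages of that size. The pod name, image, command, and mount path are hypothetical. Huge pages can't be overcommitted, so when only limits are given, the requests default to the same values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]  # placeholder workload
      resources:
        limits:
          hugepages-2Mi: "100Mi"  # total amount: 100 MiB / 2 MiB = 50 pages
          memory: "100Mi"
      volumeMounts:
        - name: hugepage
          mountPath: /hugepages   # hypothetical path
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages-2Mi     # backs the volume with 2 MiB huge pages
```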
Cluster users can also define their own resource types, per cluster or per node, using the extended resource type. Once we’ve defined a type and specified available units, we can use requests and limits, just as with the built-in resources we’ve used so far.
An example is:
limits:
  cpu: "200m"
  myproject.com/handles: 100
This setting limits the container to 20 percent of a core and 100 of our project’s handles.
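Placed in a full pod spec, the same setting might look like the sketch below. It assumes that a node already advertises the myproject.com/handles resource; otherwise the pod can't be scheduled. The pod name and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: handles-demo                  # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "100m"
          myproject.com/handles: 100  # extended resources are counted in whole units only
        limits:
          cpu: "200m"
          myproject.com/handles: 100  # must equal the request when both are given
```

Unlike CPU, an extended resource can't be overcommitted, which is why its request and limit carry the same value here.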
Resource Requests and Limits
Notice that resource requests and limits were key to our conversation about ephemeral storage and extended resources. The cluster user specifies both for each container. Requests indicate how much of a resource a container should have, and the scheduler uses them to assign pods to nodes.
Limits, in contrast, set a hard upper bound on how much of a resource a container can use, enforced at the operating system level. Both requests and limits are optional. If we don’t specify a limit, a container can use most of the node’s resources, so we must be cautious.
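One possible guardrail against containers without limits is a LimitRange object, which supplies namespace-wide defaults. The sketch below is an assumption about how a team might configure it, reusing the values from the Nginx example. The object name is hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources   # hypothetical name
  namespace: default
spec:
  limits:
    - type: Container
      defaultRequest:     # applied when a container specifies no request
        cpu: "100m"
        memory: "100Mi"
      default:            # applied when a container specifies no limit
        cpu: "200m"
        memory: "200Mi"
```

With this in place, a container created in the namespace without its own resources stanza receives these values instead of running unbounded.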
Bear in mind that although a pod can contain more than one container, the regular use case is one container per pod. We allocate resources to containers, but all of a pod’s containers run together on one node, so that node must be able to satisfy their combined requests.
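To illustrate the multi-container case, the sketch below defines a pod with a web server and a sidecar. The pod name, sidecar, and values are hypothetical. Each container declares its own requests, and the scheduler places the pod based on their sum.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar      # hypothetical name
spec:
  containers:
    - name: web
      image: nginx
      resources:
        requests:
          cpu: "100m"
          memory: "100Mi"
    - name: log-forwarder     # hypothetical sidecar
      image: busybox
      command: ["sh", "-c", "tail -f /dev/null"]   # placeholder workload
      resources:
        requests:
          cpu: "50m"
          memory: "50Mi"
# The pod is scheduled as a unit: the chosen node needs
# 150m CPU and 150Mi memory available for both containers together.
```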
Considering Quality of Service
The resources system we’ve described so far is a fairly simple way of managing compute resources. Kubernetes offers a simple Quality of Service (QoS) system on top of this.
QoS describes a technical system’s means of offering different service levels while maintaining the best overall quality, given the hardware’s limitations. The Kubernetes QoS system assigns one of three levels to a pod: Guaranteed, Burstable, and BestEffort. Refer to the Kubernetes documentation to learn how to assign these levels and how they affect pod scheduling.
The Guaranteed level applies when every container’s requests equal its limits, so the pod receives exactly the resources it asks for during its lifetime. It suits applications like monitoring systems that run at a constant load.
The Burstable service level is suitable for pods with a basic use profile that can sometimes increase above the baseline due to increased demand. This level is ideal for databases or web servers, whose load depends on the number of incoming requests.
Finally, BestEffort makes no resource availability guarantee. So, it’s best suited for applications like batch jobs that can repeat if needed, or for staging environments that aren’t mission-critical.
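We don't set the QoS level directly. Kubernetes derives it from the requests and limits of the pod's containers. The three minimal pods below sketch one configuration per level. Names and values are hypothetical.

```yaml
# Guaranteed: every container sets CPU and memory requests equal to its limits.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed        # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests: {cpu: "200m", memory: "200Mi"}
        limits:   {cpu: "200m", memory: "200Mi"}
---
# Burstable: at least one request or limit is set, but the Guaranteed rule isn't met.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable         # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests: {cpu: "100m", memory: "100Mi"}
        limits:   {cpu: "200m", memory: "200Mi"}
---
# BestEffort: no container sets any request or limit.
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort        # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
```

After creation, the assigned level appears in each pod's status.qosClass field. By this rule, the Nginx deployment from earlier in the article would be classified as Burstable, because its limits are double its requests.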
Kubernetes clusters maintain hardware resources like CPU time, memory, ephemeral storage, and extended resources, and assign them to running containers. Through a system of requests and limits, operators can tailor resource use to individual containers, then let the Kubernetes system assign them to nodes appropriately.
Extended resources enable us to define our own resource types and use them similarly. Kubernetes also assigns Quality of Service designations to pods according to requests and limits. It then uses these designations to make scheduling and termination decisions.
Kubernetes resource optimization is essential to balance costs with the end-user experience. Yet, assigning parameters by hand using this article’s methods can be time-consuming, costly, and difficult to scale.
We’d rather spend our time creating exciting new features that drive competitive advantage or improved user experience than worrying about optimization and resource use. The StormForge platform helps manage and optimize your Kubernetes resources automatically, using machine learning to find the best configuration based on your cost and performance goals.