You’ve probably seen or had to use Kubernetes to some extent, but maybe you don’t really get it yet, and you’d like to? Then this post is for you. In Part 1, we dug into Pods, Deployments, and Services. In Part 2, we looked at configuration and persistence. Here we move into “what’s left”: Ingress, health checks, namespaces, and resource limits.

Map of the territory

Ingress

Ingress is how you expose HTTP services running in Kubernetes to clients outside the cluster. It supports path-based routing as well as host-based routing.

Recreate your cluster

We’ll need a bit of extra setup here since by default a kind cluster has no port mapping to your host. So let’s fix that.

Create a file called kind-config.yaml with the following contents:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 8080
        protocol: TCP
      - containerPort: 443
        hostPort: 8443
        protocol: TCP

And then tear down your cluster and recreate it using this config:

kind delete cluster --name learning
kind create cluster --name learning --config kind-config.yaml

With this configuration, kind maps ports 80 and 443 in the kind node container to ports 8080 and 8443 on your host.

Install two HTTP services

Create a file called manifests/ingress-demo.yaml with the following contents:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-one
spec:
  replicas: 1
  selector:
    matchLabels: { app: app-one }
  template:
    metadata: { labels: { app: app-one } }
    spec:
      containers:
        - name: app
          image: hashicorp/http-echo
          args: ["-text=Hello from App One"]
          ports: [{ containerPort: 5678 }]
---
apiVersion: v1
kind: Service
metadata:
  name: app-one-svc
spec:
  selector: { app: app-one }
  ports: [{ port: 80, targetPort: 5678 }]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-two
spec:
  replicas: 1
  selector:
    matchLabels: { app: app-two }
  template:
    metadata: { labels: { app: app-two } }
    spec:
      containers:
        - name: app
          image: hashicorp/http-echo
          args: ["-text=Hello from App Two"]
          ports: [{ containerPort: 5678 }]
---
apiVersion: v1
kind: Service
metadata:
  name: app-two-svc
spec:
  selector: { app: app-two }
  ports: [{ port: 80, targetPort: 5678 }]

Apply the manifest to launch the Services and Deployments. http-echo is a tiny webserver that always responds with a specific string – we’re just using it here so that it’s obvious which Service you’re hitting.

kubectl apply -f manifests/ingress-demo.yaml

Install an ingress controller

Kubernetes does not ship with a built-in implementation that acts on Ingress objects, so before an Ingress can do anything we need to install an Ingress controller. Install ingress-nginx using its kind-specific manifest:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=120s

Path-based Routing

Create a file called manifests/ingress.yaml with the following contents:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /app1
            pathType: Prefix
            backend:
              service:
                name: app-one-svc
                port: { number: 80 }
          - path: /app2
            pathType: Prefix
            backend:
              service:
                name: app-two-svc
                port: { number: 80 }

And then apply it:

kubectl apply -f manifests/ingress.yaml
kubectl get ingress

Now, since we mapped port 8080 on our host to port 80 inside the cluster, we can finally look at these in a browser.

Visit:

http://localhost:8080/app1
http://localhost:8080/app2

You should see “Hello from App One” and “Hello from App Two” respectively. Effectively, what this has bought us is a single load balancer in front of an arbitrary number of internal Services.
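If you’re wondering what pathType: Prefix actually matches, here’s a rough shell sketch of the logic for the two rules above. Prefix matching works per path element, so /app1 covers /app1 and anything under /app1/, but not /app10. Note that route_path is a hypothetical helper that only mimics the matching; the real work is done by the ingress controller.

```shell
# Sketch of Ingress "Prefix" matching for the /app1 and /app2 rules.
# Matching is done per path element: /app1 and /app1/... match, /app10 does not.
route_path() {
  case "$1" in
    /app1 | /app1/*) echo "app-one-svc" ;;
    /app2 | /app2/*) echo "app-two-svc" ;;
    *) echo "no-match" ;;
  esac
}

route_path /app1         # app-one-svc
route_path /app2/images  # app-two-svc
route_path /app10        # no-match
```

Against the running cluster, the equivalent check would be something like curl -s http://localhost:8080/app1, assuming the 8080 port mapping from the kind config above.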

Host-based routing

Add a couple of host-based rules to your manifest:

rules:
  - host: app1.localtest.me
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service: { name: app-one-svc, port: { number: 80 } }
  - host: app2.localtest.me
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service: { name: app-two-svc, port: { number: 80 } }

localtest.me is a real public domain that just resolves to 127.0.0.1, which makes our lives a bit easier here.

Now re-apply the manifest and visit:

http://app1.localtest.me:8080
http://app2.localtest.me:8080

Clean up

kubectl delete -f manifests/ingress.yaml
kubectl delete -f manifests/ingress-demo.yaml

Health

On its own, Kubernetes’ visibility into whether your application is working or not is just whether or not its main process has crashed. But there are lots of other ways your application can be non-functional: maybe your database isn’t responding, or you’ve run out of disk space, etc. Health checks let you define what “healthy” actually means for your application.

Deploy a broken app

Create a file called manifests/health-app.yaml with the following contents:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: health-demo
spec:
  replicas: 1
  selector:
    matchLabels: { app: health-demo }
  template:
    metadata: { labels: { app: health-demo } }
    spec:
      containers:
        - name: app
          image: nginx:1.25
          ports: [{ containerPort: 80 }]
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: health-demo-svc
spec:
  selector: { app: health-demo }
  ports: [{ port: 80, targetPort: 80 }]

We’ve defined two health probes, a livenessProbe which hits /, and a readinessProbe which hits /healthz.

Aside: It’s a convention for Kubernetes-specific endpoints to end in “z” (e.g. healthz, statusz, varz, etc). The idea is that “z” is an unusual-enough character that these endpoints are not likely to conflict with anything “real” in your application. But you can of course call them whatever you want.

Observe the result

Apply the manifest and watch the pods come up:

kubectl apply -f manifests/health-app.yaml
kubectl get pods -w

You’ll see the Pod come up, with the STATUS column eventually settling on “Running”, but the READY column stays at 0/1. Since we haven’t configured this webserver to have a /healthz path, the readiness probes fail with a 404 error (which counts as a probe failure).
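To make “counts as a probe failure” concrete: for httpGet probes, Kubernetes treats any status code from 200 up to (but not including) 400 as a success, and everything else as a failure. Here’s a tiny shell sketch of that rule; probe_result is a made-up helper name, not a kubectl feature.

```shell
# Classify an HTTP status code the way an httpGet probe does:
# 200 <= code < 400 is a success, anything else is a failure.
probe_result() {
  if [ "$1" -ge 200 ] && [ "$1" -lt 400 ]; then
    echo "success"
  else
    echo "failure"
  fi
}

probe_result 200  # success
probe_result 404  # failure (our missing /healthz)
probe_result 503  # failure
```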

You can see this more explicitly if you describe the Pod:

kubectl describe pod -l app=health-demo

Under “Conditions” you’ll see Ready and ContainersReady reporting “False”, and if you look at the events section at the bottom, you should see that the readiness probe failed with a 404, as expected. Liveness reports as fine, since we configured it to probe / which exists in a default nginx installation.

Now look at the Service:

kubectl get endpoints health-demo-svc

You’ll see it doesn’t have any addresses. The Pod exists and is running but the Service (aware that the Pod is not ready) refuses to send it any traffic. That’s essentially the point of Readiness: to ensure the Service does not route traffic to Pods that are not ready to handle it.

Fix the readiness

In real life, you would build a /healthz endpoint in your application that exercised all critical dependencies (such as performing a SELECT 1 on your database to ensure that the database is working and responding) before responding with a 200 status code.
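As a rough illustration of the “exercise all critical dependencies” idea, here’s a minimal shell sketch. It assumes each dependency check can be expressed as a command that exits 0 on success; ready_check and the example commands are hypothetical, and in a real application this logic would usually live inside your /healthz handler (or a script run by an exec-style probe).

```shell
# Hypothetical readiness check: run every dependency check passed as an
# argument and report "ready" only if all of them exit 0.
ready_check() {
  for check in "$@"; do
    # Each argument is a command line; the first non-zero exit means "not ready".
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "not ready: $check"
      return 1
    fi
  done
  echo "ready"
}

# Stand-in checks; a real one might run SELECT 1 through your database client.
ready_check "true" "test -d /"
```

The important design choice is that a single failing dependency flips the whole result, which is what takes the Pod out of the Service until things recover.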

In our case, just edit the manifest to change the readinessProbe path from /healthz to / in order to get a clean “success”. Reapply, then look at your pods again:

kubectl apply -f manifests/health-app.yaml
kubectl get pods -w

You’ll see it spin up a new Pod and terminate the old one, and this time the Pod will report as Ready. Now when you look at the Service:

kubectl get endpoints health-demo-svc

you’ll now see the Pod’s IP address.

Trigger a liveness failure

Update the manifest again, this time changing the livenessProbe path to /idontexist. Re-apply the manifest and watch the Pods:

kubectl apply -f manifests/health-app.yaml
kubectl get pods -w

This time, the application comes up as expected, but you’ll see that the container keeps restarting before eventually ending up in a CrashLoopBackOff state. You can confirm this by describing the Pod:

kubectl describe pod -l app=health-demo

Look for “Last State: Terminated”, and for the telltale signature of the problem in the Events section.

Readiness vs Liveness

Readiness probes answer the question “is this Pod ready to receive traffic right now?” Your readiness endpoint should fail when the Pod is still warming up, when it’s temporarily overloaded and needs time to work through its existing load, or when a downstream API that you depend on is having an outage.

Liveness probes answer the question “is this Pod fundamentally broken and in need of replacement?” Your liveness endpoint should fail on things like deadlocks, or corrupted and unrecoverable internal state.

Note that it’s an antipattern to point your liveness probe at something that depends on an external system like a database or downstream API. Liveness checks whether your own process is healthy, not whether your dependencies are. Consider: killing and restarting your application Pods will not fix a GitHub API outage.

StartupProbes

If you have the misfortune of working on a Java application, or your application otherwise takes a long time to start up, livenessProbes can fire prematurely and kill a perfectly fine app that hasn’t finished booting.

You can use a startupProbe, which defers liveness checking until after startup-specific checks pass:

startupProbe:
  httpGet: { path: /startupz, port: 80 }
  failureThreshold: 30
  periodSeconds: 2

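Adding the above stanza to your manifest gives your application up to 60 seconds (30 attempts, 2 seconds apart) to start responding. Liveness probing begins as soon as the startup probe succeeds; if it never succeeds, the container is restarted.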

You could just add an initialDelaySeconds: 60 to your liveness probe instead, but this would always defer liveness probes for a full minute regardless of how long startup takes. The startupProbe above will check every 2 seconds, and then begin your liveness probes as soon as the first success occurs.
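The arithmetic behind those numbers can be sketched in a few lines of shell. Both helper names are made up for illustration, and the second one is an idealized model that assumes probes fire at exact multiples of the period (real kubelets add some jitter).

```shell
# Worst-case startup budget: the container is restarted only after
# failureThreshold consecutive failures, one every periodSeconds.
startup_budget() {
  echo $(( $1 * $2 ))
}

# Idealized moment at which the startup probe first succeeds (and liveness
# probing begins) for an app that needs boot_seconds to start.
liveness_starts_at() {
  boot_seconds=$1
  period=$2
  echo $(( (boot_seconds + period - 1) / period * period ))
}

startup_budget 30 2      # 60
liveness_starts_at 13 2  # 14, versus a fixed 60 with initialDelaySeconds: 60
```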

Clean up

kubectl delete -f manifests/health-app.yaml

Namespaces & Resource Limits

You didn’t think we’d get through all this without at least one OOMKill, did you?

Set up a couple namespaces

First create two namespaces:

kubectl create namespace team-a
kubectl create namespace team-b

And then deploy our nginx deployment into each. Note that if your manifest still points to a non-existent image, you might need to fix it first:

kubectl apply -f manifests/deployment.yaml -n team-a
kubectl apply -f manifests/deployment.yaml -n team-b

kubectl get pods -n team-a
kubectl get pods -n team-b

You can also get pods from all namespaces:

kubectl get pods -A

Set your default namespace

If you’re already tired of constantly typing -n team-a on every command, you’re not alone. You can switch your current namespace with this command:

kubectl config set-context --current --namespace=team-a
kubectl get pods

And then switch back when you’re done:

kubectl config set-context --current --namespace=default

…and if you’re already tired of typing that command, congratulations, now you know why tools and aliases like kubens exist. At this point you’re probably also getting a feel for why people tend to configure an alias for k=kubectl.
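If you’d rather not install anything, a small shell function gets you most of the way there. This is just a sketch: kns is a made-up name (not the real kubens tool), and it simply wraps the set-context command shown above, falling back to the default namespace when called with no argument.

```shell
# Hypothetical helper (not part of kubectl): switch the current context's
# namespace, or go back to "default" when no argument is given.
kns() {
  kubectl config set-context --current --namespace="${1:-default}"
}
```

Drop it into your shell profile, then run kns team-a to switch and plain kns to switch back.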

Set resource limits

Edit manifests/deployment.yaml to add a resources section to your container spec:

resources:
  requests:
    cpu: "100m"
    memory: "64Mi"
  limits:
    cpu: "200m"
    memory: "128Mi"

The “requests” section is what the scheduler uses to determine which Node has space for this Pod. The “limits” section sets the actual ceilings enforced at runtime by the kernel.

Apply the manifest and see the constraints in action:

kubectl apply -f manifests/deployment.yaml -n team-a
kubectl describe pod -n team-a

You should see the requests and limits sections populated in the output.

OOMKill

Create a file called manifests/oom-test.yaml with the following contents:

apiVersion: v1
kind: Pod
metadata:
  name: oom-test
spec:
  containers:
    - name: hog
      image: polinux/stress
      resources:
        requests:
          memory: "50Mi"
        limits:
          memory: "100Mi"
      command: ["stress"]
      args: ["--vm", "1", "--vm-bytes", "200M", "--vm-hang", "1"]
  restartPolicy: Never

Here, we’re launching a process that attempts to consume 200 megs of RAM in a container limited to just 100Mi.

Apply the manifest, and look at the resulting Pod:

kubectl apply -f manifests/oom-test.yaml -n team-a
kubectl get pod oom-test -n team-a

You should see its status listed as OOMKilled. You can also see this when describing the Pod:

kubectl describe pod oom-test -n team-a

Note that the process isn’t being terminated by Kubernetes itself, or anything at the scheduling layer. This is the kernel’s cgroup memory controller killing the process for exceeding its hard limit.
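To see how far over the line this Pod is, here’s the comparison spelled out in shell. The to_bytes helper is a hypothetical, deliberately incomplete quantity parser (Mi and Gi only), and reading stress’s “200M” as 200 MiB is an assumption about how stress interprets its suffix; either reading lands well above the limit.

```shell
# Convert the binary-suffixed quantities used in this section to bytes.
# Only Mi and Gi are handled; this is not a full Kubernetes quantity parser.
to_bytes() {
  case "$1" in
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *) echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

limit=$(to_bytes 100Mi)          # the container's memory limit
wanted=$(( 200 * 1024 * 1024 ))  # what stress asks for, assuming 200M means 200 MiB

if [ "$wanted" -gt "$limit" ]; then
  echo "over the limit by $(( wanted - limit )) bytes"
fi
```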

CPU throttling

Create a file called manifests/cpu-test.yaml with the following contents:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-test
spec:
  containers:
    - name: hog
      image: polinux/stress
      resources:
        requests:
          cpu: "100m"
        limits:
          cpu: "200m"
      command: ["stress"]
      args: ["--cpu", "2"]

Here, we’re launching a process that attempts to occupy 2 full CPU cores in a container limited to just 20% of a core.

Apply the manifest, and look at the resulting Pod:

kubectl apply -f manifests/cpu-test.yaml -n team-a
kubectl get pod cpu-test -n team-a

Unlike the OOM case above, here the container is running just fine, albeit slowly: the CPU is being throttled down to the configured limit.

Here’s a dumb little mnemonic to help you remember the difference in behavior:

“No more room, RAM go boom; Hog the cores, process snores.”
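Under the hood, a CPU limit is a time budget rather than a hard wall: on Linux it’s translated into a CFS quota, i.e. how many microseconds of CPU time the container may use per scheduling period. Here’s a rough sketch of that conversion, assuming the default 100ms period; cpu_quota_us is a hypothetical helper name.

```shell
# Translate a CPU quantity in millicores (e.g. "200m") into a CFS quota:
# microseconds of CPU time allowed per scheduling period.
cpu_quota_us() {
  millicores=${1%m}
  period_us=${2:-100000}  # assumed default period of 100ms
  echo $(( millicores * period_us / 1000 ))
}

cpu_quota_us 200m   # 20000: what our limit allows per 100ms
cpu_quota_us 2000m  # 200000: what two busy stress workers would like to use
```

The container wants ten times more CPU time than it’s allowed, so it spends most of each period waiting rather than being killed.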

Namespace limits

Create a file called manifests/resourcequota.yaml with the following contents:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "1"
    requests.memory: "1Gi"
    limits.cpu: "2"
    limits.memory: "2Gi"
    pods: "10"

Then apply it:

kubectl apply -f manifests/resourcequota.yaml

Previously, we set requests and limits on individual containers, but now we’re doing it at the namespace level. In addition to “requests” and “limits”, we’ve also thrown in a “max 10 pods” constraint for good measure.

Let’s deploy our old nginx deployment into both namespaces, and scale them up.

kubectl apply -f manifests/deployment.yaml -n team-a
kubectl apply -f manifests/deployment.yaml -n team-b

kubectl scale deployment nginx-deploy --replicas=30 -n team-a
kubectl scale deployment nginx-deploy --replicas=30 -n team-b

kubectl get pods -A

You’ll see that while the deployment in the team-b namespace scaled up, the one in team-a was capped at 10 Pods. We can look at this in more detail:

kubectl get events -n team-a

You’ll see a number of events of the form

Error creating: pods "nginx-deploy-6b986887fc-mbcc4" is forbidden: exceeded quota: team-a-quota, requested: limits.cpu=200m,pods=1,requests.cpu=100m, used: limits.cpu=2,pods=10,requests.cpu=1, limited: limits.cpu=2,pods=10,requests.cpu=1
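Notice that three quota dimensions run out at the same time in that message. Here’s a back-of-the-envelope sketch of why, assuming each Pod has a single container with the requests and limits we set earlier and that nothing else is running in team-a. The helper names are made up; CPU values are in millicores and memory values in Mi.

```shell
# Return the smaller of two integers.
min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }

# How many Pods of our deployment's shape fit under team-a-quota?
max_pods() {
  # Per-Pod requests and limits from the deployment's resources section.
  req_cpu=100; lim_cpu=200; req_mem=64; lim_mem=128
  # Hard limits from team-a-quota.
  q_req_cpu=1000; q_lim_cpu=2000; q_req_mem=1024; q_lim_mem=2048; q_pods=10

  cap=$q_pods
  cap=$(min "$cap" $(( q_req_cpu / req_cpu )))  # 10 by requests.cpu
  cap=$(min "$cap" $(( q_lim_cpu / lim_cpu )))  # 10 by limits.cpu
  cap=$(min "$cap" $(( q_req_mem / req_mem )))  # 16 by requests.memory
  cap=$(min "$cap" $(( q_lim_mem / lim_mem )))  # 16 by limits.memory
  echo "$cap"
}

max_pods  # 10
```

The pod count and both CPU dimensions all cap out at 10, while memory would have allowed 16, which matches the “used” values in the event.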

Clean up

Deleting a namespace deletes everything in it:

kubectl delete namespace team-a
kubectl delete namespace team-b

Next steps

Stay tuned for Part 4!