
How to use Kubernetes with Microsoft Azure and GitHub and how to debug if it does not work out

Posted on: July 28, 2022

Kubernetes is a convenient way to deploy an infrastructure of services from a central place. Once your service Docker images are built and pushed, you can use Kubernetes to deploy multiple instances across the cloud. In this article, we will see how Microsoft Azure can take a Kubernetes configuration generated with Helm and deploy many images from the Azure Container Registry.

Create an Azure Kubernetes Service (AKS)

There are two ways to create an Azure Kubernetes Service: the Azure CLI and the Azure Portal.

Azure CLI

A few pieces of information are required that were created when we set up our Azure Docker image repository. You pass the Azure resource group name after --resource-group, the name of the Kubernetes service after --name, and the name of the Azure Container Registry (ACR) after --attach-acr.

az aks create --resource-group realtimepixel_resourcegroup --name realpixelask --location eastus --attach-acr realtimepixel --generate-ssh-keys

There are a few things to know:

  1. The name cannot contain an underscore. Even if you enclose it in double quotes, it does not work.
  2. The command takes a while to run. Expect at least one minute. (A verification sketch follows this list.)
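Once the command completes, it is worth confirming that the cluster exists and wiring kubectl up to it. A minimal sketch, assuming the resource group and cluster names used above:

# Confirm the cluster was created (names assume the example above)
az aks show --resource-group realtimepixel_resourcegroup --name realpixelask --output table

# Merge the cluster credentials into your local kubeconfig so kubectl can talk to AKS
az aks get-credentials --resource-group realtimepixel_resourcegroup --name realpixelask

# Quick sanity check that the nodes are up
kubectl get nodes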

Azure Portal

Many online tutorials show the Azure CLI (az command), but it is also possible to create the Kubernetes service on the Microsoft Azure Portal.

[Image: creating the Azure Kubernetes Service in the Azure Portal]

I personally prefer the web interface. For creating a Kubernetes cluster, the Azure Portal guides you through several steps, which in my opinion is worth it for an operation you will rarely perform compared to day-to-day Azure CLI commands.

Configure the GitHub Workflow

You could edit the existing GitHub workflow defined previously that builds the image and pushes it into the Azure Container Registry (ACR). However, I created a new workflow so I can manually decide when to push the image into production.

Here is the entire GitHub workflow, which I store in the repository at .github/workflows/k8sdeploy.yml:

name: Kubernetes Prod Deployment
on:
  workflow_dispatch:

# Environment variables available to all jobs and steps in this workflow
env:
  REGISTRY_NAME: realtimepixel
  CLUSTER_NAME: realpixelask2
  CLUSTER_RESOURCE_GROUP: realtimepixel_resourcegroup
  NAMESPACE: realtimepixel-prod

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@main

      # Set the target Azure Kubernetes Service (AKS) cluster.
      - uses: azure/aks-set-context@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
          cluster-name: ${{ env.CLUSTER_NAME }}
          resource-group: ${{ env.CLUSTER_RESOURCE_GROUP }}

      # Create namespace if doesn't exist
      - run: |
          kubectl create namespace ${{ env.NAMESPACE }} --dry-run=client -o json | kubectl apply -f -

      - name: Helm tool installer
        uses: Azure/setup-helm@v1

      - name: Azure Login
        uses: Azure/login@v1.1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Get Latest Tag Redis
        id: latesttagredis
        run: |
          tag_redis=$(az acr repository show-tags --name ${{ env.REGISTRY_NAME }} --repository realtimepixel_redis --top 1 --orderby time_desc -o tsv)
          echo "::set-output name=tag_redis::$tag_redis"

      - name: Tag Redis
        run: echo "Tag Redis is ${{ steps.latesttagredis.outputs.tag_redis }}"

      - name: Get Latest Tag Backend
        id: latesttagbackend
        run: |
          tag_backend=$(az acr repository show-tags --name ${{ env.REGISTRY_NAME }} --repository realtimepixel_backend --top 1 --orderby time_desc -o tsv)
          echo "::set-output name=tag_backend::$tag_backend"

      - name: Tag Backend
        run: echo "Tag Backend is ${{ steps.latesttagbackend.outputs.tag_backend }}"

      - name: Get Latest Tag Frontend
        id: latesttagfrontend
        run: |
          tag_frontend=$(az acr repository show-tags --name ${{ env.REGISTRY_NAME }} --repository realtimepixel_frontend --top 1 --orderby time_desc -o tsv)
          echo "::set-output name=tag_frontend::$tag_frontend"

      - name: Tag Frontend
        run: echo "Tag Frontend is ${{ steps.latesttagfrontend.outputs.tag_frontend }}"

      - name: Deploy
        run: >
          helm upgrade realtimepixel ./kubernetes/realtimepixel
          --install
          --namespace=${{ env.NAMESPACE }}
          --set namespace=${{ env.NAMESPACE }}
          --set image.pullPolicy=Always
          --set image.redis.repository=${{ env.REGISTRY_NAME }}.azurecr.io/realtimepixel_redis
          --set image.redis.tag=${{ steps.latesttagredis.outputs.tag_redis }}
          --set image.backend.repository=${{ env.REGISTRY_NAME }}.azurecr.io/realtimepixel_backend
          --set image.backend.tag=${{ steps.latesttagbackend.outputs.tag_backend }}
          --set image.frontend.repository=${{ env.REGISTRY_NAME }}.azurecr.io/realtimepixel_frontend
          --set image.frontend.tag=${{ steps.latesttagfrontend.outputs.tag_frontend }}

Here is a description of what is going on:

  1. The first section, called env, defines variables that can be used across the workflow. It is a simple way to configure data in a central place for the script. It defines the registry (Azure Container Registry) name, the Kubernetes cluster name created in this blog post, the Azure resource group (previous article), and the Kubernetes namespace.

  2. The second section connects to the Azure Kubernetes Service (AKS) cluster.

  3. We create the namespace

  4. We then install Helm and log in with Azure Login. From there, we are ready to run commands.

  5. First, we get the latest image tag for each image

  6. Finally, we use Helm to install or update the Kubernetes configuration. The important part is overriding many of the values from kubernetes/realtimepixel/values.yaml (the Helm values file).
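Because the workflow is triggered by workflow_dispatch, nothing deploys until you start it manually from the Actions tab. A small sketch of triggering it from a terminal instead, assuming the GitHub CLI (gh) is installed and authenticated against the repository and the file name used above:

gh workflow run k8sdeploy.yml   # start the manual deployment workflow
gh run watch                    # interactively pick the run and follow its progress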

Verification

At this point, the Helm command pushes the instructions to the Azure Kubernetes Service. Going into the portal, you can see the deployments under Workloads.

[Image: Azure Kubernetes Service workloads in the Azure Portal]
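You can also check the same thing from the command line instead of the portal. A hedged sketch, assuming the namespace used in the workflow:

helm list -n realtimepixel-prod                 # confirm the Helm release was installed or upgraded
kubectl get deployments -n realtimepixel-prod   # deployments created by the chart
kubectl get pods -n realtimepixel-prod          # pod status (Running, CrashLoopBackOff, etc.)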

Everything should be running as expected! The screenshot shows three orange symbols next to my three deployments because there is an issue with the Kubernetes configuration, which is outside the scope of this post. However, it is still interesting to know that you can drill in and see the error reason: ErrImageNeverPull.

Debug Azure Kubernetes ErrImageNeverPull

The first step is to find out why the image is not pulling. You need to get one of the pods under the deployment that is failing.

kubectl get pods -n realtimepixel-prod
kubectl describe pod redis-deployment-6495cd48cc-fhzjg -n realtimepixel-prod

The last command gives more information, saying the pull policy is set to Never:

Events:
Type     Reason             Age                    From               Message
----     ------             ----                   ----               -------
Normal   Scheduled          59m                    default-scheduler  Successfully assigned realtimepixel-prod/redis-deployment-6495cd48cc-fhzjg to aks-agentpool-28884595-vmss000002
Warning  Failed             57m (x12 over 59m)     kubelet            Error: ErrImageNeverPull
Warning  ErrImageNeverPull  4m36s (x260 over 59m)  kubelet            Container image "realtimepixel.azurecr.io/realtimepixel_redis:67dfbf27b868bd0b9e7c77aefafb596f2adb3ca0" is not present with pull policy of Never

So, once we know the problem is the Never policy, we can look at what is happening between Helm and Azure Kubernetes. A way to see what is sent to Kubernetes is to use Helm's template command:

helm template realtimepixel ./kubernetes/realtimepixel \
  --set namespace=realtimepixel-prod \
  --set image.pullPolicy=Always \
  --set image.redis.repository=realtimepixel.azurecr.io/realtimepixel_redis \
  --set image.redis.tag=123123 \
  --set image.backend.repository=realtimepixel.azurecr.io/realtimepixel_backend \
  --set image.backend.tag=123123 \
  --set image.frontend.repository=realtimepixel.azurecr.io/realtimepixel_frontend \
  --set image.frontend.tag=123123 > temp.yaml

I found a case-sensitivity issue with image.pullPolicy: the p was lowercase and needed to be uppercase.
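To confirm a fix like this took effect, you can check both what Helm renders and what the live pod is actually using. A small sketch, assuming the temp.yaml produced above and the Redis pod name from earlier:

# Look for the rendered pull policy in the output of helm template
grep -n "imagePullPolicy" temp.yaml

# Or ask the running pod directly which policy it was deployed with
kubectl get pod redis-deployment-6495cd48cc-fhzjg -n realtimepixel-prod \
  -o jsonpath='{.spec.containers[*].imagePullPolicy}'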

How to Debug with Azure Kubernetes Containers

The first issue was out of the way, and a second one appeared. This time, Azure Kubernetes was showing the pods in a CrashLoopBackOff state. There is a command to get the health of all your pods. In my case, some of them were crashing:

kubectl get pods -n realtimepixel-prod

The result:

NAME                                   READY   STATUS             RESTARTS         AGE
backend-deployment-69c99548d9-g2w5d    0/1     CrashLoopBackOff   10 (3m28s ago)   30m
backend-deployment-69c99548d9-wrhzp    0/1     CrashLoopBackOff   10 (3m42s ago)   30m
backend-deployment-7dfbc4f7f8-m99kl    0/1     CrashLoopBackOff   10 (3m51s ago)   109m
backend-deployment-7dfbc4f7f8-vbp24    0/1     CrashLoopBackOff   10 (4m4s ago)    109m
frontend-deployment-6f88fdb587-2p2lk   1/1     Running            0                30m
redis-deployment-5d48cc44bd-8w869      1/1     Running            0                30m

It is possible to describe one of the failing pods and look at its logs:

kubectl describe pod backend-deployment-69c99548d9-g2w5d -n realtimepixel-prod
kubectl logs backend-deployment-69c99548d9-g2w5d -n realtimepixel-prod

The output:

> start:production
> node -r ts-node/register/transpile-only -r tsconfig-paths/register build/backend/src/index.js

Error: Cannot find module '/node/build/backend/src/index.js'
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
    at Function.Module._resolveFilename.sharedData.moduleResolveFilenameHook.installedValue (/node/node_modules/@cspotcode/source-map-support/source-map-support.js:679:30)
    at Function.Module._resolveFilename (/node/node_modules/tsconfig-paths/src/register.ts:90:36)
    at Function.Module._load (node:internal/modules/cjs/loader:778:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
    at node:internal/main/run_main_module:17:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

My instinct was to get into the container.

kubectl exec -it backend-deployment-69c99548d9-g2w5d -n realtimepixel-prod -- sh

But it produced:

error: unable to upgrade connection: container not found...

The problem is that the container's run command fails because the starting file (index.js) is not present. And because the command cannot start, the container exits, not letting me get inside to inspect the folder structure and files.

You can test this by creating a pod from the problematic image that will not restart, using the following commands:

kubectl run debug-demo -n realtimepixel-prod --image=realtimepixel.azurecr.io/realtimepixel_backend:67dfbf27b868bd0b9e7c77aefafb596f2adb3ca0 --restart=Never
kubectl get pods -n realtimepixel-prod
kubectl exec -it debug-demo -n realtimepixel-prod -- sh
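Even with --restart=Never, this pod exits almost immediately because the image's start command still fails. A hedged variant, not from the original deployment, is to override the container command so the pod idles long enough for you to exec into it:

# Override the container command so the pod stays alive for inspection
# (delete any existing debug-demo pod first)
kubectl run debug-demo -n realtimepixel-prod \
  --image=realtimepixel.azurecr.io/realtimepixel_backend:67dfbf27b868bd0b9e7c77aefafb596f2adb3ca0 \
  --restart=Never --command -- sh -c "while true; do sleep 2; done"
kubectl exec -it debug-demo -n realtimepixel-prod -- sh

# Clean up the debug pod when done
kubectl delete pod debug-demo -n realtimepixel-prod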

Either way, when Kubernetes runs the image's real start command, it will still crash. However, the log above provides good information. In my case, I realized two things:

  1. When testing locally, I wasn't testing correctly. The build was passing because I had a node_modules folder whose dependencies were installed by running npm install, which added all the devDependencies. On GitHub, running the same command with NODE_ENV set to production caused npm install to install only the dependencies, without the devDependencies (see the sketch after this list).
  2. I added the --target option to the build for the multi-stage Dockerfile. I had the impression that it would start the build at the target stage, but it actually means the target is the last stage that gets built. Oddly, with docker-compose only the target stage is built, but with the docker command it runs both the development and the production portions.
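Here is the npm behavior behind the first point, as a hedged sketch: when NODE_ENV is production, npm install skips devDependencies, which is why the TypeScript tooling was missing in the CI build.

NODE_ENV=production npm install   # installs only "dependencies"; devDependencies are skipped
npm install --include=dev         # forces devDependencies to be installed as well (npm 7+)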

Still Not Working! What to do next?

In my case, it was still not green: the same CrashLoopBackOff error on the startup of the container. Time to go back to a local setup.

Instead of using docker-compose, I decided to mimic what I was doing in the GitHub workflow.

docker build -f ./services/backend/Dockerfile --target production --build-arg NODE_ENV=production .

Then get the image that was built and run it:

docker images
docker run 40d4941a9092
docker ps

Take the image ID from the docker images command and start a shell:

docker run -it 40d4941a9092 bash

I could see the error, but just like in Kubernetes, the container shut down right away. So now, time to keep the container alive by running an infinite loop instead of the failing command:

docker run 40d4941a9092 /bin/sh -c "while true; do sleep 2; df -h; done"

In another terminal window, grab the container ID with docker ps and open a shell inside the running container:

docker ps
docker exec -it <container_id> bash

At that point, I saw that the build was putting the TypeScript output in a different folder structure than the start script expected. So I modified the path and was good to go.
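A rough sketch of the kind of check that helps here: compare where the compiled output actually landed inside the container against the path the start script expects (build/backend/src/index.js, under /node according to the error log above).

# Inside the container: where did the TypeScript build actually put index.js?
find / -name "index.js" -path "*build*" 2>/dev/null

# Compare against the path the start script expects
ls -la /node/build/backend/src 2>/dev/null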

Conclusion

At this point, the image was created from a successful build.

[Image: successful Azure Kubernetes deployment]

I would say that the experience was enriching. However, one question kept nagging at the back of my mind: why isn't the Azure Portal guiding me with more than a single keyword for the failure? As we saw, by digging around with commands, we found the root cause, but some built-in guidance would have been great. Nonetheless, if something similar happens to you, you should now be equipped to diagnose it a little better.