AKS & Flux via Terraform
Introduction
We've been incredibly busy at BlakYaks and haven't had the chance to release as many technical articles as we would like. We are determined to up the cadence now and are going to kick off again with one of our favorite tools: Flux.
This article will demonstrate how to bootstrap Flux onto an existing Azure AKS cluster using Terraform, then delve into some examples of using Flux to deploy your resources. The article is technical in nature and deliberately uses low-level commands because, sometimes, you can't beat 'getting your hands dirty'.
What is Flux?
Flux falls into the Cloud Native Computing Foundation (CNCF) category of Continuous Integration & Delivery tools. The CNCF assigns maturity levels to cloud native software, and Flux is one of only two projects in this category (the other being Argo) assigned the graduated status, meaning they are stable and used successfully in production environments.
I think it’s fair to say that for a production environment you are probably going to want to consider either Argo or Flux at this point.
Flux in its own words, straight from the Flux Documentation:
Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories), and automating updates to configuration when there is new code to deploy.
Flux is built from the ground up to use Kubernetes' API extension system, and to integrate with Prometheus and other core components of the Kubernetes ecosystem. Flux supports multi-tenancy and syncing an arbitrary number of Git repositories.
Really this is introducing a GitOps model for continuous deployment of applications to Kubernetes. A quick definition of GitOps may be useful:
GitOps is a way of implementing Continuous Deployment for cloud native applications. It focuses on a developer-centric experience when operating infrastructure, by using tools developers are already familiar with, including Git and Continuous Deployment tools.
The core idea of GitOps is having a Git repository that always contains declarative descriptions of the infrastructure currently desired in the production environment and an automated process to make the production environment match the described state in the repository. If you want to deploy a new application or update an existing one, you only need to update the repository - the automated process handles everything else. It’s like having cruise control for managing your applications in production.
Rather than repeat too much of the documentation, we will make some notes on why we think this is interesting. Flux essentially reconciles desired state in a source Git repository with the current state on a Kubernetes cluster.
How is this different to how this might normally be done? Many teams starting out with DevOps for K8s will use a permissioned build agent to deploy resources to K8s in a push model. The build agent will typically pull down a kube.config file with powerful cluster permissions and use these credentials to push manifests, Helm charts, etc. onto the target clusters. Management of these credentials is a headache, and they typically don't adhere to the 'least privilege' principle very well.
With Flux you add an SSH key to your source control repo that has read-only access (unless you are doing image automation; we have a follow-up blog to come on this) and Flux then watches the repo for commit (desired state) changes and applies the changes locally within the cluster. There is no need for powerful credentials hanging around in key vaults and build agents.
The power of the GitOps model is a tricky one to grasp intuitively but we’ll hopefully help with this in the demonstration later in the article.
Installing Flux on AKS
1. Before you start
There are a few requirements for our demo code to run, so let’s spend some time checking these before we go any further:
- The code assumes you are using a Linux OS; we are using Ubuntu 22.04, but any other Debian-based distribution will be fine
- kubectl installed and connected to your cluster to follow the examples
- We also need ssh-keyscan, git and pwsh (PowerShell Core) installed on our OS. The latter we can install following this process
- You will also need the Azure CLI installed in order to log in to your Azure subscription. Obviously, you'll also need an AKS cluster ready and available to have Flux installed!
- We will use Terraform for our Flux bootstrap deployment. This needs to be at version 1.3.0 or later to work with our demo code
- Finally, you will need SSH access to read/write to your GitHub account - yes, you'll need one of those too!
Ready? Let’s move on.
2. Set up deploy keys
In this example, we’ll use GitHub as our target Flux repository. GitHub has supported read-only SSH keys for some time and provides a good base for our demo. To keep things simple, we’ll create a fork of the BlakYaks demo repository to use as our Flux target.
Browse to our repository, and click the Fork button to create a copy in your organization or account. You must ensure that "Copy the main branch only" is unticked, as we use one of the branches later in our example.
Once this is done, we will also need to create a read-only SSH (deploy) key that we can use with Flux to check in with our new forked repository.
Note: the Settings below are your forked repository settings, not your account settings.
Select Settings → Deploy keys → Add deploy key to create a new read-only SSH key.
Have a look here if you need a guide on creating a new SSH key pair.
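For instance, a dedicated key pair could be generated like this (the file name and comment are just illustrative):

```bash
# Generate an ed25519 key pair to use as the read-only deploy key
ssh-keygen -t ed25519 -C "flux-deploy-key" -f ~/.ssh/flux-deploy -N ""
# Paste the contents of ~/.ssh/flux-deploy.pub into the "Add deploy key" form
```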
Keep a note of your SSH private key location; you will need this when setting up your Flux configuration later.
3. Creating our Flux bootstrap configuration
Once you’ve created your fork, create a folder to hold the repository code and clone it locally, for example:
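Something like the following, substituting the SSH URL of your own fork (the folder name is illustrative):

```bash
mkdir -p ~/repos && cd ~/repos
git clone git@github.com:<your-account>/<your-fork>.git
cd <your-fork>
```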
We're assuming that you already have a read-write SSH key linked to your GitHub account and configured on your workstation
The terraform/my.auto.tfvars file needs to be updated to reflect your AKS environment. The latest Flux provider uses the same authentication method as the HashiCorp native Kubernetes provider to push the Flux components into the target cluster. In our demo we will pull our credentials directly from the AKS cluster (though you could also simply pass in a kubeconfig file). Edit the file in your repo with your details, and save when complete:
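As an illustration only - the variable names here are assumptions, so check the variable definitions in your fork for the actual schema:

```hcl
# my.auto.tfvars - illustrative values; variable names are assumptions
aks_cluster_name      = "aks-flux-demo"
aks_resource_group    = "rg-flux-demo"
github_repo_url       = "ssh://git@github.com/<your-account>/<your-fork>.git"
bootstrap_credentials = "~/.ssh/id_ed25519"  # read/write SSH key
flux_credentials      = "~/.ssh/flux-deploy" # read-only deploy key
```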
Notes on the credentials:

- The bootstrap_credentials should reference your administrative SSH private key. This is used for the initial bootstrap of the cluster, and therefore requires read/write access to your repo to upload the installation manifests once they have been created
- The flux_credentials should reference your read-only (deploy) SSH key pair, created earlier. This key will be used by the Flux source controller to check and pull updates from your repo. It won't need to write anything back to the target repository
4. Bootstrap the cluster
We will use Terraform to deploy Flux onto an existing AKS cluster. For the sake of brevity, we’ll assume you’ve already built your cluster and have access to it. Our demo codebase also requires you have logged in to your Azure tenant and have selected your target subscription, so let’s do this first, using the Azure CLI:
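For example, substituting your own subscription ID:

```bash
az login
az account set --subscription "<your-subscription-id>"
```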
Bootstrapping the cluster is now simply a task of running a Terraform apply against our code.
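From the terraform folder of your clone, that looks like:

```bash
cd terraform
terraform init
terraform apply
```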
If all went to plan, your cluster should now be running the Flux controllers within a new flux-system namespace. Let's double check:
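A quick check with kubectl:

```bash
kubectl get pods -n flux-system
# Expect helm-controller, kustomize-controller, notification-controller
# and source-controller pods, all in the Running state
```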
Make sure that you have set your kubectl context to your cluster before you run these commands
Finally, let’s perform one last piece of tidying up before we move on:
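This is the step that removes the read/write bootstrap secret (the reasons are explained below):

```bash
# Remove the original read/write secret created during bootstrap;
# Flux now syncs using the read-only flux-sync secret instead
kubectl delete secret flux-system -n flux-system
```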
Awesome!
So, what exactly just happened?
The current Flux provider contains a single resource, flux_bootstrap_git, which is responsible not only for installing Flux onto the target cluster, but also for updating the source code repository with the bootstrap files. This essentially provides us with a blank canvas to start adding additional kustomizations, which we'll do in a short while. The files added to the flux/clusters/development/flux-system folder of your fork (and deployed to your cluster) are gotk-components.yaml, gotk-sync.yaml and kustomization.yaml.
We will describe these in more detail in the Flux architecture section.
Points worth mentioning about the installation process:
- The current Flux provider assumes that you want to run synchronisations using your read/write SSH key, which we provided in the bootstrap_credentials variable. If we use a read-only key during bootstrapping, the process will fail since there is no option to skip the upload of files back into Git with this resource
- To work around the limitation, we create a second secret (flux-sync), and then use a kustomization_override that adds a patch to the standard Flux deployment so that it uses this secret rather than the original read/write secret (flux-system)
- The original secret can be deleted once we have finished our bootstrapping, which is why we ran the kubectl delete secret ... step earlier
Now, let's move on. The Terraform code has made changes to your source control branch, so you will need to pull the changes down:

git pull

…and then push your updated my.auto.tfvars file back up:

git add -A && git commit -m "Updated auto vars file" && git push
Now that we’ve installed Flux, let’s explore what Flux looks like inside our cluster….
Flux Architecture
Flux has 4 main controllers:
Definitions from Flux:
- Helm Controller: The Helm Controller is a Kubernetes operator, allowing one to declaratively manage Helm chart releases with Kubernetes manifests
- Kustomize Controller: The kustomize-controller is a Kubernetes operator, specialized in running continuous delivery pipelines for infrastructure and workloads defined with Kubernetes manifests and assembled with Kustomize
- Notification Controller: The Notification Controller is a Kubernetes operator, specialized in handling inbound and outbound events
- Source Controller: The main role of the source management component is to provide a common interface for artifacts acquisition. The source API defines a set of Kubernetes objects that cluster admins and various automated operators can interact with to offload the Git and Helm repositories operations to a dedicated controller
Note: There are another couple of controllers that are optionally installed for image automation. We will cover these another time
The official Flux documentation is good (and pleasingly succinct) here, so we highly recommend taking a few minutes to review it.
You should have the controllers running within their deployments:
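A quick look at the deployments:

```bash
kubectl get deployments -n flux-system
```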
The controllers are underpinned by a set of CustomResourceDefinition objects:
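You can list them with:

```bash
kubectl get crds | grep fluxcd
```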
But most of your work with Flux is via the surfaced API resources:
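Which you can enumerate with:

```bash
kubectl api-resources | grep fluxcd
```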
Official docs exist for each of these API resources. Hopefully the article is self-sufficient, so you could get away without reading them (maybe with the exception of kustomization, worth reading that one), but if you want to follow the docs as we go:

- For the example section of the article we will be using: kustomization, helmrelease, helmchart, helmrepo, gitrepo
- For the alert section we will be using: alert, provider
When we applied our Terraform to the cluster we created these resources for the Flux system, but we also configured Flux to monitor its own configuration and reconcile when there are changes in the source repository. Let's go through the initial config files one by one:
1 gotk-components.yaml - Essentially installs Flux itself: namespace, CRDs, service accounts, roles, role bindings, services, deployments, etc.

2 gotk-sync.yaml - Configures Flux to reconcile with the source repository. It creates two resources:

- A GitRepository source called flux-system. Using this resource Flux is now aware of, and permissioned to read from, the forked source control repository
- A Kustomization resource, also called flux-system. This resource is used to tell Flux to reconcile with the GitRepository source above, every 10 mins, on the path ./flux/clusters/development

3 kustomization.yaml - Flux will look in the path supplied in the flux-system Kustomization for a file named kustomization.yaml. These files are the entry point for Flux to understand what it should do with the folder contents. Essentially this file tells Flux to reconcile gotk-components.yaml and gotk-sync.yaml:
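The generated file is tiny and looks something like this (a sketch of what the bootstrap writes):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
```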
Now, there are a fair few concepts introduced here. Let's run through an example of adding a new application to our source repository and having that sync down to the cluster. This will hopefully make things clearer. As kube-state-metrics is a service we commonly see installed to expose metrics to logging systems, we'll use it in the following example.
Example: kube-state-metrics (KSM)
Note: We are going to skip best practice on repository structure for this example in preference for clarity. Getting this right in production environments, however, is key for the smooth operation of Flux.
In your fork there is a branch called feature/ksm. Go to the branches in GitHub and create a new pull request from the feature/ksm branch. Ensure the pull request is pulling from feature/ksm into your fork's base: main branch, and not back to the BlakYaks source. Click Create pull request, then Merge pull request and Confirm merge on the next screen.
This will have pulled in the file and folder structure we discuss in the rest of the example and, crucially, Flux will also have reconciled the changes and deployed the resources to the cluster. Sections 1 & 2 below were triggered by this pull request; they describe exactly what we did.
1. ‘Before’ resource configuration
What are we trying to achieve here? We have configured Flux to:

- Create the metrics namespace where we install kube-state-metrics
- Create a new Helm repository source referencing https://prometheus-community.github.io/helm-charts, which is where we will pull KSM from
When using Flux there are typically going to be some things you need to do before you can pull your applications down. In our example we need to create a metrics namespace into which we will install KSM, and we will also create the HelmRepository resource so that Flux knows where to pull the KSM Helm chart from.
The file and folder structure we created is (we will step through this shortly):
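Reconstructed from the walkthrough below (a sketch - check your fork for the exact layout), with the numbers matching the items we step through:

├───apps
│   └───development
│       ├───before
│       │   ├───kustomization.yaml            (2)
│       │   ├───helmrepos
│       │   │   ├───kustomization.yaml        (5)
│       │   │   └───prometheus-community.yaml (6)
│       │   └───namespaces
│       │       ├───kustomization.yaml        (3)
│       │       └───metrics.yaml              (4)
│       └───kube-state-metrics
└───clusters
    └───development
        ├───apps
        │   └───before.yaml                   (1)
        └───flux-system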
A git repo source was created by Terraform and is already being monitored by Flux. The gitrepo source object created by Terraform looks like this (some attributes removed for brevity):
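A sketch of the object (the apiVersion and some details will vary with your Flux version):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-sync
  url: ssh://git@github.com/<your-account>/<your-fork>.git
```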
The url tells you we are watching our forked repo and ref.branch tells you we are reconciling with the main branch. The interval attribute tells you we are checking the repo every minute for new commits. Changes made to this source repository in the correct manner will be reconciled by any clusters monitoring the folders.
It is probably worth taking a moment here to go over the files within the folder structure while defining their purpose. There is an element of building by convention with Flux which can get really confusing if you are not familiar with it. The example structure given in the official docs is:
├── apps
│ ├── base
│ ├── production
│ └── staging
├── infrastructure
│ ├── base
│ ├── production
│ └── staging
└── clusters
├── production
└── staging
With the description:
Each cluster state is defined in a dedicated dir e.g. clusters/production where the specific apps and infrastructure overlays are referenced.
The separation between apps and infrastructure makes it possible to define the order in which a cluster is reconciled, e.g. first the cluster addons and other Kubernetes controllers, then the applications.
We have a simplified version of this in our forked repo:
├───apps
│ └───development
│ ├───before
│ │ ├───helmrepos
│ │ └───namespaces
│ └───kube-state-metrics
└───clusters
└───development
├───apps
└───flux-system
So, let’s go through it in detail:
1 before.yaml - The files under this directory can be thought of as the entry point which tells Flux what to reconcile and how often to do it. This file creates a Flux Kustomization called before-apps:
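A sketch of the resource (the apiVersion and interval are assumptions; the path is inferred from the repo layout):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: before-apps
  namespace: flux-system
spec:
  interval: 1m0s                         # assumption; check the fork
  path: ./flux/apps/development/before   # directs Flux to item 2
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```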
Note: the spec.path field is where we direct Flux to item 2 in our diagram
2 kustomization.yaml - This Kustomization is purely organisational and directs Flux to our helmrepos and namespaces folders:
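Something along these lines:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespaces
  - helmrepos
```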
3 kustomization.yaml - When Flux is directed to this folder it doesn't know what to do with the contents, so you have another kustomization file to instruct it what to do:
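In this case it simply lists the namespace manifest:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - metrics.yaml
```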
4 metrics.yaml - Standard K8s namespace yaml manifest:
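Nothing surprising here:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: metrics
```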
5 kustomization.yaml - Same deal as no. 3, lets Flux know what to do with the contents of this folder:
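Again, just pointing at the repository manifest:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - prometheus-community.yaml
```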
6 prometheus-community.yaml - Defines our Helm repository source so Flux is aware of the prometheus-community Helm chart repository:
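A sketch of the HelmRepository (the namespace and apiVersion are assumptions):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: flux-system   # assumption; check the fork
spec:
  interval: 1h
  url: https://prometheus-community.github.io/helm-charts
```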
We should be able to see our new before-apps kustomization (and kube-state-metrics, described in the next section) and be able to check the status:
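Using kubectl:

```bash
kubectl get kustomizations -n flux-system
```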
Given that our kustomization is (hopefully) reporting that the revision is applied, we should see our new metrics namespace:
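For example:

```bash
kubectl get namespace metrics
```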
…and our new prometheus-community helmrepo source:
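Listing across all namespaces, since the repository's namespace depends on the fork's manifests:

```bash
kubectl get helmrepositories -A
```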
2. kube-state-metrics configuration
What are we trying to achieve here? We have configured Flux to:

- Deploy and reconcile the KSM chart from the prometheus-community Helm chart source repository
We followed the same pattern for the folder structure as before. At the risk of labouring the point (we will anyway, because it's really important), let's go through the files again:
1 kube-state-metrics.yaml - Our entry point, defining our source, path and interval:
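A sketch (the apiVersion and interval are assumptions; the path is inferred from the repo layout):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kube-state-metrics
  namespace: flux-system
spec:
  dependsOn:
    - name: before-apps    # wait until before-apps has deployed
  interval: 1m0s           # assumption; check the fork
  path: ./flux/apps/development/kube-state-metrics
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```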
- The dependsOn entry is important. This dependency ensures this Kustomization will wait until the before-apps Kustomization has deployed successfully
- The path is where we direct Flux to item 2 in our diagram
2 kustomization.yaml - Simply directing Flux to our HelmRelease resource which will control our Helm release:
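One line of substance:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - kube-state-metrics.helm.yaml
```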
3 kube-state-metrics.helm.yaml - The resource which will actually deploy KSM to our cluster:
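A sketch of the HelmRelease (the apiVersion, namespace and interval are assumptions; the chart version matches the fork):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kube-state-metrics
  namespace: metrics              # assumption
spec:
  interval: 5m                    # assumption
  chart:
    spec:
      chart: kube-state-metrics
      version: "4.30.0"
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: flux-system    # assumption, matching our earlier sketch
```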
Note: the reconcileStrategy field sets the strategy to use for chart reconciliation
We should see our new kube-state-metrics kustomization with READY=True:
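Checking with kubectl:

```bash
kubectl get kustomization kube-state-metrics -n flux-system
```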
We should also see the helmrelease from file kube-state-metrics.helm.yaml with READY=True:
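Again across all namespaces, to be safe:

```bash
kubectl get helmreleases -A
```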
And our running kube-state-metrics deployment:
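Assuming the metrics namespace from our sketches:

```bash
kubectl get deployment -n metrics
```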
Now there is, admittedly, a lot to understand here. But we have to remember that what we are doing is configuring automation and typically the payoff for this is after you have set it all up. Let’s have a look at what we need to do to upgrade KSM to the latest version in the next section.
3. KSM Upgrade
From our fork we installed kube-state-metrics pinned to chart v4.30.0:
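Helm itself can confirm the deployed chart version (namespace assumed):

```bash
helm list -n metrics
```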
The chart, in turn, has pinned the KSM image version to v2.8.0.
Let's upgrade the chart version in our fork. Change your kube-state-metrics.helm.yaml file so that the chart version is now 5.0.0:
Note: the chart version field needs to change from 4.30.0 to 5.0.0
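That is, within the chart spec of the HelmRelease from earlier:

```yaml
  chart:
    spec:
      chart: kube-state-metrics
      version: "5.0.0"   # was "4.30.0"
```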
And commit this back to source control:
git add -A && git commit -m "Upgrading KSM chart to v5.0.0" && git push
After a minute or so the Kustomization will pick up the change, your chart version will be upgraded to 5.0.0, and the running image version will be upgraded to v2.8.1. We can verify both:
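A couple of quick checks (namespace assumed):

```bash
# The chart should now show as kube-state-metrics-5.0.0
helm list -n metrics
# The pod image should now be at v2.8.1
kubectl get pods -n metrics -o jsonpath='{.items[*].spec.containers[*].image}'
```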
Let's demonstrate one more change. There is an experimental feature of KSM that will allow you to run it as a statefulset and have the data sharded across the pods in the set. We can set the values that are submitted to the Helm chart (as you would with a Helm values file) via the kube-state-metrics.helm.yaml file:
Note: a values block needs to be added to the HelmRelease spec
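A sketch of the addition; the chart's autosharding flag is the documented way to switch KSM into a sharded statefulset, though the exact values in the fork may differ:

```yaml
  values:
    autosharding:
      enabled: true   # run KSM as a statefulset with sharded data
    replicas: 2       # assumption: two shards for the demo
```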
After updating, commit back to source control:
git add -A && git commit -m "Enabling KSM statefulset sharding" && git push
After a minute or so we can check our resources and we should see our new statefulset:
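For example:

```bash
kubectl get statefulset -n metrics
```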
This is (hopefully) where the power of the GitOps model becomes apparent. A very simple change to a git repository was all it took to upgrade our service and, because it was done in source control, we have an audit trail and a mechanism to undo the changes simply by reverting the commits.
In the real world we wouldn’t be committing directly to main source control branches the way we are here. The branches would have protection policies and approvers which mean only validated changes can be made to the clusters (maybe an article for another time).
Setting up notifications
The current documentation is a little sparse in places, so let’s look at an example of how we would setup an integration between our Flux deployment and Microsoft Teams. It’s probably worth an overview of the Notification controller at this point:
The controller handles all ingress and egress events for the Flux reconcilers; this includes receivers which can trigger events based on external system (incoming) webhooks, and providers, which interface with external systems. Alerts are defined that stipulate which events will trigger communication with specific providers.
Microsoft Teams allows integration with third party systems using incoming webhooks, so to link our Flux deployment to Teams, we have to create the following:

- A provider instance. This defines the address that Flux should send its payload to when requested
- One or more alert instances, linked to the provider. The alerts define which events Flux should send notifications for
On the Teams side, we need to set up an Incoming Webhook connector, as per the guide here. Make a note of the connector URL, and update the notifications/flux-msteams.yaml file in the forked repo with the URL:
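The file defines a Provider and an Alert along these lines (a sketch; the names, apiVersions and event sources are assumptions, so check the fork):

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Provider
metadata:
  name: msteams
  namespace: flux-system
spec:
  type: msteams
  address: https://<your-tenant>.webhook.office.com/...   # your connector URL
---
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: msteams-alerts
  namespace: flux-system
spec:
  providerRef:
    name: msteams
  eventSeverity: info   # deliberately chatty for the demo
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
    - kind: GitRepository
      name: '*'
```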
Note: We are going to apply and delete the manifests directly using kubectl to keep this bit as simple as possible, you should of course do this via a Kustomization :)
Once saved, apply the file into your cluster:
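Using the same file path:

```bash
kubectl apply -f notifications/flux-msteams.yaml
```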
Within a few minutes, alerts should begin to appear within Teams:
You will probably notice that the example above is quite chatty. In a production environment you would definitely want to tune the alerts down to a more sensible level, but for demo purposes this should give you some messages to look at. Once you’ve seen enough, you can stop the alerts by removing the objects from Kubernetes:
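For example:

```bash
kubectl delete -f notifications/flux-msteams.yaml
```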
Wrapping Up
Hopefully by now you’ve got an idea of why we’re big advocates of GitOps workflows and, in particular, Flux.
The convention-based approach allows us to build complex application deployments with ease, and by leveraging source control best practices such as pull requests and branch policy we can quickly integrate deployments into our existing CI/CD toolchains in a secure and scalable manner.
Look out for our follow-on blogs where we’ll be using Flux to demonstrate some more Kubernetes features!