K8s policy with Kyverno

Kubernetes is a complex beast, and any best-practice or security guide you read will hit you with dozens of rules your clusters should adhere to in order to be manageable and secure. In most situations, those rules are only worth their salt if they are at least audited and, ideally, enforced. Kubernetes policy lets you define your rules as code, then audit and enforce them as you see fit.

Kyverno, a Kubernetes policy engine

Kyverno is a K8s policy engine that we like a lot. From their website:

"Kyverno is a policy engine designed for Kubernetes. With Kyverno, policies are managed as Kubernetes resources and no new language is required to write policies. This allows using familiar tools such as kubectl, git, and kustomize to manage policies. Kyverno policies can validate, mutate, and generate Kubernetes resources."

Kyverno is a CNCF sandbox project, and one of the first questions might be: why would you use it over the graduated Open Policy Agent (OPA)? While OPA is great, it is complex, with a high barrier to entry, requiring adopters to learn the Rego policy language. For example, a simple policy that requires a specific label to be applied to resources is defined in a constraint template something like this:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
        listKind: K8sRequiredLabelsList
        plural: k8srequiredlabels
        singular: k8srequiredlabels
      validation:
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }

...and then the constraint applied with:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-cc
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["app.kubernetes.io/costcentre"]

While with Kyverno I can define my policy like this:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
  annotations:
    policies.kyverno.io/title: Require labels
    policies.kyverno.io/category: Best practice
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Namespace
    policies.kyverno.io/description: >-
      Require an app.kubernetes.io/costcentre label
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: check-for-labels
    match:
      resources:
        kinds:
        - Namespace
    validate:
      message: "The label `app.kubernetes.io/costcentre` is required."
      pattern:
        metadata:
          labels:
            app.kubernetes.io/costcentre: "?*"

Even in a trivial example there is a huge difference in how easy the latter is to understand over the former, and OPA's Rego can get pretty hairy as complexity increases. OPA is undoubtedly more flexible than Kyverno but, for a lot of teams managing Kubernetes clusters, that complexity is too much of a maintenance and management burden. Horses for courses and all that.

Installing Kyverno

Kyverno's documentation is good here, so I won't dwell on this. To install Kyverno and the Policy Reporter GUI:

helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace

Install the Policy Reporter GUI:

helm repo add policy-reporter https://kyverno.github.io/policy-reporter
helm repo update
helm install policy-reporter policy-reporter/policy-reporter --set kyvernoPlugin.enabled=true --set ui.enabled=true --set ui.plugins.kyverno=true -n policy-reporter --create-namespace

Port-forward to the GUI:

kubectl port-forward service/policy-reporter-ui 8082:8080 -n policy-reporter

Writing Kyverno policies

The basics of writing a Kyverno policy are: select the resources the policy applies to, then define what to validate, mutate, or generate for those resources. We will use our require-labels policy as our example:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
  annotations:
    policies.kyverno.io/title: Require labels
    policies.kyverno.io/category: Best practice
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Namespace
    policies.kyverno.io/description: >-
      Require an app.kubernetes.io/costcentre label
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: check-for-labels
    match:
      resources:
        kinds:
        - Namespace
    validate:
      message: "The label `app.kubernetes.io/costcentre` is required."
      pattern:
        metadata:
          labels:
            app.kubernetes.io/costcentre: "?*"

Select resources

Select the resources that this policy will apply to. So in our require-labels example:

...
    match:
      resources:
        kinds:
        - Namespace
...

The policy will apply to all of my namespaces. I'm using kind here but you can of course select on things like names and labels.
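
For example, here is a sketch of a match block that also filters on a resource name wildcard and a label selector (the name pattern and label here are illustrative, not from our policy):

...
    match:
      resources:
        kinds:
        - Namespace
        name: "team-*"
        selector:
          matchLabels:
            environment: production
...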

Validate resources

Validate that the resources match my standard. In the example this was ensuring the app.kubernetes.io/costcentre label had something in it:

...
    validate:
      message: "The label `app.kubernetes.io/costcentre` is required."
      pattern:
        metadata:
          labels:
            app.kubernetes.io/costcentre: "?*"
...

What happens when validation fails? The line:

validationFailureAction: enforce

...in the example means the resource will not be allowed to deploy without the label, as we are enforcing the rule. It can also be set to audit, in which case the resource is allowed to deploy but the failure is reported.
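
Switching the same policy to report-only mode is a one-line change:

validationFailureAction: audit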

Mutate resources

You can modify, or mutate, a resource to match your standards rather than rejecting it. For example, if you want to make sure all pods have a label type=user, you can mutate the resource as it enters the system:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-label
  annotations:
    policies.kyverno.io/title: Add default label
    policies.kyverno.io/category: Best practice
    policies.kyverno.io/severity: low
    policies.kyverno.io/subject: Label
    policies.kyverno.io/description: >-
      This policy performs a simple mutation which adds a label
      `type=user` to Pods.
spec:
  rules:
  - name: add-label
    match:
      resources:
        kinds:
        - Pod
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            type: user
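
A quick way to check the mutation (a sketch; the pod name is arbitrary) is to create a pod without the label and inspect its labels after admission:

kubectl run mutate-test --image=busybox:latest -- sleep 3600
kubectl get pod mutate-test --show-labels

The labels column should now include type=user, added by Kyverno as the pod entered the system.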

Generate resources

Generate rules can create additional resources. For example, it is good practice in K8s to set a network policy that denies all traffic to and from a namespace by default, then allow traffic only with explicit rules. This policy will apply a default deny rule to any namespace created or updated that is not in the exclusion list (kube-system, default, kube-public, kyverno):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: default
spec:
  rules:
  - name: deny-all-traffic
    match:
      resources:
        kinds:
        - Namespace
    exclude:
      resources:
        namespaces:
        - kube-system
        - default
        - kube-public
        - kyverno
    generate:
      kind: NetworkPolicy
      name: deny-all-traffic
      namespace: "{{request.object.metadata.name}}"
      data:  
        spec:
          # select all pods in the namespace
          podSelector: {}
          policyTypes:
          - Ingress
          - Egress
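
To verify the generate rule (a sketch; the namespace name is arbitrary, and the costcentre label is included so it passes our earlier validation policy):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/costcentre: "engineering"
  name: netpol-test
EOF
kubectl get networkpolicy -n netpol-test

The generated deny-all-traffic NetworkPolicy should be listed in the new namespace.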

Testing

Example 1 - labels

In a file called require_label_ns.yaml in my policies folder, I have defined this namespace label rule to ensure the app.kubernetes.io/costcentre label is on all new namespaces:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
  annotations:
    policies.kyverno.io/title: Require labels
    policies.kyverno.io/category: Best practice
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Namespace
    policies.kyverno.io/description: >-
      Require an app.kubernetes.io/costcentre label
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: check-for-labels
    match:
      resources:
        kinds:
        - Namespace
    validate:
      message: "The label `app.kubernetes.io/costcentre` is required."
      pattern:
        metadata:
          labels:
            app.kubernetes.io/costcentre: "?*"

Now apply it to the cluster:

$ kubectl apply -f policies/require_label_ns.yaml
clusterpolicy.kyverno.io/require-labels created

Now try to add a namespace that does not have the app.kubernetes.io/costcentre label:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: kyverno-testing
  name: kyverno-testing
EOF
resource Namespace//kyverno-testing was blocked due to the following policies

require-labels:
  check-for-labels: 'validation error: The label `app.kubernetes.io/costcentre` is
    required. Rule check-for-labels failed at path /metadata/labels/app.kubernetes.io/costcentre/'

As we are enforcing the policy, the namespace creation is rightly denied.

Now try to add a namespace that does have the app.kubernetes.io/costcentre label:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/costcentre: "engineering"
    name: kyverno-testing
  name: kyverno-testing
EOF
namespace/kyverno-testing created

Example 2 - requests and limits

In a file called require_pod_requests_limits.yaml in my policies folder, I have defined this pod rule to ensure all containers have CPU and memory requests and limits:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
  annotations:
    policies.kyverno.io/title: Require Limits and Requests
    policies.kyverno.io/category: Multi-Tenancy
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      This policy validates that all containers have specified memory and CPU requests and limits.
spec:
  validationFailureAction: enforce
  background: true
  rules:
  - name: validate-resources
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "CPU and memory resource requests and limits are required."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                memory: "?*"
                cpu: "?*"
              limits:
                memory: "?*"
                cpu: "?*"

Now apply it to the cluster:

$ kubectl apply -f policies/require_pod_requests_limits.yaml
clusterpolicy.kyverno.io/require-requests-limits created

Now try to add a pod that does not set any requests or limits:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox1
  labels:
    app: busybox1
spec:
  containers:
  - image: busybox:latest
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
EOF

The creation is denied:

resource Pod/default/busybox1 was blocked due to the following policies

require-requests-limits:
  validate-resources: 'validation error: CPU and memory resource requests and limits
    are required. Rule validate-resources failed at path /spec/containers/0/resources/limits/'

Now try again but with requests and limits set:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox1
  labels:
    app: busybox1
spec:
  containers:
  - image: busybox:latest
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
    resources:
      requests:
        memory: "50Mi"
        cpu: "100m"
      limits:
        memory: "50Mi"
        cpu: "100m"
  restartPolicy: Always
EOF
pod/busybox1 created

Example 3 - Default namespace quotas

Namespace resource quotas are a good way to govern fair usage of a cluster shared by multiple teams. Ideally, application teams will be able to help define their overall resource requirements for a namespace and you can set these limits in a resource quota. In reality it is not always practical to do this. Where resource quotas are not set, it is helpful to set defaults to ensure that one namespace doesn't overwhelm cluster resources and cause issues for its neighbors.

In a file called add_ns_quota.yaml in my policies folder, I have defined a namespace rule to generate resource quotas:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-ns-quota
  annotations:
    policies.kyverno.io/title: Add Quota
    policies.kyverno.io/category: Multi-Tenancy
    policies.kyverno.io/subject: ResourceQuota
    policies.kyverno.io/description: >-
      This policy will generate ResourceQuota resources 
      when a new Namespace is created.
spec:
  rules:
  - name: generate-resourcequota
    match:
      resources:
        kinds:
        - Namespace
    generate:
      kind: ResourceQuota
      name: default-resourcequota
      synchronize: true
      namespace: "{{request.object.metadata.name}}"
      data:
        spec:
          hard:
            requests.cpu: '4'
            requests.memory: '16Gi'
            limits.cpu: '4'
            limits.memory: '16Gi'
            requests.storage: '100Gi'
            persistentvolumeclaims: 5

Now apply it to the cluster:

$ kubectl apply -f policies/add_ns_quota.yaml
clusterpolicy.kyverno.io/add-ns-quota created

And apply my namespace again from the label test:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/costcentre: "engineering"
    name: kyverno-testing
  name: kyverno-testing
EOF
namespace/kyverno-testing created

The namespace was created OK. Now, what about my resource quota:

kubectl get resourceQuotas -n kyverno-testing -o yaml

Some output removed for brevity:

apiVersion: v1
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    creationTimestamp: "2021-10-07T14:14:48Z"
    labels:
      app.kubernetes.io/managed-by: kyverno
      kyverno.io/generated-by-kind: Namespace
      kyverno.io/generated-by-name: kyverno-testing
      kyverno.io/generated-by-namespace: ""
      policy.kyverno.io/gr-name: gr-xnvgw
      policy.kyverno.io/policy-name: add-ns-quota
      policy.kyverno.io/synchronize: enable
    name: default-resourcequota
    namespace: kyverno-testing
  spec:
    hard:
      limits.cpu: "4"
      limits.memory: 16Gi
      persistentvolumeclaims: "5"
      requests.cpu: "4"
      requests.memory: 16Gi
      requests.storage: 100Gi

We see that Kyverno generated the resource quota as per our policy definition.

Policy definition

I've glossed over some of the YAML in the policy definitions, so let's go through one of them line-by-line to be clear about what is being done. We'll use the 'require requests and limits' policy:

Lines 1 and 2 set the Kyverno API version and the kind ClusterPolicy:
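
apiVersion: kyverno.io/v1
kind: ClusterPolicy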

The policies can be either cluster scoped with ClusterPolicy or namespace scoped with Policy.

Lines 3-11 are the metadata for organising and describing your policies. It is important to organise this properly when you have a lot of policies:
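
metadata:
  name: require-requests-limits
  annotations:
    policies.kyverno.io/title: Require Limits and Requests
    policies.kyverno.io/category: Multi-Tenancy
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      This policy validates that all containers have specified memory and CPU requests and limits.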

From line 12 onwards is the specification of the policy.

Line 13 determines the action taken when the policy is violated. With enforce, resource creation is denied on violation; with audit, resource creation succeeds but the violation is recorded in the log:
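
  validationFailureAction: enforce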

The background option on line 14 tells Kyverno what to do with existing cluster resources. They will be audited if set to true:
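
  background: true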

From line 15 are the actual rule definitions, of which there can be multiple. Line 16 starts our first (and only) rule and names it validate-resources.
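
  rules:
  - name: validate-resources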

Lines 17-20 define the resources we are going to match and thus apply this policy to. Every Pod in our case:
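
    match:
      resources:
        kinds:
        - Pod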

Line 21 defines the type of policy rule. validate in our case but can also be mutate or generate:
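
    validate: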

Line 22 is the policy rule violation message:
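
      message: "CPU and memory resource requests and limits are required."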

Lines 23-32 define the pattern we need to match for the policy rule to be passed:
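
      pattern:
        spec:
          containers:
          - resources:
              requests:
                memory: "?*"
                cpu: "?*"
              limits:
                memory: "?*"
                cpu: "?*"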

So for a pod to pass the rule it must have:

  • Something in the pod's spec.containers.resources.requests.memory field
  • Something in the pod's spec.containers.resources.requests.cpu field
  • Something in the pod's spec.containers.resources.limits.memory field
  • Something in the pod's spec.containers.resources.limits.cpu field

The 'something' in our example is the ?* wildcard:

  • ? - matches a single alphanumeric character
  • * - matches zero or more alphanumeric characters
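
For example, cpu: "100m" satisfies "?*", while an empty value or a missing field does not.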

Kyverno policy reports

If we are enforcing policy you can see the impact when resources are denied or mutated, but what about audit? For our namespace label cluster policy we have clusterpolicyreports that report the current state of the policies:

kubectl get clusterpolicyreports

Gives me:

NAME                  PASS   FAIL   WARN   ERROR   SKIP   AGE
clusterpolicyreport   0      2      0      0       0      8m9s

...and digging into the output of the report:

kubectl get clusterpolicyreports clusterpolicyreport -o yaml

Lets me see the resources which have failed auditing (some output omitted for brevity):

apiVersion: wgpolicyk8s.io/v1alpha2
kind: ClusterPolicyReport
metadata:
  name: clusterpolicyreport
results:
- category: Best practice
  message: 'validation error: The label `app.kubernetes.io/costcentre` is required.
    Rule check-for-labels failed at path /metadata/labels/app.kubernetes.io/costcentre/'
  policy: require-labels
  resources:
  - apiVersion: v1
    kind: Namespace
    name: default
    uid: 905f857f-8c89-4f76-9001-5cf5ddc7a5ea
  result: fail
  rule: check-for-labels
  scored: true
  severity: medium
  source: Kyverno
- category: Best practice
  message: 'validation error: The label `app.kubernetes.io/costcentre` is required.
    Rule check-for-labels failed at path /metadata/labels/app.kubernetes.io/costcentre/'
  policy: require-labels
  resources:
  - apiVersion: v1
    kind: Namespace
    name: kyverno-gui
    uid: 60a4165f-6c13-416b-92af-61acdcf67757
  result: fail
  rule: check-for-labels
  scored: true
  severity: medium
  source: Kyverno

From my output I can see that the kyverno-gui and default namespaces fail validation as they don’t have the costcentre label defined.

So the reports are Kubernetes resources, and here we have the cluster-scoped clusterpolicyreport. There is also a namespace-scoped policyreport for policies defined at that level.
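
For example, to list the namespace-scoped reports in our test namespace:

kubectl get policyreports -n kyverno-testing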

The policy-reporter GUI surfaces all of this in a clean, simple front-end.

Conclusion

Defining and maintaining Kubernetes policy is hugely simplified with Kyverno, and it is a great alternative to OPA.

Stuart Anderson

Chief Engineer
