Security Checklist

Baseline checklist for ensuring security in Kubernetes clusters.

This checklist aims at providing a basic list of guidance with links to more comprehensive documentation on each topic. It does not claim to be exhaustive and is meant to evolve.

On how to read and use this document:

  • The order of topics does not reflect an order of priority.
  • Some checklist items are detailed in the paragraph below the list of each section.

Authentication & Authorization

  • system:masters group is not used for user or component authentication after bootstrapping.
  • The kube-controller-manager is running with --use-service-account-credentials enabled.
  • The root certificate is protected (either an offline CA, or a managed online CA with effective access controls).
  • Intermediate and leaf certificates have an expiry date no more than 3 years in the future.
  • A process exists for periodic access review, and reviews occur no more than 24 months apart.
  • The Role Based Access Control Good Practices is followed for guidance related to authentication and authorization.

After bootstrapping, neither users nor components should authenticate to the Kubernetes API as system:masters. Similarly, running all of kube-controller-manager as system:masters should be avoided. In fact, system:masters should only be used as a break-glass mechanism, as opposed to an admin user.

Network security

  • CNI plugins in-use supports network policies.
  • Ingress and egress network policies are applied to all workloads in the cluster.
  • Default network policies within each namespace, selecting all pods, denying everything, are in place.
  • If appropriate, a service mesh is used to encrypt all communications inside of the cluster.
  • The Kubernetes API, kubelet API and etcd are not exposed publicly on Internet.
  • Access from the workloads to the cloud metadata API is filtered.
  • Use of LoadBalancer and ExternalIPs is restricted.

A number of Container Network Interface (CNI) plugins plugins provide the functionality to restrict network resources that pods may communicate with. This is most commonly done through Network Policies which provide a namespaced resource to define rules. Default network policies blocking everything egress and ingress, in each namespace, selecting all the pods, can be useful to adopt an allow list approach, ensuring that no workloads is missed.

Not all CNI plugins provide encryption in transit. If the chosen plugin lacks this feature, an alternative solution could be to use a service mesh to provide that functionality.

The etcd datastore of the control plane should have controls to limit access and not be publicly exposed on the Internet. Furthermore, mutual TLS (mTLS) should be used to communicate securely with it. The certificate authority for this should be unique to etcd.

External Internet access to the Kubernetes API server should be restricted to not expose the API publicly. Be careful as many managed Kubernetes distribution are publicly exposing the API server by default. You can then use a bastion host to access the server.

The kubelet API access should be restricted and not publicly exposed, the defaults authentication and authorization settings, when no configuration file specified with the --config flag, are overly permissive.

If a cloud provider is used for hosting Kubernetes, the access from pods to the cloud metadata API 169.254.169.254 should also be restricted or blocked if not needed because it may leak information.

For restricted LoadBalancer and ExternalIPs use, see CVE-2020-8554: Man in the middle using LoadBalancer or ExternalIPs and the DenyServiceExternalIPs admission controller for further information.

Pod security

  • RBAC rights to create, update, patch, delete workloads is only granted if necessary.
  • Appropriate Pod Security Standards policy is applied for all namespaces and enforced.
  • Memory limit is set for the workloads with a limit equal or inferior to the request.
  • CPU limit might be set on sensitive workloads.
  • For nodes that support it, Seccomp is enabled with appropriate syscalls profile for programs.
  • For nodes that support it, AppArmor or SELinux is enabled with appropriate profile for programs.

RBAC authorization is crucial but cannot be granular enough to have authorization on the Pods' resources (or on any resource that manages Pods). The only granularity is the API verbs on the resource itself, for example, create on Pods. Without additional admission, the authorization to create these resources allows direct unrestricted access to the schedulable nodes of a cluster.

The Pod Security Standards define three different policies, privileged, baseline and restricted that limit how fields can be set in the PodSpec regarding security. These standards can be enforced at the namespace level with the new Pod Security admission, enabled by default, or by third-party admission webhook. Please note that, contrary to the removed PodSecurityPolicy admission it replaces, Pod Security admission can be easily combined with admission webhooks and external services.

Pod Security admission restricted policy, the most restrictive policy of the Pod Security Standards set, can operate in several modes, warn, audit or enforce to gradually apply the most appropriate security context according to security best practices. Nevertheless, pods' security context should be separately investigated to limit the privileges and access pods may have on top of the predefined security standards, for specific use cases.

For a hands-on tutorial on Pod Security, see the blog post Kubernetes 1.23: Pod Security Graduates to Beta.

Memory and CPU limits should be set in order to restrict the memory and CPU resources a pod can consume on a node, and therefore prevent potential DoS attacks from malicious or breached workloads. Such policy can be enforced by an admission controller. Please note that CPU limits will throttle usage and thus can have unintended effects on auto-scaling features or efficiency i.e. running the process in best effort with the CPU resource available.

Enabling Seccomp

Seccomp can improve the security of your workloads by reducing the Linux kernel syscall attack surface available inside containers. The seccomp filter mode leverages BPF to create an allow or deny list of specific syscalls, named profiles. Those seccomp profiles can be enabled on individual workloads, a security tutorial is available. In addition, the Kubernetes Security Profiles Operator is a project to facilitate the management and use of seccomp in clusters.

For historical context, please note that Docker has been using a default seccomp profile to only allow a restricted set of syscalls since 2016 from Docker Engine 1.10, but Kubernetes is still not confining workloads by default. The default seccomp profile can be found in containerd as well. Fortunately, Seccomp Default, a new alpha feature to use a default seccomp profile for all workloads can now be enabled and tested.

Enabling AppArmor or SELinux

AppArmor

AppArmor is a Linux kernel security module that can provide an easy way to implement Mandatory Access Control (MAC) and better auditing through system logs. To enable AppArmor in Kubernetes, at least version 1.4 is required. Like seccomp, AppArmor is also configured through profiles, where each profile is either running in enforcing mode, which blocks access to disallowed resources or complain mode, which only reports violations. AppArmor profiles are enforced on a per-container basis, with an annotation, allowing for processes to gain just the right privileges.

SELinux

SELinux is also a Linux kernel security module that can provide a mechanism for supporting access control security policies, including Mandatory Access Controls (MAC). SELinux labels can be assigned to containers or pods via their securityContext section.

Pod placement

  • Pod placement is done in accordance with the tiers of sensitivity of the application.
  • Sensitive applications are running isolated on nodes or with specific sandboxed runtimes.

Pods that are on different tiers of sensitivity, for example, an application pod and the Kubernetes API server, should be deployed onto separate nodes. The purpose of node isolation is to prevent an application container breakout to directly providing access to applications with higher level of sensitivity to easily pivot within the cluster. This separation should be enforced to prevent pods accidentally being deployed onto the same node. This could be enforced with the following features:

Node Selectors
Key-value pairs, as part of the pod specification, that specify which nodes to deploy onto. These can be enforced at the namespace and cluster level with the PodNodeSelector admission controller.
PodTolerationRestriction
An admission controller that allows administrators to restrict permitted tolerations within a namespace. Pods within a namespace may only utilize the tolerations specified on the namespace object annotation keys that provide a set of default and allowed tolerations.
RuntimeClass
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime configuration is used to run a Pod's containers and can provide more or less isolation from the host at the cost of performance overhead.

Secrets

  • ConfigMaps are not used to hold confidential data.
  • Encryption at rest is configured for the Secret API.
  • If appropriate, a mechanism to inject secrets stored in third-party storage is deployed and available.
  • Service account tokens are not mounted in pods that don't require them.
  • Bound service account token volume is in-use instead of non-expiring tokens.

Secrets required for pods should be stored within Kubernetes Secrets as opposed to alternatives such as ConfigMap. Secret resources stored within etcd should be encrypted at rest.

Pods needing secrets should have these automatically mounted through volumes, preferably stored in memory like with the emptyDir.medium option. Mechanism can be used to also inject secrets from third-party storages as volume, like the Secrets Store CSI Driver. This should be done preferentially as compared to providing the pods service account RBAC access to secrets. This would allow adding secrets into the pod as environment variables or files. Please note that the environment variable method might be more prone to leakage due to crash dumps in logs and the non-confidential nature of environment variable in Linux, as opposed to the permission mechanism on files.

Service account tokens should not be mounted into pods that do not require them. This can be configured by setting automountServiceAccountToken to false either within the service account to apply throughout the namespace or specifically for a pod. For Kubernetes v1.22 and above, use Bound Service Accounts for time-bound service account credentials.

Images

  • Minimize unnecessary content in container images.
  • Container images are configured to be run as unprivileged user.
  • References to container images are made by sha256 digests (rather than tags) or the provenance of the image is validated by verifying the image's digital signature at deploy time via admission control.
  • Container images are regularly scanned during creation and in deployment, and known vulnerable software is patched.

Container image should contain the bare minimum to run the program they package. Preferably, only the program and its dependencies, building the image from the minimal possible base. In particular, image used in production should not contain shells or debugging utilities, as an ephemeral debug container can be used for troubleshooting.

Build images to directly start with an unprivileged user by using the USER instruction in Dockerfile. The Security Context allows a container image to be started with a specific user and group with runAsUser and runAsGroup, even if not specified in the image manifest. However, the file permissions in the image layers might make it impossible to just start the process with a new unprivileged user without image modification.

Avoid using image tags to reference an image, especially the latest tag, the image behind a tag can be easily modified in a registry. Prefer using the complete sha256 digest which is unique to the image manifest. This policy can be enforced via an ImagePolicyWebhook. Image signatures can also be automatically verified with an admission controller at deploy time to validate their authenticity and integrity.

Scanning a container image can prevent critical vulnerabilities from being deployed to the cluster alongside the container image. Image scanning should be completed before deploying a container image to a cluster and is usually done as part of the deployment process in a CI/CD pipeline. The purpose of an image scan is to obtain information about possible vulnerabilities and their prevention in the container image, such as a Common Vulnerability Scoring System (CVSS) score. If the result of the image scans is combined with the pipeline compliance rules, only properly patched container images will end up in Production.

Admission controllers

  • An appropriate selection of admission controllers is enabled.
  • A pod security policy is enforced by the Pod Security Admission or/and a webhook admission controller.
  • The admission chain plugins and webhooks are securely configured.

Admission controllers can help to improve the security of the cluster. However, they can present risks themselves as they extend the API server and should be properly secured.

The following lists present a number of admission controllers that could be considered to enhance the security posture of your cluster and application. It includes controllers that may be referenced in other parts of this document.

This first group of admission controllers includes plugins enabled by default, consider to leave them enabled unless you know what you are doing:

CertificateApproval
Performs additional authorization checks to ensure the approving user has permission to approve certificate request.
CertificateSigning
Performs additional authorization checks to ensure the signing user has permission to sign certificate requests.
CertificateSubjectRestriction
Rejects any certificate request that specifies a 'group' (or 'organization attribute') of system:masters.
LimitRanger
Enforce the LimitRange API constraints.
MutatingAdmissionWebhook
Allows the use of custom controllers through webhooks, these controllers may mutate requests that it reviews.
PodSecurity
Replacement for Pod Security Policy, restricts security contexts of deployed Pods.
ResourceQuota
Enforces resource quotas to prevent over-usage of resources.
ValidatingAdmissionWebhook
Allows the use of custom controllers through webhooks, these controllers do not mutate requests that it reviews.

The second group includes plugin that are not enabled by default but in general availability state and recommended to improve your security posture:

DenyServiceExternalIPs
Rejects all net-new usage of the Service.spec.externalIPs field. This is a mitigation for CVE-2020-8554: Man in the middle using LoadBalancer or ExternalIPs.
NodeRestriction
Restricts kubelet's permissions to only modify the pods API resources they own or the node API ressource that represent themselves. It also prevents kubelet from using the node-restriction.kubernetes.io/ annotation, which can be used by an attacker with access to the kubelet's credentials to influence pod placement to the controlled node.

The third group includes plugins that are not enabled by default but could be considered for certain use cases:

AlwaysPullImages
Enforces the usage of the latest version of a tagged image and ensures that the deployer has permissions to use the image.
ImagePolicyWebhook
Allows enforcing additional controls for images through webhooks.

What's next

Last modified February 22, 2023 at 9:09 AM PST: 更新编辑 (f4a7975)