Use a User Namespace With a Pod

FEATURE STATE: Kubernetes v1.25 [alpha]

This page shows how to configure a user namespace for stateless pods. This lets you isolate the user running inside the container from the user on the host.

A process running as root in a container can run as a different (non-root) user on the host; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.
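The mapping between the two views can be illustrated with a small shell sketch. It assumes a mapping line in the /proc/<pid>/uid_map format (inside start, outside start, length); the helper name host_uid_for and the mapping values are hypothetical and chosen only for illustration, since the actual range is picked by the kubelet and the container runtime.

```shell
# Translate a UID seen inside a user namespace to the UID it has outside.
# Arguments: <inside_uid> <uid_map_line>, where the line uses the
# /proc/<pid>/uid_map format: "<inside_start> <outside_start> <length>".
host_uid_for() {
    inside_uid=$1
    # Split the mapping line into its three fields (left unquoted on purpose).
    set -- $2
    inside_start=$1 outside_start=$2 length=$3
    if [ "$inside_uid" -ge "$inside_start" ] &&
       [ "$inside_uid" -lt $((inside_start + length)) ]; then
        echo $((outside_start + inside_uid - inside_start))
    else
        echo "unmapped"
        return 1
    fi
}

# Hypothetical mapping: container UIDs 0..65535 map to host UIDs 100000..165535.
host_uid_for 0 "0 100000 65536"      # prints 100000
host_uid_for 1000 "0 100000 65536"   # prints 101000
```

With a mapping like this one, root (UID 0) inside the container is an ordinary unprivileged UID on the host, which is the isolation described above.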

You can use this feature to reduce the damage a compromised container can do to the host or to other pods on the same node. Several security vulnerabilities rated HIGH or CRITICAL were not exploitable when user namespaces were active. User namespaces are expected to mitigate some future vulnerabilities too.

Without a user namespace, a container running as root has root privileges on the node if the container breaks out. And if capabilities were granted to the container, those capabilities are valid on the host too. Neither is true when user namespaces are used.

Before you begin

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or use one of the Kubernetes playgrounds.

Your Kubernetes server must be at version v1.25 or later. To check the version, enter kubectl version.

  • The node OS needs to be Linux.
  • You need to be able to run commands on the host.
  • You need to be able to exec into pods.
  • The UserNamespacesStatelessPodsSupport feature gate needs to be enabled.
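
The version and node OS requirements above can be checked with a short script. The following is only a sketch: the helper names version_at_least and all_linux are hypothetical, and the commented commands assume that jq is installed and that kubectl can reach your cluster.

```shell
# Succeed if a version string such as "v1.25.3" is at least <major>.<minor>.
version_at_least() {
    version=${1#v}             # strip the leading "v"
    major=${version%%.*}       # text before the first dot
    rest=${version#*.}         # text after the first dot
    minor=${rest%%[!0-9]*}     # leading digits of the minor component
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Succeed if every argument is "linux" (one argument per node).
all_linux() {
    for os in "$@"; do
        [ "$os" = "linux" ] || return 1
    done
}

# Possible usage against a live cluster (assumes jq is available):
#   server=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')
#   version_at_least "$server" 1 25 || echo "server is older than v1.25"
#   all_linux $(kubectl get nodes \
#       -o jsonpath='{.items[*].status.nodeInfo.operatingSystem}') ||
#       echo "not every node runs Linux"
version_at_least "v1.25.3" 1 25 && echo "version ok"
```

These checks do not cover the feature gate or the container runtime; those still need to be verified in the cluster configuration.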

In addition, the container runtime must support this feature for Kubernetes stateless pods:

  • CRI-O: v1.25 has support for user namespaces.

Note that if your container runtime doesn't support user namespaces, the hostUsers field of the pod .spec is silently ignored and the pod is created without a user namespace.

Run a Pod that uses a user namespace

A user namespace for a stateless pod is enabled by setting the hostUsers field of .spec to false. For example:

apiVersion: v1
kind: Pod
metadata:
  name: userns
spec:
  hostUsers: false
  containers:
  - name: shell
    command: ["sleep", "infinity"]
    image: debian

  1. Create the pod on your cluster:

    kubectl apply -f https://k8s.io/examples/pods/user-namespaces-stateless.yaml
    
  2. Open a shell in the container, then run readlink /proc/self/ns/user and cat /proc/self/uid_map in it:

    kubectl exec -it userns -- bash

The output is similar to this:

readlink /proc/self/ns/user
user:[4026531837]
cat /proc/self/uid_map
0          0 4294967295

Then, open a shell on the host and run the same commands.

The output must be different from the output in the pod. This means the host and the pod are using different user namespaces. When user namespaces are not enabled, the host and the pod use the same user namespace.

If you are running the kubelet inside a user namespace, compare the output from the pod with the output of the following command run on the host:

readlink /proc/$pid/ns/user
user:[4026534732]

Replace $pid with the kubelet PID.
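
The comparison can also be scripted. The sketch below assumes you have already collected the two readlink results; the helper name userns_differs is hypothetical, and the commented commands assume that pidof is available on the node and that you have enough privileges there to read another process's namespace link.

```shell
# Succeed if two "readlink /proc/<pid>/ns/user" results name different
# user namespaces. Each argument looks like "user:[4026531837]".
userns_differs() {
    for link in "$1" "$2"; do
        case $link in
            "user:["*"]") ;;                                  # expected shape
            *) echo "unexpected value: $link" >&2; return 2 ;;
        esac
    done
    [ "$1" != "$2" ]
}

# Possible ways to collect the two values:
#   pod_ns=$(kubectl exec userns -- readlink /proc/self/ns/user)
#   host_ns=$(readlink /proc/self/ns/user)                    # on the node
#   host_ns=$(readlink "/proc/$(pidof kubelet)/ns/user")      # kubelet in a userns
userns_differs "user:[4026531837]" "user:[4026534732]" &&
    echo "the pod uses its own user namespace"
```

If the function fails with two well-formed values, the pod and the host (or the kubelet) share a user namespace, which suggests that the hostUsers setting did not take effect.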

Items on this page refer to third party products or projects that provide functionality required by Kubernetes. The Kubernetes project authors aren't responsible for those third-party products or projects. See the CNCF website guidelines for more details.


Last modified February 22, 2023 at 9:09 AM PST: Update edits (f4a7975)