
Capabilities

Capabilities break the all-or-nothing privileges of the root user into fine-grained permissions. This means that even a process running as root cannot perform a privileged operation unless it has been granted the corresponding capability.

Prepare rootfs

We'll need an additional tool (capsh, from the libcap package) to explore capabilities, so here are instructions for preparing a rootfs that includes it.

First, create a Docker container with libcap installed:

sudo docker run -it alpine sh -c 'apk add -U libcap; capsh --print'

Use docker ps -a to find out the container ID of the one we just ran; it should be the latest one.

Then, export the rootfs to create a runc runtime bundle.

mkdir rootfs
docker export $container_id | tar -C rootfs -xvf -
runc spec
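The steps above can be collected into one helper. This is a minimal sketch, not a definitive script: the function name prepare_bundle is hypothetical, it assumes Docker needs sudo as in the earlier docker run command, and it assumes the libcap container is the most recently created one, which docker ps -lq then reports.

```shell
# prepare_bundle: export the most recently created container into ./rootfs
# and generate a default config.json next to it.
# Hypothetical helper; assumes the libcap container was the last one created.
prepare_bundle() {
    # -l selects the latest created container, -q prints only its ID
    container_id=$(sudo docker ps -lq) || return 1
    [ -n "$container_id" ] || { echo "no container found" >&2; return 1; }

    mkdir -p rootfs
    sudo docker export "$container_id" | tar -C rootfs -xf -

    # Writes config.json into the current directory
    runc spec
}

# Usage (run from the directory that should become the bundle):
#   prepare_bundle
```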

Capability

To see capabilities in action: with the default config.json generated by runc spec, you are not allowed to set the hostname, even as root.

$ sudo runc run xyxy67
/ # id
uid=0(root) gid=0(root)
/ # hostname cool
hostname: sethostname: Operation not permitted

That's because setting the hostname requires the CAP_SYS_ADMIN capability, even for the root user. We can add this capability by including CAP_SYS_ADMIN in the bounding, permitted, and effective lists of the capabilities attribute for the init process.
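As a sketch of that change, the relevant part of config.json could look like the fragment below. It assumes the generated file lists CAP_AUDIT_WRITE, CAP_KILL, and CAP_NET_BIND_SERVICE by default; the exact defaults depend on your runc version. Only the three lists being edited are shown, and any other lists present in your file (such as inheritable) stay unchanged.

```json
{
    "process": {
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE",
                "CAP_SYS_ADMIN"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE",
                "CAP_SYS_ADMIN"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE",
                "CAP_SYS_ADMIN"
            ]
        }
    }
}
```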

Run another container with the new configuration, and now you will be allowed to set the hostname.

$ sudo runc run xyxy67
/ # hostname
runc
/ # hostname hello
/ # hostname
hello
/ #

Run another command in the same container, and it will be able to set the hostname as well, because by default runc exec starts the new process with the capability sets defined in config.json, the same ones the init process received.

$ sudo runc exec -t xyxy67 /bin/sh
[sudo] password for binchen:
/ # hostname
hello
/ # hostname good
/ # hostname
good

Get capability

Get the PIDs of the two processes. runc ps reports them as seen from the PID namespace the runtime itself runs in, which here is the host.

$ sudo runc ps xyxy67
UID        PID  PPID  C STIME TTY          TIME CMD
root     26002 25993  0 11:42 pts/0    00:00:00 /bin/sh
root     26059 26051  0 11:43 pts/1    00:00:00 /bin/sh

Install pscap on the host:

sudo apt-get install libcap-ng-utils

Check the capabilities of the running processes using their PIDs in the host namespace.

$ pscap | grep "26059\|26002"
25993 26002 root        sh                kill, net_bind_service, sys_admin, audit_write
26051 26059 root        sh                kill, net_bind_service, sys_admin, audit_write

This confirms that both processes have the sys_admin capability.
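The same information is available without pscap: the kernel exposes each process's effective set as a hex bitmask on the CapEff line of /proc/<pid>/status. The sketch below reads and tests that mask. The helper names cap_eff and has_cap are hypothetical; the only fixed fact relied on is that CAP_SYS_ADMIN is capability number 21 in linux/capability.h.

```shell
# Capability number of CAP_SYS_ADMIN, as defined in <linux/capability.h>
CAP_SYS_ADMIN=21

# cap_eff PID: print the effective capability mask (hex, no 0x prefix)
# of the given process, read from /proc.
cap_eff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}

# has_cap MASK BIT: succeed if bit number BIT is set in the hex MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Usage, with one of the host PIDs reported by runc ps:
#   has_cap "$(cap_eff 26002)" "$CAP_SYS_ADMIN" && echo "has CAP_SYS_ADMIN"
# The whole mask can also be turned into names with: capsh --decode=<mask>
```

For the four capabilities listed by pscap above (kill is bit 5, net_bind_service bit 10, sys_admin bit 21, audit_write bit 29), the mask would be 0000000020200420.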

Request additional capability

runc exec can request additional capabilities that are not listed in config.json.

Run another container xyxy78 without the CAP_SYS_ADMIN in the config.json.

Double-check that it indeed does not have the capability.

$ sudo runc ps xyxy78
UID        PID  PPID  C STIME TTY          TIME CMD
root     27385 27376  0 11:57 pts/0    00:00:00 /bin/sh
$ pscap | grep 27385
27376 27385 root        sh                kill, net_bind_service, audit_write

Start another process in xyxy78, this time with the additional CAP_SYS_ADMIN capability, using the --cap option.

sudo runc exec --cap CAP_SYS_ADMIN xyxy78 /bin/hostname cool

Under the hood, the --cap option sets up the capability list for the process that will be executed, similar to how these settings are established in the config.json for the init process.

capsh

You can use capsh to explore a little further.

Run capsh --print inside of the container.

This is the output with default config.json:

# capsh --print
Current: = cap_kill,cap_net_bind_service,cap_audit_write+eip
Bounding set =cap_kill,cap_net_bind_service,cap_audit_write
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=

This is the output with the added CAP_SYS_ADMIN capability. Compared with the previous one, there is an additional cap_sys_admin+ep in the "Current" section and cap_sys_admin in the "Bounding set". The "+ep" suffix indicates that the preceding capabilities are in both the "effective" and "permitted" sets. The default capabilities carry "+eip" because they are also in the "inheritable" set, to which we did not add CAP_SYS_ADMIN. For more information regarding the capability list, see capabilities.

# capsh --print
Current: = cap_kill,cap_net_bind_service,cap_audit_write+eip cap_sys_admin+ep
Bounding set =cap_kill,cap_net_bind_service,cap_sys_admin,cap_audit_write
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=

Summary

We investigated how Linux capabilities are used to limit what a process can do, even when it runs as root, and thereby improve the security of the container.