Linux capabilities

What is a Linux capability

Linux capabilities give developers the freedom to let their binaries, executed by non-root users, perform privileged operations without granting them full root permissions. The Linux kernel supports about 40 different capabilities. Docker grants 14 of them by default to a container running as root; see the chapter Linux capabilities in Docker.

What is the difference between privilege, capability and file permission

Before capabilities were introduced, there was only a binary distinction between privileged and unprivileged processes. The conventional UNIX implementation distinguishes these two categories: a privileged process runs as root (the superuser), whose UID is 0, and an unprivileged process runs as a non-root user, whose UID is nonzero.

With capabilities, Linux breaks the privileges of root down into small, distinct pieces, each of which can be granted to a non-root process so that it can perform one specific privileged task.

For instance, with CAP_FOWNER a non-root process can perform operations that are normally reserved for the owner of a file, such as changing its permissions with chmod, even on files owned by root. CAP_FOWNER bypasses the permission checks that require the UID of the process to match the UID of the file, so in that sense it acts like a privileged file permission.

Why Linux capabilities

Linux capabilities are assigned per thread and determine whether the thread may perform certain privileged actions. A thread can execute any part of the process code, whether written by the developer or provided by the system, and one or many threads run in the context of a process. Capabilities break the privileges of the root user down into distinct units. In this way they provide more granular security control over the privileged actions the Linux kernel allows. Capabilities are therefore also useful when you want to restrict your own processes after they have performed their privileged operations.
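
To make the per-thread capability sets concrete: the kernel exposes them as hexadecimal bit masks in /proc/<pid>/status (the CapInh, CapPrm, CapEff, CapBnd and CapAmb lines). The following is a minimal sketch of decoding such a mask in plain shell. It assumes the bit numbering from capabilities(7) for kernels with 41 capabilities, and decode_caps is a hypothetical helper name. Where libcap is installed, capsh --decode=<mask> does the same job.

```shell
#!/bin/sh
# Capability names ordered by bit number (0..40), as listed in capabilities(7).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK
# Print the names of all bits set in HEXMASK (no 0x prefix), comma-separated.
decode_caps() {
    mask="0x$1"
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}cap_$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

# Usage: decode the effective set of the current shell.
#   decode_caps "$(awk '/^CapEff:/ {print $2}' /proc/self/status)"
```

An unprivileged shell on the host typically prints an empty line here, because its effective set is empty.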

Reference Linux Capabilities

More information about Linux capabilities is provided here: https://book.hacktricks.xyz/linux-hardening/privilege-escalation/linux-capabilities

How to use capabilities in Docker

There are 3 ways of using Docker capabilities:

1. Run containers as root with all the capabilities and try to manage capabilities manually at runtime
2. Run containers as root with limited capabilities and never change them in container runtime
3. Run containers as an unprivileged user with no capabilities

Option 1 should be avoided whenever possible: an attacker can easily abuse the capabilities to run malicious code inside the container and eventually escape from the container onto the host as the root user.

Option 2 runs with limited capabilities: you add only the capabilities your application needs to run correctly.

Option 3 runs non-root container images with no capabilities. This is the ideal way, but it is not always realistic: for some applications it is trickier and requires more changes to the code than a few tweaks to the Dockerfile and some configuration.

Since option 2 is the most realistic way to use capabilities in Docker, the HandsOn section shows how to find the capabilities your own application requires and how to limit them by dropping all capabilities and adding back the necessary ones in the Dockerfile and docker-compose.yml.
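
As a minimal sketch of what option 2 can look like in a compose file, the shell function below prints a docker-compose.yml that drops all capabilities and adds back a single one. The service name web and the function name print_compose are hypothetical. The cap_drop and cap_add keys are the compose counterparts of the --cap-drop and --cap-add flags of docker run.

```shell
#!/bin/sh
# print_compose: emit a compose file that follows option 2
# (root in the container, but only one capability left).
# The service name "web" is an illustrative assumption.
print_compose() {
    cat <<'EOF'
services:
  web:
    image: httpd
    network_mode: host
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
EOF
}

# Usage (writes the file into the current directory, then starts the service):
#   print_compose > docker-compose.yml
#   docker-compose up -d
```

Which capabilities have to be added back depends on the application, so the single NET_BIND_SERVICE entry is only a placeholder.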

Today's Linux kernel has about 40 different capabilities. By default, a Docker container runs as root with 14 of them. The following chapters explore these defaults.

Linux capabilities in Docker

By default, a Docker container runs as root with 14 Linux capabilities. The following table briefly describes each of them.

Linux Capabilities	Description
CAP_DAC_OVERRIDE	Bypasses file read, write and execute permission checks, so the process can read, write and execute any file regardless of its permission bits
CAP_CHOWN	Makes arbitrary changes to file UIDs and GIDs. For example, an attacker can take ownership of the password files, change the root password and escalate privileges further
CAP_FOWNER	Bypasses permission checks that normally require the UID of the process to match the UID of the file. It can change the permissions of any file
CAP_SETUID	Allows arbitrary manipulation of process UIDs, such as setting the effective UID. It allows a process to run programs as another user, including root
CAP_SETGID	Allows arbitrary manipulation of process GIDs, such as setting the effective GID. An attacker can impersonate the Docker group and abuse it to communicate with the Docker socket and escalate privileges
CAP_SETFCAP	Allows setting arbitrary capabilities on files
CAP_SETPCAP	Allows a process to add capabilities from its bounding set to its inheritable set, to drop capabilities from its bounding set and to change its securebits flags
CAP_KILL	Bypasses permission checks for sending signals, so the process can kill any running process
CAP_NET_BIND_SERVICE	Binds a socket to any port, including privileged ports below 1024
CAP_NET_RAW	Allows the use of RAW and PACKET sockets and binding to any address for transparent proxying
CAP_SYS_CHROOT	Allows use of the chroot system call to change the root directory
CAP_MKNOD	Creates special files using mknod
CAP_AUDIT_WRITE	Allows writing records to the kernel auditing log
CAP_FSETID	Keeps the set-user-ID and set-group-ID mode bits when a file is modified, and allows setting the set-group-ID bit on a file whose GID does not match the file system GID or any supplementary GID of the process
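
The 14 capabilities in the table correspond to one bit mask, which is what the Cap* lines in /proc/<pid>/status show for a default container. The sketch below builds that mask from the bit numbers, assuming the numbering in capabilities(7) and linux/capability.h. The names DEFAULT_CAP_BITS and bits_to_mask are hypothetical.

```shell
#!/bin/sh
# Bit numbers of the 14 default capabilities (assumed from linux/capability.h):
#   chown=0 dac_override=1 fowner=3 fsetid=4 kill=5 setgid=6 setuid=7
#   setpcap=8 net_bind_service=10 net_raw=13 sys_chroot=18 mknod=27
#   audit_write=29 setfcap=31
DEFAULT_CAP_BITS="0 1 3 4 5 6 7 8 10 13 18 27 29 31"

# bits_to_mask BIT...
# OR together one bit per argument and print the result as a 16-digit
# hexadecimal mask, the same layout /proc/<pid>/status uses.
bits_to_mask() {
    mask=0
    for bit in "$@"; do
        mask=$(( $mask | (1 << $bit) ))
    done
    printf '%016x\n' "$mask"
}

bits_to_mask $DEFAULT_CAP_BITS   # prints 00000000a80425fb
```

Comparing this value with the CapEff line of a running container is a quick way to notice that the defaults were changed.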

Reference Linux capabilities

You can find more about Linux capabilities in the Linux manual page: https://man7.org/linux/man-pages/man7/capabilities.7.html

The --privileged flag

Docker offers a --privileged flag for running a container, or privileged: true in docker-compose. Avoid it whenever possible. It grants the container every Linux capability, including the very broad CAP_SYS_ADMIN, and lifts most of the other isolation restrictions, so a process inside the container can effectively gain full access to the host and therefore to any other container running on it.
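
One way to see how far a container deviates from the default set is to count the bits in its capability mask. The sketch below does that in plain shell; count_caps is a hypothetical helper name. The mask 000001ffffffffff in the usage note is what a full set looks like on a kernel with 41 capabilities, which is an assumption about the kernel version and not something every host reports.

```shell
#!/bin/sh
# count_caps HEXMASK
# Count the bits set in a capability mask as found in /proc/<pid>/status.
# Capability masks only use the low bits, so the value never goes negative
# and the loop always terminates.
count_caps() {
    mask=$(( 0x$1 ))
    n=0
    while [ "$mask" -ne 0 ]; do
        n=$(( $n + ($mask & 1) ))
        mask=$(( $mask >> 1 ))
    done
    echo "$n"
}

# Usage:
#   count_caps 00000000a80425fb   # default container: 14
#   count_caps 000001ffffffffff   # all capabilities on a 41-capability kernel: 41
```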

Normally the --privileged flag is used to enable docker-in-docker, which is widely used by build tools and CI/CD systems running as containers. The CI/CD Docker container needs access to the Docker daemon in order to use Docker to build other container images.

To avoid docker-in-docker for security reasons, you can, for example, use the Kaniko Docker image to build other images, as recommended in the GitLab documentation.

Reference CI/CD Pipeline

More information about building images with Kaniko in CI/CD pipelines on code.siemens is provided here: https://docs.gitlab.com/ee/ci/docker/using_kaniko.html

Examples of using capabilities

The following examples show how to use capabilities when running a Docker container, using three different capabilities.

CAP_NET_BIND_SERVICE

CAP_NET_BIND_SERVICE allows a thread to bind to a privileged port, that is, a port number below 1024. This example uses the capability to run a container in the three ways described in the How to use capabilities in Docker section.

Option 1 runs the Docker container as the root user with the default capabilities. Open your terminal and type:

# Start and run a docker container as default

docker run -d --net=host httpd

# Open another terminal and type

docker ps
>>> CONTAINER ID   IMAGE     COMMAND              CREATED         STATUS
>>> 5be0e1b5cdf3   httpd     "httpd-foreground"   6 seconds ago   Up 5 seconds

# Note it will be another Container ID and PID, use that instead

docker inspect --format '{{.State.Pid}}' 5be0e1b5cdf3
>>> 29420

# By default docker has 14 capabilities as root user

getpcaps 29420
>>> 29420:=cap_chown,cap_dac_override,...,cap_audit_write,cap_setfcap+ep

You can see that a container started with the defaults has 14 capabilities. For option 2, first drop all capabilities and then add back only NET_BIND_SERVICE, which the application needs to run correctly.

# Run docker container with only cap-net-bind-service capability

docker run --cap-drop=all --cap-add=NET_BIND_SERVICE --net=host httpd

# Open another terminal and type

docker ps
>>> CONTAINER ID   IMAGE     COMMAND              CREATED          STATUS
>>> 1ff6f5ea671b   httpd     "httpd-foreground"   30 seconds ago   Up 30 seconds

docker inspect --format '{{.State.Pid}}' 1ff6f5ea671b
>>> 30196

# There is only one CAP_NET_BIND_SERVICE capability available in the container

getpcaps 30196
>>> 30196: = cap_net_bind_service+ep

For the last option, option 3, run the container as a non-root user and add the NET_BIND_SERVICE capability explicitly:

# Run docker container as non-root user

docker run --cap-add=NET_BIND_SERVICE -u nobody --net=host httpd
>>> (13)Permission denied: AH00072: make_sock: could not bind to address [::]:80
>>> (13)Permission denied: AH00072: make_sock: could not bind to address 0.0.0.0:80

This time the command fails with a permission denied error. Docker does not yet support adding capabilities to non-root users: a capability added with --cap-add does not end up in the effective set of a process that starts as a non-root user, so the process cannot bind to the privileged port 80.
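
One way to observe this is to compare the effective set (CapEff) with the bounding set (CapBnd) of the container process. The sketch below reads the text of /proc/<pid>/status on standard input and reports where a given capability bit is set. The helper name cap_state is hypothetical, and bit 10 for CAP_NET_BIND_SERVICE is taken from capabilities(7).

```shell
#!/bin/sh
# cap_state BIT
# Read /proc/<pid>/status text on stdin and print one of:
#   effective      BIT is in CapEff (the process can use it right now)
#   bounding-only  BIT is in CapBnd but not in CapEff
#   absent         BIT is in neither set
cap_state() {
    bit=$1
    eff=0
    bnd=0
    while read -r key value rest; do
        case $key in
            CapEff:) eff="0x$value" ;;
            CapBnd:) bnd="0x$value" ;;
        esac
    done
    if [ $(( ($eff >> $bit) & 1 )) -eq 1 ]; then
        echo effective
    elif [ $(( ($bnd >> $bit) & 1 )) -eq 1 ]; then
        echo bounding-only
    else
        echo absent
    fi
}

# Usage (bit 10 = CAP_NET_BIND_SERVICE):
#   docker run --rm --cap-add=NET_BIND_SERVICE -u nobody alpine \
#       cat /proc/self/status | cap_state 10
```

Running the same pipeline without -u nobody lets you compare the non-root result with the root case.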

CAP_CHOWN

Use the CAP_CHOWN capability to change the ownership of a file to a non-root user while the container runs as the default root user (Option 1 in How to use capabilities in Docker):

# In the terminal create a dummy file

nano dummy.txt

# Create the Dockerfile in the same directory as dummy.txt

FROM alpine:latest

WORKDIR /usr/src/app

COPY ./dummy.txt /usr/src/app/dummy.txt

RUN addgroup --system non-root-user && \
          adduser --system -G non-root-user non-root-user && \
          chown -R non-root-user:non-root-user /usr/src/app

# Build and Start Docker chown-test Container

docker build -t chown-test .

docker run -it chown-test

# Check the ownership of the dummy.txt file using "ls -RFlag"

/usr/src/app $ ls -RFlag
>>> ...
>>> -rw-rw-r--    1 non-root         8 Jun 22 10:18 dummy.txt

# Kill the docker container

/usr/src/app $ exit

The command works because the default behavior is for the container to be started with a root user. This root user has the chown capability by default.

Start another container and drop all capabilities for the root user except the chown capability (Option 2 in the How to use capabilities in Docker section).

# Change the Dockerfile as follows

FROM alpine:latest

WORKDIR /usr/src/app

COPY ./dummy.txt /usr/src/app/dummy.txt

RUN addgroup --system non-root-user && \
          adduser --system -G non-root-user non-root-user

# Rebuild and start Docker chown-test Container

docker build -t chown-test .

docker run --cap-drop=all --cap-add=chown -it chown-test chown non-root-user /usr/src/app

This command prints nothing and exits without an error, indicating a successful run. The operation succeeds because, although you dropped all capabilities for the container's root account, you added the chown capability back. The chown capability is all that is needed to change the ownership of a file.

Now start another new container as a non-root user and add only the chown capability (Option 3 in the How to use capabilities in Docker section).

# Change the Dockerfile as following

FROM alpine:latest

WORKDIR /usr/src/app

COPY ./dummy.txt /usr/src/app/dummy.txt

RUN addgroup --system non-root-user && \
          adduser --system -G non-root-user non-root-user

USER non-root-user

Now rebuild the image and run the Docker container:

docker build -t chown-test .

docker container run --cap-add chown -it chown-test chown non-root-user /usr/src/app
>>> chown: /usr/src/app: Operation not permitted

This time the command fails with an error. Docker does not yet support adding capabilities to non-root users: the chown capability added with --cap-add does not end up in the effective set of a process that starts as a non-root user, so the process cannot change the ownership of a file or directory.

CAP_NET_RAW

The CAP_NET_RAW capability allows a thread to open a raw network socket; it is used in particular by the ping tool. Here is an example of how to use CAP_NET_RAW to run ping in a container:

# Create a Dockerfile

FROM alpine:latest
COPY ping /

HEALTHCHECK --interval=5s --timeout=3s \
    CMD ps aux | grep '[s]h ping' || exit 1

CMD ["sh", "ping"]

# Create a file named ping in the same directory


#!/bin/sh

echo ping ${HOSTNAME:=localhost} every ${TIMEOUT:=300} sec
while true; do ping -c 1 ${HOSTNAME}; sleep ${TIMEOUT}; done;

Start again with Option 1 and run the Docker container as the default root user. Open your terminal and type:

# Build and start the container ping-test

docker build -t ping-test .

docker run ping-test
>>> ping 3753292ac8f1 every 300 sec
>>> PING 3753292ac8f1 (172.17.0.2): 56 data bytes
>>> 64 bytes from 172.17.0.2: seq=0 ttl=64 time=1.225 ms

# Open another terminal and type

docker ps
>>> CONTAINER ID   IMAGE       COMMAND     CREATED              STATUS
>>> 3753292ac8f1   ping-test   "sh ping"   About a minute ago   Up About a minute

docker inspect --format '{{.State.Pid}}' 3753292ac8f1
>>> 51332

# By default docker has 14 capabilities as root user

getpcaps 51332
>>> 51332:=cap_chown,cap_dac_override,...,cap_audit_write,cap_setfcap+ep

# Kill the running container

docker kill 3753292ac8f1

As before, you can see that a container started with the defaults has 14 capabilities. For option 2, first drop all capabilities and then add back only CAP_NET_RAW, which the application needs to run correctly.

# Start the container ping-test with only the NET_RAW capability

docker run --cap-drop=all --cap-add=net_raw ping-test
>>> ping dda4eb1054da every 300 sec
>>> PING dda4eb1054da (172.17.0.2): 56 data bytes
>>> 64 bytes from 172.17.0.2: seq=0 ttl=64 time=1.225 ms

# Open another terminal and type

docker ps
>>> CONTAINER ID   IMAGE       COMMAND     CREATED              STATUS
>>> dda4eb1054da   ping-test   "sh ping"   About a minute ago   Up About a minute

docker inspect --format '{{.State.Pid}}' dda4eb1054da
>>> 52784

# There is only one CAP_NET_RAW capability available in the container

getpcaps 52784
>>> 52784:=cap_net_raw+ep

For the last option, option 3, run the container as a non-root user and add the CAP_NET_RAW capability explicitly:

# Run docker container as non-root user

docker run --cap-add=net_raw -u nobody ping-test

>>> ping 32cdaf0304c1 every 300 sec
>>> PING 32cdaf0304c1 (172.17.0.2): 56 data bytes
>>> 64 bytes from 172.17.0.2: seq=0 ttl=64 time=1.225 ms

# Open another terminal and type

docker ps
>>> CONTAINER ID   IMAGE       COMMAND     CREATED              STATUS
>>> 32cdaf0304c1   ping-test   "sh ping"   About a minute ago   Up About a minute

docker inspect --format '{{.State.Pid}}' 32cdaf0304c1
>>> 67053

# There is no capability available in the container

getpcaps 67053
>>> 67053:=

From the result you can see that even though CAP_NET_RAW was added explicitly, the non-root container process has no capabilities available. Ping still works because it does not need a raw socket here: when the GID of the process falls within the range set in the net.ipv4.ping_group_range kernel parameter, any user may open an unprivileged ICMP echo socket instead.
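
The role of net.ipv4.ping_group_range can be checked directly. The kernel exposes the parameter as two numbers, the lowest and highest group ID allowed to open ICMP echo sockets, in /proc/sys/net/ipv4/ping_group_range. The sketch below tests whether a GID falls inside that range; ping_allowed is a hypothetical helper name.

```shell
#!/bin/sh
# ping_allowed GID "LOW HIGH"
# Print "yes" if GID lies within the inclusive range LOW..HIGH, else "no".
# The second argument has the layout of /proc/sys/net/ipv4/ping_group_range.
ping_allowed() {
    gid=$1
    # Split "LOW HIGH" (space- or tab-separated) into two words.
    set -- $2
    low=$1
    high=$2
    if [ "$gid" -ge "$low" ] && [ "$gid" -le "$high" ]; then
        echo yes
    else
        echo no
    fi
}

# Usage, inside the container:
#   ping_allowed "$(id -g)" "$(cat /proc/sys/net/ipv4/ping_group_range)"
```

A range of "1 0" is empty, so no group matches and ping would then need CAP_NET_RAW or a setuid binary.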

Summary

In this section you have added capabilities to, and removed them from, a range of new containers. You have seen that capabilities can be added to and removed from the root user of a container at a very granular level. You also learned that Docker does not currently support adding capabilities to non-root users.

In the HandsOn section, a more detailed workflow for dropping capabilities and identifying the required capabilities is implemented as a development guideline.

Security Audit

It is always recommended to set up a security audit to monitor the configuration of, and the changes made to, the Docker daemon and its associated files. This lets you test the configuration of the host operating system and the Docker daemon and trace back every change made to the daemon and its host.

For example, you can ensure that containers are restricted from acquiring new privileges by setting "no-new-privileges": true in the daemon configuration. This prevents privilege escalation from inside containers and ensures that no container can gain new privileges through setuid or setgid binaries.
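
Whether the restriction is actually active for a process can be read from the NoNewPrivs line in /proc/<pid>/status. The following is a minimal sketch that parses that line from standard input; the helper name no_new_privs_state is hypothetical. For a single container, the same restriction can be requested with docker run --security-opt no-new-privileges.

```shell
#!/bin/sh
# no_new_privs_state
# Read /proc/<pid>/status text on stdin and print:
#   enabled   NoNewPrivs is 1
#   disabled  NoNewPrivs is 0
#   unknown   the line is missing (for example on an older kernel)
no_new_privs_state() {
    state=unknown
    while read -r key value rest; do
        if [ "$key" = "NoNewPrivs:" ]; then
            if [ "$value" = "1" ]; then
                state=enabled
            else
                state=disabled
            fi
        fi
    done
    echo "$state"
}

# Usage:
#   docker run --rm --security-opt no-new-privileges alpine \
#       cat /proc/self/status | no_new_privs_state
```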

Reference Security Audit

For more about security auditing in Docker, see: https://www.digitalocean.com/community/tutorials/how-to-audit-docker-host-security-with-docker-bench-for-security-on-ubuntu-16-04