Linux capabilities¶
What is Linux capability¶
Linux capabilities give developers the freedom to let their binaries, when executed by non-root users, perform privileged operations without granting them full root permissions. The Linux kernel supports close to 40 different capabilities. By default, Docker grants 14 of them to a container running as root; see the chapter Linux capabilities in Docker.
What is the difference between privilege, capability and file permission¶
Before capabilities were introduced, there was only a binary distinction between privileged and unprivileged processes. The conventional UNIX implementation distinguishes these two categories: a privileged process runs as root (the superuser), whose UID is 0, and an unprivileged process runs as a non-root user, whose UID is nonzero.
With capabilities, Linux breaks the privileges of the superuser down into small, distinct units that can be granted individually, so that a non-root user can perform a specific privileged task.
For instance, with CAP_FOWNER a non-root process can perform operations that normally require it to own the file, such as changing the permissions of a file owned by root. This works because CAP_FOWNER bypasses the permission checks that compare the UID of the process with the UID of the file.
Why Linux capabilities¶
Linux capabilities are assigned per thread and determine whether the thread may perform certain privileged actions. One or more threads run in the context of a process, and each thread can execute any part of the process code. Capabilities break down the privileges of the root user into distinct units, which gives more granular security control over privileged actions in the Linux kernel. They are therefore useful when you want to restrict your own processes after they have performed their privileged operations.
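To see which capability sets a thread currently holds, one option is to read the `Cap*` fields of `/proc/<pid>/status` (or `/proc/<pid>/task/<tid>/status` for a single thread). The following is a minimal sketch that assumes a Linux host; the function names `parse_cap_sets` and `read_cap_sets` are illustrative and not part of any library.

```python
from pathlib import Path

# Capability fields the kernel prints in /proc/<pid>/status:
# inheritable, permitted, effective, bounding and ambient sets.
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text: str) -> dict[str, int]:
    """Extract the capability bitmasks from the text of a status file."""
    sets = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in CAP_FIELDS:
            # The kernel prints each set as a hexadecimal bitmask.
            sets[name] = int(value.strip(), 16)
    return sets


def read_cap_sets(pid: str = "self") -> dict[str, int]:
    """Read the capability sets of a process (Linux only)."""
    return parse_cap_sets(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_cap_sets("29420")` on the Docker host would return the masks for the process with PID 29420; an effective set (`CapEff`) of 0 means the process cannot perform any capability-protected action.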
Reference Linux Capabilities
More information about Linux capabilities is provided here: https://book.hacktricks.xyz/linux-hardening/privilege-escalation/linux-capabilities
How to use capabilities in Docker¶
There are 3 ways of using Docker capabilities:
- Run containers as root with all the capabilities and try to manage capabilities manually at runtime
- Run containers as root with limited capabilities and never change them in container runtime
- Run containers as an unprivileged user with no capabilities
Option 1 should be avoided whenever possible, because an attacker can easily abuse the capabilities to run malicious code inside the container and eventually escape from the container onto the host as the root user.
Option 2 runs with limited capabilities: you add only the capabilities that your application needs to run correctly.
Option 3 runs non-root container images with no capabilities. This is the ideal way, but it is not always realistic: for some applications it is trickier and requires more changes to the code than a few tweaks to the Dockerfile and some configuration.
Since option 2 is the most realistic way to use capabilities in Docker, the HandsOn section shows how to find the capabilities your own application requires and how to limit them by dropping all capabilities and adding back the necessary ones in the Dockerfile and docker-compose.yml.
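As a rough illustration of option 2, the sketch below builds a `docker run` argument list and a docker-compose service entry that drop all capabilities and add back only the required ones. The helper names (`normalize_cap`, `docker_run_args`, `compose_service`) are hypothetical; only the `--cap-drop` and `--cap-add` flags and the `cap_drop` and `cap_add` compose keys come from Docker itself.

```python
def normalize_cap(name: str) -> str:
    """Return the name in the form commonly passed to --cap-add, e.g. NET_RAW."""
    name = name.strip().upper()
    return name[4:] if name.startswith("CAP_") else name


def docker_run_args(image: str, needed_caps, extra_options=()) -> list[str]:
    """Build a 'docker run' command that drops everything, then adds back."""
    caps = sorted({normalize_cap(cap) for cap in needed_caps})
    args = ["docker", "run", "--cap-drop=ALL"]
    args += [f"--cap-add={cap}" for cap in caps]
    args += list(extra_options)  # options must come before the image name
    args.append(image)
    return args


def compose_service(image: str, needed_caps) -> dict:
    """Build the equivalent service entry for docker-compose.yml."""
    return {
        "image": image,
        "cap_drop": ["ALL"],
        "cap_add": sorted({normalize_cap(cap) for cap in needed_caps}),
    }
```

The resulting list can be handed to `subprocess.run`, and the dictionary can be serialized to YAML under the `services` key of a compose file.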
There are close to 40 different capabilities in today's Linux kernel. By default, a Docker container runs as root with 14 of them. The following chapters explore these capabilities.
Linux capabilities in Docker¶
By default, Docker runs a container as root with 14 different Linux capabilities. The following table gives a short description of each of them.
Linux Capabilities | Description |
---|---|
CAP_DAC_OVERRIDE | Bypasses the read, write and execute permission checks on any file, so the process can read, write and execute files regardless of their permissions |
CAP_CHOWN | Makes arbitrary changes to the UIDs and GIDs of files. For example, an attacker could take ownership of a sensitive file, change the root password and escalate privileges further |
CAP_FOWNER | Bypasses permission checks that normally require the UID of the process to match the UID of the file. It can change the permission of any file |
CAP_SETUID | Allows arbitrary changes to the UIDs of a process. It allows users to execute programs with higher privileges |
CAP_SETGID | Allows arbitrary changes to the GIDs of a process. An attacker can impersonate the Docker group and abuse it to communicate with the Docker socket and escalate privileges |
CAP_SETFCAP | Allows setting capabilities on files |
CAP_SETPCAP | Allows a process to modify its own capability sets, for example to drop capabilities from its bounding set |
CAP_KILL | Bypasses the permission checks for sending signals, so any running process can be killed |
CAP_NET_BIND_SERVICE | Binds a socket to a privileged port (port number below 1024) |
CAP_NET_RAW | Allows the use of RAW and PACKET sockets, and binding to any address for transparent proxying |
CAP_SYS_CHROOT | Allows use of the chroot system call to change the root directory |
CAP_MKNOD | Creates special files using mknod |
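CAP_AUDIT_WRITE | Allows writing records to the kernel auditing log |
CAP_FSETID | Keeps the set-user-ID and set-group-ID bits when a file is modified, and allows setting the set-group-ID bit on a file whose GID does not match the file system GID of the process |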
Reference Linux capabilities
You can find more about Linux capabilities in the Linux manual page: https://man7.org/linux/man-pages/man7/capabilities.7.html
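The kernel reports capability sets as hexadecimal bitmasks, so it can help to translate a mask back into names. The sketch below covers only the 14 capabilities from the table above; the bit numbers are taken from the kernel header `linux/capability.h`, and the names `DOCKER_DEFAULT_CAPS` and `names_in_mask` are illustrative.

```python
# Bit number of each of the 14 capabilities listed in the table,
# as defined in linux/capability.h.
DOCKER_DEFAULT_CAPS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    3: "CAP_FOWNER",
    4: "CAP_FSETID",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    8: "CAP_SETPCAP",
    10: "CAP_NET_BIND_SERVICE",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    27: "CAP_MKNOD",
    29: "CAP_AUDIT_WRITE",
    31: "CAP_SETFCAP",
}

# Bitmask with exactly these 14 bits set.
DOCKER_DEFAULT_MASK = sum(1 << bit for bit in DOCKER_DEFAULT_CAPS)


def names_in_mask(mask: int) -> list[str]:
    """List the capabilities set in a bitmask, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits outside the table are reported by number only.
            names.append(DOCKER_DEFAULT_CAPS.get(bit, f"bit {bit}"))
        bit += 1
    return names
```

With this table, `DOCKER_DEFAULT_MASK` evaluates to `0xa80425fb`, which is the value to expect in the `CapEff` field of a container started as root with the 14 capabilities above.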
The --privileged flag¶
Docker offers a specific --privileged flag when running a container, or privileged: true in docker-compose. You should avoid using it at any time. It grants the container all Linux capabilities, including CAP_SYS_ADMIN, and with them the means to gain full access to the host and to any other container running on the same host.
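One way to find containers that were started this way is to scan the JSON printed by `docker inspect`, which contains `HostConfig.Privileged` and `HostConfig.CapAdd` for each container. The following is a sketch only: the function name `risky_containers` and the choice of which added capabilities count as risky (`RISKY_CAPS`) are assumptions made for this example.

```python
import json

# Assumption for this sketch: adding ALL or SYS_ADMIN is treated
# as nearly equivalent to running with --privileged.
RISKY_CAPS = {"ALL", "SYS_ADMIN"}


def normalize_cap(name: str) -> str:
    """Upper-case the name and strip an optional CAP_ prefix."""
    name = name.strip().upper()
    return name[4:] if name.startswith("CAP_") else name


def risky_containers(inspect_output: str) -> list[str]:
    """Return the names of privileged or near-privileged containers.

    inspect_output is the JSON array printed by 'docker inspect <id>...'.
    """
    risky = []
    for container in json.loads(inspect_output):
        host_config = container.get("HostConfig", {})
        # CapAdd is null in the JSON when no capability was added.
        added = {normalize_cap(cap) for cap in (host_config.get("CapAdd") or [])}
        if host_config.get("Privileged") or added & RISKY_CAPS:
            risky.append(container.get("Name", "<unnamed>"))
    return risky
```

The input could be produced with, for example, `docker inspect $(docker ps -q)` and passed to the function as a string.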
Normally the --privileged flag is used to enable docker-in-docker, which is widely used by build tools and CI/CD systems that run as containers. The CI/CD container needs access to the Docker daemon in order to build other container images.
To avoid docker-in-docker for security reasons, you can, for example, use the Kaniko Docker image to build other images, as recommended in the GitLab documentation.
Reference CI/CD Pipeline
More information about CI/CD in code.siemens is provided here: https://docs.gitlab.com/ee/ci/docker/using_kaniko.html
Examples of using capabilities¶
The following examples show how to use capabilities when running a Docker container. We selected 3 different capabilities.
CAP_NET_BIND_SERVICE¶
CAP_NET_BIND_SERVICE allows a thread to bind to a low-numbered port (below 1024). In this example we use this capability to run a container in the 3 different ways described in the How to use capabilities in Docker section.
Option 1 runs the Docker container with the defaults, as the root user. Open your terminal and type:
# Start and run a docker container as default
docker run -d --net=host httpd
# Open another terminal and type
docker ps
>>> CONTAINER ID IMAGE COMMAND CREATED STATUS
>>> 5be0e1b5cdf3 httpd "httpd-foreground" 6 seconds ago Up 5 seconds
# Note it will be another Container ID and PID, use that instead
docker inspect --format '{{.State.Pid}}' 5be0e1b5cdf3
>>> 29420
# By default docker has 14 capabilities as root user
getpcaps 29420
>>> 29420:=cap_chown,cap_dac_override,...,cap_audit_write,cap_setfcap+ep
You can see that 14 capabilities are listed when the container runs with the defaults. For option 2, we first drop all capabilities and add back only the NET_BIND_SERVICE capability, which the application needs to run correctly.
# Run docker container with only cap-net-bind-service capability
docker run --cap-drop=all --cap-add=NET_BIND_SERVICE --net=host httpd
# Open another terminal and type
docker ps
>>> CONTAINER ID IMAGE COMMAND CREATED STATUS
>>> 1ff6f5ea671b httpd "httpd-foreground" 30 seconds ago Up 30 seconds
docker inspect --format '{{.State.Pid}}' 1ff6f5ea671b
>>> 30196
# There is only one CAP_NET_BIND_SERVICE capability available in the container
getpcaps 30196
>>> 30196: = cap_net_bind_service+ep
The last option, option 3, runs the Docker container as a non-root user. The NET_BIND_SERVICE capability is added explicitly to show that it has no effect in this case:
# Run docker container as non-root user
docker run --cap-add=NET_BIND_SERVICE -u nobody --net=host httpd
>>> (13)Permission denied: AH00072: make_sock: could not bind to address [::]:80
>>> (13)Permission denied: AH00072: make_sock: could not bind to address 0.0.0.0:80
This time the command fails with a permission denied error. Docker does not yet support adding capabilities to non-root users, so the process runs without NET_BIND_SERVICE and cannot bind to port 80.
CAP_CHOWN¶
Use the CAP_CHOWN capability to change the ownership of a file to a non-root user. First run the container as the default root user, as described in Option 1 of How to use capabilities in Docker:
# In the terminal create a dummy file
nano dummy.txt
# Create the Dockerfile under the same directory as the terminal
FROM alpine:latest
WORKDIR /usr/src/app
COPY ./dummy.txt /usr/src/app/dummy.txt
RUN addgroup --system non-root-user && \
adduser --system -G non-root-user non-root-user && \
chown -R non-root-user:non-root-user /usr/src/app
# Build and Start Docker chown-test Container
docker build -t chown-test .
docker run -it chown-test
# Check the ownership of the dummy.txt file using "ls -RFlag"
/usr/src/app $ ls -RFlag
>>> ...
>>> -rw-rw-r-- 1 non-root 8 Jun 22 10:18 dummy.txt
# Kill the docker container
/usr/src/app $ exit
The command works because, by default, the container is started with a root user. This root user has the chown capability by default.
Start another container and drop all capabilities for the root user other than the chown capability, as described in Option 2 of the How to use capabilities in Docker section.
# Change the Dockerfile as the following
FROM alpine:latest
WORKDIR /usr/src/app
COPY ./dummy.txt /usr/src/app/dummy.txt
RUN addgroup --system non-root-user && \
adduser --system -G non-root-user non-root-user
# Rebuild the image from the changed Dockerfile and start the chown-test container
docker build -t chown-test .
docker run --cap-drop=all --cap-add=chown -it chown-test chown non-root-user /usr/src/app
This command prints no output, indicating a successful run. The operation succeeds because, although you dropped all capabilities for the container's root account, you added the chown capability back. The chown capability is all that is needed to change the ownership of a file.
Now start another new container as a non-root user and add only the chown capability, as described in Option 3 of the How to use capabilities in Docker section.
# Change the Dockerfile as following
FROM alpine:latest
WORKDIR /usr/src/app
COPY ./dummy.txt /usr/src/app/dummy.txt
RUN addgroup --system non-root-user && \
adduser --system -G non-root-user non-root-user
USER non-root-user
And now build and run the Docker container:
docker build -t chown-test .
docker container run --cap-add chown -it chown-test chown non-root-user /usr/src/app
>>> chown: /: Operation not permitted
This time the command fails with an error. Docker does not yet support adding capabilities to non-root users, so the process runs without the chown capability and cannot change the ownership of a file or directory.
CAP_NET_RAW¶
The CAP_NET_RAW capability allows a thread to open a raw network socket; it is used especially by the ping tool. Here is an example of how to use the CAP_NET_RAW capability to run ping in a container:
# Create a Dockerfile
FROM alpine:latest
COPY ping /
HEALTHCHECK --interval=5s --timeout=3s \
CMD ps aux | grep '[s]h ping' || exit 1
CMD ["sh", "ping"]
# Create a file named ping in the same directory
#!/bin/sh
echo ping ${HOSTNAME:=localhost} every ${TIMEOUT:=300} sec
while true; do ping -c 1 ${HOSTNAME}; sleep ${TIMEOUT}; done;
Start again with Option 1 and run the Docker container with the defaults, as the root user. Open your terminal and type:
# Build and start the container ping-test
docker build -t ping-test .
docker run ping-test
>>> ping 3753292ac8f1 every 300 sec
>>> PING 3753292ac8f1 (172.17.0.2): 56 data bytes
>>> 64 bytes from 172.17.0.2: seq=0 ttl=64 time=1.225 ms
# Open another terminal and type
docker ps
>>> CONTAINER ID IMAGE COMMAND CREATED STATUS
>>> 3753292ac8f1 ping-test "sh ping" About a minute ago Up About a minute
docker inspect --format '{{.State.Pid}}' 3753292ac8f1
>>> 51332
# By default docker has 14 capabilities as root user
getpcaps 51332
>>> 51332:=cap_chown,cap_dac_override,...,cap_audit_write,cap_setfcap+ep
# Kill the running container
docker kill 3753292ac8f1
As before, you can see that 14 capabilities are listed when the container runs with the defaults. For option 2, we first drop all capabilities and add back only the CAP_NET_RAW capability, which the application needs to run correctly.
# Start the container ping-test with only the net_raw capability
docker run --cap-drop=all --cap-add=net_raw ping-test
>>> ping dda4eb1054da every 300 sec
>>> PING dda4eb1054da (172.17.0.2): 56 data bytes
>>> 64 bytes from 172.17.0.2: seq=0 ttl=64 time=1.225 ms
# Open another terminal and type
docker ps
>>> CONTAINER ID IMAGE COMMAND CREATED STATUS
>>> dda4eb1054da ping-test "sh ping" About a minute ago Up About a minute
docker inspect --format '{{.State.Pid}}' dda4eb1054da
>>> 52784
# There is only one CAP_NET_RAW capability available in the container
getpcaps 52784
>>> 52784:=cap_net_raw+ep
The last option, option 3, runs the Docker container as a non-root user. The CAP_NET_RAW capability is again added explicitly:
# Run docker container as non-root user
docker run --cap-add=net_raw -u nobody ping-test
>>> ping 32cdaf0304c1 every 300 sec
>>> PING 32cdaf0304c1 (172.17.0.2): 56 data bytes
>>> 64 bytes from 172.17.0.2: seq=0 ttl=64 time=1.225 ms
# Open another terminal and type
docker ps
>>> CONTAINER ID IMAGE COMMAND CREATED STATUS
>>> 32cdaf0304c1 ping-test "sh ping" About a minute ago Up About a minute
docker inspect --format '{{.State.Pid}}' 32cdaf0304c1
>>> 67053
# There is no capability available in the container
getpcaps 67053
>>> 67053:=
From the result you can see that even when the CAP_NET_RAW capability is added explicitly to the non-root container, no capability is available. ping still works, because it does not need CAP_NET_RAW when the kernel parameter net.ipv4.ping_group_range covers the group of the user: ping can then use an unprivileged ICMP socket instead of a raw socket.
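The net.ipv4.ping_group_range parameter holds two group IDs, a lower and an upper bound, and can be read from `/proc/sys/net/ipv4/ping_group_range`. The sketch below checks whether a single group ID falls inside that range; the function name `ping_socket_allowed` is hypothetical, and the check is simplified to one GID rather than all groups of a process.

```python
def ping_socket_allowed(range_text: str, gid: int) -> bool:
    """Check whether a GID lies within the ping_group_range bounds.

    range_text is the content of /proc/sys/net/ipv4/ping_group_range,
    two integers separated by whitespace (lower and upper bound).
    """
    low, high = (int(part) for part in range_text.split())
    # A range such as "1 0" (low > high) matches no group at all.
    return low <= gid <= high
```

Inside a container, the file can be read with `open("/proc/sys/net/ipv4/ping_group_range").read()` and passed to the function together with the result of `os.getgid()`.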
Summary
In this section you have added capabilities to and removed capabilities from a range of new containers. You have seen that capabilities can be added to and removed from the root user of a container at a very granular level. You also learned that Docker does not currently support adding capabilities to non-root users.
In the HandsOn section, a more detailed workflow for dropping capabilities and identifying the required capabilities is implemented as a development guideline.
Security Audit¶
It is always recommended to set up a security audit in order to monitor the configuration of the Docker daemon and its associated files, and the changes made to them. This lets you test the configuration of the host operating system and the Docker daemon, and trace every change made to the Docker daemon and its host.
For example, you can ensure that containers are restricted from acquiring new privileges by setting "no-new-privileges": true in the daemon configuration. This prevents privilege escalation from inside containers and ensures that no container can gain new privileges using setuid or setgid binaries.
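A simple audit step for this setting could look like the sketch below, which checks the text of the daemon configuration file. The function name `no_new_privileges_enabled` is illustrative, and the file location `/etc/docker/daemon.json` mentioned in the usage note is the usual default on Linux, which may differ on your host.

```python
import json


def no_new_privileges_enabled(daemon_json_text: str) -> bool:
    """Return True only if the daemon config sets "no-new-privileges": true."""
    # An empty file is treated like an empty configuration.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    return config.get("no-new-privileges") is True
```

For example, `no_new_privileges_enabled(open("/etc/docker/daemon.json").read())` could be run as part of a regular host check, with a `False` result reported as a finding.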
Reference Security Audit
For more about security auditing in Docker, see: https://www.digitalocean.com/community/tutorials/how-to-audit-docker-host-security-with-docker-bench-for-security-on-ubuntu-16-04