Network

When it comes to container networking, the OCI runtime spec does nothing more than create or join a network namespace. All other tasks are left to hooks, which let you inject custom logic at different stages of the container lifecycle.
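In config.json terms, "creating" is just a minimal entry in the namespaces array (the snippet below is an illustration, not the full file; "joining" adds a path, as we'll see later):

"namespaces": [
        {
                "type": "network"
        }
],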

With the default config.json, you will only see a loopback device inside the container, not the eth0 you normally see on a host, so the container cannot talk to the outside world. However, we can set up a simple bridge network using netns as a hook.

Download netns and copy the binary to /usr/local/bin, as assumed by the config.json below. It's worth noting that hooks are executed in the runtime namespace, not the container namespace. This means, among other things, that the hook binary must reside on the host system, not in the container. Therefore, you don't need to put netns into the container rootfs.
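For example, after downloading a release binary for your architecture from the project's releases page (the local file name ./netns below is just an assumption):

$ chmod +x ./netns
$ sudo mv ./netns /usr/local/bin/netns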

Set up a bridge network using netns

Make the following changes to config.json. In addition to adding the hook, we also grant the CAP_NET_RAW capability so that we can use ping inside the container for basic network checks.

binchen@m:~/container/runc$ git diff
diff --git a/config.json b/config.json
index 25a3154..d1c0fb2 100644
--- a/config.json
+++ b/config.json
@@ -18,12 +18,16 @@
                        "bounding": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
-                               "CAP_NET_BIND_SERVICE"
+                               "CAP_NET_BIND_SERVICE",
+                               "CAP_NET_RAW"
                        ],
                        "effective": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
-                               "CAP_NET_BIND_SERVICE"
+                               "CAP_NET_BIND_SERVICE",
+                               "CAP_NET_RAW"
                        ],
                        "inheritable": [
                                "CAP_AUDIT_WRITE",
@@ -33,7 +37,9 @@
                        "permitted": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
-                               "CAP_NET_BIND_SERVICE"
+                               "CAP_NET_BIND_SERVICE",
+                               "CAP_NET_RAW"
                        ],
                        "ambient": [
                                "CAP_AUDIT_WRITE",
@@ -131,6 +137,16 @@
                        ]
                }
        ],
+
+       "hooks":
+               {
+                       "prestart": [
+                               {
+                                       "path": "/usr/local/bin/netns"
+                               }
+                       ]
+               },
+
        "linux": {
                "resources": {
                        "devices": [

Start a container with this new config.
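For example (the container name mycontainer is arbitrary; run this from the bundle directory that holds config.json and the rootfs):

$ sudo runc run mycontainer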

Inside the container, we now find an eth0 device, in addition to the loopback device that is always there.

/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 8E:F3:5C:D8:CA:2B
          inet addr:172.19.0.2  Bcast:172.19.255.255  Mask:255.255.0.0
          inet6 addr: fe80::8cf3:5cff:fed8:ca2b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21992 errors:0 dropped:0 overruns:0 frame:0
          TX packets:241 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2610155 (2.4 MiB)  TX bytes:22406 (21.8 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:498 (498.0 B)  TX bytes:498 (498.0 B)

And you will be able to ping (*) the outside world.

/ # ping 216.58.199.68
PING 216.58.199.68 (216.58.199.68): 56 data bytes
64 bytes from 216.58.199.68: seq=0 ttl=55 time=18.382 ms
64 bytes from 216.58.199.68: seq=1 ttl=55 time=17.936 ms

(*) Note: 216.58.199.68 is one of google.com's IPs. If we had set up a DNS nameserver (e.g. echo "nameserver 8.8.8.8" > /etc/resolv.conf inside the container), we would have been able to ping www.google.com by name.

So, how does it work?

Bridge, veth, route, and iptables/NAT

When a hook is called, the container runtime passes the container's state to the hook on stdin. This includes the PID of the container (in the runtime namespace), which the hook, netns in this case, uses to locate the network namespace in which the container is supposed to run.
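The state is delivered to the hook as JSON, roughly like this (the values here are illustrative):

{
    "ociVersion": "1.0.0",
    "id": "mycontainer",
    "status": "creating",
    "pid": 8179,
    "bundle": "/home/binchen/container/runc"
}

With this PID, netns performs a few tasks: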

1) It creates a Linux bridge with the default name netns0 (if one doesn't already exist), and sets up a MASQUERADE rule on the host.

2) It creates a veth pair, connects one endpoint of the pair to the bridge netns0, and places the other one (renamed to eth0) into the container's network namespace.

3) It allocates and assigns an IP to the container interface (eth0) and sets up the route table for the container.
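In plain ip/iptables commands, the effect is roughly the following (a manual sketch, not netns's actual implementation; $PID and the addresses are taken from this article's example):

# once, on the host: create the bridge, give it the gateway address, NAT the subnet
$ sudo ip link add netns0 type bridge
$ sudo ip addr add 172.19.0.1/16 dev netns0
$ sudo ip link set netns0 up
$ sudo iptables -t nat -A POSTROUTING -s 172.19.0.0/16 -j MASQUERADE

# per container: create a veth pair and attach one end to the bridge
$ sudo ip link add netnsv0-$PID type veth peer name ctr-eth0
$ sudo ip link set netnsv0-$PID master netns0
$ sudo ip link set netnsv0-$PID up

# move the other end into the container's namespace, rename and configure it
$ sudo ip link set ctr-eth0 netns $PID
$ sudo nsenter -t $PID -n ip link set ctr-eth0 name eth0
$ sudo nsenter -t $PID -n ip addr add 172.19.0.2/16 dev eth0
$ sudo nsenter -t $PID -n ip link set eth0 up
$ sudo nsenter -t $PID -n ip route add default via 172.19.0.1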

We'll delve into the details of each of these tasks below. But first, let's start another container with the same config.json; two containers make things clearer and more interesting than one.

  • bridge and interfaces

A bridge netns0 is created, and two interfaces are attached to it. The interface names follow the format netnsv0-$(containerPid).

$ brctl show netns0
bridge name    bridge id        STP enabled    interfaces
netns0        8000.f2df1fb10980    no        netnsv0-8179
                                             netnsv0-10577

As explained above, netnsv0-8179 is one endpoint of a veth pair, connected to the bridge; the other endpoint is inside container 8179. Let's verify that.

  • veth pair

On the host, we can see that the peer of netnsv0-8179 has interface index 7:

$ ethtool -S netnsv0-8179
NIC statistics:
     peer_ifindex: 7

And inside container 8179, we can see that eth0's index is 7. This confirms that eth0 in container 8179 is paired with netnsv0-8179 on the host. The same is true for netnsv0-10577 and the eth0 in container 10577.

/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue qlen 1000
    link/ether 8e:f3:5c:d8:ca:2b brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.2/16 brd 172.19.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8cf3:5cff:fed8:ca2b/64 scope link
       valid_lft forever preferred_lft forever
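The pairing can also be read from the host side: the @ifN suffix in ip link output is the peer's index (output abbreviated):

$ ip -o link show netnsv0-8179
8: netnsv0-8179@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...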

So far, we have seen how a container is connected to the host's virtual bridge using a veth pair. We have the network interfaces, but we still need a couple more pieces: the route table and iptables.

Route Table

Here is the route table in container 8179:

/ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         172.19.0.1      0.0.0.0         UG    0      0        0 eth0
172.19.0.0      *               255.255.0.0     U     0      0        0 eth0

We can see that all traffic goes through eth0 to the default gateway 172.19.0.1, which is the bridge netns0, as shown by:

# in container
/ # ip route get 216.58.199.68 from 172.19.0.2
216.58.199.68 from 172.19.0.2 via 172.19.0.1 dev eth0

On the host:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.1.1     0.0.0.0         UG    0      0        0 wlan0
172.19.0.0      *               255.255.0.0     U     0      0        0 netns0
192.168.1.0     *               255.255.255.0   U     9      0        0 wlan0
192.168.122.0   *               255.255.255.0   U     0      0        0 virbr0

Also:

# on host
$ ip route get 216.58.199.68 from 172.19.0.1
216.58.199.68 from 172.19.0.1 via 192.168.1.1 dev wlan0
    cache

192.168.1.1 is the IP of my home router, which is a real, physical gateway.

Piecing the routes together: when we ping Google from the container, the packets first go to the virtual bridge created by netns, then to the real gateway of my home router, then out into the wild internet, and finally to one of Google's servers.

iptables/NAT

Another change made by netns is to set up a MASQUERADE rule: all traffic with a source address in 172.19.0.0/16 will be masqueraded (NAT-ed) with the host's address, so the outside world sees only the host's IP, not the container's.

$ sudo iptables -t nat --list
Chain POSTROUTING (policy ACCEPT)
target     prot  opt source               destination
MASQUERADE  all  --  172.19.0.0/16        anywhere
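The same rule in iptables -S form, which shows the flags that would recreate it:

$ sudo iptables -t nat -S POSTROUTING
-P POSTROUTING ACCEPT
-A POSTROUTING -s 172.19.0.0/16 -j MASQUERADE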

Port forwarding/DNAT

With IP masquerading, traffic can go out from the container to the internet, and the return traffic of the same connection can come back. However, for the container to accept new incoming connections, you have to set up port forwarding using the iptables DNAT target.

In the container (whose IP is 172.19.0.8 in this example), start a listener on port 10001:

/ # nc -p 10001 -l

On the host, set up the port mapping: host port 10088 maps to container port 10001:

$ sudo iptables -t nat -A PREROUTING -i eth0 -p tcp -m tcp --dport 10088 -j DNAT --to-destination 172.19.0.8:10001

Then send a message through the forwarded port from another machine on the host's network (the PREROUTING chain only matches packets arriving on the host's external interface, so testing with nc localhost from the host itself won't exercise this rule; replace $HOST_IP with the host's address):

$ echo "the host says HI" | nc $HOST_IP 10088

Share network namespace

To join the network namespace of another container, set the network namespace path in config.json to point at the namespace you want to join. In our example, we'll join the network namespace of container 8179.

 {
-                "type": "network"
+                "type": "network",
+                "path": "/proc/8179/ns/net"
 }

Remember to remove the prestart hook, since we don't need to create a new veth pair or set up routes this time.

Start a new container, and we'll find that it has the same eth0 device (and the same IP) as container 8179, and the same route table, since the two containers are in the same network namespace.

/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 8E:F3:5C:D8:CA:2B
          inet addr:172.19.0.2  Bcast:172.19.255.255  Mask:255.255.0.0
          inet6 addr: fe80::8cf3:5cff:fed8:ca2b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22371 errors:0 dropped:0 overruns:0 frame:0
          TX packets:241 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2658017 (2.5 MiB)  TX bytes:22406 (21.8 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:498 (498.0 B)  TX bytes:498 (498.0 B)
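You can also confirm the sharing by comparing the network namespace links of the two container init processes in /proc; the inode numbers must match ($PID1/$PID2 are the two container PIDs, and the inode value is illustrative):

$ sudo readlink /proc/$PID1/ns/net /proc/$PID2/ns/net
net:[4026532181]
net:[4026532181]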

So, despite being in different containers, they share the network device, route table, port space, and all the other network resources. For example, if you start a web service on port 8100 in container 8179, you will be able to access that service from the new container at localhost:8100.
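A quick way to see this in action, using nc as a stand-in for the web service:

# in container 8179
/ # nc -p 8100 -l

# in the new container
/ # echo hello | nc 127.0.0.1 8100

The listener in container 8179 prints "hello".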

Summary

We've seen how to use netns as a hook to set up a bridge network for our containers, enabling them to communicate with the internet and each other. In diagram form, we've set up something like this:

+---------------------------------------------------------+
|                                                         |
|                                                         |
|                                      +----------------+ |
|                                      |   wlan/eth0    +---+
|                                      |                | |
|                                      +---------+------+ |
|                                                |        |
|                                          +-----+----+   |
|                                    +-----+route     |   |
|                                    |     |table     |   |
|                                    |     +----------+   |
|    +-------------------------------+----------+         |
|    |                                          |         |
|    |                bridge:netns0             |         |
|    |                                          |         |
|    +-----+-----------------------+------------+         |
|          | interface             | interface            |
|    +-----+-----+          +------+----+                 |
|    |           |          |10:netnsv0 |                 |
|    |8:netnsv0- |          +-10577@if9 |                 |
|    |8179@if7   |          |           |                 |
|    +---+-------+          +----+------+                 |
|        |                       |                        |
|        |                       |                        |
| +-----------------+     +-----------------+             |
| |      |          |     |      |          |             |
| |  +---+------+   |     | +----+------+   |             |
| |  |          |   |     | |           |   |             |
| |  |7:eth0@if8|   |     | | 9:eth0@if10   |             |
| |  |          |   |     | |           |   |             |
| |  |          |   |     | |           |   |             |
| |  +----------+   |     | +-----------+   |             |
| |                 |     |                 |             |
| |  c8179          |     |  c10577         |             |
| +-----------------+     +-----------------+             |
|                                                         |
+---------------------------------------------------------+