13 Aug 22

Debugging eBPF-enabled programs in Docker

These days, I'm adding XDP offloading to l2tpns, an L2TP server used in production by several non-profit ISPs in France.

While doing that, I need to test whether l2tpns can successfully load XDP programs into the kernel. But I don't want to run it directly on my Debian host: it might break network connectivity, and l2tpns also updates the kernel routing table. So, let's just run l2tpns in Docker and allow it to break things! It turns out this is not so easy.

eBPF and XDP

As a reminder, XDP is a kernel mechanism that allows you to load custom eBPF programs that execute right in the network device driver. You write your eBPF program in C, load it into the kernel from userspace with a simple system call, and from that point on, your program can process network packets in the kernel, before the rest of the kernel has even started parsing them! For a project like l2tpns, this is extremely powerful, fast and flexible, because we should be able to offload the bulk of the encapsulation/decapsulation work to the kernel while keeping a lot of flexibility.

That being said, the eBPF ecosystem is still young and moving fast, and the whole software architecture needed to make this work is actually very complex. In practice, you often end up with weird errors that can be hard to track down, especially when trying to run XDP in Docker!

What I want to debug

In this case, I'm extending l2tpns so that it loads XDP programs on network interfaces when it starts. The basic process looks like this with libbpf (error handling omitted):

const char *xdp_filename = "/path/to/xdp_prog.o";
const char *if_name = "eth0";
__u32 ifindex;
int prog_fd = -1;
struct bpf_object *obj;
__u32 xdp_flags = 0;

// Load XDP program into the kernel
bpf_prog_load(xdp_filename, BPF_PROG_TYPE_XDP, &obj, &prog_fd);

// Find network interface by name
ifindex = if_nametoindex(if_name);

// Attach XDP program to network interface
bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags);

See the xdp-tutorial repository for more complete examples, but as a starting point this is the basic functionality I want to debug in Docker.
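Since error handling is omitted above, one step deserves a note: if_nametoindex() reports failure by returning 0, which is easy to pass on to the attach call by mistake. The following is a minimal sketch of a checked lookup using only libc; lookup_ifindex is a hypothetical helper name, not part of l2tpns or libbpf.

```c
#include <errno.h>
#include <net/if.h>

/* Hypothetical helper: resolve an interface name to its index.
 * Returns 0 and stores the index in *ifindex on success,
 * or a negative errno value if the interface cannot be found. */
int lookup_ifindex(const char *if_name, unsigned int *ifindex)
{
    unsigned int idx;

    errno = 0;
    idx = if_nametoindex(if_name);
    if (idx == 0)
        return errno ? -errno : -ENODEV;

    *ifindex = idx;
    return 0;
}
```

A caller would only go on to attach the XDP program when the helper returns 0.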

Most programs that manipulate eBPF rely on libbpf to do the hard work. As such, the debugging steps below can be generalized to any eBPF-enabled userspace program.

Basic Docker setup

To keep things simple, I only want to run l2tpns in a container. I will keep developing and building on my Debian host. So, let's get started with a simple Dockerfile that installs the required libraries and creates a minimal config to make l2tpns happy:

# Dockerfile used to test l2tpns during development.
# Do not use in production!

FROM debian:bullseye

RUN mkdir -p /etc/l2tpns && echo "10.10.10.0/24" > /etc/l2tpns/ip_pool
RUN apt-get update && apt-get install -y libbpf0 libcli1.10 iproute2

WORKDIR /src

VOLUME /src

ENTRYPOINT ["/src/l2tpns"]

My Debian host is running Bullseye, so I use the same distro in the container to make sure I have the same libraries.

Build the image from the Dockerfile:

$ docker build - -t l2tpns:latest < Dockerfile

Then give it a try (from the host, in the l2tpns git repository):

$ make -j4
# To send all logs to stderr
$ sed -i -e 's/set log_file/#set log_file/' etc/startup-config.default
# Run docker image with parameters
$ docker run -it --rm -v $PWD:/src l2tpns:latest -c etc/startup-config.default

This yields an error:

Can't open /dev/net/tun: No such file or directory

Ok, this first error is unrelated to XDP: l2tpns needs to create a tun interface and it cannot. Let's fix this:

$ docker run -it --rm -v $PWD:/src --cap-add=NET_ADMIN --device=/dev/net/tun l2tpns:latest -c etc/startup-config.default

Now we start seeing the interesting stuff:

libbpf: Error in bpf_object__probe_loading():Operation not permitted(1).
Couldn't load trivial BPF program. Make sure your kernel supports BPF (CONFIG_BPF_SYSCALL=y)
and/or that RLIMIT_MEMLOCK is set to big enough value.

From this point on, I will omit the tun-related options from the examples, but for the specific case of l2tpns they are still needed.

Allowing the BPF syscall

Obviously, to load an eBPF program into the kernel, you need to make a syscall at some point. This is the role of the BPF syscall, which is also used for other eBPF-related functionality.

There is a new CAP_BPF capability that allows using the BPF syscall without full root privileges. It was introduced in Linux 5.8 according to capabilities(7), which is good because Debian bullseye runs a 5.10 kernel. Let's try:

$ docker run -it --rm -v $PWD:/src --cap-add=BPF l2tpns:latest -c etc/startup-config.default

Result:

docker: Error response from daemon: invalid CapAdd: unknown capability: "CAP_BPF".

Crap. Maybe my Docker version is too old to know about this capability. Let's just use a bigger hammer and settle for CAP_SYS_ADMIN, which grants a lot of privileges, including BPF:

$ docker run -it --rm -v $PWD:/src --cap-add=SYS_ADMIN l2tpns:latest -c etc/startup-config.default

Result:

libbpf: Error in bpf_object__probe_loading():Operation not permitted(1).
Couldn't load trivial BPF program. Make sure your kernel supports BPF (CONFIG_BPF_SYSCALL=y)
and/or that RLIMIT_MEMLOCK is set to big enough value.

Well, this is the exact same error as before!
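Before looking elsewhere, it can help to confirm that the capability actually reached the process. Below is a minimal sketch, assuming a Linux /proc filesystem is mounted in the container. The helper names are hypothetical, and the bit numbers (21 for CAP_SYS_ADMIN, 39 for CAP_BPF) are the values defined in linux/capability.h.

```c
#include <stdio.h>

#define CAP_SYS_ADMIN_BIT 21 /* CAP_SYS_ADMIN in <linux/capability.h> */
#define CAP_BPF_BIT       39 /* CAP_BPF, added in Linux 5.8 */

/* Return 1 if capability number `cap` is set in the bitmask `mask`. */
int cap_mask_has(unsigned long long mask, int cap)
{
    return (int)((mask >> cap) & 1ULL);
}

/* Read the effective capability mask (the CapEff line) of the current
 * process from /proc/self/status.
 * Returns 0 on success, -1 if the file or the line cannot be read. */
int read_effective_caps(unsigned long long *mask)
{
    char line[256];
    int found = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "CapEff: %llx", mask) == 1) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

If cap_mask_has(mask, CAP_SYS_ADMIN_BIT) returns 1 inside the container and the load still fails with "Operation not permitted", the capability is not what is missing.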

Configuring limits in the container

Helpfully, the error message mentions something about the "memlock" limit. Let's have a look at the limits in a simple Debian bullseye container:

$ docker run -it --rm debian:bullseye /bin/sh -c "ulimit -a"

Since ulimit is a shell builtin, we cannot run it directly as the Docker command, hence the /bin/sh -c wrapper.

Result:

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        8192
coredump(blocks)     unlimited
memory(kbytes)       unlimited
locked memory(kbytes) 64
process              unlimited
nofiles              1048576
vmemory(kbytes)      unlimited
locks                unlimited
rtprio               0

We are interested in the "locked memory" limit. 64 KB is indeed on the low side (try comparing this value with your host system).
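The same limit can also be inspected from inside a program with getrlimit(2), which reports it in bytes rather than kilobytes (so 64 KB shows up as 65536). Below is a minimal sketch with hypothetical helper names. Note that a process may raise its soft limit up to the hard limit on its own, but raising the hard limit requires CAP_SYS_RESOURCE.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft and hard RLIMIT_MEMLOCK values in bytes.
 * An unlimited value (RLIM_INFINITY) prints as a very large number.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_memlock_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    printf("memlock soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}

/* Raise the soft limit to the current hard limit, which needs no
 * extra privileges. Returns 0 on success, -1 on failure. */
int raise_memlock_to_hard_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

When the hard limit itself is too low, as in the container above, it has to be raised from outside the process.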

Looking at the relevant Docker documentation, we find there's an option we can pass to Docker to raise this limit:

$ docker run -it --rm --ulimit memlock=1073741824 debian:bullseye /bin/sh -c "ulimit -l"
1048576

That looks much better! Now on the real container:

$ docker run -it --rm -v $PWD:/src --ulimit memlock=1073741824 --cap-add=SYS_ADMIN l2tpns:latest -c etc/startup-config.default
libbpf: map 'sessions_table': failed to create: Invalid argument(-22)

Ok, we still have an error, but it looks application-specific (libbpf fails to create a map that is defined in the l2tpns code).

EDIT 2022-08-15: it did indeed turn out to be a programming error: BPF array maps MUST have a 32-bit key size and I was trying to create a map with a 16-bit key size. It's hard to debug because there is no detailed error reporting: the syscall simply fails with EINVAL. Here is what strace shows, which is not really helpful:

bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=2, value_size=20,
                     max_entries=60000, map_flags=0, inner_map_fd=0,
                     map_name="sessions_table", map_ifindex=0, btf_fd=0,
                     btf_key_type_id=0, btf_value_type_id=0,
                     btf_vmlinux_value_type_id=0},
    72)
  = -1 EINVAL (Invalid argument)
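For reference, the rule that rejects this call can be mirrored in user space. This is a minimal sketch, assuming the kernel's array map check boils down to the three conditions below (the real check also validates flags and size limits, which are left out here). check_array_map_attrs is a hypothetical helper, not a libbpf function.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified user-space mirror of the attribute check applied to
 * BPF_MAP_TYPE_ARRAY maps: the key must be exactly 4 bytes, and
 * value_size and max_entries must be non-zero.
 * Returns 0 if the attributes look valid, -EINVAL otherwise. */
int check_array_map_attrs(uint32_t key_size, uint32_t value_size,
                          uint32_t max_entries)
{
    if (max_entries == 0 || key_size != sizeof(uint32_t) || value_size == 0)
        return -EINVAL;
    return 0;
}
```

With the values from the strace output (key_size=2, value_size=20, max_entries=60000) the helper returns -EINVAL; with key_size=4 it returns 0.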

After fixing this bug, libbpf happily creates the map in the kernel:

libbpf: map 'sessions_table': created successfully, fd=8

Conclusion

So far, after a bit of effort, I could get basic BPF functionality to work in a Docker container for debugging purposes! Of course, for further debugging, you would need tools such as bpftool to dump the XDP programs, observe the behaviour of the program by sending packets to the interface, and so on. But this part of the work should be quite similar whether using Docker or not. If this turns out to be more difficult than expected, I will update the article!