Contributed by tj on from the go-with-the-openflow dept.
When I heard that Martin Pieuchot (mpi@) was looking for a place to hold another mini-hackathon for three to four people to work on multiprocessor (MP) enhancements of the network stack, I offered our workplace in Hannover, Northern Germany. We have space, gear, and fast Internet, and it is easy for the people involved to reach. Little did I know that it would quickly turn into n2k15, a network hackathon with 20 attendees from all over the world.
I have organized a hackathon before, when we had k2k6 in Kransberg, the castle of a German Internet pioneer. But this was a long time ago, and I didn't have to run it in my own place. I'm talking about the "K14" – the spare office space of Esdenera Networks that we rent out to coworkers to bring a little activity and creativity into our daily work lives. After my previous job I told myself that I never wanted to work in a boring, business-like office again – the type with long corridors, flickering neon lights, bad coffee, or even shabby cubicles. I'm glad that I've found a pretty decent place, but I never expected to host an actual OpenBSD hackathon in there.
If you have ever hosted such an event or a party for many guests, you will know the host's dilemma: you're constantly concerned that your guests are enjoying themselves, you have to take care of many trivial things, other things will break, and you get little to no time to attend or even enjoy it yourself. Fortunately, I had very experienced and welcome guests: only one vintage table and a vase broke – the table can be fixed – and I even found some time for hacking myself. I have to mention that this wouldn't have been possible without all the help from Mike (mikeb@), Bret (blambert@), Jan Schreiber, Malte Schalk, and all the others who volunteered to prepare, set up, and tear down the event. While I'm giving credit: Nick Böse made the artwork, and Timm Markgraf took the pictures, which you can find at http://k14.space/n2k15.
But let's get back to the social part. For the mid-hackathon social event, we visited the Christmas Market in Hannover's old town. I took the OpenBSD crowd, who came from all over Europe, the USA, Canada, Japan, and Australia, to the traditional street market that takes place every December, to feed them some Glühwein (hot mulled wine with an extra shot) and food. In the K14 itself I was busy giving quick introductions to our coffee machines, especially on using the portafilter. Coworking, espresso, vintage stuff – you got me.
For the technical part, I worked on two things: vmd(8) and the switch. Some time ago, over a beer, when Mike Larkin (mlarkin@) first mentioned his plans to implement a hypervisor for OpenBSD, I got all excited and offered to help on the userland and networking side. After he committed his initial implementation a few weeks ago, I literally jumped on vmd(8), the virtual machine daemon that runs the userland part of vmm(4)-controlled VMs. I sometimes have the privilege of working on new things in OpenBSD, and with Mike's and Theo's "blanket endorsement", I could just move forward and implement our plans for vmd(8) in many subsequent commits.
The daemon vmd(8) is accompanied by a tool, vmctl(8), to control and monitor the daemon at runtime – it was previously called host… err… vmmctl. The daemon manages the virtual machines by running the VM processes in userland, controlling the virtual machine monitor vmm(4) in the kernel, handling device I/O and VM exits from vmm(4), and configuring and setting up the VMs. For all the details about vmm(4) and the VM-specific parts in userland, you'd better ask Mike Larkin: he is the architect and deserves all the credit, while I only handle the infrastructure and configuration part.
Mike's initial vmd(8) already came with some built-in privilege dropping, with a privileged master process and an unprivileged, threaded child process per VM. I split the master process into three pieces: the privileged "parent" (or vmd) process that opens disks and devices; the unprivileged "vmm" process that talks to the kernel side of vmm(4) and creates and monitors new VM processes; and the unprivileged control process that accepts connections from vmctl(8) on the control socket. All processes use pledge(2) to restrict the allowed system operations, and the unprivileged processes run as user "_vmd", chrooted to /var/empty. The pledge(2) part is not quite true: the "vmm" and VM processes don't use pledge(2) yet, as they need the vmm-specific ioctls that aren't allowed by any of the supported "promises" – but I have a diff that would allow them to pledge "stdio vmm". The daemon has to open disk images, the kernel, and tap(4) network interfaces. Instead of doing this directly in the "vmm" process, I moved it to the "parent" process, which opens the files and passes the file descriptors up.
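A privileged parent opens files and passes the descriptors up to an unprivileged process. This is the classic privilege-separation pattern. OpenBSD daemons usually do this through imsg(3), which carries the descriptor as SCM_RIGHTS ancillary data. The minimal sketch below uses sendmsg(2) and recvmsg(2) directly so that it stands alone. It is not vmd(8)'s code, and the helper names send_fd, recv_fd and parent_pass_file are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a UNIX-domain socket using SCM_RIGHTS. */
static int
send_fd(int sock, int fd)
{
	struct msghdr	 msg;
	struct iovec	 iov;
	struct cmsghdr	*cmsg;
	union {
		struct cmsghdr	hdr;
		char		buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char		 byte = 0;	/* at least one byte of payload is required */

	assert(fd >= 0);
	memset(&msg, 0, sizeof(msg));
	memset(&cmsgbuf, 0, sizeof(cmsgbuf));
	iov.iov_base = &byte;
	iov.iov_len = sizeof(byte);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(byte) ? 0 : -1;
}

/* Receive one file descriptor; returns -1 if none arrived. */
static int
recv_fd(int sock)
{
	struct msghdr	 msg;
	struct iovec	 iov;
	struct cmsghdr	*cmsg;
	union {
		struct cmsghdr	hdr;
		char		buf[CMSG_SPACE(sizeof(int))];
	} cmsgbuf;
	char		 byte;
	int		 fd = -1;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = sizeof(byte);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(byte))
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS &&
	    cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Privileged "parent" side: open a file (e.g. a disk image) and pass it on. */
static int
parent_pass_file(int sock, const char *path)
{
	int	 fd, rv;

	if ((fd = open(path, O_RDWR)) == -1)
		return -1;
	rv = send_fd(sock, fd);
	close(fd);	/* the receiver now holds its own reference */
	return rv;
}
```

In such a setup, the unprivileged side only ever sees ready-to-use descriptors. It can therefore stay chrooted to an empty directory and never needs permission to open the files itself.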
I added many new features to vmd(8) and vmctl(8), like vm.conf(5), a configuration file format that describes virtual machines in the human-readable style that has become typical for OpenBSD. I initially implemented the configuration format in vmctl(8), but I took some time at n2k15 to move it to vmd(8) directly. The daemon now loads the optional configuration file on startup. The vmctl(8) tool will still allow starting new virtual machines from the command line without loading a configuration file, but all the advanced options will go into vm.conf(5). In addition to some tweaking and cleaning, I added groundwork that will be needed for items on the TODO list. These include starting pre-configured VMs from the command line, running instances of configured VMs, tracking permissions and allowing users to run their own VMs, changing the interface configuration, and assigning interfaces to switches.
The vmctl(8) tool got some tweaks, and I changed the command line parser twice. Mike's initial tool was very basic and used a few getopt(3) options to start a VM. This was fine, but I saw the risk that it could turn into something like qemu --without-long-opts, or any comparable tool that requires you to remember numerous letters and even getsubopt(3) CSV-like lists. I first changed it to a CLI-style format comparable to that of bgpctl(8), relayctl(8), and the control tools of our other networking daemons. However, it wasn't well received by the getopt/POSIX purists in our group. So I changed it again, to a style that takes a keyword and an argument followed by getopt(3) options, similar to Xen's xl but without long options.
Evolution of vmctl(8):

    # vmmctl -S -m 512 -n 1 -b /some/path/disk.img -k /some/path/bsd
    # vmmctl start "myvm" memory 512M interfaces 1 disk disk.img kernel /bsd
    # vmctl start "myvm" -m 512M -i 1 -d disk.img -k /bsd
Any complicated parts will be restricted to vm.conf(5), and the keyword namespace allows us to get around getsubopt(3). While I still think that the second version is more intuitive to use, I have to admit that the last version looks cleaner.
So what is this "switch" about? A few days before the hackathon, I talked with Masahiko Yasuoka (yasuoka@) and Kazuya Goda (goda@) about our bridge(4). With the MP network stack overhaul, it became obvious that our bridge needs some updates and cleanup. It is proven and reliable code that was written by Jason Wright (formerly jason@ the Wookie) almost 17 years ago. Many iterations and numerous improvements later, the bridge is at its core still based on the same code. Old is not bad, but it wasn't built for an MP networking stack. It was also written before anyone talked about "virtual switches", flow tables, or split data and control planes for such things. People were looking into supporting "Open vSwitch" (OVS), and Goda actually ported it. However, the cost of adding the complex kernel layer of OVS to OpenBSD was just too high, and the licensing was questionable (the Apache 2 license is not acceptable for us). So we went back to the idea of further modernizing bridge(4). Goda's OVS work helped us understand what we really need, and I came up with a simple idea: we don't need OVS or another virtual switch, we just need a controller to offload the "control plane". OpenBSD is already doing bridging, VXLANs, VLANs, STP, routing domains and many other things in the kernel, so why should we move them into yet another complex daemon? All we need is a controller daemon and a well-defined, pluggable interface to hand the forwarding decisions from bridge(4) to the daemon and the Cloud: OpenFlow.
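The plan is to use OpenFlow messages as the interface between bridge(4) and a controller. For orientation, here is a minimal sketch of the 8-byte header that starts every OpenFlow message, using the layout and constants of the OpenFlow 1.3 specification. It is not code from switchd(8) or bridgeofp, which had not been released, and the function names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OFP_VERSION_1_3		0x04	/* OpenFlow 1.3 on the wire */
#define OFP_HEADER_LEN		8
#define OFPT_HELLO		0	/* first message sent by both sides */
#define OFPT_ECHO_REQUEST	2	/* keepalive */

/* Common header in front of every OpenFlow message (big-endian on the wire). */
struct ofp_header {
	uint8_t		version;
	uint8_t		type;
	uint16_t	length;		/* whole message, header included */
	uint32_t	xid;		/* transaction id, echoed in replies */
};

/* Serialize a header into buf, which must hold OFP_HEADER_LEN bytes. */
static size_t
ofp_encode_header(uint8_t *buf, uint8_t type, uint16_t length, uint32_t xid)
{
	uint16_t	len_n = htons(length);
	uint32_t	xid_n = htonl(xid);

	assert(length >= OFP_HEADER_LEN);
	buf[0] = OFP_VERSION_1_3;
	buf[1] = type;
	memcpy(buf + 2, &len_n, sizeof(len_n));
	memcpy(buf + 4, &xid_n, sizeof(xid_n));
	return OFP_HEADER_LEN;
}

/*
 * Parse a header; -1 if the buffer is too short or the length field is
 * impossible. The caller still has to wait for oh->length bytes in total.
 */
static int
ofp_decode_header(const uint8_t *buf, size_t buflen, struct ofp_header *oh)
{
	uint16_t	len_n;
	uint32_t	xid_n;

	if (buflen < OFP_HEADER_LEN)
		return -1;
	oh->version = buf[0];
	oh->type = buf[1];
	memcpy(&len_n, buf + 2, sizeof(len_n));
	memcpy(&xid_n, buf + 4, sizeof(xid_n));
	oh->length = ntohs(len_n);
	oh->xid = ntohl(xid_n);
	if (oh->length < OFP_HEADER_LEN)
		return -1;
	return 0;
}
```

A bare OFPT_HELLO is just this header with a length of 8. Both ends send one first to agree on the protocol version before any flow or packet messages are exchanged.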
Fortunately, I had started such a simple, privilege-separated OpenFlow controller some time ago. I never released it because it wasn't complete, it wasn't comparable to any of the big controllers, and I didn't have an actual use case for it in OpenBSD. It only provided a simple learning switch that works with Open vSwitch or OpenFlow-enabled HP (HPE) switches. I also didn't find a satisfying name for it. "OpenFlow™" is an open protocol but also a very strict trademark, and calling it openflowd would violate their trademark policy (at least in the Land of the Free). I don't use funny or pet names for software, and OpenWolf or sdnflowd simply didn't work. After talking with Yasuoka and Goda, I renamed it "switchd(8)". Following the idea of using the OpenFlow protocol itself as our new kernel interface, Yasuoka and Goda worked on "bridgeofp" and managed to get it working as a simple layer 2 switch. This is a very brief summary, as the code hasn't been released yet, but watch out for what is coming over the next few months. We'll need it for many things, including distributed virtual switching for vmd(8).
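The "simple learning switch" mentioned above is the classic layer 2 algorithm. The switch remembers which port each source MAC address was seen on. It forwards frames for known destinations to that port and floods the rest. In an OpenFlow setup, a controller would typically run this logic when a switch reports an unknown packet, and then install a flow so later frames stay in the data plane. The sketch below only shows the decision logic. It is not switchd(8) code, and the table size, port constants, and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_TABLE_SIZE	256	/* hypothetical; a tiny fixed-size table */
#define PORT_FLOOD	(-1)	/* send out of all ports except the ingress */
#define PORT_DROP	(-2)	/* destination sits on the ingress port */

struct mac_entry {
	uint8_t	mac[6];
	int	port;
	int	used;
};

struct mac_table {
	struct mac_entry e[MAC_TABLE_SIZE];
};

static unsigned int
mac_hash(const uint8_t mac[6])
{
	unsigned int	h = 0;
	int		i;

	for (i = 0; i < 6; i++)
		h = h * 31 + mac[i];
	return h % MAC_TABLE_SIZE;
}

/* Open addressing with linear probing; entries are never deleted here. */
static struct mac_entry *
mac_slot(struct mac_table *t, const uint8_t mac[6])
{
	unsigned int	i = mac_hash(mac), n;

	for (n = 0; n < MAC_TABLE_SIZE; n++, i = (i + 1) % MAC_TABLE_SIZE)
		if (!t->e[i].used || memcmp(t->e[i].mac, mac, 6) == 0)
			return &t->e[i];
	return NULL;	/* table full and address not present */
}

/*
 * Handle one frame that arrived on inport: learn the source address,
 * then return the output port, PORT_FLOOD, or PORT_DROP.
 */
static int
learning_switch(struct mac_table *t, const uint8_t dst[6],
    const uint8_t src[6], int inport)
{
	struct mac_entry	*me;

	assert(inport >= 0);
	/* Learn (or move) unicast sources only; bit 0x01 marks group addresses. */
	if ((src[0] & 0x01) == 0 && (me = mac_slot(t, src)) != NULL) {
		memcpy(me->mac, src, 6);
		me->port = inport;
		me->used = 1;
	}
	if (dst[0] & 0x01)		/* broadcast or multicast */
		return PORT_FLOOD;
	me = mac_slot(t, dst);
	if (me == NULL || !me->used)	/* unknown destination */
		return PORT_FLOOD;
	return me->port == inport ? PORT_DROP : me->port;
}
```

A real switch would also age out entries so that hosts moving between ports or leaving the network do not fill the table. That is omitted here for brevity.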
I did enjoy the hackathon! Thanks to everyone who attended, especially to Jonathan Matthew (jmatthew@), who crammed himself into airplanes all the way from Brisbane to visit Germany's most underrated city. And of course, thanks to all our users who support OpenBSD: your donations allowed the OpenBSD Foundation to cover hotel costs for a number of developers.
Thanks very much for the detailed report, Reyk!