OpenBSD Journal

It's Official: OpenBSD Helps Me Do Better Science

Contributed by jj from the sudo-make-me-coffee-in-parallel dept.

Kristaps Dzonsons wrote in with an article about how OpenBSD helps him produce better research. Kristaps writes,

It's no secret that OpenBSD is an excellent research platform. From packages(7) for specialised software to out-of-the-box httpd(8), sshd(8), and so on, it's a no-brainer to pop OpenBSD onto a workstation and just get to work.

In this article, I explore how OpenBSD's clean code and sane defaults recently saved the day. For great science!
By way of background, I often [ab]use make(1) for generating visualisations from hundreds of datasets. It usually goes something like this:
  1. generate a set of parameters, like moments of a distribution;
  2. generate data files from these parameters;
  3. generate plotter input by applying parameters to a template;
  4. plot using the plotter input and data files.
make(1) ties it all together. Sometimes I manually maintain a Makefile; sometimes it's a tangle of .for loops and substitutions. For larger projects I generate the Makefile from a configure script. But in the end, it usually looks like this:
  parm1=0.01
  p1.png: p1.plot p1.dat
      gnuplot p1.plot
  p1.plot: template.plot
      sed "s!@OUT@!p1.png!;s!@IN@!p1.dat!;s!@PARM@!$(parm1)!" $< >$@
  p1.dat: 
      longrunning >$@

This trivial example uses gnuplot(1) to generate an image p1.png from a data file, p1.dat, and plot input p1.plot. Assume that the longrunning utility consists of heavy number-crunching. Now imagine hundreds of data files and images... you get the point.
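For the "hundreds of data files" case, the configure-script approach mentioned above might look roughly like the following. This is only a sketch under assumptions: the function name gen_makefile, the p<N> naming scheme, and the parameter values are illustrative and not taken from the original project; longrunning and template.plot are the placeholders from the example.

```shell
#!/bin/sh
# Sketch of a configure-style generator: for each parameter value, emit the
# parameter plus the three rules (image, plot input, data file) shown above.
# Assumption: gen_makefile and the p<N> naming are illustrative only.

gen_makefile() {
    # Arguments: one parameter value per dataset.
    i=0
    all=""
    for parm in "$@"; do
        i=$((i + 1))
        all="$all p$i.png"
    done
    # Default target: every image.
    printf 'all:%s\n' "$all"

    i=0
    for parm in "$@"; do
        i=$((i + 1))
        printf 'parm%d=%s\n' "$i" "$parm"
        printf 'p%d.png: p%d.plot p%d.dat\n' "$i" "$i" "$i"
        printf '\tgnuplot p%d.plot\n' "$i"
        printf 'p%d.plot: template.plot\n' "$i"
        # template.plot is named explicitly so the rule does not rely on
        # $< being set outside of inference rules.
        printf '\tsed "s!@OUT@!p%d.png!;s!@IN@!p%d.dat!;s!@PARM@!$(parm%d)!" template.plot >$@\n' "$i" "$i" "$i"
        printf 'p%d.dat:\n' "$i"
        printf '\tlongrunning >$@\n'
    done
}
```

Something like `gen_makefile 0.01 0.05 0.1 >Makefile` would then produce rules for p1 through p3, and make(1) takes it from there.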

This is fabulously easy on OpenBSD. On a new box I pkg_add(1) the required tools, pull my templates and Makefile from an off-site cvs(1) repository, then invoke make(1). Time from CD boot? A handful of minutes.

Meanwhile, a recent project was pushing my patience: the parameters needed lots of tweaking, with partial builds taking over ten minutes. I can only drink so many coffees per day (note: conjecture).

Each plain-text data set took from a few seconds up to minutes to generate. It then occurred to me that I could use a nearby backup machine to build dataset targets with ssh(1), since the output format of longrunning was machine-independent.
  p1.dat:
      ssh node1 longrunning ">$@"
      scp node1:$@ .

(Note: I shell-escape ">$@" for generality. In this snippet, sending directly to $@ is valid.)

This worked, although builds completed unevenly due to longrunning's random execution time. This didn't bug me: speed-up with rough edges is still speed-up, right? But when I tripled my simulations and was stuck guessing remote host load and baby-sitting builds, the hack had outlived its usefulness.
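To make the limitation concrete, here is a rough sketch of how such hard-coded rules could be generated for several datasets, handing out build hosts round-robin. The function name gen_remote_rules and the second host name are assumptions for illustration. Because each target is pinned to a host before anything runs, the assignment cannot react to longrunning's unpredictable run time, which is exactly the unevenness described above.

```shell
#!/bin/sh
# Sketch: pin each dataset target to a build host, round-robin.
# Assumption: gen_remote_rules, the dataset count, and the host list are
# illustrative; only node1 and longrunning come from the example above.

gen_remote_rules() {
    # $1: number of datasets; remaining arguments: build hosts.
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        host=$1
        shift
        set -- "$@" "$host"    # rotate: the host just used goes to the back
        printf 'p%d.dat:\n' "$i"
        printf '\tssh %s longrunning ">$@"\n' "$host"
        printf '\tscp %s:$@ .\n' "$host"
        i=$((i + 1))
    done
}
```

For example, `gen_remote_rules 3 node1 node2` assigns p1.dat and p3.dat to node1 and p2.dat to node2, regardless of how long each one actually takes.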

OpenBSD to the rescue!

Unlike the systems of choice for some researchers, where solutions to such problems involve despair and liquor, OpenBSD has a third option: /usr/src. Why not hack make(1) to distribute target builds on-the-fly? No need to hardcode remote hosts and endure load inequality. To the source!

make(1) had always seemed one of those utilities you just run and try not to think about. But it took only a few minutes to walk through usr.bin/make, starting with grep exec, and discover exactly where to play with build targets. Tricky bits, functions, even files themselves were well-documented in the source.

In no time at all, I added a new special source to whitelist distributable targets (.DIST) and some goop to send and receive dependencies and targets. The dispatcher was ready to go!

The bad news: I lost my coffee time.

The good news: distributed make(1) with a single patch to /usr/src. No need for NFS. No need for RPC---way too complicated for me. Just the clean make(1) code, a single patch on the local host, and password-less keys for sshd(8) on my build hosts. Throw in ControlMaster connections and my build times dropped roughly in proportion to the number of remote hosts.
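The ControlMaster part is plain ssh_config(5) and independent of the make(1) patch. A minimal sketch of generating one stanza per build host follows; the function name gen_ssh_config, the host names, the ControlPath location, and the ControlPersist timeout are all assumptions, not the author's actual settings.

```shell
#!/bin/sh
# Sketch: emit an ssh_config(5) stanza per build host so that repeated
# ssh/scp invocations reuse a single master connection.
# Assumption: gen_ssh_config, the socket path, and the timeout are illustrative.

gen_ssh_config() {
    # Arguments: build host names.
    for host in "$@"; do
        printf 'Host %s\n' "$host"
        # First connection becomes the master; later ones multiplex over it.
        printf '\tControlMaster auto\n'
        # %r, %h and %p expand to remote user, host and port.
        printf '\tControlPath ~/.ssh/ctl-%%r@%%h:%%p\n'
        # Keep the master open for a while after the last session ends.
        printf '\tControlPersist 10m\n'
        # Never prompt: builds rely on password-less keys.
        printf '\tBatchMode yes\n'
    done
}
```

Appending the output of `gen_ssh_config node1 node2` to ~/.ssh/config would cover two build hosts; adding a machine means adding one more argument.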

In the end, my research capacity jumped from a small set of simulations to a set proportionate to my build cluster. Adding a new machine? Slap OpenBSD on the disk and add ssh(1) keys. Upgrading the dispatch machine? Re-apply the patch and get back to work.

The moral of the story is that clean code and a low barrier to entry (cd /usr/src/usr.bin/make)---not to mention the excellent default system installation---are invaluable tools. Scientific computing puts a lot of stress on computation, but there's more to it than algorithms and tuning: it's a process. And part of that process is starting dhcpd(8) on a secondary Ethernet device, attaching some boxes, and running them into the ground.

Is this feature useful for the general make(1)? No. Is it useful for me? Absolutely. Can it be improved? Sure. All for another day, and another adventure in /usr/src!

References:

Unofficial Distributed Build Extension of OpenBSD's bsdmake:
http://kristaps.bsd.lv/bsddistmake/


Comments
  1. By Anonymous Coward (linkslice) sparctacus@gmail.com

    You should also look at making your own siteXX.tgz files so that you don't have to do all that fancy patching stuff after install.
