Distributed Package Builds (dpb3) Marc Espie

Contributed by jcr on 2010-06-18 from the go-very-fast dept.

If you are a smart OpenBSD user and follow the suggestion from the FAQ to use packages, then you've probably set your PKG_PATH, and already witnessed the joy of pkg_add and friends. While watching the magic of the OpenBSD packges system, you may have wondered, "Where do all these packages come from?"

The literal answer is Theo's basement where the ports build machines for the project live.

The super secret but far more informative answer is little known tool called dpb (Distributed Ports Build), combined with a number of great people running groups of machines to find and fix as many ports build problems as possible. This is how the ports build machines in Theo's basement are kept running, and running well. As an regular end user or even system admin managing tons of OpenBSD machines, you may have never had the need to do complete builds of the ports tree, or even heard of dbp, let alone know about the recent work being done to improve it (dpb3). Marc Espie (espie@) was kind enough to tell us about his work on improving dpb and how it helps to improve the entire ports tree.

Unfortunately, due to limitations of the cgi to display man pages on the web, I can't give you a link to the dpb manual. More importantly, since dpb lives in the ports tree, it's not part of the base installation, so simply typing man dpb will not help. If you have ports tree source on your system, you can do the following to view the dpb man page:

$ mandoc -Tascii /usr/ports/infrastructure/build/dpb3.1 | less

Or you can move the dpb3.1 man page file or adjust your manpath.

Marc Espie (espie@)

Before dpb3

A long time ago, Nikolay Sturm wrote the original dpb (Editor's Note: the original dpb is still in the ports tree). It was good enough, and every few months, I would say "oh, I really should take a look, it can probably be improved". But the sheer complexity of the startup time was hampering me. I would start to do small improvements, nicknaming the result "dpb2", and then I would forget about it, or give up because it was so unwieldy.

You have to realize that Nikolay did the best he could with the tools he had at the time. When I started revamping the ports tree, we didn't know a lot about dependencies, how to handle them simply, or how to coordinate a full ports tree. It was a few years before binary package updates, and the most rad tools where print-*-depends together with tsort.

Over the years, things got simpler. Specifically, we moved to uniform SUBDIR format all over the place, and then the huge simplification that's current MULTI_PACKAGES (this happened at a ports hackathon, btw, after talking to people about how MULTI_PACKAGES were really hard to get right, hence the *DEPENDS-subpackage idiom).

The big breakthrough was probably sqlports. I had written make show=VAR a few years back, because it was impossible to debug big ports Makefiles without it, but it took me a few years to realize I could turn that into a full introspection mechanism, and add "make dump-vars" to the ports tree.

New Framework

Suddenly, I got a motivation to rewrite dpb. Using dump-vars would allow me to get everything I needed in one pass (well, actually two or three passes, since some pkgpaths don't "exist" anywhere except in dependencies, so you need to compute a closure from the starting set), and I could possibly make things work better. Instead of handling pkgnames, dpb3 would handle pkgpaths (I still had dpb2 somewhere, so dpb3 it would be!

The next idea was to have a "generic" architecture for jobs, so that computing dependencies would be a job like other jobs. Fortunately, perl OO can do that, and since it also has functional language qualities, setting up a list of "jobs" that would execute arbitrary code as separate processes would be fairly easy (if you know the idiom... Nikolay's first comment to my new code was "I don't understand anything in your perl" :).

The goal was to avoid busy-waiting, and so I used continuation-style code: on a wait(), dpb would find the job entry corresponding to the dead process, and just call the finalization routine for that specific job. It's used in a very uniform way, and there's even a "watchdog": a simple clock job that dies every ten seconds, to ensure refresh each time nothing else happens.

There were a few details to complete. The initial "Job" thingy soon gave birth to "Core" so that I could tell on which machine a job would be running, and I could have some kind of affinity for several jobs/tasks. A "Job" would become a list of "Tasks" that would execute on the same Core where instead of "make package", a given build would be separated into "patch/configure/build/..." steps, to give better feedback.

I soon had something that was better than old dpb, at least in some aspects. Specifically, it would start building stuff right away, since the dependencies jobs was just another job. That was a major design goal. It would require maintaining dependencies "on the fly", and move pkgpaths from a "to build" list to a "queue" as soon as all required dependencies could be satisfied.

There's actually some rather subtle mix of ideas in there: we build pkgpaths, but several pkgpaths can produce the same package, in which case they ALL have the same pkgname, and so, to figure out whether a pkgpath is "done", we check whether the binary package already exists so we don't pointlessly rebuild things.

Locks And Hot Fixes

The next useful feature was to be locks. The ports tree has its own locking mechanism, but that turns out to be insufficient for dpb3. See, if the ports tree ends up rebuilding the same package, it would just sleep until the directory would be available. Apply that to dpb3, and you see several cores waiting for a directory to be available to build distinct flavors... instead of going elsewhere and doing useful work. So dpb3 got its own locks, allowing it to know when not to go somewhere, and to do something else instead.

This got me two interesting benefits:

the locks from dpb3 are in a central location. So you can start TWO dpb3 at the same time, and they won't step on each other's toes... they will correctly notice locks from each other, and restart the build when these vanish.
locks will stay behind in case of an error, but dpb3 actually doesn't care where the lock comes from: if it vanishes, it will put the pkgpath back in the queue of things that can be built.

This means that a human operator can do "hot fixes" in a live ports tree build: notice something didn't build, fix the problem by hand, remove the lock and have dpb3 pick things up... without needing to stop the build (assuming the fix didn't bump pkgnames or anything like that... this will be addressed at a future date).

Logfiles

One nice feature of dpb3 is its logging functionality. Since it does have all the pkgpaths and pkgnames at its disposal, it creates one actual logfile for a package build, and links all the alternate names to the same file, so it's really easy to find out build info for anything.

Based on that framework, there was lots of work to do. One major gripe with old dpb was the lack of feedback. Basically, you started old dpb, came back a few hours later... and if it didn't say anything, it would mean things were more or less fine. No idea how far it got, or anything. You would come back days later and notice a job got stuck for a few days.

dpb3 watches its logfiles, at least every ten seconds or so, and displays messages when things don't change. It also displays pids for everything that runs. So, if you see that a job is stuck into build, logfile unchanged for 3 hours, it's often time for manual intervention.

Timing Is Everything

dpb3 computes a full dependency tree, starts building ports as soon as it can, and keeps going until the end. However, it does not do so randomly (well... you can force it to be very random to weed out dependency issues, but that's not very efficient). If you start it from scratch, it doesn't have much to work from, it will just be as greedy as possible (maximize the number of ports that can be built so that it never gets starved, e.g., it will build the ports that unlock the largest number of ports that depend on them). All other things being equal, it will always build the "latest" scanned port first, so that, if you interrupt dpb3, and then start a new one, it will start to build ports very quickly.

With these heuristics, you see dpb3 figure out by itself that it really wants libtool, gperf, gettext gmake and all those autoconfs.

But the major breakthrough was build-directed heuristics. Simply put, if you don't do anything smart, you will always end up with all your cores twiddling their thumbs while the last machine is struggling through openoffice. Yes, it's THAT big. Measurements show that openoffice itself is THREE TIMES as large as the next port and don't forget it depends on such lightweights as java which take quite a bit of time themselves.

So I started an experiment: what if I were to use the timings from a previous build to inform the current build ?

It worked.

In fact, it worked even better than I expected. I could litterally see dpb3 rush through to build the full dependency tree of openoffice3 and similar weighties.

One surprising side-effect was that dpb3 would appear to slow-down: it would indeed build all the big, big ports first. So it would crawl, if you just looked at the number of packages, and all of a sudden, it would rush. The threshold is around 2000 packages. I've appended a picture of a full build log, where 2000 packages take 37 hours, and the remaining 4000 are done in under 3 hours!

One last timing issue was using heterogeneous networks. My current i386 farm is composed of:

1 dual-core @2.5GHz
1 dual-core @1.8GHz
1 single core @1.2GHz
1 single core @800MHz

obviously, if you start dpb3 on these without any precaution, you might very well end up waiting for days until the single core finishes openoffice.

Well, the current setup has specific heuristics to handle that. The complete algorithm is simply to sort the current build queue according to build times, and to partition it according to processor speed, so that slower machines don't get to build the big thingies. This would work handsomely... except that this algorithm is too expensive! I ended up having dpb3 consume 50% of the cpu on the fast machine, which wouldn't do.

Instead of exact partitioning, I put jobs into "bins", where each bin contains all jobs that take at most 4 times as much cpu as the next bin. The fastest cpu has acces to the whole queue (and chooses the best pkgpath as if it were alone). Every other cpu only gets access to a single bin, which is determined according to the current total weight of what's left to build.

This is still somewhat dynamic: if a cpu manages to empty its bin (more or less), the overall weight will change, and it will move to the next bin up (or down), but it consumes only 1% cpu (or so). There was quite a lot of fine tuning, but I end up having each machine building what's best for it, and I never get absurd builds, such as the slowest machine getting a really, really big thingy.

Final Details

There are still a few things to deal with. This is a bit of a work in progres. One major recent improvement was a check for common libraries: dpb3 will now compare hosts at startup, and refuse to incorporate machines in the network if they don't have the same shared libraries. There's still some work to do to have this be more dynamic such as adding a new core after start, or telling dpb3 that a library problem has been solved, but it works reasonably fine.

All in all, it's looking good for 4.8. I do hope we'll be able to completely retire old dpb.

In general new dpb3 yields much better feedback to developers: it requires minimal configuration, and starts up almost instantly, and it's very easy to set up. Also, it complains right away about most problems. When you run bulk builds on legacy platforms like the sparc, it makes a huge difference. You don't lose days on a bulk build that can take weeks.

dpb makes build farms more useful, so it might become realistic to put 3 or 4 armish in a network to reduce bulk build times of packages on a slow architecture. More buld builds means more tests, so better quality all around.

Sexy Graphics

Packages built actually show as three curves: Built, Installable, and Packages.

"Built" are packages that have been built, but miss some runtime dependencies (this value is kept close to zero).

"Installable" are packages that might be needed later in the build (i.e. building other ports may depend on them being installed).

"Packages" are packages that no longer have any dependencies left, and so, could even be deinstalled before the end of the build.

The graph clearly show the "Queue" of buildable packages going up very quickly at the start, so that dpb has a huge number of choices. It also shows clearly how dpb3 seems to be very slow at first, as it builds huge packages, then speeds up during the final hours, with nice exponential curves.
(click for larger image)

Thank you Marc for taking time away from all your great hacking to share your work on dbp.

(Comments are closed)

Comments

By Anonymous Coward (mlude) on 2010-06-18 10:25

Marc Espie is espie@, not marc@
Comments
1. By J.C. Roberts (jcr) on 2010-06-18 18:53 http://www.designtools.org
  
  > Marc Espie is espie@, not marc@
  
  Thanks. It seems one of the other editors fixed it for me, and truth be told, there were a handful of my mistakes that were fixed very quickly after it was posted. ;)
By jirib (jirib) jirib@mailinator.com on 2010-06-18 11:17

Very nice reading. Using managers' talk "dp3 is OpenBSD ports building cluster tool" :)
By Anonymous Coward (anon) on 2010-06-18 11:57

rather than copying the manpage, symlinking would be a better idea otherwise you'll end up with an outdated version.

ln -s /usr/ports/infrastructure/build/dpb3.1 /usr/share/man/man1/dpb3.1

Latest Articles

Sat, Jun 28
- 05:57 Game of Trees 0.115 released (0)
Tue, Jun 24
- 07:48 Game of Trees 0.114 released (0)
- 07:23 Call for testing: bge/bnx/iavf/igc/ix/ixl/ngbe/pcn: ifq_restart() fix (0)
Mon, Jun 16
- 08:22 j2k25 hackathon report from kn@: installer, low battery, and more (0)
Fri, Jun 13
- 11:18 dhcpd(8): use UDP sockets instead of BPF (1)
Thu, Jun 12
- 12:32 clang(1)/llvm/lld(1) updated to version 19 (0)
Wed, Jun 11
- 12:22 Source code sandboxing (0)
Tue, Jun 10
- 06:50 TearFree option backported to modesetting(4) driver (0)
Mon, Jun 09
- 07:32 FFS optimizations with dirhash, as blogged by rsadowski@ (1)

Credits

Copyright © 2004-2008 Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to April 2nd 2004 as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]