OpenBSD Journal

Developer blog: cnst@: fixing make

Contributed by deanna from the that's-a-lot-of-j dept.

Constantine A. Murenin (cnst@) writes:

A 10-year-old pointer-arithmetic bug in make(1) is now gone, thanks to malloc.conf and some debugging. The bug kept people from running make -j on OpenBSD for fear of it looping on one of the CPUs forever without actually building anything.

Personally, I encountered this bug a few times after todd@ told me about the -j option at c2k7. I immediately noticed a performance improvement when building the kernel with `make -j4`, which was obvious from the fact that both CPUs were now at close to 0% idle time for the duration of the build. However, I also noticed that shortly after being started with a big -j number, like 16 or 24, the make process would go mysteriously quiet and start consuming 100% of one of the CPUs (without actually running any jobs) every time I ran it.

Independently of the above make(1) bug (or maybe under its influence), one day I decided to give the malloc options a try, running `ln -s 'AFGHJPRX' /etc/malloc.conf` to set them up. After setting these malloc options, I noticed that make was crashing quite often when used with the -j option.

To make a long story short, I discovered that the problem had to do with pointer arithmetic. A byte count was added as an offset to a pointer of type fd_set *, and fd_set has a size of 128 bytes. Because pointer arithmetic scales by the size of the pointed-to type, the byte offset was erroneously multiplied by a factor of 128, so the memset landed on an unallocated piece of memory and left the allocated part uninitialised.
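
To illustrate the class of bug described above, here is a minimal sketch in C. It is not the actual make(1) code: the function names `typed_offset_bytes` and `zero_new_tail` are hypothetical, and the buffer layout is an assumption made for the example. The first function shows how far `set + n` really moves when `set` is an `fd_set *`; the second shows the usual way to make an offset count bytes, by casting to `char *` before adding it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>

/*
 * Distance in bytes covered by "set + n" when set is an fd_set *.
 * Pointer arithmetic scales n by sizeof(fd_set), so this returns
 * n * sizeof(fd_set), not n. The array holds n + 1 elements so the
 * pointer stays in bounds and the demonstration is well defined.
 * Returns 0 if the allocation fails.
 */
size_t
typed_offset_bytes(size_t n)
{
	fd_set *set = calloc(n + 1, sizeof(fd_set));
	size_t dist;

	if (set == NULL)
		return 0;
	dist = (size_t)((char *)(set + n) - (char *)set);
	free(set);
	return dist;
}

/*
 * Zero only the newly grown tail [old_bytes, new_bytes) of a buffer
 * that is addressed through an fd_set *. Casting to char * first
 * makes the offset count bytes instead of whole fd_set objects.
 */
void
zero_new_tail(fd_set *set, size_t old_bytes, size_t new_bytes)
{
	assert(new_bytes >= old_bytes);
	memset((char *)set + old_bytes, 0, new_bytes - old_bytes);
}
```

With a 128-byte fd_set, `typed_offset_bytes(4)` reports 512 rather than 4, which is the kind of overshoot that sends a memset past the end of the allocation.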

Afterward, millert@ also found that the original code from 1997 seems to have a potential memory leak due to incorrect realloc usage, so we fixed that problem too, and a complete patch was committed to 4.1-current.
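
The post does not say exactly how realloc was misused, so the following is only a sketch of the most common form of that mistake, assuming it was the classic `p = realloc(p, n)` idiom. The helper name `grow_buffer` is hypothetical and is not taken from the make(1) sources.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Grow *bufp to new_size bytes (new_size must be non-zero).
 * Returns 0 on success, -1 on failure.
 *
 * On failure *bufp is left untouched, so the caller can still use or
 * free the old block. Writing "*bufp = realloc(*bufp, new_size)"
 * instead would overwrite the only pointer to the old block with NULL
 * when realloc fails, and the old block would be leaked.
 */
int
grow_buffer(void **bufp, size_t new_size)
{
	void *tmp;

	assert(bufp != NULL && new_size > 0);
	tmp = realloc(*bufp, new_size);
	if (tmp == NULL)
		return -1;
	*bufp = tmp;
	return 0;
}
```

A caller would write something like `if (grow_buffer(&buf, newsize) == -1) { free(buf); return -1; }`, keeping ownership of the old block until the new one is known to exist.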

Happy hacking!


Comments
  1. By Anonymous Coward (193.63.217.208) on

    Excellent, thanks. Will this be back-ported to -stable and 4.0 as a reliability fix?

    Comments
    1. By Anonymous Coward (72.11.69.215) on

      > Excellent, thanks. Will this be back-ported to -stable and 4.0 as a reliability fix?

      Is this a bug in GNU Make? Will this be patched upstream?

      Comments
      1. By Brad (brad) on

        > > Excellent, thanks. Will this be back-ported to -stable and 4.0 as a reliability fix?
        >
        > Is this a bug in GNU Make? Will this be patched upstream?

        No. OpenBSD uses BSD Make.

  2. By Anonymous Coward (24.37.242.64) on

    Awesome! Thank you!

  3. By Anonymous Coward (88.64.133.128) on

    Damn it!

    I thought that the bug of "looping on one the CPUs forever, without actually building anything" I encountered on many ports I compiled with my distcc port were related to distcc!

    Now I have to reevaluate and rebenchmark distcc for ports. I hope this works out and faster port building with many "crap" machines on my home network becomes reality.

    Thank you very much, great work!

  4. By Anonymous Coward (149.169.206.225) on

    How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?

    Comments
    1. By sean (139.142.208.98) on

      > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?

      Gain, if any, would be less than double. Think about it for a bit. Performance does not scale linearly with each additional CPU.

      Comments
      1. By art (213.0.113.90) on

        > > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?
        >
        > Gain, if any, would be less than double. Think about it for a bit. Performance does not scale linearly with each additional CPU.
        >

        Actually, when you really think about it for a bit, the gain can sometimes be larger than double because compilations (depending on what's compiled of course) are often waiting for disk I/O.

        Comments
        1. By Anonymous Coward (139.142.208.98) on

          > Actually, when you really think about it for a bit, the gain can sometimes be larger than double because compilations (depending on what's compiled of course) are often waiting for disk I/O.

          I can see that, though wouldn't using the -j option without SMP still see those benefits, since the items waiting on I/O will be in biosleep and won't be running, so the other make processes will chew up the savings?


    2. By cnst (cnst) on http://cnst.livejournal.com/24068.html#chart

      > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?

      The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45, but with -j16 it's only 2:36, i.e. just under 3 minutes, and almost a two-fold decrease in elapsed time. See the link under my name above for some simple benchmarking that I've performed.

      Comments
      1. By Anonymous Coward (68.98.37.61) on

        > See the link under my name above for some simple benchmarking that I've performed.

        Thanks for the chart. Does this make even using say -j4 (since -j2 looks like it has issues) on a non-mp system useful or does it block so much it's not worth it? It's kinda scary that you can fork bomb that easy with make...

        Comments
        1. By Anonymous Coward (207.59.237.99) on

          > > See the link under my name above for some simple benchmarking that I've performed.
          >
          > Thanks for the chart. Does this make even using say -j4 (since -j2 looks like it has issues) on a non-mp system useful or does it block so much it's not worth it? It's kinda scary that you can fork bomb that easy with make...

          Why not time kernel compiles w/differing values of -j against a standard build? It's not like it's beyond your abilities to do so...

        2. By Anonymous Coward (198.175.14.5) on

          >
          > It's kinda scary that you can fork bomb that easy with make...

          Not really. It doesn't affect the whole kernel, just your user. Up your login.conf limits or ulimit if that's a problem.

      2. By Anonymous Coward (84.135.78.21) on

        > > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?
        >
        > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45

        4:45? O_O
        I've got a T5500 and it takes around 30 minutes to compile.

        Comments
        1. By mk (130.225.243.71) on

          > > > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?
          > >
          > > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45
          >
          > 4:45? O_O
          > I've got a T5500 and it takes around 30 minutes to compile.

          Slow disk? Broken interrupt routing? Very little memory? etc.

          My X60 compiles a kernel in about 5:30 on GENERIC. I think it's about half of that using -j 12.

          Comments
          1. By Anonymous Coward (217.112.38.94) on

            > > > > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?
            > > >
            > > > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45
            > >
            > > 4:45? O_O
            > > I've got a T5500 and it takes around 30 minutes to compile.
            >
            > Slow disk? Broken interrupt routing? Very little memory? etc.
            >
            > My X60 compiles a kernel in about 5:30 on GENERIC. I think it's about half of that using -j 12.

            maybe he is talking about the whole system not just the kernel?...

            Comments
            1. By raw (84.135.70.128) on

              > > > > > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?
              > > > >
              > > > > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45
              > > >
              > > > 4:45? O_O
              > > > I've got a T5500 and it takes around 30 minutes to compile.
              > >
              > > Slow disk? Broken interrupt routing? Very little memory? etc.
              > >
              > > My X60 compiles a kernel in about 5:30 on GENERIC. I think it's about half of that using -j 12.
              >
              > maybe he is talking about the whole system not just the kernel?...

              No, I'm talking about the _kernel_. This machine has 512MB of RAM. The disk is some Hitachi SATA, and as for broken interrupt routing, I don't know.

      3. By scot bontrager (216.62.11.163) on

        > > How much of a speed increase is this? Are we talking like a double in compile speed because you can multithread?
        >
        > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45, but with -j16 it's only 2:36, i.e. just under 3 minutes, and almost a two-fold decrease in elapsed time. See the link under my name above for some simple benchmarking that I've performed.

        time make -B
        289.080u 59.050s 5:43.48 101.3% 0+0k 208+2226io 416pf+0w

        time make -j4
        291.050u 66.180s 3:27.57 172.1% 0+0k 1632+5575io 7112pf+0w

        amd64 2x 1.6ghz

  5. By Anonymous Coward (88.64.159.34) on

    Now the only thing left to do is making the system build make -j safe, but I guess this would be a huge amount of work.

  6. By Steve Shockley (68.80.137.106) on

    malloc.conf kicks ass!
