Contributed by deanna on from the that's-a-lot-of-j dept.
A 10-year-old pointer-arithmetic bug in make(1) is now gone, thanks to malloc.conf and some debugging. The bug kept people from running make -j on OpenBSD for fear of it looping on one of the CPUs forever, without actually building anything.
Personally, I encountered this bug a few times after todd@ told me about the -j option at c2k7. I immediately noticed a performance improvement when building the kernel with `make -j4`, which was obvious from the fact that both CPUs were now at close to 0% idle time for the duration of the build. However, I also noticed that shortly after being started with a big -j number, like 16 or 24, the make process would become mysteriously quiet and start to consume 100% of one of the CPUs (without actually running any jobs). This happened every time I ran it.
Independently of the above make(1) bug (or maybe under its influence), one day I decided to give the malloc options a try, running `ln -s 'AFGHJPRX' /etc/malloc.conf` to set them up. After setting these malloc options, I noticed that make was crashing quite often when used with the -j option.
To make a long story short, I discovered that the problem had to do with pointer arithmetic. A number of bytes was added as an offset to a pointer to fd_set, a type whose size is 128 bytes. In other words, the byte offset was erroneously multiplied by a factor of 128, so the memset was done on an unallocated piece of memory, leaving the allocated part uninitialised.
Afterward, millert@ also identified that the original code from 1997 seems to have a potential memory leak due to incorrect realloc usage, so we fixed that problem too, and a complete patch was committed to 4.1-current.
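The scaling described above can be sketched in a few lines of C. This is only an illustration of the general mistake, not the actual make(1) code: the helper names `buggy_offset`, `intended_offset` and `zero_tail` are made up for this example, and the sizes are whatever `sizeof(fd_set)` happens to be on the system where it is compiled.

```c
#include <stddef.h>
#include <string.h>
#include <sys/select.h>

/*
 * Byte distance covered by the buggy expression `mask + nbytes` when
 * `mask` has type `fd_set *`: pointer arithmetic scales the offset by
 * the size of the pointed-to type.
 */
static size_t buggy_offset(size_t nbytes)
{
    return nbytes * sizeof(fd_set);
}

/* Byte distance that was actually intended. */
static size_t intended_offset(size_t nbytes)
{
    return nbytes;
}

/*
 * Zero the bytes [old_bytes, new_bytes) of a mask. Casting to char *
 * first makes the offset count bytes rather than whole fd_set objects.
 * Hypothetical helper; the caller must have allocated new_bytes bytes.
 */
static void zero_tail(fd_set *mask, size_t old_bytes, size_t new_bytes)
{
    memset((char *)mask + old_bytes, 0, new_bytes - old_bytes);
}
```

With a 128-byte fd_set, `buggy_offset(4)` is 512 bytes while `intended_offset(4)` is 4, which is why the memset landed far outside the allocation and the tail that should have been cleared was left untouched.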
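The post does not show what the incorrect realloc usage looked like. One common form, assumed here purely for illustration, is assigning the result of realloc straight back to the only pointer to the block: if realloc fails it returns NULL, the old block stays allocated, and the last reference to it is overwritten. A minimal sketch of the safer pattern follows; `struct buf` and `buf_resize` are hypothetical names, not anything from make(1).

```c
#include <stdlib.h>

/* Hypothetical growable buffer, for illustration only. */
struct buf {
    char   *data;
    size_t  size;
};

/*
 * Leak-prone form (shown only as a comment):
 *     b->data = realloc(b->data, new_size);
 * On failure b->data becomes NULL and the old block can no longer be freed.
 *
 * Safer form: keep the old pointer until realloc has succeeded.
 * Assumes new_size > 0. Returns 0 on success, -1 on failure.
 */
static int buf_resize(struct buf *b, size_t new_size)
{
    char *tmp = realloc(b->data, new_size);

    if (tmp == NULL)
        return -1;          /* b->data is still valid and still owned by b */
    b->data = tmp;
    b->size = new_size;
    return 0;
}
```

On failure the caller can still free the original buffer or carry on using it at its old size.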
Happy hacking!
By Anonymous Coward (193.63.217.208) on
By Anonymous Coward (72.11.69.215) on
Is this a bug in GNU Make? Will this be patched upstream?
By Brad (brad) on
>
> Is this a bug in GNU Make? Will this be patched upstream?
No. OpenBSD uses BSD Make.
By Anonymous Coward (24.37.242.64) on
By Anonymous Coward (88.64.133.128) on
I thought that the bug of "looping on one of the CPUs forever, without actually building anything" that I encountered on many ports I compiled with my distcc port was related to distcc!
Now I have to reevaluate and rebenchmark distcc for ports. I hope this works out and faster port building with many "crap" machines on my home network becomes reality.
Thank you very much, great work!
By Anonymous Coward (149.169.206.225) on
By sean (139.142.208.98) on
Gain, if any, would be less than double. Think about it for a bit. Performance does not scale linearly with each additional CPU.
By art (213.0.113.90) on
>
> Gain, if any, would be less than double. Think about it for a bit. Performance does not scale linearly with each additional CPU.
>
Actually, when you really think about it for a bit, the gain can sometimes be larger than double because compilations (depending on what's compiled of course) are often waiting for disk I/O.
By Anonymous Coward (139.142.208.98) on
I can see that, though wouldn't a system without SMP still see those benefits when using the -j option? The jobs waiting on I/O will be in biosleep and won't be running, so the other make processes will chew up the savings.
By cnst (cnst) on http://cnst.livejournal.com/24068.html#chart
The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45, but with -j16 it's only 2:36, i.e. just under 3 minutes, and almost a two-fold decrease in elapsed time. See the link under my name above for some simple benchmarking that I've performed.
By Anonymous Coward (68.98.37.61) on
Thanks for the chart. Does this make even using say -j4 (since -j2 looks like it has issues) on a non-mp system useful or does it block so much it's not worth it? It's kinda scary that you can fork bomb that easy with make...
By Anonymous Coward (207.59.237.99) on
>
> Thanks for the chart. Does this make even using say -j4 (since -j2 looks like it has issues) on a non-mp system useful or does it block so much it's not worth it? It's kinda scary that you can fork bomb that easy with make...
Why not time kernel compiles w/differing values of -j against a standard build? It's not like it's beyond your abilities to do so...
By Anonymous Coward (198.175.14.5) on
> It's kinda scary that you can fork bomb that easy with make...
Not really. It doesn't affect the whole kernel, just your user. Up your login.conf limits or ulimit if that's a problem.
By Anonymous Coward (84.135.78.21) on
>
> The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45
4:45? O_O
I've got a T5500 and it takes around 30 minutes to compile.
By mk (130.225.243.71) on
> >
> > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45
>
> 4:45? O_O
> I've got a T5500 and it takes around 30 minutes to compile.
Slow disk? Broken interrupt routing? Very little memory? etc.
My X60 compiles a kernel in about 5:30 on GENERIC. I think it's about half of that using -j 12.
By Anonymous Coward (217.112.38.94) on
> > >
> > > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45
> >
> > 4:45? O_O
> > I've got a T5500 and it takes around 30 minutes to compile.
>
> Slow disk? Broken interrupt routing? Very little memory? etc.
>
> My X60 compiles a kernel in about 5:30 on GENERIC. I think it's about half of that using -j 12.
Maybe he is talking about the whole system, not just the kernel?
By raw (84.135.70.128) on
> > > >
> > > > The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45
> > >
> > > 4:45? O_O
> > > I've got a T5500 and it takes around 30 minutes to compile.
> >
> > Slow disk? Broken interrupt routing? Very little memory? etc.
> >
> > My X60 compiles a kernel in about 5:30 on GENERIC. I think it's about half of that using -j 12.
>
> maybe he is talking about the whole system not just the kernel?...
No, I'm talking about the _kernel_. This machine has 512MB of RAM. The disk is some Hitachi SATA drive, and as for broken interrupt routing, I don't know.
By scot bontrager (216.62.11.163) on
>
> The difference is substantial -- on Intel Core 2 Duo E4300, the time it takes to build OpenBSD's kernel with -B is 4:45, but with -j16 it's only 2:36, i.e. just under 3 minutes, and almost a two-fold decrease in elapsed time. See the link under my name above for some simple benchmarking that I've performed.
time make -B
289.080u 59.050s 5:43.48 101.3% 0+0k 208+2226io 416pf+0w
time make -j4
291.050u 66.180s 3:27.57 172.1% 0+0k 1632+5575io 7112pf+0w
amd64 2x 1.6ghz
By Anonymous Coward (88.64.159.34) on
By Steve Shockley (68.80.137.106) on