Contributed by deanna on Tue Jun 19 10:42:41 2007 (GMT)
from the that's-a-lot-of-j dept.
Constantine A. Murenin (cnst@) writes:
A 10-year-old pointer-arithmetic bug in make(1) is now gone, thanks to malloc.conf and some debugging. The bug prevented people from running make -j on OpenBSD for fear of it looping on one of the CPUs forever, without actually building anything.
Personally, I ran into this bug a few times after todd@ told me about the -j option at c2k7. I immediately noticed a performance improvement when building the kernel with `make -j4`, which was obvious from the fact that both CPUs were now at close to 0% idle time for the duration of the build. However, I also noticed that shortly after being started with a big -j number, like 16 or 24, the make process would become mysteriously quiet and start to consume 100% of one of the CPUs (without actually running any jobs) every time I ran it.
Independently of the above make(1) bug (or maybe under its influence), one day I decided to give the malloc options a try, running `ln -s 'AFGHJPRX' /etc/malloc.conf` to set them up. After setting these malloc options, I noticed that make was crashing quite often when used with the -j option.
To make a long story short, I discovered that the problem had to do with pointer arithmetic. A number of bytes was added as an offset to a pointer of type fd_set *, and fd_set has a size of 128 bytes. That is, the byte offset was erroneously multiplied by a factor of 128, so the memset ran on an unallocated piece of memory and left the allocated part uninitialised.
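To illustrate the class of bug described above, here is a minimal sketch in C. It is not the actual make(1) code: the helper names clear_tail and fdset_stride, and the idea of zeroing the freshly grown tail of a buffer, are assumptions made up for the example. The point is only that adding n to an fd_set pointer moves it by n * sizeof(fd_set) bytes, so a byte count has to be applied to a char pointer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>

/*
 * Hypothetical helper: zero bytes [old_bytes, new_bytes) of a buffer
 * that is addressed through an fd_set pointer. Both arguments are
 * byte counts, not element counts.
 */
void
clear_tail(fd_set *set, size_t old_bytes, size_t new_bytes)
{
	assert(new_bytes >= old_bytes);

	/*
	 * Buggy form (do not use):
	 *     memset(set + old_bytes, 0, new_bytes - old_bytes);
	 * "set + old_bytes" is scaled by sizeof(fd_set), so the write
	 * lands far past the end of the allocation.
	 *
	 * Correct form: do the arithmetic in bytes.
	 */
	memset((char *)set + old_bytes, 0, new_bytes - old_bytes);
}

/* Byte distance actually covered by "base + n" when base is an fd_set *. */
size_t
fdset_stride(size_t n)
{
	return n * sizeof(fd_set);
}
```

With sizeof(fd_set) equal to 128, as in the post, fdset_stride(16) comes out as 2048 bytes, even though the intended offset was only 16 bytes.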
Afterward, millert@ also identified that the original code from 1997 seems to have a potential memory leak due to incorrect realloc usage, so we fixed that problem too, and a complete patch was committed to 4.1-current.
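The post does not say exactly how realloc was misused. One common form of this mistake is `p = realloc(p, n)`, which loses the only pointer to the old block when realloc fails and returns NULL. The sketch below shows the usual safe pattern, assuming that was the kind of problem involved; grow_buffer is a hypothetical helper, not a function from make(1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: grow the buffer at *bufp to new_size bytes.
 * Returns 0 on success and -1 on failure. On failure *bufp is left
 * untouched, so the caller can still use or free the old block.
 * Assumes new_size > 0.
 */
int
grow_buffer(void **bufp, size_t new_size)
{
	void *tmp;

	assert(bufp != NULL);
	assert(new_size > 0);

	/* Keep the result in a temporary so the old pointer is not lost. */
	tmp = realloc(*bufp, new_size);
	if (tmp == NULL)
		return -1;

	*bufp = tmp;
	return 0;
}
```

A caller would check the return value and, on -1, either carry on with the old buffer or free it before bailing out.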