Contributed by sean on from the and-another-bug-bites-the-dust dept.
Some bugs are so ornery that they remain hidden for a very long time.
There are some that happen in such weird edge cases that they go unexposed and are very hard to repeat.
Marc Balmer (mbalmer@) investigated a bug that was exposed by Samba. The bug is a corner case in the lifetime of a directory listing: iterating through a directory with seekdir()/readdir() can return invalid results when the directory is modified (specifically, when entries are removed) during the iteration.
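To make the access pattern concrete, here is a minimal sketch in C. It is not Marc's test program and is not guaranteed to trigger the bug on an affected system; the function names seekdir_after_unlink() and seekdir_demo() and the scratch directory under /tmp are made up for illustration. It reads an entry, remembers the position of the next one with telldir(), unlinks the first entry, then calls seekdir() and checks whether the same entry comes back.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* telldir() and seekdir() are XSI interfaces */
#endif

#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: pick the first real entry in `path` as the victim,
 * remember the position of the entry after it, unlink the victim, seek
 * back and read again.
 * Returns 1 if the same entry comes back, 0 if not, -1 on error.
 */
int
seekdir_after_unlink(const char *path)
{
	DIR *dir;
	struct dirent *ent;
	char victim[256], expected[256];
	long pos;
	int same;

	assert(path != NULL);
	if ((dir = opendir(path)) == NULL)
		return -1;

	/* Find the first entry that is not "." or "..". */
	do {
		ent = readdir(dir);
	} while (ent != NULL && (strcmp(ent->d_name, ".") == 0 ||
	    strcmp(ent->d_name, "..") == 0));
	if (ent == NULL)
		goto fail;
	snprintf(victim, sizeof(victim), "%s", ent->d_name);

	/* Remember the current position and the entry that follows it. */
	pos = telldir(dir);
	if ((ent = readdir(dir)) == NULL)
		goto fail;
	snprintf(expected, sizeof(expected), "%s", ent->d_name);

	/* Remove an entry while the listing is still in progress. */
	if (unlinkat(dirfd(dir), victim, 0) == -1)
		goto fail;

	/* Return to the saved position and read the entry again. */
	seekdir(dir, pos);
	ent = readdir(dir);
	same = (ent != NULL && strcmp(ent->d_name, expected) == 0);
	closedir(dir);
	return same;

fail:
	closedir(dir);
	return -1;
}

/* Build a scratch directory with four files and run the check on it. */
int
seekdir_demo(void)
{
	char tmpl[] = "/tmp/seekdir-demo.XXXXXX";
	char file[128];
	int i, fd, result;

	if (mkdtemp(tmpl) == NULL)
		return -1;
	for (i = 0; i < 4; i++) {
		snprintf(file, sizeof(file), "%s/file%d", tmpl, i);
		if ((fd = open(file, O_CREAT | O_WRONLY, 0600)) != -1)
			close(fd);
	}

	result = seekdir_after_unlink(tmpl);

	/* Clean up; one file is already gone, so errors are ignored. */
	for (i = 0; i < 4; i++) {
		snprintf(file, sizeof(file), "%s/file%d", tmpl, i);
		unlink(file);
	}
	rmdir(tmpl);
	return result;
}
```

Calling seekdir_demo() from a main() and printing the result is enough to try it. A result of 0 means the entry read after seekdir() differed from the one seen before the unlink; whether that happens depends on the C library and the filesystem.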
In his personal blog, Marc explains the bug (and the solution) in more detail and notes that it has existed for far longer than OpenBSD has been around. I'm personally convinced I've seen this issue on some really high-traffic OpenBSD and FreeBSD file servers but, being a relative Luddite, blamed it on Samba (of which I'm not a huge fan but tolerate out of necessity).
Congratulations Marc on finding and squashing this beast.
As well, the other postings on his blog are equally good reading.
By Anonymous Coward (128.171.90.200) on
By Anonymous Coward (2001:4830:123a:beef:21e:8cff:fe6f:7ae3) on
Samba people knew about the incorrect behavior for the past three years, but never bothered to find its cause or to notify BSD people to take a look at it.
By Marc Balmer (2001:8a8:1001:0:216:cbff:fea2:37c4) on
>
> Samba people knew about the incorrect behavior for the past three years, but never bothered to find its cause or to notify BSD people to take a look at it.
This is not true. They knew about the problem _and_ they talked to BSD people. But apparently they were told they were using the API wrong, it was never meant that way, etc.
By Otto Moerbeek (otto) on http://www.drijf.net
The Single UNIX Specification says that seekdir(3) cannot work on some types of filesystems:
"The original standard developers perceived that there were restrictions on the use of the seekdir() and telldir() functions related to implementation details, and for that reason these functions need not be supported on all POSIX-conforming systems. They are required on implementations supporting the XSI extension.
One of the perceived problems of implementation is that returning to a given point in a directory is quite difficult to describe formally, in spite of its intuitive appeal, when systems that use B-trees, hashing functions, or other similar mechanisms to order their directories are considered. The definition of seekdir() and telldir() does not specify whether, when using these interfaces, a given directory entry will be seen at all, or more than once."
This was used as an "excuse" not to investigate further. But ffs uses a linear directory structure, and it was even the original filesystem on which seekdir(3) and friends were implemented, so that was an unjustified jump to conclusions.
Actually, once Marc had a test program that could reproduce the problem and while discussing things on icb, we both saw the bug almost simultaneously. The fix was quite straightforward, the challenge here was finding out what was going on.
--Otto
By clvrmnky (69.28.228.76) clvr,mnky.invlaid@gmail.com on
>
> The Single UNIX Specification says that seekdir(3) cannot work on some types of filesystems:
>
[...]
> Actually, once Marc had a test program that could reproduce the problem and while discussing things on icb, we both saw the bug almost simultaneously. The fix was quite straightforward, the challenge here was finding out what was going on.
>
Which, you know, is a common pattern when maintaining software. You can only do so much, and someone needs to own the bug sometimes to really solve it. We can't blame the Samba folks if they solved it Good Enough from their POV.
It's rare to find a 25-yr old bug, but not totally uncommon. One of the first bugs I fixed was an old "we know about it but can't quite solve it, but have worked around it for years" problem. All it takes is a fresh look and a fair amount of time to dig into it.
That last resource is a rare commodity sometimes.
By Anonymous Coward (63.237.125.100) on
OpenBSD Devs ROCK!!!
Fixing bugs in OpenBSD, NetBSD, FreeBSD, and even OSX.
Amazing.
By Anonymous Coward (128.171.90.200) on
>
> Fixing bugs in OpenBSD, NetBSD, FreeBSD, and even OSX.
and DragonFlyBSD
By Brynet (Brynet) on
>
> OpenBSD Devs ROCK!!!
>
> Fixing bugs in OpenBSD, NetBSD, FreeBSD, and even OSX.
>
> Amazing.
This bug seems to be fixed in most of the BSD derivatives now, but what about OS X? I wonder why it's not fixed over there yet.
By Anonymous Coward (67.240.141.201) on
> derivatives now, but what about OS X? I wonder why
> it's not fixed over there yet.
Apple and their fans are pretty amusing about confronting bugs, or even admitting their existence. For example, ask an iPhone user how he got custom apps on the phone, and he'll say he loaded a TIFF image in the browser. Ask him then if he feels comfortable with the fact that a webpage can run arbitrary code on his phone simply by including an image... And he won't see the problem.
By Anonymous Coward (64.81.40.211) on
I've been looking for an answer, it's not clear to me that this bug affects OSX. It seems to be dependent on the file system used? Or is it at the VFS layer? I'm not sure.
I suppose since OSX supports UFS they'd have to fix it there but what about HFS?
By Pierre Riteau (131.254.100.94) on
>
> I've been looking for an answer, it's not clear to me that this bug affects OSX. It seems to be dependent on the file system used? Or is it at the VFS layer? I'm not sure.
>
> I suppose since OSX supports UFS they'd have to fix it there but what about HFS?
Mac OS X doesn't support a readdir after an unlink happened:
http://docs.info.apple.com/article.html?artnum=107884
I guess it must be quite similar with a seekdir.
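One way to sidestep the question, whatever a given platform does with readdir() after unlink(), is to read the whole listing first and only remove files afterwards. The following is a sketch of that general pattern in C and is not claimed to be what the Apple article recommends; remove_files_snapshot() is a made-up name, and subdirectories are deliberately not handled (they are reported as an error).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for strdup(), dirfd() and unlinkat() */
#endif

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove every file in `path` without calling
 * unlink while a readdir() loop is still running.
 * Returns the number of files removed, or -1 on error.
 */
int
remove_files_snapshot(const char *path)
{
	DIR *dir;
	struct dirent *ent;
	char **names = NULL, **grown;
	size_t n = 0, cap = 0, i;
	int removed = 0, failed = 0;

	assert(path != NULL);
	if ((dir = opendir(path)) == NULL)
		return -1;

	/* Pass 1: copy all names; the directory is not modified here. */
	while (!failed && (ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		if (n == cap) {
			size_t newcap = cap ? cap * 2 : 16;
			grown = realloc(names, newcap * sizeof(*names));
			if (grown == NULL) {
				failed = 1;
				continue;
			}
			names = grown;
			cap = newcap;
		}
		if ((names[n] = strdup(ent->d_name)) == NULL)
			failed = 1;
		else
			n++;
	}

	/* Pass 2: unlink by name; ENOENT means the file is already gone. */
	for (i = 0; i < n; i++) {
		if (!failed) {
			if (unlinkat(dirfd(dir), names[i], 0) == 0)
				removed++;
			else if (errno != ENOENT)
				failed = 1;
		}
		free(names[i]);
	}
	free(names);
	closedir(dir);
	return failed ? -1 : removed;
}
```

The cost is memory proportional to the number of entries, which is usually acceptable for spool or scratch directories. Files created by another process after the first pass are simply not seen.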
By Anonymous Coward (2001:4830:123a:dead:21f:3bff:fe03:b159) on
http://cvsweb.de.netbsd.org/cgi-bin/cvsweb.cgi/src/lib/libc/gen/readdir.c#rev1.24
http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/gen/readdir.c#rev1.15
http://www.dragonflybsd.org/cvsweb/src/lib/libc/gen/readdir.c#rev1.10
By D. Adam Karim (archite) on
By Darrin Chandler (dwc) on http://www.stilyagin.com/darrin/
Not sure what's not been acknowledged, but I've seen plenty of "from NetBSD" or from FreeBSD, DragonFlyBSD and even Linux in commit messages. So there's certainly no hesitation to show where ideas or code came from.
By Anonymous Coward (2001:4830:123a:dead:21f:3bff:fe03:b159) on
>
> Not sure what's not been acknowledged, but I've seen plenty of "from NetBSD" or from FreeBSD, DragonFlyBSD and even Linux in commit messages. So there's certainly no hesitation to show where ideas or code came from.
that's what I had in mind
http://undeadly.org/cgi?action=article&sid=20080407082616&pid=2
By Marc Balmer (2001:8a8:1001:0:216:cbff:fea2:37c4) on
>
> http://cvsweb.de.netbsd.org/cgi-bin/cvsweb.cgi/src/lib/libc/gen/readdir.c#rev1.24
> http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/gen/readdir.c#rev1.15
> http://www.dragonflybsd.org/cvsweb/src/lib/libc/gen/readdir.c#rev1.10
That was only an oversight. Tonnerre replied to his own commit message shortly after:
http://marc.info/?l=netbsd-source-changes&m=120992872614538&w=2
By Anonymous Coward (216.68.193.94) on
Subtle, sophisticated, and Simple. Beautiful.
By Matthieu Herrb (2001:660:6602:0:20a:e4ff:fe26:de06) on
23092 mimedefang CALL getdirentries(0xf,0x4f140000,0x4000,0x4bb61fa8)
23092 mimedefang RET getdirentries 512/0x200
23092 mimedefang CALL lstat(0x47beee80,0x47becd10)
23092 mimedefang NAMI "/var/spool/MIMEDefang/mdefang-m499CpMc029723/HEADERS"
23092 mimedefang RET lstat 0
23092 mimedefang CALL unlink(0x47beee80)
23092 mimedefang NAMI "/var/spool/MIMEDefang/mdefang-m499CpMc029723/HEADERS"
23092 mimedefang RET unlink 0
23092 mimedefang CALL lstat(0x47beee80,0x47becd10)
23092 mimedefang NAMI "/var/spool/MIMEDefang/mdefang-m499CpMc029723/COMMANDS"
23092 mimedefang RET lstat 0
23092 mimedefang CALL unlink(0x47beee80)
23092 mimedefang NAMI "/var/spool/MIMEDefang/mdefang-m499CpMc029723/COMMANDS"
23092 mimedefang RET unlink 0
23092 mimedefang CALL getdirentries(0xf,0x4f140000,0x4000,0x4bb61fa8)
23092 mimedefang RET getdirentries 0
23092 mimedefang CALL lstat(0x47beee80,0x47becd10)
23092 mimedefang NAMI "/var/spool/MIMEDefang/mdefang-m499CpMc029723/COMMANDS"
23092 mimedefang RET lstat -1 errno 2 No such file or directory
Thanks to all of you.