OpenBSD Journal

Developer blog - otto@: New Otto malloc helps spot ancient bugs

Contributed by johan on from the new-code-kills-old-bugs dept.

Otto Moerbeek (otto@) recently found and fixed an ancient bug (some 33 years old) in yacc(1). Here is his story:

As some of you know I have been working on a new implementation of malloc, the general purpose memory allocator that's used by userland programs. See this link for more details about it.

My malloc has been tested by many and has been in snapshots for a while now. Some time ago I received a bit of a puzzling report from Nikolay Sturm (sturm@) that on sparc64 compiling big C++ projects would sometimes fail with an Internal Compiler Error (ICE). For a start, it was not clear if my new malloc was involved, but I setup my sparc64 machine for testing. It soon turned out I could reproduce the problem. Switching to the in-tree malloc made the problem go away. So I was facing a malloc bug or a gcc bug, I thought.

Otto continues below...

My new malloc has some features not found in the current malloc: it moves allocations smaller than a page but bigger than half a page to the end of the page, to have a greater chance of catching buffer overflows. Even without guard pages, this works most of the time, since we have a randomized mmap(2): there's a pretty big chance that there is a hole after a page allocated. Accessing that hole leads to a segmentation fault.

So I put my malloc back in, and started investigating. The first thing I tried was to switch off the code that moves small allocations: and yes, that made the bug go away. Using an unstripped version of the compiler, I could interpret the backtrace in gdb, and it soon turned out that the compiler was dying in yyparse(). yyparse() is the main function generated by the parser generator yacc(1). After compiling obj/cp/parse.c with debug options and some fighting with wrong line numbers reported by gdb, I saw the offending statement was the second statement below;

yym = yylen[yyn];
yyval = yyvsp[1-yym];
It turned out yym was 0, so yyvsp[1] was accessed: one stack entry above the current stack pointer.

Yacc has an implicit action that does:

$$ = $1;
before the action of a rule is done. $$ refers to the result of the rule, and $1 refers to the first item on the right hand side of the parse rule, yym says how many right hand side values of the current rule there are on the stack. The two statement above implement the implicit action.

But if the actions are part of a rule that has no right hand side:

A: /* empty */ { foo(); };
the implicit action will access via $1 a value beyond the top of stack: yym will be 0 in that case. Normally, that isn't a big problem: you're just accessing garbage, and assigning it to $$.

But if the stack is at maximum size, this will overflow if an entry on the stack is larger than the 16 bytes leeway my malloc allows. In the case of of C++ it is 24 bytes, so a SEGV occured.

Funny thing is that I traced this back to Sixth Edition UNIX, released in 1975. A very similar problem can be spotted here. The lines:

yypv =- yyr2[n];
yyval=yypv[1];
access an item above the stack pointer. If yyr2[n] is zero, this is a potential access outside the stack. Note the archaic use of =-, we write -= these days.

The bug was only triggered on sparc64, since it uses 8k pages. The default yacc stack size for C++ is 24 * 200 = 4800 bytes, which is larger than a page on most platforms. In this case malloc returns a page aligned object, no moving of the allocation inside a page occurs.

I'm really happy this turned out to be a yacc bug, and not a malloc bug. It's actually a malloc feature in action. ;-)

The commit that fixes this can be found here.

-Otto

(Comments are closed)


Comments
  1. By Anonymous Coward (169.244.143.114) on

    I'm happy that the openbsd developers are not willing to "leave well enough alone." I can imagine that there is a lot of pressure to work on cool desktop features, but the project does an admirable job of staying focused on the fundamentals and not getting too distracted.

    Comments
    1. By Anonymous Coward (91.21.74.158) on

      > I'm happy that the openbsd developers are not willing to "leave well enough alone." I can imagine that there is a lot of pressure to work on cool desktop features, but the project does an admirable job of staying focused on the fundamentals and not getting too distracted.

      Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.

      Comments
      1. By landry (82.245.140.102) on

        > > I'm happy that the openbsd developers are not willing to "leave well enough alone." I can imagine that there is a lot of pressure to work on cool desktop features, but the project does an admirable job of staying focused on the fundamentals and not getting too distracted.
        >
        > Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.

        And ? Is it opposed to have cool desktop features ? Come on..

      2. By phessler (phessler) on first undead, then not, then undead again.

        > > I'm happy that the openbsd developers are not willing to "leave well enough alone." I can imagine that there is a lot of pressure to work on cool desktop features, but the project does an admirable job of staying focused on the fundamentals and not getting too distracted.
        >
        > Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.
        >
        >

        the trolls are cute today.

      3. By Anonymous Coward (128.171.90.200) on

        I think there are plenty of people who expect the OpenBSD devs to want to compete with the likes of Linux and FreeBSD etc. for desktop dominance, but the OpenBSD devs stick to their guns and make changes which show up bugs and hurt a few users now, which if you are playing the popularity game you cannot do, but in the long run means a better written OS because it does what it's supposed to do.

      4. By Cleber (189.23.20.10) cleber@bsd.com.br on

        > > I'm happy that the openbsd developers are not willing to "leave well enough alone." I can imagine that there is a lot of pressure to work on cool desktop features, but the project does an admirable job of staying focused on the fundamentals and not getting too distracted.
        >
        > Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.
        >
        >

        My desktop is OpenBSD, and I love it. IMHO, why not?

    2. By Otto Moerbeek (otto) on http://www.drijf.net

      > I can imagine that there is a lot of pressure to work on cool desktop features.

      I never felt any pressure like that. I just work on things that I find interesting or things that I need myself.

  2. By yacc newb (66.207.218.21) on

    "Normally, that isn't a big problem: you're just accessing garbage, and assigning it to $$."

    Hmm, wouldn't that lead to garbage in the syntax tree and, if so, wouldn't that cause big problems?

    Comments
    1. By Otto Moerbeek (otto) on http://www.drijf.net

      "Normally, that isn't a big problem: you're just accessing garbage, and assigning it to $$." Hmm, wouldn't that lead to garbage in the syntax tree and, if so, wouldn't that cause big problems?

      It does not lead to seg faults, at least.

      In a typical yacc grammar, the actions assign a value to $$, overriding the default action:

      You could see something like:

      A : /* empty */ { $$ = NULL; };
      
      The following would indeed be problematic if the non-terminal A would be used as a value in other rules:
      A: /* empty */;
      
      because it would assign junk to A. Since the rule has no action, only the default action is left, which is nonsensical for a rule having no right hand side. But values are not always passed up, the side effects of action could just do the work. So you are right, it could lead to problem, but it does not have to do that.

  3. By Ted Walther (70.71.224.2) on http://reactor-core.org

    So, when will you take away the 16 bytes of leeway? Be interesting to see what bugs pop up then.

    Comments
    1. By Otto Moerbeek (otto) on http://www.drijf.net

      > So, when will you take away the 16 bytes of leeway? Be interesting to see what bugs pop up then.
      >

      Well, I suspect too much programs will stop working. A simple off by one is so common.

      Comments
      1. By Anonymous Coward (78.20.243.221) on

        > Well, I suspect too much programs will stop working. A simple off by one is so common.

        How about some sort of setting that toggles this behaviour? If you can find bugs in 30 year old codes with a reasonably large safety-net, then I'm going to go guess that other interesting bugs might come bubbling to the surface. Or am I missing something painfully obvious?

        Comments
        1. By Anonymous Coward (219.90.224.124) on

          Agreed. Couldn't something like that be enabled in snaps for a release cycle or three to help fix off by one errors? Too brutal?

          At the very least, it would be nice to be able to pick this sort of thing up in one's own code.

          Comments
          1. By Anonymous Coward (74.14.137.59) on

            > Agreed. Couldn't something like that be enabled in snaps for a release cycle or three to help fix off by one errors? Too brutal?
            >
            > At the very least, it would be nice to be able to pick this sort of thing up in one's own code.

            Theo no like buttons, Theo hate buttons! Rar! Theo smash!

            Comments
            1. By Anonymous Tasmanian (150.101.108.45) on

              > Theo no like buttons, Theo hate buttons! Rar! Theo smash!

              Theo no like buffer overflows...

            2. By Philipp (89.49.66.123) on

              > Theo no like buttons, Theo hate buttons! Rar! Theo smash!

              awesome :)

              otto: GREAT. Like one already said: brilliant

      2. By J.C. Roberts (jcr) on http:www.designtools.org

        > > So, when will you take away the 16 bytes of leeway? Be interesting to see what bugs pop up then.
        > >
        >
        > Well, I suspect too much programs will stop working. A simple off by one is so common.

        Hi Otto,

        It's frightening to think how many programs would neeed to be fixed, so I can see how it would result in a ton of work if we suddenly break everything with an off by one error, but I am failing to see why breaking them is a bad thing?

        Is the goal to initially just get your new malloc in the tree and well tested with similar bug-for-bug compatibility to the old malloc? (i.e. while leaving the option open to change the behavior later on so it can eventually be used to catch/break all the off-by-one errors?)

        Or do you think we're stuck with the 16 byte leeway (and off-by-ones) for the long term?


        Kudos on the fantastic work!

        Thanks,
        jcr

        Comments
        1. By Otto Moerbeek (otto) on http://www.drijf.net


          > Or do you think we're stuck with the 16 byte leeway (and off-by-ones) for the long term?


          The problem is that if I make things too strict, so much things wil stop working that you end up with a non-functional system. A general goal is to have as much protection enabled by default. Making it too strict will just be counterproductive.

          I have some more plans for the future and a resizable leeway is one of them, but for now the 16 bytes leeway is good.

  4. By Andrés Delfino (200.80.164.16) adelfino@gmail.com on

    That's the reason I use OpenBSD: evolution, rather than revolution.

  5. By Charley Corday (75.138.170.36) ccorday@mailinator.com on

    Johan, I loved reading this. I have my own malloc just because I am very DIY, but I never thought of the cool idea to align to the end of a page that is floating in address space with nothing after it and let the memory manager catch out of range errors at zero cost. That is brilliant and I bow in your general direction.

    For finding a serious bug dating back to 1975 that is used in scores of projects world wide to this day, you earn the respect of the community as one serious bad ass developer. Kudos to you.

    Comments
    1. By Charley Corday (75.138.170.36) ccorday@mailinator.com on

      correction: Otto found the bug and Johan reported it.

    2. By Janne Johansson (jj) on Old amiga fart.

      > Johan [sic], I loved reading this. I have my own malloc just because I am very DIY, but I never thought of the cool idea to align to the end of a page that is floating in address space with nothing after it and let the memory manager catch out of range errors at zero cost. That is brilliant and I bow in your general direction.

      Tedu@ did the same for pointer-sized allocations somewhere around this
      commit:

      http://marc.info/?l=openbsd-cvs&m=111811962102430&w=2

      That (supposedly ;) catches all mallocs that mistakenly uses sizeof(*foo) instead of sizeof(foo) in the code.

      "man malloc.conf" on the J option.

      Comments
      1. By Janne Johansson (jj) on .


        > "man malloc.conf" on the J option.
        >

        Bah, I meant "P"

      2. By Matthew Dempsky (2001:470:805a:1:21b:63ff:feca:36df) on

        > That (supposedly ;) catches all mallocs that mistakenly uses sizeof(*foo) instead of sizeof(foo) in the code.

        Other way around.

  6. By Hudson (74.58.67.99) hudson@videotron.ca on

    There's a bug in your explanation.

    Note the archaic use of =-, we write -= these days.

    =- is not the same thing as -=.

    The first one discards the original term on the left, replacing it with the negative of the term on the right.

    The second one decrements the left term by the right term.

    See here.

    Not enough coffee? ;-)

    Comments
    1. By Anonymous Coward (60.241.132.54) on


      > =- is not the same thing as -=.

      It is when you're talking about old versions of C, where op= was written as =op.

    2. By Damien Miller (djm) on http://www.mindrot.org/~djm/

      > There's a bug in your explanation.

      Don't argue with Otto. He sorts lists in O(1).

      Comments
      1. By J.C. Roberts (jcr) on http:www.designtools.org

        > > There's a bug in your explanation.
        >
        > Don't argue with Otto. He sorts lists in O(1).
        >

        Please add that statement to fortune.

      2. By Miod Vallat (miod) on

        > > There's a bug in your explanation.
        >
        > Don't argue with Otto. He sorts lists in O(1).
        >

        And in reverse order, too.

        Comments
        1. By Anonymous Coward (128.171.90.200) on

          You can do it, Otto! You can do it, Otto!
          Help each other out that'll be our motto!
          Make this patch I'll give you free gelato!
          Then back to my place, where I will get you blotto!
          Domo Arigato, Mr. Roboto


      3. By Anonymous Coward (24.181.247.3) on

        > > There's a bug in your explanation.
        >
        > Don't argue with Otto. He sorts lists in O(1).
        >

        This definitely needs to go into fortune. It's a part of Unix history.

    3. By Anonymous Coward (82.130.37.84) on

      > There's a bug in your explanation.
      >
      > Note the archaic use of =-, we write -= these days.
      >
      > =- is not the same thing as -=.

      This change in syntax was introduced in K&R (the first edition of "The C Programming Language", you know), published 1978. Pinpointing the original public release of the compiler that introduced new syntax, or deprecated the old is harder - but it most likely occured in UNIX V6, V7 or the Progammers Workbench 1.0 released between them.

      Anyway, =- *was* valid back in UNIX V6 yacc.

    4. By Marc Espie (213.41.185.88) espie@openbsd.org on

      > There's a bug in your explanation.
      >
      > Note the archaic use of =-, we write -= these days.
      >
      > =- is not the same thing as -=.
      >
      > The first one discards the original term on the left, replacing it with the negative of the term on the right.
      >
      > The second one decrements the left term by the right term.

      No, you're just too young to know better.

      Very, very old versions of C did have =- with those archaic semantics.

    5. By Anonymous Coward (71.255.98.13) on

      > There's a bug in your explanation.
      >
      > Note the archaic use of =-, we write -= these days.
      >
      > =- is not the same thing as -=.
      >

      Other's have already jumped on you about this, so I'll just point out that, in the old days, decrement-and-assign and add-and-assign were the only operators to require spaces on either side, since otherwise they would be interpreted as assign-the-opposite-of and assign, respectively. Enough people made the error that you are currently being chewed out for that people finally decided to do something about it. By the time of ANSI C, -= had become the standard way, and the new rules prohibited doing it the old, stupid way.

      I'm sure there's a gcc option to make it be dumb again if you want to try it out. Try --dumb or --like-everything-else-gnu-does.

  7. By Anonymous Coward (129.132.27.178) on

    Thanks for sharing this with us.

    It is one thing to see OpenBSD improve constantly, but that would be much more boring if it would all happen "under the hood".

    I appreciate the time you took to write down this story.

    Comments
    1. By asemisldkfj (71.174.176.28) on http://www.thehomerow.net

      > Thanks for sharing this with us.
      >
      > It is one thing to see OpenBSD improve constantly, but that would be much more boring if it would all happen "under the hood".
      >
      > I appreciate the time you took to write down this story.

      Ditto! I enjoy reading narratives like this.

  8. By David Librik (99.147.4.64) librik@panix.com on

    This is really impressive, Otto -- just think of how many people have looked at the parser skeleton without spotting the problem.

    But your patch only corrects the first bug you found. Did you also have a fix for the second one, which you describe as "a very similar problem"? That would be the lines:
    yypv -= yyr2[yyn];
    yyval = yypv[1];
    I'm looking through the parser skeleton of a version of Yacc which is still in use (Abraxas PCYACC) and there they are! So the bug didn't die with V6 UNIX.

    Comments
    1. By Miod Vallat (miod) on

      > This is really impressive, Otto -- just think of how many people have looked at the parser skeleton without spotting the problem.
      >
      > But your patch only corrects the first bug you found. Did you also have a fix for the second one, which you describe as "a very similar problem"? That would be the lines:
      > yypv -= yyr2[yyn];
      > yyval = yypv[1];
      > I'm looking through the parser skeleton of a version of Yacc which is still in use (Abraxas PCYACC) and there they are! So the bug didn't die with V6 UNIX.
      >

      You did not understand him correctly. v6 has the bug you are quoting, then over time yacc was modified and the bug took a different form, which Otto describes earlier in his message. So whichever yacc version you have, it has (well, might have) either form of the bug, but not both.

    2. By Otto Moerbeek (otto) on http://www.drijf.net

      Like Miod says, the "second bug" is actually an earlier manifestation of the same bug.
      Fixing code in versions of yacc outsode of OpenBSD is left as an exercise to the reader.

  9. By TylerEss (69.42.248.90) on

    So about the too-much-leeway vs breaking-too-many-things balance: Why not make it a fixed leeway that gradually decreases as releases come out?

    Start with the current amount, and wait until people stop complaining. Then, reduce the leeway by 25% and see who complains... repeat this process and after a few years you might be killing people's off-by-ones en masse...

  10. By Anonymous Coward (24.181.247.3) on

    Friggin' awesome!

  11. By speedplane (117.47.226.71) mes65@cornell.edu on

    Its amazing that this wasn't this caught by memory checking tools years ago. Memory checking tools could spot this bug and similar ones in an instant.

    Comments
    1. By Alan Chantler (87.115.194.161) alan@chantler.net on

      I have been following this thread with some interest. I got my first UNIX license back in 1975 and used to run Edition VI on some pretty tight hardware.

      In addition to congratulating Otto on his detective work, and on his willingness to publish the outcome (in true UNIX style), I think it worth saying something in defense of Edition VI.

      Way back, when Ken Thompson and Dennis Ritchie produced their OS, it was done from necessity rather than due to commercial pressure. I expect that you all know all this anyway but I have always been impressed by the fact that the whole UNIX universe (interesting use of words given the original cause of the need that led to the OS in the first place) is based upon sharing and peer-review.

      There was once available (by Lyons but no longer available - I rather think it was suppressed by AT&T Bell Labs) an annotated version of the kernel code for Edition VI. It attempted to explain the fundamental thinking behind some of the code but I well remember that there was a line of code (written, of course, in 'C' [which itself followed 'B'}) which included the comment "/* You're not supposed to understand how this works!*/".

      So let us not forget that, remarkable though Otto's discovery of this so-called 'ancient bug' may be, to my mind what is even MORE remarkable is the fact that there have, in the past nearly forty years, been so FEW such 'bugs' discovered!

      Well Done Otto! Keep up the good work!

      Comments
      1. By Pierre Riteau (131.254.100.94) on Pierre Riteau

        > There was once available (by Lyons but no longer available - I rather think it was suppressed by AT&T Bell Labs) an annotated version of the kernel code for Edition VI. It attempted to explain the fundamental thinking behind some of the code but I well remember that there was a line of code (written, of course, in 'C' [which itself followed 'B'}) which included the comment "/* You're not supposed to understand how this works!*/".

        http://cm.bell-labs.com/who/dmr/odd.html

        Comments
        1. By Anonymous Coward (122.49.172.127) on

          http://web.archive.org/web/20040311200823/http://cm.bell-labs.com/who/dmr/odd.html

      2. By Anonymous Coward (122.49.172.127) on

        > So let us not forget that, remarkable though Otto's discovery of this so-called 'ancient bug' may be, to my mind what is even MORE remarkable is the fact that there have, in the past nearly forty years, been so FEW such 'bugs' discovered!

        Maybe, but how much of the original code is still in the wild?

  12. By Anonymous Coward (77.239.192.101) on

    A: /* empty */ { foo(); };

    It's not a bug, it's you who've messed things up.

    You've used an empty token but added action not knowing that the rule is always assumed to return a value. It's explicitly stated in docs. An empty token has no value at all, so you have to return it in the action.

    The same thing happens when your write the rule

    A: B {..} C {..};


    Yacc actually converts the above statement internally to

    TEMP: /* empty */ {..};

    A: B TEMP C {..};


    And yes, the action in the middle have to assign something to the $$. That's the rule.


    Comments
    1. By Otto Moerbeek (otto) on http://www.drijf.net

      You seem to have missed the point. It is the implicit $$ = $1 action that is casuing trouble. This action is always done, even if you assign to $$ yourself.

  13. By Otto Moerbeek (otto) otto@drijf.net on http://www.drijf.net

    I'll reply to some of the things I've seen on various forums here.

    A lot of posters seem to have missed the point that the bug is in the code generated by yacc. That code can end up anywhere, so the potential impact of this bug is hard to tell. I suspect it is low: it's a read buffer overflow after all.

    The bug I fixed is in the Berkely version of yacc. The original 1975 bug is very similar. While Berkely yacc is a complete rewrite, it shows that some bugs really know how to survive. Contrary to some headlines I saw, I did not actually fix the 1975 bug. I fixed the Berkeley yacc bug and showed it has very old roots.

    The actual problem is the read access of $1. The code adjust the stack pointer using the length ot the right hand side of the rule being reduced. After that, $1 is 1 up the stack pointer. The implict $$ = $1 rule is executed before any processing of user written actions, so if the stack is at it maximum allocted size, the out of bounds read access occurs.
    The stack I'm talking about is of course the yacc parser runtime stack, and not the system stack.
    In theory the bug could have been found by using memory access debuggers. But it has not. This is probably due to the fact that it is pretty rare: you have to compile a program that uses a certain grammar rule when the yacc stack is at its maximum size. IMO, this all shows the value of te various techniques we use: if possible, we enable them by default, so we have a better chance to catch rare errors.

    About making malloc stricter: now that my new malloc is committed, I will start working on some more improvements. Adjustable end of allocation alignment is one of them. BTW, currently the feature is completely switched off, and you can enable it by using the P malloc option.



Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]