Contributed by jj on Fri Jun 19 09:34:46 2015 (GMT)
from the bug-ate-my-manpage dept.
Ingo Schwarze (schwarze@) writes in with an analysis of the issues found by afl in mandoc:
After realizing that I have nine topics for my BSDCan talk and that I can't cover them all in the depth they deserve, here are a few more details about afl and mandoc than I can't cover in the talk. Not a spoiler, there is still plenty of material for the talk!
During the half year from November 2014 to May 2015, Jonathan Gray
(jsg@) repeatedly ran the American fuzzy lop
security-oriented fuzzer program
to scrutinize the mandoc documentation parser and formatter program
for bugs causing crashes and hangs. Most
of these runs took multiple days of around-the-clock operation on a
modern amd64 computer. All bugs found were quickly investigated
A grand total of 45 issues was found. Fortunately, this number is
not large enough to support a full-blown statistical analysis, and
anyway, it is unknown whether, and in which ways, the mandoc codebase
may or may not be representative for application software in general.
But unfortunately, the number of issues is much larger than i would
have anticipated, so large that looking at it from a quantitative
angle does make some sense, even if the results are not going to be
representative (for systematical reasons) nor precise (for statistical
reasons). But at the very least, they may inspire some hypotheses
and remind us of some well-known recommendations and best practices,
so i do consider the exercise worthwhile.
2. Analysis of root causes
In a first step, let's classify the root causes of the
issues found by afl, proceeding from lower-frequency to
2.1. 1 case out of 45: abuse of assert() for input validation
It is really needless to say that you shouldn't do that.
What is interesting here is that
Yet an error of this type still persisted.
- mandoc was in general written with security in mind
- by people not completely inexperienced or unskilled
- and by the time of the afl audit, the code had already
matured for half a decade
Admittedly, it is not always possible to make all the details perfect
right away when starting a new project from scratch. But be aware
that every stupid shortcut is *very* dangerous, it may persist for
a long time even in an environment like OpenBSD that tends to be
unusually hostile to bugs. So think twice about what you are doing,
even in the beginning, avoid as much as possible giving in to hurry
and laziness right from the start.
2.2. 1 case out of 45: powerful language = exploitable language
The roff(7) language is Turing complete. In particular, it supports
iteration and recursion. Recursively defining a user-defined string,
then expanding it, tricked the parser into an infinite loop,
eventually dying from exhaustion of memory.
This teaches us two things:
- Some vulnerabilities are extremely hard to fix.
Attempts at fixing them can easily degenerate into limiting
functionality. Even if you manage to fix them, if the language
is powerful enough, chances are it's not very difficult to
construct a subtler variant of the exploit that the fix doesn't
- So if you have a choice, keep the power and complexity of languages
down: Make them powerful enough to do the job, but no more.
Otherwise, it may be impossible to ever design a secure parser
2.3. 3 cases out of 45: use after free of an allocated pointer
In all three cases, a pointer was allocated, a copy of the pointer
saved somewhere else, the original pointer freed, and the copy
later used at the other place.
This teaches three lessons:
- When freeing a pointer, think twice whether any copies may exist,
and act accordingly, for example NULLing out the copies after
freeing the original.
- When using a pointer that is not stored at its canonical place
but a copy from somewhere else, think twice where the original
pointer might get freed and make sure the original copy is still
- Try to avoid copying allocated pointers if possible, in particular
if they may get freed before the end of the program.
2.4. 5 cases out of 45: string parser advances beyond the terminating \0
All of these cases were triggered by grossly incomplete syntax at
the end of a string coming from user input.
Again, there are three lessons:
- When advancing a parse pointer, be vary of incomplete syntax
and always make sure you are not on a \0 before advancing
to the next byte. Apparently, this rather trivial point
is much more easily missed than i would have thought.
- Use double care when advancing a parse pointer that will
be passed back to the caller upon return. Mere read buffer
overflows in the called subroutine can easily turn into
even more dangerous write buffer overflows in the calling
- If possible, avoid writing to parse buffers. If you have to
manipulate the input data during parsing, consider copying
the respective part, then modify that, and throw it away
when done. That reduces the risk of write buffer overruns.
2.5. 6 cases out of 45: missing elementary input validation
The cases that came up were:
- dividing by user input without checking for 0
- incomplete checks for the validity of input bytes
(specifically, in UTF-8 parsing, accepting some invalid
sequences, resulting in forbidden bytes in the ASCII parse
buffer, resulting in forbidden bytes in ASCII output buffers,
eventually triggering an assertion - without that assertion,
the consequences might have been even worse)
- unreasonably large positive integers in user input (twice)
- unexpectedly negative integers in user input
- inconsistent claims of the user about the size of the
provided data (hello, heartbleed...)
Of course, many of the more complicated problems in this list can -
in one way or another - also be classified as input validation issues.
But i was surprised that there were so many rather elementary cases
of missing input validation. So be sure to *systematically* check
your input code for all corner cases that you can think of, and
repeat the check some time later or ask somebody else to repeat
that particular check, even if you think you are not all that stupid.
Apparently, such errors are easier to miss than one might think.
2.6. 14 cases out of 45: exceptions from invariants
Whenever writing code, you need to assume that your data structures
maintain some invariants. If you would assume that the content of
any variable could be invalid in any possible way at any point in
the program, you would get utterly bogged down in error handling.
In mandoc, assuming some invariant holds, relying on it, and then
finding out that there can be an exception to that particular
invariant at some particular point in the code was one of the two
main root causes for crashes found by afl. Most of these cases
were related to syntax tree node structures.
In about half of the cases, it was general invariants supposedly
applying to each and every instance of a given type that could
get violated when specific macros were used in invalid ways.
The other half of the cases were specific invariants applying
to specific macros only that could get violated when these
macros were used in invalid ways.
A possible lesson is that you ought to explicitely write down all
the invariants you intend to use, make sure they are really maintained
at all places where the respective data structures are created and
changed, and make sure you assume nothing about the content of the
data except those explicitly specified invariants. I haven't found
the time yet to do that for mandoc.
2.7. 15 cases out of 45: logic errors arising from complexity
This class is particularly disturbing: It is the top root cause for
crashes found by afl in mandoc, one third of all. These bugs arose
from code complexity that is hard or impossible to describe in terms
of invariants of data structures. There were two subclasses.
For one subclass (block rewinding), causing 6 crashes, it turned
out the complexity was needless. It was possible to reorganise
this code in a much simpler way. Obviously, complexity that isn't
present cannot cause logic oversights, so you should look at your
code, identify the most complicated parts, and try to find ways to
But the most disturbing subgroup, containing 8 out of 44 crashes
and the only endless loop that wouldn't eventually die from memory
exhaustion, is caused by the complexity of the implemented language
itself (specifically, support for bad nesting of mdoc(7) blocks),
not by a clumsy implementation. Even after considerable simplification,
that code is still frighteningly complex and afl found two new
crashes even after the simplification and associated partial audit
So, for writing secure software, the key really is to already design
the user-visible functionality for simplicity and clarity. Complexity
introduced on that level will reach all the way through to the
implementation details. It is obvious that those issues will be
hardest to fix, and the example of the mandoc afl audit demonstrates
that these issues may even end up being quantitatively dominant.
3. Analysis of severities
Ordered by decreasing severity, the issues found had the following
- 2 write buffer overruns (both advancing a pointer past \0 and returning it)
- 3 read buffer overruns (all advancing a pointer past \0 without returning it)
- 2 use after free (both free a pointer, keep a copy, and use it)
- 2 unsigned integer underflows (by unchecked subtraction and unvalidated input)
- 2 cases of effectively infinite output (by not checking input for large numbers)
- 2 infinite loops (one by user input, one by complicated internal logic)
- 1 division by zero (by unvalidated user input)
- 11 NULL pointer accessess
- 20 assertion failures
This looks like a negative correlation between the difficulty to
avoid classes of issues and the typical severeness of the consequences
of failure. The easier to avoid, the more dangerous a failure
appears to be. This indicates that it really pays off to get the
obvious things right:
- Be particularly careful when passing around parse pointers.
- Make sure you don't parse beyond \0.
- Watch out for use after free.
- Don't forget to validate any input.
- Be careful with arithmetic operations (/, *, -, +, etc.)
In this study, the hardest problems only caused assertion failures,
NULL pointer accesses, and only one infinite loop. But don't rely
on that. For one thing, i don't see any theoretical reason why
issues arising from excessive complexity could not cause issues of
arbitrary severity. Besides, the dominance of assertion failures
over other modes of failure seen in this study might be a benefit
of security-conscious coding practices being used in the first
place, catching problems relatively early in most cases, while the
average severity of vulnerabilities caused by similar issues in
code written with less internal safeguards could be considerably
4. Bug distribution across modules
It is interesting to note that the number of bugs found per line of code
is constitent in all modules with the average density across all modules,
see the second appendix for some numbers. For almost all modules, the
numbers agree within one standard deviation, and only for one, there is
a difference of two standard deviations. Of course, statistics are small,
so this is not a very strong statement.
In particular, contrary to what might have been expected, the largest
and most complicated module, the mdoc(7) parser, doesn't have a
higher density of bugs than other, smaller and simpler modules.
The reasons are only accessible to speculation, i don't see any way
to safely establish why this might be so. It could be, though,
that the reason for the mdoc(7) parser to not be more bug-ridden
is that it received more scrutiny in the past because it is used
most of all the languages, so functional defects are more visible
and annoying to lots of people compared to functional defects in
less commonly used languages like tbl(7) and eqn(7).
5. Total number of bugs
In retrospect, my surprise at the number of bugs found may
surprise you. Looking at studies like the following, finding 45
crashes and hangs in over 30.000 lines of code - less than 1.5/KLOC
- doesn't seem all that out of proportion for a project with 1.5
part-time developers and no formal QA department. That, yet again,
emphazises how people tend to over-estimate their own abilities
(myself included). Unless you are a real genius, your code is
likely to contain more bugs than you think, too.
In addition to that, Kristaps Dzonsons pointed out in private
communication that cleanly architecturing the mandoc codebase was
unusually difficult. The languages mandoc(1) parses are not designed
to any strict paradigm but rather use a mixture of different syntactic
and semantic concepts in pragmatic, sometimes historically evolved
ways, and none of the languages has a formal definition. Existing
manuals were often vague and sometimes contradictory. That certainly
doesn't help strict parsing. Many parser requirements were only
discovered after the parsers were already deployed in production,
requiring changes to existing logic and to existing invariants. In
particular, support for badly nested blocks was added a long time
after the design of all basic data structures was complete, and
even normal block rewinding rules weren't clear at all at the outset.
Originally, there was no intention to include any low-level roff(7)
support, and the whole roff(7) parser was edged in below existing
higher-level parsing code at a later stage, when it became clear
that many legacy manuals rely quite a bit on low-level roff features.
Even in a reasonably mature open-source project developed with
security in mind, you may expect on the order of one remaining bug
of some importance per thousand lines of code. For optimal return
on investment while fighting the worst consequences, the present
example suggests to watch out for buffer overflows in string parsers,
use after free errors, missing input validation, and integer overflow
in arithmetic operations. In addition to that, two particularly hard
but quantitatively important measures are to explicitly specify
invariants and to avoid excessive complexity of both interface
design and implementation, even though those more difficult topics,
in the present study, were only related to issues of smaller severity.
Appendix - chronological list of issues found by jsg@ with afl in mandoc
date root technical code location and
fixed cause consequence brief problem description
Nov 20 GI/decen ass term.c term_flushln left>right margin
Nov 24 GI/exc NULL mdoc_term.c termp_sh_pre Sh w/o body (Sh Xo Sh)
Nov 26 SI/exc ass mdoc_validate.c Sm with multiple args
Nov 27 SI/exc ass mdoc_validate.c Db with multiple args
Nov 27 compl/rew ass mdoc_macro.c Eo rewinding
Nov 28 SI/exc NULL mdoc_term.c termp_nm_pre meta->name == NULL
Dec 15 pass \0 r ov roff.c parse error, next try after EOL
Dec 15 pass \0 w ov roff.c parse point one too far, passed back
Dec 16 compl/rew corr/ass man_macro.c block closure in next-line
Dec 17 compl/br corr/NULL mdoc_macro.c add item to breaking list
Dec 18 input val /0 roff.c divide by user input
Dec 18 GI/exc NULL mdoc_validate.c post_it item w/o body (It Xo no-Xc)
Dec 18 SI/exc ass mdoc_validate.c post_bl Sm in Bl
Dec 18 SI/exc corr/ass mdoc_macro.c add Ta to closed column list
Dec 18 input val ass preconv.c bare BS in text from invalid UTF-8
Dec 19 ass=inval ass term.c font stack limit impl by assert
Dec 19 compl/rew ass mdoc_macro.c rewinding
Dec 19 compl/br corr/ass mdoc_macro.c only Bl end bef begin; only extend Bl with It
Dec 22 compl/br loop mdoc_macro.c block break 2 others (Po Ao Pc Bo Pc Ac Bc)
Dec 24 input val output man_term.c large indent or par dist
Dec 24 input val unsi unde tbl_term.c number too wide for tbl cell
Dec 24 input val unsi unde man_term.c large neg indent
Dec 24 input val output mdoc_term.c large indent
Dec 25 turing loop roff.c recursive expansion of user-def strings; limit length
Jan 01 GI/exc NULL eqn_term.c missing args
Jan 01 pass \0 r ov mandoc.c trailing \s- \s+
Jan 01 pass \0 w ov roff.c trailing backslash at EOF in cond; pass \0, returned
Jan 29 GI/exc NULL tbl_layout.c line w/o cell
Jan 30 GI/exc NULL tbl_data.c tblcalc not called if first span in .TP head
Jan 30 *alloc ass term.c don't keep absolute pointers into area that is realloc'ed
Feb 01 compl/br corr/ass mdoc_macro.c Fo Nm Fc; opening body of non-existant block
Feb 01 compl/rew ass mdoc_macro.c rewinding
Feb 03 compl/br ass mdoc_macro.c expl broken by It; close twice
Feb 07 *alloc use free mdoc_validate.c delete node in tree iteration
Feb 10 pass \0 r ov tbl_layout.c f layout mod at eol
Feb 10 SI/exc NULL mdoc_validate.c St no-arg child-mac; empty St
Feb 11 compl/rew corr/NULL mdoc_macro.c exp blocks close Nd; syntax tree corr
Feb 11 SI/exc NULL mdoc_*.c Eo w/o tail
Feb 12 compl/br corr/ass mdoc_macro.c Bo Bl It Bd Bc It; pending pointer corrupt
Feb 12 compl/br ass mdoc_macro.c Bl It Bl -column It Bd El text text El; miss ENDBODY check
Apr 29 *alloc use free tbl_layout.c last line empty
Apr 29 compl/br corr/NULL mdoc_macro.c bogus MDOC_BROKEN by stray end macro
Apr 29 SI/exc ass term.c \z\o'ab'c at BOL
Apr 29 compl/br corr/ass mdoc_macro.c body broken, don't extend end
May 01 compl/rew corr/ass mdoc.c/roff.c roff_node_unlink clobbers ROFF_NEXT
Abbreviations of classes of root causes:
- GI/decen - attempt to enforce general invariant decentrally
- GI/exc - exception from supposedly generic invariant
- SI/exc - exception from macro-specific supposed invariant
- *alloc - use after free of an allocated pointer
- ass=inval - abuse of assert() for input validation
- compl/br - excessive complexity in language design (badly nested blocks)
- compl/rew - excessive complexity in the implementation of macro rewinding
- input val - missing elementary input validation
- pass \0 - string parser advances beyond the terminating \0
- turing - Turing-complete language allows the user to trigger infinite loops
Abbreviations of classes of technical consequences:
- NULL - directly caused NULL pointer access
- corr/NULL - data corruption indirectly causing NULL pointer access
- ass - direct assertion failure
- corr/ass - data corruption indirectly causing assertion failure
- loop - infinite loop
- output - excessive amount of output
- r ov - buffer overflow (read access only)
- unsi unde - unsigned integer underflow
- use free - use after free
- w ov - buffer overflow (may write beyond end of buffer)
- /0 - division by zero
Appendix - Bug distribution across modules
In the following tables, the numbers preceding the keywords are:
- loc: number of lines of C code in the respective module
- bugs: number of bugs found in the respective module
- exp: expected number of bugs assuming a Poisson distribution
with a mean of 1.5 bugs/kloc
The first four lines refer to functional modules, in particular
separating parsers and formatters, and include all 45 bugs.
The second group of three lines refers to source languages,
partitioning the same code in a different way, summing parsers and
formatters for the each language. For mdoc(7), no line is included
because the mdoc(7) code is fully dominated by the large mdoc(7)
parser, the small mdoc(7) formatter doesn't change the picture.
Similarly, the second group doesn't include issues of the low-level
roff(7) language because there is no roff(7) formatter.
low-level parsers 5400 loc 7 bugs 8+-3 exp
mdoc(7) parser 17782 loc 22 bugs 27+-5 exp
other parsers 3430 loc 5 bugs 5+-2 exp
formatters 5504 loc 11 bugs 8+-3 exp
man(7) p/f 2543 loc 3 bugs 4+-2 exp
tbl(7) p/f 1373 loc 5 bugs 2+-1.5 exp
eqn(7) p/f 1248 loc 1 bug 2+-1.5 exp
Our thanks to Ingo for his detailed write-up and for working on making on manpages and the tooling around them so much better.