OpenBSD Journal

mandoc - UNIX Manuals

Contributed by jcr on from the manly dept.

If you've ever opened up a raw man page in a text editor, somewhere in the back of your mind you heard the words of Arthur C. Clarke, "Any sufficiently advanced technology is indistinguishable from magic," or the words of Larry Niven, "Any sufficiently advanced magic is indistinguishable from technology." Either way, you knew you had a lot to learn.

The recent and upcoming mandoc (via mdocml) changes are significant improvements to how manual pages are handled in OpenBSD. This important work not only improves build times, but also improves rendering and flexibility. Kristaps Dzonsons and Ingo Schwarze (schwarze@) were kind enough to tell us about the ongoing work.

NOTE: If you find man page formatting bugs, please do not file a PR (Problem Report) with sendbug. You should report problems directly to Kristaps and Ingo or, better yet, to the mdocml mailing lists:

mdocml is being used by OpenBSD, NetBSD, and DragonFly BSD as well as in the ports collection of FreeBSD, so reporting problems properly upstream helps everyone.

Also, you should check the todo list Ingo Schwarze (schwarze@) has put together so you don't report an already known issue:

Just in case you missed it, Kristaps gave a presentation on mdocml at AsiaBSDCon 2009: "Deprecating groff for BSD manual display". You can find the paper and a video of the talk online. Also, you should get familiar with the magic by reading the relevant documentation, both in OpenBSD itself and in the mdocml suite.

A nice description of the mdocml suite is at mdocml.bsd.lv

mdocml is a suite of tools compiling -mdoc, the roff macro package of choice for BSD manual pages, and -man, the predominant historical package for UNIX manuals. The mission of mdocml is to deprecate groff, the GNU roff implementation, for displaying -mdoc pages whilst providing token support for -man.

Why? groff amounts to over 5 MB of source code, most of which is C++ and all of which is GPL. It runs slowly, produces uncertain output, and varies in operation from system to system. mdocml strives to fix this (respectively small, C, ISC-licensed, fast and regular).

The core of mdocml is composed of the libmdoc, libman, and libroff validating compiler libraries. All are simple, fast libraries operating on memory buffers, so they may be used for a variety of front-ends (terminal-based, CGI and so on). The front-end is mandoc, which formats manuals for display.

The mdocml suite is a BSD.lv Project member.

From Kristaps Dzonsons:

I wrote mandoc, so I can provide some historical bits.

mandoc exists because grohtml didn't let me change the colour of `.Sh' section headings. I wanted to HTML-format the manuals for mult and sysjail (cf. http://bsd.lv/) consistent with the style of the surrounding site and was upset when it didn't Just Work. So I wrote a little tool to consume my manuals in -mdoc and directly produce CSS and HTML, so mdocml is short for mdoc2html and CVS dates my first check-ins at 18 months ago.

Anyway, 14 months ago mdocml branched into mdocterm and mdochtml, sometime before AsiaBSDCon-2009, where it was featured in a talk. Soon after it went into OpenBSD.

mdocml begat mandoc, consolidating mdocterm and mdochtml with nroff-compatible arguments (-Txx and -mxx) around 12 months ago. Soon after, mandoc took on -man via libman also about 12 months ago. The only recent feature added was -Txhtml about 3 months ago. The rest is accommodating for roff's strangeisms, both in terms of input and output.

Today, mandoc is in the base system of OpenBSD (linked to the build!), NetBSD, and DragonFly BSD as well as in ports for FreeBSD.

mandoc's main code contributors are Ingo Schwarze (schwarze@) and Joerg Sonnenberger (NetBSD). Ingo has performed yeoman labour in fitting mandoc into OpenBSD's build process, submitting patch after patch to get it working properly with OpenBSD manuals, and making it byte-compatible with GNU troff. Joerg has also done considerable work to similar effect, and also contributed the compatibility layer that allows mandoc to work on most any Unix system. Ingo and Joerg are the downstream contacts for OpenBSD and NetBSD. Ulrich Spörlein is the downstream contact for FreeBSD and Sascha Wildner is the downstream contact for DragonFly BSD. Many others, most significantly Jason McIntyre (jmc@), have tested mandoc on all manner of manuals.

I do ask that anybody with a manual please test with mandoc. And if it doesn't work, to cross-check mdoc.samples(7) and mdoc(7) to make sure that the manuals aren't broken before submitting a report. Lastly, if one's manuals are in -man format, for god's sake re-write them in -mdoc format! (Search for "Fixing on a Standard Language for UNIX Manuals" for the scoop...)

When I made the mistake of filing an OpenBSD PR with sendbug on a man page formatting bug, Ingo was kind enough to send me the following.

From Ingo Schwarze (schwarze@):

In general, when groff copes, mandoc ought to cope, too, with very rare exceptions.

We know there are still lots of small problems. It is not difficult to dig up more of these than we can fix quickly by just running automatic comparisons across the tree.

The most helpful bug reports regarding non-fatal mandoc errors currently are those that

  • report formatting errors with important consequences (e.g. layout completely garbled to the point that one cannot figure out the content any more)
  • report one problem at a time
  • say precisely what goes wrong, best with man(7)/mdoc(7) source code and groff and mandoc output

But even those need not go to the OpenBSD bug tracker, or we will quickly put more nits there than people want to see in that place. Just send them to kristaps and me directly.
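
The side-by-side comparison Ingo asks for can be sketched in a few lines of Python. This is only an illustration, not a tool from the mdocml suite: the helper names (render, output_diff, compare) are made up for this example, and it assumes both formatters are installed and invoked as `groff -mandoc -Tascii` and `mandoc -Tascii`.

```python
import difflib
import subprocess
import sys


def render(command, path):
    """Run a formatter on a manual source file and return its output lines."""
    done = subprocess.run(command + [path], capture_output=True,
                          text=True, check=True)
    return done.stdout.splitlines()


def output_diff(groff_lines, mandoc_lines):
    """Unified diff of the two renderings; an empty list means they agree."""
    return list(difflib.unified_diff(groff_lines, mandoc_lines,
                                     fromfile="groff", tofile="mandoc",
                                     lineterm=""))


def compare(path):
    # Assumed invocations; adjust them to whatever your system provides.
    groff = render(["groff", "-mandoc", "-Tascii"], path)
    mandoc = render(["mandoc", "-Tascii"], path)
    return output_diff(groff, mandoc)


if __name__ == "__main__" and len(sys.argv) > 1:
    for line in compare(sys.argv[1]):
        print(line)
```

Run against a single manual source file, it prints nothing when the two renderings match. The two programs may emit bold and underline differently, so the raw output may need to go through something like col -b first before the diff is useful in a report.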



Comments
  1. By Will Backman (bitgeist) bitgeist@yahoo.com on http://bsdtalk.blogspot.com

    I hear this has major benefits for build times on various architectures.

    Comments
    1. By J.C. Roberts (jcr) on http://www.designtools.org

      I've heard the same, but I haven't seen any hard numbers (yet) to back up the claims. I'm sure said numbers exist someplace to some degree, but this article is to detail early efforts so I doubt such metrics are available.

      BTW, I just finished watching your "The Microphone as Mirror" BSD talk --excellent!

      Comments
      1. By J.C. Roberts (jcr) on http://www.designtools.org

        > I've heard the same, but I haven't seen any hard numbers (yet) to back up the claims. I'm sure said numbers exist someplace to some degree, but this article is to detail early efforts so I doubt such metrics are available.
        >

        I found the following test numbers from Ingo sitting in my inbox unread.

        ------------------
        The following happened on an old, slow i386 box (Athlon 1200 MHz,
        768 MB RAM):

        schwarze@rhea $ time make cleangman cleanman
        0m42.75s real 0m13.97s user 0m15.18s system

        * without Perl and OpenSSL (2730 files)
        schwarze@rhea $ time make buildman
        1m23.97s real 0m26.39s user 0m36.58s system (w/o hyph)
        schwarze@rhea $ time make buildgman
        5m40.59s real 3m56.96s user 1m21.98s system (w/o hyph)
        6m13.08s real 4m25.52s user 1m24.69s system (with hyph)

        * in gnu/usr.bin/perl (671 files)
        schwarze@rhea $ make -f Makefile.bsd-wrapper buildman
        0m18.75s real 0m7.64s user 0m9.46s system (w/o hyph)
        schwarze@rhea $ make -f Makefile.bsd-wrapper buildgman
        1m9.63s real 0m43.77s user 0m22.33s system (w/o hyph)

        * in lib/libssl (238 files)
        schwarze@rhea $ time make buildman
        1m40.32s real 1m25.84s user 0m12.45s system
        1m41.27s real 1m25.97s user 0m12.99s system
        schwarze@rhea $ time make buildgman
        2m10.80s real 1m37.21s user 0m16.19s system
        1m55.95s real 1m36.89s user 0m16.53s system
        1m58.78s real 1m37.49s user 0m18.28s system (w/o hyph)
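
To turn the timings above into rough speed-up factors, here is a small Python sketch. It assumes that buildman is the mandoc build and buildgman the groff build (the target names suggest this, but the numbers above do not say so explicitly), and it takes only the first "real" time listed for each target. The libssl runs vary from run to run, so treat that ratio as approximate.

```python
import re


def seconds(stamp):
    """Convert a time(1)-style stamp such as '5m40.59s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(buildman_real, buildgman_real):
    """How many times longer buildgman took than buildman (wall clock)."""
    return seconds(buildgman_real) / seconds(buildman_real)


# (label, first buildman real time, first buildgman real time)
RUNS = [
    ("without Perl and OpenSSL", "1m23.97s", "5m40.59s"),
    ("gnu/usr.bin/perl", "0m18.75s", "1m9.63s"),
    ("lib/libssl", "1m40.32s", "2m10.80s"),
]

for label, buildman_real, buildgman_real in RUNS:
    print(f"{label}: {speedup(buildman_real, buildgman_real):.1f}x")
```

On these figures the ratios come out at roughly 4.1x for the main tree, 3.7x for Perl and 1.3x for libssl.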

  2. By Carson Chittom (carson) carson@ashatteringbeauty.org on http://ashatteringbeauty.org

    This is good as far as it goes, I suppose. But I have to question why we're still using a 40-year-old[1] typesetting system for our man pages, which—as far as I'm aware—isn't used for anything else.[2] I'm not saying I know what the right answer is, but surely there must be something more usable than the immensely arcane syntax I see when I look at the source for a man page: even TeX is better.

    Is there some technical reason we still set man pages this way?

    [1] Wikipedia says the Unix version of roff dates from around 1970.
    [2] Please correct me on this if you can. I'm genuinely curious.

    Comments
    1. By Peter J. Philipp (pjp) on http://centroid.eu

      > Is there some technical reason we still set man pages this way?

      I don't know if there is a technical reason but as a side-comment I have the O'Reilly and USENIX Association System Managers Manual for 4.4BSD and it's basically the manpages printed out in a book the size of your usual ORA book. Immensely fun to read when wanting to explore the older system IMO. If OpenBSD publishes the manpages in this way I'd probably buy it.

      -peter

      Comments
      1. By Carson Chittom (carson) on http://ashatteringbeauty.org

        > I don't know if there is a technical reason but as a side-comment I
        > have the O'Reilly and USENIX Association System Managers Manual for
        > 4.4BSD and it's basically the manpages printed out in a book the size
        > of your usual ORA book. Immensely fun to read when wanting to
        > explore the older system IMO. If OpenBSD publishes the manpages in
        > this way I'd probably buy it.


        I'm glad someone else feels this way too. I'd love to have this--preferably as individual volumes for each section (for some reason, I imagine them in different colors). I'd like to say I'd buy them, but I probably wouldn't, economic times being what they are and I have a family to keep fed, housed, and clothed.

        But if I had the spare cash, I totally would get 'em.

      2. By Anthony J. Bentley (bentley) on

        > I don't know if there is a technical reason but as a side-comment I
        >have the O'Reilly and USENIX Association System Managers Manual for
        >4.4BSD and it's basically the manpages printed out in a book the size of
        >your usual ORA book. Immensely fun to read when wanting to explore the
        >older system IMO. If OpenBSD publishes the manpages in this way I'd
        >probably buy it.

        I've also desired something like that. In lieu of that I set MANPS=Yes in /etc/mk.conf -- whenever I start reading a new manual I rebuild it and read the Postscript. Certainly doesn't compare to the printed page though.

        Speaking of Postscript: unlike groff, Mandoc doesn't currently have a Postscript target... yet. However, expect great things by the end of this summer -- one of NetBSD's GSoC projects this year is implementing mandoc -Tps!

        Comments
        1. By Marc Espie (espie) on


          > Speaking of Postscript: unlike groff, Mandoc doesn't currently have a Postscript target... yet. However, expect great things by the end of this summer -- one of NetBSD's GSoC projects this year is implementing mandoc -Tps!

          Iick. I definitely hope that project also includes -Tpdf

          What's the point of -Tps these days ?
          PostScript is a nice programming language, but it's way too many things for a page description language. See, there's a reason why Adobe created pdf a few years after PostScript.

          Comments
          1. By Anonymous Coward (kristaps) on

            >
            > > Speaking of Postscript: unlike groff, Mandoc doesn't currently have a Postscript target... yet. However, expect great things by the end of this summer -- one of NetBSD's GSoC projects this year is implementing mandoc -Tps!
            >
            > Iick. I definitely hope that project also includes -Tpdf

            If there's time, yes:

            http://netbsd-soc.sourceforge.net/projects/mandoc_ps/

      3. By Barry Grumbine (barry) on

        > [...] it's basically the manpages printed out in a book [...]
        >If OpenBSD publishes the manpages in this way I'd probably buy it.

        Same here. It is far easier to get my employer to buy books than CD sets. Books are expendable, software requires draconian property control measures.

    2. By Anonymous Coward (kristaps) on

      > This is good as far as it goes, I suppose. But I have to question why we're still using a 40-year-old[1] typesetting system for our man pages, which -- as far as I'm aware -- isn't used for anything else.[2] I'm not saying I know what the right answer is, but surely there must be something more usable than the immensely arcane syntax I see when I look at the source for a man page: even TeX is better.
      >
      > Is there some technical reason we still set man pages this way?
      >
      > [1] Wikipedia says the Unix version of roff dates from around 1970.
      > [2] Please correct me on this if you can. I'm genuinely curious.

      mdoc is nice cause it annotates terms semantically: `Fn' is a function name; `Vt' is a variable type; `Op' is a command option; etc. These nicely satisfy what UNIX manuals are meant to do: document what the hell things mean.

      The only other semantic format I know of is docbook. Have you ever seen a CVS diff of XML? Line-based formats have their pluses.

      Furthermore, if you imagine a TeX package annotating the stuff mdoc does, it probably wouldn't look much different.

      In short, it may look a little funny, but it's compact, expressive, and minimal. What more do you want?
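
To make the "semantic annotation" point concrete, here is a minimal Python sketch that labels each macro line of a manual with its meaning. It is a toy lookup table, not how mandoc or libmdoc work internally. The meanings of Fn, Vt and Op follow the comment above, the other entries are taken from mdoc(7), and the annotate function and SAMPLE text are invented for this example.

```python
# Tiny, deliberately incomplete table of mdoc macros and their meanings.
MACROS = {
    "Sh": "section heading",
    "Nm": "name of the utility",
    "Op": "command option (optional part of the command line)",
    "Fl": "command-line flag",
    "Fn": "function name",
    "Vt": "variable type",
}


def annotate(source):
    """Return (macro, meaning, arguments) for every macro line in source."""
    result = []
    for line in source.splitlines():
        if not line.startswith("."):
            continue  # ordinary text line, no macro to annotate
        macro, _, args = line[1:].partition(" ")
        result.append((macro, MACROS.get(macro, "unknown"), args))
    return result


# Hypothetical input, loosely modelled on a SYNOPSIS section.
SAMPLE = ".Sh SYNOPSIS\n.Nm ftp\n.Op Fl v\n.Vt int\n"

for macro, meaning, args in annotate(SAMPLE):
    print(f"{macro:<3} {meaning:<52} {args}")
```

Each line starts with a dot and a two-letter macro, so a front-end can tell what a term is (a flag, a function, a type) without guessing from fonts, and a line-based diff stays readable.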

      Comments
      1. By Carson Chittom (carson) on http://ashatteringbeauty.org

        Your points are largely well-taken.

        > In short, it may look a little funny, but it's compact, expressive, 
        > and minimal.  What more do you want?
        

        That last adjective, honestly, is my biggest thing. The format for manpages is, in my opinion, too minimal. Minimalism can be taken too far[1]. Take the following example from ftp(1) on FreeBSD 7.2 (what I happened to be logged into):

        .Dd January 15, 2005
        .Dt FTP 1
        .Os
        .Sh NAME
        .Nm ftp
        .Nd
        Internet file transfer program
        .Sh SYNOPSIS
        .Nm
        .Op Fl 46AadefginpRtvV

        All of which is more or less understandable if you look at the file and the display of it side-by-side, but I just don't believe that it's any less legible and maintainable to and by authors to do something like (in pseudo-LaTeX):

        \documentclass{manpage}
        \usepackage{basehdrftr}
        \title{ftp}
        \mansection{1}
        \date{January 15, 2005}
        \description{Internet file transfer program}
        \options{46AadefginpRtvV}
        \begin{document}
        \maketitle
        

        Et cetera. Of course, there's also the fact that a typesetting system is a need that (while by no means universal) many people have for reasons other than manpages; I just think it would make more sense from a purely logical perspective to have a general purpose tool—which of course is what we have in groff (hence my other earlier point about use outside of manpages)[2]—that's in some way better than the status quo, rather than just reimplementing the functionality useful for manpages.

        Of course, I'm for various reasons unwilling to spend the time and effort myself to do something like that, so I'll stop arguing about it.

        [1] Philip Glass, I'm looking at you here.
        [2] I was informed by email that some book publishers (no names were named) still use roff to typeset their books.

        Comments
        1. By Predrag Punosevac (Oko) on

          > Your points are largely well-taken.
          >
          >
          > In short, it may look a little funny, but it's compact, expressive,
          > and minimal. What more do you want?
          >
          >
          > That last adjective, honestly, is my biggest thing. The format for manpages is, in my opinion, too minimal. Minimalism can be taken too far[1]. Take the following example from ftp(1) on FreeBSD 7.2 (what I happened to be logged into):
          >
          > .Dd January 15, 2005
          > .Dt FTP 1
          > .Os
          > .Sh NAME
          > .Nm ftp
          > .Nd
          > Internet file transfer program
          > .Sh SYNOPSIS
          > .Nm
          > .Op Fl 46AadefginpRtvV
          >
          > All of which is more or less understandable if you look at the file and the display of it side-by-side, but I just don't believe that it's any less legible and maintainable to and by authors to do something like (in pseudo-LaTeX):
          >
          >
          > \documentclass{manpage}
          > \usepackage{basehdrftr}
          > \title{ftp}
          > \mansection{1}
          > \date{January 15, 2005}
          > \description{Internet file transfer program}
          > \options{46AadefginpRtvV}
          > \begin{document}
          > \maketitle
          >
          >
          > Et cetera. Of course, there's also the fact that a typesetting system is a need that (while by no means universal) many people have for reasons other than manpages; I just think it would make more sense from a purely logical perspective to have a general purpose tool -- which of course is what we have in groff (hence my other earlier point about use outside of manpages)[2] -- that's in some way better than the status quo, rather than just reimplementing the functionality useful for manpages.
          >
          > Of course, I'm for various reasons unwilling to spend the time and effort myself to do something like that, so I'll stop arguing about it.
          >
          >
          > [1] Philip Glass, I'm looking at you here.
          > [2] I was informed by email that some book publishers (no names were named) still use roff to typeset their books.
          >

          Oh, my goodness! Can you stop repeating that non-sense? Do you think you are the only TeX user around? Do you think nobody around knows who is Don Knuth?

          TeX is an exceptional typesetting system for the things it is designed to do. It is unsurpassed for typesetting mathematics. It is ill suited for typing man pages. Where should I start? TeX had trouble until relatively recently with ASCII output. It is simply not designed for terminal output. It is written in WEB! The original native output is Pascal code. Comparing to Troff (originally written in Assembly and then translated into C) it is epitome of bloat. The modern TeXLive/MikTeX distributions are close to 1GB. It is not designed to do one thing well. It is designed to do many things well. On the contrary, Troff uses the concept of pre-processors. For example in order to type mathematics you have to use eqn pre-processor. All those things are well-documented.

          Groff is written in wrong language. It is over-engineered (Typical for GNU) and uses wrong type of license for BSD. It has to go and I am glad it is the thing of the past.

          Lastly, if you are sincere Groff user you should be also happy. Now when the Groff is moving to ports it will be finally updated. Did you notice that a version of Groff from the base of OpenBSD was 3-4 release versions behind the latest official release?

          Comments
          1. By Carson Chittom (carson) on http://ashatteringbeauty.org

            I know I said I wasn't going to argue about this anymore, but....

            From your reply to me, it's obvious that English is more than likely not your first language and that your command of it is not perfect (that's not an insult, by the way: my command of Russian (the only language other than English I have anything approaching a grasp of) is much, much worse than your English). But as a result, you have misunderstood me, and attributed to me sentiments which I neither intended to imply, nor, indeed, hold.

            > TeX is an exceptional typesetting system for the things it is
            > designed to do. It is unsurpassed for typesetting mathematics. It is
            > ill suited for typing man pages. Where should I start? TeX had 
            > trouble until relatively recently with ASCII output. It is simply not 
            > designed for terminal output. It is written in WEB! The original 
            > native output is Pascal code. Comparing to Troff (originally written 
            > in Assembly and then translated into C) it is  epitome of bloat. The 
            > modern TeXLive/MikTeX distributions are close to 1GB. It is not 
            > designed to do one thing well. It is designed to do many things well. 
            

            We are actually in perfect agreement here. TeX is designed (well-designed, in my opinion) for producing printable output, not displaying on a terminal. And it goes without saying that any of the various TeX distributions are of course far too big to go into the base system of any operating system. My example was in pseudo-LaTeX because I'm familiar with LaTeX and because I wished to demonstrate a syntax that I personally find much more readable than that used for manpages, not because I thought TeX should replace troff. Just to be explicit: TeX is ill-suited for manpages.

            > Groff is written in wrong language. It is over-engineered (Typical 
            > for GNU) and uses wrong type of license for BSD. It has to go and I 
            > am glad it is the thing of the past. 
            

            I could not possibly care less what language groff is written in, except that I want the people who have to maintain it in the tree to like their work. I do care somewhat about what license it is distributed under. But mostly, as I said in the comment you replied to, a typesetting system is a need that many (although not all) users have; I happen to be one who does. If someone developed a well-designed typesetting system that improved on troff (rather than just reimplementing the functionality needed for manpages) and TeX, that could be used for terminal and printed output, and was BSD-licensed, and was in the base system—well, that would make me happy. Hence my original comment that "This is good, as far as it goes"; I just wish it went further.

            But since I don't have the time to do it myself, nor the money to pay someone to do it for me, I really am going to stop arguing about it now.

  3. By Chris (LizardKing) on http://www.chriswareham.net/

    This is great news for many reasons. One of the more obscure ones, is that native builds of NetBSD Vax choke on parts of the groff codebase. As far as I'm aware, groff is the only major portion of the codebase on any of the BSD's that is written in C++ (toolchain excepted). And as a fully paid up member of the C++ haters club, the switch to a C codebase is much appreciated.

    Comments
    1. By Markus Peloquin (incripshin) on http://cs.wisc.edu/~markus

      > This is great news for many reasons. One of the more obscure ones, is that native builds of NetBSD Vax choke on parts of the groff codebase. As far as I'm aware, groff is the only major portion of the codebase on any of the BSD's that is written in C++ (toolchain excepted). And as a fully paid up member of the C++ haters club, the switch to a C codebase is much appreciated.

      C++ has the best set of data structures of any language I know of. Every time I write C, I end up bumping up the computational complexity so I can avoid writing something like a doubly-linked list or a map. Sure, templates are a huge pain in the ass sometimes, and it is slow to compile. I find the crusade against C++ otherwise mindless.

      The only rational reason to drop C++ in this case is so the base system can be built entirely with a non-GPL compiler, pcc. You won't find many people who hate the GPL as much as I do, except here on Undeadly.

      Comments
      1. By Chris (LizardKing) on http://www.chriswareham.demon.co.uk/

        > C++ has the best set of data structures of any language I know of.

        Then you might want to broaden your horizons a bit. I'm familiar with the STL, Java's Collection classes and the Foundation classes that come with Cocoa, OpenStep and GNUstep. Of the three, the STL is the weakest, with little choice among types of data structure, and wildly varying behaviour across implementations. Java's Collections and the ObjC Foundation classes support a wider range of abstractions and implementations with well defined behaviour, and as a plus have far more intuitive interfaces and naming conventions. Boost is nothing to crow about either, as much of it is esoteric shite and a nightmare to compile thanks to portability issues. And don't even get me started on that ACE crap.

        > Every time I write C, I end up bumping up the computational complexity
        > so I can avoid writing something like a doubly-linked list or a map.

        Wow, how long have you been programming in C? All C programmers I know either have a personal library of code that implements data structures such as these, or use a library such as glib. Most large corporate codebases written in C that I've come into contact with also have such well tested data structure libraries. Several of them even used glib thanks to its LGPL license.

        > Sure, templates are a huge pain in the ass sometimes, and it is slow
        > to compile. I find the crusade against C++ otherwise mindless.

        You describe most people's main gripe with C++ - its piss poor templates - and then dismiss their concern as "mindless". Oh well. That's without even starting on the mistakes in the fundamental design of the C++ object model.

        >
        > The only rational reason to drop C++ in this case is so the base
        > system can be built entirely with a non-GPL compiler, pcc. You won't
        > find many people who hate the GPL as much as I do, except here on
        > Undeadly.

        Or maybe the reason is that every C++ codebase I've come across has been a maintenance nightmare, whereas equivalent C based ones have proved to be much more manageable. It is possible to write a large, maintainable codebase in C++, but it usually depends on using a small subset of C++ and having extremely strict auditing in place to prevent inherent pitfalls and usual fuckwittery that many C++ coders resort to.

    2. By Marc Espie (espie) on

      > This is great news for many reasons. One of the more obscure ones, is that native builds of NetBSD Vax choke on parts of the groff codebase. As far as I'm aware, groff is the only major portion of the codebase on any of the BSD's that is written in C++ (toolchain excepted). And as a fully paid up member of the C++ haters club, the switch to a C codebase is much appreciated.

      How does NetBSD manage to botch that ? groff builds and works just fine on OpenBSD vax.

      Of course, we tend to use native builds, instead of relying on flaky cross-builds.

      If what you're saying is true...

      ... I thought netbsd had learnt from the fiasco of a few years ago.

      NetBSD: running on a bazillion architectures, as long as you're only showing them off and not try to do anything real with them, such as rebuild the system....

      Comments
      1. By Chris (LizardKing) on http://www.chriswareham.demon.co.uk/

        > > This is great news for many reasons. One of the more obscure ones, is that native builds of NetBSD Vax choke on parts of the groff codebase. As far as I'm aware, groff is the only major portion of the codebase on any of the BSD's that is written in C++ (toolchain excepted). And as a fully paid up member of the C++ haters club, the switch to a C codebase is much appreciated.
        >
        > How does NetBSD manage to botch that ? groff builds and works just fine on OpenBSD vax.
        >

        Groff is pretty gnarly code, and makes assumptions about floating point formats and NaN representations that cause problems on the Vax. These problems aren't helped by the torture that is debugging C++ code. NetBSD also updates groff more frequently and is at version 1.19, while OpenBSD is still in 1.15, so issues with changes in groff code crop up more often.

        > Of course, we tend to use native builds, instead of relying on flaky cross-builds.
        >
        > If what you're saying is true...
        >
        > ... I thought netbsd had learnt from the fiasco of a few years ago.
        >
        > NetBSD: running on a bazillion architectures, as long as you're only showing them off and not try to do anything real with them, such as rebuild the system....

        Whatever.

Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]