OpenBSD Journal

p2k15 Hackathon Report: schwarze@ on USE_GROFF

Contributed by tbert from the dont-use-groff-it's-a-goat dept.

Ingo Schwarze (schwarze@) writes in with our fourth report from the p2k15 ports hackathon:

When groff was removed from the OpenBSD base system in October 2010, Marc Espie (espie@) marked more than 3000 ports with the USE_GROFF variable, meaning that their manuals were formatted with groff at port build time and the preformatted versions were included in the package. Over time, as mandoc(1) matured and learnt to handle more and more syntax, the number of ports having USE_GROFF gradually decreased.

Basically, there are three reasons for wanting to get rid of USE_GROFF: Installing source manuals in principle allows semantic searching with apropos(1), though so far that mostly applies to mdoc(7) manuals and doesn't make much of a difference for man(7) manuals; avoiding dependencies simplifies optimization of bulk builds for speed; and getting rid of USE_GROFF altogether would take one complication out of the ports build infrastructure.

One of the porters who has removed USE_GROFF again and again from ports that no longer needed it is Christian Weisgerber (naddy@), and lately he has become even more active in that respect. Recently, he inspected all 250 ports still having USE_GROFF and classified the reasons why it hadn't been removed from each one yet, arriving at a list of about 45 different reasons. Of course, many ports are affected by multiple reasons; half of the reasons occur in just a single port, and another quarter affect only a handful of ports.

My plan for the hackathon was to take the list of reasons, sorted by frequency, and try to remove USE_GROFF from as many ports as feasible, of course without degrading formatting quality of any ports manuals. It was obvious this might sometimes involve patching invalid manual source code that doesn't render properly even with groff, and even more importantly, it would involve fixing bugs in mandoc(1) and adding missing features to it.

During the hackathon, i managed to work through two of the most common classes of issues: wrong indentation (46 affected ports) and extra blank lines (29 affected ports). It turned out to be harder than expected because tagging a bunch of ports with a common label ("indent") doesn't imply there is just one problem to fix before ticking off the whole class... It felt rather like most ports in that class exhibited about two distinct mandoc(1) indentation bugs each, and the next port would demonstrate two new ones rather than repeating the ones just fixed, and so on for the one after that... Consequently, there was no way to handle anything close to that still quite considerable number of 250 ports during the four days of the hackathon. But at least, i managed to delete USE_GROFF from the following 22 ports based on work done in Exeter, which is just under 10% of what remains:

audio/mp3blaster, devel/argtable, devel/ectags, devel/libJudy, devel/pcre, games/gnushogi, games/xmahjongg, games/xskat, graphics/dcmtk, graphics/mpeg_encode, lang/classpath, lang/erlang, lang/php, mail/popclient, misc/findutils, multimedia/transcode, multimedia/xine-lib, net/mutella, net/rabbitmq, plan9/sam, security/wpa_supplicant, www/tntnet.

Three of the top reasons for USE_GROFF - the .ta request (define tabulator stop positions, 61 affected ports), the .ti request (temporary indent for the next output line, 50 affected ports), and the braindead way the infamous DocBook formats bullet lists by manually moving the cursor left and right with \h escape sequences (30 affected ports) - are very hard to fix in the current mandoc(1) parsing framework because mandoc(1) handles roff(7) as a pure preprocessing language and is able to generate syntax tree nodes only from high level mdoc(7) and man(7) macros. But these three features, .ta, .ti, and \h, require generating syntax tree nodes on the roff(7) level, to be interspersed among high level macro nodes. Achieving this requires a reorganization of the mandoc(1) parsers, unifying the data structures of the syntax trees and the functions handling them across all the various languages.
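To make the limitation concrete, here is a minimal sketch in C, assuming a toy parser that recognizes only .ti and .ta as low-level requests. The names node_kind, classify_line, and count_nodes are hypothetical and not taken from mandoc(1). The sketch only illustrates that a preprocessing-only design never turns requests into tree nodes, while a unified design keeps them among the macro and text nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds for illustration; not mandoc's real types. */
enum node_kind {
	KIND_TEXT,	/* plain text line */
	KIND_MACRO,	/* high-level man(7) or mdoc(7) macro, e.g. .SH */
	KIND_REQUEST	/* low-level roff(7) request, e.g. .ti or .ta */
};

/*
 * Classify one input line.  Assumption: only the two-letter
 * requests "ti" and "ta" count as low-level requests here;
 * every other control line is treated as a high-level macro.
 */
enum node_kind
classify_line(const char *line)
{
	assert(line != NULL);
	if (line[0] != '.')
		return KIND_TEXT;
	if ((strncmp(line + 1, "ti", 2) == 0 ||
	    strncmp(line + 1, "ta", 2) == 0) &&
	    (line[3] == '\0' || line[3] == ' '))
		return KIND_REQUEST;
	return KIND_MACRO;
}

/*
 * Count the syntax tree nodes a parser would generate.
 * With requests_become_nodes == 0, requests are consumed during
 * preprocessing and leave no node behind; with a nonzero value,
 * they are kept as nodes interspersed among macro and text nodes.
 */
size_t
count_nodes(const char *const *lines, size_t nlines,
    int requests_become_nodes)
{
	size_t i, count;

	count = 0;
	for (i = 0; i < nlines; i++) {
		if (classify_line(lines[i]) == KIND_REQUEST &&
		    !requests_become_nodes)
			continue;
		count++;
	}
	return count;
}
```

For a five-line input consisting of .SH NAME, a text line, .ti 4n, another text line, and .PP, count_nodes yields four nodes in the preprocessing-only model and five in the unified one. The fifth node is where the temporary indent would have to be recorded.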

I didn't work on that reorganization *at* the hackathon, but on the train going there and returning home, replacing mdoc(7) and man(7) specific data structures and functions with unified data structures enum roff_type, struct roff_node, struct roff_meta, struct roff_man, and generic functions: roff_man_alloc, roff_man_free, roff_man_reset, roff_node_alloc, roff_node_append, roff_block_alloc, roff_body_alloc, roff_head_alloc, roff_elem_alloc, roff_node_delete, roff_node_free, roff_node_unlink, roff_word_alloc, roff_word_append, roff_addeqn, roff_addtbl. This unification so far shrank the code by more than 350 lines, and that trend will continue. But above all, these data structures and functions will be used for future roff(7) syntax tree nodes and ultimately for improved low-level roff handling.
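As a rough illustration of what one shared node type with generic alloc, append, unlink, and delete operations can look like, here is a minimal sketch in C. The sketch_ names, the fields, and the function signatures are assumptions made for this example and deliberately differ from the real roff_node API in mandoc(1), whose actual definitions are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: not mandoc's real enum roff_type. */
enum sketch_type { SK_ROOT, SK_BLOCK, SK_ELEM, SK_TEXT };

/* One node type shared by all input languages. */
struct sketch_node {
	enum sketch_type	 type;
	char			*string;	/* text payload or NULL */
	struct sketch_node	*parent;
	struct sketch_node	*child;		/* first child */
	struct sketch_node	*last;		/* last child */
	struct sketch_node	*prev;		/* previous sibling */
	struct sketch_node	*next;		/* next sibling */
};

/* Allocate an unlinked node, copying the optional string. */
struct sketch_node *
sketch_node_alloc(enum sketch_type type, const char *string)
{
	struct sketch_node *n;
	size_t sz;

	if ((n = calloc(1, sizeof(*n))) == NULL)
		return NULL;
	n->type = type;
	if (string != NULL) {
		sz = strlen(string) + 1;
		if ((n->string = malloc(sz)) == NULL) {
			free(n);
			return NULL;
		}
		memcpy(n->string, string, sz);
	}
	return n;
}

/* Append an unlinked node as the last child of parent. */
void
sketch_node_append(struct sketch_node *parent, struct sketch_node *n)
{
	assert(parent != NULL && n != NULL && n->parent == NULL);
	n->parent = parent;
	n->prev = parent->last;
	if (parent->last != NULL)
		parent->last->next = n;
	else
		parent->child = n;
	parent->last = n;
}

/* Detach a node from its parent and siblings, keeping its children. */
void
sketch_node_unlink(struct sketch_node *n)
{
	if (n->prev != NULL)
		n->prev->next = n->next;
	else if (n->parent != NULL)
		n->parent->child = n->next;
	if (n->next != NULL)
		n->next->prev = n->prev;
	else if (n->parent != NULL)
		n->parent->last = n->prev;
	n->parent = n->prev = n->next = NULL;
}

/* Unlink a node and free it together with its whole subtree. */
void
sketch_node_delete(struct sketch_node *n)
{
	while (n->child != NULL)
		sketch_node_delete(n->child);
	sketch_node_unlink(n);
	free(n->string);
	free(n);
}

/* Count the direct children of a node. */
size_t
sketch_child_count(const struct sketch_node *n)
{
	const struct sketch_node *c;
	size_t count = 0;

	for (c = n->child; c != NULL; c = c->next)
		count++;
	return count;
}
```

Because every language shares the same node layout, one set of linking and freeing routines serves all parsers, which is the kind of duplication the unification removes.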

So once again, and even though four out of the eight trains i took were seriously delayed by twenty to forty minutes each, the train trip Karlsruhe-Frankfurt-Bruxelles-London-Exeter-London-Bruxelles-Koeln-Wolfsburg proved quite productive and much less stressful than flying. One other developer, having booked with two different air carriers, got stranded between two London airports, missed his connection and had to buy a new ticket to get home because the second air carrier washed their hands of the delay caused by the first one...

Changes to mandoc(1) at the hackathon included two other notable refactorings - vastly simplified block unwinding for man(7), similar to what i recently did for mdoc(7), and a common handling for the breaking of explicit mdoc(7) blocks by implicit blocks. Looking at many weird ports manuals resulted in a large number of mandoc(1) bug fixes:

  • mdoc(7): Arguments to end macros of broken partial explicit blocks must go inside the breaking block.
  • mdoc(7): If a partial explicit block extending to the next input line follows the end macro of a broken block, put all of it into the breaking block.
  • man(7): Section headers have hanging indentation when overflowing the line.
  • man(7): Use the default width for .RS without arguments.
  • man(7): On a new .RS nesting level, the saved width starts from the default width, not from the saved width of the previous level.
  • man(7): Fix a quirk with respect to empty .HP.
  • man(7): Do not mistreat empty arguments to font alternating macros as vertical spacing requests.
  • Don't allow breaking the output line after hyphens following escapes.
  • roff(7): Fix rounding rules for horizontal scaling widths.
  • man(1): Do not hardcode the path /usr/bin/ to more(1).
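The last item in the list above lends itself to a short illustration. One common way to avoid a hardcoded /usr/bin/ prefix is to pass a bare command name to execvp(3), which searches PATH. The following C sketch shows that general technique only; it is not the actual commit, and the function names choose_pager and exec_pager as well as the MANPAGER, PAGER, "more" fallback order are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Pick a pager command.  Assumption for this sketch: prefer
 * MANPAGER, then PAGER, then plain "more"; empty values are ignored.
 * The caller passes in the values it obtained from getenv(3).
 */
const char *
choose_pager(const char *manpager, const char *pager)
{
	if (manpager != NULL && *manpager != '\0')
		return manpager;
	if (pager != NULL && *pager != '\0')
		return pager;
	return "more";	/* a bare name, not "/usr/bin/more" */
}

/*
 * Replace the current process with the pager.  execvp(3) searches
 * the directories in PATH when the name contains no slash, so no
 * absolute path needs to be compiled in.  Returns -1 only if the
 * exec itself failed.
 */
int
exec_pager(const char *name)
{
	char *argv[2];

	assert(name != NULL);
	argv[0] = (char *)name;
	argv[1] = NULL;
	execvp(argv[0], argv);
	return -1;
}
```

A caller would typically do something like exec_pager(choose_pager(getenv("MANPAGER"), getenv("PAGER"))) in a child process after setting up its standard input.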

I also got one groff bugfix committed upstream, preventing mdoc(7) .Bl with trailing -width or -offset from picking up old args when formatted with groff.

Given that it was a ports hackathon, i also committed two new ports, textproc/p5-XML-SemanticDiff-1.0004 and devel/p5-Test-XML-0.08, and updated two others, devel/p5-TAP-Formatter-JUnit 0.09 -> 0.11 and devel/p5-TAP-Harness-JUnit 0.41 -> 0.42 around the time of the event.

Unfortunately, i didn't manage to make any time for hiking on Dartmoor or on the South West Coast trail this time - for me, it was a short but very focussed, productive, and pleasant trip. The main side effect of coming to Devon was enjoying quite a bit of culinary art - and who would have thought that even various French developers would be full of praise for the English cuisine, and the praise wouldn't subside even when it came to cheeseboards!


