OpenBSD Journal

Developer blog: niallo

Contributed by Niall O'Higgins from the RCS-RCS-RCS dept.

Now that you're familiar with the RCS file format, I thought I'd write a bit about revision numbers and some of the issues surrounding them. Usually, revision numbers are of the form `number.number', e.g. 1.1, 1.117, 2.1, 3.50. Numbers in this form are said to be on the trunk or main branch, and they are linked in descending order, e.g. 1.30 points to 1.29 which points to 1.28. Of course, these revision numbers can be arbitrary, as long as each is greater than the one before it. So 1.30 could point to 1.3 which could point to 1.1. Then there is the notion of the HEAD revision, which should always point to the highest such number pair. Typically, HEAD is the most recent revision. Things become more complicated with default branches and "magic" branch numbers and all this kind of stuff - I'm not going to write about that right now.

In RCS, many command line options accept revision numbers as optional arguments: for ci there are -l, -f, -i, -j, -k and -u, then you have co with -f, -I, -p, -M ... you get the picture. Clearly, revision numbers constitute very important inputs to the RCS tools. Therefore, how we handle these revision numbers should be a well-tested area of our implementation.

One approach to testing software is to automatically generate "interesting" values as inputs for functions. For example, if a function accepts a character string as an input, run it with zero-length strings, exceptionally long strings, strings filled with randomly generated characters, etc. - and see how it behaves. Similarly, if a function accepts integer values, run it with very large numbers (to see how it deals with integer overflows), negative numbers, zero, one, etc. In some languages - e.g. Java - it is possible to do this kind of verification automatically at the source code level. See for example this paper or this paper for more information on this area. Unfortunately, C has some qualities which make this impractical. However, since this is UNIX, if we take a higher level view we can abstract away from literal functions in the source code and treat the program itself as a function. From this perspective, the command (e.g. ci) is a function which transforms various inputs (standard input, some files, command line options) into some output (standard output/error, some files, an exit code).

Using this approach, we can automatically test a large quantity of boundary cases which would not normally be tested by humans - and we find some interesting bugs!

In our case, we can compare the behaviour of our RCS implementation with that of the existing GNU implementation. During testing, we found some erroneous assumptions in our code. For example, we simply weren't expecting the values zero or one to be passed as a revision, and so we hadn't added proper handling for them. Furthermore, our revision number handling API did not cope well with very large revision numbers - resulting in integer overflows.

This has certainly demonstrated to me that fully automated testing of this nature can expose bugs which might otherwise go unnoticed. It also aids greatly in pointing out where we might differ from the reference implementation in subtle ways (exit codes, standard output/error).

(Comments are closed)

  1. By tedu

    another paper which may be interesting, about auto-generation of C test cases:

    1. By Nate

      Hey tedu, since you work there, do you know why OpenVPN, OpenLDAP and FreeBSD are on, but OpenBSD isn't?

      1. By dlg

        i think they have to be able to build the software on the platform their test software runs on. getting openbsd to build anywhere but on openbsd is not fun.

        1. By Amir Mesry

          Might be because it hasn't found any in it.

      2. By tedu

        mostly, anything missing is not there because of the feasibility (or lack thereof) of building it in our current setup. the project is not done, however.

    2. By niallo

      Very interesting paper indeed. Their tool looks most promising, I'd love to have a look at their CIL transformations.

  2. By SH

    This has certainly demonstrated to me that fully automated testing of this nature can expose bugs which might otherwise go un-noticed.

    Subversion has a big automated test suite that is very useful. It takes some time to complete, as anyone who has done a "make regress" in the svn port has noticed.

  3. By Corentin

    > During testing, we found some erroneous assumptions in our code.

    What is your opinion (other OpenBSD developers and users are of course encouraged to reply to this question as well) regarding the correct use of assertions to detect erroneous assumptions? I am a strong advocate of their use (only when appropriate, of course; i.e. not to do user error handling but to catch developer mistakes) so I am interested in knowing what other careful developers do think of them.

    1. By Otto Moerbeek

      Assertions can sometimes help when developing code; but they have no place in production code. Error handling (for ANY error) should be an integral part of the code, including resource cleanup, messages, recovery, whatever is needed. Assertions let you chicken out; they give you an excuse not to handle errors properly.

      Another BIG drawback of assertions is that it's easy to introduce side effects that either hide bugs or introduce bugs, making the behaviour of the programs compiled with and without assertions not the same.

    2. By Marco Peereboom

      Assertions suck. They usually make for lousy coding practices. If for some reason a piece of code needs an assertion, it also means it needs that code during production. The fact that assertions don't fire during development is no guarantee that they won't ever fire. Therefore it is a bad coding practice that should be quelled.

    3. By niallo

      I have actually never considered using assert(). In OpenCVS/RCS we do however make use of fatal() which is related I guess. However, as in OpenSSH, it's almost exclusively used in the context of memory management (xmalloc, xfree, xrealloc, etc).

      I would agree with Otto and Marco, it's crucial to handle errors properly in the first place.

      1. By niallo

        Oops, I meant to write:

        In OpenCVS/RCS we do however make use of fatal(), as in OpenSSH, which is related I guess. However, it's mostly used in the context of memory management (xmalloc, xfree, xrealloc, etc).

        Also it's important to note that fatal() is always going to work the same way; it won't get compiled out in production builds like assert().

        1. By Corentin

          Well, I think you are completely right (Otto and Marco are, too): assertions suck at error handling... because they are not an error handling mechanism at all; I suspect many people use them for error handling because they are too lazy to write correct error handling code instead. But they are really meant to help prove the correctness of a piece of code, and IMHO they are great for catching a few subtle "must never happen" *bugs* (not the exceptions that can and will happen, such as user/system errors, bad inputs, etc.)

          Anyway, thanks all for replying! It is always great to know what you do think about coding practices like that (I was really curious about that one, because I noticed they were not in heavy use in the OpenBSD source tree and so I thought you probably had a good reason to avoid them).


Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]