OpenBSD Journal

Developer blog: dlg

Contributed by dlg on from the newbs-and-compilers-dont-mix dept.

Every now and then I am totally surprised by an unknown feature or caveat in something I thought I knew everything about. I recently had one of those moments with the printf family of functions when I was trying to get marco's iogen running on linux.

Everyone who programs in C knows about printf. It is the first function anyone ever uses. How else are you supposed to get "Hello World!" onto the screen (and no, please don't tell me about write or puts or such)? I've spent a lot of time in its manpage as well, trying to figure out how to align crazy values and print bitmasks out properly, so I was fairly confident that I could use it without issue. However, it seems I suck.

There are three things that have to be fixed in iogen to get it to run on linux. The first two are functions that exist on obsd but not on linux. The third thing is a little more subtle and is what caused me to scratch my head. With the function issues fixed, iogen was segfaulting on the following chunk:

            "file size: %llu io size: %llu read percentage: %i random: %s "
            "target: %s result: %s update interval: %i",
            file_size, io_size, read_perc, randomize ? "yes" : "no",
            target_dir, result_dir, interval);

err_log is basically a wrapper around a printf function, so it deals with its arguments the same way. After spending some time experimenting with the arguments and commenting bits out, it turned out that the file_size and io_size arguments and their format specifiers were causing the problem. These two variables are of type off_t, and according to the format string they are supposed to be unsigned long long values. On openbsd this is almost true: if you go poking around in the headers you'll discover that off_t is a typedef of (signed) long long, which is a 64-bit value. However, on 32-bit linux off_t is a long int by default, and only 32 bits wide. The problem with the chunk above is the mismatch between the format string and the size/type of the argument that is supposed to correspond with it.
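A quick way to see which case you are on is to ask the compiler. The sketch below uses two made-up helpers, `off_t_bits` and `print_offset`; it reports the width of off_t on the current platform and prints an off_t without guessing its underlying type. On OpenBSD it should report 64 bits, and on 32-bit linux built without large file support it should report 32.

```c
#include <limits.h>     /* CHAR_BIT */
#include <stdio.h>
#include <sys/types.h>  /* off_t */

/* Hypothetical helper: width of off_t in bits on this platform. */
static unsigned int off_t_bits(void)
{
    return (unsigned int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical helper: cast to long long so the value always matches %lld. */
static void print_offset(const char *label, off_t off)
{
    printf("%s: %lld (off_t is %u bits)\n", label, (long long)off,
        off_t_bits());
}
```

Whatever off_t turns out to be, the explicit cast means the format string and the argument agree in size.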

If you tell printf that you're going to print a long long value, it will take a long long (64-bit) sized value off the stack (on i386 anyway) and try to print it. This is a problem if you've only put a long (32-bit) sized value there. In the best case the extra bytes happen to be zeros and your number prints fine, but this is very unlikely. If you're lucky you'll just get a garbage value printed, but in the worst case (as I experienced with iogen) you'll get a segfault: every argument after the mismatch is read from the wrong offset, so a later %s can end up treating an integer as a string pointer. The following demonstrates:

$ cat test.c
#include <stdio.h>

int
main(int argc, char *argv[])
{
        int i = 1, j = 2;

        printf("%lld %lld\n", i, j);
        return (0);
}
$ make test
cc     test.c   -o test
$ ./test
8589934593 9664997444

As you can see, we're not getting what we expected. Fortunately, there is a way to do this properly: know your types!
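The first number isn't random, though. Assuming little-endian i386 with a 32-bit int, 8589934593 is 2 × 2^32 + 1: the first %lld read both i (low 32 bits) and j (high 32 bits) off the stack as one 64-bit value, and the second %lld then read whatever happened to follow. The made-up `glue_words` helper below just does that arithmetic explicitly.

```c
#include <stdint.h>

/*
 * Combine two adjacent 32-bit words the way a little-endian 64-bit
 * read would: the word at the lower address becomes the low half.
 */
static uint64_t glue_words(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

glue_words(1, 2) gives 8589934593, the same value the broken test program printed first.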

Most of the time you're going to know the types of the variables you want to print, and can therefore match the format string appropriately. Sometimes you're unsure, or unable (or too lazy) to check what is really behind a variable's type. In other cases you can get it right on one platform only to have it blow up when you move to another operating system (eg, off_t). In those cases you should proactively cast the argument to the type your format string expects. For example, pretend we aren't sure what type an int really is:

$ cat test2.c
#include <stdio.h>

int
main(int argc, char *argv[])
{
        int i = 1, j = 2;

        printf("%lld %lld\n", (long long)i, (long long)j);
        return (0);
}
$ make test2
cc     test2.c   -o test2
$ ./test2
1 2

Such a change would help fix iogen on linux.
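Applied to the iogen chunk above, the fix is to cast both off_t values to the unsigned long long that %llu expects. Since err_log's full signature isn't shown here, this sketch uses a hypothetical `format_status` function that formats into a buffer with snprintf using the same format string; only the casts are the point.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for the err_log() call: same format string,
 * but every argument now matches its conversion specifier.
 */
static int format_status(char *buf, size_t len, off_t file_size,
    off_t io_size, int read_perc, int randomize, const char *target_dir,
    const char *result_dir, int interval)
{
    return snprintf(buf, len,
        "file size: %llu io size: %llu read percentage: %i random: %s "
        "target: %s result: %s update interval: %i",
        (unsigned long long)file_size, (unsigned long long)io_size,
        read_perc, randomize ? "yes" : "no",
        target_dir, result_dir, interval);
}
```

Since off_t is signed, %lld with a (long long) cast would work just as well and would keep a negative value readable; %llu is only safe because sizes here are never negative.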

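I always assumed that printf coerced its arguments to the right type, the way normal functions do when you pass them arguments, but no: it turns out that it figures out what sized chunk of memory to read based on the format string. For the variadic part of a call there is no prototype to convert against, so the compiler only applies the default argument promotions (char and short become int, float becomes double) and passes the values along as they are. This totally blew my mind when it was explained to me.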
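The mechanism is easy to reproduce with stdarg. Here is a toy printf-like function, `sum_args`, with a made-up format language ('i' for int, 'L' for long long). It is only a sketch of how va_arg works, not how any real printf is implemented, but like printf it lets the format string alone decide what each va_arg call reads.

```c
#include <stdarg.h>

/*
 * Toy printf-like function: the format string alone decides what type
 * (and therefore how many bytes) each va_arg() call reads. Passing an
 * int where the format says 'L' is undefined behaviour, just like
 * passing an int to %lld.
 */
static long long sum_args(const char *fmt, ...)
{
    va_list ap;
    long long total = 0;

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt == 'i')
            total += va_arg(ap, int);
        else if (*fmt == 'L')
            total += va_arg(ap, long long);
    }
    va_end(ap);
    return total;
}
```

sum_args("iL", 1, (long long)2) returns 3; writing sum_args("iL", 1, 2) instead hands an int to a slot that reads a long long, which is exactly the iogen bug.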

Of course, if I had been using CFLAGS=-Wformat I would have got warnings about this problem in iogen and discovered it earlier.
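One catch: -Wformat only checks calls to functions it knows take a printf-style format, so a wrapper like err_log gets checked only if it is declared with the gcc/clang format attribute. The sketch below uses a hypothetical buffer-based `log_msg` wrapper; its name and design are assumptions, not iogen's actual err_log.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical printf wrapper. The format attribute tells gcc/clang
 * that argument 3 is a printf-style format string and the variadic
 * arguments start at argument 4, so -Wformat checks callers just as
 * it checks printf itself.
 */
static int log_msg(char *buf, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int log_msg(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

With -Wformat, log_msg(buf, sizeof(buf), "%lld", 1) draws a warning, while log_msg(buf, sizeof(buf), "%lld", (long long)1) does not.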

The moral of my story is: know your types, and cast the ones you don't know so that your arguments match your format string.


  1. By Chad Loder

    Both "gcc -Wformat" and lint are useful for finding format argument mismatches.

    1. By Anonymous Coward

      However, lint wouldn't catch this on OpenBSD, where the types were used correctly.
      Or is your awesome reworked lint intended to be portable (ported?) to other platforms?

  2. By djm@

    there are some magic #defines that you can set to make off_t 64-bit on Linux; autoconf takes care of them if you include the AC_SYS_LARGEFILE test

    1. By David Gwynne

      I found them when poking around. The specific define in this case is _FILE_OFFSET_BITS, which has to be set to 64 to get the bigger type. If portability is your aim, is it better to assume that off_t is really a long long on all platforms (or can be made to be long long through some crazy define), or would you cast it when passing it to printf just to be sure?

      1. By tedu

        when in doubt passing something to a varargs function (or if there's no prototype, but that's fixable), use a cast.

  3. By David P.

    Could you post the diff? TIA.


Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]