OpenBSD Journal

Developer Blog: ray: Buffer Overflows and String Truncation

Contributed by marco on from the ulrich-drepper-likes-memcpy dept.

Today ray@ will be talking about proper string buffer handling with strlcpy(3), strlcat(3), and snprintf(3). Many people now know to avoid the unbounded string functions (strcpy(3), strcat(3), and sprintf(3)), which are prone to buffer overflows:

	char newstr[9];
	const char *oldstr = "rm -rf /home/ray/tmp/*";

	/* Buffer overflow in all cases. */
	strcpy(newstr, oldstr);
	strcat(newstr, oldstr);
	sprintf(newstr, oldstr);

The bounded string functions take a buffer size argument and never write past that boundary, preventing buffer overflows. For arrays, the buffer size can be calculated with the sizeof operator. For malloc(3) allocated buffers, the size parameter given to malloc(3) can be used as the buffer size. The following examples use the sizeof idiom, but each sizeof instance can be replaced with the buffer size.

Note: the sizeof operator does not work with arrays passed as function arguments:

	char real_array[BUFSIZ];
	void
	function(char array[], char array2[BUFSIZ], chat *ptr)
	{
		/* sizeof(real_array) != sizeof(array) */
		/* sizeof(real_array) != sizeof(array2) */
		/* sizeof(real_array) != sizeof(ptr) */
	}
	int
	main(int argc, char *argv[])
	{
		function(real_array, real_array, real_array);
		return (0);
	}

In the example above, a pointer to the array is passed around, not the array. Thus the sizeof operator returns the pointer size instead of the array size.

The bounded variants of the previous three string functions are strlcpy(3), strlcat(3), and snprintf(3). These functions truncate the resulting string as necessary, preventing overflow:

	char newstr[9];
	const char *oldstr = "rm -rf /home/ray/tmp/*";

	/* Truncation in all cases. */
	strlcpy(newstr, oldstr, sizeof(newstr));
	strlcat(newstr, oldstr, sizeof(newstr));
	snprintf(newstr, sizeof(newstr), oldstr));

strncpy(3) and strncat(3) are also bounded functions, but they suffer from an unwieldy API. This is how to properly use strncpy(3) and strncat(3) to always create NUL-terminated strings and to never overflow:

	char newstr[9];
	const char *oldstr = "rm -rf /home/ray/tmp/*";

	/* Truncation in all cases. */
	strncpy(newstr, oldstr, sizeof(newstr) - 1);
	newstr[sizeof(newstr) - 1] = '\0';
	strncat(newstr, oldstr, sizeof(newstr) - 1 - strlen(newstr));

Compare the above to strlcpy(3) and strlcat(3). The strncpy(3) and strncat(3) functions are complicated, error-prone, and strongly discouraged.

Now buffer overflows are prevented, but a new problem arises: truncation. Undetected truncation can be deadly: do you really want to execute the truncated string produced above? To aid truncation detection, these functions return the resulting string length as if truncation did not occur. If this value is greater than or equal to the destination buffer size, truncation has occurred:

	char newstr[9];
	const char *oldstr = "rm -rf /home/ray/tmp/*";

	/* Detected truncation in all cases. */
	if (strlcpy(newstr, oldstr, sizeof(newstr)) >= sizeof(newstr))
		warnx("truncation");
	if (strlcat(newstr, oldstr, sizeof(newstr)) >= sizeof(newstr))
		warnx("truncation");
	if (snprintf(newstr, sizeof(newstr), oldstr) >= sizeof(newstr))
		warnx("truncation");

snprintf(3) comes from the printf(3) family and inherits all its quirks. Because of this inheritance snprintf(3) needs more than just truncation checks. For example, it interprets oldstr as a format string. Percentage signs (%) in the string can affect the result:

	char newstr[9];
	const char *oldstr = "100%off!!!";

	/* Format string error. */
	if (snprintf(newstr, sizeof(newstr), oldstr) >= sizeof(newstr))
		warnx("truncation");

To prevent this, always escape strings with %s. This rule applies to all printf(3) functions:

	char newstr[9];
	const char *oldstr = "100%off!!!";

	/* Format string error. */
	if (snprintf(newstr, sizeof(newstr), "%s", oldstr) >=
	    sizeof(newstr))
		warnx("truncation");

Additionally, the printf(3) family will return a negative value if there is an error and must be tested:

	int i;
	char newstr[9];
	const char *oldstr = "100%off!!!";

	/* Format string error. */
	i = snprintf(newstr, sizeof(newstr), "%s", oldstr);
	if (i < 0 || i >= sizeof(newstr))
		warnx("snprintf");

The above example demonstrates snprintf(3)'s last quirk: it takes a size_t for the buffer size but returns an int. This means that truncation detection involves comparing an int to a size_t. This can be avoided by using strlcpy(3) and strlcat(3), if your format string does nothing but string concatenation.

Here's a recap of correct string usage:

	int i;
	char newstr[9];
	const char *oldstr = "100%off!!!";

	/* Detect all errors. */
	if (strlcpy(newstr, oldstr, sizeof(newstr)) >= sizeof(newstr))
		warnx("truncation");
	if (strlcat(newstr, oldstr, sizeof(newstr)) >= sizeof(newstr))
		warnx("truncation");
	i = snprintf(newstr, sizeof(newstr), "%s", oldstr);
	if (i < 0 || i >= sizeof(newstr))
		warnx("snprintf");

I hope that this has been helpful, but before submitting any patches please be sure they are correct and solve actual problems. Thanks!

(Comments are closed)


Comments
  1. By Anonymous Coward (82.43.92.127) on

    Very informative, thank you. It is great to see some of the pitfalls and gotcha's pointed out to mere mortals like me who occasionally look at patches but are mystified as to why they fix a problem.

  2. By Niall O'Higgins (83.147.128.114) niallo@openbsd.org on

    Another thing worth mentioning, strncpy() is also slower than strlcpy() since it zero fills the destination buffer. This is especially significant in cases where the destination buffer is much larger than the source buffer.

    See Miller and de Raadt for more info.

    Comments
    1. By rmg (208.181.115.2) on

      Since these entries aren't really for people who already know this stuff, it's also worth mentioning that regardless of which is better, use of strncpy and friends is standard C.

      Use of strlcpy and friends is merely convention, and *BSD specific convention at that, I believe.

      Comments
      1. By Nate (65.95.124.5) on

        Not BSD-centric, it's mostly just glibc that ignores the OpenBSD created stuff. Solaris has it.

        Comments
        1. By rmg (208.181.115.2) on

          OK, I stand corrected. Thanks for the clarification.

          My point was really that they aren't part of the standard C library (talking standard here, not implementation).

          Comments
          1. By Anonymous Coward (70.27.15.123) on

            Neither is strdup, who cares?

            Comments
            1. By m0rf (68.104.1.58) on

              strdup has been part of the single unix specification since at leat unix98 as an extension to the ISO C standard.

              Comments
              1. By Anonymous Coward (71.255.98.251) on

                Yeah? So? The single Unix specification is basically POSIX for people who like paying the X/Open Group a horrendous chunk of money to call their OS "Unix".

            2. By rmg (208.181.115.2) on

              The OpenSSH team probably cares, given that they release a separate "portable" version for the unwashed masses.

              Comments
              1. By Anonymous Coward (70.27.15.123) on

                No, they probably don't. Because its a tiny little function they can just include with their software like everyone else.

                Comments
                1. By rmg (208.181.115.2) on

                  That sounds like caring to me. They cared enough about the non-portability to do something about it.

                  Comments
                  1. By Anonymous Coward (70.27.15.123) on

                    Sounds like you are an idiot to me. IT IS PORTABLE. IT WORKS EVERYWHERE. Of course they include their own code with their own code, you aren't even making sense.

                    Comments
                    1. By rmg (208.181.115.2) on

                      so including a superior alternative to strn* because the superior version isn't standard means they don't care about portability?

                      if they didn't care, they would not have released the portable version, and the software would be non-portable. but they did, so...?

                      my comments make a lot more sense if you don't assume they are an attack.

                      Comments
                      1. By Anonymous Coward (66.11.66.41) on

                        Are you really this dumb? Read the thread again. You said "its not standard", I said "who cares" and you said "the openssh devs". No, they do not care that its not standard, they use it despite it not being standard, just like everyone else with a clue does.

          2. By Nate (65.95.124.5) on

            Yeah, it's all Ulrich Drepper's ego on that glibc thing. He resents things from the BSDs for some reason, it's like Bill Joy told him he was a dick once and ever since he's hated BSD.

            Comments
            1. By Anonymous Coward (67.64.89.177) on

              Drepper is THE worst thing that ever happened to Linux. He is a moron and very detrimental. I have NO clue why RH is paying him actual money to be that useless and stupid. If you read his horseshit trick to deal with broken strings you can go piss your pants laughing how subtely broken his ideas are. Oh yeah, lets kill ALL arches minus x86!!

              Heil Drepper!

              Comments
              1. By Anonymous Coward (66.39.191.42) on

                Does that mean you will subtly piss your pants? Maybe it won't be very noticeable....The piss will just slowly pool up in your shoes?

      2. By m0rf (68.104.1.58) on

        yes, as in not in gnu libc or in windows, you'll find it in the bsds, mac os x and solaris. linux users get to use memcpy/snprintf apparently, and their developers concerned about buffer overflows get threads like:
        http://lists.debian.org/debian-devel/2002/03/msg00295.html

        of course application developers can add the 3 files needed to get the functions on gnu/microsoft systems or anywhere else they aren't supported (like OpenSSH-portable does).

        Comments
        1. By Nony mouse Coward (128.171.90.200) on

          Is Ulrich Drepper suggesting that if you do proper string handling ( using memcpy and null terminating by hand, ) then you do not need functions that do the right thing ?

          The post following that suggests correct string handling does not mean checking return values, or that the poster hasn't looked at the API.

          None of the arguments they presented make any sense.

          Life sure is scary in GNU land.

          Comments
          1. By Anonymous Coward (67.64.89.177) on

            He is a fucktard. His shit is broke and he don't even know it.

          2. By Anonymous Coward (24.34.57.27) on

            Are you really quoting 6 year-old threads? Like, seriously? Please step into this year.

            Comments
            1. By Anonymous Coward (67.64.89.177) on

              It's always funny to point out Drepper stupidity. It's a classic.

            2. By Chris (80.176.91.102) chriswareham@chriswareham.demon.co.uk on www.chriswareham.demon.co.uk

              Yeah, that thread may be six years old, but Drepper still holds the same view. Having compared the implementation of various bits of glibc and the libc in OpenBSD and NetBSD, I can only agree that Drepper is not one of theLinux worlds greatest contributors.

          3. By Anonymous Coward (64.80.197.181) on

            All I can say is, "Wow."

            Thanks Ullrich. You sure make me glad I don't use Linux.

          4. By Anonymous Coward (128.171.90.200) on

            Apparently the glibc solution is

            *((char *) mempcpy (dst, src, n)) = '\0';

            Comments
            1. By veins (193.251.36.113) veins@skreel.org on http://lab.skreel.org/

              > Apparently the glibc solution is
              > *((char *) mempcpy (dst, src, n)) = '\0';
              >

              Well, I may be very tired, but this looks like the following code written in a more understandable way:

              {
              char *p;

              p = memcpy(dst, src, n);
              *P = '\0';
              }

              and since memcpy returns the address of dst, it looks like his example actually achieves this:

              {
              *dst = '\0';
              }

              So ... either it's me, or Drepper's one-liner example of how people should nul-terminate a string instead of using inefficient BSD crap, is broken.

              Comments
              1. By veins (193.251.36.113) veins@skreel.org on http://lab.skreel.org/

                < *P = '\0';
                > *p = '\0';

              2. By Sebastian (84.172.38.148) on

                I was wondering about the same issue. The line is not only ugly, it really is wrong. However, in order to be sure I had to test this in a small piece program. It spits out an empty string...
                int main()
                {
                    char dst[10];
                    char *src = "hallo";
                
                    *((char *) memcpy(dst, src, strlen(src)) = '\0';
                
                    puts(dst);
                    return 0;
                }
                
                In my opinion it already shows how badly broken this piece of code is (beside that it is wrong) if the meaning of the code does not become clear by reading it.

                Comments
                1. By Sebastian (84.172.38.148) on

                  Awesome! They mean mempcpy(dst, src, n) (note the additional 'p' just after 'mem'). This function is a GNU extension and indeed returns 'dst + n'. Jet another way to score some points in C code obfuscation contests...

                  Comments
                  1. By veins (193.251.36.113) veins@skreel.org on http://lab.skreel.org/

                    *sigh*

                    /me puts two fingers in throat out of disgust

                    Comments
                    1. By veins (193.251.36.113) veins@skreel.org on http://lab.skreel.org/

                      "Dammit, it is not safe. It hides bugs in programs. If a string is too long for an allocated memory block the copying must not simply silently stop. Instead the program must reallocate or signal an error. I can construct you cases where the use of these stupid functions is creating new security problem."

                      Anyways, just reading this makes it obvious he does not know what he is talking about, had he read the paper, he'd know how you can catch truncations.

                      Comments
                      1. By Anonymous Coward (128.171.90.200) on

                        as I said before, when did it become acceptable to not check return values ?

      3. By Anonymous Coward (70.27.15.123) on

        No, use of strlcpy/cat is not convention, its sanity. There is never, ever, in any circumstance, a reason to use strn* over strl*.

        Comments
        1. By rmg (208.181.115.2) on

          I agree with you guys, I was just pointing out to any newbies who are looking for information on how to write safe code (say, god forbid, they google and find this article), they might appreciate knowing that it's not part of the standard so they can't expect it to be 100% portable.

          And, to avoid being modded down again for not stating the obvious:
          - strl* are better than strn*
          - strl* is not part of standard C
          - strl* SHOULD be part of standard C
          - if you don't have strl* in your libc, add it to your project

          ...and I thought adding to the article was better than repeating it, sheesh..

          Comments
          1. By m0rf (68.104.1.58) on

            or they could add the files to their project needed to get the strl* functions, making the portability of the functions a non-issue.

            Comments
            1. By Anonymous Coward (210.233.106.4) on

              What part of "if you don't have strl* in your libc, add it to your project" did you not understand in the comment you replied to?

              Comments
              1. By Anonymous Coward (68.104.1.58) on

                must have been the persecution complex and the constant harping about portability that caused my reaction. sorry about that.

          2. By Couderc (212.234.204.97) on

            There is a work done by the WG14 working group on safe functions.

            They already provided a draft (N1135) which contains what should be the future ISO C implementations of those safe functions :
            "Programming languages, their environments and system software interfaces Specification for Safer, More Secure C Library Functions"

            From the scope section :
            "This Technical Report specifies a series of extensions of the programming language C, specified by International Standard ISO/IEC 9899:1999."

            I have already started to implement some of them just for fun :)

            Comments
            1. By Anonymous Coward (62.49.147.46) on

              The URL is (Ugh! PDF! What happened to good old text?): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1135.pdf

            2. By Anonymous Coward (64.62.167.198) on

              This specification is next to useless. They take most of the functions that almost nobody uses or should use, and tweak them the slightest bit so your average programmer has almost no incentive to use them.

              And anybody who thinks that if/when these debut and become widely available ten years from now anybody will bother going over old code to do a simple search + replace (which is the only redeemably quality they have), is an idiot.

              strlcpy() and strlcat() have their problems, but they have unique mix of compatibility, convention, and sane (if not elegant) semantics.

              We need more interfaces like strlcpy()/strlcat(). gget_s() (from N1135) is too little, and all the over-the-top libraries which implement string "object" in C (bstr, postfix "records" etc), go way too far and are unwieldly.

              Comments
              1. By Couderc (212.234.204.97) on

                This specification is next to useless. They take most of the functions that almost nobody uses or should use, and tweak them the slightest bit so your average programmer has almost no incentive to use them.

                Maybe you're right, but where were you when those extensions were discussed by the working group ? By this time, I sent an alert to millert@ about glibc evangelists that were spreading their ideas there.

                And anybody who thinks that if/when these debut and become widely available ten years from now anybody will bother going over old code to do a simple search + replace (which is the only redeemably quality they have), is an idiot.

                Yes but future developpement will sure follow C standard for the sake of portability instead of functions that are not part of this standard.

                strlcpy() and strlcat() have their problems, but they have unique mix of compatibility, convention, and sane (if not elegant) semantics.

                You do not need to try to convert me, i am already since years. I use those functions in my projects but for this i have to provide a compatibility layer to use non standard functions.

                We need more interfaces like strlcpy()/strlcat(). gget_s() (from N1135) is too little, and all the over-the-top libraries which implement string "object" in C (bstr, postfix "records" etc), go way too far and are unwieldly.

                Yes but gget_s is part of future standard even if it does not please to you. There are other systems than BSD in real world you know, and even people who believe in portable code over unix systems.

        2. By Darren Tucker (203.217.17.96) dtucker@zip.com.au on

          Quoth: There is never, ever, in any circumstance, a reason to use strn* over strl*

          Actually that's not quite true, but if you know that then you should also know when :-)

          (Hint: for example, utmp and wtmp structs are nul-padded but not nul-terminated.)

          Comments
          1. By Anonymous Coward (66.11.66.41) on

            Structs are not strings. If you are just copying memory around, use memcpy.

            Comments
            1. By Anonymous Coward (128.151.92.52) on

              What he meant was that those aforementioned structs have char [] members in them that are not always null terminated.

            2. By Mr Strncpy (83.70.176.191) on

              Actually, I had a valid use for strncpy recently to copy a user supplied string (a temporary key) into a fixed size buffer that does not need to be null terminated. Strncpy pads the buffer with NULs and will not overflow but you can use the full buffer for the key if necessary.

              I agree in general though, strncpy is misnamed and in the new world order it should be renamed str2buf since after strncpy you are not guaranteed to end up with a string as I think by definition, a string is a nul terminated character array.

              char key[16];
              printf("key \"%.*s\"\n", sizeof(key), strncpy(key, optarg, sizeof(key)));

        3. By tedu (69.12.168.114) on

          unless you're copying "maybe" strings that aren't always nul terminated.

    2. By Anonymous Coward (71.140.143.217) on

      Another problem with strncpy and friends is if you have code like this:
      char buf[100];
      strncpy(buf, src, sizeof(buf));
      Note sizeof(buf) instead of sizeof(buf) - 1. If src is at least as long as buf, this means buf won't be properly NULL terminated because strncpy will use up all the bytes if needed. Contrast this to what fgets() and strlcpy() do. When told that the buffer size is n bytes, they both attempt to read up to n-1 bytes and properly NULL terminate the buffer.

      Comments
      1. By Anonymous Coward (71.140.143.217) on

        Also note that in the above, passing sizeof(buf) - 1 instead of sizeof(buf) to strncpy still doesn't fix the problem. You still need to NULL terminate the string by hand afterwards.

        Comments
        1. By Niall O'Higgins (193.1.172.166) niallo@openbsd.org on

          Exactly. strncpy and friends do not guarantee to NUL-terminate the destination buffer. This is yet another weird condition the programmer has to remember to handle, which makes it easier to introduce mistakes.

  3. By Anonymous Coward (70.74.75.200) on

    "ulrich-drepper-likes-memcpy dept"^^

    This reminds me of this caution about the mistakes made using sizeof.

    http://marc.theaimsgroup.com/?t=110859539500001&r=1&w=2

    Comments
    1. By Matthias Kilian (84.134.9.253) on

      I just hate this bug. Not detectable by common tools (gcc, lint, maybe Coverity?), not detected by test cases, but it *was* an overflow.

      Ugly. Disgusting. Frustrating.

  4. By Anonymous Coward (141.100.40.69) on

    Hi all, interesting topic. What do you guys think about the 'Secure C Library Functions' that recently formed the ISO WDTR 24731 draft. I am talking about strcpy_s and friends. It works like this:
    char buffer[4];
    strcpy_s(buffer, "1234", sizeof buffer); // buffer will be an empty string "".
    
    I would be nice to get some comments from you guys!

    Comments
    1. By Anonymous Coward (141.100.40.69) on

      Here is the link to the draft: link (pdf) ~200KB

    2. By tedu (69.12.168.114) on

      it looks like a lot of shit. wtf do i need an fopen_s function for? and ooo, look, scanf_s which is just as shitty as normal scanf. qsort_s?!?

      so some committee had to come up with _s[hitty] versions of every libc function? waste of time.

      Comments
      1. By Anonymous Coward (128.151.253.249) on

        You're right. It looks like a bunch of useless cruft.

        It appears the difference between fopen_s() and fopen() is that it checks for NULL pointers. But the document itself says that fopen() with NULL arguments is "undefined" behavior. Which means that a libc implementation of fopen() can do the NULL check and still be conforming.

        And what's with returning errno_t everywhere?

        Comments
        1. By Anonymous Coward (128.151.253.249) on

          Oh, okay, I see now another difference between fopen() and fopen_s(). The first arg is FILE** instead of FILE*, and it returns an errno code. Still, I have to ask: why?

          Comments
          1. By rmg (208.181.115.2) on

            Because errno is broken.

            Comments
            1. By Anonymous Coward (84.172.38.148) on

              Do you have any pointers that state whay errno is broken?

              Thanks for your reply!

              Comments
              1. By rmg (208.181.115.2) on

                sorry, can't think of any links off the to of my head, but my main reason for saying it was that errno, like any use of global variables, can lead to re-entrancy problems.

                here's the first hit from google (errno reentrance):
                http://docs.sun.com/app/docs/doc/805-5080/6j4q7emi5?a=view

                enjoy :-)

  5. By Anonymous Coward (74.62.215.125) on

    I read some of the stuff posted by Drepper and I just cannot believe it. He is an absolute ass, and shockingly incompetent and/or stubborn.

    strlcpy is a replacement for strNcpy (N! see it?). Not for strcpy. His claim that strlcpy will encourage people to make poor fixes for strcpy code is wrong, since glibc already allows even *worse* fixes by changing the code to use strncpy. And it is blatently obvious that in 100% of use cases strlcpy is better than strncpy.

    He also repeatedly proposes snprintf(dst,size,"%s",src) as a replacement that is somehow "superior". How is it superior, when it is EXACTLY the same in the results and return value, except for being about 10 times slower, and harder to read?

    I am completely flabbergasted that somebody with this level of cluenessess can get in this position of influence in gnu.

    PS: I am a Linux software developer and don't actually use BSD at all, but I can certainly tell when somebody is being an idiot.

    Comments
    1. By Anonymous Coward (76.234.103.166) on

      > He also repeatedly proposes snprintf(dst,size,"%s",src) as a replacement that is somehow "superior". How is it superior, when it is EXACTLY the same in the results and return value, except for being about 10 times slower, and harder to read?

      Even worse, some versions of snprintf()--glibc included I believe--can fail at run time (w/ -1, or SIZE_MAX, depending on how you handle it). Some snprintf's use dynamically allocated internal buffers, or allow dynamic type specifiers, either of which can make things blow up unexpectedly, no matter that your format string and arguments are perfectly valid.

      So, not only is snprintf() slower, it requires more error logic (in addition to Win32 work-arounds, if you're into that).

Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]