Contributed by cloder from the toolchain improvements dept.
After months of slacking since C2K5, I have been hacking a little bit. I have started to focus on lint, the C static analysis tool. Lint has a reputation for being very spammy, but we are going to improve it so that it can help us to find and fix serious bugs, some of which will be exploitable.
At CanSecWest 2005, and afterwards at the 2005 OpenBSD hackathon, I had some conversations with Theo about the kinds of bugs that don't get enough attention. Theo mentioned how integer conversion bugs (signed/unsigned, pointer/integer, etc.) are not given enough attention by programmers. People think these bugs are merely portability issues, but actually they are dangerous and potentially exploitable. Over the last couple of years, we have seen how integer overflows and signed/unsigned conversions are at least as dangerous as straightforward buffer overflows (the Sun RPC/XDR integer overflow bugs from 2002 and the OpenSSL integer overflow bug from 2003 are just two examples). We are going to be seeing more of these.
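To make the danger concrete, here is a minimal sketch of the general shape a size-calculation overflow tends to take. It is not the actual Sun RPC or OpenSSL code; the function names unchecked_bytes and checked_alloc are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: the multiplication is done in size_t and
 * silently wraps around when the true product does not fit.
 */
size_t unchecked_bytes(size_t nmemb, size_t size)
{
    return nmemb * size;
}

/*
 * Hypothetical helper: refuse the request when nmemb * size would
 * wrap, instead of allocating a buffer that is far too small.
 */
void *checked_alloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;            /* product would overflow */
    return malloc(nmemb * size);
}
```

With an untrusted count of SIZE_MAX / 4 + 1 and an element size of 8, unchecked_bytes() wraps to 0, so a later loop over the elements would run far past whatever tiny buffer was allocated. checked_alloc() returns NULL for the same inputs.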
I agreed with Theo that while lint has a well-deserved reputation for producing tons of spam, it could be very useful for finding certain classes of bugs that haven't been getting enough attention. In particular, lint can be very good at pointing out type conversion mistakes that sometimes lead to exploitable bugs. Lint was initially written to check for non-portable things, which makes it ideal for the kinds of bugs we want to find. More on that later.
While many people are aware of OpenBSD advances in the areas of run-time safety and resistance to attack (ProPolice, stackgap, W^X, randomized mmap, randomized malloc, privsep, etc.), I think fewer are aware of the improvements OpenBSD has made on the toolchain side. These improvements include compiler features (static bounds checking in gcc with -Wbounded) and the replacement of frequently-misused, confusing APIs (replacing strn* with strl*, replacing atoi/atol with strtonum, etc.).
My recent hacking on lint is a modest effort along the same lines. I want to get lint to the point where people feel like it's worth using. My first goal is to get lint to parse our tree on all platforms without syntax errors (which involves understanding C99 syntax and gcc-isms). This isn't that hard, because lint uses gcc's preprocessor before parsing, which means we can hide some of the more disgusting things from lint.
My second goal is to get lint to shut up about things that don't help us find bugs. This is a bit harder, but it's crucial. Lint's verbosity is the reason why virtually nobody uses lint. Things are improving on this front.
Once lint's signal:noise ratio reaches an acceptable level, we will hopefully see more people using it, and people will start realizing that lint is very good at certain things.
As I mentioned above, lint is great at finding type conversion bugs (e.g. signed/unsigned, pointer/integral, and pointer/pointer). Lint is very good at pointing these bugs out because lint is very stupid. Lint does all of its checking during parsing, during the construction of the AST. It does not need much context to perform its checks -- it just needs to know the types of things. There is no attempt at value tracking, data flow analysis, or anything like that. We leave that stuff to other tools like splint (and this makes splint totally useless for non-annotated code, IMHO).
Lint simply encodes the rules for C type conversion. The ISO C type conversion rules are surprising (and dismaying) to even experienced C programmers. Unsigned operands narrower than int (unsigned char, unsigned short) usually get "promoted" to signed int, regardless of the types of the other operands. Sign extension happens in non-obvious places.
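A few of these conversion rules can be sketched in a handful of lines. This is only an illustration, assuming a platform where int is wider than char (true of every platform OpenBSD runs on, as far as I know); the function names are hypothetical.

```c
/*
 * Integer promotion: both unsigned char operands are promoted to
 * (signed) int before the subtraction, so the result can be negative.
 */
int promoted_difference(unsigned char a, unsigned char b)
{
    return a - b;               /* 1 - 2 yields -1, not 255 */
}

/*
 * Sign extension: a negative signed char converted to a wider
 * unsigned type turns into a very large value.
 */
unsigned long widen_signed_char(signed char c)
{
    return c;                   /* -1 becomes ULONG_MAX */
}

/*
 * Usual arithmetic conversions: in a mixed comparison the int is
 * converted to unsigned int, so a negative value compares as huge.
 */
int negative_less_than_unsigned(int i, unsigned int u)
{
    return i < u;               /* -1 < 1U is false */
}
```

None of these functions contains a cast, yet each one converts a value between signed and unsigned. Only the types are needed to spot that, which is why a tool that knows nothing but types can flag this kind of code.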
Some other notes, briefly:
- The goal is not to make our code lint-free. Not only is it impossible, it's also a waste of time.
- Sprinkling casts throughout the code usually makes things worse. Other open source projects (which I will not name) have gone this route. All you end up doing is hiding bad code from lint.
- Many people think that "gcc -Wall -Wsign-compare" will tell them about these things. Nothing could be further from the truth. Run gcc -Wall on some of the lint regression tests in regress/usr.bin/xlint and you might be surprised at how much bad code gcc will silently eat.
- While lint is getting better in userland, it still has a hard time in certain libs and an even harder time in the kernel. krw has started to use lint a little bit in the kernel and has been fixing some real issues in some drivers, but he's having to wade through hundreds of pages of warnings.
    size_t n;
    n = snprintf(...);

or:

    n = read(...);
    write(..., n);

Again, these are modest improvements. It's nothing revolutionary... the approach is to find and fix lots of instances of the same bugs over and over again. We fix things and we move on to the next thing.
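The two fragments above (an snprintf() result stored in a size_t, and a read() result handed straight to write()) can be expanded into a minimal sketch of what goes wrong and one way to avoid it. The function names and the "%s" format are assumptions made for illustration; this is not code from the tree.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Buggy pattern: snprintf() returns int, but the result lands in a
 * size_t. An error return of -1 would become SIZE_MAX, and a
 * truncated result (n >= len) goes unnoticed.
 */
size_t format_unchecked(char *buf, size_t len, const char *s)
{
    size_t n;

    n = snprintf(buf, len, "%s", s);
    return n;
}

/*
 * Keep the int, check for error and truncation first, and convert
 * only once the value is known to be in range.
 */
int format_checked(char *buf, size_t len, const char *s, size_t *out)
{
    int n = snprintf(buf, len, "%s", s);

    if (n < 0 || (size_t)n >= len)
        return -1;              /* error or truncated output */
    *out = (size_t)n;
    return 0;
}

/*
 * read() returns ssize_t. Passing -1 on to write() would convert it
 * to an enormous size_t, so bail out before that can happen.
 */
ssize_t copy_once(int in, int out, char *buf, size_t len)
{
    ssize_t n = read(in, buf, len);

    if (n <= 0)
        return n;               /* error or EOF */
    return write(out, buf, (size_t)n);
}
```

The casts here come only after a range check, so they document a conversion that is known to be safe rather than hiding one that is not.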