Over the last year, I’ve begun to use Clang for more of my day-to-day work. Clang often isn’t available for the target I’m interested in, but I’ve begun to use it instead of GCC to compile my tests*. Here are a few reasons why.
1. More Helpful Diagnostic Messages
Clang provides the best diagnostic messages I’ve seen from a C compiler. One of the ways it excels is by helping the user navigate problems in macros.
Problems in preprocessor macros can be absolutely infuriating because it is hard to tell which macro an error came from, or whether it came from a macro expansion in the first place. It turns out Clang handles that pretty nicely. Let’s use this snippet as an example:
#define PTR_OF(V) (V)

int main(int argc, char * argv[]) {
  char someX;
  char * x = PTR_OF(someX);
  return 0;
}
There’s an error here: the macro PTR_OF doesn’t actually do what it claims. It expands to its argument rather than to the argument’s address. When we compile this with GCC, one of the warnings we get concerns our initialization of x.
warn.c: In function ‘main’:
warn.c:5: warning: initialization makes pointer from integer without a cast
This is all well and good, but it doesn’t tell us how the macro was involved. If the macro were more complex, this could be a real issue. Here’s how Clang handles it:
warn.c:5:10: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'char'; take the address
with & [-Werror]
char * x = PTR_OF(someX);
^ ~~~~~~~~~~~~~
&( )
That’s quite a bit more helpful. The message shows the original source involved in the error, including the unexpanded form of the macro! In addition, Clang guesses at the specific mistake and suggests a fix (using an & to take the address of the char).
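For reference, here is a minimal sketch of what the macro might look like once Clang’s suggested fix is applied. The helper name store_via_ptr is hypothetical and exists only to exercise the macro; the fix itself (taking the address with &) is the one Clang suggests above.

```c
#include <assert.h>

/* Corrected macro: take the address of the argument, as Clang's fix-it suggests. */
#define PTR_OF(V) (&(V))

/* Hypothetical helper: writes through a pointer obtained via PTR_OF
 * and returns the value that ended up in the variable. */
char store_via_ptr(char value)
{
    char someX = 0;
    char *x = PTR_OF(someX);

    /* The macro should now yield the address of someX. */
    assert(x == &someX);

    *x = value;
    return someX;
}
```

Calling store_via_ptr('a') from a test should return 'a', since the write goes through the pointer to someX.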
2. A Better -Wconversion
I’ve lamented that C will coerce enumeration types to other inappropriate integer-like types without raising any issue. While there are definitely reasons to convert one enumeration value to an enumeration value of another type, they aren’t as common as the mistake of making this coercion accidentally.
I didn’t realize how often I was making this mistake until I started using the -Wextra flag with Clang. This flag includes a warning for conversions that seem a bit off to the compiler. In GCC, this does not cover implicit conversions between enumeration values of different types, but in Clang, it does.
Here’s another comparison. Consider the following code:
#include <stdio.h>
enum SomeEnum {
  Some1 = 0, Some2 = 1,
};
enum AnotherEnum {
  Another1 = 0, Another2 = 1,
};
int main(int argc, char * argv[]) {
  (void)argc;
  (void)argv;
  // Mistake is here.
  enum SomeEnum s = Another1;
  switch(s) {
    case Some1: puts("Some1"); break;
    case Some2: puts("Some2"); break;
  }
  return 0;
}
In this example, I’ve made a mistake by initializing s with a value from the wrong enumeration. Compiling this code with GCC doesn’t result in any warnings or errors:
$ gcc -Wall -Wextra -Werror warn.c -o warn
$ ./warn
Some1
Trying the same thing with Clang results in the following:
$ clang -Wall -Wextra -Werror warn.c -o warn
warn.c:16:25: error: implicit conversion from enumeration type
'enum AnotherEnum' to different enumeration type 'enum SomeEnum'
[-Werror,-Wconversion]
enum SomeEnum s = Another1;
~ ^~~~~~~~
1 error generated.
We can see here that Clang has correctly identified that I’ve improperly initialized a variable of type enum SomeEnum with an enumeration value of type enum AnotherEnum.
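When a conversion between the two enumerations really is intended, one option is to spell it out in a small mapping function so that no implicit conversion takes place. The following is only a sketch: some_from_another is a hypothetical name, and falling back to Some1 for unexpected values is an assumption made for illustration.

```c
#include <assert.h>

enum SomeEnum {
    Some1 = 0, Some2 = 1,
};

enum AnotherEnum {
    Another1 = 0, Another2 = 1,
};

/* Hypothetical helper: maps each AnotherEnum value to its SomeEnum
 * counterpart explicitly, so no implicit enum conversion is needed. */
enum SomeEnum some_from_another(enum AnotherEnum a)
{
    switch (a) {
    case Another1: return Some1;
    case Another2: return Some2;
    }

    /* Reached only if a holds a value outside the enumeration.
     * Returning Some1 here is an assumption of this sketch. */
    assert(0 && "unexpected AnotherEnum value");
    return Some1;
}
```

Writing the mapping case by case also means the compiler can flag a missing case if a new value is later added to AnotherEnum.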
3. Compilation Speed
In general, Clang substantially outperforms GCC 4.0 in compilation time. There’s really not much else I can add that’s not covered by the performance section of the Clang features page.
4. Run-time Analysis Tools
One thing I’ve always wished for is a warning when I’ve done something in my C code that, according to the C standard, has undefined behavior.
It turns out some of these checks can be hard to perform at compile time. I think that’s why Clang provides -fsanitize=undefined (and other sanitization flags). This flag instruments your executable to detect undefined behavior at runtime. When you run your test suite (and you have a test suite, right?), the test executables will raise warnings when undefined behavior is detected.
Here’s an example of a program that invokes undefined behavior (signed-integer overflow):
#include <stdio.h>
int main(int argc, char * argv[]) {
  (void)argc;
  (void)argv;
  int x = 1;
  while(x > 0)
  {
    x++;
  }
  printf("x is: %d\n", x);
  return 0;
}
Compiling and running this program without the sanitizer results in the following output:
$ clang -Wall -Wextra -Werror warn.c -o warn
$ ./warn
x is: -2147483648
Rebuilding, this time with the sanitizer, results in this output:
$ clang -Wall -Wextra -Werror -fsanitize=undefined warn.c -o warn
$ ./warn
warn.c:11:10: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
x is: -2147483648
While running the program, Clang informs us that it’s encountered an integer overflow. Neat!
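One possible way to remove the overflow the sanitizer reported is to check against INT_MAX before incrementing. This is only a sketch: increment_saturating and count_to_limit are hypothetical names, and saturating at INT_MAX is an assumption about the desired behavior, not something the original program specifies.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: increments x, but stays at INT_MAX instead of
 * overflowing (signed overflow is undefined behavior in C). */
int increment_saturating(int x)
{
    return (x < INT_MAX) ? x + 1 : INT_MAX;
}

/* Counts up from a positive start value until it can no longer grow.
 * x + 1 is only evaluated while x is below INT_MAX. */
int count_to_limit(int start)
{
    assert(start > 0); /* mirrors the x > 0 condition of the original loop */

    int x = start;
    while (x < INT_MAX) {
        x = increment_saturating(x);
    }
    return x;
}
```

Built with -fsanitize=undefined, a loop written this way should no longer produce the signed integer overflow report, because the addition never runs when x is already INT_MAX.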
There are many more sanitization options supported by Clang; take a look.
Conclusion
This is by no means a complete list, but these are things that stand out to me. Interested in trying Clang? Check out the Clang home page to get started.
*Typically, I try to have a set of tests that run independent of the target hardware. This allows me to use any compiler supported by my host architecture instead of having to depend on the compiler we’re using to build software for our particular target.
I don’t have much (or rather, any) experience with LLVM or Clang. Is there much difference to the end compiled executable that Clang/LLVM produces rather than with GCC?
Also, with optimizations on are the runtime checks able to be as useful?
I know that there are differences in the resulting binary. I don’t know how substantial those are (yet).
I’ve also not yet noticed a case where optimization interferes with runtime checks, though that doesn’t mean it isn’t happening. :)
For other reasons, I often run with optimizations turned all the way on. At least for the overflow example in this blog post, `-O3` doesn’t have any adverse effects on the runtime check.
Ok, last Clang/LLVM question (well, probably). Does the resulting binary run interpreted or JIT’d on LLVM? Or is it compiled into assembly? I’m trying to understand the significance of the LLVM piece of Clang and what that implies.