In many languages, when you are unsure of a particular detail of the language, you can often “just run it” and see what happens. In C, that approach will almost certainly bite you. It’s too easy to accidentally invoke “undefined behavior”, where your code might do one thing in one case but something totally different in another, without even getting a warning from the compiler.
Here are a few undefined behaviors you might not know about, along with the relevant section from the C11 spec. These aren’t just pedantic ramblings; they’re all cases that I’ve encountered on real projects out in the wild.
1. Integer Division by -1
Pretty much every C programmer knows they should avoid dividing by zero. But there is another case where division is undefined: dividing the most negative value of a signed integer type by -1, for example INT64_MIN / -1 for an int64_t or INT32_MIN / -1 for an int32_t.
Give it a try! :)
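#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    /* Change these 64's to 32's if you're on a 32bit machine */
    int64_t result = INT64_MIN / -1; /* undefined: quotient is not representable */
    /* PRId64 is the portable printf format for int64_t */
    printf("The result: %" PRId64 "\n", result);
    return 0;
}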
On most implementations, this will result in the same kind of error/exception as divide by zero. But remember, this is not an “error”; it is “undefined behavior”! The runtime is just being polite when it throws the error. It really could do anything it wanted (return 0, exit silently, scream, make demons fly out of your nose) and still be fully compliant with the C spec.
When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a; otherwise, the behavior of both a/b and a%b is undefined.
– Section 6.5.5.6 – Multiplicative operators
2. Upcasting Pointers
Casting a void * or uint8_t * to a uint32_t * or to a struct some_big_struct * is undefined.
Actually, it’s only undefined if your void * or uint8_t * isn’t aligned at least as strictly as a uint32_t * requires. In this case, that would mean the address would have to be divisible by 4.
Even though these casts are undefined, most C compilers will let you get away with them in most cases. But in certain cases at higher optimization levels, you’ll probably start seeing crashes. And they’ll be weird things, like “that function works great on even elements of an array, but crashes on the odd ones.”
What happens is that, because casting from a pointer of weaker alignment is undefined, the compiler will just trust that we are not doing that and use an instruction that is much faster, but requires stronger alignment. And then when you don’t pass in a properly aligned pointer, the CPU itself will throw an exception, and your program will probably crash. (Again this is all “undefined,” but this is just what happens in common implementations.)
In gcc and clang, there’s a command line option that will help point these types of errors out: -Wcast-align. It’s not included as part of -Wall or -Wextra.
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
– Section 6.3.2.3.7 – Pointers
Note that you don’t even have to dereference the pointer to stumble into undefined behavior. The conversion itself is undefined.
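One common way to sidestep the cast entirely is to copy the bytes out with memcpy instead of reinterpreting the pointer. Here is a minimal sketch of that idea, assuming a C11 compiler (for _Alignof). The helper names is_aligned_u32 and load_u32 are made up for illustration; they are not from any library.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: true if p is suitably aligned for a uint32_t.
 * Converting a pointer to uintptr_t is implementation-defined, but on
 * common flat-address platforms the result is the address itself.
 */
bool is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}

/*
 * Hypothetical helper: read a uint32_t in native byte order from any
 * byte address. No uint32_t pointer to the source is ever created,
 * so the alignment of bytes does not matter.
 */
uint32_t load_u32(const uint8_t *bytes)
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

With something like this, load_u32(buf + 1) is fine even when buf + 1 is an odd address, and compilers will typically turn the memcpy into a single load where the hardware allows it.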
3. Using Uninitialized Variables
The usual assumption is that only the value of an uninitialized variable is undefined, so reading it just gives you some arbitrary value. But actually, merely using the value of an uninitialized variable is undefined behavior.
For example, given something like this:
#include <stdbool.h>
#include <stdio.h>
int main(void)
{
    bool var; /* deliberately left uninitialized */
    if (var)
    {
        fputs("var is true!\n", stdout);
    }
    if (!var)
    {
        fputs("var is false!\n", stdout);
    }
    return 0;
}
On some compilers at some optimization levels, you can get the output:
var is true!
var is false!
There is an excellent breakdown of why you might get this sort of behavior here
Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, *an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue)*; this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), *and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined*.
– Section 6.3.2.1.2 – Lvalues, arrays, and function designators
4. Dereferencing a Null Pointer
People don’t usually think of dereferencing a null pointer as undefined behavior. They usually think of it as “causes a crash”. This is not always the case. For example, on my current project, if I dereference a null pointer, I just get the value stored in address 0. You don’t realize how awesome segfaults are until you work on a system that doesn’t have them.
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
– Section 6.5.3.2.4 – Address and indirection operators
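On a system that won’t segfault for you, the check has to be done by hand before the pointer is ever dereferenced. A minimal sketch of what that might look like follows; read_or_default is a hypothetical helper name, and it only guards against null, not against other invalid pointers.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the int that p points to, or fallback
 * when p is NULL. The check comes before the dereference, so the
 * unary * operator is never applied to a null pointer.
 */
int read_or_default(const int *p, int fallback)
{
    if (p == NULL) {
        return fallback;
    }
    return *p;
}
```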
Hi,
In the division sample, does that happen only when that MIN macro is used?
How can that problem be avoided?
Thanks for article!
It would be interesting to see more similar posts about UB!
Nope, it will happen whenever you have the most negative number representable in the machine integer and divide it by -1. So -2147483648/-1 for 32bit systems and -9223372036854775808/-1 for 64bit systems (at least for x86). In general signed integer overflow is undefined in C. But it’s a little quirky that you get a hardware exception (and a subsequent crash) in this particular case. Especially since you _don’t_ get an exception/crash if you _multiply_ by -1, e.g.: -2147483648 * -1 , which you’d normally think would give the same result.
Unfortunately, in C, pretty much the only way to avoid this case is to check for it, just like you’d do checks to prevent division by zero.
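A check along those lines might look roughly like the sketch below. safe_div64 is a made-up helper name for illustration, not a standard function, and reporting failure through a bool return value is just one possible design.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a / b in *out and return true only when
 * the quotient is representable. On failure, return false and leave
 * *out untouched.
 */
bool safe_div64(int64_t a, int64_t b, int64_t *out)
{
    if (b == 0) {
        return false; /* division by zero */
    }
    if (a == INT64_MIN && b == -1) {
        return false; /* quotient would be INT64_MAX + 1 */
    }
    *out = a / b;
    return true;
}
```

Both checks happen before the / operator is evaluated, so the undefined division never executes.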
Nice article! The evolution of indeterminate values in C is fascinating; the exact cases where reading them was undefined behavior changed a lot between C90 and C11. See “Reading indeterminate contents might as well be undefined” for the background: http://blog.frama-c.com/index.php?post/2013/03/13/indeterminate-undefined
The waters are also muddied a bit by defect report 451, which brings up the new concept of “wobbly values”: http://www.open-std.org/Jtc1/sc22/WG14/www/docs/dr_451.htm
It is also scary that even after the very public openssl random number via uninitialized values bug: https://www.schneier.com/blog/archives/2008/05/random_number_b.html people still think this is a viable method for generating entropy: http://stackoverflow.com/a/31746063/1708801