Things About C that Make Me Say, “WHAT?!?”

It’s hard to work with C code on a daily basis without developing a love-hate relationship with it. I’ve been working with C for several years now, and most days I truly enjoy using it. There are other days, however, when it makes me want to pull my hair out.

I have found several aspects of C that really tripped me up or left me staring at the screen with an expression of disbelief on my face. I’d like to share a few of them with you so hopefully you can avoid them!  

Uninitialized Booleans

The C specification says that reading an uninitialized variable can cause “undefined behavior.” In some cases compilers seem to take that to an extreme. For example, what do you think would happen if you ran the code below?

bool i_love_c;  /* bool comes from <stdbool.h>; left uninitialized on purpose */
if(i_love_c) printf("I love writing in C. ");
if(!i_love_c) printf("I hate writing in C.");

Logically, it would seem like only one or the other could be true. But sure enough, the output is, “I love writing in C. I hate writing in C.” So, what happened? My compiler (GCC) assumes that a Boolean only ever holds 0 or 1, so when it NOTs a Boolean it takes a shortcut and toggles only the least significant bit instead of comparing the value against zero.

An initialized Boolean always has a value of either 1 or 0, which is fine under normal circumstances. However, if left uninitialized, a Boolean could be any value. For example, take the number 119. When the compiler performs !119, it just toggles the least-significant bit, resulting in the number 118. Now it makes sense: 118 and 119 are both non-zero values, so the if-statement passes for both of them.

But now let’s take this one step further. What if we change the code to look like this:

bool i_love_c;
if(i_love_c == true) printf("I love writing in C. ");
if(i_love_c == false) printf("I hate writing in C.");

Surely they can’t both pass this time… can they? Believe it or not, they do. The output is again, “I love writing in C. I hate writing in C.” Unfortunately, this time I can’t explain it. The value true equals 1 and false equals 0, so how can anything equal both 1 and 0? This is what they mean when they say “undefined behavior.” The thing to keep in mind is: always, always, always initialize your variables.

Using sizeof() on Arrays

I love the sizeof operator, and I use it all the time, but you have to be really careful when using it on arrays. Just keep in mind that sizeof only reports the size of the whole array where the array’s declaration is visible. Once the array has been passed to another function, all that function sees is a pointer, and sizeof will not return the result you expect. Let’s take a look. This time we have two functions:

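#include <stdio.h>

void print_array_size(char array[]);

int main(void)
{
  char name[] = "HELLO";
  printf("array size = %zu\n", sizeof(name));  /* size of the whole array */
  print_array_size(name);
  return 0;
}
void print_array_size(char array[])
{
  /* array is really a char * here, so this is the size of a pointer */
  printf("array size = %zu\n", sizeof(array));
}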

The size of the name array is 6 bytes: one for each letter in HELLO, and one for a null character. The sizeof in the main function returns the correct value because name is still a real array there. However, if we pass the array to another function, what happens?

"array size = 6"
"array size = 4"

Fail! Inside the print_array_size() function, the array parameter is not an array at all. It is a pointer, so sizeof reports the size of the pointer rather than the length of the array. Where does the value of 4 come from? It’s the size of a char * type on my 32-bit machine. To avoid this pitfall just remember that in C, when you pass an array to a function you are actually passing a pointer to the first element in the array. The sizeof operator cannot determine the length of an array when all it has is an address.

Initializing Character Arrays

C does not have a native string type. Instead, we are forced to use character arrays. Maybe it’s just because I’ve gotten used to it, but I really don’t mind. Still, you can get into trouble if you’re not careful. For example, do you see anything wrong with the code below?

char my_name[6] = "JORDAN";
printf("Hello my name is %s", my_name);

The output produced on my machine is: “Hello my name is JORDANΓöÇ[EwL0@”. Wow, I’d like to hear you try to pronounce that! If it’s not immediately obvious, this happens because strings are expected to be null terminated in C. We declared an array that is 6 bytes long and filled it with 6 letters, but we didn’t leave room for the null terminator at the end, and the compiler doesn’t put one there. The %s format specifier in the printf function tells it to print characters starting at the beginning of my_name and stop when it reaches a zero. Because we didn’t successfully null-terminate our string, a run of garbage characters (the next values in memory following our name) is printed out until a zero is finally reached.

For the reason just stated, if we use the strlen() function on the my_name array, it returns an incorrect value of 10. However, if we use the sizeof operator, we see the correct value, 6. To be safe, if you want to declare a character array and initialize it to a string, just use empty square brackets, such as char my_name[] = "JORDAN". In this case, the compiler will automatically allocate enough bytes for your string and the null terminator.

C also allows you to point a char pointer directly at a string literal. Doing so can also get you into a lot of trouble. Do you see anything wrong with the code below?

char * p_your_name = "JOHNNY";
memcpy(p_your_name, "GEORGE", 6);

The problem is that p_your_name is a pointer to a string literal. String literals are not intended to be modified. For example, when I try to run the above code on my Windows machine, the program crashes, and I get a pop-up box that says, “The program stopped working…” Not desirable. This happens because the compiler on my computer stores the string literal in read-only memory. Many embedded devices may not place string literals in read-only memory and will allow you to do this without any problems. Just keep in mind that you are venturing into undefined behavior land, and you might get screwed in the end.

Initialization of Structures and Enumerations

To all you C-newbies out there, be very careful and thoughtful when it comes to initializing your structs. Remember that the data contained in your struct is important. If it wasn’t, it wouldn’t be there, so give it the common courtesy of a decent initialization! This is even more important when the struct contains an enum type. Consider the following code.

#include <stdint.h>
#include <stdio.h>

typedef enum {APPLE, BANANA} FRUIT_T;

typedef struct{
    FRUIT_T f;
    uint8_t fruit_count;
} FRUIT_BASKET_T;

int main(void)
{
    FRUIT_BASKET_T fruit_basket;  /* never initialized, so f could hold anything */
    switch (fruit_basket.f)
    {
        case APPLE:
            break;
        case BANANA:
            break;
        default:
            printf("Never get here. Crash 'N Burn.");
    }
    return 0;
}

What do you think happens when this code runs? It outputs: “Never get here. Crash ‘N Burn.” This happens because enums are actually represented as integers. If you leave an enum uninitialized, it can be equal to any integer value, which means there’s a very good chance that it will not be equal to one of the correct values of your enumeration. Also, imagine if you explicitly define an enumeration so that the first value is equal to 1. If a structure that contains that enum type is initialized to all zeros, the same problem occurs.

What About You?

These are some of the unusual aspects of C that I have found. What about you? What parts of the language make you want to pull your hair out?

Conversation
  • Anonymous Coward says:

    Your comment on character array initialization is a blatant violation of the DRY principle which applies to mostly any single aspect of software engineering and aims at reducing information redundancy hence facilitating code maintenance: there is no reason to state the string length when it can be inferred by the compiler.

    • Jordan Schaenzle says:

      It sounds like you’re disagreeing with me, but that’s exactly what I said: “To be safe, if you want to declare a character array and initialize it to a string, just use empty square brackets”.

  • Matthew Saltzman says:

    (1) Your compiler may do strange things with booleans, but my C compiler reports that bool is not a type (as indeed it is not in C, unless you #include <stdbool.h> in C99, in which case there is a macro defining it). bool is a type in C++.

    (2) There is no reason to expect any sensible behavior when doing things that are undefined, such as reading the value of an uninitialized variable. The behavior you observe is no more surprising than any other. As you yourself conclude, always initialize. That’s really the lesson of most of your gotchas.

    (3) Your full program example should declare main() as “int main(void)” or “int main (int argc, char **argv)”, not as “void main”, and you should return 0 at the end of main’s body.

    • Jordan Schaenzle says:

      Matthew, Thanks for your feedback. I realize that bool is not actually a C type. I should have explicitly indicated that I was #include-ing stdbool.h. Also, I am sure that different C compilers will behave differently in a lot of the undefined situations. I was just trying to have a little fun and point out things to avoid.

      • Matthew Saltzman says:

        BTW, a good compiler will warn about uninitialized variables.

        I do like the string examples, though. Those really are legitimate gotchas.

      • anon.coder says:

        On my system:

        Using built-in specs.
        Target: i686-linux-gnu
        Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.4.7-1ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu
        Thread model: posix
        gcc version 4.4.7 (Ubuntu/Linaro 4.4.7-1ubuntu2)

        I always get “I hate writing in C” as the output.

        Which compiler are you using? Thanks for the post though, it’s pretty interesting.

  • Toby says:

    When the compiler performs !119, it just toggles the least-significant bit

    No. You are thinking of x ^ 1.

    !x is semantically equivalent to x == 0.

    • Jordan Schaenzle says:

      Toby, that is kinda true. !x is actually equal to the logical negation of x. If the compiler does not do a proper logical negation then all bets are off. I am using GNU gcc and all it does is toggle the LSb. Even with all optimizations turned off. Give it a try!
      http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B

    • Scott Vokes says:

      That is correct according to the C standard / Harbison & Steele. I have seen embedded platform compilers mess it up, though. gcc will also cut some corners that violate the standard at higher optimization levels.

      • Toby says:

        Leaving aside C99 bool (which is a different case entirely), gcc does not (& cannot) make the optimisation described for integer values. It would violate the semantics of the operator (i.e. correct programs would break).

        To evaluate !a, gcc effectively evaluates a == 0 in all cases, as it is required to do.

        g5:~ toby$ gcc -x c - -S -O0
        int f(int a){ return !a; }
        g5:~ toby$ cat -- -.s
        .section __TEXT,__text,regular,pure_instructions
        .section __TEXT,__picsymbolstub1,symbol_stubs,pure_instructions,32
        .machine ppc7400
        .text
        .align 2
        .globl _f
        _f:
        stmw r30,-8(r1)
        stwu r1,-48(r1)
        mr r30,r1
        stw r3,72(r30)
        lwz r0,72(r30)
        cmpwi cr7,r0,0
        mfcr r0
        rlwinm r0,r0,31,1
        mr r3,r0
        lwz r1,0(r1)
        lmw r30,-8(r1)
        blr
        .subsections_via_symbols
        g5:~ toby$ gcc -x c - -S -O2
        int f(int a){ return !a; }
        g5:~ toby$ cat -- -.s
        .section __TEXT,__text,regular,pure_instructions
        .section __TEXT,__picsymbolstub1,symbol_stubs,pure_instructions,32
        .machine ppc7400
        .text
        .align 2
        .p2align 4,,15
        .globl _f
        _f:
        subfic r0,r3,0
        adde r3,r0,r3
        blr
        .subsections_via_symbols

        If you can find a case where gcc evaluates a ^ 1 for !a, where a is an integer value wider than one bit, then you have found a compiler bug.

        • Toby says:

          Scott, at first I interpreted you to say that the bitflip was acceptable to the standard, otherwise consider my post a reply to Jordan’s :)

        • Toby says:

          Compare:

          g5:~ toby$ gcc -x c - -S -O2
          int f(int a){ return a ^ 1; }
          g5:~ toby$ cat -- -.s
          .section __TEXT,__text,regular,pure_instructions
          .section __TEXT,__picsymbolstub1,symbol_stubs,pure_instructions,32
          .machine ppc7400
          .text
          .align 2
          .p2align 4,,15
          .globl _f
          _f:
          xori r3,r3,1
          blr
          .subsections_via_symbols

  • Toby says:

    Jordan, I wouldn’t be surprised if that is correct behaviour for an intrinsic bool type (I am not familiar with C99 per se so I can’t say), but the compiler cannot compute ! by flipping LSB, for a wider integral C type.

  • Eran says:

    Note that sizeof() is not a function, but an operator, and is evaluated by the compiler at compile time. That is why the scope affects its result.

    Apart from pitfalls that make you wanna pull your hair out, some unusual aspects of C sometimes involve “interesting” syntax abuse, for example:
    x-macros make me say “what?”
    Duff’s device makes me say “what?!”
    IOCCC makes me say “I don’t want to live on this planet anymore”

  • Sanjoy Das says:

    About the

    bool i_love_c;
    if(i_love_c == true) printf("I love writing in C. ");
    if(i_love_c == false) printf("I hate writing in C.");

    example.

    In this case i_love_c is uninitialized and has an undefined value, and by reading it you invoke some sort of undefined behavior, I think. And in that case the compiler is free to do anything. :)

    For instance, AFAIK, clang/llvm will mark i_love_c as undefined, a special value which optimizers may resolve to have any arbitrary value _they_ want. For the first if it could assume i_love_c to be true and optimize away the branch and in the second it could assume i_love_c to be false and do the same.

  • David says:

    If you try to read from uninitialised data, you are invoking “undefined behaviour” and the compiler can do exactly as it wants. In your “i_love_c” case, the compiler can take the first statement and think, “You haven’t set i_love_c. Obviously you don’t care about the value, so I’ll pick what I like – let’s pick ‘true’ here and skip the test.” For the next statement, it can pick “false”. It doesn’t have to do any calculations, or it can do any calculations it wants. It can print out both statements, or neither statement. It could write “garbage in, garbage out” then crash your computer. All are valid interpretations of this invalid source code.

    The trick to avoiding unexpected behaviour is to write correct, sensible and valid source code. You can come a long way by learning to use your compiler’s warnings – any half-decent compiler will warn you about this code if you allow it to.

    And as others have mentioned, the compiler will not normally negate a bool by toggling the LSB. It might do so if it can guarantee that the original value is either 0 or 1 – and it might do if it can see you have written invalid code. But in general, it will not do so – because any non-zero is considered “true” in C, and when you use ! on it you must get 0.

    I don’t know why you expect sizeof to give the same value when applied to two different items – one being an array, and the other being a pointer. It would be nice if C had a better array syntax, but that’s what we have.

    The same sort of thing is true of your other points – learn the basic C syntax, and write meaningful code, and you will get meaningful results. And learn to use your tools properly, so that they will catch silly mistakes (everyone makes them at times).

    There are lots of crazy things about C, lots of bad design decisions, and lots of cases where code can be valid C and look sensible, yet have unexpected behavior – but the examples here do not fall into that category.

  • ha nguyen says:

    Hi Jordan Schaenzle, is sizeof() a function or an operator?
