Common C Preprocessor Mistakes

The C preprocessor is a common source of subtle bugs. It knows little about C beyond the lexical level; in particular, it’s unaware of types and scoping rules. Code generated by the preprocessor is also often too verbose to read easily, and it can hide dangerous edge cases.

Macros are expanded to one long line, losing important visual cues. Proper formatting can be reintroduced after preprocessing, though: multi-line macros formatted with fixed line widths can be re-wrapped by `fmt` (or similar tools) for reading. Particularly hairy macros can be checked in isolation by creating a file with the macro’s #include and a few representative uses, then running `cpp` directly on the file. (This is the closest thing C has to Common Lisp’s macroexpand-1.) This is another strong argument for keeping CPP macros in their own header files, as mentioned in John Van Enk’s preprocessor post.


For example,

/* macro.h */
#define GREATEST_ASSERT_EQm(MSG, EXP, GOT)                              \
    do {                                                                \
        greatest_info.msg = MSG;                                        \
        greatest_info.fail_file = __FILE__;                             \
        greatest_info.fail_line = __LINE__;                             \
        if ((EXP) != (GOT)) {                                           \
            GREATEST_CALL_TEARDOWN();                                   \
            return -1;                                                  \
        }                                                               \
        greatest_info.msg = NULL;                                       \
    } while (0)

/* use.c */
#include "macro.h"
TEST test_example() {
    GREATEST_ASSERT_EQm("example message", x + 1, 10);

    GREATEST_ASSERT_EQm("example message", x*x, y*y);
}

/* To expand the macro and wrap lines at 70 characters:
       $ cpp use.c | fmt -w 70
*/

Macro invocations can be mistaken for function calls unless style rules keep them distinct. One common convention is to write macro names and arguments in ALL_CAPS, which highlights potential scope violations from token captures. Otherwise, macros with unintended name captures can lead to inscrutable error messages – replacing a field name in a struct with a macro expansion, for example.
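To make the name-capture problem concrete, here is a minimal sketch. The names `struct packet`, `PACKET_MAX_LEN`, and `packet_remaining` are hypothetical, invented only for this illustration; the problematic lowercase macro is left in a comment so that the snippet still compiles.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */

/* A lowercase object-like macro such as
 *
 *     #define len 16
 *
 * would rewrite every later `len` token, including the struct field
 * below, turning `size_t len;` into `size_t 16;`. The compiler then
 * reports a syntax error at the field, not at the macro definition. */

#define PACKET_MAX_LEN 16   /* ALL_CAPS: will not collide with lowercase names */

struct packet {
    size_t len;                          /* left alone by the preprocessor */
    unsigned char data[PACKET_MAX_LEN];
};

/* Returns how many more bytes fit in the packet. */
static size_t packet_remaining(const struct packet *p) {
    return PACKET_MAX_LEN - p->len;
}
```

Because the preprocessor is case-sensitive, keeping macros in ALL_CAPS and everything else in lowercase means the two namespaces should rarely overlap by accident.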

Since C macros are expanded before parsing (rather than expanding directly into parse trees), they can distort nearby code. This can be avoided by wrapping every statement macro in a `do { … } while (0)` block, so it will parse as a single statement and not disrupt the structure of neighboring `if` statements. (This also creates a nested scope for variables, which prevents macro-internal variables from leaking into the surrounding scope and eases pressure on the compiler.) Macros that expand to expressions containing operators should be wrapped in parentheses, so their meaning won’t be changed by the precedence of adjacent operators.

#define CONTRIVED_IF_EXAMPLE(F, COUNTER) F(COUNTER); COUNTER++;
if (some_condition) CONTRIVED_IF_EXAMPLE(func, x);

/* This will expand to: */
if (some_condition) func(x); x++;;
/* Note that the `x++;` statement will always execute - it is outside the `if`. Defining the macro with braces fixes this case, though the `do { ... } while (0)` form is still safer when an `else` follows. */
#define FIXED_IF_EXAMPLE(F, COUNTER) { F(COUNTER); COUNTER++; }

/* Similarly, compare */
#define CONTRIVED_EXAMPLE(X, Y) X + Y
uint32_t val = 3*CONTRIVED_EXAMPLE(12, 345); /* 3*12 + 345 == 381 */
/* to */
#define CONTRIVED_EXAMPLE(X, Y) (X + Y)
uint32_t val = 3*CONTRIVED_EXAMPLE(12, 345); /* 3*(12 + 345) == 1071 */
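The difference between a brace-only macro and a `do { … } while (0)` macro shows up as soon as an `else` follows the macro call. The sketch below illustrates this; `CALL_AND_COUNT`, `CALL_AND_COUNT_BRACES`, `record_call`, and `demo` are hypothetical names made up for this example.

```c
/* Hypothetical macro and function names, for illustration only. */

/* Brace-only version: `if (c) CALL_AND_COUNT_BRACES(f, n); else ...`
 * expands to `if (c) { ... }; else ...`. The stray `;` ends the `if`,
 * so the `else` has nothing to attach to and the code fails to compile. */
#define CALL_AND_COUNT_BRACES(F, COUNTER) { F(COUNTER); (COUNTER)++; }

/* do/while(0) version: the expansion plus the caller's `;` forms exactly
 * one statement, so it is safe anywhere a statement is allowed. */
#define CALL_AND_COUNT(F, COUNTER) \
    do {                           \
        F(COUNTER);                \
        (COUNTER)++;               \
    } while (0)

static int calls_seen = 0;

/* Stand-in for real work: just counts how often it was called. */
static void record_call(int value) { (void)value; calls_seen++; }

/* Returns the counter after one conditional use of the macro. */
static int demo(int condition) {
    int x = 0;
    if (condition)
        CALL_AND_COUNT(record_call, x);
    else
        x = -1;
    return x;
}
```

Swapping `CALL_AND_COUNT` for `CALL_AND_COUNT_BRACES` inside `demo` should produce an "else without a previous if" style error, which is a quick way to see why the longer form is usually preferred.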

It’s good style to leave the trailing semicolon out of statement macros, too – it should be required where the macro is used. Statements without semicolons throw off many C tools, to say nothing of annoying other developers.

It’s cleaner to follow macros that are going to be used at the outermost scope with a semicolon, but a top-level semicolon is only legal in a few contexts. If the macro defines any variables, structs, unions, or typedefs, the last one will need to be followed by a semicolon. If the macro expansion only defines functions, then it can be followed by an unused struct declaration (e.g. “struct PACKAGE_NAME__EAT_TRAILING_SEMICOLON”) as a workaround.
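As a rough sketch of the struct-declaration workaround: the macro below only defines a function, so its expansion ends with an incomplete struct declaration that absorbs the semicolon written at the use site. `DEFINE_CONSTANT_GETTER`, `MYPKG__EAT_TRAILING_SEMICOLON`, `get_answer`, and `get_limit` are hypothetical names chosen for this example.

```c
/* Hypothetical names, for illustration only. */

/* Defines a function named NAME that returns VALUE. Strict ISO C does not
 * allow a bare `;` after a function definition at file scope (compilers
 * often warn about it in pedantic mode), so the expansion ends with an
 * incomplete struct declaration that consumes the caller's semicolon. */
#define DEFINE_CONSTANT_GETTER(NAME, VALUE)       \
    static int NAME(void) { return (VALUE); }     \
    struct MYPKG__EAT_TRAILING_SEMICOLON

/* Each use now reads like an ordinary declaration ending in `;`. */
DEFINE_CONSTANT_GETTER(get_answer, 42);
DEFINE_CONSTANT_GETTER(get_limit, 100);
```

Repeating the same forward declaration is legal, so the macro can be used any number of times in one file without the struct ever being defined.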

Finally, macros that return from the surrounding function (or break out of loops or switch statements) need to be used with caution, since they break scope / control flow expectations. They should probably be avoided, except in specific cases where their behavior will be obvious (such as in error-handling macros).
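One way to keep such a macro obvious is to put the control flow in its name. The following is a minimal sketch of an error-handling macro along those lines; `RETURN_IF_FALSE` and `checked_divide` are hypothetical names, and the error codes are arbitrary.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */

/* Returns ERR from the *calling* function when COND is false. The name
 * says "RETURN" so the hidden control flow is visible at the call site. */
#define RETURN_IF_FALSE(COND, ERR) \
    do {                           \
        if (!(COND)) {             \
            return (ERR);          \
        }                          \
    } while (0)

/* Stores numerator / denominator in *out.
 * Returns 0 on success, -1 for a NULL output pointer,
 * and -2 for division by zero. */
static int checked_divide(int numerator, int denominator, int *out) {
    RETURN_IF_FALSE(out != NULL, -1);
    RETURN_IF_FALSE(denominator != 0, -2);
    *out = numerator / denominator;
    return 0;
}
```

The macro is only usable in functions whose return type matches the error value, which is one more reason to keep such macros few and clearly named.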