5 Comments

Fighting Log Entropy – 4 Simple Rules

Screenshot-Merijn-Vogel

Screenshot / Merijn Vogel / CC BY 2.0

While certainly not the most elegant of debugging techniques, logging to a console is sometimes the most effective technique, and in many cases it’s your only lifeline. During development, sometimes a simple “I am here” print statement can be a life saver.

There are many ways to create a log—from a few characters pumped through a UART to recording log statements to a file on a disk, but I think most people can agree that a good application log can be a life saver and provide a good place to start when tracking down a difficult bug.

Letting Nature Take its Course

The art of logging is often overlooked. Everyone does it, but everyone does it differently—some log less, some log more. And no one really cares about how the log looks in the end. That is, until you are the poor soul that has to try to comprehend a 50,000 line text file that is full to the brim with worthless and outdated information.

If the log for your project is not monitored and regularly groomed, it can quickly get out of control and lose almost all of its value. With some careful thought and consideration, a log can be your greatest asset; but an ignored log can become your worst nightmare. The law of entropy can quickly turn your log into a jumbled mess of meaningless strings, after which no one will even dare to use it.

Fighting Log Entropy

There are few simple things you can do (and watch out for) to get the most out of your log.

1. Avoid side effects.

Depending on your target/platform, excessive logging can cause serious side effects. The performance of the system can become seriously impacted. Keep in mind that the act of printing a single log statement will consume both CPU and memory. If you are printing 1,000 lines per second, you can bet that system performance will be negatively impacted.

You should make sure that your system behaves “identically” (within reason) both with and without logging enabled. I have been on projects where we could not disable the logs anymore because the system behaved so differently without logging that it was essentially a totally different piece of software.  New bugs were exposed with logs disabled, and there wasn’t enough time to go back and fix them.

Also avoid situations where a log statement literally alters the program state. For example, printf(“i = %d, i++) will change the value of i. If someone down the road casually deletes this useless log statement (or if logs become globally disabled for the system), there will likely be a bug here.

2. Add useful data.

A log statement without data is barely better than no log statement at all (and sometimes worse). For example, a log statement that simply says “Unhandled event…” is probably not going to be useful after initial development, and will almost never be worth the time to print for end users. But if the log says “Unhandled event USER_LOGIN in state: IDLE”, it’s much more useful for tracking down a potential bug.

3. Keep it up to date.

An outdated/incorrect log statement is a severe offense. Make sure you don’t leave faulty or misleading log statements in your code. If your log is riddled with errors that aren’t actually errors, it leads to a certain numbness for the reader (even if you yourself are the reader), which can be detrimental in the long run.

Outdated or incorrect logs can also simply lead to false positives, making them literally a waste of time and money (and CPU and memory). So the next time you’re about to excitedly type git commit -m "Fixed a huge bug", make sure you go back and remove all ‘debug’ logs that you sprayed throughout your code. Neglecting to do so is only contributing to the effects of log entropy.

4. Make it look nice.

Poorly formatted logs are not only ugly, but they are hard to read. How would you like to read a book that uses different font every other line, or where 30% of the words are misspelled?

A consistent and well-formatted log makes debugging easier for both humans and machines, so readability should not be ignored!

Avoid logs that look like this:

[PROC1] Info: I am here!!!!! 
[PROC1] Warning: Reading file
[PROC2] Info: Processing input from user
[PROC2] Error: !!!!!!!!! Invalid input!!!!!!!!! 
[PROC1] Info: Entering function foo()…..
[PROC1] Info: Leaving function foo()….
[PROC4] Critical: ~~~~ ~~~~~~~~~~~~ DONE! ~~~~ ~~~~~~~~~~

Conclusion

I have been a part of several projects where the effects of sloppy and thoughtless logging became a huge, systematic issue, to the point that there was no way to get it back under control. Especially in the context of a large project with many developers, the log can quickly give in to entropic forces and thereby lose almost all of its value.

With some conscious effort to keep the log clean, accurate, and detailed, it’s quite easy to avoid these pitfalls and make your log your secret weapon.

What other logging guidelines do you live by?