Fuzz Testing with afl-fuzz (American Fuzzy Lop)

Last year’s wave of major network security vulnerabilities has kept adversarial testing on my mind. Security auditing tools can discover bugs that more general testing misses. In particular, my interest was piqued by American fuzzy lop, a fuzzer released by the Google security team, and I’ve been waiting for the right project to try it out.

I recently wrote a small plotting tool, guff, that generates simple ASCII or SVG diagrams from command-line input data. After writing some tests for its basic functionality, I held off on trying to find edge cases: I wanted to see what afl-fuzz would find. Since my program works directly with untrusted user input, it has to be able to cope with anything. Any input that leads to strange behavior could be a bug waiting to happen.

Setup

All it takes to use afl-fuzz is a C program that takes input from standard input (or a file), and one or more example input files. (It can also be used on C++ and Objective-C programs.) Then follow four simple steps:

1. Compile the program to be tested using afl’s wrapper for GCC or Clang.

$ env CC=afl-clang make
afl-clang -std=c99 -g -Wall -pedantic   -O3   -c -o args.o args.c
afl-cc 1.83b by <lcamtuf@google.com>
afl-as 1.83b by <lcamtuf@google.com>
[+] Instrumented 123 locations (64-bit, non-hardened mode, ratio 100%).
...

This builds a version of the program that tracks branching, so afl-fuzz can tell when jostling the input finds new paths through the program.

2. Make a directory for example input and copy the example files into it.

$ mkdir -p afl/in
$ cp SOME_INPUT_FILES afl/in

3. Make an output directory.

$ mkdir -p afl/out

4. Finally, start `afl-fuzz` with the example input file directory, the output directory, and any other command line arguments for the program under test.

$ afl-fuzz -i afl/in -o afl/out guff -x -ly

After some messages during setup, its “hip, retro-style” UI should show up, and it will start searching.

The status screen shows how long it’s been running, and how many unique code paths, crashes, and hangs (timeouts) it has been able to find so far. As it discovers unique failures, it writes them into crashes or hangs in its output directory. afl-fuzz uses genetic algorithms and various other heuristics to guide how it tweaks the input, and it uses input that reaches new paths as a starting point for future exploration.
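For reference, the kind of program this setup expects can be sketched in a few lines of C. This is not guff’s code: `open_input` and `count_lines` are hypothetical names, and counting newlines stands in for whatever real work the program under test does. The only point is the shape, which is to read every byte from standard input or a named file and never trust any of it.

```c
#include <stdio.h>

/* Hypothetical helper: pick the input source the way a fuzzable
 * command-line tool might. A NULL path means "read standard input". */
FILE *open_input(const char *path) {
    return path ? fopen(path, "rb") : stdin;
}

/* Hypothetical stand-in for the real work: consume every byte of the
 * stream and count the newlines. A real tool would parse each line. */
size_t count_lines(FILE *in) {
    size_t lines = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            lines++;
        }
    }
    return lines;
}
```

A `main` would then call something like `count_lines(open_input(argc > 1 ? argv[1] : NULL))` after checking for a NULL stream. For programs that take a file name rather than standard input, afl-fuzz substitutes the current test case’s path for `@@` on the target’s command line.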

Back to My Experiment

This was quite effective with my program. First, it discovered that points which all shared the same X or Y coordinate caused a divide-by-zero crash in my axis bounding code.

(I probably would have thought of that soon enough.) I added a special case so that if all the points had the same X or Y coordinate, the range would be set to (N, N + 1). Then I unleashed afl on my updated program.
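A minimal sketch of that special case, assuming a simple min/max scan; the `range` type, `axis_range`, and `scale_to_column` are illustrative names, not guff’s actual code:

```c
#include <stddef.h>

typedef struct {
    double lo;
    double hi;
} range;

/* Find the bounds of n values (assumes n >= 1). If every value is the
 * same, widen the range to (N, N + 1) so later code can divide by
 * (hi - lo) without hitting zero. */
range axis_range(const double *v, size_t n) {
    range r = { v[0], v[0] };
    for (size_t i = 1; i < n; i++) {
        if (v[i] < r.lo) r.lo = v[i];
        if (v[i] > r.hi) r.hi = v[i];
    }
    if (r.hi == r.lo) {
        r.hi = r.lo + 1.0;  /* the special case for a degenerate axis */
    }
    return r;
}

/* Map x onto a column in [0, width - 1]; this is the division that
 * fails when the range is empty. */
int scale_to_column(double x, range r, int width) {
    return (int)((x - r.lo) / (r.hi - r.lo) * (width - 1));
}
```

With three points at 5, `axis_range` yields (5, 6) and the scaling step has a nonzero denominator.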

It didn’t take afl long to break that:

-33333333333333333333333

-333e3333333333

I’m using a double for each coordinate, so if the point is far enough from the origin, floating point inaccuracy kicks in. As far as my program is concerned, (-33333333333333333333333 + 1) - -33333333333333333333333 equals 0, so it’s still dividing by a range of 0 and crashing. The second input is further out still: it overflows a double entirely. I decided to filter any point that far out, since floating point math will treat them as positive or negative infinity anyway, but it was a real bug.
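The effect is easy to reproduce in isolation. In the sketch below, `MAX_COORD` is an assumed cutoff (any value below 2^53 keeps N + 1 distinct from N for a double), and both function names are hypothetical rather than taken from guff:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumed cutoff, chosen to sit below 2^53 (about 9.007e15). Within
 * this magnitude, adding 1 to a double still produces a different
 * value. */
#define MAX_COORD 1e15

/* True if widening a degenerate range to (n, n + 1) really yields a
 * range of width 1 rather than 0 (or NaN for infinities). */
bool has_unit_resolution(double n) {
    return (n + 1.0) - n == 1.0;
}

/* Filter for parsed coordinates. Both comparisons are false for NaN,
 * and infinities fall outside the bounds, so no separate checks are
 * needed for them. */
bool coord_in_bounds(double x) {
    return x >= -MAX_COORD && x <= MAX_COORD;
}
```

Feeding both of afl’s inputs through `strtod` and then `coord_in_bounds` rejects them, while ordinary values such as -42.5 pass.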

After finding a couple of other issues involving ill-formatted numbers such as “-”, it went quiet for several hours, then found bugs with a couple of really weird input data sets, such as:
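For the ill-formatted numbers, one common defensive pattern is to check how much of the token `strtod` actually consumed. This is a sketch under that assumption, with a hypothetical `parse_coord` helper, and not necessarily how guff handles it:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse one whole token as a coordinate.
 * Returns false, leaving *out untouched, unless the entire token
 * is a number. */
bool parse_coord(const char *tok, double *out) {
    char *end = NULL;
    double v = strtod(tok, &end);
    if (end == tok) {
        return false;  /* nothing converted: "", "-", "+" */
    }
    if (*end != '\0') {
        return false;  /* trailing junk after the number: "12abc" */
    }
    *out = v;
    return true;
}
```

Note that `strtod` also accepts forms such as "inf", "nan", and hexadecimal floats, so a range check on the result is still needed afterwards.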

0000000: 2833 2732 202d ea0a 2323 f4ff 300a 2d3f  (3'2 -..##..0.-?
0000010: ffe8 000a 9315 430a 3d8c fc30 0a19 202b  ......C.=..0.. +
0000020: e6e6 e600 0a19 112b 0a3d 237b 0a33 8c2f  .......+.=#{.3./
0000030: 8c18 310a 1980 8000 1831 0a19 f72d ce2d  ..1......1...-.-
0000040: 0030 0a2d 32af 1831 0a7f 7f00 300a 2d3f  .0.-2..1....0.-?
0000050: ffe8 000a 1915 2b0a 3d8c fc30 0a19 202b  ......+.=..0.. +
0000060: 0a2b 0a33 231e 520a d710 2f33 0a3d 237b  .+.3#.R.../3.=#{
0000070: 8002 0000 e6e6 e60a 2d3f ffe8 000a 3d23  ........-?....=#
0000080: 7b0a 338c 2f8c 0731 0a19 802b 0a2b 0a33  {.3./..1...+.+.3
0000090: 2318 310a 19f7 2d43 7f00 300a 2d32 af18  #.1...-C..0.-2..
00000a0: 310a 7f7f 0030 0a2d 3fff e800 0a19 152b  1....0.-?......+
...
0000b10: 202b 0a2b 0a33 2318 0a19 202b 0a2b 0a33   +.+.3#... +.+.3
0000b20: 2318 310a 1910 2f33 2264 300a 2d32 2318  #.1.../3"d0.-2#.
0000b30: 310a 2d3f 0a19 202b 0a2b 0a33 2318 310a  1.-?.. +.+.3#.1.
0000b40: 1910 2f33 2264 300a 2d32 af18 311a 7f7f  ../3"d0.-2..1...
0000b50: 0000 802d 0a19 202b 0a3d 2385            ...-.. +.=#.

This input goes on for quite a while, and it’s mostly binary garbage. After several minutes of mystified scrutiny, I determined that this was triggering a bug because it had more than 255 columns, and my code wasn’t consistent about whether the column index fit in 8 or 16 bits. Trying to plot 256 columns on a small ASCII art display is obviously a waste of time, and maybe nobody would ever intentionally try that to see what happens, but the code wasn’t actively enforcing my assumed upper limit. Pushing programs well beyond their design limits can lead to security vulnerabilities. It was a good find.
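The 8-bit versus 16-bit mismatch can be illustrated with a short sketch. `MAX_COLUMNS`, `narrow_column`, and `columns_ok` are assumed names for illustration; the actual guff code is not shown in this article:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed design limit on the number of columns. */
#define MAX_COLUMNS 255

/* The kind of mismatch described above: a column number held in
 * 16 bits in one place and 8 bits in another silently wraps when it
 * is narrowed, so column 256 aliases column 0. */
uint8_t narrow_column(uint16_t col) {
    return (uint8_t)col;
}

/* Enforcing the limit explicitly where input is read turns the
 * silent wraparound into a clean, reportable error. */
bool columns_ok(size_t ncols) {
    return ncols <= MAX_COLUMNS;
}
```

Rejecting oversized input at the boundary means the narrower type is never asked to hold a value it cannot represent.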

This isn’t a hypothetical issue. For such a new tool, afl-fuzz already has a pretty impressive list of bug trophies, including several in SQLite, which has remarkably thorough unit testing: 971 times as much test code as production code.

Using afl was also a very different experience from using property-based testing tools such as QuickCheck or theft. With those, I would specify properties that should always hold, such as “the endpoints chosen for the axis should always contain all points,” and let the tool try to find counter-examples. While this requires more upfront thought about the problem domain than just letting afl-fuzz bring the noise, it can also use that information for “shrinking”: rather than turning in a bug report that looks like it came from an intoxicated fax machine, it can then search for simpler ways to reproduce the bug. With shrinking, the bug report might have looked like:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 (256 times)

which would have made that bug’s root cause clear.
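To make the contrast concrete, here is a hand-rolled sketch of the property-based approach in plain C. It deliberately does not use the QuickCheck or theft APIs; every name in it is hypothetical, `choose_axis` is a stand-in for the code under test, and `prop_column_count_fits` is a toy property that imitates the column bug so that the naive prefix-shrinking step has something to find.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { double lo, hi; } range;
typedef bool (*property_fn)(const double *v, size_t n);

/* Stand-in for the code under test: pick axis endpoints (n >= 1). */
range choose_axis(const double *v, size_t n) {
    range r = { v[0], v[0] };
    for (size_t i = 1; i < n; i++) {
        if (v[i] < r.lo) r.lo = v[i];
        if (v[i] > r.hi) r.hi = v[i];
    }
    return r;
}

/* Property: the chosen endpoints contain every point. */
bool prop_axis_contains_all(const double *v, size_t n) {
    range r = choose_axis(v, n);
    for (size_t i = 0; i < n; i++) {
        if (v[i] < r.lo || v[i] > r.hi) return false;
    }
    return true;
}

/* Toy property imitating the column bug: fails once the count no
 * longer fits in 8 bits. */
bool prop_column_count_fits(const double *v, size_t n) {
    (void)v;
    return n == (size_t)(uint8_t)n;
}

/* xorshift32: a tiny deterministic PRNG; the seed must be nonzero. */
uint32_t next_rand(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Generate random inputs and count how many violate the property. */
size_t check_property(property_fn prop, uint32_t seed, size_t trials) {
    double pts[32];
    size_t failures = 0;
    for (size_t t = 0; t < trials; t++) {
        size_t n = 1 + next_rand(&seed) % 32;
        for (size_t i = 0; i < n; i++) {
            pts[i] = ((double)next_rand(&seed) - 2147483648.0) / 1000.0;
        }
        if (!prop(pts, n)) failures++;
    }
    return failures;
}

/* Naive shrinking: length of the shortest prefix that still fails. */
size_t shrink_to_prefix(property_fn prop, const double *v, size_t n) {
    for (size_t len = 1; len < n; len++) {
        if (!prop(v, len)) return len;
    }
    return n;
}
```

Given a failing input of 3000 zeros, `shrink_to_prefix` with the toy column property reduces it to 256 values, the same kind of minimal report as the one above. Real tools shrink along many more dimensions than prefix length.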

These tools are complementary. While I will continue to use property-based testing, I won’t hesitate to bring in afl-fuzz afterwards to search for really weird bugs, especially in code that has to safely handle untrusted input.