While unit and integration testing can verify that code correctly handles known edge cases, randomized testing can detect the edge cases themselves. The programmer describes their assumptions about a function’s output and how to randomly generate representative input, then lets the test runner search for counterexamples. This is particularly useful when checking algorithms’ invariants, and when auditing software for security vulnerabilities (commonly called fuzz testing).

One of the best known implementations of randomized testing is QuickCheck, part of the Haskell standard library. It uses the Arbitrary typeclass as a hook to specify how specific data types can be generated, matches those with inferred argument types, and searches for bugs. There’s nothing specific to Haskell about the underlying idea, however — clones exist for Erlang and several other languages.

Lunatest, my testing library for Lua, also has functionality for randomized testing. Here is an example of it being used on an implementation of run-length encoding, a simple data compression algorithm that is particularly useful on embedded platforms (which usually do not have the space for libraries like zlib).

- I specified how to generate appropriate input (a sorted array of 8-bit integers, since my use case for the library involves compressing sorted, delta-encoded integer keys):
`-- Generate a random array of bytes, with some redundancy (via sorting). function gen_int_array(limit) limit = limit or 10000 local ri = lunatest.random_int local length = ri(1, limit) local ints = {} for i=1,length do -- keep them small, to increase duplication ints[i] = ri(0, 2^8 - 1) end -- group repeated values together table.sort(ints) return ints end`

- I added two basic unit tests for encoding and decoding known input:
`function test_basic_encoding_for_known_case() local ints = {1, 0, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4} local res = rle.encode(ints) local expected = {1, 0, 0, 0, 6, 2, 3, 3, 3, 4} for i=1,#expected do assert_equal(expected[i], res[i]) end end function test_basic_decoding_for_known_case() local packed = {1, 0, 0, 0, 6, 2, 3, 3, 3, 4} local expected = {1, 0, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4} local res = rle.decode(packed) for i=1,#expected do assert_equal(expected[i], res[i]) end end`

- I described four properties I expect to hold:
- Encoding and decoding should be lossless.
- Encoding should compress the data (with varying success).
- Encoding the same data multiple times should have rapidly diminishing returns, as almost all repeating values should be removed in the first pass.
- Encoded values should no longer have long runs of duplicate values.

`function test_encoding_and_decoding_should_be_lossless() assert_random({ name="encode_decode", always={ -- regressions: always test these seeds -- found test_encoding_N_zeroes's edge case 3735485075354, 2721442120304, 3956609256407, -- found test_encoding_two_zeroes's edge case 175063139413, 3771894109384, 1972533966070}, }, function () local ints = gen_int_array() local packed = rle.encode(ints) local unpacked = rle.decode(packed) local max = #ints > #unpacked and #ints or #unpacked for i=1,#ints do assert_true(unpacked[i] == ints[i]) end end) end function test_encoded_data_should_usually_decrease_in_size_or_stay_the_same() assert_random({ name="decrease" }, function () local ints = gen_int_array() local packed = rle.encode(ints) assert_true(#packed <= #ints) end) end function test_encoding_multiple_times_quickly_reach_a_fixpoint() assert_random({ name="diminshing_returns" }, function () local prev = gen_int_array() for count=1,5 do local cur = rle.encode(prev) if #cur >= #prev then return end prev = cur end fail("still shrinking after 5 cycles") end) end function test_encoded_values_should_not_have_long_runs_of_duplicate_values() assert_random({ name="decrease" }, function () local ints = gen_int_array() local packed = rle.encode(ints) local longest, cur_run, last = 0, 0, nil for i=1,#packed do if last and last == packed[i] then cur_run = cur_run + 1 else if cur_run > longest then longest = cur_run end cur_run = 0 end last = packed[i] end assert_true(longest < 3) end) end`

- I implemented my encode and decode functions. (These are the final versions.)
`-- Use literal zero for escape before compression data local esc = 0 -- Run-length-encode an array of ints. function encode(ints) local res, i, ct = {}, 1, 1 -- result, index, count while i <= #ints do local v = ints[i] while ints[i+1] == v do ct = ct + 1 i = i + 1 end -- if there's benefit in reducing a series if ct > 3 or (ct >= 2 and v == esc) then res[#res+1] = 0 res[#res+1] = ct res[#res+1] = v -- literal zeroes must be escaped elseif v == esc then for n=1,ct do res[#res+1] = esc res[#res+1] = esc end -- leave other content as-is else for n=1,ct do res[#res+1] = v end end ct = 1 i = i + 1 end return res end -- Decode an array of ints from encode(). function decode(ints) local res = {} local i = 1 while i <= #ints do local v = ints[i] -- check for escaped 0s and series markers after escapes if v == esc then if ints[i+1] == esc then res[#res+1] = esc -- escaped 0 literal i = i + 2 else local count, value = ints[i+1], ints[i+2] for n=1,count do res[#res+1] = value end i = i + 3 end else res[#res+1] = v i = i + 1 end end return res end`

- The property tests detected a couple edge cases:
- When I had exactly 2 literal zeroes, I wasn’t escaping them properly.
- I had an off-by-one error with content immediately following runs of zeroes. (Oops.)

`function test_encoding_two_zeroes() -- this should be {0, 2, 0}, not {0, 0, 0, 0}, to keep -- encoding from filling up with too many escapes. local ints = {0, 0} local res = rle.encode(ints) local expected = {0, 2, 0} for i=1,#expected do assert_equal(expected[i], res[i]) end end function test_encoding_000122() -- previously, the 1 after the 0s was dropped. local ints = {0, 0, 0, 1, 2, 2} local res = rle.encode(ints) local expected = {0, 3, 0, 1, 2, 2} for i=1,#expected do assert_equal(expected[i], res[i]) end end`

- Finally, I added those seeds to the lossless property test, identified the underlying issues, added regression tests to document them more clearly, and made the new tests pass.

Randomized testing is complementary to other automated tests: it can find edge cases in the space outside of the test coverage. While it’s important to keep potential edge cases in mind while developing, focused random stress can bring many unexpected bugs into the light.