Randomized Testing in Lua with Lunatest

While unit and integration testing can verify that code correctly handles known edge cases, randomized testing can detect the edge cases themselves. The programmer describes their assumptions about a function’s output and how to randomly generate representative input, then lets the test runner search for counterexamples. This is particularly useful when checking algorithms’ invariants, and when auditing software for security vulnerabilities (commonly called fuzz testing).

One of the best known implementations of randomized testing is QuickCheck, part of the Haskell standard library. It uses the Arbitrary typeclass as a hook to specify how specific data types can be generated, matches those with inferred argument types, and searches for bugs. There’s nothing specific to Haskell about the underlying idea, however — clones exist for Erlang and several other languages.

Lunatest, my testing library for Lua, also has functionality for randomized testing. Here is an example of it being used on an implementation of run-length encoding, a simple data compression algorithm that is particularly useful on embedded platforms (which usually do not have the space for libraries like zlib).

  1. I specified how to generate appropriate input (a sorted array of 8-bit integers, since my use case for the library involves compressing sorted, delta-encoded integer keys):

    -- Generate a random array of bytes, with some redundancy (via sorting).
    function gen_int_array(limit)
        limit = limit or 10000
        local ri = lunatest.random_int
        local length = ri(1, limit)
        local ints = {}
        for i=1,length do
            -- keep them small, to increase duplication
            ints[i] = ri(0, 2^8 - 1)
        end
        -- group repeated values together
        table.sort(ints)
        return ints
    end
    

  2. I added two basic unit tests for encoding and decoding known input:

    function test_basic_encoding_for_known_case()
        local ints = {1, 0, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4}
        local res = rle.encode(ints)
        local expected = {1, 0, 0, 0, 6, 2, 3, 3, 3, 4}
    
        for i=1,#expected do
            assert_equal(expected[i], res[i])
        end
    end
    
    function test_basic_decoding_for_known_case()
        local packed = {1, 0, 0, 0, 6, 2, 3, 3, 3, 4}
        local expected = {1, 0, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4}
        local res = rle.decode(packed)
        
        for i=1,#expected do
            assert_equal(expected[i], res[i])
        end
    end
    

  3. I described four properties I expect to hold:
    • Encoding and decoding should be lossless.
    • Encoding should compress the data (with varying success).
    • Encoding the same data multiple times should have rapidly diminishing returns, as almost all repeating values should be removed in the first pass.
    • Encoded values should no longer have long runs of duplicate values.

    function test_encoding_and_decoding_should_be_lossless()
        assert_random({ name="encode_decode",
                        always={ -- regressions: always test these seeds
                            -- found test_encoding_N_zeroes's edge case
                            3735485075354, 2721442120304, 3956609256407,
                            -- found test_encoding_two_zeroes's edge case
                            175063139413, 3771894109384, 1972533966070},
                    },
                      function ()
                          local ints = gen_int_array()
                          local packed = rle.encode(ints)
                          local unpacked = rle.decode(packed)
                          
                          local max = #ints > #unpacked and #ints or #unpacked
                          
                          for i=1,#ints do
                              assert_true(unpacked[i] == ints[i])
                          end
                      end)
    end
    
    function test_encoded_data_should_usually_decrease_in_size_or_stay_the_same()
        assert_random({ name="decrease" },
                      function ()
                          local ints = gen_int_array()
                          local packed = rle.encode(ints)
                          assert_true(#packed <= #ints)
                      end)
    end
    
    function test_encoding_multiple_times_quickly_reach_a_fixpoint()
        assert_random({ name="diminshing_returns" },
                      function ()
                          local prev = gen_int_array()
                          for count=1,5 do
                              local cur = rle.encode(prev)
                              if #cur >= #prev then return end
                              prev = cur
                          end
                          fail("still shrinking after 5 cycles")
                      end)
    end
    
    function test_encoded_values_should_not_have_long_runs_of_duplicate_values()
        assert_random({ name="decrease" },
                      function ()
                          local ints = gen_int_array()
                          local packed = rle.encode(ints)
                          local longest, cur_run, last = 0, 0, nil
                          for i=1,#packed do
                              if last and last == packed[i] then
                                  cur_run = cur_run + 1
                              else
                                  if cur_run > longest then
                                      longest = cur_run
                                  end
                                  cur_run = 0
                              end
                              last = packed[i]
                          end
                          assert_true(longest < 3)
                      end)
    end
    

  4. I implemented my encode and decode functions. (These are the final versions.)

    -- Use literal zero for escape before compression data
    local esc = 0
    
    -- Run-length-encode an array of ints.
    function encode(ints)
        local res, i, ct = {}, 1, 1  -- result, index, count
        while i <= #ints do
            local v = ints[i]
            while ints[i+1] == v do
                ct = ct + 1
                i = i + 1
            end
            
            -- if there's benefit in reducing a series
            if ct > 3 or (ct >= 2 and v == esc) then
                res[#res+1] = 0
                res[#res+1] = ct
                res[#res+1] = v
            -- literal zeroes must be escaped
            elseif v == esc then
                for n=1,ct do
                    res[#res+1] = esc
                    res[#res+1] = esc
                end
            -- leave other content as-is
            else
                for n=1,ct do
                    res[#res+1] = v
                end
            end
            ct = 1
            i = i + 1
        end
        return res
    end
    
    -- Decode an array of ints from encode().
    function decode(ints)
        local res = {}
        local i = 1
        while i <= #ints do
            local v = ints[i]
    
            -- check for escaped 0s and series markers after escapes
            if v == esc then
                if ints[i+1] == esc then
                    res[#res+1] = esc     -- escaped 0 literal
                    i = i + 2
                else
                    local count, value = ints[i+1], ints[i+2]
                    for n=1,count do res[#res+1] = value end
                    i = i + 3
                end
            else
                res[#res+1] = v
                i = i + 1
            end
        end
        return res
    end

  5. The property tests detected a couple edge cases:
    • When I had exactly 2 literal zeroes, I wasn’t escaping them properly.
    • I had an off-by-one error with content immediately following runs of zeroes. (Oops.)

    function test_encoding_two_zeroes()
        -- this should be {0, 2, 0}, not {0, 0, 0, 0}, to keep
        -- encoding from filling up with too many escapes. 
        local ints = {0, 0}
        local res = rle.encode(ints)
        local expected = {0, 2, 0}
    
        for i=1,#expected do
            assert_equal(expected[i], res[i])
        end
    end
    
    function test_encoding_000122()
        -- previously, the 1 after the 0s was dropped.
        local ints = {0, 0, 0, 1, 2, 2}
        local res = rle.encode(ints)
        local expected = {0, 3, 0, 1, 2, 2}
    
        for i=1,#expected do
            assert_equal(expected[i], res[i])
        end
    end
    

  6. Finally, I added those seeds to the lossless property test, identified the underlying issues, added regression tests to document them more clearly, and made the new tests pass.

Randomized testing is complementary to other automated tests: it can find edge cases in the space outside of the test coverage. While it’s important to keep potential edge cases in mind while developing, focused random stress can bring many unexpected bugs into the light.