heatshrink: An Embedded Data Compression Library

In embedded systems, space is always tight. Adding pennies of extra storage can be enough to kill a budget (when multiplied by hundreds of thousands or even millions of units), so available space has to be used effectively.

For my current project, I need to fit a lot of data into 16 MB of flash: a backup of the firmware, 9 MB (!) of graphical assets (including bitmapped Chinese fonts with over 30,000 characters), and other configuration data for the unit. On top of this, I need space for a second version of everything during updates. There’s no way this could fit as-is.

Data compression has obvious benefits for embedded systems, but I was surprised by the lack of suitable libraries. There are well-known open-source libraries such as zlib, but I found none that could work in my resource-constrained environment. Existing options were too large for my program ROM (zlib), GPL'd (lzo), required too much working memory (zlib, again), required an actual filesystem, or hijacked control flow by blocking and doing everything at once (lzjb).

heatshrink for Embedded Data Compression

I decided to write my own, heatshrink. It’s released under a BSD license, it can run in very small amounts of memory (under 50 bytes for practical decompression), and it can work in small increments while addressing the other needs of a hard real-time system. (For my project, it reduces the graphical assets from 9 MB to about 6.5 MB, so everything can fit.)

The Basics

I chose to base it on LZSS because it needs little working space beyond a cache of the last 2^N bytes of data (where N is configurable), and decompression is quite simple. The interface is also pretty simple: dump data in, keep turning the crank until no more data comes out, dump more in, repeat. Notify the encoder/decoder when the end of input is reached, and keep cranking until it's finished. It can suspend on individual bit I/O boundaries as necessary, so by adjusting buffer sizes, work can be done in configurably brief intervals.

Size Requirements

Its buffer size parameter allows an explicit trade-off between compression effectiveness and working memory. Beyond the space used for I/O buffering (and it can work a single byte at a time, if necessary), its space requirements are modest:

Encoding:

16 + 2 * 2^N bytes, plus more for an optional search index (currently an additional 2 * 2^N bytes) to speed up encoding.

Decoding:

16 + 2^N bytes to decode, where N matches the setting used when encoding.

Results

It also compares reasonably with libraries such as zlib/gzip. For example, here are the results for the Canterbury Corpus, a set of documents used to compare the effectiveness of compression algorithms.

Document       gzip      HS 13,4   HS 11,4   HS 8,4    HS 6,3    HS 5,3
Memory used    > 128 KB  8,208 B   2,064 B   272 B     80 B      48 B

alice29.txt    64.2 %    54.90 %   48.02 %   26.50 %    5.82 %   -2.79 %
asyoulik.txt   60.9 %    50.51 %   44.23 %   24.87 %    5.63 %   -3.34 %
cp.html        67.6 %    57.63 %   52.03 %   35.67 %   -2.26 %   -4.68 %
fields.c       72.1 %    64.68 %   63.36 %   49.52 %   26.65 %   10.98 %
grammar.lsp    67.5 %    56.87 %   59.61 %   52.43 %   32.71 %   14.92 %
kennedy.xls    79.9 %    71.65 %   73.10 %   74.57 %   68.89 %   69.96 %
lcet10.txt     66.1 %    56.53 %   48.55 %   25.40 %    5.21 %   -2.13 %
plrabn12.txt   59.5 %    48.57 %   41.37 %   21.65 %    6.00 %   -5.61 %
ptt5           89.0 %    77.74 %   78.79 %   79.16 %   70.84 %   71.88 %
xargs.1        59.3 %    47.76 %   48.76 %   30.42 %   11.00 %    1.70 %
everything     73.8 %    64.40 %   61.91 %   53.23 %   41.27 %   37.71 %

"HS 11,4" means heatshrink was run with a window size of 2^11 and a lookahead of 2^4 bits (command-line arguments of "-w 11 -l 4"), and the percentages represent size reduction (more is better). HS 8,4 is a reasonable default for embedded systems. If more memory is available, a larger window size allows better compression, but even in a severely constrained environment, HS 5,3 is still effective on certain kinds of data (such as bitmapped images or sparse sensor data). The memory used to decompress does not include the decoder's input buffer size, but that can be as small as 1 byte.

When compiled for an Arduino microcontroller with avr-gcc, the decoder only needs about 1 KB of program ROM space. (I'm working on making it smaller, and this will vary from platform to platform. On my project's hardware platform, it's under 800 bytes.)

You can download heatshrink from our GitHub page.

Scott Vokes

13 Comments

  1. Fabian
    Posted March 14, 2013 at 9:09 am

    Very nice project. (At work I had to write a similar compression library; unfortunately I'm not allowed to open-source it…)

    Have you thought about also adding an (optional?) Huffman coding pass? With our data it didn't need much RAM, and we were able to increase the compression ratio. (In our tests, a combination of the two algorithms gave a better compression-ratio-to-RAM trade-off.)

    • Scott Vokes
      Posted March 14, 2013 at 9:45 am

      Thanks!

      I thought about including Huffman coding (or using an LZ78 variant instead of LZSS), but reducing RAM, program ROM, and blocking time for individual operations is a higher priority for heatshrink than absolute compression effectiveness. Any change needs to justify the added program ROM space, particularly in the decoder. I will likely try Huffman coding, arithmetic coding, etc. later, but they will also add to the state that needs to be tracked as (de)compression is suspended and resumed. LZSS (on its own) seems to be in a sweet spot for compression effectiveness vs. program size and working memory. I'd love to hear more details about your project, though.

      Currently, I'm focusing on smarter indexing to speed up the encoder. I have tried a couple of approaches, but nothing has improved on my one-pass, single-byte flattened linked list indexing so far without taking quite a bit more space.

      • Trent R
        Posted March 14, 2013 at 2:47 pm

        If you want to use Huffman, my advice would be to store the tree, not the symbol frequencies, in the header. You only store the frequencies so that you can reconstruct the tree from the encoded file. When I recently did Huffman, I found you save significant space by only dealing with the tree. This becomes very clear when you are dealing with small files.

  2. Fabian
    Posted March 14, 2013 at 10:04 am

    We had to compress mostly text which would be displayed on a 3×16-character display. So the big part was text with a very limited set of characters (around 80 possible). Mostly we would encode the data beforehand on a computer (during the build) and decode it on the device.

    For one of our datasets I needed 117 bytes of RAM for the Huffman tree (we have a fixed Huffman tree which is added to the encoded data) and 80 bytes to store the tree with the encoded data.

    But if your dataset uses all possible byte values, the Huffman tree will be much bigger => more RAM and ROM consumption.

    • Scott Vokes
      Posted March 14, 2013 at 10:21 am

      Indeed. I’m assuming general binary data, so the space will probably be prohibitive.

      117 bytes is pretty good, too. :) Too bad you couldn’t open-source it.

  3. Fabian
    Posted March 14, 2013 at 10:41 am

    It would probably not be very universally usable code, because our system has very strange limitations (an old and strange CPU with a limited compiler, and a lot of rules).

    The tree was saved as a stream of entries.
    The first bit of each entry was used to indicate its type (node or leaf).
    A leaf has only 1 byte of data.
    A node has 2 numbers (left and right) giving the offset, in entries, to the next node.

    So we had a slow (it needed to walk the Huffman data structure for each symbol) but small (in RAM) implementation.

  4. Bryce Schober
    Posted March 14, 2013 at 2:21 pm

    Did you check out the lz4 compressor? It's pretty low-level and blazingly fast. I'm not sure how much memory it consumes (the source suggests that I ran it w/ 16 kB), but it seems more comparable to gzip in compression performance than yours… It also has some interesting repetitive-block compression characteristics that have to be seen to be believed. In my own experimentation, a CSV file that was originally compressed to ~50% of original got down to ~24% of original by adding characters to achieve fixed-width fields.

    • Scott Vokes
      Posted March 14, 2013 at 2:49 pm

      Unfortunately, 16kB is probably >15kB too much in this case.

      The LZ4 page says, "It trades CPU for compression ratio." I'm instead trading compression ratio for low memory usage and adding little program space. Very different trade-offs, but heatshrink is designed primarily for microcontrollers (e.g. Arduinos), not typical computers. They mention LZ4's indexing "needing only 4K", but my optional index can get by with 512 bytes in typical use. The whole micro may only have 4K RAM. LZ4 sounds very interesting, though, and the throughput is impressive. It just doesn't fit my use case.

      I experimented with an explicit literal length, too (rather than a 1-bit tag per literal byte), but once the sliding window was full, it was usually a wash. It wasn’t worth the extra complexity / compiled code space.

      (Also, the patent situation with LZSS is clearly safe; newer stuff is usually unclear.)

  5. Posted March 14, 2013 at 4:03 pm

    This is great, thank you for posting it!

    If you’re sending updates, do you think differential compression could be feasible at this scale, or is that a ridiculous idea?

    • Scott Vokes
      Posted March 14, 2013 at 4:13 pm

      Thanks.

      I’ve thought about that problem quite a bit. My tangram could potentially be adapted to run in an embedded context (the working memory is fairly small, aside from a buffer for the current content chunk), as long as it has sufficient external storage. Any approach will need that anyway, though.

  6. Daniel
    Posted May 10, 2013 at 8:11 am

    Great project, I was happy I found it.
    Unfortunately I have run into an issue: have you tried the round-trip symmetry of the command-line tool? I compiled heatshrink.c with gcc and tried the resulting executable via

    $ heatshrink -e -v test.bin test.bin.z
    test.bin 32.01 % 100013 -> 68001 (-w 11 -l 4)

    and then

    $ heatshrink -d -v test.bin.z test.bin.unz
    test.bin.z -47.40 % 68001 -> 100235 (-w 11 -l 4)

    As you can already see from the output, the file sizes are different. When diffing the original and the de-en-compressed file, it is also obvious that they are significantly different, so it's not just a few trailing bytes.

    • Scott Vokes
      Posted May 10, 2013 at 8:25 am

      Are you able to send me the file, or otherwise give me a detailed description how to reproduce the issue? My contact info is on my github page.

    • Scott Vokes
      Posted May 12, 2013 at 9:15 pm

      With private communication, I was able to quickly reproduce the issue, and it should be fixed now. (Edit: It’s fixed.)