Portable Ruby Bit-Twiddling with #unpack()

Karlin Fox and I have been playing around with using the MQTT protocol to distribute information about the availability of certain high-demand resources in our office (namely, the bathroom). For some time now we have had an Arduino-powered red/green light (nicknamed “PottyMon”) radiating occupancy information about the upstairs bathroom across our open workspace.

Most of us work upstairs in this open workspace, but we’ve grown, and for the first time, we now have some project teams working in another space downstairs. This change gave us a good excuse to embark on an educational adventure: adding network support to PottyMon.

MQTT with Arduino and Ruby

After some initial testing with HTTP polling, I decided that it would be much better to add a message broker and use a publish and subscribe pattern for distributing this information across the network. Thanks to the work of Nicholas O’Leary, publishing MQTT messages from the Arduino was not difficult. The next question was how other clients might consume these messages.

Thanks to the work of another Nicholas (Nicholas Humfrey), I was able to whip up a proof of concept Ruby script for Growl notifications without too much effort. It even seemed to work with both Ruby 1.8 and 1.9.

Faye and Portability Problems

However, Karlin recently started working on a Sinatra app to extend the pub/sub model to the web using Faye, and we discovered a problem we hadn’t encountered before. There was a section of code in the MQTT gem that my script never called, but Karlin’s project used heavily. This code deals with parsing packet headers and uses a lot of bit-masking tricks that work fine in Ruby 1.8, but fall apart completely in Ruby 1.9.

Changes to String

A bit of investigation revealed that the code was breaking on 1.9 due to changes to String (made to better support Unicode and different character encodings).

In Ruby 1.8.7, for example:

a = "hello there"
a[1]                   #=> 101

In Ruby 1.9.3, the same code gives us a completely different result:

a = "hello there"
a[1]                   #=> "e"
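To make the breakage concrete, here is a minimal sketch of what happens when a bit mask is applied to the result of #[]. The mask value 0x0F is just an illustration, not something taken from the MQTT gem. On 1.8 the mask works on the Integer 101; on 1.9 and later it is sent to a one-character String, which has no & method.

```ruby
a = "hello there"
char = a[1]            # 101 on Ruby 1.8, "e" on Ruby 1.9 and later

masked = nil
error  = nil
begin
  # Integer#& exists, String#& does not, so this line only works on 1.8.
  masked = char & 0x0F
rescue NoMethodError => e
  error = e
end

if error
  puts "bit mask failed: #{error.class}"
else
  puts "bit mask gave #{masked}"
end
```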

In 1.8, #[] indexes a String by byte, much as byteslice() does in 1.9.3 (barring details like the fact that byteslice() returns a one-byte String rather than an Integer, and that Strings in 1.9 all come with character encodings). But byteslice() is not available in 1.8, so code written with it would not be backward compatible. In many scenarios (the MQTT gem, for example), losing compatibility with 1.8 is not an option. So what’s the answer?

A Portable Solution – #unpack()

After looking at the core docs for String in 1.8 and 1.9 side by side, I discovered that I could get the behavior I wanted in both versions of Ruby by using unpack(). It’s slightly more verbose, but it’s relatively portable, and that’s what mattered to me.

According to the documentation:

unpack(format) → anArray

Decodes str (which may contain binary data) according to the 
format string, returning an array of each value extracted. The
format string consists of a sequence of single-character 
directives, summarized in the table at the end of this entry. 
Each directive may be followed by a number, indicating the 
number of times to repeat with this directive. An asterisk 
(“*”) will use up all remaining elements. 

Integer      |         |
Directive    | Returns | Meaning
   C         | Integer | 8-bit unsigned (unsigned char)
...          | ...     | ...
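As a quick illustration of the repeat count and the asterisk described above, here is a small sketch that unpacks the same four bytes in a few different ways. The byte values are made up for the example. The "n" directive (16-bit unsigned, big-endian) is not in the excerpt above; it comes from the full directive table.

```ruby
# Build four raw bytes so the example does not depend on source encoding.
data = [0x30, 0x0A, 0x00, 0x04].pack("C*")

all_bytes = data.unpack("C*")    # "*" consumes every remaining byte
first_two = data.unpack("C2")    # a count of 2 stops after two bytes
mixed     = data.unpack("C2n")   # two single bytes, then one 16-bit big-endian value

p all_bytes   #=> [48, 10, 0, 4]
p first_two   #=> [48, 10]
p mixed       #=> [48, 10, 4]
```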

So instead of using a[1], which returns 101 only on 1.8, we can use a.unpack("C*")[1] and get 101 reliably on both 1.8 and 1.9.

Ruby 1.8.7:

a = "hello there"
a.unpack("C*")[1]                   #=> 101

Ruby 1.9.3:

a = "hello there"
a.unpack("C*")[1]                   #=> 101
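The same trick carries over to the kind of header parsing that was failing. What follows is only a sketch, not the MQTT gem's actual code: parse_header_byte is a hypothetical helper, and the bit layout (message type in the top four bits, then DUP, QoS and RETAIN) is my reading of the MQTT 3.1 fixed header. Because unpack("C") always returns an Integer, the masks behave the same on 1.8 and 1.9.

```ruby
# Hypothetical helper for illustration; names and layout are assumptions.
def parse_header_byte(packet)
  byte = packet.unpack("C").first     # first byte as an Integer, 0..255
  {
    :type   => (byte & 0xF0) >> 4,    # top four bits: message type
    :dup    => (byte & 0x08) != 0,    # bit 3: duplicate delivery flag
    :qos    => (byte & 0x06) >> 1,    # bits 2-1: quality of service
    :retain => (byte & 0x01) != 0     # bit 0: retain flag
  }
end

# 0x32 is 0011 0010 in binary: type 3, DUP off, QoS 1, RETAIN off.
header = parse_header_byte([0x32, 0x05].pack("C*"))
p header[:type]   #=> 3
p header[:qos]    #=> 1
```

Hash rockets are used instead of the newer key: value syntax so the sketch stays valid on 1.8.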

With this knowledge, I was able to contribute a patch to the MQTT gem (to be included in the next release, 0.9) and get Karlin’s project to run.