All atomic-powered posts filed in “Technologies”:
Working with embedded CDATA in XML documents
Recently, while working on the SME Toolkit, a project sponsored by the International Finance Corporation (a member of the World Bank Group), I encountered a problem with CDATA sections in XML documents.
CDATA sections are used in markup languages to identify general character data -- data that should only be interpreted as characters, and not as specialized markup or commands. In XML, CDATA sections allow XML markup to be embedded, but not interpreted as part of the XML document itself.
For example, CDATA would allow XHTML to be embedded inside a larger XML document without treating the XHTML as part of the parent document:

Unfortunately, there seems to be a great deal of confusion about the proper usage of CDATA sections. This is probably because they are not often worked with, and the CDATA markers behave differently than traditional XML tags. CDATA sections are defined as beginning with the character sequence <![CDATA[ and ending with the first occurrence of the character sequence ]]>. This means that CDATA sections cannot be 'nested' hierarchically like XML tags, because any occurrence of the ending CDATA marker will terminate the open CDATA section.
This means that the following XML document is not well-formed because the first occurrence of "]]>" within the style section of the embedded XHTML document terminates the first CDATA section, leaving half of the embedded XHTML document to be considered as part of the larger XML document.

The preferred solution to this problem is to break up the CDATA end markers when nesting them in a new XML document by inserting markers to close and re-open a CDATA section. Then, when the combined CDATA sections are interpreted, the original CDATA markers will be restored. This is accomplished by writing each embedded end marker as the following character sequence: ]]]]><![CDATA[> (the first "]]" stays in the current section, "]]>" closes it, and "<![CDATA[" opens a new section that begins with the remaining ">").
Essentially, while CDATA sections cannot be nested, it is possible to escape ending CDATA markers to prevent a CDATA section from being prematurely terminated during parsing. In the example above, parsing of the parent or container XML document will combine the two separate, yet adjacent, CDATA sections into a single set of general character data as intended, preserving the embedded CDATA markers. The nature of the embedded data will be preserved without having it mistakenly treated as part of the XML markup.
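The splitting technique can be checked with a few lines of Python. This is only a sketch: the helper name wrap_in_cdata and the sample markup are made up for illustration, and the parser is the one from Python's standard library rather than whatever the SME Toolkit used.

```python
import xml.etree.ElementTree as ET

CDATA_END = "]]>"
# Keep "]]" in the current section, close it, then reopen a section starting with ">".
CDATA_SPLIT = "]]]]><![CDATA[>"


def wrap_in_cdata(text):
    """Wrap text in a CDATA section, splitting any embedded "]]>" markers."""
    return "<![CDATA[" + text.replace(CDATA_END, CDATA_SPLIT) + "]]>"


# Hypothetical embedded fragment that already contains its own CDATA section.
inner = "<style><![CDATA[ p { color: red; } ]]></style>"
document = "<wrapper>" + wrap_in_cdata(inner) + "</wrapper>"

# The parser joins the adjacent CDATA sections back into one run of text.
recovered = ET.fromstring(document).text
print(recovered == inner)
```

Without the replace() call, the same document fails to parse, because the embedded "]]>" ends the outer section early.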
Further Reading:
- The XML Standard, by the W3C:
Extensible Markup Language (XML) 1.0 (Fifth Edition) § 2.4 Character Data and Markup
- The Joel on Software Discussion Group, by Joel on Software:
Nesting CDATA sections
Native App Vs. Mobile Friendly Web Application
There are two main ways to create mobile applications. The following post lays out advantages and disadvantages of both approaches.
Native Apps – applications that are installed directly on smart phone devices (iPhone, Android, Blackberry, etc.).
Advantages:
- Cool Factor – “There’s an app for that.” Being in the Apple App Store or Google Android App Market is great marketing for an organization.
- Application Icon – When an app is installed its icon is placed onto the user’s smart phone desktop.
- Experience – Native apps are generally faster and more fun to use.
- Hardware Access – Native apps can easily take advantage of a smart phone’s GPS or camera.
- Offline Mode – App features can be developed that do not require an internet connection.
Disadvantages:
- Many Platforms – Each application is unique to its platform. For example, an iPhone app will only work on an iPhone. If you want it to also work on a Blackberry, you will need to create another application tailored to Blackberry. New frameworks are being developed to help ease this pain.
- Many Versions – When a new version of an existing native app is released, the users of the native app will need to download and install the update. People are not forced to update; therefore there will be multiple versions of the application in production.
Mobile Friendly Web Application – web application that is easily viewable and usable on a smart phone.
Advantages:
- Reaches everyone – Anyone who has an internet-enabled phone can view the application.
- Web application already exists – If a web application already exists it can be updated to accommodate mobile phones.
- One Version – Updating the website updates all the application users.
Disadvantages:
- Not as cool – There is no app store or app icon. You need to access the application through your mobile browser.
- Must be online – The application only works when you have internet connectivity.
- Limited hardware – Although it is technically possible to access some smart phone hardware from a website, it is not as seamless.
- Optimized look and feel – Each smart phone has its own look and feel and screen dimensions. Optimizing the web experience for specific smart phones requires implementing various mobile stylesheets.
The correct approach in any given situation depends upon the experience you are trying to achieve. It is important to note that this decision is not mutually exclusive. In some cases it makes sense to do both.
On the Importance of Character Sets and Character Encodings in MySQL
When transmitting and storing digital data, one of the most important considerations should be the character encoding. Unfortunately, this rarely seems to be on anyone's mind when setting up a database or making a database connection. For the most part, the defaults are just expected to work and provide the best set of options. With regard to character encodings (in any context), this is a dangerous approach.
In MySQL, the default character set is Latin-1 (this holds for releases before MySQL 8.0). As a reminder, Latin-1 is an 8-bit, single-byte character encoding capable of representing 256 values. This would be awesome if you only ever had to represent characters from the Latin alphabet, and would never store or retrieve characters outside of the Latin-1 character set. Unfortunately, in a world driven by the Internet, this is almost never the case, and it causes problems.
Why? Well, because the default MySQL character set is Latin-1, any characters not within that character set may not be properly stored (or retrieved). This often doesn't occur to developers in the U.S. because nearly everything is represented in characters from the Latin alphabet anyway. However, should you try to store (or retrieve) something not in the standard Latin-1 character set, there are often problems.
For instance, let's create a sample database on a new MySQL server installation from a UTF-8 client:
mysql> SET NAMES utf8;
mysql> CREATE DATABASE mydatabase;
mysql> USE mydatabase;
mysql> CREATE TABLE `mytable` (`id` int(11) NOT NULL AUTO_INCREMENT,
`name` text, PRIMARY KEY (`id`));
Our database has been created, and is using the Latin-1 character set, as expected:
mysql> SHOW CREATE DATABASE mydatabase;
+------------+-----------------------------------------------------------------------+
| Database   | Create Database                                                       |
+------------+-----------------------------------------------------------------------+
| mydatabase | CREATE DATABASE `mydatabase` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+------------+-----------------------------------------------------------------------+
mysql> SHOW CREATE TABLE mytable;
+---------+--------------------------------------------------------------------------+
| Table   | Create Table                                                             |
+---------+--------------------------------------------------------------------------+
| mytable | CREATE TABLE `mytable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` text,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+---------+--------------------------------------------------------------------------+
Now let's try to insert some data:
- Standard Latin-1:
mysql> INSERT mytable (name) VALUES ("abc");
- UTF-8 (Greek):
mysql> INSERT mytable (name) VALUES ("αβγ");
Now let's try to retrieve our data:
mysql> SELECT * FROM mytable;
+----+------+
| id | name |
+----+------+
|  1 | abc  |
|  2 | ???  |
+----+------+
Well, our UTF-8 data doesn't look right, does it? No, not at all. In fact, our data is gone. Permanently. Because the database is Latin-1, and our UTF-8 characters don't exist in Latin-1, MySQL simply replaced each of our UTF-8 characters with a question mark ("?"), which is supposed to signify that the character was not understood, and not properly converted. However, this is little help when you are trying to retrieve data from the database at a later time.
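The same kind of loss can be reproduced outside of MySQL. The following Python sketch only mimics the effect with Python's own codecs (it does not exercise MySQL's conversion code), and the helper name fits_latin1 is made up for illustration.

```python
def fits_latin1(text):
    """Return True if every character in text exists in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


greek = "αβγ"

# UTF-8 needs two bytes for each of these Greek letters.
utf8_bytes = greek.encode("utf-8")

# Forcing the text into Latin-1 substitutes "?" for every unknown character.
latin1_lossy = greek.encode("latin-1", errors="replace")

# Decoding the stored bytes cannot bring the original characters back.
round_trip = latin1_lossy.decode("latin-1")

print(len(utf8_bytes), latin1_lossy, round_trip)
print(fits_latin1("abc"), fits_latin1(greek))
```

A check like fits_latin1 is one way to find out, before inserting, whether a value would survive a Latin-1 column.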
In practice, rarely will developers set up a MySQL database and send all non-Latin-1 characters to it. Usually most of the characters will be Latin-1, with an odd UTF-8 character thrown in. These UTF-8 characters may be forever lost or corrupted, but the Latin-1 characters are just fine. Because these UTF-8 characters may be rarely used, their loss or corruption may not be noticed for some time (if at all). This varied behavior contributes to the lack of awareness and understanding about properly configuring character encodings in general. It should be noted that this is only one specific example of the data corruption and loss that can occur due to improperly configured character encodings -- many different variants can and do occur.
So, what is one to do about this problem? The answer is really very simple: always use the correct character set all the time. From a practical perspective, this should mean always using UTF-8 for everything. Why? Because that is the way the world is trending -- the Internet is international, and nearly all locales except the U.S. and Western Europe rely upon UTF-8 (or some other form of Unicode) to represent characters all the time. If anyone hopes to serve an international or Internet audience, the character encoding of choice is UTF-8.
So, how is this accomplished in MySQL? Generally, the MySQL server itself should be configured to use UTF-8 as the default character set.
- This can be done by inserting the following line into the MySQL configuration file, usually /etc/my.cnf (MySQL 5.5 and later expect character-set-server = utf8 in the [mysqld] section instead):
[mysqld]
...
default-character-set = utf8
...
- If the server configuration file isn't accessible, you must specify the correct character set at database creation:
mysql> CREATE DATABASE mytest2 DEFAULT CHARACTER SET utf8;
- You can also specify the correct character set at table creation:
mysql> CREATE TABLE `mytable2` (`id` int(11) NOT NULL AUTO_INCREMENT, `name` text, PRIMARY KEY (`id`)) CHARACTER SET utf8;
- Alternatively, you can specify the correct character set on a per-column basis at table creation:
mysql> CREATE TABLE `mytable2` (`id` int(11) NOT NULL AUTO_INCREMENT, `name` text CHARACTER SET utf8, PRIMARY KEY (`id`));
Existing character sets for servers, databases, tables, and columns can be altered, but this poses a risk for further corrupting or damaging existing data.
Our database and table have both been created, using the UTF-8 character set as specified:
mysql> SHOW CREATE DATABASE mytest2;
+----------+------------------------------------------------------------------+
| Database | Create Database                                                  |
+----------+------------------------------------------------------------------+
| mytest2  | CREATE DATABASE `mytest2` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+------------------------------------------------------------------+
mysql> SHOW CREATE TABLE mytable2;
+----------+------------------------------------------------------------------+
| Table    | Create Table                                                     |
+----------+------------------------------------------------------------------+
| mytable2 | CREATE TABLE `mytable2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` text CHARACTER SET utf8,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+----------+------------------------------------------------------------------+
We can now insert UTF-8 data into the database without a problem:
mysql> INSERT mytable2 (name) VALUES ("αβγ");
mysql> SELECT * FROM mytable2;
+----+--------+
| id | name |
+----+--------+
| 1 | αβγ |
+----+--------+
It is important to note that clients and their connections to MySQL server also have their own character sets. These should also always be the same as the server, database, and table: UTF-8. The MySQL client will often try to establish a connection to the MySQL server using the default character set (Latin-1), so it must sometimes be specifically set to UTF-8.
- On the MySQL client command line, this can be accomplished by setting the following variable:
mysql> SET NAMES utf8;
- Certain other MySQL clients must also specifically be told to use UTF-8. For example, in Ruby on Rails, the database.yml file should specify UTF-8:
production:
  adapter: mysql
  database: mydatabase
  username: myuser
  password: mypass
  host: mydb
  encoding: utf8
The SQL samples shown here are really intended to illustrate the importance of using the proper character set on MySQL server, and on MySQL clients. These are just examples, and not total solutions. You should do proper research on the character encoding of the server and clients that you utilize. Always back up data and use caution when trying to change the character encoding used in a production database.
Further Reading:
- MySQL Server Reference, by MySQL:
Specifying Character Sets and Collations
- Blue Box Group, by Nathan Kaiser:
Getting out of MySQL Character Set Hell
- The Unicode standard and associated documentation, by The Unicode Consortium:
What is Unicode?
Undelete!
I was working on a server this morning and accidentally deleted an important configuration file. Like many Linux users, I lamented the absence of an “undelete” command. The file wasn’t still open by any processes, wasn’t present in the backups, and would be painful to recreate.
Fortunately, not all hope was lost. When a file is deleted from a hard drive, the blocks are freed, but not actually cleared. The data remains on disk, but it cannot be directly accessed and is in danger of being overwritten. Recovery is a matter of search and rescue.
Since the file I was hoping to recover was a text file, and I knew a fair amount about it (such as approximate file size and some text that was definitely going to be included), finding it actually turned out to be a fairly simple task using grep:
grep -a -B 25 -A 100 'some string in the file' /dev/sda1 > results.txt
Here’s what the command does:
grep searches through a file and prints out all the lines that match some pattern. Here, the pattern is some string that is known to be in the deleted file. The more specific this string can be, the better. The file being searched by grep (/dev/sda1) is the partition of the hard drive the deleted file used to reside in. The “-a” flag tells grep to treat the hard drive partition, which is actually a binary file, as text. Since recovering the entire file would be nice instead of just the lines that are already known, context control is used. The flags “-B 25 -A 100” tell grep to print out 25 lines before a match and 100 lines after a match. Be conservative with estimates on these numbers to ensure the entire file is included (when in doubt, guess bigger numbers). Excess data is easy to trim out of results, but if you find yourself with a truncated or incomplete file, you need to do this all over again. Finally, the “> results.txt” instructs the computer to store the output of grep in a file called results.txt.
Once the command is done, results.txt will probably contain lots of gibberish, but if you’re lucky, the contents of the deleted file will be intact and recoverable.
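For cases where grep is not convenient, the same search-with-context idea can be sketched in Python. This is an illustration rather than a recovery tool: the function names, the byte-based context sizes (grep counts lines instead), and the demo file contents are all assumptions, and the demo runs against a small temporary file rather than a real partition.

```python
import os
import tempfile


def find_offsets(path, needle, chunk_size=1 << 20):
    """Return the byte offset of every occurrence of needle in the file."""
    if not needle:
        raise ValueError("needle must not be empty")
    offsets = []
    tail = b""
    base = 0  # file offset of the first byte of tail + chunk
    keep = len(needle) - 1  # bytes carried over so boundary matches are seen
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            start = 0
            while True:
                i = window.find(needle, start)
                if i < 0:
                    break
                offsets.append(base + i)
                start = i + 1
            tail = window[-keep:] if keep else b""
            base += len(window) - len(tail)
    return offsets


def read_context(path, offset, needle_len, before, after):
    """Read the match plus `before` bytes ahead of it and `after` bytes behind it."""
    start = max(0, offset - before)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read((offset - start) + needle_len + after)


# Demo on a stand-in "disk": filler bytes around a hypothetical config file.
content = b"# config\nServerName example.test\nPort 8080\n"
demo_path = os.path.join(tempfile.mkdtemp(), "disk.img")
with open(demo_path, "wb") as f:
    f.write(b"\x00" * 1010 + content + b"\x00" * 1010)

needle = b"ServerName"
offsets = find_offsets(demo_path, needle, chunk_size=64)
recovered = read_context(demo_path, offsets[0], len(needle), before=16, after=64)
print(offsets, content in recovered)
```

As with the grep flags, it is safer to ask for too much context and trim the result by hand afterwards.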
To help prevent this problem from happening in the first place, many people elect to alias the rm command to a script which will move files to a temporary location, like a trash bin, instead of actually deleting them.
Researching HTML5 Offline
I spent much of the last week prototyping and researching offline web application technologies. I am going to share some of my notes and a few useful links the team came across in the process.
A Ruby on Rails web application we developed needs its main feature to work when there is no internet available. Our only browser requirements are that the application must work on the most recent releases of Firefox (3.6+), Safari (5.0+) and Internet Explorer (8). Our goal is to stick to HTML5 standard offline technologies as much as possible.
Cache Manifest
If a web application is going to function properly without access to its web server, the browser needs to have a local cache of any resource it might need. The following snippet from the Let’s Take This Offline chapter of the outstanding Dive Into HTML5 online book describes an offline web application well:
At its simplest, an offline web application is a list of URLs – HTML, CSS, JavaScript, images, or any other kind of resource. The home page of the offline web application points to this list, called a manifest file, which is just a text file located elsewhere on the web server. A web browser that implements HTML5 offline applications will read the list of URLs from the manifest file, download the resources, cache them locally, and automatically keep the local copies up to date as they change. When the time comes that you try to access the web application without a network connection, your web browser will automatically switch over to the local copies instead.
HTML5 lets you specify which URLs a browser needs to cache in a cache manifest file. You can specify URLs that need to be cached, URLs that should never be cached (always attempt to hit the server), and even a URL to show as a fallback when a non-cached page is requested offline. The details of the cache manifest file are discussed in great detail in the previously mentioned chapter.
All of the latest browsers support the cache manifest file with the lone exception being Internet Explorer 8. Since we need to support IE8, we looked into the recently deprecated Google Gears as an alternative. Not surprisingly Gears uses a similar manifest file to specify which resources need to be downloaded before the application can work offline. The biggest downside to using Google Gears is that the user will need to install it before they can use our application offline. Unfortunately it looks like there might not be a better choice for IE8 users.
Getting a working prototype up and running with HTML5 and/or Google Gears was pretty trivial. We used Sinatra for our prototyping. Here are some things we ran into and concerns we have going forward:
- The cache manifest file must be served with the MIME type text/cache-manifest. Our Sinatra app serves up a static “manifest.cache” file so we added the following to our app:
mime_type :cache, 'text/cache-manifest'
- To support both HTML5 and Google Gears we will need two different versions of the manifest file. In addition, we don’t want to have to manually update a manifest file when there are new images/stylesheets/javascript files. This means dynamically generating most, if not all, of the manifest. Rack::Offline looks like it could be useful, but it appears to only support Rails 3.
- Once a browser has downloaded all of your site’s offline resources it will only update them when the manifest file changes. This is a royal pain when you are used to being able to make a change to your HTML or javascript and hit refresh to see the changes. Unless you also update something in the manifest file (like a version number) you will not see any changes. You also need to hit refresh twice. The first time the browser will detect the change to the manifest and re-download all of the resources. The second time it will display the page with the new resources. We are going to have to come up with something to make this less painful during development.
- If you have an error in your cache manifest the browser can silently fail to use it. This leads to much developer confusion. This tutorial discusses some javascript that can be used to monitor the state of the application cache.
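To make the idea of a dynamically generated manifest concrete, here is a rough Python sketch (our prototype used Sinatra, so this is not the code we ran). The function name build_manifest, the resource URLs, and the choice of a content hash as the version comment are all assumptions for illustration. Because the version line is derived from the resource contents, the manifest text changes whenever a cached file changes, which is what prompts the browser to re-download.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple


def build_manifest(resources: Dict[str, bytes],
                   network: Iterable[str] = ("*",),
                   fallback: Optional[Tuple[str, str]] = None) -> str:
    """Build cache manifest text for the given URL -> content mapping."""
    digest = hashlib.sha1()
    for url in sorted(resources):
        digest.update(url.encode("utf-8"))
        digest.update(resources[url])

    lines = ["CACHE MANIFEST", "# version " + digest.hexdigest()[:12], ""]
    lines.append("CACHE:")
    lines.extend(sorted(resources))
    lines += ["", "NETWORK:"]
    lines.extend(network)
    if fallback is not None:
        # FALLBACK entries pair a URL prefix with the page to show offline.
        lines += ["", "FALLBACK:", fallback[0] + " " + fallback[1]]
    return "\n".join(lines) + "\n"


# Hypothetical resources; only the JavaScript differs between the two builds.
before = build_manifest({"/app.js": b"v1", "/app.css": b"body{}"},
                        fallback=("/", "/offline.html"))
after = build_manifest({"/app.js": b"v2", "/app.css": b"body{}"},
                       fallback=("/", "/offline.html"))
print(before)
print(before != after)
```

The generated text still has to be served with the text/cache-manifest MIME type.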
HTML5 Storage
In addition to needing our application’s pages to be displayed when offline, we also need to be able to save changes a user makes while offline. Again from the Dive Into HTML5 book:
So what is HTML5 Storage? Simply put, it’s a way for web pages to store named key/value pairs locally, within the client web browser. Like cookies, this data persists even after you navigate away from the web site, close your browser tab, exit your browser, or what have you. Unlike cookies, this data is never transmitted to the remote web server (unless you go out of your way to send it manually). Unlike all previous attempts at providing persistent local storage, it is implemented natively in web browsers, so it is available even when third-party browser plugins are not.
All of the latest browsers, including Internet Explorer, support HTML5 Storage (also known as Local Storage). Our prototype application stored a few JSON documents and allowed the user to make changes. For example:
var developer = JSON.parse(localStorage.getItem("developer") || "{}");
developer["company"] = "Atomic Object";
localStorage.setItem("developer", JSON.stringify(developer));
We didn’t run into any issues using HTML5 Storage on any of the browsers we tested. Our main concerns with this aspect of the offline application are:
- How do we handle synchronizing offline changes when the user goes back online?
- All browsers currently limit a single site to 5 megabytes of local HTML5 storage. We are going to have to make sure we stay under that limit to prevent any loss of data.
Using Sparse Files as Disks for Networked RAID
In Unix and its variants, devices (disks, peripherals) are treated as files. Modern Linux distributions mount /dev at boot using the devtmpfs file-system and populate the device files dynamically (based on what is present on the system) using udev. Listing the /dev directory shows that the device files appear just like any other file.
Devices are files. That much we understand, but can files be devices, and if so, how?
In this article we look at creating sparse files, assigning them to loop devices and placing them in a software RAID configuration. The final step in the network RAID configuration is moving one (or more) of the files to a remote mount or share.
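As a small taste of the first step, a sparse file can be created by setting a file's length without writing any data. The Python sketch below assumes a Unix-like system and a file system that supports sparse files; the function name and the file name disk0.img are made up for illustration, and the loop device and RAID steps are not shown.

```python
import os
import tempfile


def create_sparse_file(path, size_bytes):
    """Create a file with the given logical size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


demo_path = os.path.join(tempfile.mkdtemp(), "disk0.img")
create_sparse_file(demo_path, 64 * 1024 * 1024)

info = os.stat(demo_path)
logical_size = info.st_size          # what programs see as the file length
allocated = info.st_blocks * 512     # st_blocks counts 512-byte units on Linux
print(logical_size, allocated)
```

On a file system with sparse file support, the allocated figure stays far below the logical size until data is actually written.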
Consolidate Multiple FileSystemWatcher Events
The .NET framework provides a FileSystemWatcher class that can be used to monitor the file system for changes. My requirements were to monitor a directory for new files or changes to existing files. When a change occurs, the application needs to read the file and immediately perform some operation based on the contents of the file.
While doing some manual testing of my initial implementation, it was very obvious that the FileSystemWatcher was firing multiple events whenever I made a change to a file or copied a file into the directory being monitored. I came across the following in the MSDN documentation’s Troubleshooting FileSystemWatcher Components:
Multiple Created Events Generated for a Single Action
You may notice in certain situations that a single creation event generates multiple Created events that are handled by your component. For example, if you use a FileSystemWatcher component to monitor the creation of new files in a directory, and then test it by using Notepad to create a file, you may see two Created events generated even though only a single file was created. This is because Notepad performs multiple file system actions during the writing process. Notepad writes to the disk in batches that create the content of the file and then the file attributes. Other applications may perform in the same manner. Because FileSystemWatcher monitors the operating system activities, all events that these applications fire will be picked up.
Note: Notepad may also cause other interesting event generations. For example, if you use the ChangeEventFilter to specify that you want to watch only for attribute changes, and then you write to a file in the directory you are watching using Notepad, you will raise an event. This is because Notepad updates the Archived attribute for the file during this operation.
I did some searching and was surprised that .NET did not provide any kind of wrapper around the FileSystemWatcher to make it a bit more user friendly. I ended up writing my own wrapper that would monitor a directory and only throw one event when a new file was created, or an existing file was changed.
In order to consolidate the multiple FileSystemWatcher events down to a single event, I save the timestamp when each event is received, and I check back every so often (using a Timer) to find paths that have not caused additional events in a while. When one of these paths is ready, a single Changed event is fired. An additional benefit of this technique is that the event from the FileSystemWatcher is handled very quickly, which could help prevent its internal buffer from filling up.
Here is the code for a DirectoryMonitor class that consolidates multiple Win32 events into a single Change event for each change:
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

namespace FileSystem
{
    public delegate void FileSystemEvent(String path);

    public interface IDirectoryMonitor
    {
        event FileSystemEvent Change;
        void Start();
    }

    public class DirectoryMonitor : IDirectoryMonitor
    {
        private readonly FileSystemWatcher m_fileSystemWatcher =
            new FileSystemWatcher();
        private readonly Dictionary<string, DateTime> m_pendingEvents =
            new Dictionary<string, DateTime>();
        private readonly Timer m_timer;
        private bool m_timerStarted = false;

        public DirectoryMonitor(string dirPath)
        {
            m_fileSystemWatcher.Path = dirPath;
            m_fileSystemWatcher.IncludeSubdirectories = false;
            m_fileSystemWatcher.Created += new FileSystemEventHandler(OnChange);
            m_fileSystemWatcher.Changed += new FileSystemEventHandler(OnChange);
            m_timer = new Timer(OnTimeout, null, Timeout.Infinite, Timeout.Infinite);
        }

        public event FileSystemEvent Change;

        public void Start()
        {
            m_fileSystemWatcher.EnableRaisingEvents = true;
        }

        private void OnChange(object sender, FileSystemEventArgs e)
        {
            // Don't want other threads messing with the pending events right now
            lock (m_pendingEvents)
            {
                // Save a timestamp for the most recent event for this path
                m_pendingEvents[e.FullPath] = DateTime.Now;

                // Start a timer if not already started
                if (!m_timerStarted)
                {
                    m_timer.Change(100, 100);
                    m_timerStarted = true;
                }
            }
        }

        private void OnTimeout(object state)
        {
            List<string> paths;

            // Don't want other threads messing with the pending events right now
            lock (m_pendingEvents)
            {
                // Get a list of all paths that should have events thrown
                paths = FindReadyPaths(m_pendingEvents);

                // Remove paths that are going to be used now
                paths.ForEach(delegate(string path)
                {
                    m_pendingEvents.Remove(path);
                });

                // Stop the timer if there are no more events pending
                if (m_pendingEvents.Count == 0)
                {
                    m_timer.Change(Timeout.Infinite, Timeout.Infinite);
                    m_timerStarted = false;
                }
            }

            // Fire an event for each path that has changed
            paths.ForEach(delegate(string path)
            {
                FireEvent(path);
            });
        }

        private List<string> FindReadyPaths(Dictionary<string, DateTime> events)
        {
            List<string> results = new List<string>();
            DateTime now = DateTime.Now;

            foreach (KeyValuePair<string, DateTime> entry in events)
            {
                // If the path has not received a new event in the last 75ms
                // an event for the path should be fired
                double diff = now.Subtract(entry.Value).TotalMilliseconds;
                if (diff >= 75)
                {
                    results.Add(entry.Key);
                }
            }

            return results;
        }

        private void FireEvent(string path)
        {
            FileSystemEvent evt = Change;
            if (evt != null)
            {
                evt(path);
            }
        }
    }
}
bundler and git-deploy
Over the last few months I have helped a number of other projects in the office set up bundler. I’ve been using bundler with my new Rails application for the last six months and have earned a reputation as the bundler master within the office.
The fact of the matter is that, although I’m happy to help, the bundler website is so well done that my help was not strictly necessary. I’m a big fan of the large text, code snippets, succinct descriptions, and piecewise instructions. I’d say it is nothing short of negligent for someone using bundler to not read through the site at least once.
On a related note, aside from bundler, the git-deploy project has been the other piece of really awesome technology we’ve used on this latest Rails application. git-deploy makes great use of git and Passenger to make for simple and extremely fast deployments. It has also been trivial for me to add some of my own extensions to it. Some of my favorite moments have been when I’ve deployed a new version of the application to our staging system, while our tester was working on it, and have her not even notice. Big thanks to Mislav Marohnić for creating git-deploy.
Finally, here are some articles about why you should care about bundler. They are written by one of bundler’s authors.
Matchure: Serious Clojure Pattern Matching
One of the things I find myself yearning for in a lot of programming languages is a powerful pattern matching system. I wrote one for ruby, but ruby's syntax just wasn't flexible enough to make something as elegant as I'd like. When I started using clojure, it seemed like a great little project for getting to know clojure's macro facilities as well as clojure itself. I set out to build a kickass pattern matching library for clojure that fit in with the language's way of doing things and was at least as expressive as any other pattern matching facility I've used.
The outcome of this effort is matchure (github, clojars), a pattern matching library for clojure featuring
- equality checks,
- sequence destructuring,
- map destructuring,
- regexp matches,
- variable binding,
- "instance of" checking,
- arbitrary clojure expressions,
- and boolean operators (and, or, not).
All of which compile down to high performance clojure.
Introduction
Clojure, like most lisps, has a built-in facility which gets you part of the way there: destructuring in most variable-binding contexts. For example,
(let [a 1]
  a)
; returns 1
binds 1 to the variable a. You can just as easily grab the first value out of a sequence:
(let [[fst & rst] (list 1 2 3)]
  [fst rst])
; returns [1 (2 3)]
This facility is even more powerful than that: you can destructure maps as well. There's just one problem: let doesn't work well as a way of testing values.
(let [[fst & rst] (list)]
  [fst rst])
; returns [nil nil]
This is, in part, why I wrote matchure. Matchure provides a pattern matching facility that makes it really easy to perform complex tests against values and bind parts of them to variables.
Syntax and Examples
Like any lisp, clojure is homoiconic - clojure code is represented by clojure data structures. Unlike other lisps, however, clojure has a rich set of data structures with literal representations. '(1 2 3) is a linked list, [1 2 3] is a vector, with fast random access, and {:a 1, :b 2} is a hash map. Furthermore, these syntaxes have special places within clojure itself. Lists are clojure function or macro calls and vectors are used anywhere variables are bound or sequences are destructured. Aside from being a literal representation in code, the hash syntax is used for destructuring maps in let bindings.
Matchure takes this rich set of literal representations and uses it as a natural way to match patterns. At the moment, there are three main forms, if-match, when-match, and cond-match. They work similarly to the clojure built-ins, but match values.
At the most basic level, matchure can be used to test for equality.
(if-match [nil nil] true) ;=> true
(if-match [1 1] true) ;=> true
(if-match ["asdf" "asdf"] true) ;=> true
(let [s "asdf"]
  (if-match ["asdf" s] true)) ;=> true
Regular expression literals test for a match
(if-match [#"hello" "hello world"] true) ;=> true
and fully qualified class names test for instance? relationships.
(if-match [java.lang.String "foo"] true) ;=> true
(if-match [java.lang.Comparable "foo"] true) ;=> true
Matchure supports _ and ? as wildcards, so both of these match anything
(if-match [_ "foo"] true) ;=> true
(if-match [? "foo"] true) ;=> true
_ is idiomatic in clojure when you want to ignore a value. In matchure, ? has special meaning. ? can be thought of as "the thing this part of the pattern is matching against". As such, ? always matches successfully. It is also used in binding variables. ?foo matches successfully and stores the matched value in the variable foo.
(if-match [?foo "bar"] foo) ;=> "bar"
Just like in regular clojure, list literals represent clojure code. Here, too, the special meaning of ? comes into play. You can perform arbitrary tests this way, using ? to represent the value being matched against:
(if-match [(odd? ?) 1] true) ;=> true
Just like with let, you can destructure sequences
(if-match [[?fst & ?rst] [1 2 3]] [fst rst]) ;=> [1 (2 3)]
and maps
(if-match [{:foo (even? ?)} {:foo 2}] true) ;=> true
Finally, unlike any other pattern matching facility I've seen, matchure has support for boolean operators.
(if-match [[(and ?fst (even? ?)) & ?rst] [2 3 4]] [fst rst]) ;=> [2 (3 4)]
(if-match [[(and ?fst (even? ?)) & ?rst] [1 2 3]] [fst rst] :failed-match) ;=> :failed-match
The or and not operators are also supported.
You can get the code from github, or use it in your clojure project by grabbing it from clojars.
Migrating to Closures in Objective C
Apple introduced closure support with the release of Snow Leopard. Plausible Labs has since created a framework and a drop-in runtime that provides closure support for both MacOS X 10.5 and the iPhone SDK. Closure support is a fundamental and important change to the Objective-C language. Functions are now first-class citizens in the runtime environment; they can be saved, copied, and passed around just like any other object.
That being said, much of the existing API in both MacOS X and the iPhone SDK relies on callbacks, selectors, and delegate patterns to cope with the lack of first-class functions. But don’t let that stop you from using closures. It is relatively easy to create small intermediary wrappers that bridge the gap between callbacks and closures.
Easy data visualization with JFreeChart
This week one of our customers asked us to create a small, single-purpose tool to help support one of our JRuby desktop applications. The requirement was simple: make it easy to input minimum, maximum, and beta values into a beta distribution function for a few thousand samples and visualize the results.
After a few hours of work we produced this:

- We brought in JFreeChart, a powerful, yet easy-to-use charting library for Java.
- We used the beta distribution functionality from the Bayesian Logic (BLOG) Inference Engine project.
- We had prior experience with these two libraries from our JRuby projects.
- The NetBeans visual Swing editor made it trivial to get some simple widgets on the screen.
- As always, JarJar Links allows us to distribute this application as a single jar file.
Factor #1, JFreeChart, is far and away the most significant. Our experiences with JFreeChart are all positive: getting data into a nice looking chart has always been straightforward. The API is relatively easy to work with in that it strikes a good balance between configurability and getting a chart on the screen with minimal hassle.
The next time you’re looking to visualize data in your JRuby or Java application, don’t overlook JFreeChart.
Environment Configurable
During our last big rails project, Bloomfire, we found ourselves integrating with all kinds of external services. Because of this we had a diverse set of environment dependent configuration variables. A consistent pattern started to arise where we would extract our configuration variables into a YAML file and then wrap the configuration using a small class wrapper. This eventually gave rise to Environment Configurable, a library that makes environment dependent configuration easy in rails.
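The YAML-plus-wrapper pattern described above can be sketched roughly as follows. This is only an illustration of the pattern, not the Environment Configurable API: the class name ServiceConfig, the setting names, and the inline YAML are all assumptions made up for the example.

```ruby
require "yaml"

# Hypothetical wrapper class illustrating the pattern; it is NOT the
# Environment Configurable library's API.
class ServiceConfig
  # yaml_text:   YAML source whose top-level keys are environment names
  # environment: the environment to select, e.g. "production"
  def initialize(yaml_text, environment)
    all_settings = YAML.load(yaml_text)
    @settings = all_settings.fetch(environment) do
      raise ArgumentError, "no configuration for environment #{environment.inspect}"
    end
  end

  # Look up a setting by name, failing loudly when it is missing.
  def [](key)
    @settings.fetch(key.to_s) do
      raise KeyError, "missing configuration value #{key.inspect}"
    end
  end
end

# Made-up settings; a real project would keep these in a YAML file.
EXAMPLE_YAML = <<-CONFIG
development:
  storage_host: localhost
  storage_timeout: 5
production:
  storage_host: storage.example.com
  storage_timeout: 30
CONFIG

config = ServiceConfig.new(EXAMPLE_YAML, "production")
puts config[:storage_host]
puts config[:storage_timeout]
```

In a Rails application the environment name would presumably come from Rails.env and the YAML text from a file under config/, but those details are assumptions here as well.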
Better Java with Google Collections
Version 1.0 of the Google Collections library was officially released December 30, 2009. I have been using the library for the past 6 months or so on a variety of Java projects, with great success. Today I gave a brown-bag talk to share with some interested Atoms.
I have included below the sample code I used as presentation material, in case others are interested.
Shawn Anderson to present at LA RubyConf and SCALE
Later this month, Shawn Anderson will be going back to Cali, his previous home, to make back-to-back presentations at two different conferences: LA RubyConf and SCALE (the second time in two years). For both, he is presenting on developing 2D games using Ruby and open source tools (including a library that he wrote).
Running a Ruby application with jruby-complete
One of the great things about the JRuby project is that it’s easy to run Ruby programs without installing Ruby. In fact, you don’t even need to install JRuby. All you need is a JVM runtime and jruby-complete.

Rationale
Check out this other post for a discussion of my reasons for locking down your JRuby runtime. In summary, embedding jruby-complete gives you complete control of your Ruby runtime. That’s a good thing. The downside is that discovering and executing commands through jruby-complete can be a pain. The rest of this post describes how to ameliorate the pain.
Running jruby-complete
The base jruby-complete command is:
java -jar jruby-complete-1.4.0.jar
It is equivalent to typing ruby or jruby.
java -jar jruby-complete-1.4.0.jar -e "puts 'Hello'"
I lied.[1] To get the same JVM heap and stack sizes as typing jruby, you need to pass a couple of JVM options:
java -Xmx500m -Xss1024k -jar jruby-complete-1.4.0.jar -e "puts 'Hello'"
java -Xmx500m -Xss1024k -jar jruby-complete-1.4.0.jar -e 'load "META-INF/jruby.home/bin/jirb"'
irb(main):001:0> puts "Hello"
Hello
=> nil
irb(main):002:0> %
The java command is getting to be a pain. It’s time to introduce some rake tasks to help us out.
JRUBY_COMPLETE = "jruby-complete-1.4.0.jar"
JRUBY = "java -Xmx500m -Xss1024k -jar #{JRUBY_COMPLETE}"

namespace :jruby do
  desc "Run JRuby help"
  task :help do
    sh %+#{JRUBY} --help+
  end

  desc "Run any command with JRuby"
  task :run do
    sh %+#{JRUBY} -e '#{ENV["cmd"]}'+
  end
end
Now I can type rake jruby:run cmd='puts "Hello"'. Shell escaping is becoming a real annoyance at this point. Thankfully, I’m usually not using jruby-complete to run silly little commands. By the time I’ve introduced a Rakefile I’ve got a real application with tasks oriented around testing and running it, so I rarely use a task like jruby:run.
task :run do
  sh "#{JRUBY} lib/application_bootstrap.rb"
end

fletcher-git/github/jruby-complete-example(master) rake run
(in /Users/fletcher/git/github/jruby-complete-example)
java -Xmx500m -Xss1024k -jar jruby-complete-1.4.0.jar lib/application_bootstrap.rb
Hello from application_bootstrap
The -S parameter runs files in JRuby’s bin directory:
namespace :spec do
  desc "Run RSpec against a specific file"
  task :run do
    raise "You need to specify a spec with spec=" if not ENV["spec"]
    sh %+#{JRUBY} -S spec -f specdoc #{ENV["spec"]}+
  end
end

describe "John Galt" do
  it "does not tolerate logical fallacies" do
    "A".should == "A"
  end
end

fletcher-git/github/jruby-complete-example(master) rake spec:run spec=spec/unit/objectivism_spec.rb
(in /Users/fletcher/git/github/jruby-complete-example)
java -Xmx500m -Xss1024k -jar jruby-complete-1.4.0.jar -S spec -f specdoc spec/unit/objectivism_spec.rb

John Galt
- does not tolerate logical fallacies

Finished in 0.123 seconds

1 example, 0 failures
Is that more typing than spec spec/unit/objectivism_spec.rb? Yes. Do I care? No. I know how to use my shell.
fletcher-git/github/jruby-complete-example(master) which sp
sp () {
  rake spec:run spec=$@
}
fletcher-git/github/jruby-complete-example(master) sp spec/unit/objectivism_spec.rb
(in /Users/fletcher/git/github/jruby-complete-example)
java -Xmx500m -Xss1024k -jar jruby-complete-1.4.0.jar -S spec -f specdoc spec/unit/objectivism_spec.rb
Alright, so now that my application has been built up, I might want to start compiling the .rb files into .class files. Here comes jrubyc:
require "rake/clean"

namespace :jruby do
  output_directory = "classes"
  directory output_directory
  CLEAN.include output_directory

  desc "Compile Ruby files in lib"
  task :compile => output_directory do
    sh %+#{JRUBY} -S jrubyc -p com/atomicobject -t #{output_directory} lib+
  end
end

fletcher-git/github/jruby-complete-example(master) rake jruby:compile
(in /Users/fletcher/git/github/jruby-complete-example)
mkdir -p classes
java -Xmx500m -Xss1024k -jar jruby-complete-1.4.0.jar -S jrubyc -p com/atomicobject -t classes lib
Compiling all in '/Users/fletcher/git/github/jruby-complete-example/lib'...
Compiling lib/application_bootstrap.rb to class com/atomicobject/lib/application_bootstrap

fletcher-git/github/jruby-complete-example(master) rake jruby:run cmd='require "classes/com/atomicobject/lib/application_bootstrap"'
(in /Users/fletcher/git/github/jruby-complete-example)
java -Xmx500m -Xss1024k -jar jruby-complete-1.4.0.jar -e 'require "classes/com/atomicobject/lib/application_bootstrap"'
Hello from application_bootstrap
Because we are invoking the java command directly, we can pass any typical JVM parameters before the -jar parameter. We’ve done this for things like:
- enabling antialiasing in Apple’s JVM via a Java property.
- tweaking Substance’s widget behavior via a Java property.
- enabling Yourkit Java Profiler via the -agentlib parameter.
- including libraries and directories in the JVM’s classpath via the -cp parameter.
Since these parameters need to be passed before the -jar parameter, a more sophisticated method for setting up the JRuby command is needed than the constant I’ve used. A method like that is specific to your application and beyond the scope of this post, but it is not difficult to create.
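For illustration only, one possible shape for such a method is sketched below. The name jruby_command, its arguments, and the property and agent names in the usage line are hypothetical; the only thing taken from this post is the rule that JVM options go before -jar and JRuby arguments go after the jar name.

```ruby
# Same jar name as in the Rakefile above; repeated so the sketch runs on its own.
JRUBY_COMPLETE = "jruby-complete-1.4.0.jar"

# Hypothetical helper (not part of JRuby or Rake): assembles the java command
# so that every JVM option lands before -jar and every JRuby argument lands
# after the jar name.
#
# jruby_args:  array of arguments for JRuby, e.g. ["-S", "spec"]
# jvm_options: array of extra JVM options, e.g. ["-agentlib:example"]
# properties:  hash of Java system properties, emitted as -Dname=value
def jruby_command(jruby_args, jvm_options = [], properties = {})
  options = ["-Xmx500m", "-Xss1024k"] + jvm_options
  properties.each { |name, value| options << "-D#{name}=#{value}" }

  # Plain join: values containing spaces would still need shell quoting.
  (["java"] + options + ["-jar", JRUBY_COMPLETE] + jruby_args).join(" ")
end

# "example.property" is a placeholder, not a real JVM property.
puts jruby_command(["lib/application_bootstrap.rb"], [], "example.property" => "true")
```

A Rake task could then call sh jruby_command(...) instead of interpolating a constant, which keeps the per-task JVM settings in one place.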
Conclusion
There are an uncountable number of good things about JRuby and jruby-complete is one of them. A little help from scripts and your shell means you can build and run your application with a controlled Ruby runtime.
Additional resources
- The Rakefile, jruby-complete, and other files used in this post are available in this GitHub project.
- JarJar Links is an ant library that is useful for combining multiple jar files together.
- I wrote a post a while ago about using JarJar to combine jruby-complete and other application dependencies into a single file.
- The AGI Production Simulator is built using the jruby-complete commands described in this post as well as the above jar-rolling technique.
[1] Replicating the true jruby behavior is way, way beyond the scope of this post. Check out the jruby script if you really care. Most of the time the JVM heap and stack sizes are the most important things to worry about.
Edit 2/6/2010: Reduced -e ‘load…’ parameters to -S

