Recently, I was able to attend Jamie Jennings’s talk on the Rosie Pattern Language (RPL/“Rosie”) at Strange Loop 2018. I had not been aware of Rosie before, but after learning about it, I am extremely excited about the prospect of never writing another regular expression again.
The Rosie Pattern Language is an alternative to traditional regular expressions. Rosie patterns are Parsing Expression Grammars (PEGs), which are more powerful than formal regular expressions (regexes), although they typically require much more memory.
Rosie has several benefits over traditional regexes, including the ability to parse recursive structures like HTML and JSON, to create new patterns by combining other patterns, and to name patterns. You can combine these named patterns into libraries which you can import and use elsewhere.
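To make the point about recursive structures concrete, here is a minimal Python sketch of a single recursive PEG rule, nested <- "[" nested* "]", which matches arbitrarily deep bracket nesting. A formal regular expression cannot describe that for unbounded depth. This is a hand-written recursive-descent recognizer meant only to illustrate the idea, not Rosie itself; the rule and the function names are my own.

```python
from typing import Optional


def match_nested(text: str, pos: int = 0) -> Optional[int]:
    """Try the PEG rule  nested <- "[" nested* "]"  at text[pos].

    Returns the index just past the match, or None if the rule fails.
    """
    if pos >= len(text) or text[pos] != "[":
        return None
    pos += 1
    # nested*: greedily match zero or more nested groups (no backtracking,
    # as in PEG repetition).
    while True:
        end = match_nested(text, pos)
        if end is None:
            break
        pos = end
    if pos < len(text) and text[pos] == "]":
        return pos + 1
    return None


def is_nested(text: str) -> bool:
    """True if the whole string is exactly one well-formed bracket group."""
    return match_nested(text) == len(text)


for sample in ["[]", "[[][[]]]", "[[]", "[]]"]:
    print(sample, is_nested(sample))
```

The rule refers to itself, which is exactly what PEGs allow and regexes do not; a grammar for HTML or JSON is built the same way, just with more rules.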
Rosie itself ships with a library of common, useful patterns, and it is available as a C library that can be used from many different languages (with libffi). It can produce match output in a variety of formats, including JSON, which makes it easy to integrate with other tools that consume structured data.
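Because the output is structured, consuming it takes only a few lines of code. The Python sketch below walks a nested match record and lists each named sub-match with its matched text. The field names (type, data, subs) and the example.* pattern names in the sample are assumptions for illustration, loosely modeled on a JSON match tree and not a verified copy of Rosie's output format; check the actual JSON your Rosie version emits before relying on them.

```python
import json
from typing import Any, Dict, List

# Hypothetical JSON for a single match. The field names ("type", "data",
# "subs") and the pattern names are assumptions, not Rosie's verified format.
SAMPLE = """
{"type": "example.pair", "data": "a=1",
 "subs": [{"type": "example.key", "data": "a"},
          {"type": "example.value", "data": "1"}]}
"""


def walk(match: Dict[str, Any], depth: int = 0) -> List[str]:
    """Flatten a nested match record into indented 'type: data' lines."""
    lines = [f"{'  ' * depth}{match['type']}: {match['data']}"]
    for sub in match.get("subs", []):
        lines.extend(walk(sub, depth + 1))
    return lines


record = json.loads(SAMPLE)
print("\n".join(walk(record)))
```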
For me, Rosie’s pattern transparency and re-usability are the most exciting aspects.
Pattern Transparency & Re-Usability: Regex vs. Rosie
I use regexes frequently, and I am always frustrated by their opacity. When I come across a regex in code, I often need to spend several minutes working out what it is meant to match. With Rosie, I can name patterns, which lets me choose a descriptive name that is easy to recall in the future.
For example, if I wanted to use a regex to find IPv4 addresses, I might need to write something like:
([0-9]{1,3}\.){3}[0-9]{1,3}
To print the ifconfig lines that match the regex, using grep:
jk@GERTY-MK-VI ~ $ ifconfig | grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}"
inet 127.0.0.1 netmask 0xff000000
inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255
(If you aren’t familiar with regexes, this one is tame. Try looking up a regex for an e-mail address.)
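For comparison, here is roughly the same filter written with Python's re module, run over a few lines copied from the ifconfig output in this post. The grep helper and the IFCONFIG_LINES sample are illustrative names of my own. re.search looks for a match anywhere in the line, which mirrors grep's default behavior.

```python
import re

# The same pattern passed to grep -E in the shell example.
IPV4_REGEX = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")

# A few lines of the ifconfig output shown in this post.
IFCONFIG_LINES = [
    "lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384",
    "inet 127.0.0.1 netmask 0xff000000",
    "en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500",
    "inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255",
]


def grep(pattern, lines):
    """Keep the lines in which the pattern matches anywhere, as grep does."""
    return [line for line in lines if pattern.search(line)]


for line in grep(IPV4_REGEX, IFCONFIG_LINES):
    print(line)
```

Only the two inet lines survive; nothing in the code says what the pattern is for except the variable name I chose.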
With Rosie, defining the IP address pattern may seem a bit verbose. However, once it is defined, it can be reused anywhere, easily and without any doubt about what we are matching.
The following example shows how to define an IPv4 address. It comes from the net library, which ships with Rosie:
local alias ipv4_component = [:digit:]{1,3}
local alias ip_address_v4 = { ipv4_component {"." ipv4_component}{3} }
ipv4 = ip_address_v4
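Python regexes have no equivalent of RPL's named aliases, but the same define-once-and-compose idea can be approximated by building a regex out of named string fragments. This is a minimal sketch; the variable names simply mirror the RPL definitions and are not part of any library.

```python
import re

# Named building blocks, mirroring the RPL aliases. Plain concatenation in a
# regex is "tight" (no whitespace allowed between parts), like RPL's { ... }.
IPV4_COMPONENT = r"[0-9]{1,3}"                                     # ipv4_component
IP_ADDRESS_V4 = rf"{IPV4_COMPONENT}(?:\.{IPV4_COMPONENT}){{3}}"    # ip_address_v4
IPV4 = re.compile(IP_ADDRESS_V4)                                   # ipv4

print(IP_ADDRESS_V4)
print(bool(IPV4.fullmatch("192.168.0.13")))
print(bool(IPV4.fullmatch("192.168.0")))
```

The difference is that these names exist only in the Python source: the compiled pattern, and any match it produces, knows nothing about ipv4_component, whereas in Rosie the names are part of the pattern language itself.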
Using the Rosie CLI, we can easily match IPv4 addresses in a much clearer fashion:
jk@GERTY-MK-VI ~ $ ifconfig | rosie grep 'net.ipv4'
inet 127.0.0.1 netmask 0xff000000
inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255
We can even take this a step further, using the PEG choice operator (/) to get the interface names:
jk@GERTY-MK-VI ~ $ ifconfig | rosie grep '{[:alpha:]+ [:digit:] ":" [:space:]} / net.ipv4'
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet 127.0.0.1 netmask 0xff000000
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
bridge0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000
vboxnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255
utun1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
In this case, the pattern first tries to match one or more alphabetic characters, followed by a single digit, a colon, and a whitespace character, with nothing in between; if that fails, it tries to match an IPv4 address instead.
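The ordered-choice behavior is easy to model. Below is a minimal Python sketch of PEG-style combinators; the helper names char_class, literal, seq, repeat, choice and search are invented for illustration and are not Rosie's API. Each pattern is a function that takes the text and a start position and returns the end of a match or None. choice tries its alternatives in order and commits to the first one that succeeds, which is how / behaves, and repeat is greedy without backtracking, as PEG repetition is.

```python
from typing import Callable, Optional

# A pattern takes (text, pos) and returns the end of the match, or None.
Pattern = Callable[[str, int], Optional[int]]


def char_class(pred: Callable[[str], bool]) -> Pattern:
    """Match one character satisfying pred."""
    def match(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and pred(text[pos]) else None
    return match


def literal(s: str) -> Pattern:
    """Match the exact string s."""
    def match(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def seq(*parts: Pattern) -> Pattern:
    """Match each part in turn with nothing in between (like RPL's { ... })."""
    def match(text: str, pos: int) -> Optional[int]:
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match


def repeat(p: Pattern, lo: int, hi: Optional[int] = None) -> Pattern:
    """Greedily match p between lo and hi times (hi=None: unbounded)."""
    def match(text: str, pos: int) -> Optional[int]:
        count = 0
        while hi is None or count < hi:
            end = p(text, pos)
            if end is None or end == pos:  # stop on failure or empty match
                break
            pos, count = end, count + 1
        return pos if count >= lo else None
    return match


def choice(*alts: Pattern) -> Pattern:
    """PEG ordered choice: the first alternative that matches wins."""
    def match(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return match


def search(p: Pattern, text: str) -> Optional[str]:
    """Return the first substring p matches at any position, or None."""
    for start in range(len(text) + 1):
        end = p(text, start)
        if end is not None:
            return text[start:end]
    return None


alpha = char_class(str.isalpha)
digit = char_class(str.isdigit)
space = char_class(str.isspace)

# {[:alpha:]+ [:digit:] ":" [:space:]}
interface = seq(repeat(alpha, 1), digit, literal(":"), space)
# { ipv4_component {"." ipv4_component}{3} }
ipv4_component = repeat(digit, 1, 3)
ipv4 = seq(ipv4_component, repeat(seq(literal("."), ipv4_component), 3, 3))
# interface / net.ipv4
ifconfig = choice(interface, ipv4)

for line in ["lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384",
             "inet 127.0.0.1 netmask 0xff000000",
             "p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304"]:
    print(repr(search(ifconfig, line)))
```

A grep built on this would print the whole line whenever search returns something other than None, which is what the rosie grep output above shows.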
I can then create my own .rpl pattern file to store this for future use:
rpl 1.1
package jk
import net
local alias interface = {[:alpha:]+ [:digit:] ":" [:space:]}
ifconfig = interface / net.ipv4
Then I can invoke:
jk@GERTY-MK-VI ~ $ ifconfig | rosie -f ~/jk.rpl grep jk.ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet 127.0.0.1 netmask 0xff000000
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
bridge0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000
vboxnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255
utun1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
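Plain regexes can mimic only part of this packaging. Named groups at least let a match report which alternative fired, as in the Python sketch below, which combines the two pieces of jk.ifconfig into one pattern. The group names interface and ipv4 are my own and only mirror the RPL file, and [:alpha:] is approximated as ASCII letters. The regex itself has no notion of packages or imports; sharing it means sharing Python source.

```python
import re

# Hypothetical Python mirror of the jk package: named fragments joined with |.
INTERFACE = r"[A-Za-z]+[0-9]:\s"              # [:alpha:] approximated as ASCII
IPV4 = r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}"
IFCONFIG = re.compile(rf"(?P<interface>{INTERFACE})|(?P<ipv4>{IPV4})")


def classify(line):
    """Return (group name, matched text) for the first match in line, or None."""
    m = IFCONFIG.search(line)
    return (m.lastgroup, m.group()) if m else None


print(classify("en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500"))
print(classify("inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255"))
```

Python's | also tries alternatives from left to right, but unlike PEG's / it can backtrack into an alternative that already matched if a larger surrounding pattern later fails, so the two are not interchangeable in general.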
I have yet to make use of Rosie’s C library or its interfaces for Ruby or Python, but I’m very excited about the possibilities that Rosie’s readability brings to application development.
Has anyone else made extensive use of Rosie? I’d be interested in chatting with you and learning more.