Pattern Matching without Regex! – Introducing the Rosie Pattern Language

Recently, I attended Jamie Jennings’s talk on the [Rosie Pattern Language](http://rosie-lang.org/) (RPL, or “Rosie”) at Strange Loop 2018. I had not heard of Rosie before, but after learning about it, I am extremely excited about the prospect of never writing another [regular expression](https://en.wikipedia.org/wiki/Regular_expression) again.

The Rosie Pattern Language is an alternative to traditional regular expressions. Rosie patterns are [Parsing Expression Grammars](https://en.wikipedia.org/wiki/Parsing_expression_grammar) (PEGs), which are more powerful than formal [Regular Expressions](https://en.wikipedia.org/wiki/Regular_expression) (regexes): they can describe nested and recursive structures that regexes cannot. The trade-off is that PEG matching can require more memory.

Rosie has several benefits over traditional regexes, including the ability to parse recursive structures like HTML and JSON, to create new patterns by combining other patterns, and to name patterns. You can combine these named patterns into libraries which you can import and use elsewhere.

Rosie itself ships with a library of common, useful patterns, and it is available as a C library that can be called from many different languages (via `libffi`). It produces match output in a variety of formats, including JSON, which makes it easy to integrate with other tools that consume structured data.

For me, Rosie’s pattern transparency and re-usability are the most exciting aspects.

## Pattern Transparency & Re-Usability: Regex vs. Rosie
I use regexes frequently, and I am always frustrated by their opacity. When I come across a regex in code, I need to spend several minutes working out what it is meant to match. With Rosie, I can give each pattern a descriptive name, so its intent is still clear when I come back to it later.

For example, if I wanted to use a regex to find IPv4 addresses, I might need to write something like:

`([0-9]{1,3}\.){3}[0-9]{1,3}`

With grep, I can then print the `ifconfig` lines that match this regex:

jk@GERTY-MK-VI ~  $ ifconfig | grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}"
	inet 127.0.0.1 netmask 0xff000000
	inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
	inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255

(If you aren’t familiar with regexes, this one is tame. Try reading the regex for an e-mail address.)
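To make the regex above a little more concrete outside the shell, here is a minimal Python sketch that applies the same expression with the standard `re` module. The first three sample lines are copied from the `ifconfig` output in this post; the helper name `matching_lines` and the `999.999.999.999` line are my own additions for illustration. The last line shows that this regex is looser than it looks: it accepts any groups of one to three digits, not only valid octets.

```python
import re

# The same expression that was passed to `grep -E` above.
IPV4_REGEX = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")

def matching_lines(lines):
    """Return the lines that contain at least one match, as grep would."""
    return [line for line in lines if IPV4_REGEX.search(line)]

sample = [
    "lo0: flags=8049 mtu 16384",
    "\tinet 127.0.0.1 netmask 0xff000000",
    "\tinet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255",
    "\tinet 999.999.999.999",  # hypothetical line: not a valid address, but it still matches
]

for line in matching_lines(sample):
    print(line)
```

Nothing in the regex itself tells a reader that it is "an IPv4 address, loosely"; that knowledge lives only in my head or in a comment.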

With Rosie, defining the IP address pattern may seem a bit verbose. However, once it is defined, it can be re-used anywhere, easily and without any doubt about what we are matching.

The following example shows how to define an IPv4 address. It comes from the `net` library which ships with Rosie:

local alias ipv4_component = [:digit:]{1,3}
local alias ip_address_v4 = { ipv4_component {"." ipv4_component}{3} }
ipv4 = ip_address_v4
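For readers who think in regexes, the three definitions above can be mimicked in Python by building one expression out of named pieces. This is only a rough analogy under my own assumptions (plain `re`, with `[0-9]` standing in for `[:digit:]`), not how Rosie works internally, and the function name `find_ipv4` is made up for this sketch.

```python
import re

# Named building blocks, mirroring the three RPL definitions above.
ipv4_component = r"[0-9]{1,3}"
ip_address_v4 = rf"{ipv4_component}(?:\.{ipv4_component}){{3}}"
ipv4 = re.compile(ip_address_v4)

def find_ipv4(text):
    """Return every substring of `text` that the composed pattern matches."""
    return [m.group(0) for m in ipv4.finditer(text)]

print(find_ipv4("inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255"))
# prints ['192.168.0.13', '192.168.0.255']
```

The difference is that in Python the composition is string pasting that I have to get right myself (note the doubled braces), whereas in RPL naming and combining patterns is part of the language.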

Using the Rosie CLI, we can match IPv4 addresses in a much clearer fashion:

jk@GERTY-MK-VI ~  $ ifconfig | rosie grep 'net.ipv4'
	inet 127.0.0.1 netmask 0xff000000
	inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
	inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255

We can even take this a step further, using the PEG choice operator (`/`) to get the interface names:

jk@GERTY-MK-VI ~  $ ifconfig | rosie grep '{[:alpha:]+ [:digit:] ":" [:space:]} / net.ipv4'
lo0: flags=8049 mtu 16384
	inet 127.0.0.1 netmask 0xff000000
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863 mtu 1500
	inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
p2p0: flags=8843 mtu 2304
awdl0: flags=8943 mtu 1484
en1: flags=8963 mtu 1500
en2: flags=8963 mtu 1500
bridge0: flags=8863 mtu 1500
utun0: flags=8051 mtu 2000
vboxnet0: flags=8843 mtu 1500
	inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255
utun1: flags=8051 mtu 1380

In this case, the pattern first tries to match one or more alphabetic characters, followed by a single digit, a colon, and a whitespace character. Only if that fails does it try to match an IPv4 address.
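The "try the first alternative, and only then the second" behaviour can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the two regexes are approximate stand-ins for the RPL pieces, and the names `ordered_choice` and `first_match` are hypothetical helpers, not part of Rosie.

```python
import re

# Approximate regex stand-ins for the two alternatives (not Rosie's matcher).
INTERFACE = re.compile(r"[A-Za-z]+[0-9]:\s")
IPV4 = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")

def ordered_choice(text, pos):
    """Try INTERFACE at `pos`; only if it fails, try IPV4 at the same position."""
    for name, pattern in (("interface", INTERFACE), ("ipv4", IPV4)):
        m = pattern.match(text, pos)
        if m:
            return name, m.group(0)
    return None

def first_match(line):
    """Scan left to right, grep-style, and return the first hit (or None)."""
    for pos in range(len(line)):
        hit = ordered_choice(line, pos)
        if hit:
            return hit
    return None

print(first_match("en0: flags=8863 mtu 1500"))
# prints ('interface', 'en0: ')
print(first_match("\tinet 192.168.0.13 netmask 0xffffff00"))
# prints ('ipv4', '192.168.0.13')
```

A line is printed by the grep-style search whenever either alternative succeeds somewhere in it, which is why the output above contains both the interface lines and the `inet` lines.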

I can then create my own `.rpl` pattern file to store this for future use:

rpl 1.1

package jk
import net

local alias interface = {[:alpha:]+ [:digit:] ":" [:space:]}
ifconfig = interface / net.ipv4

Then I can invoke:

jk@GERTY-MK-VI ~  $ ifconfig | rosie -f ~/jk.rpl grep jk.ifconfig
lo0: flags=8049 mtu 16384
	inet 127.0.0.1 netmask 0xff000000
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863 mtu 1500
	inet 192.168.0.13 netmask 0xffffff00 broadcast 192.168.0.255
p2p0: flags=8843 mtu 2304
awdl0: flags=8943 mtu 1484
en1: flags=8963 mtu 1500
en2: flags=8963 mtu 1500
bridge0: flags=8863 mtu 1500
utun0: flags=8051 mtu 2000
vboxnet0: flags=8843 mtu 1500
	inet 33.0.123.1 netmask 0xffffff00 broadcast 33.0.123.255
utun1: flags=8051 mtu 1380

While I have yet to make use of Rosie’s C library or its interfaces for other languages such as Ruby or Python, I’m very excited about the possibilities that Rosie’s readability and ease of comprehension bring to application development.

Has anyone else made extensive use of Rosie? I’d be interested in chatting with you and learning more.