When Is a Set Better Than an Array in Ruby?

I’ve been using Ruby for quite some time, but it was only recently that I found a hidden gem in the standard library, the Set class. If you are familiar with the mathematical concept of a Set, then Ruby’s implementation will hold no surprises for you.

Using a Set in Ruby

Ruby’s Set supports all of the common set operations including union, intersection, and subtraction.


require 'set'

# Union
Set[1,2,3] | Set[3,4,5]
#=> #{1,2,3,4,5}

# Intersection
Set[1,2,3] & Set[2,3,4,5]
#=> #{2,3}

# Subtraction
Set[1,2,3,4] - Set[3,4,5]
#=> #{1,2}


# Set equality is based only on contents, not content and position like an Array
Set[1,2,3] == Set[3,1,2]
#=> true

It includes methods to check if a Set is a subset or superset of another Set (or the proper versions of either of those relationships).


# Subsets and Supersets
Set[1,2,3].subset?(Set[1,2,3,4])
#=> true

Set[1,2,3].superset(Set[3,6,9])
#=> false

Set[1,2,3].proper_subset?(Set[1,2,3])
#=> false

It also provides a powerful divide method which allows you to partition the Set into subsets based on a given block.


# Partitioning using divide
Set[1,2,3,4,5,6,7,8,9].divide {|el| el%3 }
#=> #{ #{1,4,7},
#      #{2,5,8},
#      #{3,6,9}
#      }

We can use divide to partition a group of text strings by starting character
z = Set["alpha", "bravo", "charlie", "delta", "echo", "apple", "code"]

z.divide {|str| str[0]}
#=>  { {"alpha", "apple"},
#      {"bravo"},
#      {"charlie", "code"},
#      {"delta"},
#      {"echo"}
#     }

As you might suspect, the Set class also implements the enumerable module. This gives you access to the common iterators and gives greater interoperability between Sets and Arrays, as many of the operations that don’t make sense in the context of a set (such as sort) will return an array with the expected results.


# Enumerable methods work on Sets
Set[2,4,8,16].each {|el| puts el}
#=> 2
#   4
#   8
#   16 

# Enumerable methods that don't make sense in terms of a Set return an Array
Set[3,2,5,1].sort
#=> [1,2,3,5]

Set or Array?

The use case that originally led me to the Set class is a rather common one when working with Ruby Arrays:

  • What is the best way to create an array with no duplicate values?
  • How do you best determine if two such Arrays are equal when the items might be in different orders?

Since a Set has no duplicates by definition, it satisfied the first need and provided a much cleaner solution the using Array’s include? or uniq methods. A little bit of investigation in the REPL also confirmed that the == method for Sets only compares the contents of Set rather than the content and position as the Array == does. This saved me from having to do something like a sort before comparing for equality. The only downside to using a Set vs an Array in this situation is that you lose the fixed ordering an Array provides, and as a result you lose the ability to randomly access elements by index. In my particular case, being able to have random access to elements of the data was not important.

Since I discovered Ruby’s Set class and all of the great features it provides, I have found myself stopping in spots of my code where I would normally use an Array and considering whether a Set would better suit my needs. Some use cases I have found where Sets work better are:

  • Creating sequences that cannot have duplicates.
  • Creating sequences where the order of elements doesn’t matter.
  • Creating sequences that need to be compared for equality regardless of order.

There are plenty of cases where you need to use an Array rather than a Set, however, such as:

  • Sequences that might have duplicates.
  • Sequences where the order of elements matter.
  • Sequences where you need to access individual elements.
  • Using existing API’s that expect Arrays instead of Sets.

My rule as of late when deciding which of the two structures to use is to figure out if I have to use an Array because of one of the above limitations, and if I don’t, to use a Set. This allows me to leverage the unique features of a Set if needed, and since the Set class has a number of methods that will return an Array with the Set’s data if needed, I don’t really lose anything.

So if you are using Ruby in your current project and any of the use cases above stick out to you, consider using the Set class in place of traditional Array’s and seeing if it makes your code cleaner and simpler.

Conversation
  • owain says:

    One of my first CS projects was to write a noughts and crosses (tic-tac-toe) game. My class all came up with some two dimensional array (in Pascal) and all sorts of code to check for winning lines based on array indexes.

    The school solution was to create a collection of sets with the winning line positions [1,2,3], [3,6,9] etc and then you just check to see if a winning line is a subset of the the player’s positions. No sorting. No game related logic to code. All very simple.

    Ever since then I have always kept Sets in mind, validations, combinations and so forth.

    Next lesson was to start counting from zero. Obviously.

  • […] When Is a Set Better Than an Array in Ruby? – Great post, the Set are very useful and powerful but are also under know.  […]

  • […] When Is a Set Better Than an Array in Ruby? This entry was posted in Haskell, Java, Math, Monads, Pattern matching, Programming, Recursion, Scala on September 21, 2012 by edofic. […]

  • […] There’s a lot of other interesting things you can do with sets. You can read more about it here. […]

  • Comments are closed.