Experimenting with Clojure’s PersistentHashMap in JRuby

Over the last few months, I have written two blog posts that mention using Hamster (Efficient, Immutable, Thread-Safe Collection) classes in JRuby: Preventing JRuby Concurrency Errors with Hamster and Making Refs in Ruby Using Celluloid Actors.

The latter post drew a comment from Mike Busch, who mentioned his evaluation of Hamster and how it compared to using Clojure’s persistent data structures in JRuby. He even linked to a simple benchmark of select that he had run to compare the two.

Intrigued, I decided to do some of my own experimentation and ended up extending Hamsterdam so it could use either Hamster Hashes or Clojure Maps internally. The results were mixed. The Clojure Map did turn out to be faster, but there are some (potentially annoying and confusing) side effects of using a Java collection as a primary data structure in JRuby.

Using a Clojure Data Structure in JRuby

To make things easier, I started by monkey patching PersistentHashMap so its interface more closely matched that of a Hamster Hash:

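class Java::ClojureLang::PersistentHashMap
  # alias_method takes (new_name, existing_name): put now calls assoc
  alias_method :put, :assoc
  # delete now calls without, which returns a map lacking the given key
  alias_method :delete, :without
end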

With that change in place, all that was left was a bit of refactoring to allow for the internals of Hamsterdam to be switched out at runtime to use something other than the default Hamster implementation.
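
A minimal sketch of the kind of seam this refactoring implies, assuming a module-level setting that holds the map class to build on. The names StructBackend, internal_hash_class, empty_map, and FrozenHashMap are hypothetical and are not Hamsterdam's actual API; FrozenHashMap is a plain-Ruby stand-in so the example runs without Hamster or clojure.jar.

```ruby
# Hypothetical stand-in for a persistent map: every "write" returns a new
# frozen instance and leaves the receiver untouched.
class FrozenHashMap
  def initialize(hash = {})
    @hash = hash.dup.freeze
    freeze
  end

  def put(key, value)
    self.class.new(@hash.merge(key => value))
  end

  def delete(key)
    self.class.new(@hash.reject { |k, _| k == key })
  end

  def get(key)
    @hash[key]
  end

  def to_h
    @hash
  end
end

# Hypothetical switch point: code that builds structs asks this module for
# the map class instead of naming one directly.
module StructBackend
  class << self
    attr_writer :internal_hash_class

    def internal_hash_class
      @internal_hash_class ||= FrozenHashMap
    end

    def empty_map
      internal_hash_class.new
    end
  end
end

base = StructBackend.empty_map
updated = base.put(:a, "1").put(:b, "2")
trimmed = updated.delete(:a)
```

Because the struct code would only rely on put and delete, any class that answers to those two methods (which is what the monkey patch gives PersistentHashMap) could be assigned to the hypothetical internal_hash_class setting.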

In order to use Clojure Maps with Hamsterdam, you will obviously need to be using JRuby. In addition, you will need to require a clojure.jar (not included as part of the Hamsterdam gem):

require 'clojure-1.5.1.jar'
require 'hamsterdam'
require 'hamsterdam/clj'

Benchmarks

I did some benchmarking after making the changes to compare using the two different internal data structures. I ran the following benchmarks on my MacBook Pro 2.66 GHz Intel Core i7 three times for Hamster and three times for Clojure and averaged the “real” seconds reported by Benchmark:

require 'benchmark'

class BenchmarkStruct < Hamsterdam::Struct.define(:a, :b, :c, :d, :e, :f, :g)
  def self.default
    @default ||= new
  end
end

number = 100_000

Benchmark.bm do |bm|
  bm.report("Create structs, with validation given a ruby hash") {
    number.times { BenchmarkStruct.new(a: "1", b: "2", c: "3", d: "4", e: "5") }
  }
  bm.report("Create structs, with validation given the correct internal hash") {
    number.times { BenchmarkStruct.new(Hamsterdam.from_ruby_hash(a: "1", b: "2", c: "3", d: "4", e: "5")) }
  }
  bm.report("Create structs, without validation") {
    number.times { BenchmarkStruct.new(Hamsterdam.from_ruby_hash(a: "1", b: "2", c: "3", d: "4", e: "5", f: nil, g: nil), false) }
  }
  bm.report("Create structs, using setters") {
    number.times { BenchmarkStruct.default.set_a("1").set_b("2").set_c("3").set_d("4").set_e("5") }
  }
  bm.report("Struct equality") {
    a = BenchmarkStruct.default.set_a("1").set_b("2").set_c("3").set_d("4").set_e("5")
    b = BenchmarkStruct.default.set_a("1").set_b("2").set_c("3").set_d("4").set_e("5")
    number.times { a == b }
  }
  bm.report("Setting and getting values") {
    obj = BenchmarkStruct.default
    number.times do |i|
      obj = obj.set_a(i).set_b(i+1)
      obj.a
      obj.b
    end
  }
end
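
The averaging step was done by hand for this post. As a rough sketch, it could be scripted with Benchmark.realtime; the helper name average_real_seconds and the workload in the usage line are hypothetical and only illustrate the idea.

```ruby
require 'benchmark'

# Hypothetical helper: run the given block `runs` times and return the mean
# wall-clock ("real") seconds across those runs.
def average_real_seconds(runs = 3)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.inject(:+) / runs
end

# Placeholder workload, not one of the Hamsterdam benchmarks.
avg = average_real_seconds { 10_000.times { { a: "1" }.merge(b: "2") } }
puts format("average real time: %.4f s", avg)
```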

As expected, the Clojure Map performed quite a bit better than the Hamster Hash, ranging from 2.3 to just over 6 times faster depending on the operations being performed:

Operation | Hamster (sec) | Clojure (sec) | Times Faster
Create structs, with validation given a ruby hash | 36.071 | 11.353 | 3.2
Create structs, with validation given the correct internal hash | 33.846 | 8.104 | 4.2
Create structs, without validation | 4.232 | 0.673 | 6.3
Create structs, using setters | 5.139 | 2.224 | 2.3
Struct equality | 0.948 | 0.267 | 3.6
Setting and getting values | 2.349 | 0.918 | 2.6
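
The "Times Faster" column is simply the Hamster time divided by the Clojure time, rounded to one decimal place. A short sketch that recomputes it from the figures in the table above (the constant and variable names are my own):

```ruby
# [hamster_seconds, clojure_seconds], copied from the table above.
TIMINGS = {
  "Create structs, with validation given a ruby hash" => [36.071, 11.353],
  "Create structs, with validation given the correct internal hash" => [33.846, 8.104],
  "Create structs, without validation" => [4.232, 0.673],
  "Create structs, using setters" => [5.139, 2.224],
  "Struct equality" => [0.948, 0.267],
  "Setting and getting values" => [2.349, 0.918]
}.freeze

# Speedup = Hamster time / Clojure time, rounded to one decimal place.
speedups = TIMINGS.map do |name, (hamster, clojure)|
  [name, (hamster / clojure).round(1)]
end.to_h

speedups.each { |name, ratio| puts "#{ratio}x  #{name}" }
```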

Not So Fast, My Friend

During the course of my experiment I did end up running into some irritating issues.

Automatic Time Conversion

It appears that JRuby automatically converts Time objects into java.util.Date objects when they are added to a Java collection (which the PersistentHashMap is). Since Ruby Time and Java Date have pretty incompatible interfaces, this caused numerous problems for me. I have been experimenting with using Joda Time monkey patched to match what I need of the Ruby Time interface.
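
For comparison with the Joda Time route, here is a rough sketch of converting explicitly at the collection boundary instead. It is not something I tested in this experiment. A java.util.Date holds milliseconds since the Unix epoch (that is what Date#getTime returns), so a plain Integer stands in for it here to keep the sketch runnable outside JRuby. The helper names are hypothetical.

```ruby
# Hypothetical helpers for converting at the collection boundary.

# Ruby Time -> whole milliseconds since the Unix epoch.
def time_to_epoch_millis(time)
  (time.to_r * 1000).floor
end

# Milliseconds since the Unix epoch -> Ruby Time.
def time_from_epoch_millis(millis)
  seconds, remainder = millis.divmod(1000)
  Time.at(seconds, remainder * 1000) # second argument is microseconds
end

original = Time.at(1_365_000_000, 123_456) # 123,456 microseconds past the second
millis = time_to_epoch_millis(original)
restored = time_from_epoch_millis(millis)

puts millis         # 1365000000123
puts restored.usec  # 123000
```

Note that anything finer than a millisecond is lost on the way through, so the restored Time is not equal to the original whenever the original carried microseconds.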

Sets and Lists

After things went so smoothly with PersistentHashMap, I thought it would be reasonable to try to use Clojure’s PersistentHashSet and PersistentList in place of Hamster::Set and Hamster::List in my application. At first this seemed to be working quite well, but after a while, I realized there was a major issue.

JRuby sees that these data structures can be treated like Enumerable, so you can call any of the Ruby Enumerable methods on them (map, reject, etc.). But because these are Ruby methods, not Clojure methods, the resulting data structure will be a plain old mutable Ruby Array. Definitely not what I was looking for.

I have hopes that it won’t be that much work to override the Enumerable methods so they call into the corresponding Clojure functions to return another immutable List or Set, but I haven’t gone down that road yet.
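
Both the problem and the shape of such an override can be sketched in plain Ruby. FrozenList is a hypothetical stand-in for a Clojure collection as JRuby sees it: it defines each and picks up everything else from Enumerable. RewrappingList is also hypothetical, and it only re-wraps the Array that Enumerable produces; a real fix would call into the corresponding Clojure functions instead.

```ruby
# Hypothetical stand-in for an immutable collection that only provides `each`.
class FrozenList
  include Enumerable

  def initialize(items)
    @items = items.dup.freeze
    freeze
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

# Same collection, but map and reject are overridden so that they hand back
# another immutable list rather than a mutable Array.
class RewrappingList < FrozenList
  def map(&block)
    self.class.new(super(&block))
  end

  def reject(&block)
    self.class.new(super(&block))
  end
end

plain = FrozenList.new([1, 2, 3]).map { |n| n * 2 }
wrapped = RewrappingList.new([1, 2, 3]).map { |n| n * 2 }

puts plain.class    # Array
puts wrapped.class  # RewrappingList
```

Every Enumerable method that builds a new collection (select, sort_by, and so on) would need the same treatment, which is why I have not gone down this road yet.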

Conclusion

Based on the benchmarks, if you want to use immutable data structures in JRuby and performance is a major concern, then Clojure’s data structures will give you a boost. But there are some drawbacks, so you need to know what you are getting into.
 

Conversation
  • Mike Busch says:

    Great writeup, Patrick!

  • Nice article, thanks!

    The automatic time conversion seems like a really strange thing to do. It kind of makes sense from an interoperability point of view (though even there it’s a bit too magical for my taste), but it severely limits use cases like this, when you just want to use some JVM data structure libraries in Ruby.

    Do you have any pointers to some information on why the JRuby guys did this? And what other magical conversions do they have?

    Btw, I’ve recently also been writing about immutable data structures and Ruby: https://deveo.com/blog/2013/03/22/immutability-in-ruby-part-1/

    • Patrick Bacon says:

      Tero,

      I haven’t done much research into why the automatic conversion is done, but my guess is also for interoperability with Java. With the exception of putting something into a collection there are probably very few cases where you would want a RubyTime object to be passed into a Java function.

      A quick search through the JRuby code base shows that there are automatic conversions for Boolean, Nil, Numeric, String, Symbol, and Time. I haven’t noticed any issues when dealing with any of those with the exception of Time.

      I am now wondering what kind of conversions are taking place each time a Ruby object is placed in or retrieved from a Java (Clojure) collection. Something to look into. :-)
