Using ActiveRecord to Abstract “Greatest N Per Group” Queries

Some database-related projects require access to information based around maximum and minimum values. Recently, I was trying to figure out the best way to perform a “greatest N per group” query in an ActiveRecord model with a SQL database backend. Eventually, I settled on the SQL proposed by Bill Karwin on StackOverflow. Once I had the SQL, I worked it into a form that fit my needs and moved it into my ActiveRecord model.

My approach allowed me to specify the group based on a set of columns and to abstract the required join logic behind an ActiveRecord scope. The following gist is a complete script that demonstrates the approach.


# A file demonstrating how I use ActiveRecord to perform "All records where N
# is greatest N" queries.
#
# This script contains the schema, the models, and the example code. Normally
# these pieces would be broken out into separate files.
#
# The example demonstrates the use by producing a set of relationships between
# users, services, and "logins" of those users to the services. Then, we
# perform some queries that relate to the longest session time when stratified
# different ways.

require 'active_record'

### Database Connection and Creation ###
def connect_database
  ActiveRecord::Base.establish_connection(
    adapter: "sqlite3",
    database: "a_database.sqlite3"
  )
end

def create_database
  create_tables
end

def create_tables
  ActiveRecord::Schema.define do
    create_table(:users, force: true) do |t|
      t.column :name, :string
    end

    create_table(:services, force: true) do |t|
      t.column :name, :string
    end

    create_table(:logins, force: true) do |t|
      t.column :session_length, :integer
      t.column :service_id, :integer
      t.column :user_id, :integer
    end
  end
end

### Model Creation ###
class User < ActiveRecord::Base
  has_many :logins, dependent: :destroy
  def to_s; name end
end

class Service < ActiveRecord::Base
  has_many :logins
  def to_s; name end
end

class Login < ActiveRecord::Base
  belongs_to :user
  belongs_to :service

  # Fetch the row(s) from `logins` where no other row exists with the same
  # `columns` value(s) and a greater `session_length`.
  scope :longest_session_for, lambda { |*columns|
    selectors = columns.map {|c|
      "logins.\"#{c}\" = ls.\"#{c}\""
    }.join (" AND ")

    joins(%Q{
      LEFT OUTER JOIN logins ls ON
      (
        logins.session_length < ls.session_length AND
        #{selectors}
      )
    }).where("ls.id IS NULL")
  }

  def to_s
    "LOGIN ID %02d: [ %5s ] to [ %5s ] for [ %02d ]" % [id, user, service, session_length]
  end
end

def insert_data
  john = User.create!(name: "John")
  alex = User.create!(name: "Alex")

  mail = Service.create!(name: "Mail")
  chat = Service.create!(name: "Chat")
  pony = Service.create!(name: "Pony")

  Login.create!(service: mail, user: john, session_length: 2)
  Login.create!(service: mail, user: alex, session_length: 2)
  Login.create!(service: pony, user: john, session_length: 3)
  Login.create!(service: chat, user: john, session_length: 4)
  Login.create!(service: chat, user: alex, session_length: 5)
  Login.create!(service: mail, user: john, session_length: 5)
  Login.create!(service: mail, user: alex, session_length: 6)
  Login.create!(service: pony, user: john, session_length: 7)
  Login.create!(service: pony, user: john, session_length: 8)
  Login.create!(service: mail, user: john, session_length: 9)
  Login.create!(service: mail, user: john, session_length: 9)
end

def header(title)
  "===== #{title} ====="
end

def print_data
  puts header("All Logins")
  puts Login.all

  puts header("Longest Sessions for Each User")
  puts Login.longest_session_for(:user_id)

  puts header("Longest Sessions for Each Service")
  puts Login.longest_session_for(:service_id)

  puts header("Longest Sessions for Each User on Each Service")
  puts Login.longest_session_for(:user_id, :service_id)
end

connect_database()
create_database()
insert_data()
print_data()

# The output of this script will look something like this:
#   ===== All Logins =====
#   LOGIN ID 01: [  John ] to [  Mail ] for [ 02 ]
#   LOGIN ID 02: [  Alex ] to [  Mail ] for [ 02 ]
#   LOGIN ID 03: [  John ] to [  Pony ] for [ 03 ]
#   LOGIN ID 04: [  John ] to [  Chat ] for [ 04 ]
#   LOGIN ID 05: [  Alex ] to [  Chat ] for [ 05 ]
#   LOGIN ID 06: [  John ] to [  Mail ] for [ 05 ]
#   LOGIN ID 07: [  Alex ] to [  Mail ] for [ 06 ]
#   LOGIN ID 08: [  John ] to [  Pony ] for [ 07 ]
#   LOGIN ID 09: [  John ] to [  Pony ] for [ 08 ]
#   LOGIN ID 10: [  John ] to [  Mail ] for [ 09 ]
#   LOGIN ID 11: [  John ] to [  Mail ] for [ 09 ]
#   ===== Longest Sessions for Each User =====
#   LOGIN ID 07: [  Alex ] to [  Mail ] for [ 06 ]
#   LOGIN ID 10: [  John ] to [  Mail ] for [ 09 ]
#   LOGIN ID 11: [  John ] to [  Mail ] for [ 09 ]
#   ===== Longest Sessions for Each Service =====
#   LOGIN ID 05: [  Alex ] to [  Chat ] for [ 05 ]
#   LOGIN ID 09: [  John ] to [  Pony ] for [ 08 ]
#   LOGIN ID 10: [  John ] to [  Mail ] for [ 09 ]
#   LOGIN ID 11: [  John ] to [  Mail ] for [ 09 ]
#   ===== Longest Sessions for Each User on Each Service =====
#   LOGIN ID 04: [  John ] to [  Chat ] for [ 04 ]
#   LOGIN ID 05: [  Alex ] to [  Chat ] for [ 05 ]
#   LOGIN ID 07: [  Alex ] to [  Mail ] for [ 06 ]
#   LOGIN ID 09: [  John ] to [  Pony ] for [ 08 ]
#   LOGIN ID 10: [  John ] to [  Mail ] for [ 09 ]
#   LOGIN ID 11: [  John ] to [  Mail ] for [ 09 ]