14 Comments

ActiveRecord Black Magic

There’s been a lot of buzz around the Rails world about the new ActiveRecord query interface, since the 3.0 release. It’s a fairly awesome tool, and I’ve been excited about how it lets you compose pieces of queries, which can help make code more DRY. However, it isn’t always clear how to get ActiveRecord and Arel to generate the correct SQL for the more difficult cases.

Here’s a case I ran into lately and how I solved it.  

The Initial Solution

Suppose that your software product is sold to companies, and each company’s administrator can manage employees who work for any of his companies. To get all the employees a given user can access, you might:

Employee.joins(:companies).
  where("companies.id IN (?)", user.company_ids)

This will, unfortunately, give you duplicate employees when an employee works for more than one matching company, so (following the example in the Rails documentation) you might write:

Employee.joins(:companies).
  where("companies.id IN (?)", user.company_ids).
  select("DISTINCT(employees.id)")

This is closer, but the Employee objects you get back will not be very useful — they only have one column! Try this instead:

Employee.joins(:companies).
  where("companies.id IN (?)", user.company_ids).
  select("DISTINCT(employees.id), employees.*")

Success! Now go and make a method for that so you can write something like EmployeeFinder.accessible_by(user). You can then use this encapsulated query all over your application to get that data, and if your business rules change, you’ll only have one place to make the update.

class EmployeeFinder
  def self.accessible_by(user)
    Employee.joins(:companies).
      where("companies.id IN (?)", user.company_ids).
      select("DISTINCT(employees.id), employees.*")
  end
end

The Exception

However, there is still one case where this will fall down: it’s not a composable query. Suppose your employees can upload their TPS reports, and that these files are stored in the uploaded_files table. If you wanted to get all the files uploaded by a given user’s employees, you might try to write something like this:

UploadedFile.where(uploaded_by: EmployeeFinder.accessible_by(user))

It looks like it should work, but it’ll blow up with the error: "Mysql2::Error:
Operand should contain 1 column(s)"
What this cryptic error is really saying is
that ActiveRecord is trying to generate a subselect query that looks something like:

SELECT * FROM uploaded_files WHERE uploaded_by_id IN (
  SELECT DISTINCT(employees.id), employees.* FROM employees
    JOIN companies_employees ON employees.id=companies_employees.employee_id
    JOIN companies ON companies_employees.company_id=companies.id
    WHERE companies.id IN (1, 2, 3)
);

The problem is that the "WHERE uploaded_by_id IN (...)" clause expects the
subquery to return one column only (the list of ids.) I’ve tried the below
approach, which seems like the logical way to overwrite it:

UploadedFile.where(uploaded_by: EmployeeFinder.accessible_by(user).
  select(:id))

This doesn’t work because ActiveRecord is just going to aggregate the select and add it on – not replace it. I’ve discovered that we can rewrite the select clause portion of the statement using Arel’s API:

employees = EmployeeFinder.accessible_by(user)
employees.projections = [Arel::Nodes::SqlLiteral.new("employees.id")]
UploadedFile.where(uploaded_by: employees)

This will help Arel produce a valid query that returns the information you’re interested in and without duplicating logic anywhere: your authorization logic is tightly wrapped up only in the EmployeeFinder.

It’s almost certainly possible to use PredicateBuilder or update the where clauses directly (see some of the code here), but I feel like if you’re going to take this approach the code above is minimally invasive.

The Caveats

The advantage of the above approach is this: You can find the records you are interested in with one single query to the database without error-prone duplicated logic. But there are other ways to do this:

  • You could duplicate the employee finder logic and write it up as an explicit join.
  • You could just run two queries.

Depending on your database and your measured performance characteristics (take into account this
experience
and the optimizations available to your DBMS), one of those might make more sense for you. Regardless, it’s one more tool for you to have in your toolbox.

Has anyone else out there run into this problem? I’m curious if people have solved this differently than I have.