ActiveRecord Black Magic

There’s been a lot of buzz around the Rails world about the new ActiveRecord query interface, since the 3.0 release. It’s a fairly awesome tool, and I’ve been excited about how it lets you compose pieces of queries, which can help make code more DRY. However, it isn’t always clear how to get ActiveRecord and Arel to generate the correct SQL for the more difficult cases.

Here’s a case I ran into lately and how I solved it.  

The Initial Solution

Suppose that your software product is sold to companies, and each company’s administrator can manage employees who work for any of his companies. To get all the employees a given user can access, you might:

Employee.joins(:companies).
  where("companies.id IN (?)", user.company_ids)

This will, unfortunately, give you duplicate employees when an employee works for more than one matching company, so (following the example in the Rails documentation) you might write:

Employee.joins(:companies).
  where("companies.id IN (?)", user.company_ids).
  select("DISTINCT(employees.id)")

This is closer, but the Employee objects you get back will not be very useful — they only have one column! Try this instead:

Employee.joins(:companies).
  where("companies.id IN (?)", user.company_ids).
  select("DISTINCT(employees.id), employees.*")

Success! Now go and make a method for that so you can write something like EmployeeFinder.accessible_by(user). You can then use this encapsulated query all over your application to get that data, and if your business rules change, you’ll only have one place to make the update.

class EmployeeFinder
  def self.accessible_by(user)
    Employee.joins(:companies).
      where("companies.id IN (?)", user.company_ids).
      select("DISTINCT(employees.id), employees.*")
  end
end

The Exception

However, there is still one case where this will fall down: it’s not a composable query. Suppose your employees can upload their TPS reports, and that these files are stored in the uploaded_files table. If you wanted to get all the files uploaded by a given user’s employees, you might try to write something like this:

UploadedFile.where(uploaded_by: EmployeeFinder.accessible_by(user))

It looks like it should work, but it’ll blow up with the error: @“Mysql2::Error:
Operand should contain 1 column(s)”@ What this cryptic error is really saying is
that ActiveRecord is trying to generate a subselect query that looks something like:

SELECT * FROM uploaded_files WHERE uploaded_by_id IN (
  SELECT DISTINCT(employees.id), employees.* FROM employees
    JOIN companies_employees ON employees.id=companies_employees.employee_id
    JOIN companies ON companies_employees.company_id=companies.id
    WHERE companies.id IN (1, 2, 3)
);

The problem is that the "WHERE uploaded_by_id IN (...)" clause expects the
subquery to return one column only (the list of ids.) I’ve tried the below
approach, which seems like the logical way to overwrite it:

UploadedFile.where(uploaded_by: EmployeeFinder.accessible_by(user).
  select(:id))

This doesn’t work because ActiveRecord is just going to aggregate the select and add it on – not replace it. I’ve discovered that we can rewrite the select clause portion of the statement using Arel’s API:

employees = EmployeeFinder.accessible_by(user)
employees.projections = [Arel::Nodes::SqlLiteral.new("employees.id")]
UploadedFile.where(uploaded_by: employees)

This will help Arel produce a valid query that returns the information you’re interested in and without duplicating logic anywhere: your authorization logic is tightly wrapped up only in the EmployeeFinder.

It’s almost certainly possible to use PredicateBuilder or update the where clauses directly (see some of the code here), but I feel like if you’re going to take this approach the code above is minimally invasive.

The Caveats

The advantage of the above approach is this: You can find the records you are interested in with one single query to the database without error-prone duplicated logic. But there are other ways to do this:

  • You could duplicate the employee finder logic and write it up as an explicit join.
  • You could just run two queries.

Depending on your database and your measured performance characteristics (take into account this experience and the optimizations available to your DBMS), one of those might make more sense for you. Regardless, it’s one more tool for you to have in your toolbox.

Has anyone else out there run into this problem? I’m curious if people have solved this differently than I have.
 

Conversation
  • Tadas says:

    Is there any particular reason why you check for distinct employee.id and not just leave ‘distinct employees.*’? This is what AR’s #uniq does.

    Also, a useful method to know about is #except() http://api.rubyonrails.org/classes/ActiveRecord/SpawnMethods.html#method-i-except

    • Mike Swieton Mike Swieton says:

      Tadas,

      Thanks for the feedback!

      There’s no particular reason why I wrote it the way I did. I just copied from the documentation.

      Thanks for the pointer to except. I didn’t realize that worked for select clauses as well (though it certainly makes sense now that you point it out.) That is certainly a more idiomatic solution than what I had above. I’m glad there’s a better way!

  • Drew says:

    Try this instead:

    Employee.joins(:companies).
    where("companies.id IN (?)", user.company_ids).
    select("DISTINCT(employees.id), employees.*")

    I don’t think this is valid SQL for Postgres and it’s a distortion of the meaning of DISTINCT so even though it works on MySQL, it doesn’t make sense. The reason is that once DISTINCT eliminates duplicate rows based on the specified column, there’s no way to resolve the other columns.

    • Mike Swieton Mike Swieton says:

      Excellent point, Drew. I get sloppy sometimes since MySQL will allow things that it probably shouldn’t. As I recall that exact problem occurs with GROUP BY clauses as well (or at least, it did at one point.)

      Thanks!

  • s says:

    How about

    UploadedFile.where(uploaded_by: EmployeeFinder.accessible_by(user).collect(&:id))

    • Mike Swieton Mike Swieton says:

      That will definitely work, but the disadvantage is that it’s going to make an extra round-trip to the database. Calling collect / map on the relation will cause it to be executed. For many applications and queries, that’s ok, but it’s always good to have other options available for when it isn’t.

      Thanks!
      Mike

  • Tanel Suurhans says:

    Couldn’t you just query the employees via companies directly from the user? By using a has_many :through for example. And distinct them on employees.* for unique results. Seems like solving the wrong problem right now.

  • AlexSlynko says:

    Why not just:
    Employee.includes(:companies).
    where(“companies.id IN (?)”, user.company_ids) ?

    • Mike Swieton Mike Swieton says:

      Because it doesn’t work ;)

      Unless it’s fixed in a different Rails version, when I run that in my app, this code: Employee.includes(:companies).where("companies.id IN (?)", [1, 2, 3]).to_sql will return SELECT `employees`.* FROM `employees` WHERE (companies.id IN (1,2,3)).

  • Dan Shultz says:

    Hi Mike,

    While your code works, it’s a little to much sql for me and you are not taking advantage of half of what AR/ARel has to offer. I rewrote the code and now it also should work across ORMs


    class EmployeeFinder
    def self.accessible_by(user)
    Employee.joins(:companies) \
    .merge(Company.where(:id => user.company_ids)) \
    .uniq
    end
    end

    UploadedFile.where(uploaded_by: EmployeeFinder.accessible_by(user))

  • […] Making Queries More Composable with ActiveRecord and Arel Mike Swieton says it isn't always clear how to get ActiveRecord and Arel to tackle certain difficult database queries so he shares some pointers he figured out here. […]

  • Comments are closed.