Rails, Active Record & Postgres – Optimizing Deletions

Every Rails project I’ve worked on has used Active Record. ORMs like Active Record have many benefits. They abstract you away from the database. They make querying data elegant and simple (in most cases).

However, Active Record is not always the most efficient tool for the job; sometimes it pays to let Postgres do the work itself.

Let’s look at an example. Say our Postgres database has the following schema. Our top-level table is a building. The building has many apartments. An apartment has many rooms. Each room has multiple pieces of furniture, and each piece of furniture has several materials. Your models might look something like this:


class Building < ActiveRecord::Base
  has_many :apartments, dependent: :destroy
end

class Apartment < ActiveRecord::Base
  has_many :rooms, dependent: :destroy
end

class Room < ActiveRecord::Base
  has_many :furnitures, dependent: :destroy
end

class Furniture < ActiveRecord::Base
  has_many :materials, dependent: :destroy
end

class Material < ActiveRecord::Base
end

Now, let's say we want to remove a building. This building has 100 apartments, each apartment has 10 rooms, each room has 15 pieces of furniture, and each piece of furniture has five materials.
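If you want to reproduce the numbers below, here is a rough sketch of how such a data set could be seeded. The attribute names (name, number) are assumptions, since the columns for these tables aren't shown:


# Hypothetical seed script for the scenario above. Adjust the attribute
# names to whatever your actual schema uses.
building = Building.create!(name: "Benchmark Building")

100.times do |a|
  apartment = building.apartments.create!(number: a + 1)

  10.times do |r|
    room = apartment.rooms.create!(name: "Room #{r + 1}")

    15.times do |f|
      furniture = room.furnitures.create!(name: "Furniture #{f + 1}")

      5.times do |m|
        furniture.materials.create!(name: "Material #{m + 1}")
      end
    end
  end
end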

Using Active Record

Removing the building with Active Record is simple. You just need to call building.destroy. Because we set up all of those dependent destroys, Active Record will take care of removing all of the related records.

Active Record is also nice enough to run the destroy callbacks (before_destroy, after_destroy, and friends) on every record that defines them. It does this by loading each record into memory and calling destroy on it, one at a time. For our building, that means instantiating 91,101 objects: 1 building, 100 apartments, 1,000 rooms, 15,000 pieces of furniture, and 75,000 materials. It takes my development machine about 77 seconds to run this scenario. So, it's not very efficient.
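To picture what that looks like, here is a sketch of the Active Record approach. The SQL in the comments is illustrative of the per-record DELETE statements Rails ends up issuing, not a verbatim log:


building = Building.first

# Active Record wraps the operation in a transaction, walks the
# dependent: :destroy associations, instantiates every related record,
# runs its callbacks, and issues one DELETE per row, along the lines of:
#
#   DELETE FROM "materials"  WHERE "materials"."id"  = $1
#   DELETE FROM "furnitures" WHERE "furnitures"."id" = $1
#   ...
#   DELETE FROM "buildings"  WHERE "buildings"."id"  = $1
building.destroy

Let's try this a different way.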

Using Postgres

Databases are built to handle operations on large data sets, so why not leverage the database for its purpose? Rails 4.2 added native support for foreign keys in migrations. Let's add a migration that creates foreign keys, with cascading deletes, on our tables. It might look like this:


class CreateForeignKeys < ActiveRecord::Migration
  def change
    add_foreign_key :apartments, :buildings, on_delete: :cascade
    add_foreign_key :rooms, :apartments, on_delete: :cascade
    add_foreign_key :furnitures, :rooms, on_delete: :cascade
    add_foreign_key :materials, :furnitures, on_delete: :cascade
  end
end

Now we can let Postgres do the heavy lifting. Once we drop the dependent: :destroy options from the models, calling building.destroy no longer loads all of those objects into memory; Rails deletes just the building row, and the ON DELETE CASCADE constraints make Postgres remove all of the related records. It takes my machine 45 seconds to run the same operation using cascade deletes.
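A minimal sketch of the Postgres-backed version, assuming the dependent: :destroy options have been removed from the models shown earlier:


building = Building.first

# With no dependent: :destroy callbacks left to run, Rails issues a single
# statement for the parent row, roughly:
#
#   DELETE FROM "buildings" WHERE "buildings"."id" = $1
#
# and the ON DELETE CASCADE constraints make Postgres remove the related
# apartments, rooms, furnitures, and materials rows itself.
building.destroy

If you would rather keep the dependent: :destroy options around for other code paths, calling building.delete instead of building.destroy also skips the callbacks and issues a single DELETE, leaving the cascade to Postgres.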

Conclusion

Active Record is a powerful tool. Because of this, we may shy away from letting the database handle work for us. But there are occasions where the database is the correct answer.

Some will argue that pushing optimizations into the database makes behavior opaque, and that new developers rolling onto a project may not understand why all of the apartments disappear when a building is removed. I would argue that one of the first things developers should do when joining a project is become familiar with its schema. If they do, they will understand why and how the deletion happens.

Let your database do what it was built to do!