Rails, Active Record & Postgres – Optimizing Deletions

Every Rails project I’ve worked on has used Active Record. ORMs like Active Record have many benefits. They abstract you away from the database. They make querying data elegant and simple (in most cases).

However, sometimes Active Record is not more efficient than Postgres.

Let’s look at an example. Say our Postgres database has the following schema. Our top-level table is a building. The building has many apartments. An apartment has many rooms. Each room has multiple pieces of furniture, and each piece of furniture has several materials. Your models might look something like this:


class Building < ActiveRecord::Base
  has_many :apartments, dependent: :destroy
end

class Apartment < ActiveRecord::Base
  has_many :rooms, dependent: :destroy
end

class Room < ActiveRecord::Base
  has_many :furnitures, dependent: :destroy
end

class Furniture < ActiveRecord::Base
  has_many :materials, dependent: :destroy
end

class Materials < ActiveRecord::Base
end

Now, let's say we want to remove a building. This building has 100 apartments, each apartment has 10 rooms, each room has 15 pieces of furniture, and each piece of furniture has five materials.

Using Active Record

Removing the building with Active Record is simple. You just need to call building.destroy. Because we set up all of those dependent destroys, Active Record will take care of removing all of the related records.

Active Record is so nice that it will also make sure to call the delete hooks on any object that has them. It does this by loading each object into memory and calling object.destroy on it. This transaction will create 91,001 objects. It takes my development machine about 77 seconds to run this scenario. So, it's not very efficient. Let's try this a different way.

Using Postgres

Databases are built to handle operations on large data sets. So, why not leverage the database for its purpose? Rails 4+ supports foreign keys. Let's add a migration to add foreign keys to our tables. It might look like this:


class CreateForeignKeys < ActiveRecord::Migration
  def change
    add_foreign_key :apartments, :buildings, on_delete: :cascade
    add_foreign_key :rooms, :apartments, on_delete: :cascade
    add_foreign_key :furnitures, :rooms, on_delete: :cascade
    add_foreign_key :materials, :furnitures, on_delete: :cascade
  end
end

Now, when we call building.destroy, instead of creating all of those objects in memory, Postgres will handle destroying all of the related records. It takes my machine 45 seconds to run the same operation using cascade deletes.

Conclusion

Active Record is a powerful tool. Because of this, we may shy away from letting the database handle work for us. But there are occasions where the database is the correct answer.

Some will argue that putting optimizations in the database causes opaqueness, and new developers rolling onto a project may not understand why all apartments disappear when a building is removed. I would argue that one of the first things developers should do when joining a project is to become familiar with the project schema. If they do, they will understand why and how the deletion happens.

Let your database do what it was built to do!

Conversation
  • Fon says:

    Do you have to remove “dependent: :destroy” in models after add the foreign keys?

  • John says:

    I have the same question as Fon. Do you delete the dependent: :destroy statements because this new approach of cascading simply over-rides that?

  • Ilya Bylich says:

    Thanks for a nice article!

    A few notes:
    1. “dependent: :destroy” explicitly invokes “destroy” on each associated object which invokes before/after _destroy callbacks. Sometimes you really need them.
    2. Foreign key with “on_delete: :cascade” is very similar to “dependent: :delete_all”, the difference is the place where you actually perform this “DELETE”: on the database or on the application level.
    3. “dependent: :delete_all” shows your “DELETE …” query in your application logs. Foreign key does it directly in the database, so you can’t see it in your logs.
    4. Of course, “on_delete: :cascade” is much faster because it’s not written in Ruby.

    • Ryan Abel Ryan Abel says:

      Thank you for reading the post and for taking the time to leave a comment.

      You are absolutely correct with your first note. This approach would not work if the model relied on any deletion callbacks.

      One important thing to note about “dependent: :delete_all” is that it will only remove the next association. So, in this example, using “dependent: :delete_all” instead of foreign keys would leave orphaned records below the apartment model.

  • Bruno Wego says:

    Very nice article, congrats!

  • Marcus Crowley says:

    Nice one. Thanks, Ryan.

  • Comments are closed.