Don’t Repeat Yourself, but Sometimes Repeat Yourself

Software developers are great at recognizing patterns. Maybe it’s an inherent skill that draws us to this profession. Or maybe it’s the process of writing software that develops this skill. Either way, “don’t repeat yourself” (DRY) is a natural application of this skill.

However, repetition in itself is not the enemy that this principle makes it out to be.

Don’t Repeat Yourself

The phrase “don’t repeat yourself” sounds self-explanatory, but The Pragmatic Programmer states the principle more precisely:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

It’s not hard to imagine the benefits of reducing or eliminating repetition.

  • Less duplication means less code to maintain. And if the best code is no code at all, this sounds like a good thing.
  • A single source of truth eliminates the possibility of things getting out of sync. Some business rule has changed? Update it in one place, and be done.
  • Code is reusable. Have a process that is shared by three things, and you’re adding a fourth? Most of the work has already been done.
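As a minimal sketch of the “single source of truth” benefit, assume a hypothetical free-shipping rule. The threshold, fee, and function names below are invented for illustration. Both the pricing code and the cart banner ask the same function, so changing the rule means changing one constant.

```python
from decimal import Decimal

# Hypothetical business rule: orders at or above this total ship free.
# This constant is the only place the threshold is written down.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")
STANDARD_SHIPPING_FEE = Decimal("4.99")


def qualifies_for_free_shipping(order_total: Decimal) -> bool:
    """The single representation of the free-shipping rule."""
    return order_total >= FREE_SHIPPING_THRESHOLD


def shipping_cost(order_total: Decimal) -> Decimal:
    """Pricing consults the rule instead of restating the threshold."""
    if qualifies_for_free_shipping(order_total):
        return Decimal("0.00")
    return STANDARD_SHIPPING_FEE


def cart_banner(order_total: Decimal) -> str:
    """The UI message consults the same rule, so it cannot drift from pricing."""
    if qualifies_for_free_shipping(order_total):
        return "You qualify for free shipping!"
    remaining = FREE_SHIPPING_THRESHOLD - order_total
    return f"Add ${remaining} more for free shipping."
```

If the threshold had instead been typed as a literal in both `shipping_cost` and `cart_banner`, a change to one and not the other would leave the banner promising something the checkout does not deliver.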

Sometimes Repeat Yourself

Getting too zealous about reducing apparent duplication can create more problems than it solves. I say “apparent duplication” because sometimes things that look similar are not actually related. For instance, two data structures with identical properties and types may be the same structurally but not semantically.
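To make “structurally the same but not semantically” concrete, here is a small sketch with two hypothetical types. Both have an `x` and a `y` of type `float`, which makes them tempting to merge, but they describe different things and only one of them has range constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenPoint:
    """A pixel position on a display (hypothetical example type)."""
    x: float
    y: float


@dataclass(frozen=True)
class GeoCoordinate:
    """A position on Earth in degrees (hypothetical example type)."""
    x: float  # longitude
    y: float  # latitude

    def __post_init__(self) -> None:
        # This rule only makes sense for geographic coordinates.
        # A merged "Point" type would have to carry it for everyone,
        # or hide it behind a flag.
        if not -180.0 <= self.x <= 180.0:
            raise ValueError(f"longitude out of range: {self.x}")
        if not -90.0 <= self.y <= 90.0:
            raise ValueError(f"latitude out of range: {self.y}")
```

Here `ScreenPoint(1920.0, 1080.0)` is perfectly valid, while `GeoCoordinate(1920.0, 1080.0)` raises `ValueError`. The identical field lists were a coincidence of shape, not a shared piece of knowledge.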

A common strategy for reducing duplicate code is to factor out the common parts and hide them behind an abstraction. But an undesirable side effect is that everything using the abstraction becomes coupled to it, and through it to each other. Any change in the abstraction affects all of its consumers. And likewise, the abstraction may need to be bent to fit the requirements of just one consumer.

This increase in coupling also comes with a decrease in flexibility. Say that you have a process that’s used in three places. It’s nearly the same in all three places, with just a few important differences. So you implement the process as a single module that takes a few parameters to cover the differences.

Tweaking the process for just one of those use cases is now hard to do in isolation: any change to the shared module affects all three. Sure, you can add more parameters (or special cases!) as the use cases diverge. But it will quickly become hard to distinguish the important parts of the process from the infrastructure separating the use cases.
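The trade-off might look something like the sketch below. Everything in it is hypothetical: `render_summary` stands in for the shared module after a couple of use cases have pulled it in different directions, and `render_receipt` and `render_audit_log` show one possible alternative, where only the genuinely shared piece (`format_line`) is extracted and each use case keeps its own small function.

```python
from typing import NamedTuple


class Item(NamedTuple):
    name: str
    price_cents: int


# Version 1: one shared function, with a flag for each place the use cases differ.
def render_summary(
    items: list[Item],
    *,
    show_total: bool = True,
    uppercase_names: bool = False,  # added for the receipt use case
    include_count: bool = False,    # added for the audit-log use case
) -> str:
    lines = []
    for item in items:
        name = item.name.upper() if uppercase_names else item.name
        lines.append(f"{name}: {item.price_cents / 100:.2f}")
    if include_count:
        lines.append(f"items: {len(items)}")
    if show_total:
        total = sum(item.price_cents for item in items)
        lines.append(f"total: {total / 100:.2f}")
    return "\n".join(lines)


# Version 2: share only the part that is truly the same knowledge.
def format_line(label: str, price_cents: int) -> str:
    return f"{label}: {price_cents / 100:.2f}"


def render_receipt(items: list[Item]) -> str:
    lines = [format_line(item.name.upper(), item.price_cents) for item in items]
    lines.append(format_line("total", sum(item.price_cents for item in items)))
    return "\n".join(lines)


def render_audit_log(items: list[Item]) -> str:
    lines = [format_line(item.name, item.price_cents) for item in items]
    lines.append(f"items: {len(items)}")
    return "\n".join(lines)
```

Version 2 repeats the loop over `items`, but each function can now change without touching the other. Whether that repetition is worth it depends on how likely the use cases are to keep diverging.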

Questions to Ask

When refactoring existing code or writing new code that might end up duplicated, I ask myself:

Is this a single piece of knowledge that has been duplicated, or is this just infrastructure that happens to look similar?

So you rummaged through the CSS and found a class that just happens to have the styles you want. That’s probably not a good enough reason to avoid defining another class.

How many times has this thing been repeated?

Until a truly redundant thing appears at least three times, I would be very skeptical of any repetition-reduction that increases complexity.

Will reducing duplication now make it harder to customize individual cases in the future?

There’s something satisfying about refactoring a bunch of copy-and-paste code into something more streamlined and reusable. But locking down the code so tight that any change would introduce special cases is not a good trade-off.

If I reduce this duplication, what is the ratio between the size of the abstraction and the number of parameters it will take?

Libraries and frameworks are great because they provide reusable code with relatively few parameters for customization. But imagine an application-specific function for presenting a dialog box that has grown and now accepts 20 parameters. Whatever repetition-reduction benefit existed when it had 2 parameters is no longer there.

Conclusion

As with many software development principles, “don’t repeat yourself” is a guideline more than a mantra.