Never Say “Click” – Good Cucumber System Testing Practices

I recently found myself making updates to an application that had a bad system test suite. It wasn’t bad because it was incomplete. It wasn’t bad because it was unreliable. It wasn’t slow or ugly or any of those things – it was just brittle. If one simple rule had been followed when the suite was originally written, it would have been much better.

That rule is: *Never say “click”*.

In fact, in your Cucumber features you probably shouldn’t say “follow” or “press” either. Don’t type “check”, “uncheck”, “highlight”, “drag” or anything like that. The problem with writing such specific steps is that they tend to fail for the wrong reasons. A good test fails because the feature is broken. A bad test fails for other reasons.

h2. Writing general tests

Here’s a concrete example:

Suppose I want to test viewing a report in an application. I could write:

Scenario: Viewing a report
  Given that I am on the home page
  And I follow "Reports..."
  And I select "Time report" from "Report type"
  And I select "Trailing 12 months" from "Report range"
  And I press "Generate"
  Then I should see "Time report for 2010/06 - 2011/05"
  And I should see time ....

Tests like the above are equivalent to telling someone how to spell your name by describing the movements of their pen. It’ll break as soon as they decide to type it.

Maintaining tests like this is a nightmare because if I change the “reports” link to be a drop-down menu, or perhaps I change the “Report type” select field to be a text field with completion, then the test will break. While we’d prefer our tests don’t break for these sorts of changes, that isn’t always possible. However, we can at least write a test suite where a code update will require a minimal amount of test updates.

Here’s a better test for the above:

Scenario: Viewing a report
  Given that I am on the home page
  When I view the "Time report" for the range "Trailing 12 months"
  Then I should see "Time report for 2010/06 - 2011/05"
  And I should see time ....

These steps are 1) shorter, 2) closer to the business domain, and 3) robust in the face of workflow changes. If the workflow for generating a report in this fictional application changes, then we will only need to update one step definition.
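As a rough sketch of what that one step definition might delegate to, here is a plain Ruby helper that holds all the UI knowledge for generating a report. The names @ReportSteps@, @view_report@ and @RecordingSession@ are hypothetical and chosen for illustration. The @click_link@, @select@ and @click_button@ calls mirror Capybara's method names, but they are implemented here by a recording stand-in so the example runs without Cucumber or Capybara installed.

```ruby
# Hypothetical helper module. In a real suite it would be mixed into
# Cucumber's World and called from a step definition such as:
#
#   When(/^I view the "([^"]*)" for the range "([^"]*)"$/) do |type, range|
#     view_report(type, range)
#   end
module ReportSteps
  # All UI knowledge for generating a report lives in this one method.
  def view_report(type, range)
    click_link("Reports...")
    select(type, from: "Report type")
    select(range, from: "Report range")
    click_button("Generate")
  end
end

# Stand-in for a browser session (an assumption for this sketch).
# It records each UI action instead of driving a real page.
class RecordingSession
  include ReportSteps

  attr_reader :actions

  def initialize
    @actions = []
  end

  def click_link(locator)
    @actions << [:click_link, locator]
  end

  def select(value, from:)
    @actions << [:select, value, from]
  end

  def click_button(locator)
    @actions << [:click_button, locator]
  end
end

session = RecordingSession.new
session.view_report("Time report", "Trailing 12 months")
session.actions.each { |action| puts action.inspect }
```

If the “Reports...” link later becomes a drop-down menu, only the body of @view_report@ would need to change; every scenario that uses the domain-level step stays as written.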

If you always write your tests in your domain language and avoid terms like “click”, “link”, “follow”, “select” and the like, you will be well on your way to a maintainable test suite.

*Edit:* A coworker just pointed me towards “this”:http://collectiveidea.com/blog/archives/2010/09/08/practical-cucumber-scenario-specific-steps/ post, which also discusses good cucumber test practices. Interestingly enough, Brandon argues precisely the opposite position to mine. He cites two reasons for using the explicit “press”, “follow”, etc. style:

* It communicates exactly how the feature works to an end user.
* It makes use of very reusable steps.

Personally, I don’t feel like the first is valuable. My tests are there to validate code, and not to be a user’s manual.

The second is interesting, as it is the same motivation I felt. It doesn’t make much of a difference if it’s a step that’ll only be used once, but for any piece of important functionality which either has multiple test cases or is a part of many workflows, I think using custom steps is worthwhile for the reasons I cite above. An extreme example of this benefit is writing steps for logging in to the application: if I change the form, I don’t want to go back and edit every single test I have.

Which side of the fence do you all fall on? Leave a comment and tell us about your experiences.

Conversation
  • Phil Kirkham says:

    Is it good practice to have a hard coded date as part of the test ?

    • Mike Swieton says:

      Hi Phil,

      I think the answer to “Is it good practice to have a hard coded date as part of the test ?” is “it depends”.

      If it’s a central business rule that might change (such as 30-day payment terms, biweekly pay periods, or something like that), then having a helper function to calculate a meaningful date is probably worthwhile because it will 1) make the tests easy to update when rules change, and 2) be easily reusable in the many other tests likely necessary for these core domain rules.

      On the flip side, if it’s a due date for a todo list item and it’s mostly arbitrary, I wouldn’t sweat it.

      For anything in code or tests I always ask myself “What’s the likelihood that this rule, process, or piece of data will change?” and then “If it does change, how many places will be affected and what will be the cost of the updates?” Based on that gut check, I usually have some idea what’s worth abstracting.

  • Jason Roelofs says:

    I disagree. This pattern of cucumber steps, when put into practice, leads to an explosion of steps, which itself leads to an inadvertent increase in code duplication because you never know if the step you want has been implemented somewhere else or not. This, then, can lead to a very unmaintainable test suite. Besides, I have a hard time seeing how a Scenario breaking because of a missing link is breaking “for the wrong reason.” In your example, your test *should* break if you change how the user interacts with the site (link to a drop down, or vice versa), otherwise what is it you are testing?

    • Mike Swieton says:

      Consider steps for logging into the app. If I change the way login happens (perhaps by using some ajax to popup a login form dynamically on the current page), every test in my system will fail. In that case, I think the cost of maintaining a “login_steps.rb” or maybe “base_steps.rb” will be far far less than the cost of manually updating every test in my system.

      In reality, most features are not as cross-cutting as login tends to be. At the same time, most nontrivial features have two, three, four, or more test cases. Why should I have to maintain each of those tests separately when I can factor out common portions?

      I don’t think it’s so bad if a lot of tests fail, as they’re meant to do that when things change, but being able to fix them quickly is extremely valuable.

      • Aaron Day says:

        I’m just beginning to play with cucumber so I’m certainly not an expert, and looking for ideas on how to test effectively with the tool.

        Aren’t there two features being tested? Navigation within the product and individual features of the product. Your post focuses on testing product features in isolation, which seems like a good tactic. However navigation within the product is also a feature which ought to be tested.

      • Jason Roelofs says:

        Right, because it’s either one or the other, never both. What was that about pragmatism again, and why is it always kicked out the window in discussions like this?

        If you start your cucumber writing by specifying in broad over-arching statements like “And I view report details for [date]” then you will end up with an explosion of steps for every little thing your app does, leading to a very unmaintainable cucumber suite.

        But, if you start with direct steps, “I click on this” and “I fill in that”, then you can be much more pragmatic about your features. When you find yourself doing things often throughout many features (I usually use 3 times as the key) then you look at refactoring those into their own special step. Doing that, you know that you have the minimal required step definitions, and steps that are easier to find and re-use.

        • Mike Swieton says:

          I think we’re both coming from the same motivations – the attempt to save work and headaches later.

          What’s driving me to this conclusion is essentially this:

          * The cost of creating a step is negligible. Since the step is probably just aliasing a collection of other steps, writing it is trivial.
          * I don’t see having many steps to be a major maintenance problem. A little careful organization of step definition files, coupled with grep, and the success/failure of your tests themselves, I find it’s easy to find if there’s a step I need already or if I’ve duplicated a step accidentally.
          * Steps tend to be used more than once more often than not. Very frequently the features I am testing need more than one test to cover everything. Given that, we’re not just creating steps we don’t need – we’ll use them right away.

          There are no absolute rules – but I think falling back on the basic “I click on…” steps should be a red flag driving you to consider whether you’re actually hoping to test the click or whether there’s a large or complicated operation you’re trying to perform that you’d be better off abstracting.

  • Phil Kirkham says:

    Specification by Example: a love story has some ideas on how good Cucumber tests could be written

  • Zach Moazeni says:

    Hey Mike, thanks for taking the time to spark this discussion. However I do disagree with your strategy. Following your strategy, you’re going to create many unnecessary custom steps, which will distract you with debugging errors within those custom steps and ultimately ignore Cucumber’s greatest strength: reusable steps. Those pains are felt even more sharply by Cucumber newcomers.

    I have outlined each point more thoroughly in my post Reusable Cucumber Steps
