Adding End to End Cypress to a Web Project Part 2: Seeding the Test Data

In the first part of this series, we covered using Docker to set up test instances of the services the project depends on. With those services up and running, the next step is generating test data for your end-to-end tests to run against. In this post, we’ll describe how to generate a sample test data set and reset the test PostgreSQL database between test cases.

An Overview

We seed the test database by dropping all data from (almost) all tables of the DB. Then, we reinsert the sample data set. That’s it!

There is a lot of opportunity for additional enhancements. This simpler approach does not allow for loading different data scenarios or finer-grained controls for data creation right in the context of your tests. Rather, this represents the bare minimum that will allow you to hit the ground running and get started writing end-to-end tests without too much fuss.

Generating the Test Data

The first step is to generate a sample test data set. We use the pg_dump psql command to generate this test data.

A few things to note when generating your data:

  • Use the --exclude-table option to ignore any tables that get populated via migrations. You may also want to exclude tables you don’t care about including in the test dataset, such as auditing tables.
  • Similarly, we use the --exclude-schema option to ignore schemas that only contain info about the migrations run.
  • Switch out the values of the host, post, and dbname as needed to target different databases. When generating an initial data set, it may be useful to generate it against a pared-down version of your normal development database. When regenerating it to capture new data required for a new test you’re adding, targeting your test database may be better.

Seeding the Database

To seed the test DB, we run the following script:


dir="$(cd "$( dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "${dir}"

echo 'Clearing tables modified in tests...'
clear_tables_script="./clear_tables_modified_in_tests.sql"
if [[ -e "${clear_tables_script}" ]]; then
  echo "about to clear"
  if ! psql -v ON_ERROR_STOP=1 -h $DATABASE_HOST -U postgres -d test -p $DATABASE_PORT -c 'SET session_replication_role = replica;'  -f "${clear_tables_script}"  > /dev/null 2>&1; then
    echo "💥 Error clearing test DB!"
    exit 1
  fi
fi

echo 'Loading default scenario database dump...'
default_scenario="./default_scenario_dump.sql"
if [[ -e "${default_scenario}" ]]; then
  if ! psql -v ON_ERROR_STOP=1 -h $DATABASE_HOST -U postgres -d test -p $DATABASE_PORT -c 'SET session_replication_role = replica;' -f "${default_scenario}"  > /dev/null 2>&1; then
    echo "💥 Unable to load default scenario!"
    exit 1
  fi
fi

Since we do not have an established data layer in our codebase, we clear and then re-insert data by simply running SQL scripts. The first step is to run a SQL script to clear out data that might have been modified during the test run. We use TRUNCATE with CASCADE to do so. However, DELETE may be faster with smaller data sets (if you know the order to delete in without violating foreign key relationships).

Next, we load the test data from the default_scenario_dump file we generated using the pg_dump command, as described above. Note that, since we are inserting test data from a psql database dump, we do not want any database triggers to fire. To temporarily disable database triggers, we include -c 'SET session_replication_role = replica;' when running our SQL scripts.

With these additions, we now can generate an initial test data set, clear out all changes to the test DB, and re-insert the same test data set to ensure consistent behavior during test runs. In the next post, we’ll discuss how to call into these data setup scripts from our e2e tests using Cypress tasks.