A Simple Approach to Comparing Database Structures

On my current project, we are developing a replacement for a legacy system written in Visual FoxPro that our customer has used for the last couple of decades. Since there is a lot of risk in switching over, we have developed a bi-directional syncing tool that attempts to keep each system up-to-date with changes made in the other.

The syncing system is not perfect for a number of reasons (which I won’t get into here), so we decided to write a tool that we could run nightly to detect mismatches in records between the two systems. On the surface, this seems like a simple task. However, the data structures between the two systems are significantly different. We decided that we could use the database structure of the legacy system as a common format, as it is rather denormalized and uses natural keys which makes it easier to compare on a table-by-table basis.

We are developing the new system using a PostgreSQL database. We thought it would be nice to be able to query both our new database and the legacy system database using near-identical SQL statements so we could compare row-by-row.

For example, we can run this query against our new database:

SELECT
    *
FROM
    (
        SELECT
            customers.identifier AS custid,
            orders.identifier AS ordid,
            stores.identifier AS storid,
            products.identifier AS prodid,
            order_allocations.quantity AS qty
        FROM
            order_allocations
            INNER JOIN stores ON stores.id = order_allocations.store_id
            INNER JOIN customers ON customers.id = stores.customer_id
            INNER JOIN order_products ON order_allocations.order_product_id = order_products.id
            INNER JOIN products ON products.id = order_products.product_id
            INNER JOIN order_groups ON order_groups.id = order_products.order_group_id
            INNER JOIN orders ON orders.id = order_groups.order_id
    ) ordline
WHERE
    custid = 'MMM'
    AND ordid = 'C010917A'
ORDER BY
    prodid,
    storid;

And then run this query against our legacy Visual FoxPro database:

SELECT
    *
FROM
    ordline
WHERE
    custid = 'MMM'
    AND ordid = 'C010917A'
ORDER BY
    prodid,
    storid;

The key here is using the complex query as a sub-query, assigning the field names to match the legacy system’s field names, and using the legacy system’s names. This reshapes our new system’s data structures to match the simpler legacy system structure.

While this is just one piece of the differencing puzzle, it is a simple approach which allows us to query each database with the same field names, the same predicates, and the same record order. This allows us to more easily compare the two datasets on a row-by-row basis, even when one of the systems has a more complex record structure.