On my current project, we are developing a replacement for a legacy system written in Visual FoxPro that our customer has used for the last couple of decades. Since there is a lot of risk in switching over, we have developed a bi-directional syncing tool that attempts to keep each system up-to-date with changes made in the other.
The syncing system is not perfect for a number of reasons (which I won’t get into here), so we decided to write a tool that we could run nightly to detect mismatches in records between the two systems. On the surface, this seems like a simple task. However, the data structures between the two systems are significantly different. We decided that we could use the database structure of the legacy system as a common format, as it is rather denormalized and uses natural keys which makes it easier to compare on a table-by-table basis.
We are developing the new system using a PostgreSQL database. We thought it would be nice to be able to query both our new database and the legacy system database using near-identical SQL statements so we could compare row-by-row.
For example, we can run this query against our new database:
SELECT
*
FROM
(
SELECT
customers.identifier AS custid,
orders.identifier AS ordid,
stores.identifier AS storid,
products.identifier AS prodid,
order_allocations.quantity AS qty
FROM
order_allocations
INNER JOIN stores ON stores.id = order_allocations.store_id
INNER JOIN customers ON customers.id = stores.customer_id
INNER JOIN order_products ON order_allocations.order_product_id = order_products.id
INNER JOIN products ON products.id = order_products.product_id
INNER JOIN order_groups ON order_groups.id = order_products.order_group_id
INNER JOIN orders ON orders.id = order_groups.order_id
) ordline
WHERE
custid = 'MMM'
AND ordid = 'C010917A'
ORDER BY
prodid,
storid;
And then run this query against our legacy Visual FoxPro database:
SELECT
*
FROM
ordline
WHERE
custid = 'MMM'
AND ordid = 'C010917A'
ORDER BY
prodid,
storid;
The key here is using the complex query as a sub-query, assigning the field names to match the legacy system’s field names, and using the legacy system’s names. This reshapes our new system’s data structures to match the simpler legacy system structure.
While this is just one piece of the differencing puzzle, it is a simple approach which allows us to query each database with the same field names, the same predicates, and the same record order. This allows us to more easily compare the two datasets on a row-by-row basis, even when one of the systems has a more complex record structure.