
BI / DWH Testing

Setting Up Data Warehouse/BI Projects to Support Data Reconciliation

Editor’s note: Wayne Yaddow is an independent consultant with over 20 years’ experience leading data migration/integration/ETL testing projects at organizations including J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, IBM, and Achieve3000. Additionally, Wayne has taught IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. He continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at wyaddow@gmail.com.

This blog, which explains how a robust and flexible solution for data validation and reconciliation supports decision integrity, is part of a 3-part blog series. [Read Part 1 and Part 2]

All data information systems must be designed to allow transparent and straightforward data reconciliation during and after workflow processing. Even if precise and regular reconciliation is not an immediate need, a data warehouse must be designed and built to provide a clear view of inconsistent and unreconciled data. Even when BI report objects do not require high levels of accuracy, end users must know the percentage or amount of data that is not accurate, and the reason for the inaccuracy. [1]
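As a concrete illustration of giving end users visibility into unreconciled data, the following is a minimal Python sketch. The function name, field names, and input shape are assumptions for the example; a real implementation would compare query results from the source system and the warehouse.

```python
from decimal import Decimal

def reconciliation_summary(source_rows, warehouse_rows, key, amount):
    """Compare source and warehouse rows by key and report how much
    of the data is unreconciled, and why (missing vs. mismatched)."""
    src = {r[key]: Decimal(str(r[amount])) for r in source_rows}
    dwh = {r[key]: Decimal(str(r[amount])) for r in warehouse_rows}
    missing = sorted(set(src) - set(dwh))  # extracted but never loaded
    mismatched = sorted(k for k in src.keys() & dwh.keys() if src[k] != dwh[k])
    total = len(src)
    bad = len(missing) + len(mismatched)
    pct_unreconciled = round(100 * bad / total, 2) if total else 0.0
    return {"missing": missing, "mismatched": mismatched,
            "pct_unreconciled": pct_unreconciled}
```

A summary like this answers exactly the two questions raised above: what fraction of the data is inaccurate, and for which reason (never loaded vs. loaded with a different value).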

Strategies for a Successful DW/BI Reconciliation Process

Utilize Your Data Warehouse Data Dictionary

An important key to successful reconciliation is a data dictionary defined and agreed upon with the business community. If the definitions of major business terms (such as Customer or Product) in data sources differ from the definitions used by the data warehouse, data reconciliation will be very problematic. A well-maintained data dictionary is therefore an important aid.
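A simple automated check can surface exactly the definition conflicts described above. This is a hypothetical sketch (the function name and input shape are assumptions): given each system's term definitions, it returns the terms whose definitions disagree across systems.

```python
def find_term_conflicts(dictionaries):
    """dictionaries: {system_name: {term: definition}}.
    Return the terms whose definitions differ across systems,
    with the per-system definitions so they can be reviewed."""
    merged = {}
    for system, terms in dictionaries.items():
        for term, definition in terms.items():
            merged.setdefault(term, {})[system] = definition
    return {term: defs for term, defs in merged.items()
            if len(set(defs.values())) > 1}
```

Running a check like this before reconciliation begins tells the team which terms (e.g., Customer) must be aligned with the business first.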

Preserve the Original Data Extracted from Source Systems

The original data extracted from sources will often be required for reconciliation. There may be situations in which the original data in the source system is no longer available at the time reconciliation is required. In that case, the original data extract preserved in the data warehouse (database or file system) becomes a critical element of the reconciliation process.
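One way to preserve an extract reliably is to archive the raw bytes alongside a checksum and manifest, so the copy can later be proven identical to what was originally pulled from the source. The function and file-naming scheme below are assumptions for illustration:

```python
import hashlib
import json
import time
from pathlib import Path

def preserve_extract(raw_bytes, source_system, archive_dir):
    """Archive the original extract with a checksum and manifest so it
    can be replayed for reconciliation after the source data is gone."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(raw_bytes).hexdigest()
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    data_file = archive / f"{source_system}_{stamp}.dat"
    data_file.write_bytes(raw_bytes)  # the untouched source extract
    manifest = {"source": source_system, "file": data_file.name,
                "sha256": digest, "bytes": len(raw_bytes)}
    (archive / f"{data_file.stem}.manifest.json").write_text(json.dumps(manifest))
    return manifest
```

The checksum in the manifest lets a later reconciliation run verify that the archived extract has not been altered since it was captured.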

Capture Historical Data Changes

The need to capture details of data changes (updates and deletes), and the ability to report on those changes (or at least to access the history of changes easily), is often underestimated by both business users and data warehouse developers. If data changes are not captured properly, reconciliation of the data becomes very problematic.
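A common way to capture changes without losing history is a slowly changing dimension (Type 2) pattern: instead of overwriting a record, the current version is closed out and a new version is appended. The sketch below is a minimal in-memory illustration, with hypothetical field names (`valid_from`/`valid_to`):

```python
from datetime import date

def apply_change(history, key, new_row, as_of):
    """Minimal SCD Type 2 sketch: close the current version and append
    the new one, so every update is preserved rather than overwritten."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if all(row.get(c) == v for c, v in new_row.items()):
                return history  # no real change, nothing to record
            row["valid_to"] = as_of  # close the old version
    history.append({"key": key, **new_row,
                    "valid_from": as_of, "valid_to": None})
    return history
```

With every historical version retained, reconciliation can compare the warehouse against a source as of any past date, not just the present.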

Provide Easy Access to Historical Data Changes

Regularly used reconciliation routines often need ad-hoc reporting functionality to analyze data changes. For large data warehouses with frequently changed data, it is essential that the data model allows for efficient and easy access to all historical changes.

Provide Access to ETL Data Exceptions

The design of data exception handling that works efficiently for the data reconciliation process is probably the most complex part of the overall data reconciliation architecture. The design depends on a number of factors related to the nature of the data exceptions (referential integrity, data structure, specific constraints, etc.).

The most important requirements for good exception handling design suitable for the data reconciliation process are:

  • All data exceptions – managed and unmanaged – should be captured, preserved, and easily accessible for analysis
  • Exception handling should be organized into logical modules, e.g., interface exceptions, data load exceptions, data validation exceptions, and data integration exceptions
  • Exceptions should be classified and standardized (e.g., attribute validation exceptions, referential integrity exceptions, data integration exceptions, duplicates)
  • Captured exceptions should be easily associated with the processes and data transformations that created them
  • A clear strategy is needed for managing non-integrated data from multiple sources (requirements may range from presenting all data, including duplicates, to presenting only integrated data)
  • Thorough consideration is required of how to continue (or interrupt) a process in the case of unmanaged exceptions
  • Clear business rules should define how to manage exceptions after they are captured
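Several of these requirements – classification, standardization, and association with the originating process – can be sketched as a small exception log. The class names and structure below are assumptions for illustration, not a prescribed schema:

```python
import datetime

# A standardized, closed set of exception classes (an assumption for the sketch)
EXCEPTION_CLASSES = {"validation", "referential_integrity",
                     "integration", "duplicate"}

class ExceptionLog:
    """Capture every data exception with its class and originating
    process so analysts can trace it back during reconciliation."""

    def __init__(self):
        self.rows = []

    def capture(self, exc_class, process, record_key, detail):
        if exc_class not in EXCEPTION_CLASSES:
            raise ValueError(f"unknown exception class: {exc_class}")
        self.rows.append({
            "class": exc_class,
            "process": process,          # the ETL step that raised it
            "record_key": record_key,    # link back to the offending record
            "detail": detail,
            "captured_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })

    def by_class(self, exc_class):
        return [r for r in self.rows if r["class"] == exc_class]
```

Rejecting unknown classes at capture time is what keeps the exception data standardized enough to analyze later.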

Data Lineage – the Origin of all Data Elements

Tracing data elements from reports back to sources is usually a non-trivial task if a data warehouse integrates similar data from multiple sources. This task becomes very complex if the data passes through multiple transformations or in a heterogeneous data warehouse environment. Efficient data lineage is often a mandatory requirement for compliance or regulatory reporting.

Efficient data reconciliation requires the ability to identify the source of data at any point in the data flow.
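The simplest form of this capability is to stamp every row with its origin as it enters the pipeline, so the source can be read off any downstream record. A minimal sketch, with hypothetical column names (`_src_system`, `_src_batch`):

```python
def tag_lineage(rows, source_system, batch_id):
    """Stamp each incoming row with its origin so any downstream
    record can be traced back to the extract that produced it."""
    return [{**row, "_src_system": source_system, "_src_batch": batch_id}
            for row in rows]

def lineage_of(row):
    """Read a row's origin back out, at any point in the data flow."""
    return (row.get("_src_system"), row.get("_src_batch"))
```

Because the tags travel with the data through every transformation, a report value can be traced back to the exact source extract that produced it – the property compliance reporting usually demands.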

Availability of Data Integration Rules and Constraints

Complex validation and data integration rules implemented in data transformations tend to be forgotten or misunderstood by business users.

It is important that the data integration, validation and exception rules are easily accessible by business users (e.g. from mapping documents, in-line report comments, or online documentation).

Restoring Data After Technical Failures

Data warehouse processes may be interrupted due to technical failures. A common situation is when a lengthy or complex data transformation routine has been interrupted and the data was only incompletely loaded and transformed.

In this case, a reliable data reconciliation method is required to determine the point from which to restart the process and which data needs to be reloaded or removed.
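One simple way to find the restart point is to reconcile per-batch row counts: compare what the extract manifest says each batch should contain with what actually landed in the warehouse. The function below is an illustrative sketch under that assumption:

```python
def find_restart_batch(expected_counts, loaded_counts):
    """Compare per-batch row counts from the extract manifest with what
    actually landed in the warehouse. The first batch that disagrees is
    where the interrupted load must resume (after removing its partial
    rows). Returns None if everything reconciled."""
    for batch_id in sorted(expected_counts):
        if loaded_counts.get(batch_id, 0) != expected_counts[batch_id]:
            return batch_id
    return None
```

Row counts alone will not detect corrupted values, so production implementations often reconcile checksums or sums of key amounts per batch as well.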

Automation of Data Reconciliation and Data Lineage

Automation of data reconciliation (particularly in production) should be reviewed thoroughly during the design process. For large, complex, and heterogeneous data warehouses, it may be beneficial to make the reconciliation process an integral part of the overall data flow – starting with data extraction and loads and finishing with end-user reports and analytics.

Data Reconciliation – Toolsets

Considerations for selecting data reconciliation tools:

  • A single reconciliation system catering to enterprise-wide reconciliation needs
  • Ability to easily configure simple to complex reconciliation scenarios
  • Reduced cycle time in handling complex business reconciliation scenarios
  • Ability to process high transaction volumes in a relatively short time
  • Improved operational efficiency – reconciliation tools should allow operational users to monitor, track, and investigate the reconciliation process efficiently

Automating Data Reconciliation Processes with Tools

Reconciliation without effective processes and tools is an arduous and time-consuming effort in which staff, working manually or across multiple tools, investigate numerous files, tables, and spreadsheets, analyzing each data merge and transformation. This approach is inefficient, error-prone, and unsustainable.

Leveraging automation tool capabilities, teams can focus on data reconciliation results rather than the process itself. This allows teams to shift focus and add value in more intensive areas that require their professional and business-related expertise, typically in the form of analyzing and resolving exceptions.

Automated data reconciliation tools empower teams to fully eliminate repetitive, time-intensive, and error-prone manual tasks.

Takeaways

Large-scale technology disasters are rarely a consequence of a single factor. Mostly, they result from a set of flaws in software development and maintenance processes. Data reconciliation controls serve as an independent additional protection mechanism for complex systems and therefore should be considered as a necessary part of production infrastructures.

In order to perform reconciliation tests, data reconciliation tools must be available. The Quality Assurance team should strongly consider implementing test tools capable of running passive data consistency checks.

These tools should be implemented with the possibility in mind that they will also be used in production environments.

References

[1] Rachid Mousine, “Design Essentials for Reconciliation Data Warehouse”, Formation Data Pty Ltd