Editor’s note: Wayne Yaddow is an independent consultant with over 20 years’ experience leading data migration/integration/ETL testing projects at organizations including J.P. Morgan Chase, Credit Suisse, Standard & Poor’s, AIG, Oppenheimer Funds, IBM, and Achieve3000. Additionally, Wayne has taught IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. He continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at email@example.com.
This is the first article in a 3-part blog series.
What is decision integrity and why does it matter?
A 2019 Intellyx whitepaper, titled “Decision Integrity: The Growing Complexity of Data is Sabotaging Your BI Initiative,” highlights an important issue:
“With the volume and number of BI data sources growing exponentially, how do you ensure business decisions are based on trustworthy data?
After reviewing the information carefully, senior executives sometimes begin to suspect that BI reports are wrong! However, the challenge of unravelling where the potential issues may be is daunting:
- Are BI reports rendering incorrectly?
- Do problems exist in the report logic—or related business rules?
- Do problems exist with data feeds into the BI system?
- Is data being loaded from the appropriate data warehouse instance(s)?
- Do issues exist in the extraction, transformation, load, or sources of the various data feeds driving data into the data warehouse?”
Decision integrity is defined as “trust in high-stakes decisions that span multiple data sets, systems, and business units.”
Decision integrity emphasizes the need to go beyond isolated data checks in a DW/BI workflow and adopt a broad, multiphase, end-to-end approach that ensures quality across all data transfers and transformations. As the Intellyx paper states, data needs to arrive in BI reports with contextual relevance to the decision process it was meant to support.
The purpose of such a process is to ensure that relevant quality checks on individual data points are implemented after each step (ETL, data transformation, data cleansing, and BI data reporting) so that every data point supports the business intelligence decision it was designed to inform.
See Figure 1 for a sampling of data movements during a data warehouse ETL workflow process.
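To make the idea of per-stage checks concrete, a quality gate run after each data movement might look like the following sketch. The stage names, column names, and checks here are illustrative assumptions, not a prescribed implementation:

```python
# A minimal per-stage quality gate, assuming the data at each hop can be
# represented as a pandas DataFrame. Stage and column names are hypothetical.
import pandas as pd

def quality_gate(df: pd.DataFrame, stage: str, key: str) -> list:
    """Run basic data-point checks after one ETL/transformation stage."""
    issues = []
    if df.empty:
        issues.append(f"{stage}: no rows loaded")
    if df[key].isna().any():
        issues.append(f"{stage}: null values in key column '{key}'")
    if df[key].duplicated().any():
        issues.append(f"{stage}: duplicate keys detected")
    return issues

# Usage: check data after a (hypothetical) staging load.
staged = pd.DataFrame({"customer_id": [1, 2, 2, None],
                       "amount": [10.0, 20.0, 20.0, 5.0]})
problems = quality_gate(staged, stage="staging-load", key="customer_id")
```

Running the same gate after every transfer in the workflow, rather than only at the end, is what distinguishes an end-to-end approach from an isolated data check.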
Why reconcile/verify data in DWH/BI systems
An actual loss of data quality may occur for these reasons (source: SAP):
- Inconsistent/dirty data that is not cleansed before or after loading from sources
- Logical flaws in ETL data exception/transformation rules
- Data exceptions that were not processed during ETLs
- Data constraint violations
- Loss of data in the data exception repositories
- Hardware-related issues (e.g., loss of files, data connectivity, system failures)
- Manual intervention in source data entry
- Null and incorrect data values
- Missing records and fields within records
- Duplicated records
- Incorrectly formatted values
- Broken relationships across tables or systems
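Several of the failure modes above (null values, duplicated records, broken relationships) are directly testable. The sketch below, with hypothetical table and column names, shows how such checks might look against two pandas DataFrames:

```python
# Illustrative checks for null keys, duplicate records, and broken
# relationships between two tables. Data and names are made up for the example.
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3, 3],
                       "cust_id": [10, 11, 99, 99]})
customers = pd.DataFrame({"cust_id": [10, 11, 12]})

# Null and incorrect data values in a key column
null_keys = int(orders["cust_id"].isna().sum())

# Duplicated records (same primary key appearing more than once)
dup_orders = int(orders["order_id"].duplicated().sum())

# Broken relationships: orders referencing a customer that does not exist
orphans = orders[~orders["cust_id"].isin(customers["cust_id"])]
```

In practice these checks would run against the actual source and target stores (e.g., via SQL), but the logic is the same: each listed defect type maps to a mechanical, repeatable test.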
An assumed loss of data quality may occur when:
- End users do not have visibility and understanding of the modified and transformed data
- End users are not aware of all the data transformation/exception rules
- Data excluded during validation is not preserved and is not visible
- Data exception/validation rules are understood differently by various users
- Subtle modifications are made to data sources, resulting in inconsistent data transformation rules
- There is a high complexity of data extraction and loading processes (e.g., extraction via delta or full loads)
- Regular human intervention is required during ETL workloads
- Frequent modifications are made to standard extraction processes
- Frequent changes are made to business rules and the validation processes
Either actual or assumed losses of data quality should trigger data reconciliation. Unless the data warehouse was built to accommodate an efficient reconciliation process, reconciling its data can become an enormous and difficult task.
Next in the series: how a robust and flexible solution for data validation and reconciliation supports decision integrity.