Editor’s note: Wayne Yaddow is an independent consultant with over 20 years’ experience leading data migration/integration/ETL testing projects at organizations including J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, IBM, and Achieve3000. Additionally, Wayne has taught IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. He continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at firstname.lastname@example.org.
If you are a data warehouse or BI architect, developer, tester, project manager, or business analyst responsible for integrating data feeds into a data warehouse, you want to make sure that your data is of the highest quality. This article explains how to define and measure the quality metrics that are essential for ensuring that your DWH/BI solution will be successful and deliver accurate and timely results to end users.
By quality metrics, we mean quantitative measures that help estimate the progress, quality, or health of a testing effort. A particular metric quantifies the degree to which a system, system component, or process manages a given attribute. Measuring metrics around test planning, test development, test execution, and test completion is key for enabling project teams to improve and control those processes—as well as demonstrate progress and successes.
Defining the Components of DWH/BI Information Quality
DWH/BI information quality encompasses both the quality of the data warehouse databases and the quality of BI reports (see Figure 1). Your QA team must be able to confidently assure DWH/BI information quality because BI reports are a primary and constant tool for developing your organization’s strategic decisions.
Figure 1: Components of DWH Information Quality
Here are descriptions of the DWH/BI Information Quality sub-components shown above:
Data model quality: Data model definitions and schemas must be reviewed/inspected by stakeholders so that their overall implementation quality is excellent in representing each of the DWH data sources and DWH targets.
Data quality: All DWH/BIdata must be fit to serve its purpose described in your business requirements. Data quality is often minimally determined by factors such as data accuracy, completeness, reliability, relevance, and being up-to-date for use in enterprise decision making and planning.
BI reporting quality: BI reports should be designed and tested to meet all business and technical requirements documented for the DWH/BI project. Report data quality includes assurance of data precisions, aggregations, summarized field values, date formats, query results, drill-downs, and much more.
Planning DWH/BI Tests to Meet the Needs of QA Metrics
While planning DWH/BI information quality testing metrics, the following recommendations should be implemented to assure DWH/BI information quality. DWH/BI teams should decide which testing metrics will most directly affect project success. The answer may be different for each project.
Here are few recommended testing metrics to consider for your DWH/BI project:
DBMS tests should be planned to report whether duplicate data has been loaded from sources to data warehouse targets. They should check that there are no null (empty) data values existing where they were defined as not null. They should ensure that data is properly cleansed, enriched, and aggregated according to business and transformation rules defined for your project. They should also demonstrate that DWH primary and foreign keys were properly generated to maintain relational data integrity.
Data model validations should be planned to report the number of records loaded and rejected (in error) from each source to each DWH target, the number of schema entities defined or implemented with defects, the number of data types, the data formats, and the data lengths in error.
Data defect metrics should be considered to report on DWH data element and attribute correctness, completeness, truncations, timeliness, precision, and compliance with definitions in the source to target mappings. Further, that changed data from incremental data loads deliver correct data representing inserts, updates, and deletes.
BI report quality verifications should include metrics that represent the degree of validity for data calculations, aggregations, report field precisions, report drill-downs, and date formats. Also of great value are metrics showing the total use cases with defects, the total number of requirements vs. the number of requirements covered through test scripts, number of open defects, BI reports or functions with the most defects, and number of test cases yet to be run.
Additional metrics for any of the four DWH/BI Information Quality categories above should ideally include test case preparation metrics and productivity metrics (e.g., tests run, not run) as well as defects discovered by priority/severity. See your local organization’s sources of information about DWH/BI QA for other metrics which are common to most software testing.
The QA team should provide stakeholders testing metrics that powerfully illustrate testing progress. Keep in mind that QA metrics are used by major project stakeholders when making go and no-go decisions which often determine the fate of a project.
QA metrics reports must be designed to be trusted and used to make actionable project decisions. The metrics developed during DWH/BI testing offer a major benefit to project managers and executives: visibility into the maturity and readiness for release or production, and visibility into the quality of the software product under development. QA reports enable effective management of the software development process by allowing a clear measurement of the quality and completeness of the DWH/BI product.