Editor’s note: Wayne Yaddow is an independent consultant with over 20 years’ experience leading data migration/integration/ETL testing projects at organizations including J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, IBM, and Achieve3000. Additionally, Wayne has taught IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. He continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at email@example.com.
Data warehouse testers with data integration QA skills are in demand.
Data warehouse disciplines and architectures are well established and often discussed in the media, books, and conferences. They have become a standard necessity for most modern organizations. Each organization often uses one or more data warehouse systems to make business decisions every day. The data warehouse and associated business intelligence reports have undeniably become a standard component of modern business infrastructures.
New business models, constant technological progress, and ever-changing legal regulations require that companies periodically replace and update their business applications. In the process, businesses must integrate data from existing source applications to a target application – often supported by a data warehouse. Successfully completing such data migration and integration processes requires a stringent data integration process model combined with well-defined quality assurance measures.
Data warehouse testers are instrumental in ensuring that data integration processes go smoothly (and that data integrity issues don’t ultimately cost their companies millions of dollars). As a discipline, data warehouse testing covers much more than technology. It includes roles and organizational structures, as well as processes for monitoring, measuring, reporting and remediating data warehouse issues. The technical skills required to become a data warehouse tester are unique and demanding, but the rewards can be many. For those who enjoy testing and hold a strong interest in data quality, data warehouse testing may be a great career step. Below are our top tips for getting started with data warehouse testing, as well as advancing your mastery of it.
Understanding Data Warehousing Architectures and Concepts
First, QA analysts should understand the terminology used in data warehousing as well as the basic flows of data and testing that are typical on an ETL QA project. Data models and data mapping documents are a standard means for expressing DWH requirements and architecture. Therefore, the ability to grasp DWH requirements, then create test scenarios, is a must.
A good starting point is to read the Ralph Hughes book, “Agile Data Warehousing for the Enterprise”, MK (Morgan Kaufmann) Publishing, and “Databases, a Beginner’s Guide”, Andy Oppel, McGraw Hill Publishing. These books offer an overview as well as detail about the related architectures and terminology. Additionally, “Improving Data Warehouse and Business Information Quality”, Larry English, Wiley Publishing” offers several chapters on assessing, profiling, and assuring data quality.
Data is copied between different systems, at different locations, and updated in several phases. Data flows through the process…
- at high volumes
- in different formats
- from multiple sources (internal, external)
- through multiple platforms (SQL, No-SQL, Hadoop)
- via on premise or cloud infrastructures
None of the above usually play well with each other. Such complexity of data movement makes it problematic to validate every piece of data.
Challenges of Data Warehouse Testing
Testers moving to this new realm of QA should walk in with an understanding of the challenges that data warehouse testing frequently presents. For example, some of the top challenges that testers face are:
- Data models and data mapping documents in many ways represent project requirements—and, as such, are unique to data testing.
- A variety of data management systems are often implemented, including those from Oracle, Microsoft, and IBM. More than a few organizational data warehouses employ multiple database products from these and other vendors.
- Databases are often so large as to necessitate tests based on a variety of sample data. Choosing the best methods of sampling (or not) is a necessary skill.
- Only a few test tools exist that cover the many needs for data testing (ex., Tricentis Tosca BI, Informatica IDQ), so testers should be creative with manual quality assurance while learning and implementing as many of the available tools as possible. With expectations high but skilled staff in short supply, QA managers should seek out powerful yet easy-to-use, tools that can be quickly deployed without an expensive overhaul of the budget. DWH and BI testing tools are offered by Tricentis, Informatica, and CA Technologies.
- A good understanding of SQL queries, data profiling methods, Excel, and DB editors is essential.
- The ability to assess readiness for data warehouse testing during and after test planning is a big challenge.
Planning for the Data Warehouse Types and Test Cases
Initiating and implementing the data warehouse test planning process may be a new experience for DWH test candidates. You can ease this effort through reviews of existing data warehouse “test planning templates”, “master test plans”, “test strategies/approaches”, and data integration “end to end” test plans. Through observations of DWH planning templates and associated checklists, those wanting to become data warehouse testers will be able to start creating effective plans of their own. In such documents, testing topics and test scenarios unique to data warehouse testing will be discovered. For example:
- Common ETL goals and objectives as checklists for test coverage
- Examples of defects frequently found during ETL testing—plus ways of writing test cases to find them
- Recommended ETL test scenarios and test cases
- Estimating DWH test resources and schedules
- What to look for in source to target data profiling
- Formal QA entry and exit criteria for data build deployment
- Examples and walkthroughs of ETL test scenarios
- Functional testing: (e.g., security, performance, ETL error logs, regression testing guidelines…)
- Ways to assess DWH/BI/ETL test plans and test readiness
Test Data Planning and Management
Those new to data warehouse testing should learn how best to address the complications often encountered when planning, then selecting, test data for the data warehouse QA effort. Your ability to make wise choices depends upon your understanding of the variety of choices for data selection, and the challenges associated with each.
Characteristics of a Test Data Management Strategy
- Generate an adequate quantity of required test data from all data sources while maintaining referential integrity after loading to the DWH
- Create synthetic data for any missing data (e.g., data needed for negative testing)
- Carefully schedule test data extraction and load processes to simulate the production process
- Reduce the security risk in test data for sensitive/personal content
- Manage test environments for test data preparation and cleanup – gain DBA and configuration management support
- Provide access control to test data sets that are consumed by multiple QA users
Examples ofTest Data Planning Challenges
- Availability of essential test data to verify and validate all business uses cases and rules
- Identifying all source tables, files, etc.—the constraints and dependencies
- Understanding the range of possible values for all fields (Include boundary values)
- Ability to create the required test data volumes from heterogeneous data sources that support the test environment
- Number of resources (people, tools) needed to develop traceability for business requirements à to test cases à to test data
Tester Skills for Data Warehouse Testing
Following is a data warehouse tester job description that represents what organizations are often seeking.
- Solid understanding of data warehousing concepts, architectures, and processes
- Understanding and experiences with relational and dimensional database structures
- Experience with ETL & BI test planning and testing
- Strong understanding of ETL mapping documents
- Formal training in SQL queries: Oracle / SQL Server /DB2
- Strong with SQL query scripts based on ETL mapping documents to compare data
- Ability to use a variety of DB editors, Excel and MS Access for analysis of test query results
- Strong in ETL data validation: Informatica / IBM Datastage / Microsoft SSIS
- Exposure to end-to-end data validation for ETL & BI systems
- Strong in BI report validation in Cognos / Business Objects / Microstrategy / SSRS BI environments
- Ability to work with SMEs (project “subject matter experts”) to resolve gaps/questions in requirements
- An aspiration to assist developers in recreating test failures leading to problem resolutions
- Solid understanding of business intelligence (BI) dashboard tools such as Oracle OBIEE
- Experience partnering with Data Stewards, Data Architects, and Software Engineers
- Experience with automating DWH testing to include scripting and tools (e.g., Tricentis Tosca BI)
Data warehousing projects can fail for many reasons: Poor data architecture, inconsistently defined data, inability to relate data from different data sources, missing and inaccurate data values, inconsistent use of data fields, unacceptable query performance, and so forth.
These project risks can be diminished with well-trained and motivated testers who provide ongoing support from the earliest phases of the data warehouse development. We hope that this article helps you take the first steps towards that end.