Editor’s note: Wayne Yaddow is an independent consultant with over 20 years’ experience leading data migration/integration/ETL testing projects at organizations including J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, Oppenheimer Funds, IBM, and Achieve3000.  Additionally, Wayne has taught  IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. He continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at wyaddow@gmail.com.

Data warehouse testers with data integration QA skills are in demand.

Data warehouse disciplines and architectures are well established and often discussed in the media, books, and conferences. They have become a standard necessity for most modern organizations. Each organization often uses one or more data warehouse systems to make business decisions every day. The data warehouse and associated business intelligence reports have undeniably become a standard component of modern business infrastructures.

New business models, constant technological progress, and ever-changing legal regulations require that companies periodically replace and update their business applications. In the process, businesses must integrate data  from existing  source  applications  to a target  application – often supported by a data warehouse.  Successfully completing such data migration and integration processes requires a stringent data integration process model combined with well-defined quality assurance measures.

Data warehouse testers are instrumental in ensuring that data integration processes go smoothly (and that data integrity issues don’t ultimately cost their companies millions of dollars).  As a discipline, data warehouse testing covers much more than technology. It includes roles and organizational structures, as well as processes for monitoring, measuring, reporting and remediating data warehouse issues. The technical skills required to become a data warehouse tester are unique and demanding, but the rewards can be many. For those who enjoy testing and hold a strong interest in data quality, data warehouse testing may be a great career step. Below are our top tips for getting started with data warehouse testing, as well as advancing your mastery of it.

Understanding Data Warehousing Architectures and Concepts

First, QA analysts should understand the terminology used in data warehousing as well as the basic flows of data and testing that are typical on an ETL QA project. Data models and data mapping documents are a standard means for expressing DWH requirements and architecture. Therefore, the ability to grasp DWH requirements, then create test scenarios, is a must.

A good starting point is to read the Ralph Hughes book, “Agile Data Warehousing for the Enterprise”, MK (Morgan Kaufmann) Publishing, and “Databases, a Beginner’s Guide”, Andy Oppel, McGraw Hill Publishing. These books offer an overview as well as detail about the related architectures and terminology. Additionally, Improving Data Warehouse and Business Information Quality”, Larry English, Wiley Publishing” offers several chapters on assessing, profiling, and assuring data quality.

Data is copied between different systems, at different locations, and updated in several phases. Data flows through the process…

  • at high volumes
  • in different formats
  • from multiple sources (internal, external)
  • through multiple platforms (SQL, No-SQL, Hadoop)
  • via on premise or cloud infrastructures

None of the above usually play well with each other. Such complexity of data movement makes it problematic to validate every piece of data.

Challenges of Data Warehouse Testing

Testers moving to this new realm of QA should walk in with an understanding of the challenges that data warehouse testing frequently presents. For example, some of the top challenges that testers face are:

  • Data models and data mapping documents in many ways represent project requirements—and, as such, are unique to data testing.
  • A variety of data management systems are often implemented, including those from Oracle, Microsoft, and IBM. More than a few organizational data warehouses employ multiple database products from these and other vendors.
  • Databases are often so large as to necessitate tests based on a variety of sample data. Choosing the best methods of sampling (or not) is a necessary skill.
  • Only a few test tools exist that cover the many needs for data testing (ex., Tricentis Tosca BI, Informatica IDQ), so testers should be creative with manual quality assurance while learning and implementing as many of the available tools as possible. With expectations high but skilled staff in short supply, QA managers should seek out powerful yet easy-to-use, tools that can be quickly deployed without an expensive overhaul of the budget. DWH and BI testing tools are offered by Tricentis, Informatica, and CA Technologies.
  • A good understanding of SQL queries, data profiling methods, Excel, and DB editors is essential.
  • The ability to assess readiness for data warehouse testing during and after test planning is a big challenge.

Planning for the Data Warehouse Types and Test Cases

Initiating and implementing the data warehouse test planning process may be a new experience for DWH test candidates. You can ease this effort through reviews of existing data warehouse “test planning templates”,  “master test plans”, “test strategies/approaches”, and data integration “end to end” test plans. Through observations of  DWH planning templates and associated checklists,  those wanting to become data warehouse testers will be able to start creating effective plans of their own.  In such documents, testing topics and test scenarios unique to data warehouse testing will be discovered. For example:

  • Common ETL goals and objectives as checklists for test coverage
  • Examples of defects frequently found during ETL testing—plus ways of writing test cases to find them
  • Recommended ETL test scenarios and test cases
  • Estimating DWH test resources and schedules
  • What to look for in source to target data profiling
  • Formal QA entry and exit criteria for data build deployment
  • Examples and walkthroughs of ETL test scenarios
  • Functional testing: (e.g., security, performance, ETL error logs, regression testing guidelines…)
  • Ways to assess DWH/BI/ETL test plans and test readiness

Test Data Planning and Management

Those new to data warehouse testing should learn how best to address the complications often encountered when planning, then selecting, test data for the data warehouse QA effort. Your ability to make wise choices depends upon your understanding of the variety of choices for data selection, and the challenges associated with each.

Characteristics of a Test Data Management Strategy

  • Generate an adequate quantity of required test data from all data sources while maintaining referential integrity after loading to the DWH
  • Create synthetic data for any missing data (e.g., data needed for negative testing)
  • Carefully schedule test data extraction and load processes to simulate the production process
  • Reduce the security risk in test data for sensitive/personal content
  • Manage test environments for test data preparation and cleanup – gain DBA and configuration management support
  • Provide access control to test data sets that are consumed by multiple QA users

Examples ofTest Data Planning Challenges

  • Availability of essential test data to verify and validate all business uses cases and rules
  • Identifying all source tables, files, etc.—the constraints and dependencies
  • Understanding the range of possible values for all fields (Include boundary values)
  • Ability to create the required test data volumes from heterogeneous data sources that support the test environment
  • Number of resources (people, tools) needed to develop traceability for business requirements à to test cases à to test data

Available test data management tools include Informatica’s Test Data Management, IBM’s InfoSphere Optim, Solix’s EDMS Test Data Management, and Tricentis’ TDM.

Tester Skills for Data Warehouse Testing

Following is a data warehouse tester job description that represents what organizations are often seeking.

  • Solid understanding of data warehousing concepts, architectures, and processes
  • Understanding and experiences with relational and dimensional database structures
  • Experience with ETL & BI test planning and testing
  • Strong understanding of ETL mapping documents
  • Formal training in SQL queries: Oracle / SQL Server /DB2
  • Strong with SQL query scripts based on ETL mapping documents to compare data
  • Ability to use a variety of DB editors, Excel and MS Access for analysis of test query results
  • Strong in ETL data validation: Informatica / IBM Datastage / Microsoft SSIS
  • Exposure to end-to-end data validation for ETL & BI systems
  • Strong in BI report validation in Cognos / Business Objects / Microstrategy  / SSRS BI environments
  • Ability to work with SMEs (project “subject matter experts”) to resolve gaps/questions in requirements
  • An aspiration to assist developers in recreating test failures leading to problem resolutions
  • Solid understanding of business intelligence (BI) dashboard tools such as Oracle OBIEE
  • Experience partnering with Data Stewards, Data Architects, and Software Engineers
  • Experience with automating DWH testing to include scripting and tools (e.g., Tricentis Tosca BI)

Concluding Remarks

Data warehousing projects can fail for many reasons: Poor data architecture, inconsistently defined data, inability to relate data from different data sources, missing and inaccurate data values, inconsistent use of data fields, unacceptable query performance, and so forth.

These project risks can be diminished with well-trained and motivated testers who provide ongoing support from the earliest phases of the data warehouse development. We hope that this article helps you take the first steps towards that end.

2 Comments

  • Sebastian says:

    Thanks for this great post!
    Short note, under “Planning for […]” there’s a list item: “Functional testing: (e.g., security, performance, ETL error logs, regression testing guidelines…)” – security, performance aspects are, however, very standard examples for Non-Functional testing. Is this different for data quality?

  • Wayne Yaddow says:

    Hi Sebastian. Thanks for your question. I’m glad you like the post.

    Re. security and performance verifications as functional testing, it’s my experience that security, for example, is a capability that must be explicitly and intentionally implemented for a DWH/BI project to meet stated requirements. Therefore, how security functions were implemented should be verified. Following are two examples of DWH/BI security features that should be verified if implemented.
    Authentication – the process of identifying a person, usually based on a logon ID and password. This process is meant to ensure that the person is who he or she claims to be. There are several levels of authentication depending on how sensitive the data is. The first level consists of a simple, static password, followed by a system-enforced password pattern and periodically required changes. An organization with a DW solution should at least have this security method implemented and tested.
    Authorization – the process of determining what specific content a user is allowed to access. Once users are authenticated, the authorization process defines the access policy. Authorization is often a complex problem in the DW system because limiting access can have significant maintenance and computational overhead. Authorization features must be tested to assure they meet requirements.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

X
X