Data integrity testing

Testing for Business Intelligence & Data Warehouse Projects

Testing is part of every IT project plan – that’s true as well for Business Intelligence (BI) & Data Warehouse (DWH) projects. In the past, however, the practical implementation of software testing in the BI / DWH environment has posed problems again and again. Often I’ve had the impression that the BI / DWH world is still back in the Stone Age regarding development processes and environments. At least it is significantly behind the maturity level I know from the software engineering domain. The below chart illustrates this gap:

If there is something tested at all, typically it is in the BI frontend area where things are tested manually. In the DWH backend we see – besides manual tests – self coded test routines, e.g. in the form of stored procedures or dedicated ETL jobs. However the integration into a test case management tool and systematic evaluation of the test results simply doesn’t happen. This is a heavy contrast with software engineering where automated regression testing combined with modern development approaches like test driven design are applied. For some time we find first inputs regarding BI specific testing (cf. the (German) TDWI book here). Concepts and paper are patient though. Where are we in regards to a possible tool support, namely for the area of regression tests?

Since the summer of 2014, we at IT-Logix are actively looking for better (tool based) solutions for BI specific testing. We do this together with the Austrian company Tricentis. Tricentis develops the Tosca Testsuite, one of the worldwide leading software solutions for test automation. In a first step we ran a proof of concept (POC) for regression tests for BI frontend artifacts, namely typical reports. One of the architectural decisions was to use Excel and PDF export files as a base for our tests. With this choice of a generic file interface, the efforts to develop BI product specific tests were bypassed. And this way we reduced the implementation effort to about two days in the POC. The goal was to run “Before-After” tests in batch mode. We took 20 reports for the POC case (these were actually SAP BusinessObjects Web Intelligence reports, but you can imagine whatever tool you like as long as you can export to PDF and/or Excel). A current version of the PDF or Excel output of the report is compared to a corresponding reference file.

Typical real life situations where you can use this scenario are:

  1. Recurring scheduled regression tests to monitor side effects of ongoing DWH changes: The reference files are created, for example, after a successful release of the DWH. Imagine there are ongoing change requests on the level of your DWH. Then you want to make sure these changes only impact the reports where a change is expected. To make sure all the rest of your reports aren’t concerned by any side effects, you now run your regression tests e.g. every weekend and compare the produced files with the reference files.
  2. BI platform migration projects: If you run a migration project to migrate your SAP BusinessObjects XI 3.1 installation to 4.1 (for example), you’ll want to make sure reports still work and look the same in 4.1 as they did in XI 3.1. In this case you create the reference files in XI 3.1 and compare them with the ones from 4.1. (As the export drivers vary between the two versions, especially the Excel exports are not very useful for this use case. Still, PDF worked pretty fine in my experience)
  3. Database migration projects: If you run a database migration project migrating all your Oracle databases to Teradata or SAP HANA (for example), then you want to make sure all of your reports show still the correct data (or at least the data as was shown with the original data source…)

Tosca searches for the differences between the two files. For Excel this happens on a cell by cell basis, for PDF, we used a text based approach as well as an image compare approach.

Using the solution implemented during the POC we could see very quickly which reports were different in their current state compared to the reference state.

Another important aspect of the POC was the scalability of the solution approach as I work primarily with (large) enterprise customers. If I have not only 20 but hundreds of reports (and therefore test cases), I have to prioritize and manage the creation, execution and error analysis of these test cases somehow. Tosca helps with the feature to model business requirements and to connect them with the test cases based on that we can derive and report classical test metrics like test case coverage or test execution rate.

In my eyes, an infrastructure like Tosca is a basic requirement to systematically increase and keep the quality in BI / DWH systems. In addition, advanced methods like test driven development are only adaptable to BI / DWH undertakings if the necessary infrastructure for test automation is available.

This article has been republished with permission from the author. You can find the original article here in English and German.

Raphael Branger is the Chief Knowledge Officer and Senior Solution Architect at IT-Logix. Raphael is a regular speaker at national and international conferences. During 2015 you can listen to him at BI2015 / Nice (France) (about Agile BI and BO Frontend Tool Selection) and during BOAK (BusinessObjects Arbeitskreis) Zurich / Switzerland. Follow his blog here