This article by Jason English was originally published on cio.com
The big one has been coming for a while. The combined pressures of real-time business data streaming in from multiple sources and the Moore’s law-defying increase in storage and compute scalability of hybrid cloud have led us to this inevitable data explosion.
The impact of this data explosion?
A loss of business intelligence. A resulting lack of decision integrity, because the complex data that informs business decisions can no longer be trusted.
The sky is falling, the data is failing
I’ve been warning about this phenomenon for a while. [Read “The growing complexity of business data is sabotaging your business intelligence”].
But what if it’s too late to reverse the chain reaction? What if your BI process is already compromised, and you’re underwater in an unmanageable swamp of data, which produces faulty results?
In these situations, damage control alone won’t help. Frantically working to close the gap long enough to patch and seal it will require more than human effort. We need to look for more powerful countermeasures.
We need ways to reduce the blast radius of bad data reaching BI reports, and reduce the impact of the bad decisions being supported by flawed and inconsistent data.
Only through total accountability and complete automation will enterprises be able to plan a way forward: to a state where business intelligence data consistently supports decision integrity.
Too many manual steps cause a chain reaction
The CIO or CDO faced with such a data explosion can take the most obvious route: throw more data scientists, analysts, and accountants at the problem.
This practice is so ingrained in most large enterprises that they will have hundreds or thousands of controllers or auditors toiling away on massaging and reconciling data in order to generate a more accurate picture for management to act on.
These sweatshop-like ‘hidden data factories’ are just considered the cost of doing business, but the labor never scales fast enough to clear the backlog.
Manually fixing data on both ends of a data pipeline is no fun, so finding data analysts who are willing to do the job and have enough technical and business acumen for it is hard and expensive.
To relate this to the supply chain ‘bullwhip effect’ of factory planning, small miscalculations of signals used in demand or supply forecasts can cascade into urgent order costs, inventory overruns, work bottlenecks, and delivery failures weeks or months later. What’s more, the blast radius of mistakes due to bad data increases over time.
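To make the cascade concrete, here is a minimal sketch of how one small error in a demand signal can grow as it moves upstream. It is an illustration under stated assumptions, not a forecasting model: the function names, the three-tier chain, and the `overreaction` factor (each tier adds half of the latest change it observed to its own order) are all hypothetical.

```python
def propagate_signal(demand_signal, tiers=3, overreaction=0.5):
    """Return the order series seen at each tier of a hypothetical supply chain.

    Tier 0 is the incoming demand signal. Each later tier orders what it
    observed from the tier below, plus `overreaction` times the latest change.
    """
    series = [list(demand_signal)]
    for _ in range(tiers):
        downstream = series[-1]
        orders = [downstream[0]]
        for prev, cur in zip(downstream, downstream[1:]):
            orders.append(cur + overreaction * (cur - prev))
        series.append(orders)
    return series


def swing(values):
    """Size of the gap between the largest and smallest order in a series."""
    return max(values) - min(values)


# Assumed data: flat demand of 100 units, with one period overstated by 5.
bad_signal = [100, 100, 105, 100, 100, 100]

swings = [swing(tier) for tier in propagate_signal(bad_signal)]
for tier_number, size in enumerate(swings):
    print(f"tier {tier_number}: order swing of {size} units")
```

Under these assumptions the swing in orders widens at every tier, even though the original mistake was a single five-unit overstatement in one period.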
The failure conditions exist wherever enterprises have hidden data factories, manually massaging the data feeding business intelligence: forecasting customer demand, planning capacity, delivering goods and services, and balancing cost versus revenue. Mistakes are bound to happen.
How do we get business leaders out of this data sweatshop mentality, so they will recognize that their entire process needs automation to recover?
Accountability doesn’t live in silos
Companies seeking an answer to faulty information can choose from hundreds of data testing and validation options, both proprietary and open source. At this point, one reasonable solution is to isolate testing to prevent cross-contamination in the results.
Most major data platforms have their own tools for recognizing errors in data migration and processing within their own world. For instance, SAP and Informatica have modules to tell you if a given data block was imported from a bad source file, or corrupted when it was moved in an ETL (extract, transform, and load) process.
Take, for example, a very large multinational car company exchanging forecast and production information with more than 700 dealerships and additional resellers.
They receive EDI sales data into their SAP retail management system, spot-checking it with a proprietary tool for the job. Then the data is moved to MicroStrategy for forecasting, which has its own tool enabling auditors to validate that the data shown is formatted correctly for the scenario they are running. Finally, decision data is communicated to an Oracle system that manages corporate financials, and another SAP system for production scheduling.
Now let’s say there’s a sizable number of preorders and requests to demo next year’s hot new electric car, but the default setting on the order form was set to “Aqua blue” paint color. Without automation and machine learning of scenarios that cross all these silos, none of the above siloed systems detects an anomaly in testing, and hundreds of light blue cars are eventually produced and shipped to dealers. The bullwhip effect of excess taxable inventory, discount sales, and returns impacts the company over the next four quarters.
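A check that could catch the paint-color scenario above does not need to be elaborate. The sketch below compares the share of each value in incoming orders against a historical baseline and flags any value that is suddenly overrepresented. Everything in it is an assumption made for illustration: the function name, the `tolerance` threshold, the baseline shares, and the sample preorders are hypothetical, and a real cross-silo check would draw on data from each system in the chain.

```python
from collections import Counter


def dominant_value_anomalies(orders, field, baseline_share, tolerance=0.25):
    """Flag values of `field` whose share of orders exceeds the
    historical baseline share by more than `tolerance`."""
    counts = Counter(order[field] for order in orders)
    total = sum(counts.values())
    flagged = {}
    for value, count in counts.items():
        share = count / total
        expected = baseline_share.get(value, 0.0)
        if share - expected > tolerance:
            flagged[value] = round(share, 2)
    return flagged


# Assumed historical mix of paint colors across past orders.
baseline = {"Aqua blue": 0.05, "Black": 0.40, "White": 0.35, "Red": 0.20}

# Assumed preorders in which most buyers never changed the form default.
preorders = (
    [{"model": "new-ev", "paint": "Aqua blue"}] * 80
    + [{"model": "new-ev", "paint": "Black"}] * 12
    + [{"model": "new-ev", "paint": "White"}] * 8
)

flagged = dominant_value_anomalies(preorders, "paint", baseline)
print(flagged)
```

The point of the sketch is where it runs, not how clever it is: the comparison only works if the order data and the historical sales mix, which live in different systems, are tested together.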
From containment to attainment
Short of having the dealers in the above scenario create their own hidden data factories, and waste millions of dollars hiring workers to reconcile source data with their own ordering systems, what can be done to make strategic decisions an advantage again?
It’s up to the CIO to mandate a stronger automation approach that will not only fix the existing faulty data, but also transform the siloed data quality effort into a bulletproof end-to-end process that prevents such issues from entering into the system. A large part of this is to ‘shift left’ the testing of decision support data from its sources, all the way to execution. A robust automation suite might run as many as 200 million tests in a monthly forecast cycle, and for good reason. Planning really is that complex.
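As a rough illustration of what one end-to-end test might look like, the sketch below reconciles record counts and unit totals at every hop of a pipeline, rather than validating each system in isolation. The stage names, the `units` field, and the sample rows are hypothetical, and this is not how any particular vendor’s tool works.

```python
def reconcile(stages, key="units"):
    """Compare each pipeline stage with the one before it and
    return a list of human-readable mismatches."""
    issues = []
    for (prev_name, prev_rows), (name, rows) in zip(stages, stages[1:]):
        if len(prev_rows) != len(rows):
            issues.append(
                f"{prev_name} -> {name}: row count {len(prev_rows)} != {len(rows)}"
            )
        prev_total = sum(row[key] for row in prev_rows)
        total = sum(row[key] for row in rows)
        if prev_total != total:
            issues.append(
                f"{prev_name} -> {name}: {key} total {prev_total} != {total}"
            )
    return issues


# Assumed sample data: the finance stage silently lost one dealer's record.
retail = [
    {"dealer": "D1", "units": 10},
    {"dealer": "D2", "units": 7},
    {"dealer": "D3", "units": 5},
]
forecast = [dict(row) for row in retail]
finance = [dict(row) for row in retail[:2]]

issues = reconcile([("retail", retail), ("forecast", forecast), ("finance", finance)])
for issue in issues:
    print(issue)
```

Checks like this are cheap individually, which is why a suite covering every source, hop, and planning scenario can plausibly reach very large test counts.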
One mid-sized bank spent two years and thousands of man-hours building more than 700 such end-to-end scenario tests, and still had decision lag at every planning cycle. Since the tests were largely handmade, they were brittle when confronted with unknown source data and business scenarios, and lacked the AI to flexibly absorb out-of-bounds data into the scenarios.
In this case, the bank leveraged a test automation solution they already had from Tricentis, and employed Tricentis’ BI/Data Warehouse Testing to allow massive parallel execution of queries against all the possible decision models that could be run. In less than two days, they were able to trust their next decision cycle, knowing that even out-of-bounds scenarios had been run and anomalies in the source data would be observable.
The Intellyx Take
Most CIOs and CDOs think they are governing data the best they can. They have policies for how to move, store, and secure data.
That’s not going to cut it. When it comes to assuring the quality of that data for decision purposes, and the process that created the data in the first place, they are still in containment mode.
Rather than simply thrashing about to reduce the blast radius, the optimal strategy should truly monetize the value of that data by leveraging it for faster, more accurate decisions that provide a competitive advantage to the business.
©2019 Intellyx LLC. Intellyx retains final editorial control of this article. At the time of writing, Tricentis is an Intellyx customer. None of the other companies mentioned in this article are Intellyx customers. Image source: Jason English, Intellyx infographic