This article by Jason English (Intellyx analyst) was originally published on CIO.com
Enter the war room. The whole C-level team is on deck as your latest quarterly figures appear to tell a different story than expected. The BizOps report shows that specific regions are achieving greater-than-expected top-line growth, but global news of an economic slowdown suggests the opposite is true.
After reviewing the information carefully, senior executives suspect that the reports are wrong. However, unravelling where the potential issue lies is a daunting task:
- Is the report rendering incorrectly?
- Is there a problem in the report logic—or the business logic around it?
- Is there a problem in the data feed into the BI system?
- Is the data being drawn from the appropriate data warehouse instance(s)?
- Is there an issue in the load, transformation, extraction, or source of the various data feeds driving data into the data warehouse?
The potential for complexity is immense. Just 10 years ago, most large enterprises had only a handful of core data services (3 on average). Even if that picture was a bit rosy, with the advance of cloud, process outsourcing to SaaS and partners, mobile work, and connected IoT devices, the total number of possible enterprise data sources has ballooned into the millions, feeding into specialized data aggregators and warehouses that could run just about anywhere in a hybrid IT infrastructure.
We’re in the midst of a data explosion, and mission-critical data has burst far beyond the scope of traditional data quality checks. A new approach is needed to ensure decision integrity: trust in high-stakes decisions that span multiple data sets, systems, and business units.
The market forces that got us here
So much of a company’s performance depends upon executive decisions. And executives depend on accurate data and business context to inform critical decisions.
Poor data quality and a lack of real visibility into the context of data underneath BI systems can lead to faulty decisions with huge impacts on earnings. Bad BI data is like bad debt: it comes at a steep cost. An estimated $3.1 trillion is lost per year in the United States alone to poor data quality and to the cost and labor of realigning or repairing the data that business leaders need.
Executives want confidence that they see exactly the data they need – and that it hasn’t been changed or altered inadvertently anywhere in its journey from its source to the dashboard.
An imminent data explosion
The clock is always running out on business decisions. Changing strategy based on the wrong data is disastrous. Failing to act because key indicators were missed is just as bad.
Is the data entering BI systems a ticking time bomb for business leaders? It used to be common for a decision support system to subscribe to a handful of data providers – perhaps an internal accounting feed, economic indexes, or an industry report.
Now, data can come from everywhere: from millions of connected users on apps, thousands of news feeds and sources, a universe of millions of IoT devices and automated reporting processes. We’ve seen massive expansion in the scope of data, thanks to decreased storage costs, increased computing power, and the move to storing everything in elastic public cloud resources and hyperconverged private clouds.
Roughly 90% of the world’s data was created in the last 2 years. If that trend continues, it is easy to see that we will run into problems directing this pipeline of data into productive channels. Once-manageable repositories and data cubes will become less-differentiated data lakes, and eventually data swamps.
Lighting the fuse with data migration
The leading database, BI, and data management vendors already provide utilities for ensuring successful transport “inside the cylinder” of a data pipeline. For instance, you might use BigDataCo’s platform to extract, transform, and load (ETL) data from one data warehouse to another location, and it can do a pretty good job of managing the data integrity of that ETL process on a record-for-record basis.
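To make "record-for-record" concrete, here is a minimal sketch of the kind of check such a utility might run after a load. It is not any vendor's actual implementation: the function names, the `id` key, and the sample rows are all assumptions made up for illustration.

```python
import hashlib


def row_fingerprint(row: dict) -> str:
    """Hash a record's fields in a stable (sorted-key) order."""
    canonical = "|".join(f"{key}={row[key]!r}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_loads(source_rows, target_rows, key="id"):
    """Compare two row sets record by record, matching on a hypothetical key."""
    source = {row[key]: row_fingerprint(row) for row in source_rows}
    target = {row[key]: row_fingerprint(row) for row in target_rows}
    return {
        # Present in the source but never arrived in the target.
        "missing": sorted(source.keys() - target.keys()),
        # Present in the target with no matching source record.
        "unexpected": sorted(target.keys() - source.keys()),
        # Arrived, but at least one field differs from the source.
        "changed": sorted(
            k for k in source.keys() & target.keys() if source[k] != target[k]
        ),
    }


# Illustrative data only (hypothetical regions and figures).
source_rows = [
    {"id": 1, "region": "EMEA", "revenue": 1200},
    {"id": 2, "region": "APAC", "revenue": 950},
    {"id": 3, "region": "AMER", "revenue": 1430},
]
target_rows = [
    {"id": 1, "region": "EMEA", "revenue": 1200},
    {"id": 2, "region": "APAC", "revenue": 9500},  # value altered in transit
    {"id": 4, "region": "LATAM", "revenue": 300},  # not in the source
]

report = compare_loads(source_rows, target_rows)
print(report)
```

A check like this can confirm that records survived the trip intact, but it says nothing about whether they were the right records for the decision at hand.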
Standard migration tools are great for ensuring that data isn’t corrupted while inside the pipeline of one vendor’s approved systems. Unfortunately, businesses are demanding more dimensions of data, and a variety of sophisticated analytics and reporting tools alongside common ones for a competitive edge.
What happens in a BI scenario if you aren’t entirely sure that the data collected from multiple sources was classified and routed to its appropriate destinations? Data needs to arrive with context that ties it to the decision process it was meant to support.
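As a rough illustration of what checking classification and routing could look like, the sketch below audits a delivered record against a routing table. Everything in it is hypothetical: the required context fields, the category names, and the destination names are invented for the example and do not describe any particular product.

```python
# Hypothetical context every record is assumed to carry on arrival.
REQUIRED_CONTEXT = {"source", "category", "as_of"}

# Hypothetical routing table: record category -> intended destination.
ROUTES = {
    "sales": "finance_warehouse",
    "telemetry": "ops_lake",
}


def audit_delivery(record: dict, destination: str) -> list:
    """Return a list of problems found for one record at one destination."""
    problems = []

    missing = REQUIRED_CONTEXT - record.keys()
    if missing:
        problems.append("missing context: " + ", ".join(sorted(missing)))

    expected = ROUTES.get(record.get("category"))
    if expected is None:
        problems.append("unclassified record")
    elif expected != destination:
        problems.append(
            f"misrouted: expected {expected}, found in {destination}"
        )
    return problems


# Illustrative records only.
ok = audit_delivery(
    {"source": "crm", "category": "sales", "as_of": "2019-03-31"},
    "finance_warehouse",
)
bad = audit_delivery(
    {"source": "sensor-7", "category": "telemetry"},
    "finance_warehouse",
)
print(ok)
print(bad)
```

The point of the sketch is that a record can pass every integrity check and still land in the wrong place, or arrive stripped of the context a decision-maker needs to interpret it.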
Data integrity may be a rote process by now, but data interpretation and routing in complex distributed environments are not. It’s no wonder that a majority of CDOs surveyed worldwide say they are unsatisfied with the quality of their business data, and that it is holding them back from faster upgrades.