By Allan Fromen
Big Data is big news. Not a day goes by without some article proclaiming how Big Data will solve long-standing social and economic problems in areas such as education, healthcare, public policy, workforce efficiency, and supply chains.
And make no mistake – Big Data is big business. IDC predicts the Big Data and Analytics market will grow 50%, from $40 Billion in 2014 to $60 Billion by 2019. Companies clearly see the potential to extract meaningful intelligence from Big Data. With innumerable sources of data, if we can only analyze it effectively, we will usher in a new dawn of actionable insights that will drive transformation, innovation, and profits.
So why have we not yet seen the Big Data revolution? Where are all the Big Insights we have been waiting for?
The problem isn’t the amount of data. In fact, the Digital Universe Study by EMC (in partnership with IDC – full disclosure) estimates that the amount of data is doubling every two years and will reach 44 trillion gigabytes by 2020. Yet this same study estimated that “less than 5% of the useful data was actually analyzed.”
So we are awash in data sets but are only utilizing a tiny fraction. Why is that?
Imagine the Marketing department wants to explore whether certain attitudinal measures drive tangible benefits to the business. Easy enough. Survey your customers and correlate the survey data with database metrics, such as number of visits, average wallet size, and so on. However, there will inevitably be numerous data issues to consider on the back end. Should we remove outliers from the survey? Does an atypical distribution of survey responses indicate an anomaly to be treated with suspicion, or an important sub-segment of the market that we have discovered? Our database metrics will pose an even greater challenge, as some customers will have data on certain variables and not others. Do we restrict our analyses to only customers with the full set of database metrics? If not, how do we treat missing data? What about the even greater challenge of integrating third-party information?
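Even this "easy enough" scenario hides real decisions. A minimal Python sketch makes them concrete; the customer data, the "complete cases only" rule, and the hand-rolled correlation are purely illustrative assumptions, not a prescription:

```python
from statistics import mean

# Hypothetical data: survey satisfaction scores (1-10) and database
# visit counts, keyed by customer id. Values are made up for illustration.
survey = {"c1": 8, "c2": 3, "c3": 9, "c4": 6, "c5": 7}
visits = {"c1": 12, "c2": 4, "c3": 15, "c5": 9}  # note: no visit data for c4

# Decision: restrict the analysis to customers present in BOTH sources,
# i.e. drop records with missing database metrics rather than imputing them.
# (The opposite choice -- imputation -- is equally defensible.)
complete = [cid for cid in survey if cid in visits]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

xs = [survey[c] for c in complete]
ys = [visits[c] for c in complete]
print(f"customers analyzed: {len(complete)}, r = {pearson(xs, ys):.2f}")
```

The point is not the correlation itself but how much of the work is the join-and-filter step: one missing field silently shrank the sample from five customers to four, and that choice was made before any "analysis" happened.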
Even in this fairly straightforward example, there are many decisions to be made. The process of cleaning and preparing the data for analysis would likely take many weeks. The challenges are exponentially greater with Big Data, as the data points are novel, numerous, diverse and – perhaps most importantly – in different formats. According to Dan Vesset, IDC’s lead for Big Data and Analytics, a data scientist spends a full 80% of their job on data preparation and cleaning. It’s no wonder Big Insights are so elusive, when so little time is spent on the actual analysis.
About a year ago, I wrote Why Big Data Will Never Replace Market Research. In that time, the enthusiasm for Big Data has certainly intensified. But what is missing from the conversation is a healthy dose of skepticism – not about Big Data’s potential, but about how easily meaningful insights can be derived from multifaceted, and often unstructured, data.
While the promise of Big Data is sexy and prone to attention-grabbing headlines, the sober truth is that most Big Data work is boring and tedious. There is certainly no doubt that Big Data is the future. But the road to Big Data is paved with the challenging work of cleaning, preparing, and integrating systems that were designed in silos, before we understood the value of aggregating and analyzing disparate data points. And that is why Big Insights are still so elusive.