By Kevin Gray
There’s a revolution going on in Analytics. But first, what is Analytics?
Analytics has gotten a massive amount of buzz in recent years, lately in connection with Big Data and Data Science. The term “analytics” is by no means new but, perhaps surprisingly, there is little consistency in what it means or implies. Sometimes it designates a process, from problem identification through recommended actions. It can also refer to inferential statistics such as standard errors, confidence intervals and t-tests, or to basic measures of association, e.g., the Pearson product-moment correlation or chi-square. At other times, though perhaps couched in esoteric claims, it merely refers to descriptive statistics such as frequency counts, means and standard deviations, or to commonplace graphics such as line graphs, histograms and pie charts. More sophisticated data visualization is also at times called analytics.
In addition, analytics can refer to an extensive assortment of multivariate statistical methods and machine learning algorithms. That usage is the focus of this article. These techniques can be classified in various manners and one way is to characterize them either as Interdependence methods or Dependence methods. A second point of differentiation pertains to whether a method is intended for Cross Sectional data or Time Series data.
Factor Analysis and Cluster Analysis are probably the best known Interdependence methods, though there are many others. Put very simply, Factor Analysis groups variables and Cluster Analysis groups observations, such as respondents in a consumer survey.
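To make the distinction concrete, here is a toy sketch of Cluster Analysis using a bare-bones k-means, one standard clustering algorithm. The shopper coordinates, the (frequency, spend) interpretation, and the choice of two clusters are invented purely for illustration:

```python
def kmeans(points, k, iters=20):
    """Minimal k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = points[:k]  # naive initialization; real tools use random restarts
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            clusters[nearest].append(p)
        centroids = [(sum(p[0] for p in cl) / len(cl),
                      sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Invented data: two obvious "segments" on (purchase frequency, spend) axes.
shoppers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.1), (8.0, 8.0), (8.3, 7.7), (7.9, 8.2)]
centroids, clusters = kmeans(shoppers, k=2)
```

On data this cleanly separated the algorithm recovers the two groups of three shoppers each; production segmentation work would of course use far more variables, multiple starts and a principled choice of k.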
Dependence methods differ in that there are one or more Target (Dependent) variables we would like to explain or predict from one or more Predictor (Independent) variables. Many kinds of Dependence methods see extensive use in Marketing Research. They can be further subdivided according to whether the dependent variables are quantities, counts, ordered categories or nominal categories that have no natural order or rank. Regression and Discriminant Analysis in particular are well known in Marketing Research; the former is used when the dependent variable is quantitative (or we decide to treat it as such) and the latter comes into play when we wish to differentiate groups (e.g., User/Non-User).
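As a concrete illustration of the regression side, here is a minimal one-predictor ordinary least squares fit in Python. The spend and sales figures are fabricated for the example:

```python
def simple_ols(x, y):
    """Ordinary least squares for y = a + b*x (one predictor),
    using the closed-form slope and intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

# Invented data: ad spend (x) vs. sales (y).
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
a, b = simple_ols(x, y)  # slope b is roughly 1.99, intercept a roughly 0.09
```

The slope here is the familiar "driver" coefficient: the estimated change in the dependent variable per unit change in the predictor, holding nothing else constant since this toy model has only one predictor.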
Actually, it’s not quite this simple. Partial Least Squares Regression and some varieties of Structural Equation Modeling are a blend of Interdependence and Dependence methods!
The techniques described thus far have been designed for cross-sectional data, data collected at one point in time. Time Series Analysis is used when the data have been collected over many time periods. Weekly sales data are an example of Time Series data. Exponential Smoothing, ARIMA, Dynamic Regression, State Space and GARCH models are just a few examples of Time Series Analysis Methods. They are household words to Econometricians but more opaque to most of us in Marketing Research. Time Series Analysis plays important roles in Marketing Mix Modeling and ROI analysis as well as in sales forecasting.
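Of the methods just listed, Exponential Smoothing is the easiest to sketch. Below is a toy simple (single-parameter) exponential smoother; the weekly sales figures are invented:

```python
def exp_smooth(series, alpha):
    """Simple exponential smoothing: each new level is a weighted
    average of the latest observation (weight alpha) and the
    previous level (weight 1 - alpha)."""
    level = series[0]
    fitted = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        fitted.append(level)
    return fitted  # the last value doubles as the one-step-ahead forecast

weekly_sales = [100, 120, 110, 130, 125]  # invented weekly sales
smoothed = exp_smooth(weekly_sales, alpha=0.5)
# smoothed -> [100, 110.0, 110.0, 120.0, 122.5]
```

ARIMA, State Space and GARCH models are far richer than this, but they share the same spirit: each period's estimate is updated from the previous one as new observations arrive.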
Once again, though, things are not quite this simple! There are also methods appropriate for Within-Subjects (Repeated Measures) and Longitudinal data. Within-Subjects designs are suitable when consumers are asked to evaluate two or more products, real or hypothetical, as in an in-home product use test (real) or a conjoint study (typically hypothetical). The venerable Repeated Measures MANOVA might be familiar to some of you. Longitudinal designs are useful when we observe consumers’ behavior over time. Survival Analysis is one such method; in Marketing Research it is used in customer churn modeling.
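A minimal sketch of Survival Analysis in a churn setting is the Kaplan-Meier estimator, which steps the survival probability down at each observed churn time while correctly handling customers we stopped observing (censored cases). The durations and churn flags below are invented:

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier survival curve: at each observed churn time t,
    multiply the running survival probability by (1 - d/n), where
    d churn events occur among the n customers still at risk at t."""
    event_times = sorted(set(t for t, c in zip(durations, churned) if c))
    surv = 1.0
    curve = []
    for t in event_times:
        n = sum(1 for d in durations if d >= t)  # at risk just before t
        d = sum(1 for dur, c in zip(durations, churned) if c and dur == t)
        surv *= 1 - d / n
        curve.append((t, surv))
    return curve

# Invented churn data: months observed, and whether the customer churned
# (False = still active when observation ended, i.e. censored).
months  = [2, 3, 3, 5, 8, 8]
churned = [True, True, False, True, False, False]
curve = kaplan_meier(months, churned)
```

Note how the censored customers still contribute to the "at risk" counts for as long as we observed them; simply dropping them would bias the churn picture.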
That was just the Old Stuff
Out of breath? Well, these are mostly “trad” methods. It would not be exaggerating to say there has been an explosion in the number and variety of analytic methods in recent years. Advances in computer technology have taken many methods off the drawing board and put them right onto our laptops. Mixture Modeling (a.k.a. Latent Class) is one example that not long ago was impractical on the computers most Marketing Scientists were using. It is proving very useful in segmentation as well as other kinds of analyses, in part because of its ability to model different kinds of data (e.g., quantitative and nominal) at once.
Bayesian methods, which can be intricate and are not easy to describe in a nutshell, are seeing increasing use in Marketing Research. Put very, very simply, in Bayesian statistics we incorporate prior beliefs about the problem we’re studying directly into our analysis and then update our understanding of the problem we’re investigating when new data become available. From the outset we are explicit about uncertainty. Bayesian methods have some important advantages in comparison with the more recognizable Frequentist methods. They are often more adept at handling sparse and messy data, for instance.
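The updating idea can be shown in miniature with the conjugate Beta-Binomial model, a textbook Bayesian example; the prior and the campaign counts below are invented for illustration:

```python
# Prior belief about a conversion rate, expressed as a Beta(a, b)
# distribution; binomial data (successes, failures) updates it by
# simple addition -- Bayesian updating in its most convenient form.
def update_beta(a, b, successes, failures):
    return a + successes, b + failures

a, b = 2, 8  # prior: we expect a conversion rate of roughly 20%
a, b = update_beta(a, b, successes=30, failures=70)  # hypothetical campaign data
posterior_mean = a / (a + b)  # 32 / 110, about 29%
```

The posterior mean sits between the prior belief (20%) and the observed rate (30%), pulled toward the data because the sample carries more weight than the modest prior; the full posterior distribution also expresses how uncertain we remain.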
Many methods are now being developed outside university Statistics departments, most notably by computer scientists. These are often termed Machine Learning, though that term is used ambiguously. Machine Learners are often much better at pattern recognition (e.g., in text analytics) than the statistical methods we are used to. Some examples of these newer methods, including those developed by statisticians, are Neural Networks, Support Vector Machines, Bayesian Networks and approaches utilizing boosting and bagging.
These “non-trad” techniques are core methods in Data Mining and Predictive Analytics, nowadays often lumped together under the vaguely used labels “Big Data” and “Data Science.” There is now a vast array of these methods and many are also handy for the analysis of consumer survey data, including segmentation and driver analysis. One important downside many of these methods share, however, is that their results are often difficult to interpret; while they are adept at prediction (the “What”), they are often not as useful as traditional methods for helping us understand the “Why.” Fortunately, in many cases the two can be used in combination to get the best of both analytic worlds.
Whatever the analytic methods used, it is also now easier than ever to perform various kinds of “What if?” simulations to make educated guesses about what might happen under various marketing scenarios, such as the introduction of a new product or competitor activity. Done prudently, simulations can help our models speak to us and guide decisions we need to make.
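A minimal “What if?” sketch is a Monte Carlo simulation: draw an uncertain quantity (here, demand) many times under each scenario and compare the resulting outcomes. All figures below are hypothetical:

```python
import random

def simulate_mean_profit(price, unit_cost, demand_mean, demand_sd,
                         runs=10_000, seed=1):
    """Monte Carlo 'what if?': sample uncertain demand many times
    and return the average profit under a pricing scenario."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        demand = max(0.0, rng.gauss(demand_mean, demand_sd))  # demand can't go negative
        total += (price - unit_cost) * demand
    return total / runs

# Hypothetical scenario: what if we raise price and demand softens?
base   = simulate_mean_profit(price=10, unit_cost=6, demand_mean=1000, demand_sd=100)
raised = simulate_mean_profit(price=12, unit_cost=6, demand_mean=850, demand_sd=120)
```

Beyond the averages, the full distribution of simulated profits is often more informative: it shows how often a scenario loses money, not just what happens on average.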
What does all this imply?
The foregoing is only a sample of the methods used in Marketing Research. Though many haven’t yet diffused very far into the Marketing Research mainstream, it should be evident that we have no shortage of tools for analytics! The number of methods is truly gigantic, and brilliant academics around the world are developing new ones around the clock. And, due to space limitations, I haven’t even mentioned Social Network Analysis, Biometrics or many other newer kinds of analytics. True Artificial Intelligence still lies in the future but perhaps one day…
Some of you will have heard of R, open-source (free!) statistical software that is becoming a standard research tool for Marketing Scientists. Though R is not as user-friendly or well-documented as some of the statistical packages we’ve become accustomed to, there are now several thousand R packages and a large, increasingly sophisticated user base. Many R packages perform cutting-edge analytics and, perhaps surprisingly, first-rate graphics, in addition to standard methods. There are also many other open-source tools besides R.
There’s a revolution happening in analytics…we are now able to give better answers to more questions more quickly than ever before. But there are downsides. With more tools, there is more to master, and more mistakes will be made if our skill sets become too thin. Increasing specialization will be needed, and more silos among analytics professionals may emerge. We must also avoid using methods merely because they are new – newer is not synonymous with better.
More importantly, let’s not lose sight of our raison d’être. Who will use our deliverables, and how and when they will be used, is what matters most. Let’s focus first on the decisions, not the technology.