## Kevin Gray

**By Kevin Gray**

There’s a revolution going on in Analytics. But first, what is Analytics?

Analytics has gotten a massive amount of buzz in recent years, lately in connection with Big Data and Data Science. The term “analytics” is by no means new but, perhaps surprisingly, there is a lack of consistency in what it means or implies. Sometimes it is used to designate a process, from problem identification through recommended actions. It also can refer to inferential statistics like standard errors, confidence intervals and t-tests or to basic measures of association, e.g., the Pearson product-moment correlation or chi square. At other times, though perhaps couched in esoteric claims, it merely refers to descriptive statistics such as frequency counts, means and standard deviations or to commonplace graphics such as line graphs, histograms and pie charts. More sophisticated data visualization is also at times called analytics.

In addition, analytics can refer to an extensive assortment of multivariate statistical methods and machine learning algorithms. That usage is the focus of this article. These techniques can be classified in various ways, and one way is to characterize them as either **Interdependence** methods or **Dependence** methods. A second point of differentiation pertains to whether a method is intended for **Cross Sectional** data or **Time Series** data.

Factor Analysis and Cluster Analysis are probably the best known Interdependence methods, though there are many others. Put very simply, Factor Analysis groups variables and Cluster Analysis groups observations, respondents in a consumer survey for example.
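To make "Cluster Analysis groups observations" concrete, here is a minimal sketch using k-means, which is only one of many clustering algorithms and is chosen here purely for illustration. The respondent ratings, the starting centroids and the function names are all hypothetical.

```python
import math

def nearest_index(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, initial_centroids, max_iterations=20):
    """Basic k-means: alternate between assigning points and moving centroids."""
    centroids = [tuple(float(x) for x in c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: put each observation in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so assignments have stabilised
            break
        centroids = updated
    labels = [nearest_index(p, centroids) for p in points]
    return labels, centroids

# Hypothetical survey: each respondent rates (price importance, quality importance) on a 1-10 scale.
respondents = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
labels, centroids = kmeans(respondents, initial_centroids=[respondents[0], respondents[3]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

The first three respondents land in one segment and the last three in another. Real segmentation work would involve many more variables, a considered choice of the number of clusters and several random starts.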

Dependence methods differ in that there are one or more Target (Dependent) variables we would like to explain or predict from one or more Predictor (Independent) variables. Many kinds of Dependence methods see extensive use in Marketing Research. They can be further subdivided according to whether the dependent variables are quantities, counts, ordered categories or nominal categories that have no natural order or rank. Regression and Discriminant Analysis in particular are well known in Marketing Research; the former is used when the dependent variable is quantitative (or we decide to treat it as such) and the latter comes into play when we wish to differentiate groups (e.g., User/Non-User).

Actually, it’s not quite this simple. Partial Least Squares Regression and some varieties of Structural Equation Modeling are a blend of Interdependence and Dependence methods!

The techniques described thus far have been designed for cross-sectional data, data collected at one point in time. Time Series Analysis is used when the data have been collected over many time periods. Weekly sales data are an example of Time Series data. Exponential Smoothing, ARIMA, Dynamic Regression, State Space and GARCH models are just a few examples of Time Series Analysis Methods. They are household words to Econometricians but more opaque to most of us in Marketing Research. Time Series Analysis plays important roles in Marketing Mix Modeling and ROI analysis as well as in sales forecasting.
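As a small illustration of one of the Time Series methods named above, here is a sketch of simple exponential smoothing, the most basic member of that family (no trend or seasonality). The weekly sales figures and the smoothing weight `alpha` are hypothetical, and initialising the level at the first observation is just one common convention.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation (simple exponential smoothing)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    levels = [float(series[0])]  # initialise the level at the first observation
    for value in series[1:]:
        # New level = weighted average of the latest observation and the previous level.
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels

weekly_sales = [100, 120, 110, 130]  # hypothetical weekly unit sales
levels = exponential_smoothing(weekly_sales, alpha=0.5)
forecast_next_week = levels[-1]  # the one-step-ahead forecast is the last smoothed level
print(levels)              # prints [100.0, 110.0, 110.0, 120.0]
print(forecast_next_week)  # prints 120.0
```

A larger `alpha` makes the forecast react faster to recent weeks; a smaller one smooths more heavily. In practice `alpha` would be estimated from the data rather than fixed by hand.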

Once again, though, things are not quite this simple! There are also methods appropriate for Within-Subjects (Repeated Measures) and Longitudinal data. An example of when Within-Subjects designs are suitable is when consumers are asked to evaluate two or more products, real or hypothetical, as in an in-home product use test (real) or conjoint study (typically hypothetical). The venerable Repeated Measures MANOVA might be familiar to some of you. Longitudinal designs are useful when we observe consumers’ behavior over time. Survival Analysis is one such method and in Marketing Research is used in customer churn modeling.

**That was just the Old Stuff**

Out of breath? Well, these are mostly “trad” methods. It would not be exaggerating to say there has been an explosion in the number and variety of analytic methods in recent years. Advances in computer technology have taken many methods off the drawing board and put them right onto our laptops. Mixture Modeling (a.k.a. Latent Class) is one example that not long ago was impractical on the computers most Marketing Scientists were using. It is proving very useful in segmentation as well as other kinds of analyses, in part because of its ability to model different kinds of data (e.g., quantitative and nominal) at once.

Bayesian methods, which can be intricate and are not easy to describe in a nutshell, are seeing increasing use in Marketing Research. Put very, very simply, in Bayesian statistics we incorporate prior beliefs about the problem we’re studying directly into our analysis and then update our understanding of the problem we’re investigating when new data become available. From the outset we are explicit about uncertainty. Bayesian methods have some important advantages in comparison with the more recognizable Frequentist methods. They are often more adept at handling sparse and messy data, for instance.
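The prior-then-update idea can be sketched with the simplest textbook case, a Beta prior on a purchase rate updated with yes/no survey responses. This is only an illustration of the logic, not a description of how applied Bayesian models are typically fitted; the prior values, the survey counts and the function names are all hypothetical.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives Beta(a + successes, b + failures)."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior belief: purchase rate around 20%, weakly held (worth about 10 respondents).
prior = (2, 8)

# Hypothetical new survey: 30 of 100 respondents say they would buy.
posterior = update_beta(*prior, successes=30, failures=70)

print(beta_mean(*prior))                # prints 0.2
print(posterior)                        # prints (32, 78)
print(round(beta_mean(*posterior), 3))  # prints 0.291
```

The posterior mean sits between the prior belief (20%) and the survey result (30%), and it moves closer to the data as the sample grows. When further data arrive, the posterior simply becomes the next prior.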

Many methods are being developed outside of university Statistics departments, most notably by computer scientists. Many are termed Machine Learning, though the way that term is used is often ambiguous. Machine Learners are often much better at pattern recognition (e.g., in text analytics) than the Statistical Methods we are used to. Some examples of these new methods, including those developed by statisticians, are Neural Networks, Support Vector Machines, Bayesian Networks and approaches utilizing boosting and bagging.

These “non-trad” techniques are core methods in Data Mining and Predictive Analytics, nowadays often lumped together under the vaguely-used labels “Big Data” and “Data Science.” There is now a vast array of these methods and many are also handy for analysis of consumer survey data, including segmentation and driver analysis. One important downside many of these methods share, however, is that their results are often difficult to interpret; while they are adept at prediction (the “What”) they are often not as useful for helping us understand the “Why” as traditional methods. Fortunately, in many cases the two can be used in combination to get the best of both analytic worlds.

Whatever the analytic methods used, it is also now easier than ever to perform various kinds of “What if?” simulations to make educated guesses about what might happen under various marketing scenarios, such as the introduction of a new product or competitor activity. Done prudently, simulations can help our models speak to us and guide decisions we need to make.
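One simple form a “What if?” simulation can take is a share simulator of the kind often built on conjoint results. The sketch below assumes the multinomial logit rule for turning utilities into shares, which is one common choice among several; the brand names and utility values are hypothetical and would normally come from a fitted model.

```python
import math

def logit_shares(utilities):
    """Convert product utilities into predicted shares using the multinomial logit rule."""
    exp_u = {name: math.exp(u) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: value / total for name, value in exp_u.items()}

# Hypothetical utilities for the current market.
current_market = {"Brand A": 1.0, "Brand B": 0.5}

# What if a competitor launches a new product with utility 0.8?
with_new_entrant = {**current_market, "New Entrant": 0.8}

before = logit_shares(current_market)
after = logit_shares(with_new_entrant)

for name in with_new_entrant:
    print(f"{name}: {before.get(name, 0.0):.1%} -> {after[name]:.1%}")
```

Under the logit rule a new entrant draws share from existing brands in proportion to their current shares, which is a known limitation, so results like these are best treated as educated guesses rather than forecasts.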

**What does all this imply?**

The foregoing is only a sample of the methods used in Marketing Research. Though many haven’t yet diffused very far into the Marketing Research mainstream, it should be evident that we have no shortage of tools for analytics! There is truly a gigantic number and brilliant academics around the world are developing new ones around the clock. And, due to space limitations, I haven’t even mentioned Social Network Analysis, Biometrics or many other newer kinds of analytics. True Artificial Intelligence still lies in the future but perhaps one day…

Some of you will have heard of R, open-source (free!) statistical software that is becoming a standard research tool for Marketing Scientists. Though not as user-friendly or well-documented as some statistical packages we’ve become accustomed to, there are now several thousand R packages and a large and increasingly sophisticated R user base. Many R packages perform cutting-edge analytics and, perhaps surprisingly, first-rate graphics, in addition to standard methods. There are also many other open-source tools besides R.

There’s a revolution happening in analytics…we are now able to give better answers to more questions more quickly than ever before. But there are downsides. With more tools, there is more to master and more mistakes will be made if our skill sets become too thin. Increasing specialization will be needed and more silos among analytics professionals may emerge. We must also avoid using methods merely because they are new – newer is not synonymous with better.

More importantly, let’s not lose sight of our raison d’être. Who will be using our deliverables, and how and when they will be used is most critical. Let’s first focus on the decisions, not the technology.

Found this interesting and helpful to scope the meaning of analytics.

I get the feeling this kind of definition describes quite static analytics. It does not seem to take account of dynamics, i.e., when the world changes.

I am surprised MR has not taken much notice of system dynamics approaches à la Jay Forrester. But maybe this is synthetics as it seeks to capture the whole.

Hi Martin, indeed this was only a quick overview but, broadly speaking, I personally see dynamic analytics as mostly subsumed under time series analysis and simulations. Systems Dynamics and Multi Agent Simulations have received attention in marketing research but I will admit our attention spans can be brief… 🙂

Agent-Based Modeling is a powerful way to simulate not only external (i.e., market) shifts, but also deliberate marketing actions. As a probabilistic method, it vividly demonstrates that you can’t precisely predict the future. By providing an understanding of the variety of potential outcomes, and their likelihoods, ABM makes for better-informed choices between alternative business strategies and tactics.

Kevin’s point that “Bayesian methods have some important advantages in comparison with the more recognizable Frequentist methods” is very well taken. Having just returned from a seminar on advanced Bayesian network modeling, I’m more convinced than ever that Bayesian techniques can greatly enhance the accuracy and effective actionability of marketing research findings. This is especially true regarding multivariate modeling; for example, applications that fall under the broad label of “drivers analysis.” Bayesian methods make it possible to explore much more of the “search space” of potential explanatory models to find the alternatives that most accurately describe the “mind of the market,” which business decision makers must precisely understand if they’re to plan effectively.

Hi Mick, agreed. Bayes is now a must. Perhaps it was mentioned in your seminar but, in case not, FYI the new edition of the Gelman et al. classic will soon hit the shelves. There’s also a book by Fenton and Neil (Risk Assessment and Decision Analysis with Bayesian Networks) on using Bayesian thinking for making decisions that I think managers in just about any field would find useful. Lots of good material on Bayes now, finally.

Kevin – very nice summary of the types of analytical tools. I’m wondering (and so throwing this out for discussion) how many marketers understand the probabilistic nature of a Bayesian-type analysis? We’ve all heard researchers cry, “I made it as simple as possible but marketing didn’t get it!” and that usually refers to results from much simpler techniques. Are the results from these newer techniques too hard to understand for the average decision maker?

With respect to a number of advanced analytics applications, the information yielded by Bayesian techniques answers similar questions to those we’ve traditionally answered using “parametric” techniques. For example, Bayesian methods can yield information similar in purpose to that delivered by factor analysis, cluster analysis or structural equation modeling. Bayesian techniques often provide more accurate and revealing results due to certain important “technical” advantages. However, those Bayesian-vs.-parametric technical differences are largely “under the hood,” and so don’t necessarily need to be explained to stakeholders; at least, not in detail. As with all good advanced analytics work, the deliverables should focus not so much on describing the method as on conveying the most actionable findings to the decision makers.

Based on my experience, I think Steve’s concern is valid. Compared with Frequentist methods, computer run times can be much longer and the greater complexity of Bayesian methods necessarily increases the risk of error. It’s also of course more difficult to find analysts with extensive experience and competence in Bayesian methods. For much of what we do, the “classical” ways will continue to serve us well and I don’t think they’ll die out. I follow a pick-and-choose strategy of using whatever method (or combination of methods) I think will work best under the circumstances.

I concur with Mick that with most clients it’s not necessary to get into the details of the mechanics – though there are those who want to manage the process…

Fantastic article, Kevin. However, let’s not lose sight of the fact that the methods (including Bayes) are the means and not the end. The focus should always be “show us how we can make better and more profitable decisions with this stuff”.

Many thanks, Mike. Indeed, let’s not lose sight of our raison d’être. We should not allow our industry to be dominated by mathematics and software.

We are reaching a point at which computer speeds and available expertise are increasing the pragmatic accessibility of Bayesian analytics for marketing research applications. And I expect this trend will accelerate moving forward. While I do agree that the “classical” methods will be with us for some time to come, I expect Bayesian capabilities will become increasingly “de rigueur” for strong marketing research advanced analytics teams before much longer.

Agreed, Mick. A marketing scientist who believes Bayesian methods (or machine learning) are a flash in the pan is playing the odds badly.

On the other hand, just because we have computer speeds and available expertise does not mean they are necessarily the right tool. Mike Wolfe’s point is right on. And for those over 50, remember, Mr. Natural says, “Use the right tool for the job”.

Well, I don’t contend that a Bayesian approach will always be the right tool; just that it will often be a better tool. I’m 59, by the way. I recall that Mr. Natural’s most frequently given advice was to remember that, “It don’t mean s*&!.” But I don’t think the average marketing decision-maker would respond well if we offered that as a research finding. ;-)

Hi Kevin,

You simply reminded me of the old university days!!

Anyway, Bayes’ method is absolutely good and fantastic and, to corroborate what Mwolfe said, it is the means and not the end. It definitely plays an important role in decision making, especially when considering ROI.

Highly thoughtful post, thumb up!!

Thank you very much, Dare!

In our industry, and probably others as well, there are those who are almost cavalier about quality and, at the opposite extreme, the Geeks. Both segments are off target in my view and thus I closed the article with “More importantly, let’s not lose sight of our raison d’être. Who will be using our deliverables, and how and when they will be used is most critical. Let’s first focus on the decisions, not the technology.” Getting these points across is not always easy, unfortunately.

You’re very much on target, Kevin. As in most things, the challenge is in achieving the best balance. We need to put appropriate time and effort into developing and maintaining highly effective methods while keeping sufficiently in mind that the reason we’re doing this is to provide our clients with actionable information that will enable them to make better decisions toward achieving their business objectives.

Great article, Kevin! As you rightly point out, the term “analytics” gets thrown around a lot these days without any clear understanding of what it means.

To Steve Needel’s point, I find that a lot of users of research are reluctant to tangle even with means (preferring top-two box percentages instead) or regression coefficients, so the challenge for researchers is to interpret results in a manner that is comprehensible to the people who are ultimately going to use them for decision making.