Recently, terms like Machine Learning (ML), Deep Learning (DL), and increasingly Artificial Intelligence (AI) have permeated the marketing world. These terms are often presented as more recent and "hence" better alternatives to statistical modeling (SM). AI especially comes with an aura of performing magically without human intervention. As a disclaimer, any blog on these topics must vastly simplify things, as these are all massive subjects, but let's look at them through a practical marketing lens.
Techniques that would typically be classified as ML include Decision Trees (DT) and Random Forests (RF); the latter generates many trees and uses a majority vote for prediction. Techniques that fall under traditional SM include linear regression, logistic regression, and the like. The differences between an ML and an SM approach are not always clear cut, but there are a few that most academics in both camps would agree on. First, SM is based on the specification of an explicit model (e.g. a linear or logistic function) along with distributional assumptions that give the estimators some nice properties. ML methods such as DT and RF do not assume an explicit model structure. Consequently, SM is generally easier to interpret than a DT, and a DT is easier to interpret than an RF. Second, in ML the model selection process happens automatically.
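To make the contrast concrete, here is a minimal sketch using scikit-learn on invented data (the variables and the "churn rule" are made up for illustration): a single decision tree, one readable set of splits, versus a random forest whose prediction is a majority vote across many trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: churn as a function of tenure and monthly spend.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = ((X[:, 0] > 5) & (X[:, 1] > 5)).astype(int)  # invented rule the models must learn

# A single decision tree: one set of if/then splits you can print and read.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# A random forest: many trees, combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The vote made explicit: ask each tree, then take the majority.
votes = np.array([t.predict(X[:5]) for t in forest.estimators_])
majority_vote = (votes.mean(axis=0) > 0.5).astype(int)
```

Neither model was told the rule; both recover it from the data, but only the single tree can be inspected at a glance.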
For example, a tree will pick up interaction effects automatically. In SM the analyst runs and compares different models: if one wants to test interaction effects, these need to be specified before running the model. As the set of variables grows, searching for interaction effects by hand becomes almost impossible. This is probably why ML-like approaches are sometimes called AI: in ML, model building and model estimation occur simultaneously. With respect to prediction accuracy, the performance of ML and SM varies. We have seen (in our own work and in published papers) cases where ML significantly outperforms SM (see Breiman's paper for examples) and the other way around. One hypothesis is that when theory about what generates the data is lacking, ML may do better, while in more structured situations (say predicting housing values, where we have some theory about what causes a home to sell for more or less) SM may do better.
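The interaction point can be demonstrated with a small sketch on invented data, where the outcome is driven purely by an interaction (class 1 if a respondent is high on exactly one of two attributes). A tree finds the pattern on its own; a main-effects-only logistic regression cannot, until the analyst specifies the interaction term explicitly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Invented data: outcome depends only on an interaction of two attributes.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(400, 2))
y = ((X[:, 0] > 5) != (X[:, 1] > 5)).astype(int)

# The tree discovers the interaction automatically...
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# ...while a main-effects-only logistic regression cannot represent it.
logit_main = LogisticRegression().fit(X, y)

# In SM, the analyst must add the interaction term up front:
X_int = np.column_stack([X, (X[:, 0] - 5) * (X[:, 1] - 5)])
logit_int = LogisticRegression().fit(X_int, y)
```

On this data the tree and the regression-with-interaction both predict well, while the main-effects regression does little better than chance.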
Practical Requirements for Successful Models
Most (not all) predictive analytics applications in marketing require prediction accuracy, understanding, and simulation (looking at what-if scenarios). Prediction accuracy is the degree to which a model can predict, ideally in new situations (e.g. time periods after the one on which the model was built). Understanding is the ability to interpret the model, preferably easily, in such a way that actionable recommendations can be made. Models also need to be viewed as credible: does the model seem intuitive? For example, the presence of strange effects can jeopardize credibility. Simulation refers to the ability to define new scenarios and have the model calculate the likely business result. These requirements are sometimes at odds with each other, and ML and SM differ on these criteria.
Breiman pointed out in his classic paper (Statistical Modeling: The Two Cultures) that in situations with many variables (say several hundred) and no meaningful theory, ML approaches can yield very good predictions where SM cannot. In marketing research, there are many situations where the set of variables is relatively modest and where we have at least some theory, some notion of what the data-generating mechanism might be; think of your typical customer satisfaction driver modeling project or brand drivers project. However, there are exceptions. For example, when firms run a segmentation project and then want to develop a predictive model (a typing tool) that allows them to score their entire customer database on these segments. In developing typing tools, clients will want high predictive accuracy: the typing tool needs to predict segment membership correctly, which is quite often hard to achieve. Decision Trees can sometimes be a useful alternative, especially when the set of candidate typing tool variables is large (e.g. when internal customer database information is available).
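A typing tool of this kind can be sketched as follows. Everything here is invented for illustration: the database fields, the segment rule (standing in for an earlier clustering exercise), and the thresholds; the point is the workflow of fitting a tree on labeled customers and checking accuracy on a holdout.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Invented customer database fields: tenure (months), spend, support contacts.
rng = np.random.default_rng(2)
X = np.column_stack([
    rng.uniform(0, 60, 600),
    rng.uniform(0, 500, 600),
    rng.integers(0, 10, 600).astype(float),
])
# Pretend these segment labels came out of an earlier segmentation project.
segment = np.where(X[:, 1] > 300, 2, np.where(X[:, 0] > 24, 1, 0))

X_train, X_test, s_train, s_test = train_test_split(
    X, segment, test_size=0.25, random_state=0)

# The typing tool: a shallow tree that scores customers into segments.
typing_tool = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, s_train)
holdout_accuracy = typing_tool.score(X_test, s_test)
```

Holdout accuracy, not training accuracy, is the number to show the client, since the tool will be applied to customers outside the original segmentation sample.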
Understanding and Actionability
A key difference between ML and SM is the explicitness of the underlying model. In both linear and logistic regression, for example, the underlying model is straightforward and easy to interpret, and the results often make intuitive sense. In ML this benefit gets lost quickly. Decision trees, for example, can become unwieldy, with lots of nonsensical branching, and very hard to interpret. Often they include "effects" that make no intuitive sense and have no utility beyond seeming to help the prediction. Random Forests (sets of Decision Trees) and ensemble models (averages across different types of models) are even worse: it is almost impossible to wrap one's head around the data-generating mechanism, and hence actionability becomes cumbersome. Even when the sole purpose is prediction, not understanding, such nonsensical "effects" can cause a client to not want to use the model.
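The interpretability gap shows up even on friendly data. In this sketch (invented driver data with a simple linear truth behind it), the regression summarizes everything in two coefficients, while even a depth-3 tree needs a page of if/then rules to say less:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

# Invented driver data: the true relationship is simply linear.
rng = np.random.default_rng(3)
X = rng.uniform(1, 10, size=(300, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 300)

# SM: the whole story fits in two coefficients (roughly +2 and -1).
reg = LinearRegression().fit(X, y)

# ML: a depth-3 tree approximates the same line with a staircase of rules.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["satisfaction", "price"])
print(rules)
```

The printed rules are faithful to the tree, yet a manager reading them sees arbitrary cut points rather than the clean "satisfaction helps, price hurts" story the coefficients tell.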
The third requirement, simulation, also creates problems for ML. In Decision Trees, for example, the analysis is based on splits. Say that at some level a tree splits on satisfaction: respondents who scored an 8 or 9 versus those who scored less than 8. Then any simulation that involves an improvement within the 1-7 range will show zero impact. One could argue that the model shows that only a transition from 7 to 8 makes a difference, but that is just not realistic and will be a hard sell to managers. Note that we are talking about changes in independent variables in predictive models. In other situations (e.g. NPS) discrete jumps are sometimes acceptable or even explicitly assumed.
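The zero-impact problem is easy to reproduce. In this sketch (invented survey data in which renewal is driven entirely by top scores), a one-split tree puts every satisfaction score from 1 to 7 in the same leaf, so a simulated improvement from 3 to 7 changes the prediction by exactly nothing:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Invented survey data: renewal happens only for satisfaction scores of 8+.
rng = np.random.default_rng(4)
sat = rng.integers(1, 11, size=(500, 1)).astype(float)  # satisfaction, 1-10
renew = (sat[:, 0] >= 8).astype(int)

# A one-split tree: scores 8-10 land in one leaf, scores 1-7 in the other.
tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(sat, renew)

# What-if scenario: raise a customer's satisfaction from 3 to 7.
before = tree.predict_proba([[3.0]])[0, 1]
after = tree.predict_proba([[7.0]])[0, 1]
# Both scores fall in the same leaf, so the simulated impact is zero.
```

A regression fitted to the same data would instead spread the effect smoothly across the scale, which managers usually find far more plausible as a simulation tool.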
ML and SM are both sets of incredibly powerful tools. Each has its unique strengths, and it is fine to use the two approaches in combination. For example, if you believe interaction effects might exist, run a Decision Tree first, and then include any identified interactions in a regression model. Both approaches need to be applied with care and skill.
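That hybrid workflow can be sketched in a few lines. The data here are invented (sales driven by an advertising-by-quality interaction): a shallow tree whose nested splits involve both variables hints at the interaction, and the analyst then carries that term into an interpretable regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

# Invented example: sales respond to advertising * quality (an interaction).
rng = np.random.default_rng(5)
ads = rng.uniform(0, 10, 300)
quality = rng.uniform(0, 10, 300)
sales = ads * quality + rng.normal(0, 1.0, 300)
X = np.column_stack([ads, quality])

# Step 1: a shallow tree; nested splits on both variables suggest an interaction.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, sales)
print(export_text(tree, feature_names=["ads", "quality"]))

# Step 2: include the suggested interaction term in a regression model.
X_int = np.column_stack([ads, quality, ads * quality])
model = LinearRegression().fit(X_int, sales)
```

The end product is an SM-style model that keeps the interpretability and simulation properties discussed above, with the tree used only as an exploratory device.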
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, Vol. 16, No. 3, pp. 199-231.