Editor’s Note: There has been a lot of discussion online in recent weeks about an article in Nature that argued for eliminating the use of statistical significance. While some of the online discussion was sympathetic to the authors, the article also aroused vehement disagreement among many. My friend Joel Rubinson recently posted some thoughtful comments about the original article on his personal blog, and with his kind permission, we reprint them below.
The premise: we should abandon stat testing and embrace uncertainty!
Abandoning stat testing altogether is a bad idea for reasons I will explain. Yet, the way researchers use stat testing needs to become more thoughtful, less auto-pilot.
Stat testing marketing data has never been about immutable laws of statistics…it is a means to an end… objective and simplified rules for marketing decision-making in a practical world. It also is a key tentpole for methodologists designing sample sizes for marketing studies and experiments.
So, when does stat testing go wrong?
High confidence levels can be paradoxical
Setting ultra-high confidence levels sounds like a way to reduce business risk, when it actually can have the opposite effect… increasing business risk.
Consider how high confidence levels can lead to product degradation. Imagine a formula change to an existing brand, made to improve margins. Null hypothesis: the performance difference is not noticeable. When tested at the 95% confidence level, it might take a 10% or greater reduction in product satisfaction to be statistically significant. But isn’t a 5% degradation enough cause for concern? The paradox is that setting the confidence level too high can INCREASE the risk that a poorer performing product formulation will make it to market and harm brand equity.
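A rough sketch of the arithmetic behind this paradox, using only the Python standard library. The cell size (150 per formulation), the 70% baseline satisfaction, and the choice of a two-sided z-test on two independent proportions are all illustrative assumptions, not figures from any actual study.

```python
from math import sqrt
from statistics import NormalDist


def smallest_significant_gap(n_per_cell, baseline, confidence):
    """Smallest observed drop in a proportion that a two-sided z-test
    on two equal-sized independent cells would flag as significant."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference, using the baseline rate for both cells.
    std_err = sqrt(2 * baseline * (1 - baseline) / n_per_cell)
    return z_crit * std_err


# Hypothetical test: 150 respondents per formulation, 70% satisfied today.
for confidence in (0.80, 0.90, 0.95, 0.99):
    gap = smallest_significant_gap(n_per_cell=150, baseline=0.70, confidence=confidence)
    print(f"{confidence:.0%} confidence: drop must exceed {gap:.1%} to be significant")
```

Under these assumed inputs, the drop needed at 95% confidence is a little over 10 points, while at 80% confidence it is closer to 7 points. The higher the bar, the bigger the degradation that can slip through as "not significant."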
There are times when high confidence (95%, even 99%) might be appropriate, e.g. for legal issues such as claims substantiation or safety testing of medications. However, when clients mandate the same rigid and extreme confidence levels for marketing decisions (“we make decisions based on 95% confidence intervals…”) it leads to inaction. Advertising and new products are notoriously hit or miss…half of ad campaigns fail to show acceptable ROI and 80% of new products fail; isn’t 80% certainty enough evidence you might have something better?
Stat testing is also needed by methodologists because it directs the design of a study. It tells us how big sample sizes need to be to detect a difference between options that would be meaningful to the business and to be able to call that difference statistically significant. However, if sample sizes are TOO big, they become dysfunctional…small, irrelevant differences become statistically significant and the studies cost much more than they should. Yes, studies based on sample sizes that are too big are just as bad as those based on sample sizes that are too small!
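As an illustration of how the testing rule drives the design, here is a minimal sketch of the textbook sample-size approximation for comparing two proportions. The rates, the 80% power target, and the function name are assumptions chosen for the example; a real study would also need to account for weighting and design effects.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_cell(p1, p2, confidence=0.95, power=0.80):
    """Approximate respondents needed per cell to detect p1 vs. p2
    with a two-sided z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical: current product scores 70%; which drop matters to the business?
for meaningful_drop in (0.10, 0.05, 0.02):
    n = sample_size_per_cell(0.70, 0.70 - meaningful_drop)
    print(f"To detect a {meaningful_drop:.0%} drop: about {n} per cell")
```

The useful direction is to start from the difference that would matter to the business and solve for the sample size. Going far beyond that number mostly buys the ability to call differences significant that nobody would act on.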
What is your real sample size?
Now, let’s dig a little deeper…when you weight data, are you aware there is something called a RIM efficiency statistic that says the effective sample size (for stat testing) is less than the nominal sample size? In other words, if you weight data on demos and your raw sample is not reflective of target population demos, the degree of imbalance reduces statistical precision. I have seen RIM efficiency statistics as low as 25%…meaning the sample was so bad that it was only worth 25% of the nominal sample size in terms of stat testing. Are you considering this? Skipping interview quotas and weight caps (capping is its own art form) to save money might turn out to be more expensive, all things considered.
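One common way to compute a weighting efficiency of this kind is Kish's approximation, where the effective sample size is the squared sum of the weights divided by the sum of the squared weights. The sketch below assumes that definition; a particular weighting package may report its efficiency statistic somewhat differently, and the weights shown are made up.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighting_efficiency(weights):
    """Effective sample size as a share of the nominal sample size."""
    return effective_sample_size(weights) / len(weights)


# Hypothetical sample of 100: 80 over-represented respondents weighted down
# to 0.5 each, 20 under-represented respondents weighted up to 3.0 each.
weights = [0.5] * 80 + [3.0] * 20

n_eff = effective_sample_size(weights)
print(f"Nominal n = {len(weights)}, effective n = {n_eff:.0f}, "
      f"efficiency = {weighting_efficiency(weights):.0%}")
```

In this made-up case the 100 interviews are worth only 50 for stat testing purposes, so any significance test should use the effective base rather than the nominal one.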
Don’t forget the problem of bias
…A different kind of statistical problem. In the world of media effectiveness measurement, we often create a virtual control cell to match the exposed cell by “twinning” on demos. That DOES NOT ensure the control cell is really balanced to the exposed cell. It doesn’t matter how big your sample sizes are…if you do not balance test vs. control on the pre-existing propensity to convert for the brand, you will have an estimate that is BIASED UPWARD of the lift in conversion. This is not a statistical confidence issue but a bias issue…a silent killer.
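A small worked example of this bias, using expected values rather than sampled data so that sample size plays no role. The two propensity segments, their conversion rates, the segment mixes, and the one-point true lift are all hypothetical numbers chosen to make the mechanism visible.

```python
# Conversion rate with no ad exposure, by pre-existing brand propensity (assumed).
BASE_RATE = {"high": 0.10, "low": 0.02}
TRUE_LIFT = 0.01  # assumed true effect of the ad: +1 point in every segment

# Demo-matched cells can still differ in propensity mix (assumed shares).
exposed_mix = {"high": 0.6, "low": 0.4}
control_mix = {"high": 0.3, "low": 0.7}


def expected_rate(mix, lift):
    """Expected conversion rate for a cell with the given segment mix."""
    return sum(share * (BASE_RATE[seg] + lift) for seg, share in mix.items())


exposed_rate = expected_rate(exposed_mix, TRUE_LIFT)

# Naive comparison: exposed cell vs. a control cell with a different propensity mix.
naive_lift = exposed_rate - expected_rate(control_mix, 0.0)

# Balanced comparison: control reweighted to the exposed cell's propensity mix.
balanced_lift = exposed_rate - expected_rate(exposed_mix, 0.0)

print(f"True lift:     {TRUE_LIFT:.1%}")
print(f"Naive lift:    {naive_lift:.1%}")
print(f"Balanced lift: {balanced_lift:.1%}")
```

Because these are expected values, the gap between the naive and balanced estimates would not shrink with a bigger sample. It comes entirely from the exposed cell containing more people who were already likely to convert.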
The Research Methodologist
The research methodologist is the guardian of best practices, artfully applied. Sadly, it is becoming a lost profession, but you had better have one. The methodologist sees all of these issues…sample size, stat testing, data weighting, asking the right questions the right way…like a Rubik’s cube that they can solve again and again. When I was at the NPD Group for 25 years, someone named John Caspers served this role so well we called him Yoda. His studies always produced the best damn survey data I ever saw.
Yes, stat testing is often applied without much thought, or incorrectly (forgetting RIM efficiency). You could even argue the basics aren’t in place because most marketing data is based on some form of convenience sampling. But let’s remember that stat testing done right is an essential aid to fact-based and streamlined marketing decision making. Without stat testing, we would degenerate into a polarized debate based on ungrounded opinions. Don’t we have enough of that now from politics?