Raise Your Hand If The Truth Starts At .05

When we run crosstabs and other common tests of significance, these tests assume normally distributed populations and samples drawn randomly. I argue this scenario is a rarity in real-world conditions.


By Scott Weinberg

My first day of graduate school began with the instructors telling me and my fellow first-year classmates, “there are two acceptable reasons for being late with an assignment: hospitalization and incarceration.” Welcome to grad school, kid. We had three core instructors for my I/O Psych track, and all were newly minted PhDs under the age of 30. If you’ve ever had a new PhD for an instructor, you know they are the toughest. They just went through heck, and now you will too. They told us they were going to cram as much PhD material into us as they could in the two years they had us in captivity. Good times.

Under these conditions, one tends to retain a few things, some of which I’ve been reminded of from time to time in the market research space and among its residents. I’m going to throw a few out here and see what happens.

The what is easy. The why is not.

I recall two years of 700-level statistics coursework, always at 8am. Stats are always taught at 8am. I recall a quote from my textbook, “if I only had one day left to live, I would spend it in a statistics class, because the day would seem so much longer.” Working in MR, I’ve met many clients and colleagues, as we all have. I notice how many people new to the industry are taught how to do things, but not the why behind them. For example, we rotate concepts because it ‘reduces bias’ (actually it’s due to phenomena called the primacy and recency effects). Or, we ask these particular questions for all concept tests regardless of category because that’s how we do things here at (insert Honomichl name). Or, we’re not shrinking our 25 minute survey because we know people enjoy shaping the products of the future. Or, we can run t-tests and ANOVAs on any data set, regardless of how the sample was recruited and drawn, and without considering compounding error…

So about this .05…

Let’s consider bread and butter significance testing: crosstabs. How often are insights created via PowerPoint by looking for the asterisks and mini-font letters indicating a significant difference? Anyone want to bet the word ‘significant’ is never misunderstood? More to the point: why .05? Ever wonder what is so magical about that particular threshold? Based on what I was taught, .05 is an arbitrarily agreed-upon compromise that balances the chance of making a Type 1 error against the chance of making a Type 2 error.

Lest we forget, a Type 1 error is rejecting the null hypothesis when it is in fact true (i.e. believing you have a difference in samples when there isn’t) and a Type 2 error is the opposite (i.e. there is a difference in samples but your measurement instrument isn’t detecting it). Ergo, there is nothing special about .05. Could be .04 or .06 or .08, etc. Sometimes you’ll see .01, a more stringent threshold, but the point I’m trying to make is this: please don’t assume ‘the truth’ magically kicks in at .05. It doesn’t. Yes, it helps to have a threshold; however, the specific boundary holds no inherent path to insights.
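To make the Type 1 error idea concrete, here is a minimal simulation sketch. It is an illustration, not anything from my coursework: the group size, the 40% "true" proportion, and the function names are all made-up assumptions. Two groups are drawn from the same population, so any flagged difference is a false positive, and the share flagged simply tracks whatever threshold you chose.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(alphas, trials=2000, n=300, true_p=0.4, seed=1):
    """Share of null comparisons flagged 'significant' at each threshold.

    Both groups share the same true proportion (true_p, a hypothetical value),
    so every rejection counted here is a Type 1 error.
    """
    rng = random.Random(seed)
    hits = {a: 0 for a in alphas}
    for _ in range(trials):
        x1 = sum(rng.random() < true_p for _ in range(n))
        x2 = sum(rng.random() < true_p for _ in range(n))
        p = two_proportion_p_value(x1, n, x2, n)
        for a in alphas:
            if p < a:
                hits[a] += 1
    return {a: hits[a] / trials for a in alphas}


rates = false_positive_rates([0.01, 0.05, 0.10])
for alpha, rate in rates.items():
    print(f"alpha={alpha:.2f}  null comparisons flagged significant: {rate:.3f}")
```

Move the threshold and the false-positive share moves with it. Note that this sketch assumes simple random draws, which is exactly the assumption questioned in the next section.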

Non-parametrics, where art thou?

Are analyses which originate via online and similar convenience samples making a fundamental assumption that the population is distributed normally? I believe yes. Is this in fact the case? I argue: not likely. I’m not going to deep dive into the reasons, and this isn’t a quality discussion (I addressed that in my prior post). Rather, from a statistical point of view, when we run crosstabs and other common tests of significance, these tests assume normally distributed populations and samples drawn randomly. I argue this scenario is a rarity in real-world conditions. More to the point, how many of us are implementing chi-square tests and similar? Non-parametrics are tests of significance that assume ‘real-world’ sampling. I find them both fascinating, and apparently invisible. Is anyone out there using them for your analyses?

In case you’re curious…

I think what’s amazing about our profession is the abundance of learning opportunities and continuing education. From the MRA and similar organizations, Research Rockstar, the many groups on LinkedIn, the streaming Research Business Daily Report, to this very blog, we enjoy convenient, accessible, expert instruction, on demand. In particular I hope the managers out there encourage and support their younger employees to devote a few hours a week to participate in these opportunities. Thank you for reading and I hope you found this worthwhile.

16 responses to “Raise Your Hand If The Truth Starts At .05”

  1. This may be the best article on research and statistics I have ever read. Scott my magnum opus just flew out of the window!! It was worth it, I haven’t laughed so much in years.

  2. Oh and whilst you’re at it, let’s get stuck into interval scaling and the assumption that there are somehow equal units between scalar points. Honestly, market research has been the most serial abuser of statistics of any industry I know. I just love it when newbie MBAs start spouting standard errors and confidence limits. When was the last time you saw a bell shaped distribution in anything in the social research arena? Try dealing with bi-modal data in sensory testing. Thank god clients don’t ask. Scott there truly is a place in parametric hell for you my friend!! See you there.

  3. Great piece, Scott. Still laughing, too.

    Build thought?

    …and not to mention what happens to ‘normal’ distribution curves when 20% of ‘gen pop’ panelists may account for the majority of completes on a study. Or, we don’t control adequately for speedsters, straightliners, etc. Yikes.

    Good stuff, Scott.

  4. A great piece, Scott. Your ‘out-of-the-box’ viewpoints are sharp and critical. This is the kind of article that researchers should read, read again, and then print out and tape to their bulletin boards. A set of good reminders that our use of statistical science applied to human nature and survey research must be moderated by a good dose of reality! Assumptions are always meant to be checked, right? Nicely done!

  5. Great article, and I truly appreciate Chris’ comment about interval scales (BTW–I ALWAYS talk about the interval data debate when I teach quant classes). And if anyone in this thread would like an opportunity to present on this topic and related issues (Boston area event this November), please see this call for speakers:

  6. These are all critical issues. Sadly, even after all these years, many of the fundamentals of research are poorly understood in our business. “Old” is still pretty new to many marketing researchers…Keep up the good fight!

  7. As I recall, a lot of these parametric stat tests are surprisingly robust. You can violate many of the assumptions of ANOVA (within limits) and they still work. Maybe it’s not so bad after all. Anyway, sampling error is probably the least of threats to study validity in MR.

  8. Great piece. Right-on with the arbitrary nature of .05, although a tip of the hat to Statistical Power wouldn’t be out of place. Another consideration, for your next article, is the idea of the Sampling Distribution of the statistic at sufficiently large n’s. If you look at the Sampling Distribution of the Sample Mean (or Proportion, or any statistic), it is not required that a variable be normally distributed in the Pop., because inferential stats actually operates at the Sampling Distribution level, not the Population level.

    I’m not defending the robotic nature of some to blindly make decisions at .05…just some other inputs for your consideration. Thanks for this piece.
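    The sampling distribution point in this comment can be sketched with a small simulation. This is only an illustration under stated assumptions: the exponential population, the sample size of 100, and the helper name `skewness` are hypothetical choices, not anything from the thread. The raw population is heavily skewed, yet the means of repeated random samples come out far closer to symmetric.

```python
import random
import statistics


def skewness(values):
    """Population-style skewness: mean of cubed z-scores."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


rng = random.Random(7)

# A strongly right-skewed "population" variable (assumed exponential).
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Means of 3000 random samples, each of size n = 100, from that same population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(100)])
    for _ in range(3000)
]

raw_skew = skewness(raw)
mean_skew = skewness(sample_means)
print(f"skewness of raw values:   {raw_skew:.2f}")
print(f"skewness of sample means: {mean_skew:.2f}")
```

    The sketch draws every sample at random from the population. It says nothing about samples that are recruited in a biased way, which is a separate problem.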

  9. Here’s the bigger issue. We are in a business of hunches where 80% of new products fail, 50% of ad campaigns show no sales lift and yet we researchers stat test things looking for 90% confidence?? I mean, what is wrong with this picture?

  10. The chi-square test used for cross tabs is non-parametric because it makes no assumptions about the underlying distribution of the data. So don’t say we don’t use non-parametric statistics — we do all the time.

    You’re right, though, there are other non-parametric tests people seldom run: Spearman and Kendall correlations, for example.

    Also, Hart’s comment above is a good one. Simulations have shown that parametric tests, often used on interval data, are quite robust to violations of underlying assumptions. Interval data, such as those generated by well defined, equal-appearing scales, are a good example.

    Also, many distributions found naturally among humans — height, weight, IQ — are about normally distributed, so a normal distribution does apply to a number of variables. But even if it doesn’t, you might want to review the Central Limit Theorem which is the basis for running parametric tests.
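    For readers who have only seen the asterisks, here is a minimal sketch of the chi-square test of independence behind a crosstab. The 2x2 counts are invented for illustration, the function names are hypothetical, and no continuity correction is applied. The p-value helper only covers one degree of freedom (a 2x2 table), where the chi-square tail can be written with the normal distribution.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df.

    A 1-df chi-square variable is a squared standard normal, so
    P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical crosstab: rows = two respondent groups, columns = prefer A / prefer B.
crosstab = [[60, 40],
            [45, 55]]

stat, dof = chi_square_statistic(crosstab)
p = p_value_one_dof(stat)
print(f"chi-square = {stat:.3f}, df = {dof}, p = {p:.4f}")
```

    The test makes no assumption about the shape of the underlying distribution, but it does still assume independent observations and adequate expected counts in each cell.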

  11. This is a great article. The defect in thinking may in part reflect a defect in many of the textbooks used in undergraduate and graduate business and statistics courses. Students are lucky if they see anything on nonparametrics.

    Another defect is the use of t-tests in scanning tabs to find and report anything that is “statistically significant.” That’s not how these tests are supposed to be used, and it virtually guarantees reporting of false positives.

    However, neither method of testing compensates for sample bias, which is one of the most compelling issues today. This is especially true in online research. While details are confidential, I had occasion to see data on the incidence of ad exposure for some online campaigns earlier this year. In one case that I recall, site visits were tracked and visitors were invited to an online survey. Average ad exposure among respondents was more than 4x exposure among consumers as a whole, with some “outliers” more than 50x higher. Now would you consider the data to be “representative”?
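    The false-positive point about scanning tabs can be put in numbers with a short sketch. It assumes the tests are independent and that no real differences exist, which is a simplification; the function names are hypothetical. It also shows the Bonferroni adjustment as one common remedy, which is not something proposed in this thread.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive across k independent null tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test threshold that keeps the overall false-positive chance near alpha."""
    return alpha / k


for k in (1, 10, 20, 100):
    unadjusted = familywise_error(0.05, k)
    adjusted = familywise_error(bonferroni_alpha(0.05, k), k)
    print(f"{k:>3} tests: unadjusted {unadjusted:.3f}, Bonferroni-adjusted {adjusted:.3f}")
```

    With 20 comparisons at .05 and nothing really going on, the chance of at least one "significant" cell is already well above one half. None of this corrects for a biased sample.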

  12. A quick note of appreciation to the readers and commenters of my article. I spend so much time discussing mobile research that writing this was a nice change of pace. The challenge was avoiding all the natural tangents; I had to remind myself to keep it short and sweet.

    Noshing on a few ideas for my next article, leaning towards online sample quality issues; a topic I sadly know too much about, sigh.

    Thanks again,
    Scott Weinberg

  13. I still hear comments along the lines of “If we have a lot of respondents, it will be a representative sample”. Setting aside questionnaires that look like homework from a high school project…
