It’s on everybody’s lips these days. BIG DATA!!! How can you make the most of it, how can you analyze those data, how can you make sense of them?
But before you embark on the big data adventure, maybe it’s a good idea to check with others who have dealt with big data for a long time already. This time, big data inspiration comes from an unexpected source: neuroscience.
Yes, neuroscience has dealt with big data for a long time, and the data sets are getting larger every year.
Put it this way: a typical data set for a functional MRI scan takes up over 500 MB per person, providing a new data point in thousands of minuscule subregions of the brain every 2 seconds or so. With an EEG, you get a new data point every millisecond or so for typically 10 to 128 electrodes, and you can look at five or more different frequency bands, producing millions of data points per person. If you want to think big data, neuroscience can take you there.
But with big data, you also get big challenges. This has been one of the largest issues in neuroscience. As any statistician will tell you, if you have enormous statistical power, even a trivially small effect can easily turn out to be significant, and if you run enough tests, some of them will come out significant by chance alone. Everything is significant in the land of Big Data!
Take one example, a scientific paper entitled “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction” (and winner of the 2012 Ig Nobel Prize). Here, researchers reported that they could find brain activation during an interspecies perspective-taking task in a dead salmon. You read that right: brain activation in a really dead salmon! Just by running thousands of tests without correcting for multiple comparisons, the researchers got a false positive effect.
Obviously, Big Data can also become Big Problems. So what do we do?
Based on the experiences from neuroscience, here are a few rules of thumb.
Kill the dead salmon!
It’s harder than you might think. When you have big data and run thousands of tests, you need to correct for multiple comparisons. The traditional method is Bonferroni correction, which divides the significance threshold by the number of tests. In neuroimaging we can use, e.g., family-wise error (FWE) correction. Find an appropriate way of minimizing the chance that your significant results happened by pure chance!
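To see the dead salmon problem in miniature, here is a rough Python sketch. It is not the analysis from the salmon paper; it simply assumes 2,000 hypothetical “voxels” of pure noise, runs one z-test per voxel, and counts how many pass an uncorrected threshold of 0.05 versus a Bonferroni-corrected one. All names and numbers are made up for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def noise_p_values(n_tests, n_obs, rng):
    """One z-test per 'voxel' on pure noise, so every true effect is zero."""
    p_values = []
    for _ in range(n_tests):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # Known standard deviation of 1, so z = mean * sqrt(n).
        z = (sum(sample) / n_obs) * math.sqrt(n_obs)
        p_values.append(two_sided_p(z))
    return p_values


def count_significant(p_values, alpha=0.05, bonferroni=False):
    """Count tests below alpha, or below alpha / number of tests if corrected."""
    threshold = alpha / len(p_values) if bonferroni else alpha
    return sum(p < threshold for p in p_values)


rng = random.Random(0)  # fixed seed so the run is repeatable
p_values = noise_p_values(n_tests=2000, n_obs=50, rng=rng)

uncorrected = count_significant(p_values)
corrected = count_significant(p_values, bonferroni=True)
print("significant without correction:", uncorrected)
print("significant with Bonferroni:   ", corrected)
```

Without correction you should expect roughly 5% of the noise voxels (around 100 of 2,000) to look “active”, while the corrected count should almost always be zero or very close to it.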
Look at the smaller samples!
This is a very simple rule. If you cannot find your effect in a smaller sample, is it really interesting? If you need to include thousands of people to find a significant effect, chances are that it’s not a very interesting phenomenon…
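The arithmetic behind this rule can be sketched in a few lines of Python. The sketch assumes a simple one-sample z-test with effect sizes measured in standard deviations, and asks at what sample size the expected test statistic just crosses the two-sided 5% threshold (at that point you would detect the effect only about half of the time). The function name and the two effect sizes are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_significance(effect_size, alpha=0.05):
    """Smallest n at which an effect of this size (in standard-deviation
    units) is expected to reach two-sided significance in a z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    # Expected z is effect_size * sqrt(n); solve for n and round up.
    return math.ceil((z_crit / effect_size) ** 2)


n_medium = sample_size_for_significance(0.5)   # a clearly visible effect
n_tiny = sample_size_for_significance(0.02)    # a barely-there effect
print("people needed for effect 0.5: ", n_medium)
print("people needed for effect 0.02:", n_tiny)
```

The required sample grows with one over the effect size squared, so an effect that only shows up with thousands of people is, by construction, a very small one.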
Look at the extremes!
A classic approach in neuroscience is to look at aberrant behavior. If you observe an effect in a normal sample, what about the extremes? Do experts, niches and special segments all behave according to your newfound law? When does the rule break down? Sometimes you might learn more from the extremes than from the boring mean.
Use different sources!
If you’re using surveys, try other measures. Combine methods. If the phenomenon is robust, you should be able to find it with many different methods and in many different sources. If your results only hold in one particular domain, what does that mean? If you are using surveys, why not see if neuromarketing measures show the same results?
Kill your ideas!
A long-forgotten ethos in commercial research is the idea of trying to kill your darlings. With Big Data, this becomes more important than ever. If you are pursuing an idea that X is related to Y, don’t try to find that relationship. Try to kill the whole idea! Throw your worst tests at it, beat it with a bat, fry it and throw it out. If it sticks, chances are you are on to something.
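One possible way of beating an idea with a bat is a permutation test: shuffle Y so that any real link to X is destroyed, and see how often chance alone produces a relationship as strong as the one you observed. The Python sketch below is only one such test among many, run on made-up data; the variable names, sample size and number of shuffles are all assumptions for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_shuffles, rng):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)  # copy, so the caller's data is left untouched
    at_least_as_strong = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real pairing between X and Y
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_strong += 1
    # Add one to both counts so the p-value can never be exactly zero.
    return (at_least_as_strong + 1) / (n_shuffles + 1)


rng = random.Random(1)  # fixed seed so the run is repeatable
x = [rng.gauss(0, 1) for _ in range(200)]
y_related = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # Y truly depends on X
y_unrelated = [rng.gauss(0, 1) for _ in x]            # Y is pure noise

p_related = permutation_p_value(x, y_related, 1000, rng)
p_unrelated = permutation_p_value(x, y_unrelated, 1000, rng)
print("p-value, real relationship:", p_related)
print("p-value, no relationship:  ", p_unrelated)
```

An idea that survives this kind of treatment (and the multiple comparisons correction, and the smaller samples) is one you can start to trust.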
Enjoy your Big Data!