Many fMRI studies could be giving false-positive results, according to an important new paper from Anders Eklund and colleagues.
The authors examined the SPM8 software package, probably the most popular tool for analyzing neuroimaging data.
Their approach was beautifully simple. They wanted to check how often a conventional fMRI analysis would "find" a signal when nothing was really happening. So they took data from nearly 1,500 people who were scanned while simply resting, and looked for "task-related" activations in those scans, even though there was in fact no task. It's a very clever use of resting-state data.
Eklund et al. ran the analysis many thousands of times, under a range of conditions. This is the key finding:
This shows the proportion of analyses that produced significant "activations" for each of the different "tasks". In theory, the false-positive rate should be way down at the bottom, at 5%, in every case: that's the nominal error rate they told SPM8 to deliver. As you can see, it was often much higher. Oh dear.
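The logic of the test can be sketched in a few lines of Python. This is a minimal toy, not Eklund et al.'s pipeline: it assumes a single "voxel" of pure white noise (so, by construction, there is nothing to find), a hypothetical 200-scan run with a 2-second TR and a 30-second on/off block regressor, an ordinary least-squares fit, and a normal approximation to the t distribution in place of SPM8's whole-brain corrected inference. Counting how often the fake "task" comes out significant at p < 0.05 estimates the false-positive rate, which in this idealized setting should land close to the nominal 5%.

```python
import math
import random
from statistics import NormalDist


def boxcar(n_scans, tr, block_s):
    """Hypothetical on/off block regressor: block_s seconds off, then block_s on, repeating."""
    return [1.0 if int(i * tr // block_s) % 2 == 1 else 0.0 for i in range(n_scans)]


def slope_t(x, y):
    """t statistic for the slope in an OLS fit y = a + b*x + noise."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [xi - mx for xi in x]
    sxx = sum(v * v for v in xc)
    b = sum(c * (yi - my) for c, yi in zip(xc, y)) / sxx
    resid = [(yi - my) - b * c for c, yi in zip(xc, y)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    return b / math.sqrt(sigma2 / sxx)


def false_positive_rate(n_sims=2000, n_scans=200, tr=2.0, block_s=30.0, alpha=0.05, seed=1):
    """Fraction of pure-noise runs in which the fake 'task' is significant."""
    rng = random.Random(seed)
    x = boxcar(n_scans, tr, block_s)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_scans)]  # white noise, no signal
        if abs(slope_t(x, y)) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(f"estimated false-positive rate: {false_positive_rate():.3f}")
```

Eklund et al. did the real-data equivalent of this: swap the simulated white noise for genuine resting-state scans and see whether the rate stays at 5%.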
The error rate depended on two main things. The most important was the task design. Block designs were much worse than event-related designs (see the labels at the bottom: B1-B4 are block designs, E1-E4 are event-related). The longer the blocks, the more errors. B4, the most error-prone design of all, uses 30-second blocks.
That's bad news, because it's a very common design.
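To make the difference between the two kinds of design concrete, here is a minimal Python sketch of the stimulus time courses that go into the analysis. The 30-second blocks match B4; everything else (the 2-second TR, the run length, the event onsets) is a hypothetical choice for illustration, not one of Eklund et al.'s actual designs. Real packages, SPM8 included, also convolve these time courses with a model of the haemodynamic response before fitting, a step omitted here.

```python
def block_design(n_scans, tr, block_s):
    """1 during 'task' blocks, 0 during 'rest' blocks, alternating every block_s seconds."""
    return [1 if int(i * tr // block_s) % 2 == 1 else 0 for i in range(n_scans)]


def event_design(n_scans, tr, onsets_s):
    """1 in the scan during which each brief event starts, 0 elsewhere."""
    x = [0] * n_scans
    for t in onsets_s:
        i = int(t // tr)
        if 0 <= i < n_scans:  # ignore events that fall outside the run
            x[i] = 1
    return x


if __name__ == "__main__":
    tr = 2.0
    print("block:", block_design(40, tr, 30.0))  # 15 scans off, 15 on, then off again
    print("event:", event_design(40, tr, [4, 13, 22, 37, 50, 61, 74]))
```

The block regressor changes slowly, switching state only every 15 scans, while the event regressor is a sparse scatter of brief spikes.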
Secondly, the repetition time (TR) mattered, especially for block designs. The TR is how long it takes to scan the whole brain once. The data showed that the longer the TR, the better. TRs of 1 second are really dodgy; luckily, they are rarely used. A 2-second TR is OK for most event-related designs, but block designs really suffer. 3 seconds is even better.
Most fMRI studies today use TRs of 2-3 seconds, so this is somewhat reassuring. But for block design B4, the error rate was still up to 30% even with a 3-second TR. Oh dear, oh dear.
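Why might a longer TR help? One simple way to think about it, under an assumed toy model rather than anything taken from the paper: suppose the noise correlation between two time points decays exponentially with the time between them, with some time constant tau (the 2-second value below is hypothetical). Consecutive scans are TR seconds apart, so their noise correlation is exp(-TR/tau), which shrinks as the TR grows, leaving the sampled noise closer to white.

```python
import math


def lag1_correlation(tr_s, tau_s):
    """Noise correlation between consecutive scans, assuming corr(dt) = exp(-dt / tau)."""
    return math.exp(-tr_s / tau_s)


if __name__ == "__main__":
    tau = 2.0  # hypothetical noise time constant, in seconds
    for tr in (1.0, 2.0, 3.0):
        print(f"TR = {tr:.0f} s -> lag-1 noise correlation {lag1_correlation(tr, tau):.2f}")
```

With these made-up numbers, the correlation between neighbouring scans falls from about 0.61 at TR = 1 s to 0.37 at 2 s and 0.22 at 3 s.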
So what went wrong? It's complicated, and you should read the paper, but in a nutshell: the statistics used in fMRI analysis assume that the data contain only two things, the real brain activation signal and white noise. The key assumption is that the noise is white, which essentially means that it is random from moment to moment: knowing what the noise did in the past tells you nothing about what it will do in the future. Noise that looks random but is actually correlated with itself over time (autocorrelated) is not white.
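The difference between white and autocorrelated noise is easy to see numerically. The sketch below uses a first-order autoregressive (AR(1)) process, with a hypothetical coefficient of 0.5, as a stand-in for correlated noise. It generates both kinds of series and measures how strongly each value is correlated with the one before it. For white noise that correlation comes out close to zero: the past tells you nothing about the future. For the AR(1) series it comes out close to 0.5.

```python
import random


def white_noise(n, rng):
    """Independent Gaussian values: no memory of the past."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def ar1_noise(n, phi, rng):
    """AR(1) noise: each value is phi times the previous value plus fresh white noise."""
    x = [rng.gauss(0.0, 1.0) / (1 - phi * phi) ** 0.5]  # start from the stationary distribution
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def lag1_autocorrelation(x):
    """Sample correlation between each value and the next one."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    print(f"white noise: {lag1_autocorrelation(white_noise(5000, rng)):+.2f}")
    print(f"AR(1) noise: {lag1_autocorrelation(ar1_noise(5000, 0.5, rng)):+.2f}")
```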
Now, noise in the brain is certainly not white, for various reasons, including the effects of breathing and heart rate (which, of course, are cyclical, not random). All fMRI analysis packages try to correct for this, but Eklund et al. have shown that SPM8's approach doesn't manage it, at least for many designs.
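A common kind of correction is "prewhitening": model the autocorrelation, then filter both the data and the design so that the remaining noise is white before fitting. The Python sketch below shows what goes wrong without any correction, and how a correction that gets the noise model exactly right restores the nominal rate. It is not SPM8's algorithm. It assumes toy AR(1) noise with a coefficient of 0.5 that the analyst somehow knows exactly, a hypothetical 200-scan run at a 2-second TR with 30-second blocks, a single voxel, and a normal approximation for p-values. Real packages have to estimate the noise model from the data, and, per Eklund et al., SPM8's way of handling this falls short for many designs.

```python
import math
import random
from statistics import NormalDist


def boxcar(n_scans, tr, block_s):
    """Hypothetical on/off block regressor with block_s-second blocks."""
    return [1.0 if int(i * tr // block_s) % 2 == 1 else 0.0 for i in range(n_scans)]


def ar1_noise(n, phi, rng):
    """Stationary AR(1) noise with unit innovation variance."""
    x = [rng.gauss(0.0, 1.0) / math.sqrt(1 - phi * phi)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def slope_t(x, y):
    """t statistic for the slope in an OLS fit y = a + b*x + noise (assumes white noise)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [xi - mx for xi in x]
    sxx = sum(v * v for v in xc)
    b = sum(c * (yi - my) for c, yi in zip(xc, y)) / sxx
    resid = [(yi - my) - b * c for c, yi in zip(xc, y)]
    return b / math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)


def prewhiten(v, phi):
    """Apply the AR(1) filter v[t] - phi * v[t-1]; the first sample is dropped."""
    return [v[t] - phi * v[t - 1] for t in range(1, len(v))]


def rates(n_sims=2000, n_scans=200, tr=2.0, block_s=30.0, phi=0.5, alpha=0.05, seed=1):
    """False-positive rates on pure AR(1) noise: (naive OLS, prewhitened OLS)."""
    rng = random.Random(seed)
    x = boxcar(n_scans, tr, block_s)
    xw = prewhiten(x, phi)  # the design must be filtered exactly like the data
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    naive = whitened = 0
    for _ in range(n_sims):
        y = ar1_noise(n_scans, phi, rng)  # no task signal at all
        naive += abs(slope_t(x, y)) > z_crit
        whitened += abs(slope_t(xw, prewhiten(y, phi))) > z_crit
    return naive / n_sims, whitened / n_sims


if __name__ == "__main__":
    naive, whitened = rates()
    print(f"naive OLS: {naive:.3f}   prewhitened: {whitened:.3f}   (nominal 0.05)")
```

In this toy, ignoring the autocorrelation makes the slowly varying block regressor look significant far more often than 5% of the time, while the correctly prewhitened fit comes back to roughly the nominal rate. The hard part in practice is getting the noise model right.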
And the really big question: does this mean we can't trust published SPM8 results? Does SPM stand for Spurious Positive Mapping? Well, that's also not clear. All of Eklund et al.'s analyses were based on single-subject data, but most fMRI studies pool the results from more like 20 or 30 subjects. Averaging over many subjects might make the false positives cancel out, but we don't yet know whether that would solve the problem or only lessen it.
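For readers unfamiliar with how that pooling is typically done, one common approach (often called the summary-statistics or random-effects approach) fits each subject separately, takes each subject's estimated task effect, and runs a one-sample t-test across subjects. The minimal Python sketch below illustrates only that computation; the per-subject values are made up.

```python
import math
from statistics import mean, stdev


def group_t(subject_effects):
    """One-sample t statistic: does the mean subject-level effect differ from zero?"""
    n = len(subject_effects)
    return mean(subject_effects) / (stdev(subject_effects) / math.sqrt(n))


if __name__ == "__main__":
    effects = [0.4, -0.1, 0.3, 0.2, -0.2, 0.5, 0.1, 0.0]  # hypothetical per-subject estimates
    print(f"group t = {group_t(effects):.2f} with {len(effects) - 1} degrees of freedom")
```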
Eklund, A., Andersson, M., Josephson, C., Johannesson, M., and Knutsson, H. (2012). Does parametric fMRI analysis with SPM yield valid results?—An empirical study of 1484 rest datasets NeuroImage DOI: 10.1016/j.neuroimage.2012.03.093