Academic bunfight ahoy! A new paper from Nick Brown - famed debunker of the "Positivity Ratio" - and his colleagues takes aim at another piece of research on feel-good emotions.
The target is a 2013 paper published in PNAS from positive psychology leader Barbara Fredrickson and colleagues: A functional genomic perspective on human well-being. The disputed paper claimed to have found significant correlations between questionnaire measures of human happiness and the expression of a set of 53 stress-related genes in blood cells. In their critical article, which is out now in PNAS, Brown et al. criticize many aspects of the Fredrickson paper, but their most serious charge is that the headline results are likely to be false positives. The key statistical analysis, a method that they dub "RR53", is flawed, say Brown et al.:
Even when fed entirely random psychometric data, the "RR53" regression procedure generates large numbers of results that appear, according to these authors’ interpretation, to establish a statistically significant relationship between well-being and gene expression. We believe that this procedure is, simply put, totally lacking in validity.
Cole and Fredrickson are defiant in their PNAS response, and say that "Brown et al.'s reanalysis itself contains major statistical and factual errors". The reply is only 500 words, but a more detailed rebuttal is posted here.
So who's right? I should point out at this point that I am cited in the Acknowledgements of Brown et al. and I was involved in the paper in an advisory capacity. In my view, there is no doubt that RR53, and hence the Fredrickson et al. paper, is flawed. The rebuttal focuses on deflecting one aspect of Brown et al.'s critique, a so-called 'bitmapping' procedure. But I believe that RR53 can be shown to give high false-positive rates without using 'bitmapping' at all - and I'll now demonstrate this with the help of some simulations.
First, I ran 10,000 RR53 simulations using the Fredrickson genetics data and parameters, but using randomly generated predictors in place of the two happiness questionnaire scores. I found a false positive rate of 55% per predictor - far higher than it should have been. In theory, the false positive rate should be 5%.
Why so high? I ran more simulations, this time with 53 randomly generated outcome variables ('genes') instead of the actual gene expression data. This revealed that the false positive rate is correct (5%) so long as the outcome variables are uncorrelated with each other. But if they are inter-correlated - and they are in Fredrickson et al.'s data - the procedure gives spurious positive 'associations'. Here's a plot of the false positive rate as a function of outcome inter-correlation, with 53 outcome variables. There's a clear relationship:
Fredrickson et al.'s 53 genes have an inter-correlation "MCM" of 0.415 (see the end of the post for details on my "MCM" metric). The graph above shows that this corresponds to a false positive rate of approximately 55%. My conclusion here is that RR53 on Fredrickson et al.'s data produces false positives because the gene data are inter-correlated. Even more simulations suggest that neither the degree of correlation between the predictor variables, nor the number of predictors, has any effect on the rate of false positives (per predictor). On the other hand, the false positive rate increases with the number of outcome variables ('genes'):
In summary, I believe that the RR53 procedure on which Fredrickson et al.'s PNAS paper is based is prone to false positives. I believe that with the dataset Fredrickson et al. used, their chance of observing a false positive association between each happiness-score predictor and average gene expression was 55%.
So what went wrong? I think the answer is deceptively simple. RR53 is based on a t-test, and an assumption of the t-test is that all of the observations in the sample are independent. If the outcome variables are correlated, this assumption is violated. It is essentially the problem of auto-correlation. I may expand on this in a future post.
In my opinion, whatever else may be right or wrong with Fredrickson et al.’s paper, their central analysis was flawed and their headline results are probably false positives. For what it's worth I think the flaw is an insidious one, one that's not obvious at first glance, and I'm not saying that Fredrickson et al. are to be blamed for making this mistake. To err is human. But I believe that a mistake was made.
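The independence point can be illustrated with a small standalone simulation. This is only a sketch in Python, not the Matlab code behind the results in this post: it runs a one-sample t-test on 53 observations whose true mean is zero, but which all share a common random component so that every pair has correlation rho. The function names, the way the correlation is induced, the number of trials and the hard-coded critical value (about 2.0066 for 52 degrees of freedom at the two-sided 5% level) are all assumptions made for the illustration.

```python
import math
import random
import statistics

def t_statistic(sample):
    """One-sample t statistic against a null hypothesis of mean zero."""
    n = len(sample)
    return statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))

def rejection_rate(n_obs, rho, n_trials=2000, t_crit=2.0066, seed=0):
    """Fraction of trials with |t| > t_crit although the true mean is zero.

    Each observation is sqrt(rho)*shared + sqrt(1-rho)*own_noise, so every
    observation has unit variance and every pair has correlation rho.
    t_crit is hard-coded for n_obs = 53 (52 degrees of freedom); it is an
    assumption of this sketch and must be changed if n_obs changes.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    hits = 0
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)
        sample = [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if abs(t_statistic(sample)) > t_crit:
            hits += 1
    return hits / n_trials

# Independent observations (rho = 0) should reject about 5% of the time;
# correlated observations reject far more often.
for rho in (0.0, 0.05, 0.2):
    print(f"rho={rho:.2f}  false positive rate={rejection_rate(53, rho):.3f}")
```

The shared component moves the sample mean around from one trial to the next without adding much to the within-sample spread, so the t-test's standard error is too small and the statistic is too large.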
Gory details: in my simulation (Matlab code on request), I follow the Fredrickson et al. procedure as explained by Brown et al. All random data are unit normally distributed. To generate two vectors of correlated random numbers, X and Y, I generate X, then generate a second random vector Z, and then set Y = wX + (1-w)Z, where w is the weight, from 0 to 1, that determines how correlated X and Y are. To simulate "RR53", I first generate two sets of random but correlated predictors (the correlation is r=0.79 in the Fredrickson et al. questionnaire data). I then generate a set of (usually 53) random but variably correlated outcome variables ('genes'). I then run Repeated Regressions, i.e. for each outcome variable in turn, I regress the outcome variable on both predictor variables to obtain two regression coefficients. I then use a one-sample t-test for each of the two sets of coefficients, with the null hypothesis that the mean is zero. Fredrickson et al. used a bootstrap to estimate the standard error of the mean; I use simple parametric t-tests after verifying that the differences are negligible (bootstrapping is slow). I quantified the inter-correlation of the outcome variables by first calculating the mean of all of the outcome variables and then calculating the mean of all of the correlation coefficients between each variable and the mean. I call this quick and dirty metric the mean-correlation-with-mean, "MCM".
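For readers who want to try the procedure just described, here is a rough Python sketch of it. It is a re-implementation under stated assumptions, not the Matlab code mentioned above. Assumptions that are mine: the number of subjects (80 is an arbitrary placeholder), the number of simulations, all function names, building the correlated 'genes' by mixing each one with a single shared random vector, the weight 0.563 (which gives r of about 0.79 because the correlation implied by Y = wX + (1-w)Z is w / sqrt(w^2 + (1-w)^2)), and the hard-coded critical value 2.0066 for 52 degrees of freedom.

```python
import math
import random
import statistics

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def mix(x, w, rng):
    """Return w*x + (1-w)*z for fresh unit-normal noise z."""
    return [w * xi + (1 - w) * rng.gauss(0.0, 1.0) for xi in x]

def two_predictor_slopes(p1, p2, y):
    """OLS slopes of y on p1 and p2 (intercept included) via centred sums."""
    m1, m2 = statistics.fmean(p1), statistics.fmean(p2)
    c1 = [v - m1 for v in p1]
    c2 = [v - m2 for v in p2]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, y))
    s2y = sum(b * v for b, v in zip(c2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def t_stat(values):
    """One-sample t statistic against a null hypothesis of mean zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))

def mcm(genes):
    """Mean correlation of each outcome variable with the mean of all of them."""
    n_subjects = len(genes[0])
    mean_gene = [statistics.fmean([g[i] for g in genes]) for i in range(n_subjects)]
    return statistics.fmean([pearson(g, mean_gene) for g in genes])

def simulate(w_gene, n_genes=53, n_subjects=80, n_sims=200, t_crit=2.0066, seed=1):
    """Return (false positive rate for predictor 1, for predictor 2, mean MCM).

    t_crit assumes n_genes = 53; n_subjects = 80 is a placeholder value.
    """
    rng = random.Random(seed)
    hits = [0, 0]
    mcm_total = 0.0
    for _ in range(n_sims):
        # Two random predictors, correlated at about r = 0.79.
        p1 = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        p2 = mix(p1, 0.563, rng)
        # Random outcomes; w_gene = 0 makes them independent of one another.
        shared = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        genes = [mix(shared, w_gene, rng) for _ in range(n_genes)]
        # Repeated regressions, then a t-test on each set of coefficients.
        slopes = [two_predictor_slopes(p1, p2, g) for g in genes]
        for k in (0, 1):
            if abs(t_stat([s[k] for s in slopes])) > t_crit:
                hits[k] += 1
        mcm_total += mcm(genes)
    return hits[0] / n_sims, hits[1] / n_sims, mcm_total / n_sims

for w in (0.0, 0.4):
    fp1, fp2, avg_mcm = simulate(w)
    print(f"w_gene={w:.1f}  MCM={avg_mcm:.3f}  FP1={fp1:.3f}  FP2={fp2:.3f}")
```

Because the predictors are pure noise here, any 'significant' t-test is a false positive by construction. This sketch uses the parametric t-test rather than the bootstrap, as in my own simulations.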
Brown, N., MacDonald, D., Samanta, M., Friedman, H., & Coyne, J. (2014). A critical reanalysis of the relationship between genomics and well-being. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1407057111