
Is Medical Science Really 86% True?

By Neuroskeptic | January 25, 2013 5:39 AM


The idea that Most Published Research Findings Are False rocked the world of science when it was proposed in 2005. Since then, however, it's become widely accepted - at least with respect to many kinds of studies in biology, genetics, medicine and psychology.

Now, however, a new analysis from Jager and Leek says things are nowhere near as bad as that: only 14% of the medical literature is wrong, not half of it. Phew!

But is this conclusion... falsely positive?

I'm skeptical of this result for two separate reasons. First off, I have problems with the sample of the literature they used: it seems likely to contain only the 'best' results. This is because the authors:

  • only considered the creme-de-la-creme of top-ranked medical journals, which may be more reliable than others.

  • only looked at the Abstracts of the papers, which generally contain the best results in the paper.

  • only included the just over 5,000 statistically significant p-values present in the 75,000 Abstracts published. Those papers that put their p-values up front might be more reliable than those that bury them deep in the Results.

In other words, even if it's true that only 14% of the results in these Abstracts were false, the proportion in the medical literature as a whole might be much higher.

Secondly, I have doubts about the statistics. Jager and Leek estimated the proportion of false-positive p-values by assuming that true p-values tend to be low: not just below the arbitrary 0.05 cutoff, but well below it.

It turns out that p-values in these Abstracts strongly cluster near zero, and Jager and Leek's conclusion is that most of them are therefore real.
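To see how an estimate like this can work at all, here's a deliberately simplified sketch (my own construction, not Jager and Leek's actual model, which fits a parametric mixture). If false positives really are uniform on [0, 0.05], they put half their mass above 0.025, so the size of that upper tail reveals how many there are:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy version of the estimation idea (NOT Jager and Leek's model):
# significant p-values are a mix of true effects and false positives.
# For clarity, assume true effects always give p < 0.005, while false
# positives are uniform on [0, 0.05], as the key assumption requires.
n = 50_000
false_share = 0.14
true_p = rng.uniform(0, 0.005, size=int(n * (1 - false_share)))
false_p = rng.uniform(0, 0.05, size=int(n * false_share))
observed = np.concatenate([true_p, false_p])

# Uniform false positives put half their mass above 0.025, where (by
# construction) no true effects live, so doubling that tail fraction
# recovers the false-positive share we built in.
est = 2 * np.mean(observed > 0.025)
print(round(float(est), 2))
```

The point is that the whole estimate hinges on the shape assumed for the false positives; if they are not uniform, the tail-counting logic breaks down, which is exactly the worry below.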

But this depends on the crucial assumption that false-positive p-values behave differently from real ones: specifically, that they are equally likely to fall anywhere from 0 to 0.05.


"if we consider only the P-values that are less than 0.05, the P-values for false positives must be distributed uniformly between 0 and 0.05."

The statement is true in theory: by definition, p-values behave that way when the null hypothesis is true. In theory.
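That theoretical behaviour is easy to check by simulation. A sketch (using simple two-sample z-tests with known variance, my choice of test):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_per_group, trials = 50, 20_000

# Every trial is a two-sample z-test (known unit variance) where the
# null hypothesis is TRUE: both groups come from the same N(0, 1).
a = rng.normal(size=(trials, n_per_group))
b = rng.normal(size=(trials, n_per_group))
z = (a.mean(axis=1) - b.mean(axis=1)) / math.sqrt(2 / n_per_group)
pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

# Under the null, p-values are uniform on [0, 1], so the "significant"
# ones are uniform on [0, 0.05]: about 5% of trials are significant,
# and roughly half of those fall below 0.025.
sig = pvals[pvals < 0.05]
frac_low = np.mean(sig < 0.025)
print(len(sig), round(float(frac_low), 2))
```

So uniformity does hold for honestly-run, honestly-reported null tests. The question is whether published Abstracts are anything like that.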

But... we have no way of knowing if it's true in practice. It might well not be.

For example, authors tend to put their best p-values in the Abstract. If they have several significant findings below 0.05, they'll likely put the lowest one up front. This works for both true and false positives: if you get p=0.01 and p=0.04, you'll probably highlight the 0.01. Therefore, false positive p-values in Abstracts might cluster low, just like true positives.
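A quick simulation of that selection effect (a hypothetical scenario, with the number of results per paper picked for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: each paper has 3 false-positive results, each with p
# uniform on [0, 0.05], and reports only the lowest one in its Abstract.
reported = rng.uniform(0, 0.05, size=(100_000, 3)).min(axis=1)

# The minimum of 3 uniforms piles up near zero: the fraction of
# reported false positives below 0.025 is about 7/8, not the 1/2 that
# the uniformity assumption predicts.
frac_low = np.mean(reported < 0.025)
print(round(float(frac_low), 2))
```

Under this kind of cherry-picking, false positives in Abstracts would mimic the low-clustering signature of true effects, inflating the estimated proportion of real findings.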

Alternatively, false p's could also cluster the other way, just below 0.05. This is because running lots of independent comparisons is not the only way to generate false positives. You can also take almost-significant p's and fudge them downwards, for example by excluding 'outliers', or running slightly different statistical tests. You won't get p=0.06 down to p=0.001 by doing that, but you can get it down to p=0.04.
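That fudging scenario can also be simulated (again a hypothetical mechanism; the "hackable" window of 0.05 to 0.10 is my assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

# Null p-values, uniform on [0, 1]. Hypothetically, authors who land
# just above the threshold (0.05 < p < 0.10) tweak the analysis until
# p slips just under 0.05.
p = rng.uniform(size=100_000)
hackable = (p > 0.05) & (p < 0.10)
p[hackable] = rng.uniform(0.04, 0.05, size=int(hackable.sum()))

# The surviving false positives now cluster just BELOW 0.05 instead of
# being flat on [0, 0.05]: most of them sit in (0.04, 0.05).
sig = p[p < 0.05]
frac_high = np.mean(sig > 0.04)
print(round(float(frac_high), 2))
```

Either distortion, clustering low or clustering just under the cutoff, violates the uniformity assumption on which the 14% figure rests.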

In this dataset, there's no evidence that p's just below 0.05 were more common. However, in many other sets of scientific papers, clear evidence of such "p-hacking" has been found. That reinforces my suspicion that this is an especially 'good' sample.

Anyway, those are just two examples of why false p's might be unevenly distributed; there are plenty of others: 'there are more bad scientific practices in heaven and earth, Horatio, than are dreamt of in your model...'

In summary: the idea of modelling the distributions of true and false findings, and using those models to estimate the proportion of each in a sample, is promising. But I think a lot more work is needed before we can be confident in the results of the approach.
