Scientific peer review is based on the idea that some papers deserve to get published and others don't.
By asking a hand-picked team of 3 or 4 experts in the field (the "peers") to assess each submission, journals hope to accept the good stuff, filter out the rubbish, and improve the not-quite-good-enough papers.
This all assumes that the reviewers, being experts, are able to make a more or less objective judgement. In other words, when a reviewer says that a paper's good or bad, they're reporting something about the paper, not just giving their own personal opinion.
If that's true, reviewers ought to agree with each other about the merits of each paper. On the other hand, if it turns out that they don't agree any more often than we'd expect if they were assigning ratings entirely at random, that would suggest that there's a problem somewhere.
Guess what? Bornmann et al have just reported that reviewers are only slightly more likely to agree than they would be if they were just flipping coins: A Reliability-Generalization Study of Journal Peer Reviews.
The study is a meta-analysis of 48 studies published since 1966, looking at peer review of either journal papers or conference presentations. In total, almost 20,000 submissions were studied. Bornmann et al calculated the mean inter-rater reliability (IRR), a measure of how well different judges agree with each other.
Overall, they found a reliability coefficient (r^2) of 0.23, or 0.34 under a different statistical model. This is pretty low, given that 0 is random chance, while a perfect correlation would be 1.0. Using another measure of IRR, Cohen's kappa, they found a reliability of 0.17. That means that peer reviewers only agreed on 17% more manuscripts than they would by chance alone.
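To make the kappa figure a bit more concrete, here is a minimal sketch of how Cohen's kappa is computed for two reviewers, using the standard formula (observed agreement minus chance agreement, divided by one minus chance agreement). The function name and the accept/reject ratings are invented for illustration; they are not data from Bornmann et al, and the paper's own estimates come from a multilevel meta-analysis rather than from a single calculation like this.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters judging the same items.

    Hypothetical helper for illustration only, not code from the study.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)

    # Observed agreement: share of manuscripts given the same verdict.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: how often the raters would match if each kept
    # their own accept/reject rates but assigned verdicts independently.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up verdicts on ten manuscripts ("a" = accept, "r" = reject).
reviewer_1 = ["a", "a", "a", "a", "a", "r", "r", "r", "r", "r"]
reviewer_2 = ["a", "a", "a", "r", "r", "a", "a", "r", "r", "r"]

print(round(cohens_kappa(reviewer_1, reviewer_2), 2))
```

In this toy example the two reviewers agree on 6 of 10 manuscripts, but they would agree on 5 of 10 by chance alone, so kappa comes out at about 0.2: raw agreement of 60% sounds respectable until the coin-flipping baseline is subtracted.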
Worse still, the bigger the study, the lower the reliability it reported. On the other hand, the subject - economics/law, natural sciences, medical sciences, or social sciences - had no effect, arguing against the common-sense idea that reviews must be more objective in the "harder" sciences.
So what? Does this mean that peer review is a bad thing? Maybe it's like the police. The police are there to prevent and punish crime. They don't always succeed: crime happens. But only a fool would argue that, because the police fail to prevent some crimes, we ought to abolish them. The fact that we have police, even imperfect ones, acts as a deterrent.
Likewise, I suspect that peer review, for all its flaws (and poor reliability is just one of them), does prevent many "bad" papers from getting written, or getting submitted, even if a lot do still make it through, and even if the vetting process is itself not very efficient. The very fact that peer review is there at all makes people write their papers in a certain way.
Peer review surely does "work", to some extent - but is the work it does actually useful? Does it really filter out bad papers or does it on the contrary act to stifle originality? There are lots of things to say about this, but I will just say this for now: it's important to distinguish between whether peer review is good for science as a whole, and whether it's good for journals.
Every respectable journal relies on peer review to decide which papers to publish: even if the reviewers achieve nothing else, they certainly save the Editor time, and hence money (reviewers generally work for free). It's very hard to see how the current system of scientific publication in journals would survive without peer review. But that doesn't mean it's good for science. That's an entirely different question.
Bornmann L, Mutz R, & Daniel HD (2010). A reliability-generalization study of journal peer reviews: a multilevel meta-analysis of inter-rater reliability and its determinants. PLoS ONE, 5 (12) PMID: 21179459