Scientific fraud has been in the news again lately, and it got me thinking about whether it would be possible to fake data with no chance of getting caught. Would it be possible to carry out the perfect scientific crime? And how can we make life more difficult for fraudsters?
Here's how fraudsters seem to get caught. There is usually a two-stage process. First, someone notices something suspicious about the published results. In the case of numerical results, the suspicious thing is often that the data are too 'neat' or 'too good to be true' - this was what caught Stapel (Edit: but see comments), Smeesters, and others. Uri Simonsohn has exposed several frauds in this way. Other times, the data are shown to be copied from another source, as in the LaCour case. Data that take the form of images (blots) have their own set of 'tells', namely duplication and splicing of parts of the image. PubPeer hosts hundreds of accusations of this kind.
Suspicions about the data are the first stage of a fraudster's downfall, but there's a second stage: suspicions that the data collection could not have happened as described. For instance, LaCour claimed that his data came from a survey firm, but the firm denied any knowledge of his study. This is the pattern: suspicious data lead to an investigation of the source of the data.
I'm not aware of any example in which fraud was discovered by an outside party purely through investigation of the source of the data, without concerns first being raised about the data themselves. I can't see how that would happen. There are no routine audits in science. No one digs into studies at random.
This leads me to the following, rather disturbing conclusion: the perfect scientific fraud would simply consist of making up data that were convincing and unexceptional enough not to attract suspicion. You wouldn't need to forge a paper trail to explain where the data came from. So long as the data are sensible, no one will ever ask. It's said that many eyes will spot any bug. In science, there are many eyes on the published data, but not on the production of those data. Scientists only care 'how the sausages are made' when the sausages look funny.
There is only one problem with making up a study out of nowhere - your colleagues.
To publish a single-author study would, in most scientific fields, be very unusual, and might attract the kind of attention that a fraudster would not want. So the perfect fraud would need coauthors. But this raises problems of its own. An almost undetectable fraud would be to conduct a real experiment and involve other people in it, but to control data management yourself and substitute convincing fake or edited data for the real measurements. This is apparently what Marc Hauser did. Notably, Hauser was caught out because a whistle-blower within his own lab could compare the raw data with the published data. I don't think an outsider would have had a chance of catching him. His data were not suspicious.
This post is not meant as practical advice for fraudsters. Rather, I am trying to suggest fraud-prevention tips. First, coauthors are the people best placed to detect fraud. In fact, in many cases, they are the only people who have a realistic chance of catching a skilled fraudster. Therefore, we should remember that in putting our names to a paper, we are endorsing its probity.
Second, given that most fraud investigations start with the published data, I'd suggest that requiring authors to publish the raw data alongside the summary results would be very helpful, because it is generally harder to fake raw data than statistical summaries of data. At the very least, it gives fraudsters more chances to slip up. As Simonsohn said, "Just Post It."