In a post on the issue of preregistration in science, statistician and political scientist Andrew Gelman writes (my emphasis) that:
I support proposals in psychology and political science to allow preregistration to be done in an open way. I just wouldn’t want preregistration to be required, indeed the concept of preregistration would seem to me to be just about impossible to apply in the analysis of public datasets such as we use in political science.
What Gelman is saying is that preregistration - getting scientists to publicly announce what experiments they will conduct ahead of time, to defeat publication bias - would not be possible in the case of reanalysis studies. Rather than collecting new data, such research consists of taking a new look at old data. There is widespread concern that, because these kinds of studies can't be preregistered, this kind of research would become denigrated or even unpublishable, were preregistration to become the norm.
Now, reanalysis is immensely valuable (even I do it), and I've yet to meet anyone who wants it abolished. Luckily, I do not think that the rise of preregistration would threaten such studies, even if they were unpreregisterable. But in this post I want to go further than that - or, maybe, off the deep end - and say: maybe they could be preregistered.
Suppose scientists agree that if anyone is going to carry out a reanalysis of a dataset, they ought to tell everyone else about it first. This registration doesn't need to be exhaustive: it just needs to be enough to give experts in the field a good idea of what's being attempted. So when Andrew Gelman (let's say) is going to start using a new approach, he goes on Twitter, or on his blog, and posts a bare-bones summary of what he's going to do. Then he does it. If he finds something interesting, he writes it up as a paper, citing that tweet or post as his preregistration. If the analysis doesn't reveal anything new, he just moves on to something else - but the community now have reason to believe that this line of inquiry didn't work out, because Gelman told everyone that he was going to do it, and then he said no more about it. Now that they know it, no-one else need waste their time trying it. And the fact that it turned up nothing might be revealing in itself. Doing this would be voluntary, but the community would be suspicious of any set of results that just popped up out of nowhere, because they would have no way of knowing how many other approaches the author had tried before getting the result they wanted.
Now, that is preregistration. That's all it would need to involve: a tweet or a blog post, along the lines of "Today I'm going to try factor analysis on all of the X variables from Y dataset". So long as your peers know what you mean, that would be enough. Preregistrations for analyses could be minimalistic compared to those for full experiments. No-one has ever said that preregistration should be bureaucratic.
Would it work in practice? Well, it would be very easy to cheat. You could do an analysis, look at the results, then if you like the look of them, 'preregister' it, wait a decent interval, and publish it. If it's not what you want, you could pretend it never happened. In the worst case scenario, everyone would cheat in this way, and only results that are liked would see the light of day. But notice this is what currently happens anyway. We currently live in that worst case scenario.
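To put a rough number on why unannounced analyses invite suspicion, here is a minimal sketch in Python. It rests on assumptions of my own that are not part of the argument above: every analysis is an independent test of a true null hypothesis, and "interesting" means significant at the conventional 0.05 level. The function name is purely illustrative.

```python
def chance_of_spurious_hit(n_analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_analyses comes out 'significant'
    when there is really nothing to find.

    Assumption (illustrative only): the analyses are independent and each
    has a false-positive rate of alpha.
    """
    if n_analyses < 0:
        raise ValueError("n_analyses must be non-negative")
    # P(at least one hit) = 1 - P(no hits in any of the analyses)
    return 1.0 - (1.0 - alpha) ** n_analyses


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(f"{n:>2} analyses tried: {chance_of_spurious_hit(n):.2f}")
```

Under those assumptions, someone who quietly tries twenty approaches has roughly a 64% chance of turning up at least one publishable-looking result by luck alone. Real reanalyses of a single dataset are not independent, so the true figure will differ, but the direction of the problem is the same.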