Lately I've been talking a lot about the question of whether scientists should preregister their research protocols.
One question that often arises in these discussions is: "what about exploratory research?" The argument goes like this: sure, preregistration is good for confirmatory research - research designed to test a particular hypothesis. However, some research (perhaps most) is exploratory, meaning that it's about collecting data and seeing where it leads. Exploratory studies have no prior hypothesis or set protocol. Preregistration would hamper or stigmatize such open-ended, hypothesis-generating research, which would be a bad thing.

Now, I don't think that preregistration would hurt exploratory research, but in this post I want to ask: what exactly makes research 'exploratory'? In particular, I'm going to explore the question: can research be called exploratory if it uses p-values?

P-values are everywhere. In neuroscience, psychology and many other fields, the great majority of published empirical research uses them. Now, a p-value is "the probability, under the assumption of the null hypothesis, of obtaining a result equal to or more extreme than what was actually observed." So every p-value implies the existence of a hypothesis - the null hypothesis. How, then, can any study that reports p-values be considered purely hypothesis-generating? Surely every p-value represents a hypothesis being tested?

One answer would be as follows: maybe a study only counts as confirmatory if it involves a positive hypothesis, as opposed to a null hypothesis. The null 'hypothesis' in an exploratory study might be "that there is nothing new and interesting going on here." We might decide that this doesn't count as a hypothesis for the purpose of deciding whether a study is confirmatory. My impression is that this is the assumption behind many discussions of exploratory science. My concern is that this approach makes 'exploration' purely a matter of the researcher's intentions.
The very same analyses on the very same data could be either exploratory or confirmatory, depending on what is going on in the researcher's mind when they run them. This is unsatisfactory to me.

So what if we bite the bullet and declare that anything involving a p-value is a confirmatory study? Taken to its logical conclusion, this could mean that all confirmatory (p-value) analyses should ideally be preregistered, while non-preregistered analyses could use descriptive statistics, but not inferential ones.

A "no preregistration, no p-values" rule would also ensure that p-values can be taken at face value. A p-value is the chance of finding a result at least as extreme as the observed one, under the null hypothesis. But what if you run lots of different statistical tests to address the same hypothesis? Then your chance of finding an extreme result in at least one test is higher than the individual p-values indicate. (Multiple comparisons correction solves this problem, but only if it's applied over all of the tests that were ever tried, not just the ones that are published, and preregistration is the only way to ensure this.)

However, it would be impractical to "ban" exploratory research from using p-values altogether. Even if we could, doing so would undermine the principle that exploratory research is open and free-form. But maybe we could re-brand exploratory p-values so that they can't be mistaken for (preregistered) confirmatory ones. We could call them p(e) or p* or p' values. We could even call them o-values, because they are "not quite p's".
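To put rough numbers on the multiple-comparisons point, here's a minimal Python sketch. It assumes the tests are independent and that every null hypothesis is true; real analyses of the same data are usually correlated, which shrinks the inflation but doesn't remove it. It relies on the standard fact that, under the null, the p-value from a continuous test is uniformly distributed between 0 and 1, so each simulated "study" just draws k uniform p-values. The correction shown is Bonferroni, chosen only because it is the simplest; the function names are my own, for illustration.

```python
import random


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests gives p < alpha
    when every null hypothesis is true: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or
    below alpha, provided k counts every test actually tried."""
    return alpha / k


def simulate_fwer(k: int, alpha: float = 0.05,
                  n_studies: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the family-wise error rate.

    Assumption: all nulls are true, so each p-value is Uniform(0, 1).
    A 'study' counts as a false positive if its smallest p < alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if min(rng.random() for _ in range(k)) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"k={k:2d}  analytic={familywise_error_rate(k):.3f}  "
              f"simulated={simulate_fwer(k):.3f}  "
              f"bonferroni_alpha={bonferroni_threshold(k):.4f}")
```

With 20 tests and nothing going on, the analytic chance that at least one comes out below 0.05 is about 0.64, even though each individual p-value looks like a 1-in-20 event. Bonferroni only fixes this if k is the number of tests you actually ran; correcting for the 3 you published when you tried 20 uses the wrong k, which is exactly the gap the post says preregistration closes.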