Over at Edge, they've posted a provocative article by Chris Anderson, editor of Wired magazine: "The End of Theory -- Will the Data Deluge Makes the Scientific Method Obsolete?" We are certainly entering an age where experiments create giant datasets, often so unwieldy that we literally can't keep all of the data. As David Harris notes, the LHC will be storing about 15 petabytes of data per year, which sounds like a lot until you realize that it will be creating data at a rate of 10 petabytes per second. Clearly, new strategies are called for; in particle physics, the focus is on the "trigger" that makes quick decisions about which events to keep and which to toss away, while in astronomy or biology the focus is more on sifting through the data to find unanticipated connections.
Unfortunately, Anderson takes things a bit too far, arguing that the old-fashioned scientific practice of inventing simple hypotheses and then testing them has become obsolete, and will be superseded by ever-more-sophisticated versions of data mining. I think he misses a very big point. (Gordon Watts says the same thing ... as do many other people, now that I bother to look.)
Early in the 17th century, Johannes Kepler proposed his Three Laws of Planetary Motion: planets move in ellipses, they sweep out equal areas in equal times, and their periods are proportional to the three-halves power of the semi-major axis of the ellipse. This was a major advance in the astronomical state of the art, uncovering a set of simple relations in the voluminous data on planetary motions that had been collected by his mentor Tycho Brahe.
Later in that same century, Sir Isaac Newton proposed his theory of mechanics, including both his Laws of Motion and the Law of Universal Gravitation (the force due to gravity falls as the inverse square of the distance). Within Newton’s system, one could derive Kepler’s laws – rather than simply positing them – and much more besides.
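To make the contrast between finding a pattern and deriving it a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reconstruction of what Kepler or Newton actually did: the planetary figures are modern rounded values rather than Tycho's observations, the orbits are treated as circles, and the function names `fit_power_law` and `newton_period_years` are made up for this example. The first function does the data-mining step, fitting an exponent to the numbers with no hypothesis about where it comes from. The second gets the period from inverse-square gravity alone by balancing gravitational and centripetal acceleration, which gives T^2 = 4 pi^2 a^3 / (G M).

```python
import math

# Approximate semi-major axes (AU) and orbital periods (years) for the six
# planets known in Kepler's time. Modern rounded values, used here as an
# assumption for illustration; these are not Tycho Brahe's measurements.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Least-squares fit of log(T) = slope * log(a) + intercept.

    This is the pattern-finding step: it recovers an exponent from the
    data without saying anything about why that exponent appears.
    """
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


GM_SUN = 1.32712440018e20  # gravitational parameter of the Sun, m^3 / s^2
AU = 1.495978707e11        # astronomical unit, m
YEAR = 365.25 * 86400.0    # Julian year, s


def newton_period_years(a_au):
    """Period of a circular orbit of radius a_au (in AU), in years.

    Setting G M / a^2 equal to (2 pi / T)^2 * a gives
    T = 2 pi * sqrt(a^3 / (G M)), so the 3/2 power is derived, not fitted.
    """
    a = a_au * AU
    return 2.0 * math.pi * math.sqrt(a ** 3 / GM_SUN) / YEAR


if __name__ == "__main__":
    slope, _ = fit_power_law(list(PLANETS.values()))
    print(f"fitted exponent: {slope:.4f}")
    for name, (a, t) in PLANETS.items():
        print(f"{name:8s} observed {t:7.3f} yr, Newtonian {newton_period_years(a):7.3f} yr")
```

The fit lands close to the three-halves power, but only the second function says where that number comes from, and the same inverse-square law applies to moons, comets, and anything else in orbit.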
Newton’s synthesis was generally considered to be a significant step forward. Not only did we have rules of much wider-ranging applicability than Kepler’s original relations, but we could sensibly claim to understand what was going on. Understanding is a good thing, and in some sense is the primary goal of science.
Chris Anderson seems to want to undo that. He starts with a truly important and exciting development – giant new petascale datasets that resist ordinary modes of analysis, but which we can use to uncover heretofore unexpected patterns lurking within torrents of information – and draws a dramatically unsupported conclusion: that the age of theory is over. He imagines a world in which scientists sift through giant piles of numbers, looking for cool things, and don’t bother trying to understand what it all means in terms of simple underlying principles.
As Anderson puts it: "There is now a better way. Petabytes allow us to say: 'Correlation is enough.' We can stop looking for models. We can analyze the data without hypotheses about what it might show."
Well, we can do that. But, as Richard Nixon liked to say, it would be wrong. Sometimes it will be hard, or impossible, to discover simple models explaining huge collections of messy data taken from noisy, nonlinear phenomena. But that doesn’t mean we shouldn’t try. Hypotheses aren’t simply useful tools in some potentially outmoded vision of science; they are the whole point. Theory is understanding, and understanding our world is what science is all about.