The fourteenth-century theologian William of Occam was so fond of critiquing his colleagues’ complex arguments that his rule of thumb--that the simplest solution to a problem is probably the correct one--became known as Occam’s razor. Over time this preference for uncluttered elegance became one of the most trusted tools in the scientific workshop. But despite its intuitive appeal, Occam’s razor has been a difficult proposition to test rigorously. Geoffrey Webb, a computer scientist at Deakin University in Geelong, Australia, has now found that at least in some cases, Occam’s razor may not be the best guarantor of accuracy.
Webb worked with computer programs called machine learning systems, which typically operate according to Occam’s guidelines. For example, say you wanted a computer to learn how to pick out, from a large set of medical test results, just those that belong to pregnant women. The programmer would first supply the computer with the medical records of all patients, male and female, as well as with a subgroup of pregnant women preclassified by the programmer. The computer would compare the two groups (the general hospital population and the pregnant women) and try to determine which characteristics--say, visits to obstetricians, or records that referred to a fetal heart rate--set the pregnant women apart. After going through a few hundred samples, the computer would come up with what it regarded as the simplest criterion for selecting pregnant women from the entire sample. Computer scientists call this problem-solving strategy a decision tree, which essentially consists of a series of yes or no questions posed by the computer as it sifts through data.
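To make the idea of a series of yes or no questions concrete, here is a minimal sketch of a two-question decision tree in Python. The field names `visited_obstetrician` and `fetal_heart_rate_noted` are hypothetical, invented to mirror the example above; they do not come from Webb's programs.

```python
# Hypothetical record fields, chosen only to mirror the article's example.
def looks_pregnant(record):
    """Walk a two-question decision tree over one patient record."""
    if record["visited_obstetrician"]:            # question 1: yes or no?
        return record["fetal_heart_rate_noted"]   # question 2: yes or no?
    return False


# Made-up patient records for illustration.
patients = [
    {"visited_obstetrician": True, "fetal_heart_rate_noted": True},
    {"visited_obstetrician": True, "fetal_heart_rate_noted": False},
    {"visited_obstetrician": False, "fetal_heart_rate_noted": False},
]

# Keep only the records the tree classifies as belonging to pregnant women.
selected = [p for p in patients if looks_pregnant(p)]
```

A real machine learning system would choose the questions itself from the preclassified examples; in this sketch they are written by hand.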
Webb fed a computer different sets of real-life data, such as credit ratings and medical records, with some containing more than 3,000 examples. In each test, Webb used two approaches. One had the computer emulate Occam’s razor: After examining 80 percent of the data, the program would learn which attributes seemed to be important to whatever goal Webb had set, and which were irrelevant. The computer would then create a decision tree with the fewest branches and finally use that tree to try to classify the remaining 20 percent of the examples.
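The 80 percent and 20 percent procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Webb's software: the data are made up, and the "tree" is reduced to a single yes or no question so that the example stays short.

```python
import random


def split_80_20(examples, seed=0):
    """Shuffle a copy of the examples, then split it 80/20."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def learn_one_question(train):
    """Pick the single yes/no attribute that best matches the labels."""
    attributes = [key for key in train[0] if key != "label"]

    def hits(attr):
        return sum(ex[attr] == ex["label"] for ex in train)

    return max(attributes, key=hits)


def accuracy(attr, test):
    """Fraction of held-out examples the chosen question classifies correctly."""
    return sum(ex[attr] == ex["label"] for ex in test) / len(test)


# Made-up data: "relevant" always agrees with the label, "noise" is irrelevant.
rng = random.Random(1)
examples = []
for _ in range(100):
    label = rng.random() < 0.5
    examples.append({"relevant": label, "noise": rng.random() < 0.5, "label": label})

train, test = split_80_20(examples)    # 80 examples to learn from, 20 held out
chosen = learn_one_question(train)     # the simplest "tree": one question
score = accuracy(chosen, test)         # how well it does on unseen examples
```

Holding back the last 20 percent matters because a rule is judged on examples it never saw while learning, which is what makes the comparison between a simple and a complex rule fair.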
In the second approach, the computer again initially examined 80 percent of the examples. But instead of constructing the simplest decision tree possible, it was programmed to consider additional decision-making criteria if doing so would help in the classification. These criteria were somewhat abstract and mathematical, but were roughly analogous to standards computers often apply when they check someone’s credit history. A credit card company, for example, might have a simple rule that it doesn’t issue cards to anyone with an income under $15,000. This rule would completely eliminate credit cards for most college students unless the computer generated a new category that selected college students whose parents have a certain income.
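The credit card analogy can be written out as two rules, one simple and one with an added criterion. This is only a sketch of the analogy, not of Webb's actual criteria. The $15,000 floor comes from the example above, while the parental income threshold and the applicant fields are assumptions made for illustration.

```python
INCOME_FLOOR = 15_000          # from the article's example rule
PARENT_INCOME_FLOOR = 50_000   # hypothetical; the article gives no figure


def simple_rule(applicant):
    """Issue a card only if the applicant's own income meets the floor."""
    return applicant["income"] >= INCOME_FLOOR


def extended_rule(applicant):
    """Same rule, plus an extra category for students with well-off parents."""
    if simple_rule(applicant):
        return True
    return (applicant["is_student"]
            and applicant["parent_income"] >= PARENT_INCOME_FLOOR)


# A made-up applicant: low personal income, but a student with high-income parents.
student = {"income": 4_000, "is_student": True, "parent_income": 80_000}

rejected_by_simple = not simple_rule(student)
accepted_by_extended = extended_rule(student)
```

The extended rule is more complex than the simple one, yet it captures a group of applicants the simple rule would turn away.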
Comparing the results, Webb found that for 12 of 13 problems analyzed by the computer, the more complex decision-making process gave more accurate results. Should Occam’s razor be replaced with some other rule? “People are potentially missing out on useful patterns because they’re just looking for the simple ones,” says Webb. Occam’s razor influences and limits what science can do with information.