Who's the greatest sports person of all time?
Clearly, whoever it is has to also be the best in the history of their particular sport. But there are lots of sports, and equally many best players. Who's better - the greatest footballer, or the greatest golfer, or...?
You could just pick your personal favourite. If you like soccer, then Pelé is probably the greatest sportsman. Those who prefer baseball would most likely plump for Ty Cobb. But is there a way of objectively deciding who's the greatest of the great?
Yes, and the answer is Australian cricketer Sir Don Bradman (1908-2001). Why? It's all down to σ, the standard deviation - a fantastically useful, yet often misunderstood, mathematical technique. Time for a quick stats lesson.
For any collection of numbers, it's easy to calculate the mean - the sum of the numbers divided by how many there are. For example, the mean of 10, 20 and 30 is 20: (10 + 20 + 30) / 3 = 60 / 3 = 20. This is what's commonly called the "average", although statisticians don't like that word.
The mean of 19, 20 and 21 is 20 as well. But there's obviously an important difference between these two sets of values. The first is more variable, or "spread out", than the second. How can we measure the "spread" of a set of numbers?
A convenient way would be to calculate the mean difference from the mean of the numbers. For 10, 20 and 30, the differences from the mean (ignoring the sign) are 10, 0 and 10, and the mean of the differences is about 6.67: (10 + 0 + 10) / 3 = 20 / 3. For 19, 20 and 21, the differences are 1, 0 and 1, and the mean difference is about 0.67. The numbers are evil, but the principle is straightforward.
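The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function names `mean` and `mean_abs_difference` and the variable names `wide` and `narrow` are made up for illustration.

```python
def mean(values):
    """Sum of the numbers divided by how many there are."""
    return sum(values) / len(values)


def mean_abs_difference(values):
    """Mean of the differences from the mean, ignoring the sign."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


wide = [10, 20, 30]    # the more "spread out" set
narrow = [19, 20, 21]  # the tightly clustered set

print(mean(wide), mean(narrow))                # 20.0 20.0
print(round(mean_abs_difference(wide), 2))     # 6.67
print(round(mean_abs_difference(narrow), 2))   # 0.67
```

Both sets share the same mean, but the mean difference separates them cleanly.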
The standard deviation (aka the "s.d." or σ, "sigma") is similar to the mean difference, but it's calculated using a slightly more complicated method. First, work out the differences from the mean, then square them all, and calculate the mean of the squared values. This is called the variance. The square root of the variance is the standard deviation, σ.
For 10, 20 and 30, the deviations are 10, 0 and 10. Squared, that's 100, 0 and 100, and the mean is about 66.7, which is the variance. The square root of the variance is about 8.16, so that's the standard deviation, σ. This is higher than the mean difference, but in most cases it's fairly close to it. σ turns out to be more useful in many ways, so it's generally what we use.
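Here is the same recipe as a short Python sketch, assuming (as the text does) that we divide by the number of values. The names `variance` and `std_dev` are illustrative. The standard library's `statistics.pstdev` uses the same divide-by-N definition, so it serves as a cross-check.

```python
import math
import statistics


def variance(values):
    """Mean of the squared differences from the mean (dividing by N)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def std_dev(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


wide = [10, 20, 30]

print(round(variance(wide), 1))  # 66.7
print(round(std_dev(wide), 2))   # 8.16

# Cross-check against the standard library's population standard deviation.
print(math.isclose(std_dev(wide), statistics.pstdev(wide)))  # True
```

Note that `statistics.stdev` (without the "p") divides by N - 1 instead, so it gives a slightly larger number for the same data.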
σ allows us to compare very different kinds of numbers in terms of how "unusually high" or "unusually low" they are. Imagine that the height of men has a mean of 180 cm, with a σ of 10 cm. In that case, a man who stands 200 cm tall would be 2 σ above the mean.
Now imagine that this man has an IQ of 145. IQ has a mean of 100 and a σ of 15, so this man's IQ is 3 σ above the mean. He is both tall and smart, but in an important way, he's smarter than he is tall, even though it obviously makes no sense to compare a height to an IQ score directly.
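Counting "how many σ above the mean" is just a subtraction and a division. A minimal sketch, using the height and IQ figures from the example above (the function name `z_score` is an illustrative choice):

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


height_z = z_score(200, mean=180, sd=10)  # height in cm
iq_z = z_score(145, mean=100, sd=15)      # IQ points

print(height_z)         # 2.0
print(iq_z)             # 3.0
print(iq_z > height_z)  # True: more unusual in IQ than in height
```

Because the units cancel out in the division, the two results can be compared directly even though centimetres and IQ points cannot.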
This brings us back to sports and Don Bradman. If you calculate the mean and the σ for some measure of sporting achievement, you can work out how many σ above (or below) the mean any given player is. In soccer, you might pick goals scored per match. In baseball, you might go with the batting average.
It turns out that if you do this, Don Bradman is the greatest sportsman of all time: his batting average was 4.4 σ above the mean for professional cricketers. Pelé comes second, as his goals-per-game was 3.7 σ above the mean, while Ty Cobb's batting average was 3.6 σ above the mean. Bradman was the best cricketer ever, and Pelé was the best footballer ever, but Bradman was the best by a much larger margin than Pelé was.
Of course, we probably shouldn't take this too seriously. There are lots of assumptions here - for example, that goals-per-match is the ultimate measure of a footballer's ability, which rules out defenders entirely. And the statistician responsible for this work, Charles Davis, only looked at cricket, soccer, baseball, golf and basketball. And he was Australian, like Bradman, which may not be a coincidence.
But still, it's an interesting result, and a good illustration of the power of σ. In science, σ has manifold uses. Whenever you see a picture of "brain activation" measured with fMRI, for example, those colourful patches actually represent areas where neural activation is unusually highly correlated with something.
For example, if you show people a picture, and activation in a certain area increases whenever you do, then activation in that area is unusually highly correlated with the picture (relative to the rest of the brain, where activation is random). The "hotter" colours correspond to higher values measured in σ units, specifically z scores. When you see "blobs on the brain", 9 times out of 10, you're literally looking at statistics.