Help I'm Being Regressed To The Mean

By Neuroskeptic, August 25, 2010 2:30 AM

"Regression to the mean" was the bane of my undergraduate statistics class. We knew that it was out there, and that the final exam would have a question about it, but no one understood it or had ever seen it. A bit like unicorns or fairies.

The lecture notes were unhelpful. They told us what it did - make things wrongly appear to change over time when actually stuff stayed the same - but not what it was. Some people claimed to get it, but they couldn't explain it to others.

I now see that our mistake was in thinking that there's some thing called "regression to the mean". There isn't. It's just a rather unhelpful term for what happens in a certain kind of situation, and once you understand those situations, there's nothing more to learn.

Suppose there's a number, which varies over time, and at least some of this variation is random. It could be anything from the number of sunspots to rates of cancer. You get interested in this number whenever it gets very high (or very low). Whenever it does, you start tracking the number for a while. Maybe you even try to change it. You notice that the number always seems to be falling (or rising). Why?

Because you only get interested in the number when it's, by chance, unusually high. The chances are, the next time you look at it, it will be lower: not for any interesting reason, or because "what goes up must come down", but just because if you take an unusually high number and then generate a new number at random, it'll probably be lower. That's why the first number was "unusually high".
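If you'd rather see it than take my word for it, here's a minimal sketch in Python. Everything in it is an assumption made for illustration: the number is uniform random noise between 0 and 1, "unusually high" means above 0.9, and the function name `fraction_lower_next` is made up.

```python
import random


def fraction_lower_next(n_series=10_000, threshold=0.9, seed=0):
    """Look at many purely random numbers. Whenever one is unusually high,
    draw a fresh one and record whether the fresh one is lower."""
    rng = random.Random(seed)
    lower = 0
    total = 0
    for _ in range(n_series):
        first = rng.random()       # pure noise, uniform on [0, 1)
        if first < threshold:      # not unusually high, so we never get interested
            continue
        second = rng.random()      # the next reading, independent of the first
        total += 1
        if second < first:
            lower += 1
    return lower / total


print(f"Fraction of follow-up readings that were lower: {fraction_lower_next():.2f}")
```

Under these assumptions, the follow-up reading should be lower about 95% of the time, even though nothing about the number has changed. All that happened is that we only started watching after a high draw.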

Suppose that you take some people and give them an IQ test twice, a week apart. Call the first test "X" and the second test "Y". Suppose it's a crap test that gives entirely random results. Here's what might happen if you gave the test to 100 people, with each dot a person:

[Figure: scatter plot of second-test score Y against first-test score X, one dot per person]

There's no correlation, because X and Y are both random junk. Nothing to see, move along. But wait a second...

[Figure: scatter plot of the change in score, Y-X, against first-test score X]

Here's X, the first test score, plotted against Y-X, i.e. the change in score between the first test and the second. There's a strong negative correlation: people who did well on the first test tended to get worse, and people who did badly tended to improve. Wow? No. This is a purely statistical effect. It's meaningless: the "correlation" exists only because we're correlating X with itself. Y-X contains X with a minus sign in front, so a high X automatically pushes Y-X down.
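The two plots are easy to reproduce. This is a rough sketch, not the code behind the original figures: the scores are assumed to be normally distributed with mean 100 and standard deviation 15, and the helper names `pearson` and `junk_test_correlations` are invented for the example.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def junk_test_correlations(n_people=100, seed=1):
    """Two rounds of a test that returns pure noise.
    Returns (corr(X, Y), corr(X, Y - X))."""
    rng = random.Random(seed)
    x = [rng.gauss(100, 15) for _ in range(n_people)]  # first test
    y = [rng.gauss(100, 15) for _ in range(n_people)]  # second test, unrelated to the first
    change = [yi - xi for xi, yi in zip(x, y)]         # "improvement" from X to Y
    return pearson(x, y), pearson(x, change)


r_xy, r_x_change = junk_test_correlations()
print(f"corr(X, Y)     = {r_xy:+.2f}")
print(f"corr(X, Y - X) = {r_x_change:+.2f}")
```

With independent scores of equal variance, the first correlation hovers around zero, while the second sits near -1/sqrt(2), or about -0.71, because the covariance of X with Y-X is just minus the variance of X.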

It's a fundamental mistake, and it's obvious when you look at it like this, yet it's a surprisingly easy one to make without noticing. Imagine that you'd invented a pill that you think can make people smarter. You decide to test it on "stupid people", because they're the ones who need it most. So you give lots of people an IQ test (X), select the worst 10%, and give them the drug. Then you re-test them afterwards (Y). Whoa! They've improved! The drug works!

There's only one stupid person involved in this experiment.

This remains true even if the IQ tests aren't entirely random. A test that measures real intelligence will also have an element of luck. By selecting the bottom 10% of scores, you're selecting people who are both unintelligent and unlucky when they took the test. With better luck, some of them would have landed at, say, the 11th percentile and never been selected. So the same problem applies, albeit to a lesser degree.
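Here's the pill experiment as a sketch, with a pill that does nothing at all. The model is an assumption of mine, not anything from a real study: each person has a fixed "true ability" (mean 100, SD 15), and each test adds independent luck with standard deviation `luck_sd`. The function name `retest_bottom_tenth` is hypothetical.

```python
import random


def retest_bottom_tenth(n_people=10_000, luck_sd=10.0, seed=2):
    """Test everyone, pick the worst 10% on the first test, retest them.
    No treatment is applied. Returns (mean first score, mean second score)
    for the selected group."""
    rng = random.Random(seed)
    ability = [rng.gauss(100, 15) for _ in range(n_people)]  # real, stable intelligence
    x = [a + rng.gauss(0, luck_sd) for a in ability]         # first test = ability + luck
    y = [a + rng.gauss(0, luck_sd) for a in ability]         # retest = same ability, fresh luck
    cutoff = sorted(x)[n_people // 10]                       # boundary of the bottom 10% on X
    chosen = [i for i in range(n_people) if x[i] < cutoff]
    mean_x = sum(x[i] for i in chosen) / len(chosen)
    mean_y = sum(y[i] for i in chosen) / len(chosen)
    return mean_x, mean_y


before, after = retest_bottom_tenth()
print(f"Bottom 10% on first test: {before:.1f}")
print(f"Same people on retest:    {after:.1f}")
```

The selected group should score noticeably higher on the retest despite getting no treatment. Set `luck_sd=0.0` and the "improvement" vanishes, because a test with no luck in it has nothing to regress. The obvious safeguard, not shown here, is an untreated control group picked the same way.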

That's really all there is to "regression to the mean". The regression of high or low scores towards the mean score is inevitable, given our definition of "high" and "low" scores, to the extent that scores are random. This is why I said it's unhelpful to think of it as a thing. The trick is being able to spot it when it happens, and to avoid being misled by it. If you're not careful, it can happen anywhere.
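For those who like it in symbols, "to the extent that scores are random" can be written down. The notation here is mine, not the lecture notes': μ is the mean score, ρ is the correlation between first and second test, and I'm assuming both tests have the same mean and spread and that the relationship between them is linear (as it is for bivariate normal scores).

```latex
\begin{align}
  \mathbb{E}[\,Y \mid X = x\,]     &= \mu + \rho\,(x - \mu) \\
  \mathbb{E}[\,Y - X \mid X = x\,] &= -(1 - \rho)\,(x - \mu)
\end{align}
```

With a junk test, ρ = 0 and everyone's expected retest score is just the mean, whatever they scored first time. With a perfect test, ρ = 1 and nothing regresses. Real tests sit somewhere in between, so an extreme first score is expected to move part of the way back.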

Interestingly, the reason why it's thought of in this unhelpful way is probably because the "discoverer" of regression-to-the-mean, Francis Galton, misunderstood it. He observed this "effect" in some data he'd collected about human height, and he wrongly interpreted it as a real biological fact about genetics. Eventually, people noticed the statistical mistake, but the idea of "regression to the mean" stuck, to the dismay of undergraduates everywhere.

Link: This was inspired by a post on Dorothy Bishop's blog, Three ways to improve cognitive test scores without intervention.
