
Not all causal relationships are created equal

Gene Expression | By Razib Khan | May 11, 2009 11:00 PM



You might have already seen this chart relating obesity to time spent eating in The New York Times:


The commentary accompanying the chart goes like so:

On Monday, in posting some of the data from the Organization for Economic Cooperation and Development's Society at a Glance report, I noted that the French spent the most time per day eating, but had one of the lowest obesity rates among developed nations. Coincidence? Maybe, maybe not.

Jim Manzi dug deeper into the data and found something very interesting:

I recreated the original analysis (minus the inclusion of the OECD average as a data point in the regression, for what I assume are obvious reasons). I get pretty much the same picture, and using a log regression form, get what looks to be the same trend line. The R-Squared on the regression (not noted in the original post, as far as I could see) is 26%. Without the U.S. and Mexico, it goes to about 6%, and becomes statistically insignificant. But what was really interesting is that there are five other time categorizations provided at the source website. Here's the same data plot, but using "Time Spent Doing Unpaid Work" instead of "Time Spent Eating and Drinking": ... Huh. This relationship, produced from the same data source, is about twice as strong (R-Squared = 52%) as the one that was reported. It took me literally five minutes of work to discover it. Why do you think that one was reported but not the other? This appears to be a textbook example of the human tendency to accept correlations as "not definitive, but part of the overall picture of evidence for causality" when such data serves to confirm pre-existing beliefs, and to ignore it otherwise.

R-squared here refers to the proportion of the variation in Y explained by the variation in X. The problem with dredging through data is that you selectively pick out relationships of "interest" and dismiss those you don't want to highlight as being of less interest, or as simplifying the "underlying complexities." More generally, it is always an interesting verbal experience dealing with someone who is the king of nuance and subtle shadings when making a negative case against a hypothesis, but who becomes a forceful advocate of black & white inferences when making a positive argument.
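To make the R-squared point concrete, here is a minimal Python sketch of how two extreme points can carry a trend line. The numbers are made up for illustration and are not the OECD figures, the function name r_squared is my own, and it fits a plain straight line rather than the log form Manzi mentions.

```python
def r_squared(xs, ys):
    """R-squared of a least-squares straight line fitted to (xs, ys).

    For a single predictor this equals the squared Pearson correlation:
    the share of the variation in y accounted for by the variation in x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: minutes per day on some activity vs. obesity rate (%).
# The first two points play the role of outlying countries.
minutes = [60, 70, 90, 100, 110, 120, 130]
obesity = [30, 34, 12, 10, 13, 11, 12]

r2_all = r_squared(minutes, obesity)

# Same regression with the two outlying points dropped.
r2_trimmed = r_squared(minutes[2:], obesity[2:])

print(f"R-squared, all points:       {r2_all:.2f}")
print(f"R-squared, outliers removed: {r2_trimmed:.2f}")
```

With these invented numbers most of the apparent relationship disappears once the two extreme points are removed, which is the same pattern Manzi describes when he drops the U.S. and Mexico. Running the same function over every available X column and reporting only the most agreeable result is the dredging problem in miniature.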
