Even Correlations Based On Billions Of Data Points Do Not Prove Causation

Readers may have already heard about a recent study by Tim Althoff and colleagues from Stanford University, published in Nature, that analyses smartphone data covering 68 million days of physical activity for 717,527 people in 111 countries (only 46 of which were included in the study).

As one might expect, activity levels vary widely not only across countries but also substantially within them (a within-country spread that, in general terms, the authors refer to as “activity inequality”).

It turns out that activity inequality, rather than the average level of activity, is the better predictor of obesity rates (based on BMI).


“By quantifying the relationship between activity and obesity at the individual level, we were able to determine why a country’s activity inequality is a better predictor of obesity than average activity level. We find that the prevalence of obesity increases more rapidly for females than males as activity decreases. And while lower activity is associated with a substantial increase in obesity prevalence for low-activity individuals, there is little change in obesity prevalence among high-activity individuals. So given two countries with identical average activity levels, the country with higher activity inequality will have a greater fraction of low-activity individuals, many of them female, leading to higher obesity than predicted from average activity levels alone. These findings are analogous to the phenomenon revealed in past studies of the effects of income inequality on health, whereby a relatively small change in income (in our case, activity) for an individual at the bottom of the distribution can lead to substantial improvements in health. On the basis of our model relating activity inequality to obesity prevalence, we also performed a simulation experiment which, assuming perfect information (Methods), suggests that interventions focused on reducing activity inequality could result in a reduction in obesity prevalence up to four times greater than in population-wide approaches.”
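The arithmetic behind the authors' argument can be sketched in a few lines. This is only an illustration: the curve `obesity_prevalence` and the step counts for the two countries are made-up assumptions, not values from the paper. It shows that if obesity prevalence falls steeply at low activity and flattens at high activity, two populations with the same average activity can have different average prevalence. It says nothing about whether activity causes the difference.

```python
import math

def obesity_prevalence(steps):
    """Hypothetical convex curve (an assumption, not the paper's model):
    prevalence drops steeply at low step counts and flattens at high ones."""
    return 0.05 + 0.45 * math.exp(-steps / 3000.0)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Two invented countries with identical average daily steps.
equal_country = [5000, 5000, 5000, 5000]      # low activity inequality
unequal_country = [2000, 2000, 8000, 8000]    # high activity inequality

mean_steps_equal = mean(equal_country)
mean_steps_unequal = mean(unequal_country)

# Average the individual-level prevalence over each population.
prev_equal = mean(obesity_prevalence(s) for s in equal_country)
prev_unequal = mean(obesity_prevalence(s) for s in unequal_country)

print("mean steps:", mean_steps_equal, mean_steps_unequal)
print("prevalence (equal country):  ", round(prev_equal, 3))
print("prevalence (unequal country):", round(prev_unequal, 3))
```

Because the assumed curve is convex, the unequal country comes out with the higher prevalence even though both averages match. That follows from the shape of the curve alone.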

The authors go on to discuss various limitations of their study but fail to mention the biggest limitation of all: correlations, no matter how strong or how large the data set, simply cannot prove causality.

Thus, while the data do prove the point that you can do all sorts of interesting analyses when you have large data sets, they simply do not prove that activity levels (or activity inequality, for that matter) actually have much to do with obesity at all.

Indeed, one could think of a number of confounders that would otherwise differentiate countries with high activity inequality and high obesity rates from countries with low activity inequality and low obesity rates (let’s not even mention reverse causality).
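To see how a confounder alone can produce such a pattern, here is a minimal simulation. Everything in it is invented for illustration: the hidden `confounder`, the 5,000 simulated countries (far more than the 46 in the study, chosen only to keep the noise low), and the standardised scores. By construction, activity inequality has no effect whatsoever on obesity, yet the two are strongly correlated.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

N_COUNTRIES = 5000  # hypothetical count, not the study's

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def residuals(ys, zs):
    """What is left of ys after removing a least-squares linear fit on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]

# A hidden country-level factor drives BOTH variables (standardised units).
confounder = [random.gauss(0, 1) for _ in range(N_COUNTRIES)]
activity_inequality = [z + random.gauss(0, 0.5) for z in confounder]
obesity_rate = [z + random.gauss(0, 0.5) for z in confounder]  # no term for inequality

raw_r = pearson(activity_inequality, obesity_rate)
adjusted_r = pearson(residuals(activity_inequality, confounder),
                     residuals(obesity_rate, confounder))

print("correlation, ignoring the confounder:     ", round(raw_r, 2))
print("correlation, after adjusting for confounder:", round(adjusted_r, 2))
```

The raw correlation is strong, while the correlation that remains after adjusting for the hidden factor is close to zero. In real data, of course, the confounders are rarely measured, which is exactly the problem.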

Thus, as nice as the figures presented in the paper may be, it is really hard to follow the authors’ conclusion that,

“Our findings can help us to understand the prevalence, spread, and effects of inactivity and obesity within and across countries and subpopulations and to design communities, policies, and interventions that promote greater physical activity.”

This is not to say that designing communities, policies, and interventions that promote physical activity would not bring substantial health benefits, given all of the known benefits of physical activity.

Unfortunately, whether or not these policies would do anything to prevent or reverse obesity is another matter altogether, and it remains as unclear after this study as before.

Edmonton, AB