An Econometrics Lesson

I received an email from a reader who was very excited to find that over the past 70 years the correlation between excess health care inflation (the price of health care relative to the overall CPI) and the proportion of health care spending paid for by third parties was 0.92 (out of a maximum of 1.00).

I wrote back saying that correlation does not imply causation. He replied that he understood that, but still, with a correlation that high there must be something.

I’m sorry, but the inability to infer causation from correlation has nothing to do with the size of the correlation coefficient. It reflects the process generating the data. In a controlled experiment, you often can say something about causation. When you just observe some data, you cannot.

In addition, time series data (data that cover long time periods) are very subject to spurious correlation. Over time, data tend to follow trends. Any two trends are automatically correlated, whether there is a causal relationship or not.
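To make the point about trends concrete, here is a minimal sketch in Python. The two series are invented for illustration (a straight-line trend plus independent noise each, with slopes and noise levels picked arbitrarily), so neither one has any causal connection to the other. The function returns the correlation of the levels and the correlation of the year-to-year changes, which strips out the shared dependence on time.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def trend_demo(n=70, seed=0):
    """Correlate two causally unrelated trending series, in levels and in changes."""
    rng = random.Random(seed)
    # Hypothetical series: each is a linear trend plus its own independent noise.
    a = [1.0 * t + rng.gauss(0, 2) for t in range(n)]
    b = [0.5 * t + rng.gauss(0, 2) for t in range(n)]
    # Period-to-period changes remove the shared dependence on time.
    da = [a[t] - a[t - 1] for t in range(1, n)]
    db = [b[t] - b[t - 1] for t in range(1, n)]
    return pearson(a, b), pearson(da, db)


if __name__ == "__main__":
    levels, changes = trend_demo()
    print(f"correlation in levels:  {levels:.2f}")
    print(f"correlation in changes: {changes:.2f}")
```

Under these assumptions the levels should come out very highly correlated while the changes should not, even though nothing links the two series except the calendar.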

When you look at data over time, it is important to ask yourself how many data points you really have. With a strong trend, you probably should just think of yourself as having two data points–the beginning and the end point. If there are a few sharp swings in the data, then you might have three or four effective data points. The fewer the number of effective data points, the harder it is to distinguish among alternative sources of causality.
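One rough way to put a number on "effective data points" is sketched below. It is not the method used in this post; it swaps in a standard first-order approximation, n(1 - r1·r2)/(1 + r1·r2), where r1 and r2 are the lag-1 autocorrelations of the two series. That formula assumes both series behave roughly like AR(1) processes, which is an assumption of the sketch, and the function names are made up for illustration.

```python
import random


def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation: how closely each value tracks the previous one."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    return num / denom


def effective_n(xs, ys):
    """Approximate number of independent observations behind corr(xs, ys).

    Uses n * (1 - r1*r2) / (1 + r1*r2), an approximation that assumes
    both series are roughly AR(1). That assumption is ours, not the post's.
    """
    r = lag1_autocorr(xs) * lag1_autocorr(ys)
    return len(xs) * (1 - r) / (1 + r)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two pure trends over 70 periods versus two series of independent noise.
    trend_a = [1.0 * t for t in range(70)]
    trend_b = [0.5 * t for t in range(70)]
    noise_a = [rng.gauss(0, 1) for _ in range(70)]
    noise_b = [rng.gauss(0, 1) for _ in range(70)]
    print("effective n, trends:", round(effective_n(trend_a, trend_b), 1))
    print("effective n, noise: ", round(effective_n(noise_a, noise_b), 1))
```

For two strongly trending series of 70 observations this approximation gives a low single-digit count, in line with the "two, three or four data points" rule of thumb, while independent noise keeps close to the full 70.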

That is why most macro-econometrics is junk science. That is one reason I would tend to suspect that Larry Bartels’ work on Presidential party and income inequality is junk science.

It is possible to do useful empirical work. Amy Finkelstein used a natural-experiment approach to look at the effect of Medicare on health care spending. I suspect that there are actually a lot of pitfalls in her approach, but what she did is far more reliable than plotting two time series, calculating a correlation coefficient, and proclaiming that you have proven that third-party payments are a major cause of health care inflation. I certainly believe that this could be true, but my opinion is not swayed at all by a crude time-series regression.

READER COMMENTS

Jim Manzi

Jun 17 2008 at 3:27pm

Great post.

I did a very long post on why the Larry Bartels correlation is not persuasive evidence of causality:

http://theamericanscene.com/2008/04/10/storks-bring-babies

The title of the post is “Storks Bring Babies”, and I’m sure you get the joke.

Stephen Gordon

Jun 17 2008 at 4:22pm

If your preferred world view predicts that those two variables should be uncorrelated, then you cannot simply dismiss an observed correlation coefficient that easily. Yes, there are many (in principle, infinitely many) causal relations that could generate that correlation. But if none of them is your preferred model, then you have to revise your beliefs.

dearieme

Jun 17 2008 at 5:06pm

“That is why most macro-econometrics is junk science.” Ditto Global Warming. Not that I am implying that most macro-economics has declined from ineptness to dishonesty in quite the way that Global Warmmongering has.

Blackadder

Jun 17 2008 at 5:07pm

If correlation does not imply causation, what does?

KDeRosa

Jun 17 2008 at 6:38pm

A properly conducted controlled experiment (with the usual qualifications)

Oxonian

Jun 17 2008 at 8:22pm

Take two groups of people, A and B. When learning about a correlation between two events, people in group A would believe more strongly in the existence of a causal relation between those two events, in proportion to the strength of the correlation. People in group B, by contrast, would ignore such correlations–dismissing such an approach as “junk science”–and would only update their beliefs after learning the results of some properly conducted controlled experiment. Give members of both groups equal amounts of money, and ask them to bet on various propositions on the basis of their beliefs. I’m willing to bet that if such a controlled experiment were carried out, people in group A would end up richer than people in group B.

kderosa

Jun 17 2008 at 8:28pm

Is that proposition based on a correlation?

Blakeney

Jun 17 2008 at 9:43pm

Correlation does imply causation (or make the inference of causation more plausible, if you prefer), but the mere presence of a correlation doesn’t tell you what is causing what. Did A cause B, or did B cause A, or (as is almost always the case) did some unsuspected, many-times-removed factor X start a chain of causation that affected both A and B?

Identifying the causal chain requires further evidence, of course (like that produced by a controlled experiment). The question is, who’s going to go to the trouble of setting up such an experiment if they haven’t seen some correlation to make them suspect that a causal chain exists?

“Correlation doesn’t imply causation” is a nice, pithy way of dismissing junk science that’s based on little more than a plausible-sounding story backed up by hand-picked correlation statistics. My only objection to using the statement for this purpose is that the statement is incorrect, or at least it’s incomplete in a misleading way. I’d be happier with “correlation is not the same as causation, but sometimes they’re related, and here’s why…”

Okay, so it sacrifices some pithiness… sue me. The original version just sets up a straw man for folks like Oxonian to knock down.

thebastidge

Jun 18 2008 at 5:40am

Correlation absolutely DOES imply causation; but it doesn’t prove it. Implications can be wrong. A good point was made above about the direction of causality as well.

However, while Oxonian’s comment may be true, I highly doubt it would be that clear. Some people’s intuitions are quite accurate, even if not formally and rigidly logical. There are reasons why our brains take the shortcuts they do, and a good plan executed now is better than a perfect plan which never transforms into action.

Jun 18 2008 at 9:08am

Question: What do the results of a controlled experiment show? Is it not a correlation (or lack thereof)?

Compare: People who use product x have a higher rate of getting disease y (proves nothing, because correlation does not prove causation); so we do a controlled experiment and find that people who use product x have a higher rate of getting disease y (this does prove causation, because, you know, we did this controlled experiment, and it found a correlation).

Also, if the only way to show causation is through a controlled experiment, how did people manage to know anything at all about causation prior to the advent of modern science? Presumably people knew of the connection between, say, sex and babies or getting mauled by a tiger and dying long before there were controlled experiments on the subject. Were they all just making foolish inferences? (Actually, have there been controlled experiments on the subject? Maybe we’re the only ones making the foolish inferences.)

Floccina

Jun 18 2008 at 10:45am

A strong correlation may not be much evidence of causation but a strong lack of correlation can be strong evidence of a lack of causation.

David Jinkins

Jun 18 2008 at 10:51am

Correlation does not imply causation. If you create pairs of completely unrelated random time series then compare them, many will have high correlation coefficients. This is the famous result of Nelson and Plosser (I can’t find a copy online):

“Trends and Random Walks in Macroeconomic Time Series” (with Charles R. Nelson), Journal of Monetary Economics, 1982.
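The claim about unrelated random series is easy to check by simulation. The sketch below is not taken from the cited paper. It simply generates pairs of independent random walks, and for comparison pairs of independent white-noise series, and counts how often the absolute correlation exceeds 0.5. The series length, number of trials and threshold are arbitrary choices made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(n, rng):
    """Cumulative sum of independent standard-normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def white_noise(n, rng):
    """Independent standard-normal draws with no memory at all."""
    return [rng.gauss(0, 1) for _ in range(n)]


def share_above(make_series, n=100, trials=500, threshold=0.5, seed=0):
    """Fraction of independently generated pairs with |correlation| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(make_series(n, rng), make_series(n, rng))
        if abs(r) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("random walks:", share_above(random_walk))
    print("white noise: ", share_above(white_noise))
```

With these settings a sizable share of the random-walk pairs should clear the 0.5 threshold, while essentially none of the white-noise pairs should, even though every pair is unrelated by construction.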

More intuitively, if I go data mining and find that historically every time the Rolling Stones have played in Chicago a panda has died, it doesn’t mean that environmentalists should boycott Stones concerts.

mobile

Jun 18 2008 at 1:53pm

An oldie-but-goodie concrete example: Ice cream production and forcible rape rate monthly time series have a correlation coefficient of 0.84.

Jun 18 2008 at 3:52pm

If panda deaths are correlated with Rolling Stones concerts, or ice cream sales are correlated with rapes, people aren’t generally going to be inclined to say that the concerts caused the deaths or the sales caused the rapes (note though, a person’s skepticism in this regard has very little to do with the fact that the correlation didn’t emerge out of a controlled experiment. If you did a controlled experiment that found the same results, most people would still be hesitant to attribute causation).

The reason for this hesitancy, I think, is twofold. First, we have certain ideas about how the world works, and according to those ideas, playing Ruby Tuesday doesn’t kill pandas and selling ice cream doesn’t cause rapes. Second, the fewer data points we have, the more likely we are to chalk a correlation up to mere coincidence. Presumably, if the connection between panda deaths and Stones concerts happened often enough, we would start looking for a causal explanation even if it weren’t readily apparent (perhaps one of the zoo keepers is demented, and whenever the Stones come to town he hears voices telling him to kill a panda). In the case of ice cream and rape, for example, it turns out there is a connection: warm weather.
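The warm-weather story can be sketched with made-up numbers. In the Python below, a hypothetical monthly temperature series drives two outcomes (stand-ins for ice cream production and a crime rate) that never influence each other. All coefficients are invented. The raw correlation between the two outcomes is compared with their partial correlation once temperature is held fixed, using the standard first-order partial correlation formula.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def confounder_demo(months=240, seed=0):
    """Return (raw correlation, partial correlation given temperature)."""
    rng = random.Random(seed)
    # Hypothetical temperature: a 12-month seasonal cycle plus noise.
    temp = [15 + 10 * math.sin(2 * math.pi * m / 12) + rng.gauss(0, 2)
            for m in range(months)]
    # Both outcomes depend on temperature only, never on each other.
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temp]
    crime_rate = [0.5 * t + rng.gauss(0, 1.5) for t in temp]
    raw = pearson(ice_cream, crime_rate)
    controlled = partial_corr(ice_cream, crime_rate, temp)
    return raw, controlled


if __name__ == "__main__":
    raw, controlled = confounder_demo()
    print(f"raw correlation:                 {raw:.2f}")
    print(f"partial correlation (given temp): {controlled:.2f}")
```

In this toy setup the raw correlation should be high and the partial correlation close to zero, which is the pattern you would expect when a third factor drives both series.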

Byron Schlomach

Jun 18 2008 at 10:27pm

Well, here’s another little econometrics lesson. Let’s talk macroeconomics. Total Real GDP is correlated with population. Because we tend to see population rise over time, we also tend to see Real GDP rise over time.

Now suppose I run an OLS regression on a time series of population and Real GDP for a given nation and it shows a strong correlation between these two variables. I suppose the “Kling judgment” would be that causation and correlation are two different things. These are time series and, therefore, suspect. I suppose this sounds sophisticated, but it would also be….WRONG.

Bill Drissel

Jun 23 2008 at 10:28pm

Dr Kling,
The reason correlation isn’t causation is that every monotonic time sequence is correlated (positively or negatively) with every other monotonic time sequence that happened at the same time. Furthermore, if you allow:

*time shifting (CO2 rises xxx years after an increase in temperature) and

*time scaling (gasoline prices rise quickly in response to crude oil price increases but decline much more slowly in response to crude price declines)

then every monotonic sequence is correlated with every other monotonic sequence.

I believe this observation is original.

Regards,
Bill Drissel
