Arnold Kling  

Deadly Chicago Econometrics


Bryan's post on the statistical findings about parenting and health care reminds me of what we at MIT used to call "Chicago econometrics."

A fundamental fallacy in classical statistics is to say, "Therefore, we accept the null hypothesis." The classical statistical tests are designed to make it difficult to reject the null hypothesis (innocent until proven guilty). See this lecture.

So, if you set up a classical test with the null hypothesis as "health care has no effect," you are giving yourself a low probability of rejecting that hypothesis. Getting a statistically insignificant result is no more than that--an insignificant result. All claims that you have shown zero effect of the independent variable are fallacious, because you gave yourself a high probability of showing zero effect to begin with.
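
To put rough numbers on this, here is a minimal sketch of how often a classical test rejects "no effect" when a modest effect really exists. It assumes a two-sided z-test with known standard deviation; the function name `z_test_power` and the chosen effect size (0.2 standard deviations) are illustrative assumptions, not anything taken from the studies under discussion.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test of H0: mean = 0 rejects
    when the true mean is `effect` (hypothetical helper for illustration)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = 0.05
    shift = effect * n ** 0.5 / sigma            # true effect measured in standard errors
    # Reject if the test statistic lands in either tail.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# Assumed true effect of 0.2 standard deviations, at several sample sizes.
for n in (10, 50, 200, 1000):
    print(n, round(z_test_power(effect=0.2, sigma=1.0, n=n), 3))
```

Under these assumptions, a small sample will usually fail to reject even though the effect is real, which is why an insignificant result cannot be read as evidence of zero effect.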

Why we called this "Chicago econometrics" I don't know. Maybe it was just a way of venting at our rivals--kind of like those Maryland basketball fan T-shirts that say "Duck Fuke."

In addition to the logical flaw, there is the practical problem that measurement error biases regression coefficients toward zero. So, if your measure of schooling, or parenting, or health care, is a poor proxy for the correct variable, then you will get a low coefficient for that reason alone.
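
The bias toward zero can be seen in a small simulation. This is only a sketch under stated assumptions: a single regressor, a true slope of 2, and a proxy equal to the true variable plus independent noise of the same variance. The names `ols_slope` and `simulate` are made up for the illustration. In that setup the textbook result is that the estimated slope shrinks by the factor var(x) / (var(x) + var(noise)), here one half.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, true_slope=2.0, proxy_noise_sd=1.0, seed=0):
    """Return (slope using the true regressor, slope using a noisy proxy)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * x + rng.gauss(0, 1) for x in x_true]
    # The proxy is the true variable plus independent measurement error.
    x_proxy = [x + rng.gauss(0, proxy_noise_sd) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_proxy, y)


slope_clean, slope_proxy = simulate()
print("slope with the correct variable:", round(slope_clean, 2))
print("slope with the noisy proxy:     ", round(slope_proxy, 2))
```

The noisier the proxy (a larger `proxy_noise_sd`), the closer the estimated coefficient is pushed toward zero, even though the true relationship has not changed.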

One of my pet peeves is lazy econometrics, where someone tries to estimate an aggregate relationship for a disaggregated process. For example, you can show a significant relationship between many specific medical procedures and longevity for people with the relevant conditions. Yet if you take an aggregate proxy for medical care and an aggregate measure of longevity, there is no relationship. I think of the latter as lazy econometrics rather than a description of the real world.

Similarly, with parenting, my guess is that there are specific parenting practices that have significant effects on certain types of children. But trying to look at aggregate effects of "parents" on "children" while controlling for genetic effects gets you low coefficients due to lazy econometrics. Add to this the "Chicago econometrics" fallacy of saying "therefore we accept the null hypothesis," and you have a recipe for poor scientific practice.




CATEGORIES: Economic Methods



TRACKBACKS (1 to date)
TrackBack URL: http://econlog.econlib.org/mt/mt-tb.cgi/274
The author at Exploit the Worker in a related article titled Econometrics Blogging writes:
    Bryan Caplan and Arnold Kling are going back and forth and back and forth on topics related to aggregation and rejecting null hypotheses. Sure, that sounds boring, but these issues have important implications for the conclusions we draw from empirical... [Tracked on June 3, 2005 4:09 AM]
COMMENTS (10 to date)
Chris writes:

Go Arnold! Thanks for clearing up this issue.

Chris writes:

BTW, what do you mean by "classical" statistics? Is there some other form of analysis for evaluating the significance of results?

Conchis writes:

Chris: The alternative to Classical Statistics is the (unfortunately initialed) Bayesian Statistics, which tries to attach probabilities to relationships given the data (e.g. "given the data, what are the odds that there's no relationship between healthcare and health?"), as opposed to the classical approach, which focuses on the probability of observing particular data given a certain relationship between the variables (e.g. "what are the odds that we'd see this data if there were no relationship between healthcare and health?"). At least to my mind the Bayesian approach makes a lot more sense, and is much easier to work into standard decision theory, but is often tougher to implement.

Arnold: Is your beef with estimating aggregate relationships per se, or just with those who estimate only aggregate relationships and don't dig any deeper? The fact (assuming hypothetically that the claims were sound) that there's no aggregate medicine-health relationship, but that there are a number of positive relationships at the disaggregated level would suggest exactly the sort of process Bryan posits. That seems to me to be an interesting finding...

Arnold Kling writes:
The fact (assuming hypothetically that the claims were sound) that there's no aggregate medicine-health relationship, but that there are a number of positive relationships at the disaggregated level would suggest exactly the sort of process Bryan posits.
That would be true if there were no measurement error. But if you have disaggregated data that is measured accurately and aggregate data that consists of poor proxies for what you want to measure, you get biased estimates in the aggregate data.
conchis writes:

Arnold,

Sure. But again, that seems to be a criticism of poorly-implemented aggregation in econometrics rather than aggregation per se, so am I correct in assuming that your problem is with the former rather than the latter?

eve11 writes:

As a quick aside... I'd never heard of measurement error biasing estimates toward zero, though I can see how it is true. So is the existence of unrecorded measurement error part of the reason that shrinkage estimators (ie James-Stein) work so well in practice? I know there are theoretical results that show that overall risk is decreased when you shrink estimates toward zero even without the presence of measurement error; but I'm wondering if in practice, shrinkage estimators do well partly based on theory and partly based on the presence of measurement error. Also, can they estimate measurement error by the amount of shrinkage that is needed?

Right, but what I wanted to say was that, while I'm more of a Bayesian than a frequentist, it's not really fair to place the blame of "accepting the null" on the frequentist's shoulders: even in stat 101 people are taught that "failure to reject" does not mean "accept".

So, if you set up a classical test with the null hypothesis as "health care has no effect," you are giving yourself a low probability of rejecting that hypothesis.

I don't know that I agree with this statement. You automatically give yourself a low probability of rejection if the null is true. The overall probability of rejection still depends on whether or not H0 is true, and on how far from the truth it is. I'll certainly admit it's easier to think of this in Bayesian terms because then you can say things like "the probability H0 is true" as opposed to just saying "it's either true or not."

Then of course there's the other side of the coin with hypothesis tests: given a large enough sample size, you will ALWAYS reject the null hypothesis. Then you have statistically "significant" results, but they don't necessarily mean anything in context.

conchis writes:
Then of course there's the other side of the coin with hypothesis tests: given a large enough sample size, you will ALWAYS reject the null hypothesis.

This is probably nitpicking, but presumably you don't mean "ALWAYS" to include the case where the null is actually, exactly true - but rather that, even if it's arbitrarily close to being true, there will be a sample size large enough to guarantee rejection, no?

eve11 writes:

Conchis: from a Bayesian perspective, since the null hypothesis is just a single value, it has probability 0 of actually being true. So, technically I should say "almost surely always" (ie, true except on a set of probability 0).

Of course if you're a frequentist, then any talk of "the probability that the null is true" doesn't really make any sense. In that case I still have to disagree with the notion that a hypothesis test automatically sets you up with a small probability of rejecting the null. It depends on whether the null is true or not. If you wanted to use Bayes rule, you can write out the overall probability of rejection as a function of the marginal probability that the null is true: P(reject H0) = P(reject H0 | H0 false)P(H0 false) + P(reject H0 | H0 true)P(H0 true). But in frequentist circles, you are kind of stuck with those conditional statements P(reject H0 | H0 true) (ie, the alpha level that is set beforehand) and the associated power P(reject H0 | H0 false), which depends on the sample size.
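
As a rough numerical sketch of that decomposition: the function name `overall_rejection_probability` and the plugged-in values (a 50% prior that the null is true, alpha of 0.05, power of 0.8) are purely illustrative assumptions, not estimates for any real study.

```python
def overall_rejection_probability(p_null_true, alpha, power):
    """P(reject H0) = P(reject | H0 false) P(H0 false) + P(reject | H0 true) P(H0 true).

    alpha is P(reject | H0 true); power is P(reject | H0 false).
    """
    p_null_false = 1.0 - p_null_true
    return power * p_null_false + alpha * p_null_true


# Illustrative inputs only: a well-powered test versus a badly underpowered one.
print(overall_rejection_probability(p_null_true=0.5, alpha=0.05, power=0.8))
print(overall_rejection_probability(p_null_true=0.5, alpha=0.05, power=0.1))
```

With alpha fixed in advance, the overall chance of rejecting is driven by the power term, so it is low only when the null is likely true or the test is underpowered.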

conchis writes:

Actually (to be even more pedantic) while it's perhaps unlikely that you'd do so, what's to stop you specifying a prior that places a strictly positive probability on the null?
