Garett Jones  

In Praise of Calibration

Nobelist Thomas Sargent said this about early tests of rational expectations macro models back in the 1970s:

I recall [future Nobelists] Bob Lucas and Ed Prescott both telling me that those tests were rejecting too many good models. 

That phrase, "[T]hose tests were rejecting too many good models," has become a bit of a litmus test.  Either you chuckle slightly because you've seen a clear example of myside bias; or you treat it like a Zen koan, worthy of further contemplation.

I'm in the latter category.  

That's because the tests they were talking about were designed to see whether these skeletal macro models were missing something.  And the data kept telling Prescott, Sargent, and Lucas, "Yes, your model is missing something. The world is not precisely like your model." 

But who cares whether the world is exactly like your model? The point of a model is that it's simple. You remember Steven Wright's old joke:

I have a map of the United States...actual size.  It says, 'Scale: 1 mile = 1 mile.' I spent last summer folding it. 

And now it's time to pull out the quote by Box:

...all models are wrong, some are useful.

The rise of calibration was one solution to the 1:1 scale problem in macroeconomic theory.  The calibration approach is simple: Build a causal chain with reasonably strong links, and see if the completed chain is able to bear substantial (but not infinite) weight in the real world.  It didn't always play out that way in practice (strong links are sometimes in the eye of the beholder), but we should have ideals to live up to.  

This isn't just about real business cycles (making hay while new technology shines)--it's a story that applies to New Keynesian models too: Is price stickiness big enough to explain how spending shocks can shift real output? [Answer: Depends on the kind of stickiness.  So be sure to grab the right weapon when you're in any-weapon-to-hand mode.]  

Sargent again:

The idea of calibration is to ignore some of the probabilistic implications of your model but to retain others. Somehow, calibration was intended as a balanced response to professing that your model, though not correct, is still worthy as a vehicle for quantitative policy analysis.

Among the successes of this approach--success at encouraging good science, not at providing correct answers to the fourth significant digit--I'd include:

1. Sargent and Ljungqvist's work on the link between high European unemployment and rising "economic turbulence": Helps explain why Europeans have high unemployment only post-1973 even though Europeans had generous unemployment benefits decades earlier.  Hint: Post-73, long-accumulated job skills sometimes become worthless, something that didn't happen before.

2.  Mehra/Prescott's Equity Premium Puzzle: A few numbers went a long way in shocking the profession into realizing that stocks aren't risky enough to justify their high returns unless people are terrified of modest losses. Another claim: The kinds of people who are terrified of modest losses are the kind of people who need a big incentive to shift their consumption from one period to another--yet the safe interest rate is very low.  You barely need to pay people anything to shift consumption across time.  

A puzzle with hundreds, maybe thousands, of resolutions, all of them probably wrong and some of them still useful. 

3.  Lucas's estimate of the cost of business cycles in terms of human well-being: Consumer spending wiggles little over the business cycle, so basic estimates showed that the average person wouldn't pay that much for an insurance policy that offered to stabilize average consumer spending.  That's a sign typical business cycles (not this one) don't influence human well-being very much.  An early extension (discussed p. 10 here) showed that in a country with a U.S.-sized safety net you can't use the "people might die" angle to boost the cost of business cycles by much.  A "quit your bellyaching" paper that spurred strong responses partly because it was a "quit your bellyaching" paper, but mostly because the numbers just grated.  

True enough to annoy: A sign Lucas did things right.  
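
The insurance-policy logic in #3 can be turned into a back-of-the-envelope number. What follows is a minimal sketch, assuming the textbook setup usually associated with this exercise: constant-relative-risk-aversion utility and lognormal wiggles in consumption around trend. The function names and the 3 percent standard deviation are illustrative assumptions, not figures from this post or from Lucas's paper.

```python
import math


def welfare_cost_exact(gamma: float, sigma: float) -> float:
    """Proportional rise in consumption that compensates for consumption risk.

    Assumes (hypothetically) consumption = trend * exp(-sigma**2 / 2) * shock,
    with log(shock) ~ N(0, sigma**2), so risk leaves mean consumption unchanged.
    With CRRA utility and risk aversion `gamma`, the compensating rise is
    exp(gamma * sigma**2 / 2) - 1.
    """
    return math.expm1(0.5 * gamma * sigma ** 2)


def welfare_cost_approx(gamma: float, sigma: float) -> float:
    """First-order approximation of the same quantity: gamma * sigma**2 / 2."""
    return 0.5 * gamma * sigma ** 2


if __name__ == "__main__":
    sigma = 0.03  # assumed std. dev. of log consumption around trend
    for gamma in (1.0, 5.0, 20.0):  # assumed range of risk aversion
        exact = welfare_cost_exact(gamma, sigma)
        approx = welfare_cost_approx(gamma, sigma)
        print(f"gamma={gamma:>4}: exact={exact:.4%}  approx={approx:.4%}")
```

Under these assumed inputs, even a risk aversion of 20 puts the cost below 1 percent of consumption. The answer is small because the variance of consumption is small, which is the "big or small?" question in miniature.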

In all 3 cases, there were fights about the numbers, about the setup.  None of these issues are settled (except for #3, Lucas is totally right there, not least because recessions are good for your health); they instead reframed issues in excellent ways.  
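
The arithmetic behind #2 fits in a few lines as well. This is a sketch under stated assumptions, using the standard lognormal consumption-CAPM approximations rather than the exact Mehra/Prescott setup: the equity premium is roughly risk aversion times the covariance of consumption growth with stock returns, and the safe rate is roughly time preference plus risk aversion times consumption growth, minus a precautionary-saving term. Every input below is a round number chosen for illustration.

```python
def implied_risk_aversion(premium: float, sd_consumption: float,
                          sd_stocks: float, corr: float) -> float:
    """Risk aversion needed to rationalize a given equity premium.

    Uses the approximation premium = gamma * cov(consumption growth, return),
    with the covariance built from assumed standard deviations and correlation.
    """
    cov = corr * sd_consumption * sd_stocks
    return premium / cov


def implied_safe_rate(gamma: float, time_pref: float, growth: float,
                      sd_consumption: float) -> float:
    """Safe interest rate implied by CRRA utility and lognormal consumption.

    r = time_pref + gamma * growth - 0.5 * gamma**2 * sd_consumption**2
    """
    return time_pref + gamma * growth - 0.5 * gamma ** 2 * sd_consumption ** 2


if __name__ == "__main__":
    # Hypothetical annual inputs, not data from the post.
    gamma = implied_risk_aversion(premium=0.06, sd_consumption=0.036,
                                  sd_stocks=0.167, corr=0.4)
    print(f"risk aversion needed for a 6% premium: {gamma:.1f}")
    for g in (2.0, 10.0):
        rate = implied_safe_rate(g, time_pref=0.01, growth=0.018,
                                 sd_consumption=0.036)
        print(f"gamma={g:>4}: implied safe rate = {rate:.2%}")
```

With these made-up inputs the required risk aversion comes out around 25, while a risk aversion of 10 already implies a safe rate above 12 percent. That is the tension described in #2: the stock market asks for terrified investors, and the bond market says nobody needs much pay to wait.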

Calibration--which also goes by the name "quantitative theory"--is a move forward because it shifts the debate away from "positive or negative?" to "big or small?"  Lots of things might matter a little, but we should focus our attention on things that matter a lot.  

Does your favorite idea matter a little or a lot?  How would you know? 

Coda: It was only in 1971 that medical researchers began to understand why aspirin works.  Beforehand, people knew that it worked as a pain reliever in some settings but not in others, but they didn't know why it worked. Vane, author of the 1971 paper in Nature, won a Nobel for showing how aspirin works.

Causation first, comprehension later. 

Both are wonderful, but comprehension both sates and invigorates the mind. 








COMMENTS (6 to date)
Andrew writes:

It's a shame you are still considered a "guest" blogger here.

Ritwik writes:

I see calibration as suffering from two issues - one the opposite problem as the McCloskey critique, and another in line with the McCloskey critique. Calibration measures the 'oomph', not the mere existence. So far, so good. But then there's the precision. How robust is the model? How much does Prescott's estimation of the effects of European taxes on unemployment depend on labour elasticities that may not hold water?

Why no confidence intervals? That isn't that hard. Why no out-of-sample tests? Those ain't that hard either.

Now even if the confidence intervals are unknown or very broad, if the oomph is big enough (or in Lucas's case, small enough) we might pay attention to calibration results decision-theoretically.

But the second part of the McCloskey critique now applies. Why no measurement? Why no simulation, at least? Calibration breaks the assumption-proof of existence - repeat paradigm of macro theory, only to replace it by assumption-estimation-repeat paradigm of macro theory.

If one stops there, it is not even clear what has been added to the intuition - which already existed - by the estimation which follows almost slavishly from the intuition. The equity premium puzzle, yes. Lucas and business cycles, probably yes. Prescott's work on Europe & taxes, and recessions and interest rates and labour productivity, not so much.

Of course, these problems are not restricted to RBC or calibration theorists. But do there already exist better methods?

How does calibration compare with Bayesian VARs, for example? The ones that tell us that we don't know half as much as we sometimes pretend to know. And that our models aren't that good in the first place.

Ken B writes:
It was only in 1971 that medical researchers began to understand why aspirin works. ... Causation first, comprehension later.
It was only after Pauli discovered the exclusion principle that we learned why we don't fall through the floor!

Although with Quantum Theory "comprehension" might be an exaggeration ...

john hare writes:

I would like to see a few outlaw zones set up to test a few theories. A zone in some area that residents voted 90+% for.

A few city blocks for a zone to test the theory that the AMA and government are the causes of high health care costs. No regulations on who practices, what procedures are allowed, or what drugs can be given. I would predict a high death and malpractice rate paralleled by low costs and some successful experimentation.

A zone with no minimum wage or unemployment benefits. I think there would be high employment, multiple businesses trying to enter that zone, and a few people that couldn't get out fast enough.

A zone with no law enforcement. Either people would learn to value it, or find other solutions to the problem of crime.

A zone with zero subsidy free enterprise education.

A zone with no drug laws.

None of this is feasible in the real world, just things I would like to see proven or falsified with real data.

Eric Falkenstein writes:

Calibration just highlights debates, it doesn't solve them. With all the degrees of freedom in any theory, they can always fit the data. Macroeconomists are now patently irrelevant, which is why banks no longer have Economics Departments, unlike in the 1970s. The forecasts are invariably partisan, driven more by perhaps accurate but nonetheless very non-rigorous prejudices.

Mike Rulle writes:

"A few numbers went a long way in shocking the profession into realizing that stocks aren't risky enough to justify their high returns unless people are terrified of modest losses".

A lot of heavy stuff in this post. I will just pluck out the small piece above and comment.

I know this was a point made by some at the peak of the tech bubble (one of the few bubbles one could almost mathematically prove was such before it burst---not literally prove of course, but by inference show what else would also have to be true for it not to be a bubble and its virtual impossibility).

But putting bubble predictions aside, this claim always struck me as absurd. We do not know the future. We only know the past. Equities are residual values. One must hypothesize high probability numbers regarding beating both operating and financial leverage, to conclude that only fear of "modest losses" is what keeps equity premiums "so high". Losing half your money in 17 months (2007-2009), or losing more than half in real terms from 1968-1982 can keep one reasonably fearful. If one has infinite wealth and infinite life, then maybe no need to fear. Otherwise plenty of reasons.

Further, if we "out of sample" all equity markets from 100 years ago to today, as many have suggested, would we still think equity premiums are too high? I don't think so.

No biggie here. Just saying that little bit of attempted calibration was way wrong.
