Arnold Kling  

The Methodology is Flawed


My father always said that there were three iron laws of social science.

1. Sometimes it's this way, and sometimes it's that way.
2. The data are insufficient.
3. The methodology is flawed.

On politically sensitive topics, the tendency is to apply (3) to the other guy's methodology, never to your own. But if my comments here were not sufficiently clear, let me say that I do not believe that the Conley-Dupor paper is reliable. If I thought that it were, I would not have said that this is an important research question going forward. Having said that, I give them credit for doing an actual study, as opposed to cranking out a pre-existing computer model.

In the stimulus-debunking category, I tend to find the work of John Taylor and John Cogan more convincing than Conley-Dupor. Taylor and Cogan claim that the states used stimulus money mostly to reduce borrowing. That would imply that the stimulus affected neither public sector nor private sector jobs, in any direction.

On another issue, Gifted and Talented programs, people are attacking the method of looking at students who were borderline. But the whole point is that it is not helpful to make an observational study comparing the performance of very bright students in GT and very weak students not in GT. The best way to measure the impact of the GT program itself is to look at students who were on the borderline, and compare what happens to those in GT and those not in GT.
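To make the borderline comparison concrete, here is a minimal sketch on made-up, noise-free data, not on anything from the actual GT study. The cutoff score, the 0.5 slope, and every function name are assumptions chosen for illustration. It contrasts the raw GT versus non-GT gap with a regression-discontinuity-style estimate that fits a line on each side of the cutoff and compares the two fits at the cutoff.

```python
from statistics import mean

CUTOFF = 80  # hypothetical admission score, not taken from any real program


def simulate(true_effect):
    """Toy data: outcome rises with score; GT adds `true_effect` at or above CUTOFF."""
    rows = []
    for score in range(50, 100):
        in_gt = score >= CUTOFF
        outcome = 0.5 * score + (true_effect if in_gt else 0.0)
        rows.append((score, in_gt, outcome))
    return rows


def naive_gap(rows):
    """Mean outcome of all GT students minus mean outcome of all non-GT students."""
    gt = [y for _, g, y in rows if g]
    non_gt = [y for _, g, y in rows if not g]
    return mean(gt) - mean(non_gt)


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def borderline_gap(rows, bandwidth=5):
    """Fit a line on each side of CUTOFF within `bandwidth`; compare the fits at CUTOFF."""
    below = [(s, y) for s, g, y in rows if not g and s >= CUTOFF - bandwidth]
    above = [(s, y) for s, g, y in rows if g and s < CUTOFF + bandwidth]
    a0, b0 = fit_line(below)
    a1, b1 = fit_line(above)
    return (a1 + b1 * CUTOFF) - (a0 + b0 * CUTOFF)


if __name__ == "__main__":
    rows = simulate(true_effect=0.0)
    print("naive gap:", naive_gap(rows))
    print("borderline gap:", borderline_gap(rows))
```

In this toy setup the program does nothing when `true_effect` is zero, yet the naive gap is large because GT students had higher scores to begin with. The borderline estimate recovers whatever effect was built in. Real data are noisy, so an actual study would also need standard errors and a defensible choice of bandwidth.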

If that methodology does not satisfy you, then run a controlled experiment, in which you randomly assign some very bright students into GT classes and others into non-GT classes. My guess is that the long-term educational outcomes will not differ. But don't just tell me stories that bright students are bored if they are not in GT as if that invalidates the methodology of looking at students on the borderline.


COMMENTS (11 to date)
chipotle writes:

"If that methodology does not satisfy you, then run a controlled experiment, in which you randomly assign some very bright students into GT classes and others into non-GT classes."

This idea suggested above is plainly unethical.

Ted writes:

The issue is one of data quality. You would need very sophisticated data and experimental control to really figure out what ARRA did, something that just isn't feasible. Setting aside "evidence" from absurd large-scale econometric models discredited in the 1970s, I think a reasonable interpretation of the evidence thus far is that the stimulus probably had a modestly positive effect on employment and output - but not anywhere near what liberals told us it would be. What I find amusing though is how irrelevant all of these studies are for policy. The only reason to use fiscal stimulus is because the zero lower bound has become "binding." Except, it's not binding. Why not pursue Lars Svensson's "Foolproof Way" out of a liquidity trap? It would undoubtedly accomplish what fiscal stimulus was allegedly supposed to do, and it would have done it a lot more effectively and without spending nearly $1 trillion.

Also, I want to bring up one issue that isn't discussed nearly enough. Just because government spending increases output, doesn't mean it increases welfare - which is the relevant policy target. For example, even in Baxter and King's classic 1993 "Fiscal Policy in General Equilibrium," a temporary increase in government purchases will increase output, but it will also reduce welfare. So, merely observing that output or employment rose is hardly enough to conclude it was desirable. We first have to figure out what caused the shock and what its persistence is all about. If it's all about nominal rigidities (the New Keynesian story), undoubtedly the ARRA increased output and welfare. If the issue has to do with a shortage of interest-bearing assets brought about by, say, a risk shock (a New Monetarist-type story), then fiscal stimulus probably mildly increased output and probably increased welfare (this is because the greater quantity of public debt created reduces the liquidity premia and promotes financial exchange of liquid assets - not the government spending itself. For example, the Fed could have sold government bonds in their portfolio and done the same thing fiscal stimulus did). If the problem is something along the lines of a Pigou Cycle / News Shock, sectoral reallocation, or a technology shock (all variations of a multi-sector Real Business Cycle story), then fiscal stimulus may or may not have increased output, but definitely decreased welfare. Unemployment in any of these stories can technically go either way. Observation of a rise in output or unemployment is hardly enough to conclude the policy was desirable.

Lord writes:

The overall conclusion that it did little to create jobs is not surprising given that there was no net stimulus and saving jobs is not the same as creating them. Not much evidence on private sector employment, both because it isn't mostly private sector and the errors are so great. Since we have already been seeing state and local governments cut employment, it seems unlikely they wouldn't have cut sooner without it even if they had to borrow to cover the gap while they did so.

Lee Kelly writes:

Is it the method or the methodology that is flawed?

The Man Who Was . . . writes:

"But the whole point is that it is not helpful to make an observational study comparing the performance of very bright students in GT and very weak students not in GT. The best way to measure the impact of the GT program itself is to look at students who were on the borderline, and compare what happens to those in GT and those not in GT."

Uh no, the best methodology would also compare very bright kids in GT and very bright kids not in GT.

It is very possible that GT doesn't provide much to borderline students, but does provide something to kids at the top.

Eric Morey writes:

The Gifted and Talented study doesn't seem to be measuring the purpose of the programs. Their purpose is not to help students excel at standardized tests, which are themselves imperfect (to say the least) tools of measuring academic performance. Why then use that as a measure of the programs' success? And only for those students that marginally qualify? I wouldn't be surprised if a significant number of borderline students actually ended up worse later because their assignment to such a program was a reach.

chipotle hit the nail on the head in questioning the ethics of the random assignment of the brightest students to G&T and non-G&T classes.

Doc Merlin writes:

The Man Who Was is correct.

tom writes:

Arnold,

The study's authors have a different conclusion than you do about the meaning of the marginal students' difficulties:

"Hence, we argue that difficulties with the advanced material for marginal GT students (in the RD analysis) and an invidious comparison model of peer effects arising from a loss of relative rank in the within-class achievement distribution likely offset any gains from better peers helping each other and better teachers. While we cannot test this directly, we provide evidence that student’s course grades fall dramatically in both the RD and lottery samples. Given that grades have a substantial relative component, this is indicative of a drop in a student’s relative ranking in their class, a necessary condition for invidious comparison. Such a drop could demoralize the student in a way that hampers their performance on achievement exams."

The authors seem to think that their study shows that the bottom kids couldn't handle being at the bottom, which says nothing about the value of the program as a whole. (I would also guess that the program may have been too hard for them, which would be hard to separate out from 'peer effects'.) Why is your conclusion different from the authors'?

I also commented on your previous post.

tom writes:

To correct my post above, the authors themselves say that the Houston GT marginal students fail for two reasons: (a) the material is too hard for them and (b) being at the bottom of a group is hard. (I had included the 'too hard' as my own guess but forgot that the authors explicitly said it.)

Seth writes:

Another potential reason for the study findings: G&T teachers favor the best students, give them the benefit of the doubt, and tend to be harder on the supposed marginal students.

I think the problem with any study of academics is the assumption that grades or test scores are valid measures to judge effectiveness and that they should be how we judge whether we should have a program or not.

I would put much more emphasis on what parents prefer. Do parents prefer to send their kids to schools with G&T programs, given the cost? Do parents with kids not in G&T programs prefer to send their kids to schools with them?

J Mann writes:

I don't disagree with Arnold's point here, but am interested in a few other things.

1) There's an interesting problem, which is that studies seem to generally show that no social or educational program works: abstinence education, sex education, Head Start, holding kids back a year, moving kids forward a year, gifted & talented programs, foreign aid, etc. Prof. Caplan is going great guns arguing for laissez les bons temps rouler parenting.

From that, it seems you can have a few reasonable responses: (i) it could work, but we're doing it wrong; (ii) the beneficial effects are swamped four years out, but that doesn't mean there aren't actual beneficial effects; (iii) it doesn't work, so we should stop public funding for it.

Speaking for myself, if you showed me that my school's G&T program wasn't producing results, or was the type of program that wasn't producing results, my response would be that we need a new G&T program, not "what the heck, just park my kids in a lowest common denominator curriculum for 12 years."

2) I'm still kind of offended by Arnold's second point in the original post:

"Either you believe your bright kids should experience going to class with students who are not so bright, or you don't. If you don't, then pay for private school. G&T allows you to send your kids to private school while claiming they are still in public school."

- Again, my kids are in a G&T program at a private school, so Arnold's point just leaves me befuddled. If you want to argue that we shouldn't do them because they don't work, I'm game for that. But if you want to argue that there's a moral component to having different options at public school, I don't get it. Are we allowed to have advanced music courses? Advanced language? AP?
