Bryan Caplan  

Teachers and Income: What Did the Kindergarten Study Really Find?

"But what about the kindergarten study?"  I hear this question all the time.  Questioners are referring, of course, to Chetty et al.'s "How Does Your Kindergarten Classroom Affect Your Earnings?  Evidence from Project STAR."  Fans of the paper often claim that it shows that spending more on better teachers passes a cost-benefit test with flying colors: "One good kindergarten teacher adds $782,000 in value per year!"*

If you actually read the paper, though, a different story emerges.  The authors are extremely careful and transparent.  Here's what they really find about the effect of early schooling on adult income:

1. Project STAR was an experiment designed to test the effect of class size.  The experiment found that students assigned to small classes earned $4 more per year.  If you add demographic controls, students assigned to small classes earned $124 less per year.

2. You can use Project STAR's data to (non-experimentally**) test for other effects.  When you do, almost all measures of teacher quality fail to increase adult earnings:
"The few other observable teacher characteristics in the STAR data (degrees, race, and progress on a career ladder) have no significant impact on scores or earnings."

3. There is one measure of teacher quality that does matter: whether the teacher has more than 10 years of experience.  Chetty et al. find that students assigned to a kindergarten teacher above this experience cutoff eventually earn $1,093 more per year.  But bear two reservations in mind.  (a) The t-stat is only 2.4, extremely low for a non-experimental test with 6,005 observations.  (b) If you measure experience in years, rather than using their binary "more than 10 years of experience" variable, the point estimate is a statistically insignificant $57 per year.

4. Teacher experience only matters in kindergarten:
"The effect of teacher experience on test scores is no longer statistically significant in grades 1-3. Consistent with this result, teacher experience in grades 1-3 also does not have a statistically significant effect on wage earnings."

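
To put reservation (a) in point 3 in more familiar terms, a t-statistic can be converted into a two-sided p-value.  The sketch below assumes a normal approximation, which is reasonable with roughly 6,000 observations; the function name is hypothetical and does not come from the paper.

```python
import math


def two_sided_p_from_t(t: float) -> float:
    """Two-sided p-value for a t-statistic under a normal approximation.

    Assumption: with thousands of observations the t distribution is close
    enough to the standard normal that P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2))


if __name__ == "__main__":
    # t = 2.4 is the statistic reported for the ">10 years of experience" effect.
    print(f"t = 2.4  ->  p = {two_sided_p_from_t(2.4):.4f}")
    # For reference, t = 1.96 sits right at the conventional 5% cutoff.
    print(f"t = 1.96 ->  p = {two_sided_p_from_t(1.96):.4f}")
```

A t-stat of 2.4 corresponds to p of roughly 0.016 under this approximation: below the 5% bar, but not by a wide margin for a sample this large.
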
By my count, Chetty et al. have five measures of teacher quality (degrees, race, progress on a career ladder, >10 years experience, and years of experience), study four different grades (K, 1, 2, and 3), and discover precisely one statistically significant effect on income.  With 20 different measures, you'd expect one to be statistically significant at the 5% level by chance alone.  And these are just the non-experimental results.  Experimentally, Project STAR finds that class size does not raise adult income.
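
The arithmetic behind "you'd expect one to be statistically significant by chance alone" can be sketched as follows.  It assumes the 20 tests are independent and that every true effect is zero; in the actual data the measures overlap (the two experience variables, for instance), so treat the numbers as rough.  The Bonferroni correction is a standard multiple-comparisons adjustment added here for illustration, not part of the post's argument.  The simulation also bears on the exchange between David O and jonathan in the comments: under the null, the false-positive rate should not depend on sample size or variance.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
N_TESTS = 20  # 5 teacher measures x 4 grades, following the count in the text

# Analytic version, assuming 20 independent tests with every true effect zero.
expected_false_positives = N_TESTS * ALPHA       # mean number of "hits"
p_at_least_one = 1 - (1 - ALPHA) ** N_TESTS      # chance of at least one hit

# Bonferroni correction (added for illustration): test each at ALPHA / N_TESTS.
bonferroni_threshold = ALPHA / N_TESTS
# Two-sided p-value for t = 2.4, using a normal approximation.
p_experience = 2 * (1 - NormalDist().cdf(2.4))


def simulate_false_positives(n_obs: int, sigma: float, n_tests: int = N_TESTS,
                             n_reps: int = 1000, seed: int = 0) -> float:
    """Average number of 'significant' results per batch of n_tests null tests.

    Each test draws n_obs observations from Normal(0, sigma) and runs a
    two-sided z-test of "mean == 0" at level ALPHA, treating sigma as known
    (an assumption that makes the z-statistic exactly standard normal).
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96
    hits = 0
    for _ in range(n_reps):
        for _ in range(n_tests):
            sample_mean = sum(rng.gauss(0.0, sigma) for _ in range(n_obs)) / n_obs
            z = sample_mean / (sigma / math.sqrt(n_obs))
            if abs(z) > crit:
                hits += 1
    return hits / n_reps


if __name__ == "__main__":
    print(f"expected false positives: {expected_false_positives:.2f}")
    print(f"P(at least one):          {p_at_least_one:.3f}")
    print(f"Bonferroni threshold:     {bonferroni_threshold:.4f}")
    print(f"p for t = 2.4:            {p_experience:.4f}")
    # Small noisy samples vs. larger tight samples: same false-positive rate.
    for n_obs, sigma in [(5, 100.0), (30, 0.1)]:
        print(n_obs, sigma, simulate_false_positives(n_obs, sigma))
```

Under these assumptions the expected number of false positives is 1, the chance of at least one is about 64%, and a p of roughly 0.016 for t = 2.4 would not clear a Bonferroni threshold of 0.0025.  Both simulated settings should land near 1 false positive per batch of 20, whatever the sample size or noise level.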

"How Does Your Kindergarten Classroom Affect Your Earnings?" is one of the most impressive empirical papers ever written.  The data collection is amazing.  The empirical analysis is clear and careful.  Above all else, the authors' methods are transparent.  Lesser authors would have buried the conflicting findings.  Chetty et al. took the high road. 

Unfortunately, the world is so eager for stories about the power of early education that their paper is being badly misinterpreted.  Chetty et al. don't confirm romantic hopes about teachers that change young minds forever.  Despite their pro-education tone, they expose these hopes as wishful thinking.

* This is the paper's back-of-the-envelope calculation of the present discounted value of 1 SD improvement in "classroom quality," not teacher quality.  But the figure has a life of its own.

** Raj Chetty, who kindly previewed this post, argues that all their results are "experimental," but this is a stretch.  The STAR experiment randomly assigned kids to big or small classes.  Using the same data to isolate other effects remains observational.  Any of the measured effects could ultimately reflect correlated confounding variables, as they could in any observational study.  For example, maybe experienced teachers have nicer classrooms, or more teachers' aides.
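
The footnote's worry about confounders can be made concrete with a toy simulation.  Every number below is hypothetical and not drawn from Project STAR: experience is given zero true effect on earnings, experienced teachers are assumed more likely to have an aide, and the aide is assumed to raise later earnings.  A naive comparison then credits experience with the aide's effect, while comparing within aide groups removes it.

```python
import random


def naive_and_adjusted(n: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """Toy model (all numbers hypothetical): teacher experience has NO effect
    on earnings, but experienced teachers are more likely to have an aide,
    and having an aide adds $1,000/year to a student's later earnings."""
    rng = random.Random(seed)
    # cells[(experienced, aide)] = [sum of earnings, number of students]
    cells = {(e, a): [0.0, 0] for e in (0, 1) for a in (0, 1)}
    for _ in range(n):
        experienced = int(rng.random() < 0.5)
        aide = int(rng.random() < (0.8 if experienced else 0.2))
        earnings = 30_000 + 1_000 * aide + rng.gauss(0, 5_000)
        cells[(experienced, aide)][0] += earnings
        cells[(experienced, aide)][1] += 1

    def mean(e: int, aides: tuple[int, ...]) -> float:
        total = sum(cells[(e, a)][0] for a in aides)
        count = sum(cells[(e, a)][1] for a in aides)
        return total / count

    # Naive: experienced vs. inexperienced, ignoring aides entirely.
    naive = mean(1, (0, 1)) - mean(0, (0, 1))

    # Adjusted: compare within each aide group, then weight by group size.
    weighted_sum, total_weight = 0.0, 0
    for a in (0, 1):
        weight = cells[(0, a)][1] + cells[(1, a)][1]
        weighted_sum += weight * (mean(1, (a,)) - mean(0, (a,)))
        total_weight += weight
    adjusted = weighted_sum / total_weight
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = naive_and_adjusted()
    print(f"naive 'experience effect': ${naive:,.0f}")     # roughly $600
    print(f"adjusted for aides:        ${adjusted:,.0f}")  # roughly $0
```

The adjustment only works because the toy model records the confounder.  A confounder that was never measured cannot be stratified away like this, which is why an observational estimate stays open to this objection.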


COMMENTS (12 to date)
Ryan M writes:

Do you feel the same way about Heckman and the effect of preschool?

David R. Henderson writes:

When Chetty et al say that race is a measure of teacher quality, what do they mean? How do they divide the races and which race or races do they say is (are) higher quality?

Eric Falkenstein writes:

Can you state the reasoning used by the many who make the claim about the $700k value for good teachers? It clearly isn't a take-away for you, but there's some slice that extrapolates that way.

floccina writes:

Just on logical grounds it is hard to take the results seriously. It is difficult to imagine how a kindergarten teacher could possibly cause such an effect. I would have an easier time believing that an intervention in the teen years would have a significant effect than one in K. From the moment I first heard it, I thought it was a spurious result. I would need to see 10 such studies with the same result to firmly believe it.

ciro curbelo writes:

Bryan: I would love to hear your take on this paper by Hanushek: http://www.nber.org/papers/w16606.pdf

mark writes:

It is noteworthy that none of the education-industry lobbyists who embrace the paper for what it allegedly proves about what "good kindergarten teachers" should be paid acknowledge that it equally proves that a bad teacher is worth nothing at all. Maybe babysitting wages.

Arthur_500 writes:

We often hear the claim that higher teacher pay means smarter students, and that if you don't pay teachers well, the inverse must be true. I see this on picket signs every time a school district goes on strike.

However, a Harvard study many years ago found an inverse correlation between teacher income and student test scores.

What this means is that you can't take simple variables and extrapolate much from them.

A teacher who inspires students to greater learning is the best teacher one could ask for. As with any profession, some are excellent, most are average, and some are rather poor.

Given the need for so many teachers, I doubt that we can find a simple item such as pay or experience that predicts excellence in the classroom. It really comes down to good management: proper tools, good oversight, quality support. Even a poor teacher can do a reasonably good job with appropriate oversight and quality support.

Roger Sweeny writes:

"With 20 different measures, you'd expect one to be statistically significant at the 5% level by chance alone."

As the comic xkcd showed, green jelly beans cause acne.

http://xkcd.com/882/

David O writes:

"With 20 different measures, you'd expect one to be statistically significant at the 5% level by chance alone."

This is actually clearly false, at least in finite samples. The statistical significance of any variable depends on the standard error, which is a function of the sample variance and the number of observations. Data with sufficiently high variance and low numbers of observations could have in expectation 1 out of 1000 irrelevant variables being significant by chance alone.

Max writes:

"By my count, Chetty et al. have five measures of teacher quality (degrees, race, progress on a career ladder, >10 years experience, and years of experience)..."

People will tsk at you for calling race a measure of teacher quality... perhaps you meant that Chetty et al. measure five qualia of teachers :)

jonathan writes:

Sorry, David O, but you're the one who's wrong. In any set of data sets under the null (i.e., all true effects are 0), p-values are uniformly distributed between 0 and 1, irrespective of variance or sample size. Thus the expectation is that 5 percent will be significant at the 5 percent level.

KevinH writes:

I haven't read the primary paper, but I think one important stat to publicize is how much of the variance our models can explain. That gives us a sense of how much, or how little, we understand about the relationship between early education and long-term success. There's also the problem of what they didn't know to model. Did they try any of the budding methods of scoring teachers on different types of interactions?
