From Richard Arum and others. They find that employment outcomes of college graduates are positively related to student performance on the Collegiate Learning Assessment (CLA).

*Who cares?* As Bryan points out, this sort of study confounds ability with learning, which makes it uninteresting. A more compelling finding would be that the students whose CLA scores rose the most during college had the best employment experience. That would come closer to saying that the students *learned* something, and that what they learned affected employment outcomes.

What adds to the frustration is that the authors had the data. That is, they had the data on growth in CLA. However, they stuffed it into another variable:

Academic engagement/growth is a summary measure including taking courses with reading and writing requirements, hours studying and demonstrated growth on the CLA.

So instead of a variable that comes closer to separating learning from ability, they combined it with other variables that easily could be correlated with ability. Frustrating.

I should add that, in general, creating an index out of variables instead of entering the variables separately is bad practice. You are starting from a situation in which the dependent variable, Y, might be determined by X and Z (assuming linearity) as

Y = aX + bZ

where a and b are unknown coefficients. When you create a "summary measure" that combines X and Z, you are imposing a ratio a/b that is based not on the data but instead on your arbitrary assumptions. Unless you have some strong theoretical or empirical reason to impose a specific ratio (which is very unlikely to be the case here), doing so produces statistically biased results.
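To make the point concrete, here is a small simulation sketch. Everything in it is invented for illustration and has nothing to do with the study's data: the true coefficients (a = 2, b = 0.5), the correlation between X and Z, the sample size, and the function name `compare_separate_vs_index`. It fits Y on X and Z separately, then on an equal-weight index X + Z.

```python
import numpy as np


def compare_separate_vs_index(a=2.0, b=0.5, n=5000, seed=0):
    """Fit Y = aX + bZ two ways on simulated data (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = 0.5 * x + rng.normal(size=n)        # Z is correlated with X
    y = a * x + b * z + rng.normal(size=n)  # true model plus noise

    # Enter X and Z separately: the data choose both coefficients.
    design = np.column_stack([x, z])
    coef_separate, *_ = np.linalg.lstsq(design, y, rcond=None)

    # Enter an equal-weight index: a single coefficient is forced on both.
    index = (x + z)[:, None]
    coef_index, *_ = np.linalg.lstsq(index, y, rcond=None)

    return coef_separate, float(coef_index[0])


if __name__ == "__main__":
    separate, index_coef = compare_separate_vs_index()
    print("separate estimates of (a, b):", separate)
    print("single coefficient imposed on both X and Z:", index_coef)
```

Under these assumptions the separate regression should land near the true 2 and 0.5, while the index regression returns one number (about 1.2 in large samples) that it applies to X and Z alike. That number matches neither true coefficient, and the imposed ratio a/b = 1 comes from the construction of the index, not from the data.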

That's a very charitable way to put it. To pick a technical nit, there is a difference between bias in a statistical method and a failure to formulate the model so that it addresses the actually interesting question.

Frustrating indeed.

On second thought, maybe I was too terse in my first comment.

I'm persuaded by the substance of the post, but the general note against index variables in regressions goes a bit too far. An index variable can be valuable when the variables being combined are imperfect proxies for the same underlying variable, one that we wish we had but can't measure directly. Combining the proxy variables into a summary is then not a matter of imposing arbitrary coefficients, but just averaging out the errors in the proxies as measures of the effect we want to model. Including the proxies directly would waste degrees of freedom and introduce noise.
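A rough sketch of the averaging argument, under assumptions I am making up for illustration: a latent variable with unit variance, four proxies that each equal the latent variable plus independent unit-variance error, and an outcome that depends only on the latent variable. The function name `proxy_slopes` is hypothetical.

```python
import numpy as np


def proxy_slopes(c=1.0, k=4, n=20000, seed=0):
    """Compare the slope on one noisy proxy with the slope on the proxy average."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)                             # the variable we wish we had
    proxies = latent[:, None] + rng.normal(size=(n, k))     # k noisy measurements of it
    y = c * latent + rng.normal(size=n)                     # outcome depends on latent only

    def slope(v):
        # Simple-regression slope of y on v.
        return np.cov(v, y)[0, 1] / np.var(v, ddof=1)

    single = slope(proxies[:, 0])           # one proxy alone
    averaged = slope(proxies.mean(axis=1))  # equal-weight index of all k proxies
    return single, averaged


if __name__ == "__main__":
    single, averaged = proxy_slopes()
    print("slope using one proxy:", single)
    print("slope using the average of the proxies:", averaged)
```

With these made-up variances, the slope on a single proxy is attenuated to about half the true effect, because half of that proxy's variance is measurement error. Averaging four proxies cuts the error variance to a quarter, so the slope recovers to about 0.8 of the true effect. The equal weights are justified here only because every proxy measures the same thing with the same noise.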

The "strong theoretical reason" required is not about the ratio (combining coefficients) but the assumption that the combined variables are measures getting at the same thing.

The problem here is that the authors' "engagement" summary measure is not a useful construct but a conflation of several effects, including the one that we really should care about.

Sure.

So what? *Some* people learn a lot in college, and gain benefit from it. (Me, for example, albeit decades ago.)

Others don't seem to gain much marketable skill, but get lots of life growth and social skills (some people close to me).

Some people apparently gain little skill or knowledge in college, but get a "signaling benefit".

For others it's a complete waste.

Is there some other outcome that anybody really expects? Does anybody really think college is a waste for *everybody*? Does anybody really deny that college *is* a waste for some percentage? Is anybody foolish enough to think it can be reliably predicted in advance which category anybody will fall into?