Elaborating on a point I raised, I am going to make the following conjecture:

In an evaluation scale (e.g., rate this professor on a scale of 1 to 5), the mean evaluation is biased toward the middle.

Is this conjecture false? Is it true, but widely known? If it is true but not published, then someone should formally prove it and submit it to a journal.
When someone is asked to respond to a survey with an answer based on a subjective point scale, the response may not be the person's true best response. The respondent may be reacting to a particular mood, or using a particular interpretation of the question, or making the wrong mark with a pencil.

The average measurement error on point scales is not zero, because the scales are truncated at the endpoints. On a scale of 1 to 5, a person whose true evaluation is 1 can only make an error in one direction--toward a higher number. Similarly, a person whose true evaluation is 5 can only make an error in the other direction--toward a lower number.

If the true group mean is 4.6 out of 5, then the measured group mean is likely to be lower, because low-side errors will be included but high-side errors will be truncated. Conversely, if the true group mean is 1.4 out of 5, the measured group mean is likely to be higher.
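A quick Monte Carlo sketch of this truncation effect (the noise level, and the assumption that responses are rounded to whole-number ratings, are illustrative choices, not part of the original argument):

```python
import random

random.seed(0)

def measured_mean(true_mean, noise_sd=0.8, n=100_000):
    """Simulate responses = true value + noise, rounded to a whole-number
    rating and clipped to the 1-5 scale; return the measured mean."""
    total = 0.0
    for _ in range(n):
        response = true_mean + random.gauss(0, noise_sd)
        total += min(5, max(1, round(response)))  # round, then clip to scale
    return total / n

print(measured_mean(4.6))  # comes out below 4.6: high-side errors are truncated
print(measured_mean(3.0))  # stays near 3.0: errors roughly cancel in the middle
```

Near the top of the scale the clipped high-side errors pull the measured mean down; in the middle the two tails cancel, so the bias is toward the middle, as conjectured.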

If the scaled evaluation is used as a dependent variable in a regression, the slope of the line will be biased toward zero. For example, suppose that in truth there is a positive relationship between X and Y. If Y's values are biased toward the middle because of measurement error, there will be fewer observations with high-X, high-Y or low-X, low-Y than would be the case if Y were measured without error.
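The attenuation can be seen in a small simulation: generate a linear relationship, clip the dependent variable to the 1-to-5 scale, and compare the fitted slopes (the particular slope, noise level, and range of X below are invented for illustration):

```python
import random

random.seed(1)

def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

n = 50_000
xs = [random.uniform(-1, 1) for _ in range(n)]
# True relationship: y* = 3 + 1.5x, plus noise, then clipped to the 1-5 scale
y_true = [3 + 1.5 * x + random.gauss(0, 0.8) for x in xs]
y_clipped = [min(5.0, max(1.0, y)) for y in y_true]

print(ols_slope(xs, y_true))     # close to the true slope of 1.5
print(ols_slope(xs, y_clipped))  # smaller: clipping biases the slope toward zero
```

The clipped slope is smaller because the observations that would have had the most extreme Y values are pulled back toward the middle of the scale.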

Businesses use this type of question all the time and should have very good knowledge of how the answers are biased.

Check with someone in the business or marketing school.

For professor evaluations, a lot of my classmates would give straight "excellent" ratings to all professors regardless of their actual opinion of the professors' performance. I would expect that in cases where the evaluation is perceived to have a real impact on the evaluatee, but there's no cost to the evaluators for overly generous evaluations, the evaluations will be biased towards the top of the scale out of a desire to be nice.

I look forward to responses to this question. I do not know what the answer is, but I would venture that the mean is biased toward four simply because most people have no real criteria for making the evaluation and would rather err on the positive side than negative. People like to be generous and a 4 gives them that opportunity while leaving room both above and below for exceptional or poor ratings. To phrase this differently, for most questions people really only have three opinions (excellent, average or poor) and will throw in some generosity to give a rating of three, four or five.

I don't spend my time working on these problems, so this is a serious question: Is there a practical use to which such information is put for which this observation, if correct, would make any difference?


[I think maybe Karl Smith meant here:

http://modeledbehavior.blogspot.com/2007/04/evaluation-scales.html

Econlib Editor]

What do you mean "biased?" For a measure to be "biased," there has to be some "true" value that it is biased away from. I'm not sure what it would mean, conceptually, to talk about the "true" values of the responses in these surveys.

Of course if we measure a continuous variable and then censor it on the ends, a formula for the bias is trivial to calculate, given the distribution of the underlying variable.
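For instance, if the underlying variable is normal and censored at the scale endpoints, the censored mean has a closed form (a sketch; the mean and standard deviation used here are arbitrary):

```python
from math import erf, exp, pi, sqrt

def phi(z):
    """Standard normal density."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return (1 + erf(z / sqrt(2))) / 2

def censored_mean(mu, sigma, a=1.0, b=5.0):
    """Exact mean of X ~ N(mu, sigma) after censoring to [a, b]:
    mass below a piles up at a, mass above b piles up at b."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    return (a * Phi(alpha)
            + b * (1 - Phi(beta))
            + mu * (Phi(beta) - Phi(alpha))
            - sigma * (phi(beta) - phi(alpha)))

print(censored_mean(4.6, 0.8))  # about 4.44, below the uncensored mean of 4.6
print(censored_mean(3.0, 0.8))  # exactly 3.0, by symmetry
```

The bias is the difference between the censored mean and mu: zero at the center of the scale, and increasingly negative (or positive) as mu approaches the top (or bottom) endpoint.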

But in this case, I'm not sure there is any underlying variable that is even defined. If the only variable we are interested in is "how do people answer the question on a five-point scale," then there can't be any bias---the average of the answers in a random sample must be an unbiased estimate of the average answer in the population.

The mean is always biased by outliers. Example: 5 software engineers are at a lunch table. Each earns $85,000 a year. The mean earnings are $85,000. Bill Gates sits down at that table. Now the mean income is $85 million a year. Is the new mean biased? You bet! It is wildly unrepresentative of earnings for 5 of the 6 people at the table.
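The lunch-table arithmetic, with a made-up Gates income chosen so that the new mean comes out to exactly $85 million (the figure is invented purely to make the arithmetic work):

```python
salaries = [85_000] * 5                # five engineers at the lunch table
print(sum(salaries) / len(salaries))   # mean is 85000.0

# A hypothetical Bill Gates income chosen so the new mean is exactly $85M
salaries.append(509_575_000)
mean = sum(salaries) / len(salaries)
median = sorted(salaries)[len(salaries) // 2 - 1]  # lower middle of 6 values
print(mean)    # 85000000.0: the mean is dominated by the one outlier
print(median)  # 85000: the median still describes 5 of the 6 people
```

This is why the median, not the mean, is the usual summary for heavily skewed quantities like income.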

This point is more than obvious. When polling institutes ask a question, depending upon how the question is framed, people respond differently.

Secondly, people like to confirm or be nice. They judge what others are expecting as an answer and provide that answer. That's why most new-product launch surveys give very positive answers while the product launches fail terribly in most cases.

Thirdly, the answer can be culture-specific. I had a law professor who would give at most 60 out of 100 to the best student. He reasoned that, despite being top of his class in his student years, he never got above sixty, and thus giving more than 60 wasn't right. Just look at the way professors and teachers grade their students. I knew a professor of English who would read the first answer and, from the second answer on, award marks based upon the word count.

In my case I always put 3 or 4 out of 5 in any evaluation where I have no personal feelings, as it's easy to cross. No need to think.

Item response theory addresses this and a number of related questions: http://en.wikipedia.org/wiki/Item_response_theory.

The items presented in evaluation scales may be of poor quality.

Arnold makes some very good points about rating scales. Another issue is the composition of the audience. The more homogeneous the group taking the test is, the more likely it is that you'll get mean scores above or below the median of the rating scale. In my limited teaching experience, I rarely had homogeneous groups in which everyone liked me or no one liked me. Usually, a small group liked me and the material very much, a small group hated both, and the rest were asleep. So the average score on the evaluation was in the middle of the scale. Comments reflected the extremes.

I agree that this is a poor use of the word "biased", especially compared to the sense in which the word is used in econometrics. And what Les is talking about is skewness, not bias.

So, on a 5-point evaluation, does the mean tend to be 3, with the corollary question, is the distribution symmetric, and thus, more or less normal?

My personal experience is no. At the three different schools I've worked at, the mean is typically closer to 4, with very, very few 1's and 2's. As mentioned above, grade inflation is basically costless, and any prof with half a clue knows how to game the system.

I work A LOT with survey data based on a 1 to 5 scale, and we rarely see 3 as the mean. It's usually, as another poster said, much closer to 4. In general, different questions will obviously have different mean answers, but from my experience it's usually closer to 4 than to 3.

Yes, you can only make an error in one direction at the edges, but what percentage of people make errors on any given question? That's where the Law of Large Numbers comes in: the number of errors is usually insignificant relative to the number of responses to any given question, so it can't pull the mean significantly in any direction.

Even though people at the edge can only err in one direction, this only affects the mean if there are differences in the number of people at the edge (or their error rate). If 5 people make an error and write 2 instead of 1, and 5 people make an error and write 4 instead of 5, when calculating the mean these errors cancel each other out. The same reason that if 5 people who would have said 3 say 2, and 5 who would have said 3 say 4, the mean doesn't change.
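A toy illustration of this cancellation argument, and of how the mean does move once the edge groups are unequal (the counts are invented for the arithmetic, and for simplicity everyone at the edges is assumed to err):

```python
# Five people at each edge err one step inward: the errors cancel exactly
true_even = [1] * 5 + [5] * 5          # true mean 3.0
meas_even = [2] * 5 + [4] * 5          # measured mean also 3.0
print(sum(true_even) / 10, sum(meas_even) / 10)

# With unequal edge groups, the same inward errors pull the mean toward the middle
true_skew = [1] * 2 + [5] * 8          # true mean 4.2
meas_skew = [2] * 2 + [4] * 8          # measured mean 3.6
print(sum(true_skew) / 10, sum(meas_skew) / 10)
```

So the conjectured bias toward the middle requires an imbalance: more respondents (or a higher error rate) at one end of the scale than the other.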

"What do you mean 'biased?' For a measure to be 'biased,' there has to be some 'true' value that it is biased away from. I'm not sure what it would mean, conceptually, to talk about the 'true' values of the responses in these surveys."

I think Arnold means that the effect of a variable on the ranking will be biased downward in magnitude.

The basic assumption that the errors are uncorrelated with the explanatory variables is violated.

If, for example, speaking clearly tended to give someone a high ranking, then the errors would be negatively correlated with speaking clearly.

Why?

Because if someone spoke clearly they would tend to get more 5s which have no possibility of a positive error.

If someone did not speak clearly they would tend to get more 1s which have no possibility of a negative error.

Thus the errors move in the opposite direction of the explanatory variable. This will tend to bias beta-hat downward.

The reverse will be true for an explanatory variable that is negatively correlated with the ranking.
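This mechanism can be checked by simulation: clip a latent rating to the 1-to-5 scale and look at the correlation between the clipping error and the explanatory variable (the variable name "clarity" and all parameters below are hypothetical):

```python
import random

random.seed(2)

n = 50_000
clarity = [random.uniform(-1, 1) for _ in range(n)]       # "speaks clearly" score
latent = [3 + 1.5 * c + random.gauss(0, 0.8) for c in clarity]
rating = [min(5.0, max(1.0, y)) for y in latent]          # clipped to the scale
error = [r - y for r, y in zip(rating, latent)]           # clipping error

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

# Negative: high-clarity latent ratings above 5 can only be clipped down,
# low-clarity latent ratings below 1 can only be clipped up
print(corr(clarity, error))
```

The negative correlation between the error and the regressor is exactly the violation described above, and it is what attenuates beta-hat.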