Teaching a class with 90 students gives me a newfound appreciation of multiple-choice tests. One of my objections to them has been that questions are not robust, in that students’ answers may not reflect their knowledge.

My concern is that multiple-choice tests are subject to type I and type II errors. A student who knows the answer can get a question wrong by misreading it or giving it a more “nuanced” reading (a type I error). Conversely, a student who does not know the answer can get it right by being lucky (a type II error). One way to deal with this is to ask a lot of questions and hope that the law of large numbers is on your side.
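To put rough numbers on the "ask a lot of questions" approach, here is a minimal sketch. It assumes a simple model that is mine, not something from any testing literature: a student knows a fraction `knowledge` of the material, slips on a known question with probability `slip` (a type I error), and guesses an unknown one correctly with probability `guess` (a type II error). All names and parameter values are hypothetical.

```python
import math

def p_correct(knowledge, slip, guess):
    """Chance of a correct answer on one question under the assumed model."""
    # Known questions are answered correctly unless the student slips;
    # unknown questions are answered correctly only by a lucky guess.
    return knowledge * (1 - slip) + (1 - knowledge) * guess

def score_std(knowledge, slip, guess, n_questions):
    """Standard deviation of the fraction of questions answered correctly."""
    p = p_correct(knowledge, slip, guess)
    return math.sqrt(p * (1 - p) / n_questions)

# Illustrative values only: knows 70%, slips 10% of the time,
# guesses right 25% of the time (four answer choices).
for n in (10, 40, 160):
    print(n, round(score_std(0.7, 0.1, 0.25, n), 3))
```

Under these assumptions the spread of the score shrinks with the square root of the number of questions, so quadrupling the test length halves the noise. More questions do nothing about any systematic gap between the expected score and what the student actually knows.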

Another approach is what I would term rank-order multiple choice. You give students a group of three or four questions, each one of which has a different answer. For example,

Of the three statements below about increased trade between a country with a highly-educated work force and a country with a poorly-educated work force, one is true, one is false, and one is uncertain. Mark each statement as true, false, or uncertain.

1. The country with a highly-educated work force will have a comparative advantage in some goods and services, but not in others.

2. The country with the highly-educated work force will tend to run a trade deficit.

3. The country with the highly-educated work force will suffer a decline in productivity and wages.

My claim is that grouping these questions together and effectively asking students to “rank-order” them in terms of truth or falsehood should reduce at least the type I errors. So I’m thinking that this way of asking questions would lead to results that are more robust than simply asking three separate questions.
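The claim above is about type I errors. For type II errors, the effect of grouping can be checked by brute force. The sketch below assumes a student who guesses blindly and uniformly, and it uses a hypothetical answer key with abstract labels (`A`, `B`, `C`) rather than the actual answers to the trade question. It compares three separate three-option questions with one grouped set in which each label must be used exactly once.

```python
from fractions import Fraction
from itertools import permutations, product

KEY = ("A", "B", "C")  # hypothetical answer key; the labels are placeholders

def guess_stats(answer_sheets, key):
    """Return (expected number correct, chance of all correct) over equally
    likely answer sheets, i.e. for a student guessing blindly."""
    sheets = list(answer_sheets)
    n_correct = [sum(a == k for a, k in zip(sheet, key)) for sheet in sheets]
    expected = Fraction(sum(n_correct), len(sheets))
    all_right = Fraction(sum(c == len(key) for c in n_correct), len(sheets))
    return expected, all_right

# Three separate questions: any label may be repeated (27 possible sheets).
independent = guess_stats(product(KEY, repeat=3), KEY)

# Grouped format: each label is used exactly once (6 possible sheets).
grouped = guess_stats(permutations(KEY), KEY)

print("independent:", independent)
print("grouped:    ", grouped)
```

Under this guessing assumption the average number of lucky hits is one question in both formats, but a blind guesser gets the whole group right 1 time in 6 rather than 1 time in 27. So if the grouped format is more robust, the gain would have to come from fewer type I errors, not from fewer lucky guesses.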

Here is another example:

In questions 4-7 below, rank the standard of living of the following people. Put the highest-ranking person in 4, the next-highest-ranking person in 5, etc.

(A) someone in the 50th percentile of the Mexican income distribution in 2000

(B) someone in the 20th percentile of the U.S. income distribution in 2000

(C) someone in the 20th percentile of the U.S. income distribution in 1970

(D) someone in the 75th percentile of the U.S. income distribution in 1850

Another example might be to rank four goods in terms of their likely elasticity of demand.

My hypothesis is that putting questions in this format will cause students to see possible errors and correct them, thereby reducing type I errors.

For Discussion. What suggestions do you have for giving multiple-choice tests that are robust?