Data Molesters

In my opinion, the authors of The Logic of Political Survival should not be criticized for mishandling data.

They should be arrested. Imprisoned. Only released back into the community with warnings to neighbors to protect your children.

I ordered the book after listening to an econtalk interview with one of the authors, Bruce Bueno de Mesquita. He and his co-authors argue that political dynamics differ between broad-based democracies and narrow-based governments. In narrow-based governments, the leader buys off a narrow coalition with private goods. In broad-based democracies, the leader needs to provide public goods, such as property rights.

The authors describe the determinants of these dynamics in terms of three quantitative variables:

W = the size of the winning coalition, the minimum number of supporters that the leader needs to obtain power

S = the size of the “selectorate,” the group of people eligible to participate in the winning coalition. If the winning coalition is a majority of voters, then the selectorate is the eligible voting population.

W/S = the ratio of the two

Throughout the book, I struggled with how these concepts might be given precise definitions that permit real-world measurement. The authors present many regression results involving these variables, but the first time I skimmed through the book I did not look closely at either the regressions or the descriptions of the variables. When I did take a closer look, here is what I found on pages 134-135:

The POLITY IV collection of data…include a number of institutional variables…We use another POLITY variable, Legislative Selection (LEGSELEC), as an initial indicator of S…

POLITY codes this variable as a trichotomy, with 0 meaning that there is no legislature. A code of 1 means that the legislature is chosen by heredity or ascription or is simply chosen by the effective executive. A code of 2, the highest category, indicates that members of the legislature are directly or indirectly selected by popular election…We divide LEGSELEC by its maximum value of 2 so that it varies between 0 and 1.

When I teach statistics in high school, one of the basic concepts is the difference between a quantitative variable (something like inches or dollars) and a categorical variable (something like Democrat, Republican, or independent). The authors’ theory of W and S describes quantitative variables. Instead, to obtain S, they use a categorical variable that has three categories. To get from a categorical variable to a quantitative variable, they treat the coding convention (0, 1, or 2) as if it were a scale.

Among other problems, this elementary error means that when you create W/S, there is a division-by-zero problem. Eventually, the authors appear to have figured this out, for they write,

We therefore construct a variable…by dividing W by (log((S+1)*10))/3. For convenience, we refer to this variable as W/S. We make this transformation of S to avoid division by zero…
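To see what the quoted transformation does to the three possible values of S, here is a minimal Python sketch. The quotation does not say which logarithm is meant, so the sketch tries both the natural log and log base 10; that choice, and the function names, are my assumptions and not anything taken from the book.

```python
import math

def s_denominator(s, log=math.log):
    """Denominator from the quoted formula: log((S + 1) * 10) / 3.

    The log base is not given in the quoted passage; natural log is
    only a default chosen for this sketch.
    """
    return log((s + 1) * 10) / 3

def w_over_s(w, s, log=math.log):
    """The variable the authors 'for convenience' call W/S."""
    return w / s_denominator(s, log)

# The only values S can take: LEGSELEC (0, 1, 2) divided by 2.
S_VALUES = [0.0, 0.5, 1.0]

for s in S_VALUES:
    natural = s_denominator(s)
    base10 = s_denominator(s, math.log10)
    print(f"S={s}: denominator (ln)={natural:.3f}, (log10)={base10:.3f}")
```

Under either base the denominator is strictly positive for all three codes, which is all the transformation is needed for. It does nothing to turn the three categories into a measured quantity.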

It also turns out that W is not a quantitative variable, either. Rather, it is the sum of five binary variables. For example,

When REGTYPE is not missing data and is not equal to code 2 or 3, so that the regime type is not a military or military/civilian regime, we award one point to W.

My point is this: The correct way to use categorical data on the right-hand side of a regression is to preserve it as categorical data, using what are called dummy variables. In the case of LEGSELEC, you would create two variables, call them S1 and S2. S1 would have a value of 1 when the legislature is selected by heredity or ascription, and 0 otherwise. S2 would have a value of 1 when the legislature is popularly elected, and 0 otherwise. Then enter S1 and S2 as separate variables into the regression, and let the data select the coefficients on the two variables. What the authors have done amounts to arbitrarily constraining the coefficient on S2 to be exactly twice the coefficient on S1, because their rescaled variable takes the value 0.5 in the first case and 1 in the second. Even the authors regard this constraint as implausible when they say,

It should be evident that in reality the size difference between a selectorate score of 0 and a selectorate score of 0.5 is smaller than the size difference between a score of 0.5 and a score of 1.
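The constraint can be made concrete with a short Python sketch. The first part is plain algebra: a single coefficient on LEGSELEC/2 is the same thing as two dummy coefficients in a fixed 1:2 ratio. The second part uses a handful of invented outcome numbers (they are not from the book or from POLITY) to show what "letting the data select the coefficients" means. It relies on the standard fact that, with an intercept and both dummies, least squares reduces to differences in group means.

```python
from statistics import mean

def scaled_s(legselec):
    """The authors' coding: LEGSELEC divided by its maximum of 2."""
    return legselec / 2

def dummy_s(legselec):
    """Dummy coding: (S1, S2), with code 0 (no legislature) as baseline."""
    return (int(legselec == 1), int(legselec == 2))

def implied_dummy_coefs(beta):
    """Coefficients on (S1, S2) implied by one coefficient on LEGSELEC/2."""
    return (beta * scaled_s(1), beta * scaled_s(2))

beta = 3.0  # arbitrary illustrative value
b1, b2 = implied_dummy_coefs(beta)
for code in (0, 1, 2):
    s1, s2 = dummy_s(code)
    # Both codings give the same predicted effect for every category.
    assert beta * scaled_s(code) == b1 * s1 + b2 * s2
print(b1, b2)  # 1.5 3.0: b2 is twice b1 whatever beta is

def fit_dummies(codes, outcomes):
    """Unconstrained dummy coefficients as differences in group means."""
    groups = {c: [y for k, y in zip(codes, outcomes) if k == c]
              for c in (0, 1, 2)}
    base = mean(groups[0])
    return (mean(groups[1]) - base, mean(groups[2]) - base)

# Invented toy data, for illustration only.
toy_codes = [0, 0, 1, 1, 2, 2]
toy_outcomes = [1.0, 1.0, 1.5, 1.5, 4.0, 4.0]
free_b1, free_b2 = fit_dummies(toy_codes, toy_outcomes)
```

In the toy data the unconstrained coefficient on S2 is six times the one on S1, a pattern the scaled coding could never report because it fixes the ratio at two in advance.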

Similarly, by constructing W as the sum of the five binary variables, the authors are arbitrarily constraining their coefficients to equal one another. Instead, each of the five variables that make up W ought to be treated as a binary variable, and entered separately into the regression equation. That way, you would have some idea whether the data support the constraint that each of the coefficients ought to be equal.
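The same point for W, as a minimal sketch. The five components are left as unnamed 0/1 indicators because only the REGTYPE rule is quoted here, and the coefficient values are hypothetical numbers picked to make the contrast visible.

```python
def w_index(components):
    """W as described: the sum of five binary (0/1) indicators."""
    return sum(components)

def index_effect(beta, components):
    """Predicted effect when only the summed index enters the regression."""
    return beta * w_index(components)

def separate_effect(coefs, components):
    """Predicted effect when each indicator has its own coefficient."""
    return sum(b * x for b, x in zip(coefs, components))

# Two hypothetical regimes that each earn 2 points, from different indicators.
regime_a = (1, 1, 0, 0, 0)
regime_b = (0, 0, 0, 1, 1)

# With the index, the two regimes are indistinguishable by construction.
same_under_index = index_effect(2.0, regime_a) == index_effect(2.0, regime_b)

# With separate (hypothetical) coefficients, they need not be.
coefs = (0.5, 0.5, 1.0, 3.0, 3.0)
effect_a = separate_effect(coefs, regime_a)
effect_b = separate_effect(coefs, regime_b)
```

Using the index is equivalent to calling `separate_effect` with five identical coefficients. Entering the indicators separately is what lets you check whether that equality is anywhere near what the data say.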

Of course, it is only interesting to test constraints on the data that are imposed by theory. In this case, the constraints are being imposed by simple incompetence.

The Logic of Political Survival is a stimulating and provocative book. I was impressed by the authors’ use of historical examples, particularly the use of King Leopold’s different approaches to governing Belgium and the Congo as a “natural experiment” demonstrating that institutional characteristics matter more than the leader’s personality. However, in my view, the attempts to introduce formal game theory and econometrics did more harm than good. Rather than bridge the gulf between political scientists and economists, they widened it–as far as I am concerned–by their shameful and unseemly conduct with the data.

READER COMMENTS

dearieme

Mar 4 2007 at 9:35am

A friend keeps telling me that “All medical research is rubbish”. Should we extend it to “Medical research is useless and much Social Science research is worse than useless”?

Martin

Mar 4 2007 at 11:11am

Arnold,

This is why Economics fails the scientific smell test.

Economic mathematical modelling seems to involve way too much use of the word ‘If’.

There are no ‘Ifs’ in ‘1 + 1 = 2’, no ‘Ifs’ in ‘E = MCsquared’.

Economics’ ONLY mathematical, scientific absolute is the law of supply and demand. Everything else is social, but not science.

Buzzcut

Mar 5 2007 at 12:36pm

I don’t know about E=mc^2, but a lot of mathematical physics modeling “doesn’t pass the smell test” under certain conditions. That’s why modern physics came about, because Newtonian physics failed at the subatomic level.

The difference between physics and economics is that physicists can do controlled experiments, so their confidence in their math models is much, much higher.

Martin

Mar 5 2007 at 3:33pm

Thank you.

I rest my case.

E

Mar 5 2007 at 9:15pm

To the authors’ credit, the data from the book are posted here:

http://www.nyu.edu/gsas/dept/politics/data/bdm2s2/Logic.htm

From the replication data, it should be possible to tell whether the implicit constraints are reasonable or not. And that’s a good thing. For much empirical research in economics and political science, it is quite challenging (to be kind) to obtain replication data.

Martin

Mar 6 2007 at 4:11am

You might be right – but in Newtonian physics, even although much of it fails at ‘Buzzcut’s’ sub-atomic level, if you drop an apple it will still always fall to the ground.

Arnold Kling
