Freeze - Statistics Police!

Heard of the study that finds that beautiful women have more girls and intelligent men have more boys? Andrew Gelman finds that at least the first finding is suspect on multiple levels.

  • You get a statistically significant effect if you compare the top quintile of beauty to the bottom four quintiles. The effect goes away if you simply estimate the sex ratio as a function of beauty.

  • The author apparently had three measures of beauty, but only used one. He should have averaged them instead of picking one. Why didn't he?

  • The author miscalculated the implied probabilities of his logit model. His estimate implies an 8% effect, not a 26% effect. (Journalists then somehow inflated 26% into 36%!)
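  The gap between a logit coefficient and the change in probability it implies is easy to check by hand. Here is a minimal sketch in Python. The intercept and coefficient are hypothetical values picked only for illustration, not the study's actual estimates; the point is just that the odds ratio overstates the shift in probability.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical values for illustration only (NOT the study's estimates).
intercept = 0.0  # log-odds of the outcome in the baseline group
coef = 0.3       # logit coefficient on the group indicator

p_base = inv_logit(intercept)
p_group = inv_logit(intercept + coef)

# exp(coef) is the odds ratio, which is not a ratio of probabilities.
odds_ratio = math.exp(coef)
prob_diff = p_group - p_base

print(f"odds ratio:             {odds_ratio:.3f}")
print(f"baseline probability:   {p_base:.3f}")
print(f"group probability:      {p_group:.3f}")
print(f"probability difference: {prob_diff:.3f}")
```

  With these made-up numbers the odds rise by roughly a third, while the probability itself moves by well under ten percentage points. Reading the first figure as if it were the second is the kind of slip described above.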

    Gelman avers that he doesn't want to set himself "up as some sort of statistical policeman." Why not, I ask? The academy could use a good, honest cop on the beat.

    Alex writes:

    Why don't more people just use probit? If you know any statistics, the models are much easier to interpret, and with modern stats packages the algorithms are about as reliable and fast as logit.

    Andrew Gelman writes:


    One reason I don't want to set myself up as the statistics police is that I don't want to discourage people from doing innovative research--that is, I'm sensitive to Andrew Oswald's comments discussed here.

    The bigger reason is that I'm more interested in positive findings (my own research and others'). It would be pointless to go through thousands of research articles looking for statistical flaws. Some of this is good (for teaching purposes, to discover some general principles of statistical reasoning, and occasionally to deflate some overhyped claim) but spending too much effort on this would reduce my time for better work.

    zoevans writes:

    Until I read this thread I hadn't ever given any thought to why beautiful women seemingly have more daughters, or why intelligent men tend to have more sons, than unattractive women and average-thinking men. I don't know if I agree with the author. I think that in any attempt to measure beauty or intelligence, neither of which has a standard scale on which it can be measured precisely or accurately, the “label” that a person obtains is purely subjective and completely relative. Overall the idea is definitely feasible but somewhat far-fetched.

    rvman writes:

    Andrew Gelman writes:

    >The bigger reason is that I'm more interested in positive findings (my own research and others').

    Even false positives? The purpose of the statistical policeman is to separate the true positives from the false ones.

    This preference for positive findings results in a potential problem: since only positive findings get published, reported significance statistics are skewed. If everyone uses a 5% significance level (95% confidence), then only stuff which reaches that threshold gets published. Wrong ideas will come out significant 5% of the time. If 5% of ideas are right and 95% wrong, and every right idea is detected, then about half of all positive results will be true positives and half false positives (5% of ideas against 95% x 5% = 4.75%). Potentially, half of all published material will be wrong. This result is even worse if people fudge the statistics, if some right ideas go undetected, or if the share of 'good ideas' is lower.
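    The arithmetic behind that "about half" can be sketched in a few lines of Python. The function name and the power parameter are my own assumptions for illustration; the comment above implicitly takes power to be 100%, meaning every true effect gets detected.

```python
def share_true_positives(prior_true: float, alpha: float, power: float) -> float:
    """Fraction of significant results that reflect a real effect.

    prior_true: share of tested ideas that are actually right
    alpha:      significance threshold (false positive rate for wrong ideas)
    power:      chance that a right idea reaches significance (an assumption)
    """
    true_pos = prior_true * power
    false_pos = (1.0 - prior_true) * alpha
    return true_pos / (true_pos + false_pos)


# The scenario in the comment: 5% of ideas right, 5% threshold, perfect power.
print(f"{share_true_positives(0.05, 0.05, 1.0):.3f}")

# Same scenario with a hypothetical 50% power: the share of true positives drops.
print(f"{share_true_positives(0.05, 0.05, 0.5):.3f}")
```

    Lowering either the power or the share of right ideas pushes the fraction of true positives below one half, which is the "even worse" case the comment mentions.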

