I was pretty skeptical of the notion of "forecasting" anything using Google data, but the Google article is not really talking about forecasting as such, but rather getting data on current happenings more quickly than you could get it by waiting on the official statistics. That seems a lot more plausible.
My first thought was to wonder how well Google searches really correlate with actual behavior. I've only skimmed the paper, but it looks like their model contains some adjustable parameters, and the phrase "cross-validation" doesn't appear anywhere in the text. That isn't a good sign, but one would have to read the procedure more carefully to untangle the subtleties.
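As a rough illustration of the kind of out-of-sample check being asked for here, the sketch below compares a plain autoregressive baseline against the same model with a search index added. Each model is refit on the data up to month t-1 and scored on month t. Everything in it is an assumption made for illustration: the data are synthetic, the names `make_synthetic` and `rolling_mae` are hypothetical, and this is not the procedure used in the Google paper.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (small problems only)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(k)]

def rolling_mae(rows, y, min_train=24):
    """Refit on observations before t, predict t, average the absolute errors."""
    errors = []
    for t in range(min_train, len(y)):
        beta = ols(rows[:t], y[:t])
        pred = sum(bi * xi for bi, xi in zip(beta, rows[t]))
        errors.append(abs(y[t] - pred))
    return sum(errors) / len(errors)

def make_synthetic(n=80, seed=0):
    """Synthetic 'sales' series plus a noisy same-month 'search' signal (assumed)."""
    rng = random.Random(seed)
    sales = [100.0]
    for _ in range(n - 1):
        sales.append(20 + 0.8 * sales[-1] + rng.gauss(0, 2))
    searches = [s + rng.gauss(0, 1) for s in sales]
    return sales, searches

sales, searches = make_synthetic()
y = sales[1:]
# Baseline: intercept plus last month's sales.
baseline = [[1.0, sales[t - 1]] for t in range(1, len(sales))]
# Augmented: the baseline plus this month's search index.
with_search = [[1.0, sales[t - 1], searches[t]] for t in range(1, len(sales))]

mae_base = rolling_mae(baseline, y)
mae_search = rolling_mae(with_search, y)
print(f"baseline MAE: {mae_base:.2f}  with search index: {mae_search:.2f}")
```

Because the synthetic search series is built to track sales closely, the augmented model should score better here. The point of the exercise is that the comparison is made on months the model never saw during fitting, which is what guards against adjustable parameters simply memorizing the sample.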
My second thought was: how easily could these statistics be manipulated? If people come to rely on Google Trends and its models, then the operators of, say, a large botnet could generate a bunch of bogus searches to create the appearance of a fake recovery in, say, retail sales. Stocks of retailers would presumably surge, creating an opportunity for the perpetrators to profit using short sales or judicious options purchases.
Google is justifiably proud of its mammoth data set, but I don't think they've given too much thought to quality controlling the data. QC can be a real headache even in cases where the data source is well understood. For example, in meteorology bad ASOS and radiosonde observations sometimes make it into models, and good ones are sometimes erroneously rejected. Both types of error have been known to compromise forecasts, and I would aver that human users are even less predictable than weather instruments. Therefore, I would conjecture that the QC problem will be a deal-breaker for using Google Trends data as a significant economic indicator.
Dear Dr. Kling,
Thanks for the excellent blog. Related to this post, you might want to have a look here:
http://messymatters.com/2009/03/21/the-future-is-yesterday/
Many of the series that correlate with Google Trends data can be forecast just as well, or better, using standard data and simple techniques.
Thank you.
My old team at Google :). (Although I worked on other stuff - auctions, competitiveness of the search market)
rpl - you are totally wrong about the QC. Keep in mind that Google searches => Google ads => Google revenue, and that Google bills its advertisers on that basis. False searches mean false billing for ads. As a company that cares about providing long-term value, Google goes to enormous effort to identify many kinds of fraudulent search ("search spam", "ad spam") and eliminate them from its records.
Sure, it is imperfect, but QC is *not* ignored. Lots of effort goes into QCing that data because advertisers are billed based on it.