I was pretty skeptical of the notion of "forecasting" anything using Google data, but the Google article is not really talking about forecasting as such. It is about getting data on current happenings more quickly than you could by waiting for the official statistics. That seems a lot more plausible.
My first thought was to wonder how well Google searches really correlate with actual behavior. I've only skimmed the paper, but it looks like their model contains some adjustable parameters, and the phrase "cross-validation" doesn't appear anywhere in the text. That isn't a good sign, but one would have to read the procedure more carefully to untangle the subtleties.
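For what it's worth, the kind of out-of-sample check that "cross-validation" refers to might look something like the sketch below. Everything in it is an assumption made for illustration: the data are synthetic, the names `rolling_origin_mae`, `search`, and `sales` are hypothetical, and the model is a plain AR(1) regression fit by least squares, not the model in the Google paper. The idea is just to compare one-step-ahead errors with and without the search index, always fitting on data from before the month being predicted.

```python
import numpy as np

def rolling_origin_mae(y, X, min_train=24):
    """One-step-ahead out-of-sample mean absolute error.

    Uses an expanding window: the model for period t is fit only on
    observations strictly before t, so no future data leaks in.
    """
    errors = []
    for t in range(min_train, len(y)):
        # Ordinary least squares on the training window.
        coef, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        errors.append(abs(y[t] - X[t] @ coef))
    return float(np.mean(errors))

# Synthetic data, for illustration only (an assumption, not real statistics).
rng = np.random.default_rng(0)
n = 120
search = rng.normal(size=n)   # hypothetical search-volume index
sales = np.zeros(n)           # hypothetical official sales series
for t in range(1, n):
    sales[t] = 0.6 * sales[t - 1] + 0.8 * search[t] + rng.normal(scale=0.5)

y = sales[1:]                 # value to predict
lagged = sales[:-1]           # last period's value
ones = np.ones(n - 1)         # intercept column

X_base = np.column_stack([ones, lagged])              # baseline: AR(1) only
X_aug = np.column_stack([ones, lagged, search[1:]])   # AR(1) plus search index

mae_base = rolling_origin_mae(y, X_base)
mae_aug = rolling_origin_mae(y, X_aug)
print("baseline MAE:", mae_base)
print("with search index MAE:", mae_aug)
```

In this toy setup the search index helps by construction. With real series the comparison could go either way, which is the reason to run it on held-out periods instead of trusting the in-sample fit.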
My second thought was, how easily could these statistics be manipulated? If people come to rely on Google Trends and its models, then the operators of, say, a large botnet could generate a bunch of bogus searches to create the false appearance of a recovery in, say, retail sales. Stocks of retailers would presumably surge, creating an opportunity for the perpetrators to profit using short sales or judicious options purchases.
Google is justifiably proud of its mammoth data set, but I don't think they've given much thought to quality control (QC) of the data. QC can be a real headache even in cases where the data source is well understood. For example, in meteorology bad ASOS and radiosonde observations sometimes make it into models, and good ones are sometimes erroneously rejected. Both types of error have been known to compromise forecasts, and I would aver that human users are even less predictable than weather instruments. Therefore, I would conjecture that the QC problem will be a deal-breaker for using Google Trends data as a significant economic indicator.
Dear Dr. Kling,
Thanks for the excellent blog. Related to this post, you might want to have a look here:
http://messymatters.com/2009/03/21/the-future-is-yesterday/
Many of the series that correlate with Google Trends data can be forecast just as well or better using standard data and simple techniques.
Thank you.
My old team at Google :). (Although I worked on other stuff: auctions, competitiveness of the search market.)
rpl - you are totally wrong about the QC. Keep in mind that Google searches => Google ads => Google revenue, and Google's advertisers are billed on that basis. False searches mean false billing for ads. As a company that cares about providing long-term value, Google goes to enormous effort to identify many kinds of fraudulent search ("search spam", "ad spam") and eliminate them from its records.
Sure, it is imperfect, but QC is *not* ignored. Lots of effort goes into QCing that data because advertisers are billed based on it.