I was pretty skeptical of the notion of "forecasting" anything using Google data, but the Google article is not really talking about forecasting as such, but rather getting data on current happenings more quickly than you could get it by waiting on the official statistics. That seems a lot more plausible.
My first thought was to wonder how well Google searches really correlate with actual behavior. I've only skimmed the paper, but it looks like their model contains some adjustable parameters, and the phrase "cross-validation" doesn't appear anywhere in the text. That isn't a good sign, but one would have to read the procedure more carefully to untangle the subtleties.
My second thought was, how easily could these statistics be manipulated? If people come to rely on Google Trends and its models, then the operators of, say, a large botnet could generate a bunch of bogus searches to create the appearance of a recovery in, say, retail sales. Stocks of retailers would presumably surge, creating an opportunity for the perpetrators to profit using short sales or judicious options purchases.
Google is justifiably proud of its mammoth data set, but I don't think they've given much thought to quality control of the data. QC can be a real headache even in cases where the data source is well understood. For example, in meteorology bad ASOS and radiosonde observations sometimes make it into models, and good ones are sometimes erroneously rejected. Both types of error have been known to compromise forecasts, and I would aver that human users are even less predictable than weather instruments. Therefore, I would conjecture that the QC problem will be a deal-breaker for using Google Trends data as a significant economic indicator.
Dear Dr. Kling,
Thanks for the excellent blog. Related to this post, you might want to have a look here:
http://messymatters.com/2009/03/21/the-future-is-yesterday/
Many of the series that correlate with Google Trends data can be forecast just as well or better using standard data and simple techniques.
Thank you.
My old team at Google :). (Although I worked on other stuff - auctions, competitiveness of the search market)
rpl - you are totally wrong about the QC. Keep in mind that Google searches => Google ads => Google revenue, which comes from billing Google's advertisers. False searches mean false billing for ads. As a company that cares about providing long-term value, Google goes to enormous effort to identify many kinds of fraudulent searches ("search spam", "ad spam") and eliminate them from its records.
Sure, it is imperfect, but QC is *not* ignored. Lots of effort goes into QCing that data because advertisers are billed based on it.