Study: Google Flu Trends faulty
HOUSTON — Google Flu Trends (GFT) may not be all it's cracked up to be, according to a recent study funded by a grant from the National Science Foundation. The study, published in the journal Science, also suggested that aggregated big data tools more broadly may have their faults.
"Google Flu Trends is an amazing piece of engineering and a very useful tool, but it also illustrates where 'big data' analysis can go wrong," said Ryan Kennedy, a University of Houston political science professor. He and co-researchers David Lazer (Northeastern University/Harvard University), Alex Vespignani (Northeastern University) and Gary King (Harvard University) detailed new research about the problematic use of big data from such aggregators as Google.
Even after years of modifications, GFT, the tool that set out to improve response to flu outbreaks, has overestimated peak flu cases in the United States over the past two years, the researchers noted.
"Many sources of 'big data' come from private companies, who, just like Google, are constantly changing their service in accordance with their business model," Kennedy said. "We need a better understanding of how this affects the data they produce; otherwise we run the risk of drawing incorrect conclusions and adopting improper policies."
GFT overestimated the prevalence of flu in the 2012-2013 season and overshot the actual level of flu in 2011-2012 by more than 50%, according to the research. Additionally, from August 2011 to September 2013, GFT over-predicted the prevalence of flu in 100 out of 108 weeks.
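For readers who want to check the arithmetic behind those figures, the short Python sketch below works out the share of weeks in which GFT over-predicted. The week counts come from the article. The helper `relative_overestimate` and the sample values passed to it are hypothetical, included only to show what an overestimate of 50% means; they are not data from the study.

```python
# Figures reported in the article: GFT over-predicted flu prevalence
# in 100 of the 108 weeks from August 2011 to September 2013.
weeks_overpredicted = 100
weeks_total = 108

share_overpredicted = weeks_overpredicted / weeks_total
print(f"Share of weeks over-predicted: {share_overpredicted:.1%}")


def relative_overestimate(predicted, actual):
    """Return how far a prediction exceeds the actual value, as a fraction of the actual value."""
    return (predicted - actual) / actual


# Hypothetical values for illustration only (not from the study):
# a prediction of 3.0 against an actual level of 2.0 is a 50% overestimate.
example = relative_overestimate(predicted=3.0, actual=2.0)
print(f"Illustrative overestimate: {example:.0%}")
```

By this calculation, GFT's estimate ran high in roughly 93 percent of the weeks examined.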
The team also questioned data collected from platforms such as Twitter and Facebook, including measures like polling trends and market popularity, since campaigns and companies can manipulate these platforms to ensure their products are trending.
Still, the article contends there is room for data from the Googles and Twitters of the Internet to combine with more traditional methodologies, in the name of creating a deeper and more accurate understanding of human behavior.
"Our analysis of Google Flu demonstrates that the best results come from combining information and techniques from both sources," Kennedy said. "Instead of talking about a 'big data revolution,' we should be discussing an 'all data revolution,' where new technologies and techniques allow us to do more and better analysis of all kinds."