WITH THE continuous growth of the Internet and social media, the term “Big Data” has become the new buzzword in business and technology, displacing “data mining” and other similar terms associated with the use of information from huge databases.
The term “Big Data” was coined in 1997 by National Aeronautics and Space Administration (NASA) researchers Michael Cox and David Ellsworth to describe the challenge of processing and analyzing the vast amounts of information generated by supercomputers.
Although Big Data doesn’t pertain to any specific quantity, most tech industry players say it often refers to petabytes of data, with one petabyte equal to 1,000 terabytes.
Research firm McKinsey Global Institute defined Big Data as any database “whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.”
In general, for data to be considered Big Data, they must have volume, velocity and variety.
Jose Ramon G. Albert, a research fellow at the government think tank, Philippine Institute for Development Studies, compared Big Data and the traditional sources of official statistics in a research note titled “Big Data for measuring progress and development: Big insights or big issues?”
“There is undoubtedly a growing enthusiasm about this data revolution and its possibilities for making use of Big Data, especially for measuring and monitoring progress in societies,” Mr. Albert said, citing “Google Flu Trends” as an example.
In 2008, Google established a near real-time flu tracker called “Google Flu Trends” that monitored Google searches on the flu. Jeremy Ginsberg and other researchers, writing in the science journal Nature in 2009, reported that Google’s flu incidence estimate correlated strongly with the official statistics released by the US Centers for Disease Control and Prevention (CDC).
The article noted that, remarkably, the Google statistics on flu incidence were based on aggregates of flu-related searches and were available with a delay of just one day, while the CDC’s official statistics were based on administrative reports from hospitals and took a week to put together.
Mr. Albert noted that while official statisticians are taking note of this emerging alternative data source, “there is, however, some apprehension about Big Data since these are not tailor-made for statistical purposes and thus could yield inaccurate statistics.”
“Data sources in official statistics have been tried and tested mechanisms for ensuring credibility… Big Data, however, are like a tsunami of digital exhaust that can be a messy collage of data points collected for distinct purposes but whose accuracy is difficult to establish,” Mr. Albert explained.
Big Data are largely unstructured, unfiltered “data exhaust” from digital products such as online searches and social media, are unregulated, have high variety and velocity, and cost little or nothing. On the other hand, official statistics are structured and planned, regulated, based on high-volume primary data, and costly.
Nonetheless, Mr. Albert pointed out that the country has started to use Big Data in disaster risk management.
In June 2012, the government started a flagship project called Nationwide Operational Assessment of Hazards (NOAH), which involved the development of Hydromet sensors (state-of-the-art weather tracking equipment) and high-resolution geo-hazard maps.
Project NOAH could give the government a lead time of up to about six hours to act and thus could minimize the damage to lives, property, and livelihood from natural disasters.
Mr. Albert, in his research note, pointed out that these high-velocity and high-volume data have helped national and local governments prepare better for disasters.
“There is evidence of how better access to information has saved lives. In 2011, Typhoon Sendong led to 676 deaths in Cagayan de Oro City, but a year later, a typhoon with similar strength (Pablo) had only one associated death reported,” he said.
While there is reason to be excited about Big Data, there are issues that need to be examined given that much of Big Data includes personal data with precise, geo-location-based information.
“While users of technology routinely tick a box to consent to the collection and use of web-generated data and may decide to have some information put on public view, it is unclear whether they actually consent to having their data analyzed, especially if it can be to their disadvantage,” Mr. Albert said.
Still, he explained that while Big Data are here to stay, there is “a need to identify legal protocols and institutional arrangements to access Big Data holdings for development purposes.”
“The data revolution and the use of Big Data do not mean the end of official statistics, but the challenge is to explore how to make use of nontraditional data sources to complement traditional data sources,” Mr. Albert said.