WHAT IS BIG DATA?
While the idea of “big data” is well entrenched in places like phone companies and stock exchanges, it is only beginning to gain traction in other sectors, especially the public and non-profit sectors. Where data comes from (sensors, social media, transaction records, etc.), what forms it takes (text, audio, video, sensor data, etc.), and how it is used (predictive analytics, data management, real-time data, etc.) all contribute to different understandings and uses of “big data”. What is certain is that technology plays an enormous role in collecting and analyzing the data, and as technologies develop, so will the definitions of “big data”.
In 2012, IBM's Institute for Business Value issued a report on the real-world uses of big data, which points out that companies tend to use big data as a predictive tool for improving a customer's experience. This could mean Facebook ads, or product improvements in car manufacturing, where driving data can be analyzed for insights. But the use of big data to target end-consumers specifically needn't be (in fact, shouldn't be) solely a private endeavor. The citizen is an end-consumer too, and big data can produce profound benefits in the public sector. There are many avenues for the civic application of big data; as Tom Lee, director of Sunlight Labs, points out in his article for the Skoll World Forum debate on the social impacts of big data, “participation in digital culture ... means volunteering as an experimental subject”.
WHY USE BIG DATA?
Ideally, big data makes it possible to take a “screenshot” of the things we know, either over time or at a given time, in one place or in many. As we get more data, and as it becomes more varied, that screenshot will contain more information. In a networked world, this means we can quickly get a picture of what Derrick Harris of GigaOM calls “a latent stream of broad societal efficiencies”. It also reveals inefficiencies: we can spot where effort is lacking, or where effort is needlessly duplicated.
With sensors in smartphones, computers, cars, homes, and even toothbrushes, we are quantifying more and more aspects of our lives. The collected data can then be subjected to a host of treatments, such as remixing, mashing, layering, or linking. Sensor data from individual cars might be layered with city traffic data to show drivers where best to park, which route to take, or where to get gas. What is more, sensor data from cars can be relayed to manufacturers so that car troubles might be prevented before they happen. By combining and recombining different kinds of data, we might begin to see correlations we were not previously aware of. Big data showed, for example, that the stabilization of vital signs in prematurely born babies tends to be followed by infections. But this is a correlation, not a causation. This is a particularity of big data analytics: the “why” is not always obvious, but the “what” is there.
By using context, and by creating links and layers between data sets, we can ask meaningful questions of big data. These links often have to be made by real people, but links between disparate data sets are also increasingly made by algorithms. This kind of work is what is building the semantic web, which uses metadata describing the data, enabling computers to comprehend what data is about and how it relates to other data. Computerized or not, big data has revolutionary potential in how we make decisions in general, and, in turn, in how we deal with various policy issues, especially as we move towards more collaborative systems of governance.
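The layering described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the street segments, the speed readings, the traffic counts, and the helper names `layer` and `pearson` are invented for illustration and do not come from any source cited here. The sketch joins two small data sets on a shared key and then measures how strongly the two layered quantities move together. The coefficient it computes says nothing about why they move together.

```python
from math import sqrt

# Hypothetical sample data, invented for this illustration only.
# Average speed reported by car sensors, per street segment.
car_readings = [
    {"segment": "A", "avg_speed_kmh": 42.0},
    {"segment": "B", "avg_speed_kmh": 18.0},
    {"segment": "C", "avg_speed_kmh": 30.0},
    {"segment": "D", "avg_speed_kmh": 12.0},
]
# City traffic counts (vehicles per hour), keyed by the same segment names.
city_traffic = {"A": 120, "B": 480, "C": 300, "D": 560}


def layer(readings, traffic):
    """Join car readings with city counts on the shared segment key."""
    return [
        {**row, "vehicles_per_hour": traffic[row["segment"]]}
        for row in readings
        if row["segment"] in traffic
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


layered = layer(car_readings, city_traffic)
r = pearson(
    [row["avg_speed_kmh"] for row in layered],
    [row["vehicles_per_hour"] for row in layered],
)
# A strongly negative r shows the "what" (speed falls as traffic rises),
# not the "why".
print(f"Pearson r = {r:.2f}")
```

With real data, the join key is rarely this clean, which is one reason the human work of linking data sets matters.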
BIG DATA AND GOVERNANCE
Government is central to the collection and application of big data, first and foremost because governments hold massive archives of public data, but also because government can frame the social or economic policy issues to which big data can be applied. The Obama administration, through new tools such as data.gov and the Federal IT Dashboard, and with initiatives like the Open Government Directive, has made clear the importance of collaborative governance: inviting citizens to use open information and data to solve problems for themselves and for each other. Last year, the administration announced the National Big Data Research and Development Initiative, which coordinates data across various federal agencies and departments. And this year, the White House Office of Science and Technology Policy (OSTP) and the National Science Foundation will host an event to highlight high-impact collaborations using big data (submit your own projects to BIGDATA@nsf.gov by April 22nd).
Big data collaborations between the public and private sectors can spur innovation in all sorts of areas, such as healthcare, public safety, job creation, education, transportation, environment, and energy. Speaking at the Government 2.0 Summit in 2009, Tim O'Reilly pointed out that open data and collaborative platforms could revolutionize governance. He argued that government should act as a platform, partner, or launching pad for creating collective value through collaborative processes. For example, the U.S. Department of Defense's investment in GPS systems spurred countless other applications using that technology. In the same vein, government should now make efforts to open as much public data as possible, where data itself is the utility to be harnessed by innovators. Proprietary data ownership policies should be reworked for openness, reflecting the progression from government 1.0 to 2.0, and data itself should be released for the purpose of collaborative analysis and work, moving to government 3.0. In Information for Impact: Liberating Nonprofit Sector Data, Prof. Beth Noveck describes the “liberation” of Internal Revenue Service data on non-profits through the conversion of Form 990 tax data (detailing the financial, governance and organizational structure of America’s tax-exempt institutions) into machine-readable and easily sharable formats. Making this data free, open, and analyzable invites innovation into the non-profit sector. In his blog post on Wikinomics.com, Nick Vitalari argues that thinking about “less” or “more” government is now outdated, and suggests that collaboration around big data implies a governance structure in which citizens play a greater role in the development of policy.
USING BIG DATA
In Big Data as Society's Watchdog, Tom Lee makes the point that complicated mathematics, statistics, and computation may be dazzling, but the insights they deliver may be extremely subtle. On the other hand, basic human input when looking at disparate data sets may take little time, but lead to enormous discoveries of previously unrecognized connections. Big data analysis is relatively difficult in the public sector, where data tends to be “messier” than that which comes from the private sector, and this is why real people linking data is so important. For example, disparate data sets from social media, NGOs, universities, the UN, and commercial satellites are being layered by people to gain a better understanding of the humanitarian crisis happening between Sudan and South Sudan.
Sandy Pentland, MIT's “big data guy”, has a vision of a big-data world which he says is far more creative than the world Orwell painted in 1984. Big data, he argues in Reinventing Society in the Wake of Big Data, is information about real behavior, not about beliefs. This is the kind of granular information that you “leave behind like breadcrumbs”: transaction records, phone sensor data, location data, etc. With this kind of data, researchers can find connections between how people behave and how the world works, and portray these relationships accurately. Pentland goes on: “Adam Smith and Karl Marx were wrong, or at least had only half the answers. Why? Because they talked about markets and classes, but those are aggregates”. With big data, a researcher can find the micro-patterns and individual records which cause things like political revolution and financial bubbles. This kind of fine-grained information allows us to really understand how our society operates, and, with those understandings, build better systems.
If the healthcare sector in the United States were to leverage big data, it could create $300 billion in value each year, two-thirds of which would come from reducing healthcare expenditure by 8 percent, writes James Manyika for the McKinsey Global Institute in Big Data: The Next Frontier for Innovation, Competition, and Productivity. This is the result of information transparency, information accuracy, precision in tailoring healthcare to individual patients, computerized analytics, and proactive preventative maintenance. But big data initiatives do not happen on their own, nor do they intrinsically benefit society. People must frame the issues and questions and draw connections between the data for big data to be meaningful. The data itself must be open, free, and analyzable. We must incentivize the opening of data, which could be done by offering subsidies or tax breaks to companies and NGOs which release their data. We must also incentivize the analysis of that data for the public good, and can do so through, for example, prize-backed challenges.
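The healthcare figures quoted above can be unpacked with a short worked calculation. This is a sketch that uses only the three numbers in the paragraph ($300 billion, two-thirds, 8 percent); the variable names are illustrative, and the implied spending baseline is derived from those numbers rather than taken from an independent source.

```python
from fractions import Fraction

# Figures as quoted in the paragraph above.
total_value_bn = Fraction(300)        # annual value, in billions of dollars
share_from_savings = Fraction(2, 3)   # portion attributed to lower spending
reduction_rate = Fraction(8, 100)     # 8 percent cut in healthcare expenditure

# Two-thirds of $300 billion is the part that comes from reduced spending.
savings_bn = total_value_bn * share_from_savings

# If that saving equals 8 percent of total spending, the spending baseline
# implied by the quoted figures is the saving divided by 0.08.
implied_expenditure_bn = savings_bn / reduction_rate

print(f"Savings from reduced spending: ${savings_bn} billion")
print(f"Implied annual expenditure: ${implied_expenditure_bn} billion")
```

In other words, the quoted numbers imply roughly $200 billion in annual savings against a spending baseline of about $2.5 trillion.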
COLLIBERATION OF AND THROUGH BIG DATA
The potential of big data can be harnessed for enormous public good, but we will have to actively apply ourselves to this pursuit. On its own, technology guarantees nothing; only people do. This means, for example, that data protection laws which vary from state to state should be consolidated so that data from various places can be analyzed together, which is the whole point of big data. Connections have to be made. By collecting evidence, we can design accurate and effective policies. Big data allows for a fundamental paradigm shift in policy-making: moving from what is best for a population to what is best for a specific individual. In healthcare, for example, medication and treatment can be tailored for specific patients (see The Creative Destruction of Medicine by Eric Topol). In education, people might be able to better choose where best to go and learn (see, for example, the US Dept. of Education's College Affordability and Transparency Center). People can pick phone plans better suited to their actual calling behaviors (see studies by the Citizens Utility Board). And entire communities can make use of big data for improvement: data collected from libraries, schools, businesses, people, etc., when made open, provides the utility for citizens to self-improve, or even compete with other cities (see The Quantified Community, by Esther Dyson, for the former point, and for the latter see the Smaller Cities Unite! project, which partners Providence, R.I., and Copenhagen in an ongoing ideas exchange).
Big data is still emerging, especially in the public and non-profit sectors. And yet already we see its implications: government policies which decentralize information; expanded opportunities for collaboration, where citizens can use evidence to fix not only their own problems but also those of others; increased intermixing between public and private data; and the streamlining of public and private capabilities for maximum effectiveness in serving the common good. Big data is a key feature of our technologically networked society, and makes collaborative governance not only possible but necessary, because, as the old adage goes, with great power comes great responsibility.