We are surrounded by data. Every day we generate petabytes of photographs, messages, emails, comments and mobile tracks. Eric Schmidt, chairman of Google, noted that we create as much information every two days as we did from the dawn of civilization up until 2003. The data can be real-time, linked, bi-directional, geolocated and sometimes even semantic.
However, while we have access to all of this data, we lack the mechanisms to use it effectively at scale. We’re often behind on reading or responding to emails, we merely skim Twitter messages, and we’re lucky if we read our RSS feeds. What we need are tools that provide higher-level analysis of this information — tools to churn the data and glean information and knowledge so that we can make better, more informed decisions. And we need platforms that are as easy to use as those that let us create all this data in the first place.
Data-Driven Decisions
Analysis tools exist in many forms. Some let users ask questions, while others simply monitor general data streams to infer interesting answers. Google’s search interface analyzes implicit clickthroughs and page links in an attempt to provide us with simple answers to our questions. Software such as Mathematica, MATLAB, or newer tools such as R and Hadoop MapReduce provide powerful capabilities to aggregate, collate and operate on data. But these analysis tools are highly specialized and require immense training and expertise to use.
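To make the aggregation idea concrete, the map/reduce pattern that Hadoop popularized can be sketched in a few lines of Python. This is a toy word count, not tied to any particular framework, and the function names are illustrative:

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce step: sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the web is a platform", "the data is the platform"]
print(reduce_phase(map_phase(docs)))
```

Real systems distribute the map and reduce steps across a cluster and shuffle intermediate pairs between machines, but the programming model is exactly this simple — which is part of why it still takes expertise to apply it well at scale.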
So we’re at an interesting crossroads. The web is a platform where anyone can easily publish and consume data, and analysis has the possibility to utilize...