What is Big Data?

This is a topic we have explored several times.

In “Why the 3V’s Are Not Sufficient to Describe Big Data”, Mark van Rijmenam describes what has become the ‘classic’ definition of “Big Data” as put forth by Doug Laney in 2001. Laney defined big data as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). These three dimensions (Volume, Velocity and Variety) have since taken on the name of “the 3V’s”.

After establishing the bona fides of the 3V’s, van Rijmenam goes on to say, “I do think that big data can be better explained by adding a few more V’s”. The other V’s he proposes are: Veracity, Variability, Visualization and Value.

In an InformationWeek article titled “Big Data: Avoid Wanna-V Confusion”, Seth Grimes makes a contrary argument. Grimes counters van Rijmenam by saying that the original 3V’s, while not perfect, work well. He goes on to say that, in his opinion, the drive behind ‘wanna-V’ is rooted in causing division and spin.

A few months ago, in an article titled “In Search of a Definition for Big Data”, I wrote that the implied demand for an arbitrary size specification when creating a definition of “Big Data” makes the definition obsolete the moment it is formulated. That article went on to review an academic paper by Jonathan Stuart Ward and Adam Barker of the University of St. Andrews in Scotland. Their paper surveyed the definitions of “Big Data” from a number of significant industry organizations such as Gartner (3V’s), Intel, Microsoft and Oracle, leading Ward and Barker to offer their own composite definition of “Big Data”:

“Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”

I ended that article by sharing my personal definition of “Big Data”:

Big Data is data that, because of its complexity, its size, or the speed at which it is being generated, exceeds the capability of current conventional methods and systems to extract practical use and value.

This is the definition we’ll go with from here on, at least until a better one comes to mind.