Characteristics of Big Data

Doug Laney originated the 3V’s definition of Big Data: data whose volume, velocity or variety makes it hard to handle with traditional data management tools and techniques. In August last year I proposed what I think is a better definition of Big Data: data growing faster than Moore’s Law.

Many others have talked about extending the 3V’s definition of big data, and one of the proposed additions is a fourth V: “Value”. In my personal view this is somewhere between irrelevant and dangerous. Any data may or may not have value, and that value is highly context sensitive. If you want to know the weather tomorrow, then knowing the stock market closing price from 1897 is of no value. The beauty of big data is that while most of it may be irrelevant, the patterns that emerge from it can be of real interest and value. Furthermore, the value of big data may not become clear until long after it is created: only once we had collected countless tweets from the early years of Twitter did someone realize you might find information relevant to stock prices buried in the stream of “valueless” data.
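To make the “growing faster than Moore’s Law” definition mentioned above a little more concrete, here is a minimal sketch in Python. It rests on assumptions that are mine and not part of the original definition: Moore’s Law is taken as a doubling roughly every two years, data growth is assumed to be steadily exponential, and the function names are purely illustrative.

```python
import math

# Assumption: Moore's Law read as "doubles about every two years".
MOORE_DOUBLING_YEARS = 2.0


def doubling_time_years(start_size, end_size, years):
    """Years needed for a data set to double, assuming steady exponential growth."""
    if start_size <= 0 or end_size <= start_size or years <= 0:
        raise ValueError("expected growth from a positive size over a positive time span")
    return years * math.log(2) / math.log(end_size / start_size)


def grows_faster_than_moore(start_size, end_size, years,
                            moore_doubling_years=MOORE_DOUBLING_YEARS):
    """True if the data set doubles more quickly than the assumed Moore's Law period."""
    return doubling_time_years(start_size, end_size, years) < moore_doubling_years


# Hypothetical figures: 10 TB growing to 80 TB in three years doubles every year.
print(grows_faster_than_moore(10, 80, 3))
# Hypothetical figures: 100 TB growing to 150 TB in three years doubles far more slowly.
print(grows_faster_than_moore(100, 150, 3))
```

The sizes can be in any unit as long as both use the same one, since only their ratio matters. Choosing a different doubling period (18 months is another figure people quote) changes where the line falls, which is why it is a parameter here.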

D. Robinson posted a great article in December called “Big Data- The 4 V's - The Simple Truth; Part 4 - Making Data Meaningful”. It discusses the need for Veracity (whether the data reliably records what is going on) and the problems of Variability (where a system may record different values for the same physical activity on different occasions). However, even these are not defining characteristics of big data; they are interesting attributes of any data collection.

Instead, let me offer some other extensions to the 3V’s definition. You don’t need all of these to have big data, but the more you have, the more likely it is you are dealing with big data.

[Infographic: Characteristics of big data]

1 comment:

cyberH said...

Richard, nice infographic. With the explosion of big data, companies face data challenges in three different areas. First, you know the type of results you want from your data, but they are computationally difficult to obtain. Second, you know the questions to ask but struggle with the answers and need data mining to help find them. Third is data exploration, where you need to reveal the unknowns and look through the data for patterns and hidden relationships. The open source HPCC Systems big data processing platform can help companies with these challenges by deriving insights from massive data sets quickly and simply. Designed by data scientists, it is a complete integrated solution from data ingestion and data processing to data delivery. More info at