What is Big Data?

This is the first post in a series on Big Data. Watch for more!

The success of information technology until now has been built on our ability to comprehensively record transactions, events, or changes of state. We have made great use of this transactional data: optimizing inventory, streamlining processes, automating activity. Now, however, we can track and record the behavior that leads up to, or follows from, these transactions.

People use computers, phones, and the internet to do more and more, and each click and call is recorded. Computers are used for designing, making, selling, buying, and trading, and each step along the way is recorded too. Together these records make up a web of behavioral patterns. Markets are now computerized, and each bid, each offer, each trade is recorded. Each item of news about a market, a company, or a financial asset is also recorded and cross-correlated to the market activity. Systems are observed in finer detail and instrumented in real time, not just when a transaction occurs or a state changes.

By analyzing all this behavior we hope to be able to diagnose, to predict, to intervene; we hope to sell more, to price better, to make more efficiently, to diagnose disease and design treatments. We want this behavioral data because it promises to unlock value commensurate with its volume, velocity and variety. And this behavioral data is, um, big.

This data is so big, in fact, that it is causing problems in the technology world. That’s why it has this name: big data. You might not have heard this term until now, but now that you have read it here, expect to read or hear about it three more times in the next few days. What exactly is big data? All the definitions seem to rest on the notion that it is the problems of size that make it noteworthy. Wikipedia offers: “In information technology, big data is a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools.”

This doesn’t tell us why we have so much data or really why we should care. That is why I started this series on Big Data with this observation: we used to track transactions, and now we are tracking behavior.

Make no mistake, Big Data is about behavior – of people, systems, markets and machines.

At some stage, technology companies will solve the problems that make this data hard to ingest, handle, process, analyze, and understand. It will no longer be big data, because it won’t be too big to manage.

However, without any doubt, behavioral data is here to stay!


Doug Laney said...

Great post. Cool to see the industry finally adopting the "3V"s of big data over 11 years after Gartner first published them. For future reference, and a copy of the original article I wrote in 2001, see: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/. --Doug Laney, VP Research, Gartner, @doug_laney

stefano said...

Nice post, Richard. I agree with you: one distinguishing characteristic of Big Data in common usage is that it describes a broader swath of 'behavioral' data than ever before, which directly relates to both the size of the datasets and the implications of analytics.

You also mention 'intervention' in real-time. IMO THIS is the big challenge of Big Data: it's being used to discover correlations and predictive patterns on GROUPS of individuals better than ever before, but it can be used at the same time to predict INDIVIDUAL behavior and intervene in real time.

Sandy Pentland of MIT says, "Your phone company knows when you are about to buy a Starbucks before you do!" And he means YOU, not someone like you.