My friend Judah Levine sent me a link to a great InfoWorld article, “Big data's pitfall: Answers that are clear, compelling, and wrong.”
It reminded me of an article from The Atlantic that I really enjoyed, “The Data Vigilante,” which is as much about small data as big data but is clearly relevant to the big data world.
All this is to say that big data amplifies the familiar problem of garbage in, garbage out, and it introduces other, more complex problems too. Big data, almost by definition, requires statistical analysis (it’s too big to just look at). How many of us, however, really know enough statistics to choose the right analysis? The chart below from the Psychology Department web page at Muhlenberg College shows some simple examples.
When you read about big data proving this, that, or the other result, can you get a feel for whether any of these problems might be obscuring the truth?
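One way to build that feel is to watch a clear, compelling, and wrong answer come out of pure noise. The sketch below is my own illustration, not something taken from either article or from the Muhlenberg chart. It assumes the pitfall in question is screening many variables against one outcome, and the function names (`pearson`, `strongest_chance_correlation`) are made up for the example. Every column is random, so any strong correlation it finds is an accident of how many columns were searched.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def strongest_chance_correlation(n_rows, n_features, seed=0):
    """Largest |r| between a random target and n_features random columns.

    Hypothetical helper: all data is Gaussian noise, so there is no real
    relationship to find. The seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    # Same 20 rows of noise, but more and more columns to search through.
    for n_features in (1, 10, 100, 1000):
        r = strongest_chance_correlation(n_rows=20, n_features=n_features)
        print(f"{n_features:>5} random features: strongest |r| = {r:.2f}")
```

The more columns you search, the stronger the best accidental correlation tends to look, even though nothing real is there. A headline result that came out of a wide search deserves the question: how many other things were tested along the way?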