Forget YOLO: Why ‘Big Data’ Should Be The Word Of The Year
by Geoff Nunberg
December 20, 2012
Whatever the sticklers say, data isn’t a plural noun like “pebbles.” It’s a mass noun like “dust.”
It’s only when all those little chunks are aggregated that they turn into Big Data; then the software called analytics can scour it for patterns.
You idly click on an ad for a pair of red sneakers one morning, and they’ll stalk you to the end of your days.
It makes me nostalgic for the age when cyberspace promised a liberating anonymity.
I think of that famous 1993 New Yorker cartoon by Peter Steiner: “On the Internet, nobody knows you’re a dog.”
Now it’s more like, “On the Internet, everybody knows what brand of dog food you buy.”
In some circles, Big Data has spawned a cult of infallibility — a vision of prediction obviating explanation and math trumping science.
In a manifesto in Wired, Chris Anderson wrote, “With enough data, the numbers speak for themselves.”
The trouble is that you can’t always believe what they’re saying.
When you’ve got algorithms weighing hundreds of factors over a huge data set, you can’t really know why they come to a particular decision or whether it really makes sense.
When I was working with systems like these some years ago at the Xerox Palo Alto Research Center, we used to talk about a 95 percent solution.
So what if Amazon’s algorithms conclude that I’d be interested in Celine Dion’s greatest hits, as long as they get 19 out of 20 recommendations right?
But those odds are less reassuring when the algorithms are selecting candidates for the no-fly list.
I don’t know if the phrase Big Data itself will be around 20 years from now, when we’ll probably be measuring information in humongobytes.
People will be amused to recall that a couple of exabytes were once considered big data, the way we laugh to think of a time when $15,000 a year sounded like big money.
But 19 out of 20 is probably still going to be a good hit rate for those algorithms, and people will still feel the need to sort out the causes from the correlations — still asking the old question, what are patterns for?