The Best of the Week: Big Data Zone
How do you check if your pizza’s done? You look at it. But what if
you’re mass producing pizza? In "Multivariate Bayesian cognitive
modeling for unsupervised quality control of baked pizzas," the authors
propose using computers specifically trained to look at a pizza and say,
“yup, perfect! Done.”
While helping a MongoDB user with a sharding issue - his chunks
weren't splitting - the author learned an important lesson about big
data and tactfulness.
Hadoop 2.0 is here, and with it come some big changes. The most notable, as detailed in a recent InfoWorld article, is MapReduce 2.0, which now runs on top of a general-purpose resource manager called YARN (Yet Another Resource Negotiator). Take a look and see what may be in store for Big Data.
The HyperLogLog counter is a probabilistic data structure that estimates the number of unique elements in a list. What if we created a HyperLogLog counter for each Reddit user and, for every upvote, updated the corresponding counter with the voter's ID? Given this setup, here’s how we detect a voting ring.
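
For readers who want to see the setup in code, here is a minimal Python sketch of one HyperLogLog counter per user, updated on every upvote. It is an illustration under assumptions, not the linked article's implementation: the `HyperLogLog` class, the `record_upvote` helper, the register count (`p=10`), and the SHA-1 based hash are all hypothetical choices made for this example, and it covers only the counting step, not the detection logic.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Illustrative HyperLogLog counter with 2**p registers (assumed p=10)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the first 8 bytes of SHA-1 (an assumed choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# One counter per user who receives upvotes (hypothetical in-memory store).
voters_by_user = defaultdict(HyperLogLog)


def record_upvote(author, voter_id):
    """Update the author's counter with the ID of the user who upvoted."""
    voters_by_user[author].add(voter_id)


# Example: 1,000 distinct voters upvote "alice", some of them repeatedly.
for voter in range(1000):
    record_upvote("alice", voter)
    record_upvote("alice", voter)  # repeat votes do not change the estimate

estimate = voters_by_user["alice"].count()
```

With 1,024 registers the estimate is typically within a few percent of the true number of distinct voters, while each counter stays a fixed size no matter how many votes it absorbs.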