The Best of the Week (Nov. 1): Big Data Zone
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Nov. 1 to Nov. 7). Here they are, in order of popularity:
Hadoop users might be interested in Hunk, an analytics tool recently released by Splunk that allows users to analyze and visualize data in Hadoop. Big data isn't worth much, after all, unless some sense can be made of it.
The data center is the computer. The pendulum is swinging. Traditional cloud- and virtualization-level resource management in the data center isn't good enough to meet the growing demand for computing services. The answer to this challenge lies in solutions such as Apache Mesos and YARN.
This article is an overall analysis of four methods for processing structured big data. Each method has its own advantages, and which one people choose will depend on the requirements of their project.
[Be sure to read part 1 first!]
After posting his analysis of the Enron email corpus, the author realized that the regex patterns he had set up to capture and filter out the cautionary/privacy messages at the bottoms of people's emails were not working. Let’s have a look at his revised Python code for processing the corpus, and some new results.
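For readers who want a feel for the kind of filtering involved, here is a minimal sketch of stripping a disclaimer footer with a regex. It is not the author's code: the pattern, the `DISCLAIMER_START` name, and the `strip_disclaimer` helper are all assumptions made up for illustration, and real footers in the corpus will vary far more than this.

```python
import re

# Hypothetical pattern: matches a line that looks like the start of a
# confidentiality footer. The author's actual regexes are in the linked article.
DISCLAIMER_START = re.compile(
    r"^[^\n]*(?:confidentiality notice"
    r"|this (?:e-?mail|message)[^\n]*(?:confidential|privileged))",
    re.IGNORECASE | re.MULTILINE,
)


def strip_disclaimer(body: str) -> str:
    """Drop everything from the first disclaimer-looking line to the end."""
    match = DISCLAIMER_START.search(body)
    if match is None:
        return body  # no footer found, leave the message untouched
    return body[: match.start()].rstrip()


email = (
    "Let's meet at 3pm.\n"
    "Thanks,\n"
    "Pat\n"
    "\n"
    "This e-mail is confidential and intended only for the addressee.\n"
    "If you received it in error, please delete it."
)
print(strip_disclaimer(email))
```

A pattern like this can also cut off a message whose body merely mentions that something is confidential, so any such filter should be checked against a sample of the corpus before trusting the results.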
This recent article describes a hands-on performance comparison between GraphChi and a 1,636-node Hadoop cluster. The task set for both was to process a Twitter graph with 1.5 billion edges, and the result, surprisingly enough, was a significantly quicker processing time for GraphChi. How does that work?