The Best of the Week (Nov. 1): Big Data Zone

11.10.2013

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Nov. 1 to Nov. 7). Here they are, in order of popularity:

1. Hunk: A New Data Analytics Tool for Hadoop

Hadoop users might be interested in Hunk, an analytics tool recently released by Splunk that allows users to analyze and visualize data in Hadoop. Big data isn't worth much, after all, unless some sense can be made of it.

2. Apache Mesos: The Datacenter is the Computer

The data center is the computer, and the pendulum is swinging: traditional cloud- and virtualization-level resource management in the data center is no longer enough to meet the growing demand for computing services. Solutions such as Apache Mesos and YARN answer this challenge.

3. 4 Methods for Structured Big Data Computation

This article is an overall analysis of four methods for processing structured big data. Each method has its own advantages, and which one to choose depends on the features of the project.

4. Topic Modeling in Python and R: The Enron Email Corpus, Part 2

[Be sure to read part 1 first!]

After posting his analysis of the Enron email corpus, the author realized that the regex patterns he had set up to capture and filter out the cautionary/privacy messages at the bottom of people's emails were not working. Let's have a look at his revised Python code for processing the corpus, along with some new results.
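To make the idea concrete, here is a minimal sketch of what regex-based footer stripping looks like in Python. The `DISCLAIMER_PATTERNS` list and the `strip_footer` function are illustrative assumptions, not the author's actual code; the patterns he ran against the Enron corpus are more involved.

```python
import re

# Hypothetical footer patterns -- illustrative only, not the regexes from
# the original article. Each one should match from the start of a
# cautionary/privacy block through to the end of the message body.
DISCLAIMER_PATTERNS = [
    re.compile(r"\*{3,}.*", re.DOTALL),  # rows of asterisks before a footer
    re.compile(r"this (e-?mail|message).{0,40}confidential.*",
               re.IGNORECASE | re.DOTALL),
]

def strip_footer(body: str) -> str:
    """Cut the body at the first point where any disclaimer pattern matches."""
    for pattern in DISCLAIMER_PATTERNS:
        match = pattern.search(body)
        if match:
            body = body[:match.start()]
    return body.rstrip()

email = ("Hi team, the revised numbers look good.\n\n"
         "**********\n"
         "This e-mail and any attachments are confidential and intended "
         "solely for the named recipient.")

print(strip_footer(email))  # -> Hi team, the revised numbers look good.
```

Everything from the first matched pattern onward is dropped, so overly broad patterns risk truncating legitimate message text; patterns that are too narrow let footers leak through into the topic model, which is essentially the problem the author had to fix.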

5. How Can a Mac Mini Outperform a 1,636-Node Hadoop Cluster?

This recent article describes a hands-on performance comparison between GraphChi and a 1,636-node Hadoop cluster. The task set for both was to process a Twitter graph with 1.5 billion edges, and the result, surprisingly enough, was a significantly quicker processing time for GraphChi. How does that work?