Alec is a Content Curator at DZone. He lives in Raleigh and spends his free time writing and programming. Alec is a DZone Zone Leader and has published 531 posts at DZone.

The Best of the Week (Nov. 22): Big Data Zone

12.01.2013

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Nov. 22 to Nov. 28). Here they are, in order of popularity:

1. How Python Became the Language of Choice for Data Science

Nowadays, Python is probably the programming language of choice (besides R) for data scientists who prototype, visualize, and run analyses on small and medium-sized data sets. And rightly so, I think, given the large number of available tools. However, it wasn’t always like this.

2. Integrating R with Cloudera Impala for Real-Time Queries on Hadoop

Impala uses Hadoop as its storage engine but replaces MapReduce jobs with distributed queries. R can be integrated with Impala to run fast, interactive queries on top of Hadoop data sets, and the results can then be processed further or visualized within R.

3. An Introduction to Machine Learning With R

This set of slides presents an introduction to machine learning with R. It covers R's strengths as a language and the basic concepts and uses of machine learning, with code samples in R and visualizations of the data for each topic.

4. Data News: What Every Programmer Should Know About Memory, and More

This installment of Arthur Charpentier's regular collection of data science-related links includes a free e-book on "Applied Epidemiology Using R," an argument that statistics are the least important part of data science, and what every programmer should know about memory.

5. Hadoop, MapReduce and Hive: How to Use Non-Java Languages, Such as R

This recent tutorial demonstrates how to use non-Java languages (R in particular) to work with Hadoop data through MapReduce and Hive. Though the tutorial focuses on R, it is also meant to open doors for users working in other languages, such as Python and Ruby, or with Linux commands and shell scripts.
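The usual route for non-Java MapReduce code is Hadoop Streaming, which pipes input records to any executable on stdin and reads tab-separated key/value lines back from stdout, sorting the mapper output by key before it reaches the reducer. The following is a minimal word-count sketch in Python, not code from the tutorial itself; the script name `wordcount.py` and its `map`/`reduce` argument convention are hypothetical choices made for illustration.

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming word count (illustrative sketch, not the tutorial's code).

Hadoop Streaming feeds each input split to the mapper on stdin and expects
tab-separated key/value lines on stdout; the framework sorts mapper output by
key before passing it to the reducer, again on stdin.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts for consecutive lines sharing a key (input must be sorted by key)."""
    # Skip blank lines, then split each record into [key, value] on the first tab.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main():
    # Hypothetical CLI convention: "wordcount.py map" or "wordcount.py reduce".
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        print(out)


if __name__ == "__main__":
    main()
```

Because both stages only read stdin and write stdout, the job can be checked locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce` before the same script is handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (the jar's location depends on the installation). Keeping the logic in plain generator functions also makes each stage easy to test without a cluster.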