
Mitch Pronschinske is the Lead Research Analyst at DZone. Researching and compiling content for DZone's research guides is his primary job. He likes to make his own ringtones, watches cartoons/anime, enjoys card and board games, and plays the accordion. Mitch is a DZone Zone Leader and has posted 2577 posts at DZone.

Cloudera Makes Waves in Hadoop-Land

Along with their first commercial offering, Cloudera is unloading a bunch of new features into their Cloudera Distribution of Hadoop (CDH), which has now reached its third version.  In an interview with DZone, Cloudera representative John Kreisa said, "We're really expanding the definition of what a Hadoop-based platform is."  Eight new projects have been added to the distro that provide job scheduling, workflow sequencing, and the ability to control streaming data sources.  Cloudera is also committing several of their in-house projects to the Apache Hadoop community.  


The Cloudera Hadoop distro simplifies connectivity, execution and performance-related projects.  The new features include a Pig package based on the latest Apache release, along with HBase and ZooKeeper packages, which were previously supported only as part of the contrib repository.  These are now first-class packages in CDH3.  The additional projects in CDH3 are Hive, HBase, Sqoop, Oozie, Flume, Avro, ZooKeeper, Pig, and Cloudera Desktop.  These projects address deployment requirements in the areas of data integration, workflow, scheduling, high-level languages, serialization, UI, fast read/write, and RPC.


Named for its similarity to logging flumes, which also have a 'stream' of 'logs', Flume is a project developed internally at Cloudera that allows streaming data to be managed and captured into Hadoop.  Until today, this software was not publicly available, but now it's part of CDH and is being committed to the Apache Hadoop project.  Kreisa says, "Even medium sized organizations can have hundreds of different data sources that they want to load into a Hadoop cluster."  Medium-sized organizations won't need a large budget to harness Flume.

Cloudera Desktop

Cloudera Desktop has already been available as a package in CDH, and now it is also being sent to Apache, where it will be renamed 'Hadoop User Environment' (HUE).  The Cloudera Desktop GUI lets you build UIs on Hadoop and includes tools that are collected into a desktop environment and delivered as a web app.  The tools simplify cluster administration and job development through a file browser, a job browser, a cluster health console, and a job designer.  The job designer lets you create reusable MapReduce, Streaming, and Pig templates for commonly run jobs.  It also lets you submit MapReduce jobs to your Hadoop cluster right from your browser.

Cloudera Enterprise

Cloudera's first commercial offering will help companies put Hadoop into production.  The offering consists of CDH plus a superset of management tools and support.  Here are the three main areas of advanced management that will be available:

  • Authorization Management + Provisioning:  Extra security layer and direct LDAP integration.
  • Integration, Configuration, and Monitoring:  Mass configuration, monitoring, event handling, and change management through a central console.
  • Resource Management:  Monitor and regulate the usage of cluster resources.

You can download the CDH 3 for free at


Vijay Bhaskar replied on Sun, 2013/10/13 - 4:42am

The latest Cloudera version is 4.4. Cloudera added Impala, which gives very good performance compared with Hive.

