
Mitch Pronschinske is the Lead Research Analyst at DZone. Researching and compiling content for DZone's research guides is his primary job. He likes to make his own ringtones, watches cartoons/anime, enjoys card and board games, and plays the accordion. Mitch is a DZone Zone Leader and has posted 2579 posts at DZone. You can read more from him at his website.

Grid Engine an Early Supporter of Hadoop Apps

01.14.2010
Sun Microsystems' Grid Engine was recently updated with plenty of new features, including what the company says is an industry first. Grid Engine 6.2 update 5 (SGE 6.2u5) just became the first workload manager with direct support for Apache Hadoop applications, said Dan Templeton, an SGE engineer. This feature allows Hadoop applications to be submitted to an SGE cluster like any other parallel application.

Grid Engine is a distributed resource manager. Hadoop is a Java framework for distributed applications. It includes the Hadoop Distributed File System (HDFS), a distributed, fault-tolerant file system, and MapReduce, an environment for parallelizing and executing applications. Cloudera, which also publishes a Hadoop distribution, might dispute Templeton's claim that SGE is the first workload manager to support Hadoop apps.

The ability to submit Hadoop jobs to the Grid Engine grid is a pretty neat trick.  SGE is aware of the Hadoop Distributed File System and recognizes Hadoop jobtrackers and tasktrackers.  Grid Engine is able to route Hadoop jobs to the nodes where the job data already exists.  This is a whole lot better than having to set up a dedicated Hadoop cluster where you have to move the data over to those nodes.
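
To make the data-locality idea concrete, here is a minimal Python sketch of how a scheduler could prefer hosts that already hold a job's HDFS blocks. It is an illustration only, not Grid Engine's actual algorithm: the function `rank_hosts_by_locality`, the block IDs, and the host names are all hypothetical.

```python
from collections import Counter

def rank_hosts_by_locality(block_locations, free_hosts):
    """Order free hosts by how many of the job's data blocks they already hold.

    block_locations: dict mapping a block id to the hosts storing a replica.
    free_hosts: list of hosts that currently have capacity for the job.
    Both inputs are hypothetical stand-ins for what a real scheduler would
    learn from the file system and the cluster state.
    """
    counts = Counter()
    for hosts in block_locations.values():
        for host in hosts:
            if host in free_hosts:
                counts[host] += 1
    # Hosts with the most local blocks come first; ties break by host name.
    # Hosts holding no blocks still qualify, but sort last.
    return sorted(free_hosts, key=lambda h: (-counts[h], h))

blocks = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node2", "node4"],
}
ranked = rank_hosts_by_locality(blocks, ["node1", "node3", "node5", "node2"])
print(ranked)
```

In this toy example `node2` holds a replica of every block, so it ranks first, while `node5` holds none and ranks last.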

Grid Engine Diagram


SGE 6.2u5 contains a number of other new features. If grid applications need particular hardware to run well, such as multiple cores, high clock speeds, large caches, or large amounts of memory, the job scheduler can allocate jobs to specific types of processors and server configurations. For example, Grid Engine can run a cache-heavy application on four cores spread across four server sockets instead of on four cores sharing a single socket. Administrators can specify what hardware resources they need with the new core binding feature.
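
The difference between the two placements in that example can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Grid Engine's binding logic: the `pick_cores` function and the socket-to-free-cores dictionary are hypothetical.

```python
def pick_cores(free_cores, amount, spread):
    """Choose `amount` cores from a host, returned as (socket, core) pairs.

    free_cores: dict mapping a socket id to its list of free core ids
                (a hypothetical view of the host topology).
    spread=True  -> one core per socket, so each task has a socket's cache
                    to itself.
    spread=False -> all cores packed onto a single socket.
    Returns None when the request cannot be satisfied.
    """
    if spread:
        sockets = [s for s in sorted(free_cores) if free_cores[s]]
        if len(sockets) < amount:
            return None
        return [(s, free_cores[s][0]) for s in sockets[:amount]]
    # Packed placement: use the first socket with enough free cores.
    for s in sorted(free_cores):
        if len(free_cores[s]) >= amount:
            return [(s, c) for c in free_cores[s][:amount]]
    return None

# A hypothetical four-socket host with four free cores per socket.
topology = {0: [0, 1, 2, 3], 1: [0, 1, 2, 3], 2: [0, 1, 2, 3], 3: [0, 1, 2, 3]}
spread_choice = pick_cores(topology, 4, spread=True)
packed_choice = pick_cores(topology, 4, spread=False)
print(spread_choice)
print(packed_choice)
```

The spread placement takes core 0 on each of the four sockets, which suits the cache-heavy case; the packed placement takes all four cores on socket 0.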

"Slotwise preemption" is another useful new feature that adds more sophisticated resource allocation rules. Instead of simple rules like 'job queue A is subordinate to B', you can now limit how many jobs run across specific queues or indicate which queues are more important when there is a resource conflict. SGE 6.2u5 is also easier to integrate with Amazon's EC2 and can power down unused server nodes in a grid.

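The slotwise preemption rule described above, a cap on running jobs across several queues plus an ordering of which queues matter most, can be sketched in Python. This is only an illustration of the idea, not Grid Engine's implementation: the `jobs_to_suspend` function, the job IDs, and the queue names are hypothetical.

```python
def jobs_to_suspend(running, max_slots, priority):
    """Pick the jobs to suspend so that at most `max_slots` keep running.

    running: list of (job_id, queue) pairs, in submission order.
    priority: dict mapping a queue name to a rank (lower = more important).
    All names here are hypothetical; a real workload manager would read
    these from its queue configuration.
    """
    excess = len(running) - max_slots
    if excess <= 0:
        return []
    # Least important queues first. Python's sort is stable, so jobs keep
    # their submission order within a queue.
    by_least_important = sorted(running, key=lambda job: -priority[job[1]])
    return [job_id for job_id, _ in by_least_important[:excess]]

running = [("j1", "A"), ("j2", "B"), ("j3", "C"), ("j4", "C"), ("j5", "B")]
suspend = jobs_to_suspend(running, 3, {"A": 0, "B": 1, "C": 2})
print(suspend)
```

With five jobs running and a cap of three, the two jobs in the least important queue, C, are the ones chosen for suspension.
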
Your free download of Grid Engine is available here.