
Rauf has posted 28 posts at DZone.

Enterprise Job Scheduling for Big Data & Hadoop

02.14.2012
Businesses of all sizes are looking beyond traditional business intelligence, taking a broader approach to BI that goes beyond the data warehouse and operational database technologies of the past. With the explosion of social communication, mobile device data, and many other forms of unstructured data coming into focus, businesses are more interested than ever in asking questions about their data and their customers that they could not ask before.

Hadoop-style solutions let businesses build out this new BI 2.0 architecture and begin to leverage their data and operations in new ways, asking questions they could not have imagined possible in the past. Hadoop analytics lets businesses ask questions and build reporting solutions that effectively harness massive (yet commodity) processing power and manipulate terabytes of data in ways that were not practical for the average enterprise before.


Hadoop provides a broad stack of solutions: CPU/compute clustering, parallel programming, distributed data management, advanced ETL, NoSQL-style data management, and more. Hadoop is also moving quickly to build more advanced resource management that allows more efficient job-flow processing on larger clusters, for the bigger deployments that may have hundreds or thousands of nodes and need to run many jobs concurrently.

Hadoop comes with a few internal capacity-type schedulers for managing cluster load and resources, but these handle capacity scheduling between nodes only; they are not functional or calendar-based job scheduling tools. Vanilla Hadoop distributions do not include the often-necessary features enterprises need to manage and automate the full ecosystem and life cycle of data processing behind an end-to-end BI solution. In most cases, an enterprise's IT group must build the necessary infrastructure itself to integrate Hadoop smoothly into its IT environment and avoid a lot of manual labor and impedance mismatches between Hadoop operations and traditional enterprise operations.
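To make the distinction concrete: capacity schedulers divide cluster resources between concurrent jobs, while calendar-based scheduling decides *when* a job runs on the wall clock. The sketch below is plain Java, not JobServer or Hadoop code; the class name and the 2:00 AM ETL window are purely illustrative of the kind of time-based logic an external scheduler supplies.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZonedDateTime;

// Illustrative calendar-based scheduling logic: compute how long to
// wait until a job's next daily run time (e.g. a 2:00 AM ETL window).
// This is the kind of functionality vanilla Hadoop does not provide.
class DailyRunCalculator {

    // Returns the delay from 'now' until the next occurrence of
    // 'runTime' -- later today if it is still ahead of us, otherwise
    // the same time tomorrow.
    static Duration delayUntilNextRun(ZonedDateTime now, LocalTime runTime) {
        ZonedDateTime next = now.with(runTime);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }
}
```

In practice the computed delay would be handed to something like a `ScheduledExecutorService`, which then launches the Hadoop job flow; a product scheduler layers business rules, calendars, and holiday handling on top of this basic idea.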

This is where JobServer, an enterprise job scheduler, comes into play. JobServer integrates with Hadoop at an enterprise IT level, letting analysts and IT administrators schedule and integrate their IT operations into the Hadoop stack. JobServer offers an open and flexible Java plugin API that lets Java developers integrate their customizations tightly into JobServer and into Hadoop. Often what is needed is high-level job and workflow automation: scheduling ETL processing from operational data stores to pump data into your Hadoop stack, and running jobs on regular intervals based on business rules and business needs.
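To give a feel for the plugin approach, here is a hypothetical sketch of what a Java job-plugin contract might look like. This is not JobServer's actual API (the interface, method names, and parameter map are all invented for illustration; consult the product documentation for the real contract), but it shows the general shape: the scheduler calls into developer-supplied code at each step of a job flow.

```java
// Hypothetical plugin contract -- NOT JobServer's real API.
// A scheduler would load implementations of this interface and invoke
// them when a scheduled job fires.
interface JobPlugin {
    String name();

    // Run one unit of work; return true on success so the scheduler
    // can decide whether to continue the surrounding job flow.
    boolean execute(java.util.Map<String, String> params);
}

// A trivial illustrative plugin that might represent one ETL step.
class EtlStepPlugin implements JobPlugin {
    @Override
    public String name() {
        return "etl-extract";
    }

    @Override
    public boolean execute(java.util.Map<String, String> params) {
        // Parameters would come from the job's configuration screen.
        String source = params.getOrDefault("source", "operational-db");
        System.out.println("Extracting from " + source);
        return true;
    }
}
```

The success/failure return value is what lets a scheduler chain steps into larger flows and alert on problems, as described below.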


JobServer provides the job automation and job scheduling needed to accomplish this, plus key features such as audit trails that track which jobs were run, when, and who edited them. JobServer can, for example, coordinate and orchestrate a number of Hadoop job flows into a larger job flow, then take the output and pump it back into your enterprise reporting systems and enterprise data warehouses. JobServer also provides a number of GUI reporting features that let enterprise users, from programmers to IT staff, track what is going on in the Hadoop and IT environment and be alerted quickly to problems.
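The audit-trail idea above boils down to an append-only log of job runs. The sketch below shows a minimal version; the record fields and class names are illustrative, not JobServer's actual schema.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Illustrative audit-trail record: which job ran, when, who last
// edited it, and whether it succeeded. Field names are invented.
class AuditEntry {
    final String jobName;
    final Instant ranAt;
    final String editedBy;
    final boolean succeeded;

    AuditEntry(String jobName, Instant ranAt, String editedBy, boolean succeeded) {
        this.jobName = jobName;
        this.ranAt = ranAt;
        this.editedBy = editedBy;
        this.succeeded = succeeded;
    }
}

// Append-only log: entries are recorded and never modified, so the
// trail can answer "what ran, when, and edited by whom" after the fact.
class AuditTrail {
    private final List<AuditEntry> entries = new ArrayList<>();

    void record(AuditEntry e) {
        entries.add(e);
    }

    // Count of failed runs -- the kind of figure a GUI dashboard
    // would surface for alerting.
    long failures() {
        return entries.stream().filter(e -> !e.succeeded).count();
    }

    List<AuditEntry> all() {
        return new ArrayList<>(entries);
    }
}
```

A production scheduler persists these records to a database and exposes them through reporting screens, but the append-only principle is the same.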

If you need to tame your Hadoop operations and provide automated and tight integration with your existing IT environment, applications and reporting solutions, give JobServer a look. It can be a great asset to help you run your Big Data operations more efficiently. Visit the JobServer product website for more details.


<a href="http://www.grandlogic.com">Contact Grand Logic</a> and see how we can help you make better sense of your Big Data environment. JobServer is also partnering with other Big Data solution providers and major distributions to provide complete Big Data solutions for both your in-house and cloud Hadoop deployments. Please contact Grand Logic for more information on how our products and services can make your Hadoop deployment a success.
Published at DZone with permission of its author, Rauf Issa.
