BigData Workflows Made Easy -> Glue

Glue is a job execution engine, written in Java and Groovy. 

Workflows are written in a Groovy DSL, Jython, Clojure, or JRuby, and use pre-developed modules to interact with external resources such as databases, Hadoop, Netezza, and FTP.

Glue workflows are not XML, and Glue is not a BI tool. It is a tool that lets programmers write workflows for a production environment in any of the supported languages.

The nicest thing about Glue is its modules, which let you interact with databases, Hadoop clusters, and other resources using tested methods. A module is set up once and re-used in every workflow. This abstracts the configuration away from the workflows and saves a lot of time otherwise spent debugging.
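The "set up once, re-use everywhere" idea can be sketched in a few lines. This is not Glue's API: it is a minimal Python illustration of the pattern, and every name in it (ModuleRegistry, FakeSqlModule, the workflow functions, the connection URL) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModuleRegistry:
    """Hypothetical registry holding modules that were configured once."""
    _modules: Dict[str, object] = field(default_factory=dict)

    def register(self, name: str, module: object) -> None:
        self._modules[name] = module

    def get(self, name: str) -> object:
        return self._modules[name]


class FakeSqlModule:
    """Stand-in for a database module; it records queries instead of running them."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.executed = []

    def execute(self, query: str) -> None:
        self.executed.append(query)


# Configuration happens once, outside any workflow (the URL is made up).
registry = ModuleRegistry()
registry.register("sql", FakeSqlModule(url="jdbc:example://reports"))


# Workflows only ask for a module by name; they carry no connection details.
def daily_report(ctx: ModuleRegistry) -> None:
    ctx.get("sql").execute("SELECT count(*) FROM events")


def cleanup(ctx: ModuleRegistry) -> None:
    ctx.get("sql").execute("DELETE FROM staging")


for workflow in (daily_report, cleanup):
    workflow(registry)
```

Both workflows share the same configured module, so a change to the connection settings is made in one place rather than in every workflow.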

Another cool feature is the ability to run data-driven workflows from Hadoop: you can register N workflows against an HDFS directory path and have those workflows run automatically as data arrives in that directory.
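A rough sketch of what directory-triggered execution could look like is given here. It is an assumption-laden illustration, not how Glue implements it: it polls a local temporary directory instead of HDFS, and DirectoryTriggers and the two lambda workflows are hypothetical names.

```python
import os
import tempfile
from collections import defaultdict


class DirectoryTriggers:
    """Hypothetical trigger table: directory path -> workflows to run on new files."""

    def __init__(self):
        self._workflows = defaultdict(list)
        self._seen = defaultdict(set)

    def register(self, directory, workflow):
        self._workflows[directory].append(workflow)

    def poll(self):
        """Run every registered workflow once for each file not seen before."""
        for directory, workflows in self._workflows.items():
            current = set(os.listdir(directory))
            new_files = sorted(current - self._seen[directory])
            self._seen[directory] |= current
            for name in new_files:
                path = os.path.join(directory, name)
                for workflow in workflows:
                    workflow(path)


processed = []
triggers = DirectoryTriggers()

with tempfile.TemporaryDirectory() as incoming:
    # Two workflows registered against the same directory.
    triggers.register(incoming, lambda p: processed.append(("load", os.path.basename(p))))
    triggers.register(incoming, lambda p: processed.append(("index", os.path.basename(p))))

    triggers.poll()  # directory is empty, nothing runs
    open(os.path.join(incoming, "part-0000"), "w").close()  # data arrives
    triggers.poll()  # both workflows run for the new file
    triggers.poll()  # no new data, nothing runs again
```

Tracking which files have already been seen is what keeps a workflow from running twice on the same data. A real deployment would need to persist that state and decide when a file counts as fully written.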

Published at DZone with permission of its author, Gerrit Jansen Van Vuuren.
