Last week, I attended a Hadoop Tutorial presented in Durham, NC by Sarah Sproehnle, the Director of Educational Services at Cloudera. The tutorial offered an informative, high-level history of the software framework that has been generating a lot of buzz in surprisingly different industries. When I caught up with Sarah after the tutorial, she told me that one goal of its design was to cut through much of the buzz surrounding Hadoop, show a diverse audience "what Hadoop is really about," and show how they can use it. I still had a few more questions about Hadoop, Big Data, and how Hadoop appeals to the Java community. Here's what she had to say.
DZone: How do you think that Hadoop (and Big Data processing / analytics) will impact the overall developer space over the next few years?
Sarah Sproehnle: We're seeing a tremendous investment in developers moving from traditional back-end database development to the Hadoop space. Processes that used to be coded in PL/SQL or that relied on large in-memory state are now being written using Hadoop for data processing and HBase for real-time applications. A lot of applications that were built on top of databases, where developers struggled to fit non-relational paradigms into relational stores, are now being built more quickly and with access to data at any scale.
DZone: You covered some common implementations of Hadoop in your presentation. Which of these do you think is the most innovative or interesting?
Sarah Sproehnle: A lot of people use Hadoop to do complex data processing such as billing mediation and transaction reconciliation. Similarly, Hadoop is a popular tool for recommendation engines and predictive modeling. At the forefront, though, are people building real-time interactive applications on top of HBase. These applications drive data serving (such as user profiles or POIs) and serve as the basis for incremental analytics, where businesses can monitor how their systems are behaving in real time.
DZone: What are some cool tools (or uses) of Hadoop in development or coming up that Java developers should be aware of?
Sarah Sproehnle: We're seeing a lot of interest in higher-level libraries that make Hadoop much more accessible to Java developers. For example, Crunch is a FlumeJava-inspired library. We've been hearing some very positive feedback from Java developers who want a lot of the mechanics of MapReduce taken care of but don't want to write in Hive or Pig.
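For readers new to the model, the "mechanics of MapReduce" that such libraries take care of can be roughly sketched in plain Java. This is only an in-memory illustration of the map, shuffle, and reduce stages on a word count. It does not use the Hadoop or Crunch APIs, and the class and method names (`MapReduceSketch`, `wordCount`) are hypothetical, chosen just for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java, no Hadoop or Crunch APIs are used here.
class MapReduceSketch {

    // "Map" stage: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" stage: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" stage: collapse the values for one key into a single count.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three stages in sequence over a small in-memory input.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {big=2, clusters=1, data=1}
        System.out.println(wordCount(List.of("big data", "big clusters")));
    }
}
```

In a real cluster these stages run distributed across many machines, with the framework handling partitioning, sorting, and fault tolerance. That plumbing is the part a higher-level library aims to hide from the developer.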