
Bootstrap Marketing is a marketing agency focused on big data. Our principals each have 30+ years of experience in marketing the solutions and technologies that make up big data (data warehouses, integration, business intelligence), and we understand this market inside and out. We are uniquely positioned to help big data companies sharpen their positioning or help companies in adjacent markets develop their big data value proposition and go to market. We work with high-tech companies of all sizes, from SAP and Lenovo to angel-funded start-ups like Treasure Data and Metric Insights, and we understand the marketing and business challenges that go along with each stage of a company’s life cycle. Our approach to doing business is agile, results focused, open and honest. We strive to live up to our company tagline: Marketing. Done Smart. Done Fast. Bootstrap is a DZone MVB and is not an employee of DZone and has posted 16 posts at DZone. You can read more from them at their website.

Will Hadoop Replace the Data Warehouse?


In the early days of data warehousing, there was a raging debate between two architectural approaches. One camp advocated Ralph Kimball’s federated data mart architecture; the other advocated Bill Inmon’s enterprise data warehouse architecture.

The old “Kimbalite” vs. “Inmonite” discussions of the 1990s are reminiscent of a similar discussion going on today about the relative merits and promise of Hadoop versus conventional data warehouses built on relational databases. And I suspect the issue will get resolved in a similar fashion. People will get tired of discussing it, and both architectures will coexist in perfect harmony. Each will find its appropriate place in the corporate IT landscape.

There are compelling arguments on each side of the question. Hadoop’s free, open source distributions run on low-cost commodity hardware and provide virtually unlimited storage of structured and unstructured data. However, few organizations have stable, production-ready Hadoop deployments. And the tools and technologies currently available for accessing and analyzing Hadoop data are in the early stages of maturity. There are issues associated with query performance, the ability to perform real-time analytics, and the preference of business analysts and developers to leverage existing SQL skills.

In spite of these to-be-expected early-stage challenges, I am coming across some real-world use cases for Hadoop-based analytics. At a recent Silicon Valley Forum on Big Data, Pandora’s director of software engineering explained how the company has migrated its relational data warehouse to an analytic infrastructure built on Hadoop, using Tableau as the front end to Hive for visualization and analysis.

Data warehouses represent the established technology, and they aren’t likely to go away. Nearly all medium- to large-scale enterprises have data warehouses and marts in place that took years to build, and they are delivering unquestioned business value. The old axiom “if it ain’t broke, don’t fix it” is hard to argue with. However, data warehouses are not designed to accommodate the increasing volumes of unstructured data from web logs, social media, mobile devices, sensors, medical equipment, industrial machines, and other sources. And there are both economic and performance limitations on the amount of data that can be stored and accessed.

The current industry debate about the relative merits of Hadoop and data warehouses is as lively as the data warehouse architecture debates of the 1990s, but perhaps a bit less controversial and passionate. Coexistence seems to be the prevailing sentiment among most practitioners, as well as the vendors of both Hadoop distributions and traditional data warehousing technologies. Cloudera, Hortonworks, MapR, and more recent Hadoop distribution vendors ranging from Intel to WanDisco are promoting side-by-side use case scenarios, while IBM, Oracle, and Teradata are incorporating Hadoop into their core offerings.

So what’s it going to take to inject more controversy and passion into the debate? In my view, innovations that make Hadoop data more accessible, more usable, and more relevant to business users will erode the distinctions between Hadoop and the traditional data warehouse. As the lines blur, the debate will intensify. Those innovations are coming to market at a fast and furious pace, forcing organizations to make architectural decisions that will fundamentally determine how effectively they can exploit Big Data. More on that in the third and final installment of this blog series. And make sure to read the first blog of this series, “Differentiation Across the Apache Hadoop Distribution Vendor Landscape.”

Published at DZone with permission of Bootstrap Marketing, author and DZone MVB.
