7 Tutorials on Big Data in the Cloud
My Big Data in the Cloud article for Visual Studio Magazine’s July 2012 issue asserts “Microsoft has cooked up a feast of value-added big data cloud apps featuring Apache Hadoop, MapReduce, Hive and Pig, as well as free apps and utilities for numerical analysis, publishing data sets, data encryption, and uploading files to SQL Azure and blobs.” Here’s the introduction:
Competition is heating up for Platform as a Service (PaaS) providers such as Microsoft Windows Azure, Google App Engine, VMware Cloud Foundry and Salesforce.com Heroku, but cutting compute and storage charges no longer increases PaaS market share. So traditional Infrastructure as a Service (IaaS) vendors, led by Amazon Web Services (AWS) LLC, are encroaching on PaaS providers by adding features that abstract cloud computing functions users formerly had to provision themselves. For example, AWS introduced Elastic MapReduce (EMR) with Apache Hive for big data analytics in April 2009. In October 2009, Amazon added a Relational Database Service (RDS) beta to its bag of cloud tricks to compete with SQL Azure.
Note: My “The battle for cloud services: Microsoft vs. Amazon” article of 6/18/2012 for TechTarget’s SearchCloudApplications.com site compares the two cloud giants’ approaches to implementing Hadoop, MapReduce and Hive in Windows Azure and Amazon Web Services.
Microsoft finally countered with a multipronged Apache Hadoop on Windows Azure preview in December 2011, aided by Hadoop consultants from Hortonworks Inc., a Yahoo! Inc. spin-off. Microsoft also intends to enter the highly competitive IaaS market; a breakout session at the Microsoft Worldwide Partner Conference 2012 will unveil Windows Azure IaaS for hybrid and public clouds. In late 2011, Microsoft began leveraging its technical depth in business intelligence (BI) and data management with free previews of a wide variety of value-added Software as a Service (SaaS) add-ins for Windows Azure and SQL Azure (see Table 1).
Codename | Description | Link to Tutorial
---------|-------------|-----------------
“Social Analytics” | Summarizes big data from millions of tweets and other unstructured social data provided by the “Social Analytics” team | http://bit.ly/Kluwd1
“Data Transfer” | Moves comma-separated-value (CSV) and other structured data to SQL Azure or Windows Azure blobs | http://bit.ly/IC1DJp
“Data Hub” | Enables data mavens to establish private data markets that run in Windows Azure | http://bit.ly/IjRCE0
“Cloud Numerics” | Supports developers who use Visual Studio to analyze distributed arrays of numeric data with Windows High-Performance Computing (HPC) clusters in the cloud or on premises | http://bit.ly/IccY3o
“Data Explorer” | Provides a UI to quickly mash up big data from various sources and publish the mash-up to a workspace in Windows Azure | http://bit.ly/IMaOIN
“Trust Services” | Enables programmatically encrypting Windows Azure and SQL Azure data | http://bit.ly/IxJfqL
“SQL Azure Security Services” | Enables assessing the security state of one or all of the databases on a SQL Azure server | http://bit.ly/IxJ0M8
“Austin” | Helps developers process StreamInsight data in Windows Azure |
Table 1. The SQL Azure Labs team and the StreamInsight unit have published no-charge previews of several experimental SaaS apps and utilities for Windows Azure and SQL Azure. The Labs team characterizes these offerings as "concept ideas and prototypes," and states that they are "experiments with no current plans to be included in a product and are not production quality."
In this article, I'll describe how the Microsoft Hadoop on Windows Azure project eases big data analytics for data-oriented developers and provide brief summaries of free SaaS previews that aid developers in deploying their apps to public and private clouds. (Only a couple require a fee for the Windows Azure resources they consume.) I'll also include instructions for obtaining invitations for the previews, as well as links to tutorials and source code for some of them. These SaaS previews demonstrate to independent software vendors (ISVs) the ease of migrating conventional, earth-bound apps to SaaS in the Windows Azure cloud.
My “Recent Articles about SQL Azure Labs and Other Added-Value Windows Azure SaaS Previews: A Bibliography” post of 6/1/2012 contains brief descriptions of, and links to, all my recent articles on these topics.
This article went to press before the Windows Azure Team’s Meet Azure event on 6/7/2012, where the team unveiled the “Spring Wave” of new features, upgrades and updates to Windows Azure, including Windows Azure Virtual Machines, Virtual Networks, Web Sites and other new services. The team also terminated the Codename “Social Analytics” and “Data Transfer” projects in late June. However, as of 6/27/2012, the “Social Analytics” data stream from the Windows Azure Marketplace Data Market was still operational, so the downloadable C# code for the Microsoft Codename “Social Analytics” Windows Form Client still works.
Note: I modified my working version of the project to copy the data from about a million rows in the DataGridView to a DataGrid.csv file, which can be loaded on demand. Copies of this file and the associated source file for the client’s chart are available from my SkyDrive account. I will update the sample code to use the DataGrid.csv file if the Data Market stream becomes unavailable.
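The cache-and-fallback idea in the note above can be sketched as follows. This is a minimal Python illustration, not the author's C# Windows Form Client. The column names, the function names save_rows and load_rows, and the assumption that an unavailable stream surfaces as a ConnectionError or TimeoutError are all hypothetical. The sketch writes grid rows to a CSV file once, then tries the live stream first and reads the cached file only when the stream fails.

```python
import csv
import os
from typing import Callable, Iterable, List

# Hypothetical column layout; the real client's DataGridView columns may differ.
FIELDS = ["date", "tweet_count", "sentiment"]


def save_rows(rows: Iterable[dict], path: str) -> int:
    """Copy rows (e.g., the rows shown in a grid) to a CSV file; return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def load_rows(fetch_stream: Callable[[], List[dict]], cache_path: str) -> List[dict]:
    """Try the live data stream first; fall back to the cached CSV file on demand.

    Assumption: a stream outage is reported as ConnectionError or TimeoutError.
    If no cache file exists, the original error is re-raised.
    """
    try:
        return fetch_stream()
    except (ConnectionError, TimeoutError):
        if not os.path.exists(cache_path):
            raise
        with open(cache_path, newline="", encoding="utf-8") as f:
            # DictReader yields every value as a string, so callers must convert types.
            return list(csv.DictReader(f))
```

For example, calling load_rows with a fetch function that raises ConnectionError returns the rows previously written by save_rows. Keeping the live stream as the first choice means the cached file only serves as a safety net and never masks fresh data while the stream is still operational.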
Read the article here.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)