Hadoop

Graph Degree Distributions Using R Over Hadoop

There are two common types of graph engines. One type is focused on providing real-time, traversal-based algorithms...

0 replies - 3023 views - 09/16/12 by Marko Rodriguez in Articles

The Difference Between 'Hadoop DFS' and 'Hadoop FS'

While exploring HDFS, I came across these two syntaxes for querying HDFS: > hadoop dfs > hadoop fs. Initially I couldn't differentiate between the...

0 replies - 4081 views - 09/13/12 by Abhishek Jain in Articles

Hadoop, Pig, and Broken Dreams of Environment Variables

Okay, this post isn't nearly as melodramatic as its title – I’m doing some log analysis with Hadoop and Pig. As the logs are coming from a webserver and...

0 replies - 2923 views - 09/12/12 by Oliver Hookins in Articles

Your First Hadoop MapReduce Job

Hadoop MapReduce is a YARN-based system for parallel processing of large data sets. If you are new to Hadoop, first explore the Hadoop site. In this...

0 replies - 6657 views - 09/11/12 by Amresh Singh in Articles

Setting up a Hadoop Virtual Cluster with Vagrant

Usually for testing and using virtual machines, I go online, download the iso image of the machine I want to install, start Virtual Box, tell it to init...

0 replies - 7445 views - 09/09/12 by Carlo Scarioni in Articles

Levels of Abstraction in Big Data

Recently we’ve spoken to a number of people to find out how our real-time stuff could be of use to them. Those were all very interesting conversations, but...

0 replies - 3079 views - 09/06/12 by Mikio Braun in Articles

Five Words That Give Away Rotten Analytics Strategies

Over the last 12 months, I’ve had plenty of “conversations” about big data analytics and BI strategies with customers and potential users. The five...

0 replies - 2144 views - 09/05/12 by Nikita Ivanov in Articles

Introducing Apache Hadoop Services for Windows Azure

The SQL Server Team (@SQLServer) announced Apache Hadoop Services for Windows Azure, a.k.a. Apache Hadoop on Windows Azure or Hadoop on Azure, at the...

0 replies - 2503 views - 08/31/12 by Roger Jennings in Articles

Apache Projects are the Justice League of Scalability

In this post I will define what I believe to be the most important projects within the Apache Projects for building scalable web sites and generally managing...

0 replies - 4320 views - 08/28/12 by Phil Whelan in Articles

Mesos: A Highly Distributed Cloud Architecture Framework

I initially complained about the complexity of installing Mesos when I was playing around with Spark and Shark. However, when I saw the Twitter Mesos and...

0 replies - 3760 views - 08/22/12 by Maarten Ectors in Articles

Hadoop for Real-Time, and the Big Data Buzzwords of 2012

The website defines Spark as a MapReduce-like cluster computing framework designed to support low-latency iterative jobs. However it would be easier to...

0 replies - 2854 views - 08/15/12 by Maarten Ectors in Articles

The Search for a Better BIG Data Analytics Pipeline

"Big Data Analytics" has recently been one of the hottest buzzwords. It is a combination of "Big Data" and "Deep...

0 replies - 3726 views - 08/14/12 by Ricky Ho in Articles

How HDFS Does Replication

As I learned about HBase and HDFS, I wanted to understand how HDFS actually does its replication, whether it's a synchronous replication, what is the...

0 replies - 3411 views - 08/14/12 by Rodrigo De Castro in Articles

How to Tweak Ganglia Using Hadoop

It's very important to monitor all the machines in the cluster in terms of OS health, bottlenecks, performance hits and so on. There are numerous tools...

0 replies - 4859 views - 08/08/12 by Swathi Venkatachala in Articles

Scala and Hadoop: Hand in Hand at Twitter

If you have read the paper published by Google’s Jeffrey Dean and Sanjay Ghemawat (MapReduce: Simplified Data Processing on Large Clusters), they revealed...

0 replies - 8967 views - 08/05/12 by Istvan Szegedi in Articles