
There are two common types of graph engines. One type is focused on providing real-time, traversal-based algorithms...
0 replies - 3023 views - 09/16/12 by Marko Rodriguez in Articles

While exploring HDFS, I came across
these two syntaxes for querying HDFS:
> hadoop dfs
> hadoop fs
Initially I couldn't differentiate between the...
0 replies - 4081 views - 09/13/12 by Abhishek Jain in Articles

Okay, this post isn't nearly as melodramatic as its title – I’m doing some log analysis with Hadoop and Pig. As the logs are coming from a webserver and...
0 replies - 2923 views - 09/12/12 by Oliver Hookins in Articles

Hadoop MapReduce is a YARN-based system for parallel processing of large data sets. If you are new to Hadoop, first explore the Hadoop site.
In this...
0 replies - 6657 views - 09/11/12 by Amresh Singh in Articles

Usually for testing and using virtual machines, I go online, download the iso image of the machine I want to install, start Virtual Box, tell it to init...
0 replies - 7445 views - 09/09/12 by Carlo Scarioni in Articles

Recently we’ve spoken to a number of people to find out how our real-time stuff could be of use to them. Those were all very interesting conversations, but...
0 replies - 3079 views - 09/06/12 by Mikio Braun in Articles

Over the last 12 months, I’ve had plenty of “conversations” about big data analytics and BI strategies with
customers and potential users. The five...
0 replies - 2144 views - 09/05/12 by Nikita Ivanov in Articles

The SQL Server Team (@SQLServer) announced Apache Hadoop Services for Windows Azure, a.k.a. Apache Hadoop on Windows Azure or Hadoop on Azure, at the...
0 replies - 2503 views - 08/31/12 by Roger Jennings in Articles

In this post I will define what I believe to be the most important projects within the Apache Projects for building scalable web sites and generally managing...
0 replies - 4320 views - 08/28/12 by Phil Whelan in Articles

I initially complained about the complexity of installing Mesos when I was playing around with Spark and Shark. However,
when I saw the Twitter Mesos and...
0 replies - 3760 views - 08/22/12 by Maarten Ectors in Articles

The website defines Spark as
a MapReduce-like cluster computing framework designed to support
low-latency iterative jobs. However it would be easier to...
0 replies - 2854 views - 08/15/12 by Maarten Ectors in Articles

"Big Data Analytics" has recently been one of the hottest buzzwords. It is a combination of "Big Data" and "Deep...
0 replies - 3726 views - 08/14/12 by Ricky Ho in Articles

As I learned about HBase and HDFS, I wanted to understand how HDFS
actually does its replication, whether it's a synchronous replication,
what is the...
0 replies - 3411 views - 08/14/12 by Rodrigo De Castro in Articles

It's very important to monitor all the machines in the cluster in terms of OS health, bottlenecks, performance hits and so on. There are numerous tools...
0 replies - 4859 views - 08/08/12 by Swathi Venkatachala in Articles

If you have read the paper published by Google’s Jeffrey Dean and Sanjay Ghemawat (MapReduce: Simplified Data Processing on Large Clusters),
they revealed...
0 replies - 8967 views - 08/05/12 by Istvan Szegedi in Articles