
Curator's Note: The content of this article was written by John Berryman. By this point everyone is well acquainted with the power of...
0 replies - 1336 views - 04/22/13 by Eric Genesky in Articles

Mainstream Big Data is all about MapReduce, but when looking at real-time data, limitations of that approach are starting to show. In this post, I’ll review...
0 replies - 19306 views - 02/25/13 by Mikio Braun in Articles

In my last post I wrote about sorting files in Linux. Decently large files (in the tens of GBs) can be sorted fairly quickly using that approach....
0 replies - 1918 views - 01/26/13 by Alex Holmes in Articles

This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. This is part two of a two-part blog series on how to read/write Apache Hive data...
0 replies - 1082 views - 01/11/13 by Scott Leberknight in Articles

I recently received an email from a reader of my blog on Map/Reduce algorithm design regarding how to detect whether a graph is acyclic using...
0 replies - 2336 views - 01/04/13 by Ricky Ho in Articles

MapReduce is an incredibly powerful algorithm, especially when used to process large amounts of data using distributed systems of commodity hardware. It...
0 replies - 3737 views - 12/04/12 by Mike Miller in Articles

Faunus is an Apache 2 licensed distributed graph analytics engine that is optimized for batch processing graphs represented...
0 replies - 2644 views - 11/13/12 by Marko Rodriguez in Articles

After my post on "Word frequency using MapReduce in Python," I got my paws dirty with some silly JavaScript. Once I reduced a whole chunk of code,...
1 reply - 8520 views - 10/01/12 by Hemanth Madhavarao in Articles

Let's say there are N items (with N in the billions) and we want to find all of those that are similar to one another, with similarity defined by a distance...
0 replies - 2878 views - 09/24/12 by Ricky Ho in Articles

There are two common types of graph engines. One type is focused on providing real-time, traversal-based algorithms...
0 replies - 3023 views - 09/16/12 by Marko Rodriguez in Articles

Big Data is all about technology and business model innovation. Why? Because a lot of next-generation business models are data-centric. Almost all...
0 replies - 2339 views - 09/10/12 by Ravi Kalakota in Articles

The SQL Server Team (@SQLServer) announced Apache Hadoop Services for Windows Azure, a.k.a. Apache Hadoop on Windows Azure or Hadoop on Azure, at the...
0 replies - 2503 views - 08/31/12 by Roger Jennings in Articles

Of all the myriad terms that the tech industry throws around at the moment, none is as often subverted for marketing spin as “big data”. So much so...
1 reply - 2220 views - 08/15/12 by Ben Kepes in Articles

As I learned about HBase and HDFS, I wanted to understand how HDFS actually does its replication, whether it's a synchronous replication, what is the...
0 replies - 3408 views - 08/14/12 by Rodrigo De Castro in Articles

In this article I’ll introduce the concept of Streaming MapReduce processing using GridGain and Scala. The choice of Scala is simply due to the fact that...
0 replies - 3519 views - 08/07/12 by Nikita Ivanov in Articles