MapReduce

Beginner Tips For Elastic MapReduce

Curator's Note: The content of this article was written by John Berryman. By this point everyone is well acquainted with the power of...

0 replies - 1336 views - 04/22/13 by Eric Genesky in Articles

Big Data Beyond MapReduce: Google's Big Data Papers

Mainstream Big Data is all about MapReduce, but when looking at real-time data, limitations of that approach are starting to show. In this post, I’ll review...

0 replies - 19306 views - 02/25/13 by Mikio Braun in Articles

Sorting Text Files with MapReduce

In my last post I wrote about sorting files in Linux. Decently large files (in the tens of GB’s) can be sorted fairly quickly using that approach....

0 replies - 1918 views - 01/26/13 by Alex Holmes in Articles

Reading Hive Tables from MapReduce

This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. This is part two of a two-part blog series on how to read/write Apache Hive data...

0 replies - 1082 views - 01/11/13 by Scott Leberknight in Articles

MapReduce: Detecting Cycles in a Network Graph

I recently received an email from a reader of my blog on Map/Reduce algorithm design regarding how to detect whether a graph is acyclic using...

0 replies - 2336 views - 01/04/13 by Ricky Ho in Articles

MapReduce's Founding Documents

MapReduce is an incredibly powerful algorithm, especially when used to process large amounts of data using distributed systems of commodity hardware. It...

0 replies - 3737 views - 12/04/12 by Mike Miller in Articles

Faunus Provides Big Graph Data Analytics

Faunus is an Apache 2 licensed distributed graph analytics engine that is optimized for batch processing graphs represented...

0 replies - 2644 views - 11/13/12 by Marko Rodriguez in Articles

A JavaScript MapReduce One-Liner

After my post on "Word frequency using MapReduce in Python," I got my paws dirty with some silly Javascript. Once I reduced a whole chunk of code,...

1 reply - 8520 views - 10/01/12 by Hemanth Madhavarao in Articles

Location Sensitive Hashing in MapReduce

Let's say there are N items (with N in the billions) and we want to find all of those that are similar to one another, with similarity defined by a distance...

0 replies - 2878 views - 09/24/12 by Ricky Ho in Articles

Graph Degree Distributions Using R Over Hadoop

There are two common types of graph engines. One type is focused on providing real-time, traversal-based algorithms...

0 replies - 3023 views - 09/16/12 by Marko Rodriguez in Articles

Innovation and Big Data in Corporations: A Roadmap

Big Data is all about technology and business model innovation. Why? Because a lot of next-generation business models are data-centric. Almost all...

0 replies - 2339 views - 09/10/12 by Ravi Kalakota in Articles

Introducing Apache Hadoop Services for Windows Azure

The SQL Server Team (@SQLServer) announced Apache Hadoop Services for Windows Azure, a.k.a. Apache Hadoop on Windows Azure or Hadoop on Azure, at the...

0 replies - 2503 views - 08/31/12 by Roger Jennings in Articles

Big Data: Enterprise Hype or the Future of Enterprise?

Of the myriad terms that the tech industry throws around at the moment, none is as often subverted for marketing spin as “big data”. So much so...

1 reply - 2220 views - 08/15/12 by Ben Kepes in Articles

How HDFS Does Replication

As I learned about HBase and HDFS, I wanted to understand how HDFS actually does its replication, whether it's an synchronous replication, what is the...

0 replies - 3408 views - 08/14/12 by Rodrigo De Castro in Articles

A Practical Intro to Streaming MapReduce Processing

In this article I’ll introduce the concept of Streaming MapReduce processing using GridGain and Scala. The choice of Scala is simply due to the fact that...

0 replies - 3519 views - 08/07/12 by Nikita Ivanov in Articles