Big Data

Distributed Lock using Zookeeper

This article is by Stephen Mouring, Jr. On my project we have a number of software components that run concurrently, some on a cron, and some as part of our...

0 replies - 823 views - 01/08/13 by Scott Leberknight in Articles

The Closet Bayesian

When I was a grad student, a statistics postdoc confided to me that he was a “closet Bayesian.” This sounded absolutely bizarre. Why would someone be...

0 replies - 2061 views - 01/07/13 by John Cook in Articles

Palantir's Stephen Cohen on Big Data: "Humans are of a Greater Order than Algorithms"

Is 2013 the year we're all replaced by an apocalyptic revolt of robotic yellow elephants? Not so fast, says Palantir's Stephen Cohen. For all big data's power,...

0 replies - 2462 views - 01/07/13 by Alex Crafts in Articles

Building a Statistical Significance Testing Web Service with R

R is a programming language focused on solving statistical and mathematical calculations. R programs often operate on large, in-memory data...

0 replies - 1799 views - 01/07/13 by Gary Sieling in Articles

Analyzing Mortgage Data with Hadoop MapReduce: Java vs. Pig

In a recent post, I used Pig to analyze some MBS (mortgage-backed security) new-issue pool data from Fannie Mae. At the time, I noted a number of...

0 replies - 3098 views - 01/07/13 by Wayne Adams in Articles

Running the SurveyApplicationCS Demo Project under Android Jelly Bean 4.2

My (@rogerjenn) LightSwitch HTML Client Preview 2: OakLeaf Contoso Survey Application Demo on Office 365 SharePoint Site post updated 12/25/2012...

0 replies - 2095 views - 01/06/13 by Roger Jennings in Articles

Big Graph Data on the Hortonworks Big Data Platform

This is an archival repost of a blog post that was originally published on Hortonworks’ blog. The Hortonworks Data Platform (HDP)...

0 replies - 2516 views - 01/06/13 by Marko Rodriguez in Articles

Google on Turning Data Problems into Advantages

At Google I/O 2012, Ju-kay Kwek and Navneet Joneja offered advice on turning businesses' data challenges into competitive advantages with thoughtful big...

1 reply - 2288 views - 01/05/13 by Eric Gregory in Articles

More on "Staggering Odds" - Visualizing Probabilities

Following my previous post, a few more things. As mentioned by Frédéric, it is – indeed – possible to compute the probability of all pairs. More...

0 replies - 1058 views - 01/05/13 by Arthur Charpentier in Articles

How to Query Massive Datasets Using Google BigQuery

Google's Ryan Boyd and Michael Manoochehri teach you how to use BigQuery to query bigger-than-big datasets.

0 replies - 2481 views - 01/04/13 by Eric Gregory in Articles

Nathan Marz's "Lambda Architecture" Approach to Big Data

Over at Database Tutorials and Videos, you can read a fascinating excerpt of Nathan Marz's Big Data (partially available now in an early-access...

0 replies - 2975 views - 01/04/13 by Eric Gregory in Articles

Are These "Staggering" Odds Really So Staggering?

I was supposed to take a holiday break, but Frédéric, professor in Tours, came back to me this morning with a tickling question. He asked me what were...

0 replies - 506 views - 01/04/13 by Arthur Charpentier in Articles

Convert OpenStreetMap Objects to KML with R

A geo-quick-tip: With the osmar and maptools package you can easily pull an OpenStreetMap object and convert it to KML, like below...

0 replies - 1184 views - 01/04/13 by Kay Cichini in Articles

What is Data Science?

The term “data science” started to appear a few years ago and has continually gained traction. So what is it? First of all, there is no such thing as...

0 replies - 2860 views - 01/03/13 by Mikio Braun in Articles

Mahout: Parallelising the Creation of DecisionTrees

A couple of months ago I wrote a blog post describing our use of Mahout random forests for the Kaggle Digit Recogniser...

0 replies - 2035 views - 01/03/13 by Mark Needham in Articles