Big Data

Veteran Apache Hadoop Developer Joins WANdisco

We’re pleased to announce that Dr. Konstantin Boudnik, founder of Apache BigTop and one of the original developers of Apache Hadoop, has joined...

0 replies - 1516 views - 01/12/13 by Jessica Thornsby in Articles

Etsy Engineer: "Whom the Gods Would Destroy, They First Give Real-Time Analytics"

Prognosticating analysts suggest that 2013 will be the year of real-time analytics. But Dan McKinley, Principal Engineer at Etsy.com, suggests we all...

0 replies - 1215 views - 01/11/13 by Eric Gregory in Articles

Reading Hive Tables from MapReduce

This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. This is part two of a two-part blog series on how to read/write Apache Hive data...

0 replies - 1054 views - 01/11/13 by Scott Leberknight in Articles

Building Big Data Pipelines with Hadoop

Here's an in-depth JavaZone tutorial on building big data pipelines: Hadoop is not an island. To deliver a complete Big Data solution, a data pipeline...

0 replies - 1302 views - 01/11/13 by Alex Crafts in Articles

Semantifying a Mediawiki for Chinese Rock Music

During my trip to China, I visited Beijing on two weekends and Macau on another. These trips were mainly motivated to...

0 replies - 495 views - 01/11/13 by René Pickhardt in Articles

What is Streamdrill Good For?

A few weeks ago, we released the beta (oh, sorry, the “β”) of streamdrill. One question we heard quite often was, “Well, what is it good for?” So...

0 replies - 1752 views - 01/10/13 by Mikio Braun in Articles

Finding Pixels with No Variance Using R

I’ve written previously about our attempts at the Kaggle Digit Recogniser problem and our approach so far...

0 replies - 1675 views - 01/10/13 by Mark Needham in Articles

Bash Magic: List Hive Table Sizes in GB

To list the sizes of Hive tables in Hadoop in GBs: sudo -u hdfs hadoop fs -du /user/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**3)) "...

0 replies - 1965 views - 01/10/13 by Jakub Holý in Articles

Hypothesis-Driven Development

You’ve got your vision of what you want to build. You’ve also got a ton of unknowns and uncertainty. You know you can’t just go build it and hope...

0 replies - 3292 views - 01/09/13 by Abby Fichtner in Articles

Sonnet Primes in Python

A while back I wrote about sonnet primes, primes of the form ababcdcdefefgg where the letters a through g represent digits and a is...

0 replies - 1642 views - 01/09/13 by John Cook in Articles

Statisticians Promote Contributions to Society During the International Year of Statistics in 2013

The comSysto GmbH and more than 1,400 organizations in 111 countries are combining energies in 2013 to promote the International Year of Statistics...

0 replies - 1248 views - 01/09/13 by Daniel Bartl in Articles

Writing Hive Tables from MapReduce

This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. This is part one of a two-part blog series on how to read/write Apache Hive data...

0 replies - 2451 views - 01/09/13 by Scott Leberknight in Articles

A Trip to the Math Museum

This Saturday, we had two interesting museum experiences with the kids. The kids are 10, 7 (and a half, as she keeps saying) and 2 (and a half, too). In the...

0 replies - 1587 views - 01/09/13 by Arthur Charpentier in Articles

"Please Login to Your Facebook Account" - Behind a Data Mining Scam

So someone sends you a link to the latest Gangnam parody / cat meme / man jumping on frozen pool video and the link looks something like...

0 replies - 2619 views - 01/08/13 by Troy Hunt in Articles

Gazzang Predicts This Will Be the Year of Big Data

SOA World magazine points to a 2013 prediction from Gazzang, a company dealing with Linux data security. Gazzang reckons that 2013 will be the year...

0 replies - 1018 views - 01/08/13 by Eric Gregory in Articles