Big Data

Analyzing (and Visualizing) the Statistics Blogosphere

John Johnson did an analysis of the statistics blogosphere for the Coursera Social Networking Analysis class. His blog post about the analysis lists...

0 replies - 2411 views - 11/18/12 by John Cook in Articles

The Romney Campaign's Big Data Fiasco

In this great article on the IT project fiasco in Mitt Romney's campaign, there's a quote that is representative of much of what you see around...

1 reply - 4270 views - 11/17/12 by Rodrigo De Castro in Articles

Let's Build Better Election Visualizations

In the wake of the U.S. election, a ton of different outcome visualizations have made the rounds online. The traditional winner-take-all electoral map gets the...

0 replies - 4721 views - 11/16/12 by Eric Gregory in Articles

The Probability of Long Runs

Suppose you’ve written a program that randomly assigns test subjects to one of two treatments, A or B, with equal probability. The researcher using your...

0 replies - 2748 views - 11/15/12 by John Cook in Articles

Building a Naive Bayes Classifier in the Browser Using Map-Reduce

The last decade of JavaScript performance improvements in the browser provides exciting possibilities for distributed computing....

0 replies - 3558 views - 11/14/12 by Gary Sieling in Articles

Monitoring at eBay: Big Data Problems

This post is based on a talk by Bhaven Avalani and Yuri Finklestein at QConSF 2012 (slides). Bhaven and Yuri work on the Platform Services team at...

1 reply - 7132 views - 11/14/12 by Matt O'Keefe in Articles

Election Analytics, Tetris, and More Data Links of the Week

So, Movember finally arrived (see http://ca.movember.com/). So far, not a lot of articles about moustaches. But I should find some by the end of the month!...

0 replies - 1955 views - 11/14/12 by Arthur Charpentier in Articles

Faunus Provides Big Graph Data Analytics

Faunus is an Apache 2 licensed distributed graph analytics engine that is optimized for batch processing graphs represented...

0 replies - 2680 views - 11/13/12 by Marko Rodriguez in Articles

Complexity and Brain Size

The last year or so I’ve been leading a small team of developers. We’ve been working on a project that involves genomics and molecular biology,...

0 replies - 2669 views - 11/13/12 by Ola Bini in Articles

A Big Data Quadfecta: (Cassandra + Storm + Kafka) + ElasticSearch

In my previous post, I discussed our BigData Trifecta, which includes Storm, Kafka and Cassandra. Kafka played the role of our work/data queue. ...

0 replies - 4925 views - 11/12/12 by Brian O'Neill in Articles

A Simple Way to Build an ExtJS 3.4 Scatter Chart

Problem: You want to display a scatter chart in ExtJS 3.4. Solution: Use the Ext.chart.Chart class, and make the lines invisible. Discussion: This example works by...

0 replies - 2025 views - 11/12/12 by Gary Sieling in Articles

Email Marketing is a Predictive Analytics Problem

In his book Permission Marketing, Seth Godin referred to email marketing as “the most personal advertising medium in history”. That was...

0 replies - 2646 views - 11/10/12 by Ravi Kalakota in Articles

Testing Hadoop Programs with MRUnit

This post will take a slight detour from implementing the patterns found in Data-Intensive Processing with MapReduce to discuss something equally...

0 replies - 3335 views - 11/09/12 by Bill Bejeck in Articles

R Without Hadley Wickham

Tim Hopper asked on Twitter today: #rstats programming without @hadleywickham’s libraries is like ________ without _________. Some of the replies...

0 replies - 3024 views - 11/08/12 by John Cook in Articles

Normality Versus Goodness-of-Fit Tests

In many cases, in statistical modeling, we would like to test whether the underlying distribution from an i.i.d. sample lies in a given (parametric) family,...

0 replies - 2136 views - 11/07/12 by Arthur Charpentier in Articles