Big Data

LZOP Decompression - Revenge of the Useless Cat

For me, LZOP is the ubiquitous compression codec when working with large text files in HDFS due to its MapReduce data locality advantages. As a result when I...

0 replies - 1306 views - 02/11/13 by Alex Holmes in Articles

Google on Datastore Query, Index, and Transactions

This Google Developers tutorial delves into querying, indexing, and transactions with App Engine's Datastore service, driven by Google Bigtable: In this...

0 replies - 1552 views - 02/11/13 by Eric Gregory in Articles

Using R — .Call(“hello”)

In an introductory post on R APIs to C code, Calling C Code ‘Hello World!’, we explored the .C() function with some ‘Hello World!’ baby...

0 replies - 1249 views - 02/10/13 by Jonathan Callahan in Articles

MapReduce with Mongoose and CoffeeScript

After searching the InterWeb for a decent MapReduce example coded in CoffeeScript I came up blank and decided to write my own. This one uses Mongoose too -...

0 replies - 1364 views - 02/10/13 by Col Wilson in Articles

Natura Non Facit Saltus

(see John Wilkins’ article on the – interesting – history of that phrase http://scienceblogs.com/evolvingthoughts/…). We will see several...

0 replies - 951 views - 02/09/13 by Arthur Charpentier in Articles

Google's Introduction to Datastore

This Google Developers tutorial explores App Engine's Datastore service, driven by Google Bigtable: Datastore service in App Engine is the core component...

0 replies - 2593 views - 02/09/13 by Eric Gregory in Articles

R: Modelling a Conversion Rate with a Binomial Distribution

As part of some work Sid and I were doing last week, we wanted to simulate the conversion rate for an A/B test we were planning. We started with...

0 replies - 1138 views - 02/08/13 by Mark Needham in Articles

Escaping Solr Query Characters In Python

I’ve been working on some Python Solr client code. One area where bugs have cropped up is in query terms that need to be escaped before passing to Solr....

0 replies - 1067 views - 02/08/13 by Doug Turnbull in Articles

An Introduction to Hadoop on Azure

At the ØREDEV conference in Sweden, Yaniv Rodenski spoke about Hadoop on Azure, discussing how it works, various storage options, cloud service...

0 replies - 1044 views - 02/08/13 by Eric Gregory in Articles

What is a Data Scientist?

Scott and I ventured out of the office yesterday evening to check out a new group starting up: Charlottesville’s Big Data Group. The most exciting...

0 replies - 8766 views - 02/07/13 by Doug Turnbull in Articles

Big Data, Statistics, and Computer Science

“Today, software and hardware together provide far more powerful factories than most statisticians realize, factories that many of today’s most able...

0 replies - 1643 views - 02/07/13 by Arthur Charpentier in Articles

HBase Error: Region is not online: -ROOT-,,0

If you are running HBase and commands are giving you an error that looks like this: Fri Oct 05 21:45:02 UTC 2012,...

0 replies - 1739 views - 02/07/13 by George London in Articles

Taking a Random Walk

Consider the following time series. What does this look like? I know, it's a stupid game, but I keep using it in my time series courses. It does...

0 replies - 1464 views - 02/06/13 by Arthur Charpentier in Articles

Developing Your Own Solr Filter - Part 2

In the previous entry “Developing Your Own Solr Filter” we’ve shown how to implement a simple filter and how to use it in Apache Solr. Recently, one of...

0 replies - 1541 views - 02/06/13 by Rafał Kuć in Articles

Overdispersion with Different Exposures

In actuarial science, and insurance ratemaking, taking into account the exposure can be a nightmare (in datasets, some clients have been here for a...

0 replies - 1040 views - 02/05/13 by Arthur Charpentier in Articles