
For me, LZOP is the ubiquitous compression codec when working with large text files in HDFS, due to its MapReduce data locality advantages. As a result, when I...
0 replies - 1306 views - 02/11/13 by Alex Holmes in Articles

This Google Developers tutorial delves into querying, indexing, and transactions with App Engine's Datastore service, driven by Google Bigtable:
In this...
0 replies - 1552 views - 02/11/13 by Eric Gregory in Articles

In an introductory post on R APIs to C code, Calling C Code ‘Hello World!’, we explored the .C() function with some ‘Hello World!’ baby...
0 replies - 1249 views - 02/10/13 by Jonathan Callahan in Articles

After searching the InterWeb for a decent MapReduce example coded in CoffeeScript I came up blank and decided to write my own. This one uses Mongoose too -...
0 replies - 1364 views - 02/10/13 by Col Wilson in Articles

(see John Wilkins’ article on the – interesting – history of that phrase http://scienceblogs.com/evolvingthoughts/…). We will see several...
0 replies - 951 views - 02/09/13 by Arthur Charpentier in Articles

This Google Developers tutorial explores App Engine's Datastore service, driven by Google Bigtable:
Datastore service in App Engine is the core component...
0 replies - 2593 views - 02/09/13 by Eric Gregory in Articles

As part of some work Sid and I were doing last week, we wanted to simulate the conversion rate for an A/B test we were planning.
We started with...
0 replies - 1138 views - 02/08/13 by Mark Needham in Articles

I’ve been working on some Python Solr client code. One area where bugs have cropped up is in query terms that need to be escaped before passing to Solr....
0 replies - 1067 views - 02/08/13 by Doug Turnbull in Articles

At the ØREDEV conference in Sweden, Yaniv Rodenski spoke about Hadoop on Azure, discussing how it works, various storage options, cloud service...
0 replies - 1044 views - 02/08/13 by Eric Gregory in Articles

Scott and I ventured out of the office yesterday evening to check out a new group starting up: Charlottesville’s Big Data Group. The most exciting...
0 replies - 8766 views - 02/07/13 by Doug Turnbull in Articles

“Today, software and hardware together provide far more powerful factories than most statisticians realize, factories that many of today’s most able...
0 replies - 1643 views - 02/07/13 by Arthur Charpentier in Articles

If you are running HBase and commands are giving you an error that looks like this: Fri Oct 05 21:45:02 UTC 2012,...
0 replies - 1739 views - 02/07/13 by George London in Articles

Consider the following time series. What does this look like? I know, it's a stupid game, but I keep using it in my time series courses. It does...
0 replies - 1464 views - 02/06/13 by Arthur Charpentier in Articles

In the previous entry, “Developing Your Own Solr Filter”, we showed how to implement a simple filter and how to use it in Apache Solr. Recently, one of...
0 replies - 1541 views - 02/06/13 by Rafał Kuć in Articles

In actuarial science and insurance ratemaking, taking the exposure into account can be a nightmare (in datasets, some clients have been here for a...
0 replies - 1040 views - 02/05/13 by Arthur Charpentier in Articles