Big Data

Setting the Umask when Using Capistrano

This is one of those posts to remind me how I solved a problem last time! I've recently been using Capistrano for deployment and other remote tasks...

0 replies - 932 views - 04/18/13 by Trevor Parsons in Articles

Dev of the Week: A. Jesse Jiryu Davis

Every week, we feature a new developer/blogger from the DZone community here and in our newsletter, catching up to find out what they're working on...

0 replies - 6199 views - 04/17/13 by Eric Gregory in Articles

Detecting Social Capitalists on Twitter with Graph Databases

Curator's Note: The content of this article is based on the original written over at the Sparsity Technologies blog. Nicolas Dugué and Anthony Perez...

0 replies - 3304 views - 04/17/13 by Damaris Coll in Articles

Social Networks in Fact and Fiction

SIAM News arrived this afternoon and had an interesting story on the front page: Applying math to myth helps separate fact from fiction. In a nutshell, the...

0 replies - 2884 views - 04/16/13 by John Cook in Articles

Thoughts on the European Data Forum

I travelled to Ireland last week to attend the second meeting of the European Data Forum (EDF). The EDF provided travel support for my...

0 replies - 1903 views - 04/16/13 by Paul Miller in Articles

The Quiet Creep of Facial Recognition

If you don’t follow Alistair Croll of Solve for Interesting, you should. In a piece published last week, You’ll Be Tagged, Croll makes the...

0 replies - 1315 views - 04/16/13 by Christopher Taylor in Articles

Using R with Geospatial Data

GIS, an acronym that brings joy to some and strikes fear in the hearts of those not interested in buying expensive software. Luckily, fight or flight can be...

0 replies - 1887 views - 04/16/13 by Jonathan Callahan in Articles

Reserving with Negative Increments in Triangles

A few months ago, I published a post on negative values in triangles and how to deal with them when using a Poisson regression (the post was...

0 replies - 1502 views - 04/15/13 by Arthur Charpentier in Articles

How to Debug Solr with Eclipse

Recently I was puzzled by some behavior Solr was showing me. I scratched my head and called over a colleague. We couldn’t quite figure out what was going on....

0 replies - 2175 views - 04/15/13 by Doug Turnbull in Articles

Cassandra 1.1 – Tuning for Frequent Column Updates

Cassandra is known for its good write performance. But there are scenarios where you might run into trouble – especially when a particular use case...

0 replies - 1670 views - 04/14/13 by Daniel Bartl in Articles

Tweaking Movie Subtitles with R

I use R to fix subtitles that are not in sync with my movies. For the example below the subs were showing too early - so I added some time to each sequence in...

0 replies - 1825 views - 04/13/13 by Kay Cichini in Articles

Modifying Solr Result Relevancy Via An “Auxiliary Boost” Field

English is a confusing language. I mean, does it really make sense that you can park in a driveway or drive in a parkway? Also, I’ve always been amused that...

0 replies - 1962 views - 04/12/13 by John Berryman in Articles

Data Roundup: The Hidden Biases of Big Data and More

Long time no see. Here are some very interesting posts, discovered this week, somewhere else on the internet: “Good biology can be done without maths” or...

0 replies - 930 views - 04/12/13 by Arthur Charpentier in Articles

Machine Learning: Naïve Bayes Rule for Malware Detection and Classification

ABSTRACT: This paper presents statistics and machine learning principles as an exercise while analyzing malware. Conditional probability or Bayes’...

0 replies - 3983 views - 04/10/13 by Ryan Fahey in Articles

Algorithm of the Week: Monte Carlo Methods

First: Why? Monte Carlo methods are excellent for problems that are complex enough that an exact solution is nigh impossible and 100% perfect accuracy is...

0 replies - 5721 views - 04/09/13 by Justin Bozonier in Articles