Big Data


Plotting Gaussians, Including Univariate, Multivariate and Linear Models

While working on Bayesian Networks in a continuous space, I'm plotting lots of Gaussian distributions representing prior, likelihood and posterior functions...

0 replies - 1465 views - 01/22/13 by Daniel Korzekwa in Articles

How Partitioning, Collecting, and Spilling Work in MapReduce

The figure below shows the various steps that the Hadoop MapReduce framework takes after your map function emits a key/value output record. Please note that...

0 replies - 1746 views - 01/22/13 by Alex Holmes in Articles
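The partition, collect and spill steps named in the title above can be sketched under heavily simplified assumptions. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. This Python sketch keeps that shape but swaps in `zlib.crc32` as the hash, so its partition numbers will not match a real job. `ToyMapOutputCollector` is a hypothetical class, not Hadoop's API. It tags each record with its partition as it is buffered. Once a fixed record count is reached, it sorts the buffer by (partition, key) and "spills" it to an in-memory list. Real Hadoop instead uses a byte-sized circular buffer and spills to local disk in a background thread once a configurable fill threshold is passed.

```python
import zlib
from typing import List, Tuple

# (partition, key, value) triples, as buffered by the toy collector
Tagged = Tuple[int, str, int]


def partition(key: str, num_reducers: int) -> int:
    # Same shape as Hadoop's HashPartitioner: mask off the sign bit, then take
    # the remainder. ASSUMPTION: zlib.crc32 stands in for Java's hashCode().
    h = zlib.crc32(key.encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_reducers


class ToyMapOutputCollector:
    """Hypothetical, simplified model of map-side collecting and spilling."""

    def __init__(self, num_reducers: int, buffer_limit: int) -> None:
        self.num_reducers = num_reducers
        self.buffer_limit = buffer_limit  # spill after this many records
        self.buffer: List[Tagged] = []
        self.spills: List[List[Tagged]] = []

    def collect(self, key: str, value: int) -> None:
        # Tag each emitted record with its target reducer as it is buffered.
        self.buffer.append((partition(key, self.num_reducers), key, value))
        if len(self.buffer) >= self.buffer_limit:
            self.spill()

    def spill(self) -> None:
        if not self.buffer:
            return
        # Sort by partition, then key, so each reducer's slice is contiguous
        # and ordered inside the spill.
        self.spills.append(sorted(self.buffer, key=lambda r: (r[0], r[1])))
        self.buffer = []

    def flush(self) -> List[List[Tagged]]:
        # Spill whatever is left when the map task finishes.
        self.spill()
        return self.spills


if __name__ == "__main__":
    collector = ToyMapOutputCollector(num_reducers=3, buffer_limit=4)
    for word in "the quick brown fox jumps over the lazy dog".split():
        collector.collect(word, 1)
    for spill in collector.flush():
        print(spill)
```

For example, collecting the nine words of "the quick brown fox jumps over the lazy dog" with three reducers and a four-record buffer produces three spills of 4, 4 and 1 records, each sorted by partition and then key.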

Introduction to In-Memory Data Grid: Main Features

Using main memory instead of disk as a storage area is not a new idea at all. In daily life you can find many cases in which main memory...

0 replies - 2104 views - 01/22/13 by Esen Sagynov in Articles

Prediction API: Machine Learning from Google

One of the exciting APIs among the 50+ APIs offered by Google is the Prediction API. It provides pattern matching and machine learning...

0 replies - 6514 views - 01/22/13 by Istvan Szegedi in Articles

ScalaTest a MapReduce using Akka

For people in a hurry, here are the MapReduce with ScalaTest and Akka code and steps. I was trying to learn Scala, and I wanted to kill several birds...

0 replies - 1405 views - 01/21/13 by Krishna Prasad in Articles

Hadoop Single Node Set Up

With this post I hope to share the procedure for setting up Apache Hadoop on a single node. Hadoop is used for dealing with Big Data sets where deployment...

0 replies - 933 views - 01/21/13 by Pushpalanka Jay... in Articles

Data Management, AML, and KYC Analytics

To roadmap Wall Street priorities for 2013, we have been having an interesting set of meetings recently with MDs and leading architects in various banks and...

0 replies - 2413 views - 01/19/13 by Ravi Kalakota in Articles

Building SOLID Databases: Introduction

The SOLID design approach is a set of principles developed in object-oriented programming. This series will explore the applicability of these principles...

0 replies - 4509 views - 01/19/13 by Chris Travers in Articles

The Guardian Is Brilliant in Supporting Relevant Events with Open Data

I’m a big fan of what The Guardian is doing with their data and API strategy. I think they are a model for what old media should be doing around the...

0 replies - 1517 views - 01/18/13 by Kin Lane in Articles

A Fistful of Monoids

I first ran into monoids while searching for monads on Google. Monoids are ubiquitous in programming and chances are that you have already used them without...

0 replies - 884 views - 01/18/13 by Muhammad Ashraf in Articles
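The teaser's claim that monoids are already everywhere can be illustrated with a minimal sketch. It assumes Python, because the listing does not say which language the article uses. A monoid is an associative binary operation together with an identity element. Integer addition with `0`, string concatenation with `""` and list concatenation with `[]` all qualify. The `Monoid` class and the `chunked_fold` helper are hypothetical names chosen for illustration.

```python
from functools import reduce
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


class Monoid(Generic[T]):
    """Hypothetical minimal monoid: an associative combine plus an identity element."""

    def __init__(self, identity: T, combine: Callable[[T, T], T]) -> None:
        self.identity = identity
        self.combine = combine

    def fold(self, items: Iterable[T]) -> T:
        # Starting from the identity means an empty input folds to the identity.
        return reduce(self.combine, items, self.identity)


def chunked_fold(m: Monoid[T], items: List[T], chunk_size: int) -> T:
    # Fold each chunk independently (as separate workers might), then fold the
    # partial results. Associativity guarantees this matches m.fold(items).
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    return m.fold([m.fold(chunk) for chunk in chunks])


int_sum: Monoid[int] = Monoid(0, lambda a, b: a + b)
str_concat: Monoid[str] = Monoid("", lambda a, b: a + b)
list_concat: Monoid[list] = Monoid([], lambda a, b: a + b)

if __name__ == "__main__":
    print(int_sum.fold([1, 2, 3, 4]))                     # 10
    print(str_concat.fold(["map", "reduce"]))             # mapreduce
    print(list_concat.fold([[1], [2, 3]]))                # [1, 2, 3]
    print(chunked_fold(int_sum, list(range(1, 101)), 7))  # 5050
```

The chunked fold is the design point. Because the operation is associative and has an identity, partial results can be computed in any grouping and combined afterwards. This is what lets MapReduce-style combiners pre-aggregate safely.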

Yet Another Big Data Whitepaper

I recently read the white paper “Challenges and Opportunities with Big Data” published by the Computing Community Consortium of the CRA. It was...

0 replies - 545 views - 01/18/13 by Mikio Braun in Articles

SPARQL and dbpedia: Getting Structured Data from Wikipedia

I always wondered if you could extract structured data from Wikipedia. Then I stumbled upon DBPedia and SPARQL. DBPedia stores...

0 replies - 2367 views - 01/17/13 by Krishna Prasad in Articles

TaskletStep Oriented Processing in Spring Batch

Many enterprise applications rely on batch processing to handle billions of transactions every day. These large transaction sets have to be processed without...

0 replies - 624 views - 01/17/13 by Eren Avşaroğulları in Articles

Apache Lucene Solr 3.6.2

The Apache Lucene and Solr PMC recently announced another version of the Apache Lucene library and the Apache Solr search server, numbered 3.6.2. This is a minor bugfix...

0 replies - 603 views - 01/17/13 by Rafał Kuć in Articles

The Best Color for a Database and More Data Links of the Week

Once again, interesting posts and articles here and there... [RIP] doctorow’s obit for Aaron Swartz: http://boingboing.net/… “what color do you...

0 replies - 1028 views - 01/17/13 by Arthur Charpentier in Articles