Big Data

Apache Lucene Solr 3.6.2

The Apache Lucene and Solr PMC recently announced a new version of the Apache Lucene library and the Apache Solr search server, numbered 3.6.2. This is a minor bugfix...

0 replies - 667 views - 01/17/13 by Rafał Kuć in Articles

The Best Color for a Database and More Data Links of the Week

Once again, interesting posts and articles here and there... [RIP] doctorow's obit for Aaron Swartz: http://boingboing.net/… “what color do you...

0 replies - 1150 views - 01/17/13 by Arthur Charpentier in Articles

TaskletStep Oriented Processing in Spring Batch

Many enterprise applications need to process billions of transactions in batches every day. These big transaction sets have to be processed without...

0 replies - 692 views - 01/17/13 by Eren Avşaroğulları in Articles

What is Streamdrill's Trick?

In previous posts, I talked about what streamdrill is good for and how it compares to other Big Data approaches to real-time...

0 replies - 828 views - 01/16/13 by Mikio Braun in Articles

Is the Big Data Shakeout Coming in 2013?

Is the inevitable Big Data shakeout coming? If you are an enterprise customer, how do you prepare for this? What strategies do you adopt to take...

0 replies - 1454 views - 01/16/13 by Ravi Kalakota in Articles

MapReduce Algorithms – Secondary Sorting

This post covers the secondary sorting pattern, found in chapter 3 of Data-Intensive Text Processing with MapReduce. While Hadoop...

0 replies - 971 views - 01/16/13 by Bill Bejeck in Articles

Graph Search: A Sign of Things to Come

It seems like we've been speculating for such a long time that Facebook would release its own smartphone. And while we're still waiting to see that after...

0 replies - 6008 views - 01/15/13 by Mitch Pronschinske in Articles

What Should You Read to Learn Elementary Statistics?

I’ve thought about making a personal FAQ page. If I do, one of the questions would be what elementary statistics book I recommend. Unfortunately, I don’t...

0 replies - 1561 views - 01/15/13 by John Cook in Articles

Disrupting the Datawarehouse Market with Redshift

Amazon is taking another step toward disrupting an existing market. This time it has its sights set on the data warehouse market. Amazon is currently running a...

0 replies - 502 views - 01/15/13 by Maarten Ectors in Articles

Dev of the Week: Swathi Venkatachala

Every week, we check in with a new developer/blogger from the DZone community to find out what they're working on now and what's coming next. This week...

0 replies - 3296 views - 01/15/13 by Eric Gregory in Articles

Optimization in R

Optimization is a very common problem in data analytics. Given a set of variables (which one controls), how do you pick the right values such that...

0 replies - 959 views - 01/15/13 by Ricky Ho in Articles

R for Actuarial Science

As mentioned in the Appendix of Modern Actuarial Risk Theory, “R (and S) is the ‘lingua franca’ of data analysis and statistical computing, used in...

0 replies - 1345 views - 01/14/13 by Arthur Charpentier in Articles

Limiting Joins in Apache Hive

This article is by Stephen Mouring Jr., appearing courtesy of Scott Leberknight. Working with large datasets in Hadoop/Hive is difficult when you have an...

0 replies - 890 views - 01/14/13 by Scott Leberknight in Articles

Cloudera Impala – Fast, Interactive Queries with Hadoop

As discussed in the previous post about Twitter’s Storm, Hadoop is a batch-oriented solution that lacks support for ad-hoc, real-time...

0 replies - 1028 views - 01/14/13 by Istvan Szegedi in Articles

Calculating a Co-Occurrence Matrix with Hadoop

This post continues our series on implementing the MapReduce algorithms found in the Data-Intensive Text Processing with MapReduce book. This time we...

0 replies - 722 views - 01/14/13 by Bill Bejeck in Articles