Krishna Prasad 01/03/13
0 replies

HtmlUnit vs JSoup: HTML Parsing in Java

Continuing from my earlier blog post, "Jsoup: nice way to do HTML parsing in Java", in this post I will compare JSoup with another similar framework, HtmlUnit.

Ben Wootton 01/03/13
0 replies

7 Habits of Highly Effective Maven Users

One of the biggest leaps forward in my productivity and satisfaction as a developer came when I learnt about and adopted Maven for my Java projects.

Mikio Braun 01/03/13
0 replies

What is Data Science?

The way I see it, “data science” is a term coined to describe a special set of requirements and a certain role within web-based companies that accumulate a huge amount of data and wish to make use of that information.

Mark Needham 01/03/13
0 replies

Mahout: Parallelising the Creation of DecisionTrees

A couple of months ago I wrote a blog post describing our use of Mahout random forests for the Kaggle Digit Recogniser Problem. After seeing how long it took to create forests with 500+ trees, I wanted to see if this could be sped up by parallelising the process.

Eric Gregory 01/03/13
1 reply

Here's How to Build an Optimal Hadoop Cluster

If you're ringing in the New Year by building a Hadoop cluster, then you might want to take a look at Atlantbh's detailed tutorial.

Greg Duncan 01/03/13
0 replies

Introducing REST JSON/JSONP "Open Beer Database" API!

Yup, there's an Open Beer Database, described as "a free, public database and API for beer information." Now, that's my kind of information...

Mark Needham 01/03/13
0 replies

The Tracer Bullet Approach: An Example

We were building an internal application for an insurance company and didn’t have any idea how difficult it was going to be to put something into production, so we decided to find out on the first day of the project.

Abby Fichtner 01/03/13
0 replies

On Shipping More in 2013

I’m not a fan of New Year’s Resolutions. I believe that every day gives us the opportunity to reinvent ourselves and so it’s silly to put so much stock into that one day each year. Nonetheless, with the new year starting, I find myself longing to create more...

Mark Needham 01/03/13
0 replies

Sed: Replacing Characters with a New Line

I’ve been playing around with writing some algorithms in both Ruby and Haskell, and the latter wasn’t giving the correct result, so I wanted to output an intermediate state of the two programs and compare them.

Erich Styger 01/03/13
0 replies

PWM and Shell for a LED

Controlling a LED is a great starter for any embedded project: simple and you immediately get feedback if it works.

Todd Merritt 01/03/13
0 replies

What is an Enterprise Resource Planning (ERP) System?

The ERP acts as a go-between for various independent departmental systems so that they can integrate with one another. Here's an analogy using Rice Krispies...

Lukas Eder 01/02/13
0 replies

A Database Landscape Map

So you want to go with the flow and implement your next application on top of some NoSQL, NotJustSQL, NewSQL, AlmostSQL, SQL++, NextGenSQL, and whatnot, just to be sure not to miss out on some of the latest developments in the data business.

David Pollak 01/02/13
6 replies

Deploying a Simple Lift App on Escalante Using OpenShift

OpenShift and Escalante just work with Lift. Thanks to Galder for creating Escalante and lowering the barriers to entry for Lift.

Istvan Szegedi 01/02/13
0 replies

Machine Learning Using Microsoft HDInsight on Azure

One of the key Microsoft HDInsight components is Mahout, a scalable machine learning library that provides a number of algorithms relying on the Hadoop platform.

Esen Sagynov 01/02/13
1 reply

The Availability and Operational Stability of NoSQL

In this article, I will analyze the distribution and availability of these products from an operational perspective. The selected targets are Cassandra, HBase and MongoDB.