Tools & Methods

Bundler and Cross-Platform Development

The hazards of cross-platform development. Recently I helped a co-worker get Rails, Nginx, and Unicorn up and running in a Linux VM, using Capistrano to...

0 replies - 1324 views - 01/26/13 by Brian Swan in Articles

Sorting Text Files with MapReduce

In my last post I wrote about sorting files in Linux. Decently large files (in the tens of GBs) can be sorted fairly quickly using that approach....

0 replies - 1925 views - 01/26/13 by Alex Holmes in Articles

Topological Data Analysis, Science Behind Paywalls, and More Data Links

This week, I was travelling in Europe (from Lausanne to Amsterdam), and since I have no cell phone, it is a bit difficult – sometimes – to...

0 replies - 1347 views - 01/26/13 by Arthur Charpentier in Articles

Get Real Data from the Semantic Web - Finding Resources

In my last article, I briefly explained how to get data from a resource using Python and SPARQL. This article explains how to find the resource in the...

0 replies - 1416 views - 01/25/13 by Col Wilson in Articles

R: Ordering Rows in a Data Frame by Multiple Columns

In one of the assignments of Computing for Data Analysis we needed to sort a data frame based on the values in two of the columns and then return the top...

0 replies - 1303 views - 01/25/13 by Mark Needham in Articles

Building SOLID Databases: Single Responsibility and Normalization

Introduction. This instalment will cover the single responsibility principle in object-relational design, and its relationship both to data normalization and...

0 replies - 1644 views - 01/25/13 by Chris Travers in Articles

Jenkins Description Setter Plugin for Improving Continuous Delivery Visibility

In Continuous Delivery each build is potentially shippable. Among many other things, this implies assigning a non-snapshot version to your...

1 reply - 1926 views - 01/25/13 by Alex Soto in Articles

Lexicographically Sorting Large Files in Linux

When I hear the word “sort” my first thought is usually “Hadoop”! Yes, sorting is one thing that Hadoop does well, but if you’re working with large...

0 replies - 1833 views - 01/24/13 by Alex Holmes in Articles

Controlling User Logging in Hadoop

Imagine that you’re a Hadoop administrator, and to make things interesting you’re managing a multi-tenant Hadoop cluster where data scientists, developers...

0 replies - 1765 views - 01/24/13 by Alex Holmes in Articles

On the Ethic of Delivery

Over the past few years I’ve been a kind of informal adviser to the Defrag event. The role is less than onerous; Eric Norlin totally understands...

0 replies - 1364 views - 01/24/13 by Ben Kepes in Articles

Get Real Data from the Semantic Web

Semantic Web this, Semantic Web that, but what actual use is the Semantic Web in the real world? I mean, how can you actually use it? If you haven't heard the...

0 replies - 11289 views - 01/24/13 by Col Wilson in Articles

Running Unit Tests and Integration Tests Separately With Maven Failsafe and TestNG

Recently, for my new pet project, I decided that I wanted some tests executed during the standard mvn test and others only during different...

3 replies - 1853 views - 01/24/13 by Tomasz Dziurko in Articles

Apache Lucene and Solr 4.1 Released

Today the Apache Lucene and Solr PMC announced another version of the Apache Lucene library and Apache Solr search server - version 4.1. This is a major release...

0 replies - 7235 views - 01/23/13 by Rafał Kuć in Articles

Serpents and Sunbursts in Source Code Structure

Hardware engineers have it so easy. Or at least they used to. Reviews went something like this. Gary's designed a new board (OK, this was the late...

0 replies - 2233 views - 01/23/13 by Edmund Kirwan in Articles

How to Completely Fail at BDD

Are you interested in introducing BDD to your team? Don’t try and do it like this under these circumstances. Learn from my failure. Having experienced...

10 replies - 17172 views - 01/23/13 by Jon Archer in Articles