The way I see it, “data science” is a term coined to describe a special set of requirements and a certain role within web-based companies that accumulate huge amounts of data and wish to make use of that information.
A couple of months ago I wrote a blog post describing our use of Mahout random forests for the Kaggle Digit Recogniser Problem. After seeing how long it took to create forests with 500+ trees, I wanted to see if this could be sped up by parallelising the process.
We were building an internal application for an insurance company and had no idea how difficult it was going to be to put something into production, so we decided to find out on the first day of the project.
I’m not a fan of New Year’s Resolutions. I believe that every day gives us the opportunity to reinvent ourselves and so it’s silly to put so much stock into that one day each year. Nonetheless, with the new year starting, I find myself longing to create more...
I’ve been playing around with writing some algorithms in both Ruby and Haskell. The Haskell version wasn’t giving the correct result, so I wanted to output an intermediate state of the two programs and compare them.
So you want to go with the flow and implement your next application on top of some NoSQL, NotJustSQL, NewSQL, AlmostSQL, SQL++, NextGenSQL, and whatnot, just to be sure not to miss out on some of the latest developments in the data business.