Big Data/BI Zone is brought to you in partnership with:

Mark Needham is a software developer and consultant at ThoughtWorks. I have a keen interest in developer testing and object oriented design of systems. Mark is a DZone MVB and is not an employee of DZone and has posted 388 posts at DZone. You can read more from them at their website. View Full User Profile

Mahout: Using a Saved Random Forest/DecisionTree

10.31.2012
| 2906 views |
  • submit to reddit

One of the things that I wanted to do while playing around with random forests using Mahout was to save the random forest and then use use it again which is something Mahout does cater for.

It was actually much easier to do this than I’d expected and assuming that we already have a DecisionForest built we’d just need the following code to save it to disc:

int numberOfTrees = 1;
Data data = loadData(...);
DecisionForest forest = buildForest(numberOfTrees, data);
 
String path = "saved-trees/" + numberOfTrees + "-trees.txt";
DataOutputStream dos = new DataOutputStream(new FileOutputStream(path));
 
forest.write(dos);

When I was looking through the API for how to load that file back into memory again it seemed like all the public methods required you to be using Hadoop in some way which I thought was going to be a problem as I’m not using it.

For example the signature for DecisionForest.load reads like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
 
public static DecisionForest load(Configuration conf, Path forestPath) throws IOException { }

As it turns out though you can just pass an empty configuration and a normal file system path and the forest shall be loaded:

int numberOfTrees = 1;
 
Configuration config = new Configuration();
Path path = new Path("saved-trees/" + numberOfTrees + "-trees.txt");
DecisionForest forest = DecisionForest.load(config, path);

Much easier than expected!

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)