Big Data/Analytics Zone is brought to you in partnership with:

Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 544 posts at DZone. You can read more from them at their website. View Full User Profile

Mahout: Using a Saved Random Forest/DecisionTree

10.31.2012
| 5097 views |
  • submit to reddit

One of the things that I wanted to do while playing around with random forests using Mahout was to save the random forest and then use use it again which is something Mahout does cater for.

It was actually much easier to do this than I’d expected and assuming that we already have a DecisionForest built we’d just need the following code to save it to disc:

int numberOfTrees = 1;
Data data = loadData(...);
DecisionForest forest = buildForest(numberOfTrees, data);
 
String path = "saved-trees/" + numberOfTrees + "-trees.txt";
DataOutputStream dos = new DataOutputStream(new FileOutputStream(path));
 
forest.write(dos);

When I was looking through the API for how to load that file back into memory again it seemed like all the public methods required you to be using Hadoop in some way which I thought was going to be a problem as I’m not using it.

For example the signature for DecisionForest.load reads like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
 
public static DecisionForest load(Configuration conf, Path forestPath) throws IOException { }

As it turns out though you can just pass an empty configuration and a normal file system path and the forest shall be loaded:

int numberOfTrees = 1;
 
Configuration config = new Configuration();
Path path = new Path("saved-trees/" + numberOfTrees + "-trees.txt");
DecisionForest forest = DecisionForest.load(config, path);

Much easier than expected!

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)