Java Bayesian Classifier ci-bayes 1.0 released
ci-bayes is based on the chapter on Bayesian classification from Toby Segaran's "Programming Collective Intelligence," and has been ported from the original Python with the explicit permission of the author.
ci-bayes is built with Maven 2 and has an explicit runtime dependency on Javolution. It also provides factories for use with Spring 2, but those aren't required at runtime in the simplest case.
A simple example of how the classifier works might look like this:
// Train the classifier with one example per category
FisherClassifier fc = new FisherClassifierImpl();
fc.train("The quick brown fox jumps over the lazy dog's tail", "good");
fc.train("Make money fast!", "bad");
// "money" has only been seen in a "bad" document
String classification = fc.getClassification("money"); // should be "bad"
Currently, ci-bayes uses the SpamAssassin testing corpora for performance and accuracy testing. The methodology is fairly simple: it first trains itself on seven of the ten corpora, following the SpamAssassin conventions, then classifies the messages in the remaining three corpora and checks whether each result matches what SpamAssassin generated.
It's able to run the classification tests in just over eleven seconds on a single CPU core, with a 98% match with SpamAssassin. Given that SpamAssassin and ci-bayes have different classification mechanisms and different functions, this is probably acceptable for most uses. (SpamAssassin is not a strict Bayesian classifier: it scores messages with a large set of weighted rules, of which Bayesian filtering is only one component, so a 98% match is, in my opinion, a marvelous result.)
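The hold-out methodology described above (train on seven corpora, then compare classifications of the remaining three against reference labels) can be sketched roughly as follows. This is only an illustration, not the ci-bayes test harness: the TextClassifier interface, the Labeled class, the WordVoteClassifier stand-in, and the tiny generated corpora are all hypothetical names and data invented for this example. Only the train/getClassification call shape is borrowed from the snippet earlier in the article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HoldOutEvaluation {

    /** Hypothetical contract mirroring the train/getClassification calls shown earlier. */
    public interface TextClassifier {
        void train(String text, String category);
        String getClassification(String text);
    }

    /** A message plus the label assigned by the reference system. */
    public static final class Labeled {
        final String text;
        final String label;
        Labeled(String text, String label) { this.text = text; this.label = label; }
    }

    /** Toy stand-in (NOT ci-bayes): each known word votes for the categories it was seen in. */
    public static final class WordVoteClassifier implements TextClassifier {
        private final Map<String, Map<String, Integer>> counts = new HashMap<>();

        public void train(String text, String category) {
            for (String w : text.toLowerCase().split("\\W+")) {
                if (w.isEmpty()) continue;
                counts.computeIfAbsent(w, k -> new HashMap<>()).merge(category, 1, Integer::sum);
            }
        }

        public String getClassification(String text) {
            Map<String, Integer> votes = new HashMap<>();
            for (String w : text.toLowerCase().split("\\W+")) {
                Map<String, Integer> perCategory = counts.get(w);
                if (perCategory == null) continue; // word never seen in training
                perCategory.forEach((cat, n) -> votes.merge(cat, n, Integer::sum));
            }
            String best = "unknown";
            int bestVotes = 0;
            for (Map.Entry<String, Integer> e : votes.entrySet()) {
                if (e.getValue() > bestVotes) { best = e.getKey(); bestVotes = e.getValue(); }
            }
            return best;
        }
    }

    /**
     * Trains on the first trainCount corpora, then returns the fraction of
     * held-out messages whose classification matches the reference label.
     */
    public static double matchRate(TextClassifier c, List<List<Labeled>> corpora, int trainCount) {
        for (List<Labeled> corpus : corpora.subList(0, trainCount)) {
            for (Labeled m : corpus) c.train(m.text, m.label);
        }
        int total = 0;
        int matched = 0;
        for (List<Labeled> corpus : corpora.subList(trainCount, corpora.size())) {
            for (Labeled m : corpus) {
                total++;
                if (m.label.equals(c.getClassification(m.text))) matched++;
            }
        }
        return total == 0 ? 0.0 : (double) matched / total;
    }

    /** Ten tiny made-up corpora, each with one "spam" and one "ham" message. */
    public static List<List<Labeled>> sampleCorpora() {
        List<List<Labeled>> corpora = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            List<Labeled> corpus = new ArrayList<>();
            corpus.add(new Labeled("cheap money offer " + i, "spam"));
            corpus.add(new Labeled("meeting notes project " + i, "ham"));
            corpora.add(corpus);
        }
        return corpora;
    }

    public static void main(String[] args) {
        double rate = matchRate(new WordVoteClassifier(), sampleCorpora(), 7);
        System.out.println("Match rate on held-out corpora: " + rate);
    }
}
```

Keeping the three test corpora out of training is what makes the match rate meaningful: the classifier is only scored on messages it has never seen. To try the same loop against a real classifier, one would presumably wrap it in an adapter implementing the hypothetical TextClassifier interface.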
The binary jar for ci-bayes-1.0-SNAPSHOT is available on java.net.