
Experience of 7.5 years in Java and J2EE Technologies. Currently working as a Java Consultant at Xebia. Ravindra has posted 2 posts at DZone.

Movie Recommendation App using Spring Data and Redis

03.08.2011

This post explains how to build a movie recommendation app using Spring Data and Redis, a NoSQL database. We will use a NoXML approach (Spring JavaConfig instead of XML configuration) and try to identify the nuances of a NoSQL database.

The movie recommendation application stores users' ratings for different movies, computes similarity scores between users, and recommends movies. It is based on an example in the book 'Programming Collective Intelligence'. We will use Redis ZSets to store the data. A ZSet is a sorted set that keeps its members ordered by a score supplied with each member.
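
To make the sorted-set idea concrete, here is a minimal in-memory sketch in plain Java, with no Redis involved. The class ZSetSketch and its methods are hypothetical names used only for illustration; they roughly mimic what ZADD and ZRANGE do for a single key and are not part of the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one Redis sorted set (ZSet).
final class ZSetSketch {
    // member -> score; a member appears at most once
    private final Map<String, Double> scores = new HashMap<>();

    // Roughly like ZADD: insert the member, or overwrite its score if it already exists.
    void add(String member, double score) {
        scores.put(member, score);
    }

    // Roughly like ZRANGE key 0 -1: all members in ascending score order,
    // with ties broken by member name.
    List<String> range() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort((a, b) -> {
            int byScore = Double.compare(scores.get(a), scores.get(b));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return members;
    }
}
```

In the app, the member would be a movie title and the score the rating a user gave it, so reading the set back returns that user's movies ordered by rating.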

The code is located on GitHub.

Configuring Redis

Install Redis as per the installation instructions on the Redis site, then run 'redis-server'. That's it.

Configuring Spring Data

The Spring Data documentation explains how to configure and use Spring Data with Redis. The most important part is to add the Spring milestone/snapshot repositories to your pom.xml, but I won't repeat that here. Let us take a JavaConfig-based approach to configuring Spring.

We need the following configuration to set up the RedisConnectionFactory and a StringRedisTemplate. If you are not familiar with Spring JavaConfig, you might object that calling getConnectionFactory() directly creates a second RedisConnectionFactory, so it is no longer a singleton. That is what you need cglib in your pom.xml for: Spring enhances the config class so that calls between @Bean methods return the one shared bean.

@Configuration
public class Config {

    @Bean
    public RedisConnectionFactory getConnectionFactory() {
        JedisConnectionFactory cf = new JedisConnectionFactory();
        return cf;
    }

    @Bean
    public StringRedisTemplate getRedisTemplate() {
        return new StringRedisTemplate(getConnectionFactory());
    }
}

DataModel

So how do you create a non-relational data model? In the NoSQL world you try to optimize your data model for your use cases. Concerns such as data duplication are given less importance (which is why purists hate NoSQL solutions).

Our use cases are:
    1. Store ratings for movies by a user.
    2. Compute similarity between users.
    3. Recommend movies.

Here is the code in the AllInOne *UberDao*. Ignore lines adding Movies and Users. The StringRedisTemplate is autowired in DAO.

We create ZSetOperations bound to the user key and then add the ratings for the movies. The code also maintains a ZSet that maps each movie to its user ratings (this is required for use case #3). If we did not maintain this duplicate data, extra logic would be needed to extract the same data later (space vs. time).

@Component
public class UberDao {

    @Autowired
    private StringRedisTemplate srt;

    public void addRatings(String user, Map<String, Double> ratings) {
        // Start a MULTI block so the commands below are queued and run as a batch.
        // Caveat: this only works if multi() and exec() share one connection;
        // depending on the Spring Data Redis version that may require a SessionCallback.
        srt.multi();

        srt.boundSetOps("Users").add(user);
        // ZSet under the user key: movie => rating
        BoundZSetOperations<String, String> boundZSetOps = srt.boundZSetOps(user);

        for (Map.Entry<String, Double> mr : ratings.entrySet()) {
            srt.boundSetOps("Movies").add(mr.getKey());
            // ZSet under the movie key: user => rating (the duplicated reverse mapping)
            srt.boundZSetOps(mr.getKey()).add(user, mr.getValue());
            boundZSetOps.add(mr.getKey(), mr.getValue());
        }
        // Runs all queued commands in one batch
        srt.exec();
    }
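
The shape of the data written by addRatings can be pictured with two plain Java maps. This is only an in-memory sketch under the assumption that each ZSet behaves like a member-to-score map; RatingsModelSketch and its field names are hypothetical and do not appear in the application.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory picture of the keys that addRatings writes to Redis.
final class RatingsModelSketch {
    // user -> (movie -> rating): mirrors the ZSet stored under each user key
    final Map<String, Map<String, Double>> byUser = new HashMap<>();
    // movie -> (user -> rating): mirrors the duplicated ZSet stored under each movie key
    final Map<String, Map<String, Double>> byMovie = new HashMap<>();

    void addRatings(String user, Map<String, Double> ratings) {
        for (Map.Entry<String, Double> mr : ratings.entrySet()) {
            // Write the same rating twice, once per access path (space traded for time).
            byUser.computeIfAbsent(user, k -> new HashMap<>()).put(mr.getKey(), mr.getValue());
            byMovie.computeIfAbsent(mr.getKey(), k -> new HashMap<>()).put(user, mr.getValue());
        }
    }
}
```

With both maps in place, "all movies rated by a user" and "all users who rated a movie" are each a single lookup. Note that the original code uses the bare user name and movie title as Redis keys, so a user and a movie with the same name would share one key; prefixing keys (for example with "user:" and "movie:") would avoid that, though this is a suggestion and not something the original code does.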

   
An observation here: the data model looks like a Map. Yes, it does. Redis is a key-value store, and the point to note is that the database is an extension of the application. There is no impedance mismatch between the datastore and the application model. Is that right or wrong? I will leave that question open.

Redis lets you do transactional updates (to counters, for example) and server-side set operations such as UNION and INTERSECT. You can see multi and exec used above to make the updates transactional.

Computing similarity

Similarity between users can be computed by calculating the Euclidean distance between their ratings for the movies they have in common, or by finding the correlation between those ratings. Class Recommend implements both (please refer to the source code on GitHub).

To get the common movies for two users we could fetch their movies and loop over them in client code. But Redis has a built-in intersect mechanism for such *social* tasks. We use zInterStore to compute the difference between the two users' ratings and then compute the Euclidean distance. See class Recommend for details of calculating similarity scores and 'Programming Collective Intelligence' for the background.

    
public Map<String, Double> getScoreDiff(final String p1, final String p2) {
    Map<String, Double> mScoreMap = new HashMap<String, Double>();
    final String combinedKey = p1 + ":" + p2;

    Set<Tuple> movieAndScores = srt.execute(new RedisCallback<Set<Tuple>>() {
        @Override
        public Set<Tuple> doInRedis(RedisConnection con)
                throws DataAccessException {
            // Emits a new ZSet under combinedKey. With weights {1, -1} and SUM, each movie
            // rated by both users gets the score (p1's rating - p2's rating).
            con.zInterStore(combinedKey.getBytes(), Aggregate.SUM, new int[] {1, -1}, p1.getBytes(), p2.getBytes());
            // Let the temporary key expire after 120 seconds.
            con.expire(combinedKey.getBytes(), 120);
            // Returns only members whose score difference lies between 1 and 20;
            // movies where p1's rating is not higher by at least 1 are left out.
            return con.zRangeByScoreWithScore(combinedKey.getBytes(), 1, 20);
        }
    });

    for (Tuple t : movieAndScores) {
        mScoreMap.put(new String(t.getValue()), t.getScore());
    }

    return mScoreMap;
}
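
What the weighted intersection computes, and how the differences can be turned into a similarity score, can be sketched with plain Java maps. This is an illustration under assumptions, not the code of class Recommend: SimilaritySketch and its method names are hypothetical, the sketch keeps every difference (including zero and negative ones) rather than only scores between 1 and 20, and the formula 1 / (1 + distance) is one common way to map a Euclidean distance into the range (0, 1].

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the score difference and a Euclidean similarity.
final class SimilaritySketch {

    // Mimics ZINTERSTORE with weights {1, -1} and SUM:
    // for every movie rated by both users, store (rating in a - rating in b).
    static Map<String, Double> scoreDiff(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> diff = new HashMap<>();
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                diff.put(e.getKey(), e.getValue() - other);
            }
        }
        return diff;
    }

    // Assumed formula: 1 / (1 + Euclidean distance); returns 0 when there are no common movies.
    static double similarity(Map<String, Double> diffs) {
        if (diffs.isEmpty()) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (double d : diffs.values()) {
            sumOfSquares += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }
}
```

Two users with identical ratings for their common movies get a similarity of 1, and the score shrinks towards 0 as their ratings drift apart.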

A person can be compared with every other person and then a list of the top 5/10 people with similar tastes can be found. The movies that those people see could be of interest.
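
Picking the top matches is then a matter of sorting by similarity. A minimal sketch, assuming the similarity scores against every other user have already been collected into a map (TopMatchesSketch and topMatches are hypothetical names, not taken from the source code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the n users with the highest similarity scores.
final class TopMatchesSketch {

    // similarityByUser: other user -> similarity score to the person being compared
    static List<String> topMatches(Map<String, Double> similarityByUser, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(similarityByUser.entrySet());
        // Highest similarity first
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }
}
```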

Recommendations

To compute recommendations for a user, you create a rating, weighted by similarity scores, for each movie that the user has not seen. For this you need the ratings for a movie from all users. Class Recommend (method getRecommendations) does this.
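
The weighting can be sketched as follows. This is an in-memory illustration of the similarity-weighted average described above, not the actual getRecommendations method; RecommendationSketch, its parameters, and the choice to skip users with a non-positive similarity are all assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: predicted rating = sum(similarity * rating) / sum(similarity).
final class RecommendationSketch {

    static Map<String, Double> recommend(String user,
                                         Map<String, Map<String, Double>> ratingsByUser,
                                         Map<String, Double> similarityToUser) {
        Map<String, Double> weightedTotals = new HashMap<>();
        Map<String, Double> similarityTotals = new HashMap<>();
        Map<String, Double> seen =
                ratingsByUser.getOrDefault(user, Collections.<String, Double>emptyMap());

        for (Map.Entry<String, Map<String, Double>> other : ratingsByUser.entrySet()) {
            if (other.getKey().equals(user)) {
                continue; // do not compare the user with themselves
            }
            double sim = similarityToUser.getOrDefault(other.getKey(), 0.0);
            if (sim <= 0) {
                continue; // assumption: ignore users with no positive similarity
            }
            for (Map.Entry<String, Double> mr : other.getValue().entrySet()) {
                if (seen.containsKey(mr.getKey())) {
                    continue; // only score movies the user has not rated yet
                }
                weightedTotals.merge(mr.getKey(), sim * mr.getValue(), Double::sum);
                similarityTotals.merge(mr.getKey(), sim, Double::sum);
            }
        }

        // Normalise so that movies rated by many users are not favoured automatically.
        Map<String, Double> predicted = new HashMap<>();
        for (Map.Entry<String, Double> e : weightedTotals.entrySet()) {
            predicted.put(e.getKey(), e.getValue() / similarityTotals.get(e.getKey()));
        }
        return predicted;
    }
}
```

Sorting the returned map by value in descending order gives the recommendation list. Dividing by the summed similarities is what keeps a movie rated by many weakly similar users from outranking one rated highly by a few very similar users.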

You can play with the class Ratings to change the feed data and find recommendations.

How do I run it

I am using a test case (MovieTest) to capture the different steps (there are no assertions in it). The steps need to be run in sequence, one after the other. I did not find any JUnit runner for JavaConfig in Spring, so we have to initialise the application context in the test case.

public class MovieTest {

    private AnnotationConfigApplicationContext ctx;
    private UberDao dao;
    private Recommend recomender;

    @Before
    public void init() {
        // No junit runner to run app with javaconfig.
        ctx = new AnnotationConfigApplicationContext(Config.class);
        ctx.scan("xebia.moviez.dao");
        ctx.scan("xebia.moviez.service");
        dao = ctx.getBean(UberDao.class);
        recomender = ctx.getBean(Recommend.class);
    }

Conclusion

Creating applications with a NoSQL database can be difficult at first, because we tend to try to create a relational model in a NoSQL database. NoSQL databases have the schema information embedded in the code, and without that information the data is more or less useless. NoSQL is not fit for everything; it has its use cases, and they are not only about scalability. Imagine someone creating an application using only HashMaps before the term NoSQL existed.
Published at DZone with permission of its author, Ravindra Rawat.


Comments

Chance Gold replied on Tue, 2011/03/08 - 1:09pm

Very interesting and timely article. But you didn't discuss how Spring Data helped you code the solution which was disappointing.

Sura Sos replied on Tue, 2011/03/08 - 1:34pm

Nice introduction.. Here is a link to github https://github.com/SpringSource/ that has more examples to the other types of nosql technology with spring-data framework.

Ravindra Rawat replied on Wed, 2011/03/09 - 12:10am in response to: Chance Gold

Thanks. Spring-Data does all the heavy lifting (just had to configure two beans) and redis operations are exposed nicely. 

I was more focussed on showing the match between application data and the storage format.

John Nonak replied on Fri, 2011/03/11 - 2:23am

Hi, nice example. We are working on a similar application, so this is very interesting for us. Just out of curiosity: is this example only meant to illustrate NoSQL usage? I quickly checked the code, and it seems that you compute the similarities by looping over the users and movies. If your database contains over 100000 users and, let's say, over a thousand movies, I would think that performance will suffer. As I said, I only checked the code quickly, so I might be totally missing the point :) Anyway, it's nice to see a Spring Data and Redis example.

Ravindra Rawat replied on Tue, 2011/03/15 - 7:10am in response to: John Nonak

You are right. It is an illustration to show NoSQL usage (not performant by any means). 

In your application you could calculate similarities in the background (nightly/periodic jobs) rather than on the fly, in case you see performance issues. It is also possible to do the Euclidean distance computation using multiple threads, but that is not the point of this post.
