
Bilgin Ibryam is a software engineer with a Master's degree in Computer Science, currently working for Red Hat in London. He is also the author of the book "Instant Apache Camel Message Routing", an open source enthusiast, and an Apache OFBiz and Apache Camel committer. In his spare time, he enjoys contributing to open source projects and blogging at ofbizian.com. Bilgin is a DZone MVB and is not an employee of DZone.

How to keep your content repository and Solr in sync using Camel

Thanks to recent contributions, the camel-jcr component now has a consumer that monitors a Java Content Repository (JCR) for changes. If your JCR implementation supports observation (the repository reports OPTION_OBSERVATION_SUPPORTED), the consumer registers an EventListener and gets notified of all kinds of events. Chances are you are not interested in every event from the whole repository, so you can narrow down the notifications by specifying the path of interest, the event types, node UUIDs, node types, and so on.
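To make the filtering idea concrete, here is a small, dependency-free Java sketch of how an event type bit mask and a watched path narrow down notifications. EventFilter and its accepts method are hypothetical names. The constant values mirror javax.jcr.observation.Event, and the path check is a simplification (the JCR spec defines the rule in terms of the node associated with each event), so treat it as an illustration rather than the consumer's actual logic.

```java
// Hypothetical, dependency-free sketch of how a subscription narrows events
// by type and path. Constant values mirror javax.jcr.observation.Event; the
// path rule is a simplified take on the JCR absPath/isDeep filter.
final class EventFilter {
    static final int NODE_ADDED = 1;
    static final int NODE_REMOVED = 2;
    static final int PROPERTY_ADDED = 4;
    static final int PROPERTY_REMOVED = 8;
    static final int PROPERTY_CHANGED = 16;

    private final int eventTypes; // bit mask, e.g. NODE_ADDED | NODE_REMOVED == 3
    private final String absPath; // watched path, e.g. "/path/site"
    private final boolean deep;   // also accept events below absPath

    EventFilter(int eventTypes, String absPath, boolean deep) {
        this.eventTypes = eventTypes;
        this.absPath = absPath;
        this.deep = deep;
    }

    boolean accepts(int eventType, String eventPath) {
        if ((eventTypes & eventType) == 0) {
            return false; // event type not selected by the mask
        }
        if (eventPath.equals(absPath)) {
            return true; // event on the watched node itself
        }
        // Compare against a trailing "/" so "/path/site2" is not treated as
        // being below "/path/site"; the root "/" already ends with one.
        String prefix = absPath.endsWith("/") ? absPath : absPath + "/";
        return deep && eventPath.startsWith(prefix);
    }

    public static void main(String[] args) {
        EventFilter filter = new EventFilter(NODE_ADDED | NODE_REMOVED, "/path/site", true);
        System.out.println(filter.accepts(NODE_ADDED, "/path/site/news/item1"));       // true
        System.out.println(filter.accepts(PROPERTY_CHANGED, "/path/site/news/item1")); // false
        System.out.println(filter.accepts(NODE_REMOVED, "/path/site2/page"));           // false
    }
}
```

Combining flags with bitwise OR is what makes a single integer option able to express "any of these event types".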

How can this consumer be useful? (Hmm, you tell me.) Let's say we have a CMS and want to keep an external Solr index in sync with content updates: whenever a new node is added to the content repository, all of its properties get indexed in Solr, and when a node is deleted from the content repository, the corresponding document is removed from Solr.

Here is a Camel route that listens for changes under the /path/site folder and all its children. The route is notified of only two kinds of events, NODE_ADDED and NODE_REMOVED, because the value of the eventTypes option is a bit mask of the event types of interest: NODE_ADDED is 1 and NODE_REMOVED is 2, so the mask is 1 | 2 = 3.


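   from("jcr://username:password@repository/path/site?deep=true&eventTypes=3") // credentials and repository name are placeholders
   .split(body())
   .choice()
   .when(script("beanshell", "request.getBody().getType() == 1"))
      .to("direct:index")
   .when(script("beanshell", "request.getBody().getType() == 2"))
      .to("direct:delete")
   .otherwise()
      .log("Event type not recognized: ${body}");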

The route then splits the received events into separate messages and, depending on the event type, sends node creation events to the direct:index route and node deletion events to the direct:delete route.
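The split and content-based routing steps can be modelled without Camel in a few lines of plain Java. This is a sketch only: JcrEvent is a hypothetical stand-in for javax.jcr.observation.Event, route just records which endpoint each event would be sent to, and the "unrecognized" key stands for the otherwise branch. Only the endpoint names come from the article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of the route's split + choice logic.
final class EventRouter {
    static final int NODE_ADDED = 1;   // same value as Event.NODE_ADDED
    static final int NODE_REMOVED = 2; // same value as Event.NODE_REMOVED

    // Stand-in for javax.jcr.observation.Event (hypothetical).
    record JcrEvent(int type, String identifier) {}

    private EventRouter() {}

    /** Splits a batch of events and groups each identifier by its target endpoint. */
    static Map<String, List<String>> route(List<JcrEvent> batch) {
        Map<String, List<String>> sent = new LinkedHashMap<>();
        for (JcrEvent event : batch) {                // split: one message per event
            String endpoint = switch (event.type()) { // choice: content-based router
                case NODE_ADDED -> "direct:index";
                case NODE_REMOVED -> "direct:delete";
                default -> "unrecognized";            // otherwise branch (logged)
            };
            sent.computeIfAbsent(endpoint, k -> new ArrayList<>()).add(event.identifier());
        }
        return sent;
    }

    public static void main(String[] args) {
        List<JcrEvent> batch = List.of(
                new JcrEvent(NODE_ADDED, "a1"),
                new JcrEvent(NODE_REMOVED, "b2"),
                new JcrEvent(NODE_ADDED, "c3"));
        System.out.println(route(batch)); // {direct:index=[a1, c3], direct:delete=[b2]}
    }
}
```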

The delete route is a simple one: it sets the Solr operation header to DELETE_BY_ID and puts the node identifier into the message body; in our case the identifier is also the uniqueKey in the Solr schema. A Solr commit follows.

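   from("direct:delete")
   .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_DELETE_BY_ID))
   .setBody(script("beanshell", "request.getBody().getIdentifier()"))
   .log("Deleting node with id: ${body}")
   .to("solr://localhost:8983/solr") // Solr URL is a placeholder
   .setHeader("SolrOperation", constant("COMMIT"))
   .to("solr://localhost:8983/solr");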
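In Solr terms, the two steps of the delete route are a delete-by-id followed by a commit. The sketch below spells them out as Solr XML update messages to show what each step asks Solr to do. camel-solr goes through SolrJ, so this is not necessarily the format sent on the wire; SolrDeleteMessages and its methods are hypothetical names.

```java
// Sketch: Solr XML update messages equivalent to the delete route's two steps.
// Illustrative only; camel-solr itself talks to Solr through SolrJ.
final class SolrDeleteMessages {
    private SolrDeleteMessages() {}

    /** Escapes the five XML special characters in text content. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&apos;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /** Deletes the document whose uniqueKey equals the JCR node identifier. */
    static String deleteById(String nodeIdentifier) {
        return "<delete><id>" + escapeXml(nodeIdentifier) + "</id></delete>";
    }

    /** Makes the deletion visible to searchers. */
    static String commit() {
        return "<commit/>";
    }

    public static void main(String[] args) {
        System.out.println(deleteById("a&b")); // <delete><id>a&amp;b</id></delete>
        System.out.println(commit());          // <commit/>
    }
}
```

Without the commit, the deletion would not become visible to searchers until some later commit happens.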

The indexing part consists of two routes. The nodeRetriever route fetches the node from the content repository using the identifier carried by the update event:

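   from("direct:nodeRetriever")
   .setHeader(JcrConstants.JCR_OPERATION, constant(JcrConstants.JCR_GET_BY_ID))
   .setBody(script("beanshell", "request.getBody().getIdentifier()"))
   .log("Reading node with id: ${body}")
   .to("jcr://username:password@repository"); // credentials and repository name are placeholders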

After the node is retrieved from the repository with the Content Enricher EIP, the nodeEnricher aggregation strategy passed to enrich() extracts the node properties and sets them as Camel message headers, so that they get indexed as Solr document fields.

   from("direct:index")
   .enrich("direct:nodeRetriever", nodeEnricher)
   .log("Indexing node with id: ${body}")
   .setHeader("SolrOperation", constant("INSERT"))
   .to("solr://localhost:8983/solr"); // Solr URL is a placeholder
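The nodeEnricher implementation itself is not shown in the article. Here is a minimal sketch of the mapping it might perform, assuming the Solr uniqueKey field is called id and the node properties are already available as a plain Map (NodeToSolrHeaders and toHeaders are hypothetical names). Each property becomes a header with the SolrField. prefix, which camel-solr turns into a document field on INSERT.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the header mapping an aggregation strategy such as
// nodeEnricher could produce. Assumptions: the uniqueKey field is "id" and
// property values are already plain Java objects.
final class NodeToSolrHeaders {
    static final String FIELD_PREFIX = "SolrField."; // prefix camel-solr reads on INSERT

    private NodeToSolrHeaders() {}

    static Map<String, Object> toHeaders(String nodeIdentifier, Map<String, Object> nodeProperties) {
        Map<String, Object> headers = new LinkedHashMap<>();
        for (Map.Entry<String, Object> p : nodeProperties.entrySet()) {
            headers.put(FIELD_PREFIX + p.getKey(), p.getValue());
        }
        // Written last so a node property named "id" cannot replace the uniqueKey,
        // which must stay equal to the node identifier for DELETE_BY_ID to work.
        headers.put(FIELD_PREFIX + "id", nodeIdentifier);
        return headers;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("title", "Hello");
        props.put("author", "jdoe");
        System.out.println(toHeaders("a1", props));
        // {SolrField.title=Hello, SolrField.author=jdoe, SolrField.id=a1}
    }
}
```

Keeping the node identifier as the uniqueKey is what lets the delete route remove the same document later by id alone.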

You can find the complete working example on GitHub. If your CMS is not JCR-compliant but CMIS-compliant, have a look at the CMIS component on my GitHub account.

Published at DZone with permission of Bilgin Ibryam, author and DZone MVB.
