Stefan is well-versed in Neo4j, Groovy, and Grails, and he's also a volunteer firefighter. Stefan is a DZone MVB and is not an employee of DZone and has posted 8 posts at DZone.

JVM Toolkit "Ratpack" and Neo4j

09.30.2013

Back in May this year I attended the Gr8conf in Copenhagen. As always, this conference added a couple of things to my personal “take-a-look-at-this” list. The most exciting thing for me was Ratpack, a lean toolkit for building web applications on the JVM. Ratpack is powered by Netty and provides an event-driven network engine, as opposed to classic servlet-based containers like Tomcat or Jetty, which bind threads to requests. In high-load scenarios with a huge number of concurrent requests, the thread-based model suffers from thread blocking, whereas Ratpack is almost non-blocking. To get familiar with Ratpack, I’ve decided to implement a server component for Neo4j based on it. My first goal was to have a Cypher endpoint, just like the one the standard Neo4j server offers. Secondary goals were some more features:

  • Support for multiple output formats: JSON, HTML, CSV, message pack
  • The ability to get a list of currently running queries and a button to abort each one individually. This is IMHO a feature lacking in the classic Neo4j server. People getting started with Cypher tend to write queries that run very long, and the classic server offers no way to abort them.

For the future I’d like to add some more features:

  • Transactional Cypher endpoint
  • TBD (if you have ideas, please send a comment)

The goal is not to create a full-fledged alternative to the existing Neo4j server. This project should focus on maximum throughput and ease of use for a Cypher-only server component. To get started, I’ve cloned https://github.com/ratpack/example-ratpack-gradle-groovy-app. You’ll find my code at https://github.com/sarmbruster/neo4j-ratpack.

Handling Requests

In Ratpack you either write inline handlers in src/ratpack/ratpack.groovy or, for more complex cases, write a handler class derived from AbstractHandler and register that in ratpack.groovy.

Ratpack features Google Guice as well, so we can register, for example, a GraphDatabaseService as an injectable component. See Neo4jModule. We’re exposing and configuring a GraphDatabaseService, a Cypher ExecutionEngine, a guard (see below), and a QueryRegistry. We can refer to other components using the @Inject constructor annotation.

The core piece of code is CypherHandler. It parses the Cypher command and parameters out of the request, runs it, and renders the result depending on the requested content type.
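
The "render depending on the requested content type" step can be sketched independently of Ratpack and Neo4j. What follows is a minimal illustration and not the code in CypherHandler: the class ResultRenderer, its method names, and the representation of a result as a list of column-to-value maps are all assumptions made for this example, and only JSON and CSV are covered.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not taken from neo4j-ratpack.
class ResultRenderer {

    // Chooses the output format based on the requested content type.
    static String render(String contentType, List<String> columns, List<Map<String, Object>> rows) {
        switch (contentType) {
            case "application/json":
                return toJson(columns, rows);
            case "text/csv":
                return toCsv(columns, rows);
            default:
                throw new IllegalArgumentException("Unsupported content type: " + contentType);
        }
    }

    // Renders the rows as a JSON array containing one object per row.
    static String toJson(List<String> columns, List<Map<String, Object>> rows) {
        return rows.stream()
                .map(row -> columns.stream()
                        .map(c -> jsonValue(c) + ":" + jsonValue(row.get(c)))
                        .collect(Collectors.joining(",", "{", "}")))
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Numbers and booleans are written as-is, everything else as an escaped string.
    static String jsonValue(Object value) {
        if (value == null) return "null";
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : value.toString().toCharArray()) {
            if (ch == '"' || ch == '\\') sb.append('\\').append(ch);
            else if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
            else sb.append(ch);
        }
        return sb.append('"').toString();
    }

    // Renders a header line followed by one line per row.
    static String toCsv(List<String> columns, List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(columns.stream().map(ResultRenderer::csvCell).collect(Collectors.joining(","))).append("\n");
        for (Map<String, Object> row : rows) {
            sb.append(columns.stream().map(c -> csvCell(row.get(c))).collect(Collectors.joining(","))).append("\n");
        }
        return sb.toString();
    }

    // Quotes a cell if it contains a comma, a quote, or a line break.
    static String csvCell(Object value) {
        String s = value == null ? "" : value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("person.firstName", "Ann");
        row.put("studyAt.classYear", 2005);
        List<String> columns = java.util.Arrays.asList("person.firstName", "studyAt.classYear");
        List<Map<String, Object>> rows = Collections.singletonList(row);
        System.out.println(render("application/json", columns, rows));
        System.out.print(render("text/csv", columns, rows));
    }
}
```

Keeping the format decision in one place means the handler only has to map the request's content type to a string and hand over the result; adding HTML or MessagePack would be one more case in render.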

Terminate Queries

From a tech perspective, this was the most interesting part to write. Neo4j can be run with an optional guard. Since this feature is not part of the public API, it is not officially documented and might therefore change without further notice – be warned. To enable the guard feature, the config option execution_guard_enabled needs to be set to true. You can then get access to the guard by calling:

((GraphDatabaseAPI)graphDb).dependencyResolver.resolveDependency(Guard.class)

In neo4j-ratpack, the guard is exposed as a Guice component, so any ratpack handler can just inject it.

Each query is registered with a QueryRegistry. Part of that process is setting up a VetoGuard that throws an exception based on a boolean flag. In case of an exception, the query is aborted.
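
To make the flag-based abort more concrete, here is a minimal sketch of the idea in plain Java. It is not the QueryRegistry or VetoGuard from neo4j-ratpack and it does not touch Neo4j's guard API: QueryRegistrySketch, RunningQuery, QueryAbortedException, and all method names are hypothetical, and check() merely stands in for the point at which a guard would be consulted while a query runs.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch for illustration; names and structure are assumptions.
class QueryRegistrySketch {

    // Thrown from check() once an abort has been requested.
    static class QueryAbortedException extends RuntimeException {
        QueryAbortedException(long id) {
            super("Query " + id + " was aborted");
        }
    }

    // One running query together with its abort flag.
    static class RunningQuery {
        final long id;
        final String cypher;
        private final AtomicBoolean abortRequested = new AtomicBoolean(false);

        RunningQuery(long id, String cypher) {
            this.id = id;
            this.cypher = cypher;
        }

        // Stand-in for the guard check: called repeatedly during execution,
        // throws as soon as the flag has been set.
        void check() {
            if (abortRequested.get()) {
                throw new QueryAbortedException(id);
            }
        }
    }

    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, RunningQuery> running = new ConcurrentHashMap<>();

    // Registers a query before it starts executing.
    RunningQuery register(String cypher) {
        RunningQuery query = new RunningQuery(nextId.incrementAndGet(), cypher);
        running.put(query.id, query);
        return query;
    }

    // Removes a query once it has finished or has been aborted.
    void unregister(long id) {
        running.remove(id);
    }

    // Lists the currently running queries, e.g. for an overview page.
    Collection<RunningQuery> list() {
        return running.values();
    }

    // Sets the abort flag; returns false if no such query is registered.
    boolean abort(long id) {
        RunningQuery query = running.get(id);
        if (query == null) return false;
        query.abortRequested.set(true);
        return true;
    }

    public static void main(String[] args) {
        QueryRegistrySketch registry = new QueryRegistrySketch();
        RunningQuery query = registry.register("MATCH (n) RETURN n");
        registry.abort(query.id);
        try {
            query.check();
        } catch (QueryAbortedException e) {
            System.out.println(e.getMessage());
        } finally {
            registry.unregister(query.id);
        }
    }
}
```

The flag is an AtomicBoolean because the abort request typically arrives on a different thread than the one executing the query.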

Load Tests

The next step was running some load tests against a standard Neo4j server and neo4j-ratpack in order to compare the performance of the server components. All tests were run on my ThinkPad x230 (i7-3520M, 2.9GHz, 16 GB RAM, Ubuntu 13.04). For simplicity, load generation and the server itself were running on the same machine – which is far from perfect, but a starting point.

The intention of these load tests is not measuring Neo4j itself – it focuses on the server component only.

Using JMeter, I’ve run a Cypher query...

START person=node:person(firstName={firstName}) 
WITH person 
ORDER BY person.lastName LIMIT 10 
MATCH (uniCity)<-[:IS_LOCATED_IN]-(uni)<-[studyAt:STUDY_AT]-(person), 
    (company)<-[worksAt:WORKS_AT]-(person)-[:IS_LOCATED_IN]->(personCity), 
    (company)-[:IS_LOCATED_IN]->(companyCountry) 
RETURN person.firstName, person.lastName, person.birthday, person.creationDate, person.gender, person.browserUsed, person.locationIP, personCity.name, uni.name, studyAt.classYear, uniCity.name, company.name, worksAt.workFrom,companyCountry.name

...with different parameters against a graph DB consisting of 1.6M nodes, 7M relationships, and 7M properties. Kudos to my colleague Alex, who helped me set up the dataset based on the LDBC project he’s involved with.

The same graph.db was used by both Neo4j server and neo4j-ratpack. No specific JVM tuning parameters were set. I’ve run the load test with an increasing number of concurrent threads and focused on observing throughput and latency. The following diagrams were created using a Python matplotlib script originating from http://www.metaltoad.com/blog/plotting-your-load-test-jmeter. Please note that latency is displayed in green on a logarithmic axis. Throughput is in blue on a linear axis (ranges differ between the diagrams).

Diagram: throughput and latency of Neo4j server (neo4jserver_jdk7)

Diagram: throughput and latency of neo4j-ratpack (ratpack_jdk7)

We’re observing an increasing rate of errors when going beyond 25k threads. Since the load generator is colocated with the system under test, this seems to be the point where JMeter’s own memory and CPU consumption influences the system under test too much – so we’ll disregard the range above 25k.

The most interesting finding is that with Ratpack, the latency remains nearly constant in the range of [2.5k - 10k] threads, whereas the standard Neo4j server shows increasing latency.

At 2.5k threads, Ratpack shows a fully saturated CPU, which is why throughput decreases. With more or faster CPUs we could improve both latency and throughput. The explanation for the observed difference can be found in the different threading models. Neo4j server uses Jetty internally, which does blocking I/O, while Ratpack uses Netty. To verify this, I’ve taken thread dumps with YourKit:

Threading telemetry of Neo4j server

Threading telemetry of neo4j-ratpack

It’s interesting to see that Neo4j server uses 10 worker threads per core (40 in total on my laptop). Most of the time, most of them are in blocked state, indicated by the red color. Ratpack, on the other hand, has 8 worker threads that are mostly in ‘green’, i.e. runnable, state. So Ratpack indeed uses non-blocking I/O.

Conclusion

For Cypher-only use cases with high concurrency requirements, using Ratpack instead of Neo4j server might be an interesting alternative. However, be aware that Ratpack is bleeding edge: the current version is 0.9-SNAPSHOT.




Published at DZone with permission of Stefan Armbruster, author and DZone MVB. (source)


Comments

Robert Greathouse replied on Mon, 2013/09/30 - 1:29pm

I tried using Ratpack in a personal project. You are correct that it is bleeding edge. Virtually every update broke backwards compatibility. 

However, with all that said, it is a nice framework. I am very excited for the churn to slow so that I can confidently use it in my applications.

Stefan Armbruster replied on Mon, 2013/09/30 - 1:42pm in response to: Robert Greathouse

A version number of 0.9-SNAPSHOT sets the expectation of shooting at a moving target. +1 on your comment.

Ashwin Jayaprakash replied on Tue, 2013/10/01 - 9:53pm

Another framework based on Netty? I wonder how this compares with Vert.x.
