Manik Surtani is a core R&D engineer at JBoss and project lead on JBoss Cache. He has a background in artificial intelligence and neural networks, a field he left behind when he moved from academic circles to the commercial world. Since then, he's been working with Java-related technologies, first for a startup, focusing on knowledge management and information exchange. He later worked for a large London-based consultancy as a tech lead focused on e-commerce applications on large J2EE and peer-to-peer technology. Manik is a strong proponent of open source development methodologies, ethos, and collaborative processes, and often speaks at Java User Groups around the world. Manik is a DZone MVB and is not an employee of DZone and has posted 38 posts at DZone. You can read more from them at their website. View Full User Profile

A Look Inside JBoss Cache

11.04.2008
| 24974 views |
  • submit to reddit

So I was asked a number of times as to why I didn’t put in any shameless plugs for JBoss Cache - the project I lead - when I wrote my last article at DZone on distributed caching and parallelism, and here is why.

This follow-up article focuses entirely on the brand-spanking-new JBoss Cache 3.0.0 - code-named Naga, and goes in depth to discuss how our high-performance concurrency model works.

Life before Naga

Before I start talking about Naga specifically, I’d like to delve into a brief history of JBoss Cache. Around 2002, the JBoss Application Server (JBoss AS) needed a clustered cache solution (see my previous article for definition of terms) for its HTTP and EJB session state replication, to maintain high availability in a cluster. JGroups - an open source group communication suite - shipped with a demo replicated hash map. Bela Ban, the founder and current maintainer of JGroups, expanded it to accommodate a tree structure - where data is organized into nodes into a tree data structure - and other cache-relevant features such as eviction and JTA transactions. Around early 2003, this was moved into JBoss AS’s CVS repository and became a part of the JBoss AS code base.

Around March 2005, JBoss Cache was extracted from the JBoss AS’s repository and became it’s own standalone project. The rest, as they say, became history. Features such as cache loading and various cache loader implementations, eviction policies, and buddy replication were gradually added. TCP-based delegating cache servers allowed you to build tiers of caches. Custom marshalling frameworks provided a highly performant alternative to Java serialization when replicating state. Along the way, the cache had through one more major release - 2.0 - which involved a major API change and baselining on Java 5. Two other editions - a POJO edition and Searchable edition - have evolved as well, building on the core cache to provide specific features.

Handing over the torch

Now, it is time for the 2.x series to hand over the torch to 3.0.0. Naga, as it is called internally, is well deserving of a major version change. In addition to evolutionary changes and improvements in the code base - better resource management, marshalling, overall performance enhancements, and a brand new and much simplified configuration file format - it also contains at least one revolutionary change:

MVCC has landed

Multi-versioned concurrency control - MVCC - was adopted as the default concurrency scheme in the cache.

When run in local mode, the most costly part of the cache in terms of memory and CPU cycles is the locking involved in maintaining integrity of shared data. In a clustered mode, this locking is the second most expensive thing, after RPC calls made by the cache instance to remote instances.

Legacy locking schemes

In JBoss Cache 1.x and 2.x, we offered two different locking schemes - an optimistic one and a pessimistic one. Each had their pros and cons, but in the end they were both still costly in terms of performance.

The pessimistic scheme used a lock per tree node. Reader threads would obtain non-exclusive read locks and writer threads would obtain exclusive write locks on these nodes before performing any operations. The locks we used were a custom extension of JDK ReentrantReadWriteLocks, modified to support lock upgrading where within the scope of a transaction, a thread may start off reading a node and then later attempt to write to it.

Overall, this scheme was simple and robust, but didn’t perform too well due to the memory overhead of maintaining a lock per node. More importantly, there was the reduced concurrency since the existence of read locks prevented write locks from being obtained. The effect of readers blocking writers also introduced the possibility of deadlocks. Take, for example, transaction A which performs a read on node /X and a write on node /Y before committing. Transaction B, which performs a read on node /Y and a write on node /X before committing, starts at the same time. With some unfortunate timing, we could end up with a situation where transaction A has a read lock on /X and and is waiting for the write lock on /Y. Transaction B has a read lock on /Y and is waiting on a write lock on /X. Both transactions would deadlock, until one of them times out and rolls back. And typically, once one transaction has timed out, chances are so would the other since they would both have been waiting for almost the same amount of time.

To overcome the deadlock potential, we offered an optimistic locking scheme. Optimistic locking used data versioning on each node. It copied any nodes accessed into a transaction workspace, and allowed transactions to work off the copy. Nodes copied for reading provided repeatable read semantics while nodes copied for writing allowed writer threads to proceed regardless of simultaneous readers. Modified nodes were then merged back to the main tree at transaction commit time, subject to version checking to ensure no concurrent writes took place.

Optimistic locking offered a much higher degree of concurrency with concurrent readers and writers, and removed the risk of deadlock. But it had two main drawbacks. One is performance, since the constant copying of state for each concurrent thread increased memory footprint significantly, and was also costly in terms of CPU cycles. The other is that concurrent writers could exist, but one would inevitably fail at commit time when a data version check fails. So this meant that writer transactions could happily go ahead and do a lot of costly processing and writing, but only fail at the very end when attempting to commit.

So how does MVCC help?

MVCC offers non-blocking readers where readers do not block writer threads, providing a high degree of concurrency as well as removing the risk of deadlock. It also is fail-fast, in that writers work sequentially and don’t overlap, and if they do time out in acquiring a write lock, it happens very early on in the transaction, when a write occurs rather than when a transaction commits. Finally, MVCC is also memory efficient in that it only maintains 1 copy of state for all readers, and 1 version being modified for the single, sequential writer. Even better, our implementation of MVCC uses no locks at all for reader threads (very significant for a read-heavy system like a cache), and a custom exclusive lock implementation for writers. This custom lock is completely free of synchronized blocks and uses modern techniques like compare-and-swap and memory fencing using volatile variables to achieve synchronization. All this leads to a very highly performant and scalable locking scheme.

Let’s look at some details here.

The extremely high performance of JBoss Cache's MVCC implementation for reading threads is achieved by not requiring any synchronization or locking for readers. For each reader thread, the cache wraps state in a lightweight container object, which is placed in a container local to the thread (a ThreadLocal) or an ongoing transaction. All subsequent operations made on the cache with regards to this state happens via this container object. This use of Java references allows for repeatable read semantics even if the actual state changes concurrently.

Writer threads, on the other hand, need to acquire a lock before any writing can commence. We now use lock striping to improve the memory performance of the cache, and the size of the shared lock pool can be tuned using the concurrencyLevel attribute of the locking element. (See the JBoss Cache configuration reference for details ). After acquiring an exclusive lock on a node, the writer thread then wraps the state to be modified in a container as well, just like with reader threads, and then copies this state for writing. When copying, a reference to the original version is still maintained in the container for rollbacks. Changes are then made to the copy and the copy is finally written to the data structure when the write completes.

This way, subsequent readers see the new version while existing readers still hold a reference to the original version in their context, achieving repeatable read semantics.

If a writer is unable to acquire the write lock after some time, a TimeoutException is thrown.

Although MVCC forces writers to obtain a write lock, a phenomenon known as write skews may occur when using repeatable read as your isolation level. This happens when concurrent transactions perform a read and then a write, based on the value that was read. Since reads involve holding on to the reference to the state in the transaction context, a subsequent write would work off that original state read, which may now be stale.

The default behavior with dealing with a write skew is to throw a DataVersioningException, when it is detected when copying state for writing. However, in most applications, a write skew may not be an issue (for example, if the state written has no relationship to the state originally read) and should be allowed. If your application does not care about write skews, you can allow them to happen by setting the writeSkewCheck configuration attribute to false. See the JBoss Cache configuration reference for details.

Note that write skews cannot happen when using READ_COMMITTED since threads always work off committed state. Write skews are also witnessed in optimistic locking, manifested as a DataVersioningException, except that this can happen with any isolation level when using optimistic locking.

Is there a tutorial on this stuff?

Of course. All you need to do is download the jbosscache-core-all.zip distribution of JBoss Cache. A tutorial is bundled in the distribution, complete with a GUI to visualize what’s going on in your cache as you do stuff. Alternately, there is also a GUI demo to demonstrate cache capabilities.

Nice - so where do I get it?

So to sum things up, Naga is the latest and greatest of what JBoss Cache has to offer, with significantly faster access for both readers and writers, much better stability and predictability in performance, faster replication, lower memory footprint, makes you coffee, and walks your dog. Download Naga here [6]. A users’ guide, FAQ and tutorial is also available here.

 

 

Published at DZone with permission of Manik Surtani, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

David Sundstrom replied on Tue, 2008/12/02 - 7:36pm

Manik,

Excellent post, very clearly written.

I'ved used JBoss Cache 2.x and am looking forward to 3.x, but have been waiting for PojoCache to be ported over. (update: I see that is complete, great!)

I'll dig around on my own, but I'd love to learn more about lock striping, memory fencing, and compare and swap techniques that you've used.

One question I have is how 3.x manages distributed locking. Some discussion of the protocols and algorithms involved would help me make informed decisions on data partitioning and cache topologies.

Thanks!

-David

 

Manik Surtani replied on Tue, 2009/01/06 - 4:04am

Hi David

We do have some design documentation on the JBoss Cache wiki (including the MVCC locking designs), and while some components are out of date or not as well documented, you should still have a look at the wiki.  We do plan to bring it up to date soon, though, but at the end of the day the source code (along with the comments there) will always be your best documentation.

Cheers

Manik

 

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.