
Rob is currently the director of architecture at Amkor Technology, a global leader in providing assembly and test services to semiconductor companies and electronics OEMs. Rob is also the author of the O'Reilly books Programming ColdFusion MX and Programming ColdFusion and has written dozens of technology articles over the years. He's a frequent speaker at ColdFusion user groups and conferences.

Building High Performance Applications with ColdFusion 9 and Ehcache 2.4

04.25.2011
Take a look at any high-performance website such as Facebook or Twitter, and regardless of the operating system, application server, programming language or database it’s built on, there’s one thing you can be fairly certain of: the extensive use of caching.  No matter how fast your sharded and clustered MySQL database is or how sweet your NoSQL backend, caching can provide a performance boost, often of one or more orders of magnitude over what you could achieve without it.

Implementing caching in Adobe® ColdFusion® 9 is straightforward.  When Adobe released ColdFusion 9.0, they included a stand-alone version of Terracotta’s Ehcache (version 2.0 to be specific).  If you’re not familiar with Ehcache, it’s widely considered the de facto caching solution for Java applications.  It comes in both open source and commercial versions with varying levels of capability and support.

In-Process vs. Out-of-Process Cache

ColdFusion 9 ships with the core open source version of Ehcache embedded.  By default the cache is configured as a single node and runs in the same JVM process as your ColdFusion server.  This is known as an in-process or L1 cache.  Because both ColdFusion and Ehcache share the same JVM container, they also share the same memory allocation.  In many instances, this is fine.   However, there’s a limit to how much scalability you can expect from an in-process cache. 

It’s also possible to configure Ehcache to run in distributed mode.  In distributed mode, an L2 cache called the Terracotta Server is added that runs out-of-process in its own JVM. Typically the L2 cache is installed on a separate server from your application and L1 cache.  There are two flavors of the Terracotta Server: open source and commercial.  The open source version of Terracotta is limited to one active and one passive node.  The commercial version, known as the Terracotta Server Array, lets you scale your cache out by adding additional nodes to your out-of-process L2 cache.  The result is a single virtual cache made up of multiple physical nodes.  Terracotta server arrays are easily capable of scaling to multiple terabytes in size. 

When configured in distributed mode, Ehcache employs a tiered caching architecture where the L2 cache contains the full set of cached data while the L1 cache that’s local to your application has a tunable subset of the most frequently accessed cache items.  Cache access patterns tend to follow the Pareto principle: 80% of the requests to your cache can be served by 20% of the items in the cache.    Given this, Ehcache’s tiered architecture would suggest having your L1 cache sized such that it contains about 20% of the data in your L2 cache.  Ehcache automatically determines which data should be stored in L1 by employing a Least Recently Used (LRU) algorithm.
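To make the LRU idea concrete, here is a minimal sketch of least-recently-used eviction in CFML. It is an illustration only and not how Ehcache implements its store: it borrows Java's java.util.LinkedHashMap in access-order mode, and the variable names (lru, maxItems) and the three-item limit are assumptions made up for the example.

```cfml
<!--- Hypothetical illustration only: a tiny LRU map, not Ehcache's implementation --->
<cfset maxItems = 3>

<!--- accessOrder=true makes iteration run from least to most recently used --->
<cfset lru = createObject("java", "java.util.LinkedHashMap").init(
      javaCast("int", 16), javaCast("float", 0.75), javaCast("boolean", true))>

<cfloop list="a,b,c" index="key">
      <cfset lru.put(key, "value for #key#")>
</cfloop>

<!--- reading "a" marks it as the most recently used entry --->
<cfset lru.get("a")>

<!--- adding a fourth item pushes the map over its limit... --->
<cfset lru.put("d", "value for d")>
<cfif lru.size() gt maxItems>
      <!--- ...so evict the eldest entry, which is now "b" --->
      <cfset lru.remove(lru.keySet().iterator().next())>
</cfif>

<!--- the remaining keys, least recently used first: c, a, d --->
<cfoutput>#lru.keySet().toString()#</cfoutput>
```

Because "a" was read after it was inserted, "b" becomes the least recently used item and is the one dropped when the map grows past its limit.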

This is one of the big differentiators between Ehcache with Terracotta and other purely distributed caching systems such as memcached or the so-called NoSQL databases.  In a purely distributed cache, every retrieval crosses the network, so network latency pushes overall retrieval time above 1 millisecond.  In Ehcache’s case, however, items from the L1 cache are retrieved in less than 1 microsecond while items from the L2 cache are retrieved in less than 2 milliseconds. That’s an overall performance difference of an order of magnitude between Ehcache and purely distributed caching systems[1].

Using the Pareto distribution, that means that 80% of the requests to your cache will come from the L1 cache and be returned in under 1 microsecond. The other 20% of requests will go to the out-of-process L2 cache and still be returned in less than 2 milliseconds.
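As a rough worked example, the figures above can be combined into a weighted average retrieval time. The sketch below treats the quoted latencies as upper bounds and assumes the 80/20 split holds exactly; the variable names are made up for illustration. With those numbers, the average comes out at roughly 0.4 milliseconds, almost all of it contributed by the 20% of requests that reach L2.

```cfml
<!--- Figures taken from the text above, treated as upper bounds (in milliseconds) --->
<cfset l1HitRatio  = 0.8>
<cfset l1LatencyMs = 0.001>  <!--- 1 microsecond --->
<cfset l2LatencyMs = 2>

<!--- weighted average: requests served by L1 plus the remainder served by L2 --->
<cfset expectedMs = (l1HitRatio * l1LatencyMs) + ((1 - l1HitRatio) * l2LatencyMs)>

<cfoutput>Expected retrieval time: about #numberFormat(expectedMs, "0.0000")# ms</cfoutput>
```

Changing l1HitRatio shows how sensitive the average is to L1 sizing: the more requests the in-process cache can absorb, the closer the average gets to L1 speed.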

In scenarios where you require redundancy for load balancing or high availability, Ehcache supports replicated caching across all versions of both their in-process and out-of-process cache configurations.  This lets you cluster your caches and ensures that changes on one cache node are automatically replicated to all other nodes in the cluster.  The replication mechanism is configurable to allow for popular options such as RMI, JMS, JGroups, and more.

Now that you have a better idea of how Ehcache is architected, let’s take a look at upgrading the version of Ehcache that ships with ColdFusion 9 as well as installing and configuring a Terracotta server so we can build out a high performance cache.

Upgrading ColdFusion 9 to Ehcache 2.4 and Installing the Terracotta Server

Before we go further, you’ll need to upgrade ColdFusion’s version of Ehcache from 2.0 (the version that ships with ColdFusion 9.0.1) to the latest version, currently 2.4.1.  While this isn’t a complicated task per se, it does involve downloading multiple sets of files from different projects, and it also comes with a caveat.  Although Terracotta (the maker of Ehcache) does its best to maintain backward compatibility, there is no guarantee that replacing the version of Ehcache that ships with ColdFusion with the latest and greatest version is going to work.  There shouldn’t be any problems, but if there are, Adobe will not provide support as they only certify the version of Ehcache that they ship with ColdFusion.  That said, I’ve upgraded my version of Ehcache several times now and have encountered no issues along the way.

Since we’re going to be replacing Ehcache related files that ship with ColdFusion 9 with updated versions, the first thing we need to do is make backups of the original files.   Rather than delete the files or move them somewhere else, I like to rename them in place in case I need to revert back to them later. First, stop your ColdFusion server service.  Next, locate your /lib directory. The location will vary depending on whether you’re running the stand-alone or multi-instance versions of ColdFusion.  For standalone, the directory is:

CFHOME/lib/

For multi-instance, it’s:

/JRun4/servers/instance/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib

Here you’ll find the four files we’re interested in:

·       ehcache.jar

·       ehcache-web.jar

·       slf4j-api-1.5.6.jar

·       slf4j-log4j12-1.5.6.jar

 

As I mentioned, I generally don’t like to delete these files just in case. What I do is simply rename them with a “.original” extension so that ehcache.jar becomes ehcache.jar.original.  That way, there’s no confusion over what the original filename or extension was.

For the next set of steps, you’ll need the latest build of the Terracotta Kit.  The Terracotta Kit contains Terracotta 3.5.0, Ehcache 2.4.1 and Quartz Scheduler 2.0.0.  We’ll be focusing on Terracotta and Ehcache in this article. You can find the Terracotta Kit on the Terracotta website:

http://www.terracotta.org/dl/oss-download-catalog

There are multiple versions of the kit available. For our purposes, download terracotta-3.5.0.tar.gz.  Once you have the file on your local system, unzip and untar it to reveal the kit’s directory structure.

Navigate to the /lib directory and extract the following files into your ColdFusion /lib directory:

·       ehcache-core-2.4.1.jar

·       ehcache-terracotta-2.4.1.jar

Next, you’ll need to extract terracotta-toolkit-1.2-runtime-3.0.0.jar from /common into your ColdFusion /lib directory.

Extract the entire /terracotta-3.5.0 directory from the archive and place it wherever you usually install programs.  In my case, on a 64-bit Windows 7 machine, I put the files in c:/program files/terracotta/terracotta-3.5.0.  This way, if I want to work with multiple versions of the Terracotta server, I have a single location where I can keep them all organized.  In a production environment you typically wouldn’t run the Terracotta server on the same physical server as your application server.  Part of the draw of distributed caching is that you can distribute your cache to other physical resources on your network as necessary to achieve horizontal scale.  For the purposes of development, however, running the Terracotta server on the same machine as your application server will do fine.

You’ll also want to upgrade the ehcache-web.jar file in your ColdFusion /lib directory. This file needs to be downloaded separately and can be found in the ehcache-web-2.0.3-distribution.tar.gz package on the Terracotta website: http://www.terracotta.org/dl/ehcache-oss-download-catalog.  The file you want to extract is ehcache-web-2.0.3.jar.

Finally, you’ll need to download updates to SLF4J, a Java logging facade that’s used by Ehcache and the Terracotta server. You can download the necessary archive from the SLF4J website.  Note that version 1.6.1 is what you want for Terracotta 3.5.0/Ehcache 2.4.1:

http://www.slf4j.org/download.html

Once you have the SLF4J archive, extract the following files and place them in your ColdFusion /lib directory:

·       slf4j-api-1.6.1.jar

·       slf4j-jdk14-1.6.1.jar

·       slf4j-log4j12-1.6.1.jar

That’s it, the upgrade is complete. Go ahead and restart your ColdFusion server.

Caching Basics

ColdFusion implements two types of caching with Ehcache: template and object. Template caching lets you cache page fragments or entire web pages. The template cache is fairly well black boxed. ColdFusion manages all of the cache keys automatically and also handles all of the storage and retrieval of cached items.  Here’s a quick example of how easy it is to use the template cache to store a page fragment:

<cfoutput>
I'm dynamic data #now()# <br/>
</cfoutput>

<!--- cache this fragment for 5 seconds regardless of how many times it's accessed --->
<cfcache timespan="#createTimeSpan(0,0,0,5)#">
      <cfoutput>
      I'm cached dynamic data: #now()# <br/>
      </cfoutput>
</cfcache>

<!--- cache this item, then flush it if it's not accessed for 5 seconds --->
<cfcache idletime="#createTimeSpan(0,0,0,5)#">
      <cfoutput>
      I'm cached dynamic data too: #now()# <br/>
      </cfoutput>
</cfcache>

If you run this code, you’ll see that on the initial load all three timestamps are identical.  Now start hitting your browser’s reload button.  As you would expect, the timestamp for the uncached code will update with each load.  The other two timestamps, however, won’t initially change as the values will be returned from the cache.  Keep clicking reload.  After 5 seconds you should see the value of the second timestamp change as well.

This is because we set a timespan for the fragment to 5 seconds, meaning that ColdFusion should use the cached value for 5 seconds before updating with the live data.  If you stop reloading the page and wait another 5 seconds or so before hitting reload again, you should see that all of the timestamps including the third one should have changed.  This is because we set idletime for the third timestamp to 5 seconds.  Idletime lets you specify how long to keep an item in cache if no one is requesting it before it should be evicted.  In this case, we set idletime to 5 seconds so that not accessing the value from the cache for more than 5 seconds causes it to be evicted and the live value to be displayed and subsequently cached on the next request.

If you want to cache an entire page instead of individual fragments, the code is similar. However, instead of wrapping the entire page in <cfcache>… </cfcache> tags, you only need to place a single <cfcache> tag at the top of the page:

<cfcache timespan="#createTimeSpan(0,0,0,5)#">

<cfoutput>
Currently #timeFormat(now(),'hh:mm:ss')# <br />
</cfoutput>

<cfoutput>
Random number: #rand()#
</cfoutput>

Running this code caches the entire page for 5 seconds.  As you can see, there are two different sections of dynamic values that are being output and cached.

If you want more control over caching in ColdFusion, consider using the object cache.  The object cache gives you granular control over putting items in and getting items out of the cache.  It also lets you store more than just pages and fragments.  Using the object cache, you can put any type of data that ColdFusion supports into the cache, including complex data types such as structures, arrays, queries and CFCs. Here’s a simple example that caches a query object from the sample database that installs with ColdFusion for 2 minutes:

<!--- Go to the cache. If the data isn’t there, go to the db then
      repopulate the cache --->
<cfset getArtists = cacheGet("artistQuery")>

<cfif isNull(getArtists)>
      <cfquery name="getArtists" datasource="cfartgallery">
            SELECT *
            FROM artists
      </cfquery>

      <cfset cachePut("artistQuery", getArtists, createTimeSpan(0,0,2,0))>
</cfif>

<h3>Query:</h3>
<cfdump var="#getArtists#">

In this code, the first thing we do is attempt to get the query from the cache using the cacheGet() function.  The cacheGet() function takes two arguments, the cached value’s key and an optional cache name.  Here, we’re just passing in “artistQuery” as the key.  The next bit of code checks to see if cacheGet() returned null.  If it did, we know that the query doesn’t exist in the cache, so we immediately go back to the source system (in this case the cfartgallery database) and re-run the query.  After the results come back, we put them back in the cache using the cachePut() function, setting a timeout for the cached query of 2 minutes.  Finally, we dump the query to the browser.
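If you use this get-or-load pattern in more than one place, it can be convenient to wrap it in a function. The following is a minimal sketch rather than a definitive implementation: the function name getArtistsCached is hypothetical, and the final cacheRemove() call simply illustrates how you might evict the cached query after the underlying table changes.

```cfml
<!--- Hypothetical helper: wraps the get-or-load pattern shown above --->
<cffunction name="getArtistsCached" returntype="query" output="false">
      <cfset local.artists = cacheGet("artistQuery")>

      <cfif isNull(local.artists)>
            <!--- cache miss: go back to the database and repopulate the cache --->
            <cfquery name="local.artists" datasource="cfartgallery">
                  SELECT *
                  FROM artists
            </cfquery>
            <cfset cachePut("artistQuery", local.artists, createTimeSpan(0,0,2,0))>
      </cfif>

      <cfreturn local.artists>
</cffunction>

<cfset getArtists = getArtistsCached()>
<cfdump var="#getArtists#">

<!--- after changing the artists table, evict the stale copy so the next call reloads it --->
<cfset cacheRemove("artistQuery")>
```

Keeping the key, the query and the timeout in one place means callers never need to know whether the data came from the cache or the database.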

Scaling out with the Terracotta Server

So far, all of the examples we’ve walked through used the L1 in-process cache that’s part of Ehcache core.  Let’s up the ante a bit by adding an L2 out-of-process cache.

You’ll need to open the XML file used by Ehcache for configuration.  This file is called ehcache.xml and it’s located in the same place as the JAR files we renamed earlier: CFHOME/lib/ for stand-alone ColdFusion or /JRun4/servers/instance/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib for multi-instance.  If you scroll to the bottom of the file, you’ll see XML that looks like this:

<defaultCache
      maxElementsInMemory="10000"
      eternal="false"
      timeToIdleSeconds="86400"
      timeToLiveSeconds="86400"
      overflowToDisk="false"
      diskSpoolBufferSizeMB="30"
      maxElementsOnDisk="10000000"
      diskPersistent="false"
      diskExpiryThreadIntervalSeconds="3600"
      memoryStoreEvictionPolicy="LRU"
      clearOnFlush="true"
>
</defaultCache>

This is the configuration ColdFusion and Ehcache use for every cache that gets created unless you specify otherwise.  It’s possible to create your own custom caches either programmatically at runtime, or by hard-coding them in the ehcache.xml file.  For the purposes of setting up our Terracotta server, let’s go ahead and add a custom cache to the ehcache.xml file.  Go ahead and stop your ColdFusion server.  Place the following code above the default configuration and save the file.

<cache
      name="terracottaTest"
      maxElementsInMemory="10000"
      eternal="false"
      timeToIdleSeconds="86400"
      timeToLiveSeconds="86400"
      overflowToDisk="false"
      diskSpoolBufferSizeMB="30"
      maxElementsOnDisk="10000000"
      diskPersistent="false"
      diskExpiryThreadIntervalSeconds="3600"
      memoryStoreEvictionPolicy="LRU"
      clearOnFlush="true"
>
</cache>

What you’ve just done here is create a custom cache called terracottaTest. Right now, the cache is still configured to run in-process.  Go ahead and modify the code like this:

<terracottaConfig url="localhost:9510" />

<cache
      name="terracottaTest"
      maxElementsInMemory="10000"
      eternal="false"
      timeToIdleSeconds="86400"
      timeToLiveSeconds="86400"
      overflowToDisk="false"
      diskSpoolBufferSizeMB="30"
      maxElementsOnDisk="10000000"
      diskPersistent="false"
      diskExpiryThreadIntervalSeconds="3600"
      memoryStoreEvictionPolicy="LRU"
      clearOnFlush="true"
>
      <terracotta clustered="true" />
</cache>

Your custom cache is now configured for distributed caching with an L2 out-of-process Terracotta server, and all it took was two additional lines of configuration! Before you start your ColdFusion server back up, we need to bring up the Terracotta server you installed earlier.  Open a command prompt and navigate to the Terracotta server’s /bin directory.  On my Windows 7 machine, that’s in:

C:/program files/terracotta/terracotta-3.5.0/bin

Go ahead and run start-tc-server.bat if you’re on Windows, or start-tc-server.sh if you’re on Linux/Unix.  If your server starts successfully, the console output should end with a “ready for work” message.

Once the Terracotta server is up and running and you see the “ready for work” message, go ahead and restart your ColdFusion server.  It’s important to note that if you have ColdFusion configured for distributed caching and you don’t have the Terracotta server running, it will hang your ColdFusion server.  Hopefully this is something Adobe will address in a future ColdFusion update.

Ehcache comes with a nice developer console you can use as you develop to gain insight into what’s happening with both your L1 and L2 caches.  To run the console, navigate to the same directory where you launched the Terracotta server and run dev-console.bat on Windows or dev-console.sh on UNIX/Linux.  You’ll be presented with a login screen.

Click Connect and you should see both your L1 local client node as well as the L2 Terracotta server you configured (Server Array).  If you expand the Terracotta Cluster tree item, you’ll see a screen that should show that your terracottaTest cache is configured in distributed mode.

Let’s run some sample code so you can watch as data is put in and then retrieved from the cache:

<!--- fill the cache with 10,000 items; each has a 5-minute timespan and a 5-minute idle time --->
<cfloop from="1" to="10000" index="i">
      <cfset cachePut(i, 'Item #i#', createTimeSpan(0,0,5,0), createTimeSpan(0,0,5,0), 'terracottaTest')>
</cfloop>

<!--- get a random item from the cache 10 million times --->
<cfloop from="1" to="10000000" index="j">
      <cfset x = cacheGet(randRange(1,10000), 'terracottaTest')>
</cfloop>

This will put 10,000 items in the cache, then start randomly pulling items out 10 million times.  This should allow enough time for you to see what’s happening in the dev console.
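You can also check on the cache from ColdFusion itself. The sketch below assumes ColdFusion 9.0.1, where cacheGetAllIds() and cacheGetMetadata() accept an optional cache name; if your version differs, treat the extra arguments as an assumption to verify against the documentation. The variable name ids is made up for the example.

```cfml
<!--- how many keys does the custom cache currently hold? --->
<cfset ids = cacheGetAllIds("terracottaTest")>
<cfoutput>Items in terracottaTest: #arrayLen(ids)#<br/></cfoutput>

<!--- hit/miss counters and timestamps for the item stored under key 1 --->
<cfdump var="#cacheGetMetadata(1, 'object', 'terracottaTest')#">
```

Run it in a separate request while the read loop is still going and the counters should line up with what the dev console reports.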

Conclusion

This article really only begins to scratch the surface of what you can accomplish with ColdFusion and Ehcache.  There are several other features in Ehcache 2.4 that are beyond the scope of this article but worth exploring:

·       2nd level cache provider for ColdFusion’s Hibernate ORM implementation

·       Cache search

·       Distributed transactions

·       NonStopCache

·       Integration with Terracotta Big Memory

For more information on Ehcache and Terracotta’s other products, see their website at http://www.terracotta.org.

About the Author

Rob Brooks-Bilson is a consultant, author, and the director of architecture at Amkor Technology, a global leader in providing assembly and test services to semiconductor companies and electronics OEMs.  His responsibilities at Amkor include development of strategic technology direction, planning of effective resource utilization, coordinating and directing technical development teams, and more.   He’s a frequent speaker at industry conferences as well as local user groups. Brooks-Bilson is also the author of two O'Reilly books, "Programming ColdFusion" and "Programming ColdFusion MX."

Outside of work, Rob's a technophile, blogger, photographer, bed jumper, world traveler, hiker, mountain biker, and Adobe Community Professional for ColdFusion.


[1] See Greg Luck’s discussion of the topic on his blog

Published at DZone with permission of its author, Rob Brooks-Bilson.


Comments

Shoaib Almas replied on Sat, 2012/08/25 - 5:46am

About the following paragraph:
"-In conjunction with an in-house or third-party analytics engine, very fast lookup of analytics results.
e.g. A credit card company needs to score real-time credit card transactions. There are hundreds of millions per day. Results of an in-house fraud model with transactions up to the close of business the previous day are loaded into the cache. The cache is further adjusted during the current business day for actual usage and can return fraud scores on billions of credit card numbers in a fraction of a second."

Can you please provide further references on this topic (articles, papers, presentations)? I am working on a similar problem and I am very interested in the details of this solution, especially the performance aspects (time and size).
