Nick Maiorano is an independent consultant with 20 years of experience as a software developer and architect. He specializes in building high-performance server-side applications in Java as well as providing technical leadership. He has worked in the telecom, financial and aerospace industries and also co-founded a web-based start-up. Nick is a DZone MVB, is not an employee of DZone, and has posted 8 posts at DZone.

Boxers or Briefs, Stateful or Stateless

08.11.2008

In this article, I'll compare two opposing design paradigms for session data management in a session-based online application. Which is better, stateful or stateless?

Epic clashes - in changing rooms near you!

Human history is marked with epic clashes of ideologies: Darwinism vs. Creationism, Capitalism vs. Socialism and Boxers vs. Briefs. Some of these ideas have played out over centuries on the global stage. Sometimes a clear winner emerges but most times not.

Our own software engineering world is not immune to ideological clashes. One such example is the design of server-side applications: should they be designed as stateful or stateless? It pits those who believe in the performance of statefulness against those who believe in the simplicity of statelessness. It is divisive because the choice is not always clear. When an application is being designed, software engineers may have limited requirements in hand, a short-term vision, or too little experience to make the right decision. In addition, the technological landscape is always shifting. As new technologies emerge, assumptions are invalidated and rules rewritten.

Demanding online applications

Online bookstores, instant messaging services and online travel agencies are examples of session-based online applications. The more popular brands have very demanding requirements. Their throughput requirements are measured in the thousands of transactions per second, their response time latency clocked in milliseconds, and their availability north of “five-nines”. (That's 99.999% uptime, or a little over five minutes of downtime a year, which is less than one second a day.) Being session-based means that a user establishes a session, perhaps via a sign-in, and that a conversation exists between client and server. Consequently, each request is implicitly bound to a session context typically stored on the server side. Deciding how the application manages this data is the focus of this article.
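
As a quick check on availability targets, the downtime budget is simply (1 - availability) multiplied by the length of the period. Here is a minimal sketch of that arithmetic; the class and method names are my own, chosen only for illustration.

```java
/** Back-of-the-envelope downtime budget for an availability target. */
final class Availability {
    static final double SECONDS_PER_DAY = 24 * 60 * 60;           // 86,400
    static final double SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY; // non-leap year

    /** Seconds of downtime allowed in a period for a given availability fraction (e.g. 0.99999). */
    static double downtimeSeconds(double availability, double periodSeconds) {
        return (1.0 - availability) * periodSeconds;
    }
}
```

For five-nines, `Availability.downtimeSeconds(0.99999, Availability.SECONDS_PER_DAY)` comes to roughly 0.86 seconds a day, and the yearly figure to roughly 315 seconds.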

Managing session data

In some ways, session-based applications face tougher engineering challenges than their sessionless counterparts. This is because session data must be managed. The larger and more dynamic the data, the tougher the challenge. Sessions can include data describing the end-user (user name, address), dialog data (current request, time of request, HTTP headers, browser capabilities), and the shopping cart data representing merchandise being purchased. Since high-traffic sites require the application to run on a cluster of back-end servers, the data must be accessible regardless of which server processes the request. There are essentially two ways to solve this problem: a) bring the data to the thread or b) bring the thread to the data.

The first option is really all about externalizing the data and making it accessible from any server. Externalization renders the application stateless because servers neither own nor encapsulate the session data. The application is still session-based, yet the components are without end-user state.

The second option keeps session data internal and owned by a given server. It internalizes the data to that server. This has the effect of making the application stateful because components within the application encapsulate behaviour and data.

If we need to design a session-based online application, with high throughput requirements and a large session data footprint, which paradigm serves us best? The ultimate choice will have a profound effect on the design, performance, scalability, operability and reliability of the system. Let’s explore the effects of stateness against these five pillars.

Statelessly simple

At first glance, designing applications statelessly is simpler. A cluster of homogeneous servers hosting the application can be deployed, and requests can be load-balanced in round-robin fashion to any server in the cluster. This is standard fare for HTTP-based load balancers. Once a request lands on a server, the application can access the session data remotely. In the simplest case, this data is read from and written to a shared database. After each request, the modified data is written back to the database.

An accidental by-product is that the session is always recoverable should a server crash. As an end-user, it’s nice to know that my flight and hotel are still booked if a server crashes before I have paid for my exotic vacation. In fact, a load balancer can seamlessly redirect my request to another server and I may not even notice the glitch. Thus session data recovery is built in. Additionally, it can greatly simplify the life of a system administrator, who can shut down any server without impacting the end-user. Stateless applications are therefore easier to host and get by with off-the-shelf infrastructure.
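
To make the stateless flow concrete, here is a minimal sketch of the load, modify, store cycle behind a round-robin balancer. Everything in it is hypothetical: `SharedSessionStore` is an in-memory map standing in for the shared database, and the class and method names are mine rather than any framework's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the shared database: session id -> cart items. */
final class SharedSessionStore {
    private final Map<String, List<String>> rows = new ConcurrentHashMap<>();

    List<String> load(String sessionId) {
        // Hand back a copy so the caller must call save() to persist changes.
        return new ArrayList<>(rows.getOrDefault(sessionId, Collections.<String>emptyList()));
    }

    void save(String sessionId, List<String> cart) {
        rows.put(sessionId, new ArrayList<>(cart));
    }
}

/** A server that holds no session data between requests. */
final class StatelessServer {
    private final String name;
    private final SharedSessionStore store;

    StatelessServer(String name, SharedSessionStore store) {
        this.name = name;
        this.store = store;
    }

    /** Load the session, apply the request, write the session back. */
    String addToCart(String sessionId, String item) {
        List<String> cart = store.load(sessionId);
        cart.add(item);
        store.save(sessionId, cart);
        return name + " -> " + cart;
    }
}

/** Round-robin balancer: each request goes to the next server in turn. */
final class RoundRobinBalancer {
    private final List<StatelessServer> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<StatelessServer> servers) {
        this.servers = new ArrayList<>(servers);
    }

    StatelessServer pick() {
        // floorMod keeps the index valid even after the counter overflows.
        return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
    }
}
```

With two servers sharing one store, two consecutive requests for the same session land on different servers, yet the second one still sees the item added by the first. The cost is one read and one write against the store on every request.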

Statefully fast

Stateful applications, on the other hand, are inherently more complex. This is the price to pay for higher performance. Statefulness goes hand in hand with caching, which, in turn, is the gateway to high performance. There are many forms of caching but they can be summarized as either caches internal to the application, stored in a hash map for example, or caches external to the JVM, such as distributed and remote caches. With respect to statefulness, only the first category is applicable since the latter is just another type of data externalization and therefore more closely associated with statelessness from an application point of view. Stateful applications can take advantage of the fact that all session data is readily available for consumption in its native object form. Data does not need to be externalized before and after every request, and this saves considerable CPU cycles. There is no need to design complex object-to-relational data mapping. Regardless of the number of tools available in this space, there is a price to pay in both design effort and performance.

Since reading from local memory is far faster than any round trip to a database or remote cache, statefulness naturally translates into higher performance and a higher degree of scalability for the entire application. Stateful applications thus circumvent design issues related to externalization as well as benefit from higher performance.
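
By way of illustration, here is a minimal sketch of the first category of cache: sessions held in a map inside the server process and used in their native object form. The names `CachedSession` and `StatefulServer` are hypothetical, and a real application would also need eviction and recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Session kept as a plain object: no mapping or serialization step. */
final class CachedSession {
    final String userName;
    final List<String> cart = new ArrayList<>();

    CachedSession(String userName) {
        this.userName = userName;
    }
}

/** A server that owns its sessions in an in-process map. */
final class StatefulServer {
    private final Map<String, CachedSession> sessions = new ConcurrentHashMap<>();

    /** Creates the session on first sign-in; later calls return the same object. */
    CachedSession signIn(String sessionId, String userName) {
        return sessions.computeIfAbsent(sessionId, id -> new CachedSession(userName));
    }

    /** Mutates the cached session in place and returns the new cart size. */
    int addToCart(String sessionId, String item) {
        CachedSession session = sessions.get(sessionId);
        if (session == null) {
            throw new IllegalStateException("unknown session: " + sessionId);
        }
        synchronized (session) { // guard the cart against concurrent requests
            session.cart.add(item);
            return session.cart.size();
        }
    }
}
```

Each request touches only local memory. The flip side is that the session exists on this one server and nowhere else.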

Statelessly complicated

On the other hand, statelessness has an alluring air of simplicity that can be deceiving. If you take the naïve approach and externalize session data after every request, you will simply shift all of the pain onto the external store. If, for example, a database is used as an external store, it will need to be tuned and configured for redundancy and performance. Individual queries will also require tuning, as performance will not come standard out-of-the-box. 

Externalization also has its challenges with regard to how data is transformed. There is considerable application-wide complexity to manage with object-to-relational mapping. Choosing to side-step the object-to-relational mapping issue with Java serialization is also fraught with risk. Java serialization hides externalization details from the developer. However, it hides so much, and is so easy and automatic, that serialization bugs surface only at runtime. All it takes is one unserializable attribute in a complex object structure for the entire externalization to fail.

There are products that can help with externalization (TimesTen in-memory database, Coherence distributed caching and Terracotta’s NAM come to mind). While these are examples of products that can tip the balance in favour of statelessness, effort will ultimately be expended to get the right behaviour and performance. Buying a ready-made product that meets our stringent requirements will be expensive. Taking the simplistic approach and externalizing the session data can ironically spread complexity across the entire application code base. In the end, what seemed simple was indeed not. There is no free lunch.
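
The serialization pitfall is easy to reproduce. In the sketch below, every class is hypothetical: `PriceFeedConnection` stands in for any attribute that does not implement `Serializable`, buried one level down inside an otherwise serializable session. The code compiles cleanly and fails only when the session is written out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Does NOT implement Serializable. */
final class PriceFeedConnection { }

final class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;
    String item = "flight";
    // The compiler accepts this field; serialization rejects it at runtime.
    PriceFeedConnection feed = new PriceFeedConnection();
}

final class UserSession implements Serializable {
    private static final long serialVersionUID = 1L;
    String userName = "alice";
    ShoppingCart cart = new ShoppingCart();
}

final class Externalizer {
    /** Returns null on success, or the exception message if an object could not be serialized. */
    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage(); // names the offending class
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `Externalizer.trySerialize(new UserSession())` reports the nested `PriceFeedConnection` class, even though `UserSession` and `ShoppingCart` are both marked `Serializable`. Marking the field `transient` would make the write succeed, at the cost of losing that attribute on restore.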

Statefully slow

Statefulness is also not a silver bullet for performance. The pain point will become the large footprint of cached session data. For Java applications in particular, large heaps that mix caches of long-lived session data objects with short-lived working data objects can punish the garbage collector. In high-throughput systems, this deadly combination of object behaviour will strain even the best-built JVMs.

In addition, stateful applications put reliability at risk in two ways. First, since data is cached, there is an inherent risk of bugs that can cause data to linger in memory forever. Forgetting to clear data in memory will have far more serious consequences for a stateful application. Second, session recovery must be designed explicitly. Since memory is volatile, there are some critical pieces of data that will need to be externalized and stored remotely.
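
One common guard against lingering session data is an idle timeout with explicit eviction. The following is only a sketch under my own assumptions: the class name is hypothetical, the cache tracks just a last-access timestamp per session, and the current time is passed in by the caller so the behaviour is easy to test.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Tracks last access per session and evicts sessions idle for too long. */
final class ExpiringSessionCache {
    private final Map<String, Long> lastAccess = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    ExpiringSessionCache(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Records that the session was used at the given time. */
    void touch(String sessionId, long nowMillis) {
        lastAccess.put(sessionId, nowMillis);
    }

    /** Removes sessions idle for longer than the timeout; returns how many were removed. */
    int evictIdle(long nowMillis) {
        int evicted = 0;
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > idleTimeoutMillis) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return lastAccess.size();
    }
}
```

A periodic task would call `evictIdle` with the current time. If that task is forgotten or broken, the map grows without bound, which is exactly the failure mode described above.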

Statefulness also adds more infrastructure complexity because sessions become sticky to servers. The clustering and routing infrastructure will need to take into account the locality of the session and its server.
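
One simple way to get stickiness is to derive the server from the session identifier. The sketch below assumes a fixed list of server names and a hypothetical `StickyRouter` class; real load balancers typically offer their own affinity mechanisms, so treat this purely as an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.List;

/** Routes every request for a given session id to the same server. */
final class StickyRouter {
    private final List<String> servers;

    StickyRouter(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    String route(String sessionId) {
        // floorMod avoids a negative index when hashCode() is negative.
        int index = Math.floorMod(sessionId.hashCode(), servers.size());
        return servers.get(index);
    }
}
```

The mapping is stable only while the server list stays the same. Adding or removing a server changes the modulus and sends many sessions to a server that does not hold their data, which is part of the infrastructure complexity mentioned above.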

In the end, it is unrealistic to completely avoid externalization. There's only so much session data that can fit in memory. Some data will need to be stored remotely.

Conclusion

Interestingly, statelessness started out as the simpler alternative but taken to an extreme, became very complicated. Statefulness started out being the faster alternative but also, taken to an extreme, became inefficient. This shows the value in mixing both paradigms and making the right choices and tradeoffs.

In the real world, only toy applications can be purely stateful or purely stateless. Real applications will fall somewhere along the stateness continuum rather than at the edges. Choices are made by analyzing how critical the session data is, how often it changes, how big the footprint, how complex the session objects, how high the throughput, how low the latency, how capable the software engineers are in managing complexity and how cheap the overall deployment cost must be.

Even with due diligence, the outcome won’t be perfect. Just like religion, you pick one and eat the meat that comes with it (or lack thereof).

For other articles like this, please visit www.deepheap.blogspot.com

Published at DZone with permission of Nick Maiorano, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Jose Maria Arranz replied on Tue, 2008/08/12 - 7:19am

Nice article but I find some concepts a bit confusing.

You suggest stateless = database and I think this is only partially true; any session-based application uses the database frequently too. I think most people understand stateless and stateful in terms of the server:

stateless = temporal data in the browser

stateful = temporal data in the server session

Anyway your reflections about saving temporal data in a database or in a distributed and shared cache vs non-replicated sessions are valid.

Another point of view of the stateless vs stateful debate is the security, the stateless version is more insecure.

And finally: briefs of course!

 

phil swenson replied on Tue, 2008/08/12 - 9:09am

The stateless approach in RoR is very simple.... store to a DB or to MemCached.  All the framework crap has been done for you.  What are the stateless approaches in the Java world?  I've always maintained state on a particular server... which sucks in that you have to use sticky load balancing.
