Senior Java developer, one of the top stackoverflow users, fluent with Java and Java technology stacks - Spring, JPA, JavaEE. Founder and creator of http://computoser.com and http://welshare.com . Worked on Ericsson projects, Bulgarian e-government projects and large scale recruitment platforms. Member of the jury of the International Olympiad in Linguistics and the Program committee of the North American Computational Linguistics Olympiad. Bozhidar is a DZone MVB and is not an employee of DZone and has posted 90 posts at DZone.

State Does Not Belong In The Code

02.24.2012

What is “state” in your web application? It’s the data that gets stored (regardless of the destination – memory, database, file-system). The application itself must not store any state in the code. This means your classes should only have fields with objects that are also stateless. In other words – you should not store anything in your services, DAOs or controllers during the program flow. This is a complete “must” for your service layer. Why?

Your application needs to be scalable. This means it needs to be run in a cluster, and state is the hardest thing to distribute. If you minimize the places where state is stored, you minimize the complexity of clustering. But state should exist, and here is where it is fine to have it:

  • the database – be it SQL, NoSQL or even a search engine, it’s the main thing that stores state. It is the thing that is supposed to support clustering, or a huge dedicated machine that handles requests from multiple other “code” servers. The code communicates with the database, but the code itself does not store anything for more than one client request;
  • cache – caching is relatively easy to distribute (it’s basically key-value). There are many ready-to-use solutions like EhCache and memcached. So instead of computing a result, or getting it from the DB on each request, you can configure caching and store the result in memory. But again – code does not store anything – it just populates and queries the cache;
  • HTTP session – in web components (controllers, managed beans, whatever you call them). It is very similar to caching, though it has a different purpose – to allow identifying subsequent actions by the same user (HTTP itself is stateless). But as your code runs on multiple machines, the load-balancer may not always send subsequent requests to the same server. So the session should also be replicated across all servers. Fortunately, most containers have that option built-in, so you just add one configuration line. Alternatively you can instruct the load-balancer to use a “sticky session” (choosing which server receives the request based on the session cookie), but this moves some state management to the load-balancer as well. Regardless of the option you choose, do not put too much data in the session;
  • the file system – when you store files, you need them to be accessible to all machines. There are multiple options here, including a SAN or a cloud storage service like Amazon S3, which is accessible through an API.

All these are managed outside the code. Your code simply consumes them through an API (the Session API, the cache API, JDBC, S3/file system API). If the code contained any of that state (as instance variables of your objects), the application would be hard to support (you’d have to manage state yourself) and less scalable. Of course, there are some rare cases where you can’t go without storing state in the code. Document these and make sure they do not rely on working in a cluster.
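As a rough illustration of consuming state through an API, here is a minimal sketch of a service whose only fields are final references to its collaborators. Everything named here is hypothetical and not from the original post: `ResultCache` stands in for whatever cache API you use (EhCache, memcached), `InMemoryResultCache` exists only so the sketch runs on its own, and the `Function` passed to `ProfileService` stands in for a DAO lookup.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache API (an assumption for this sketch). In a real deployment
// it would be backed by a distributed cache such as EhCache or memcached.
interface ResultCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

// In-memory stand-in so the sketch is runnable. The data lives in the cache
// component, not in the service that uses it.
final class InMemoryResultCache implements ResultCache {
    private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    @Override
    public void put(String key, String value) {
        entries.put(key, value);
    }
}

// The service holds no request data: its fields are final references to
// collaborators, and per-request values stay in parameters and locals.
final class ProfileService {
    private final ResultCache cache;
    private final Function<String, String> database; // hypothetical DAO lookup

    ProfileService(ResultCache cache, Function<String, String> database) {
        this.cache = cache;
        this.database = database;
    }

    String displayName(String userId) {
        Optional<String> cached = cache.get(userId);
        if (cached.isPresent()) {
            return cached.get();
        }
        String loaded = database.apply(userId); // cache miss: go to the database
        cache.put(userId, loaded);
        return loaded;
    }
}
```

Because the service only populates and queries the cache, swapping the in-memory stand-in for a clustered cache should not require touching `ProfileService` itself.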

But what can go wrong if you store state in the objects that perform the business logic? You have two options then:

  • synchronize access to fields – this will kill performance, because all users that make requests will have to wait in queue for the service to manage its fields;
  • make a new instance of your class for each HTTP request, and manage the instances somehow. Managing these instances is the hard part. People may be inclined to choose the session to do it, which means the session grows very large and gets harder to replicate (sharing a lot of data across multiple machines is slower, and session replication must be fast). Not to mention the unnecessarily increased memory footprint.

Here’s a trivial example of what not to do. You should pass these kinds of values as method arguments, rather than storing them in the instance:

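class OrderService {
   // Anti-pattern: request-specific data kept in an instance field of a shared service
   double orderPrice;

   void processOrder(OrderDto order) {
         // The field is never reset, so totals leak from one order into the next
         for (Entry entry : order.getEntries()) {
              orderPrice += entry.getPrice();
         }
         boolean discounts = hasDiscounts(order);
   }

   // Depends on whatever value orderPrice happens to hold when it is called
   boolean hasDiscounts(OrderDto order) {
        return order.getEntries().length > 5 && orderPrice > 200;
   }
}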
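For contrast, here is a minimal sketch of a stateless variant of the same service, with the running total kept in a local variable and passed on as a method argument. The `Entry`, `OrderDto` and `OrderSummary` classes are hypothetical stand-ins (the post does not define them), and `getEntries()` is assumed to return an array.

```java
// Hypothetical DTOs, assumed for illustration only.
final class Entry {
    private final double price;

    Entry(double price) {
        this.price = price;
    }

    double getPrice() {
        return price;
    }
}

final class OrderDto {
    private final Entry[] entries;

    OrderDto(Entry... entries) {
        this.entries = entries.clone();
    }

    Entry[] getEntries() {
        return entries.clone();
    }
}

// Hypothetical immutable result: what one request computed, returned to the caller.
final class OrderSummary {
    final double price;
    final boolean discounted;

    OrderSummary(double price, boolean discounted) {
        this.price = price;
        this.discounted = discounted;
    }
}

// No instance fields, so a single shared instance can serve concurrent requests.
class StatelessOrderService {

    OrderSummary processOrder(OrderDto order) {
        double orderPrice = 0; // local variable: every call starts from zero
        for (Entry entry : order.getEntries()) {
            orderPrice += entry.getPrice();
        }
        return new OrderSummary(orderPrice, hasDiscounts(order, orderPrice));
    }

    // The price arrives as an argument instead of being read from a field.
    boolean hasDiscounts(OrderDto order, double orderPrice) {
        return order.getEntries().length > 5 && orderPrice > 200;
    }
}
```

Calling `processOrder` twice with the same order yields the same summary both times, whereas the field-based version keeps adding to the previous total.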

So, make all your code stateless – this will ensure at least some level of scalability.

 

From http://techblog.bozho.net/?p=793

Published at DZone with permission of Bozhidar Bozhanov, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Mason Mann replied on Sat, 2012/02/25 - 6:50am

Yawn

Diptamay Sanyal replied on Sat, 2012/02/25 - 11:38pm

Some points are valid, like not storing state in business classes, but otherwise this is pretty much against OOP, and the premise that managing instances on each HTTP request is an issue and adds huge memory overhead is a fallacy. In fact, what is being promoted is probably creating singletons with no member variables that are shared across the code. This is truly against sensible design, not to mention that it promotes procedural-style programming.

One can make scalable code by following simple things at the minimum:

1) Do not use HTTP session. Ever. Use cookies or the right caching framework.

2) Do not use heavyweight bloated frameworks.

3) All static assets go to the file system (SAN/S3 as suggested here). There are even more options: use Akamai, etc.

4) Akamize public service URLs. Using ETags is sensible as well.

5) Do parallel data loads within a request call. Use thread pools or a modern language like Scala with actors.

6) Don't create a crazy normalized DB design. Design tables for performance and minimal joins.

.... 

Diptamay Sanyal replied on Sat, 2012/02/25 - 11:40pm

Oh, forgot to add: avoid JPA :). Use JDBC and super-optimized SQL queries.

Jonathan Fisher replied on Sun, 2012/02/26 - 8:33pm in response to: Diptamay Sanyal

1) Whaa???? DO USE the HTTP Session :) What I'm hoping you meant is: avoid putting large objects in the session! A session identifier is a heck of a lot smaller than putting objects in cookies! Also, DON'T use cookies unless you have to, since they are transmitted with every request.

2) Agree!

3) Agree. More importantly, serve static content from a cookie-less domain.

4) Agree. 

5) Partially Agree. Object pooling is very much a proven and memory efficient technology, whereas Scala is very much a green language.

6) Agree. Normalize just enough for space / speed tradeoff.

7) This is completely circumstantial! If you can't write efficient JPQL, it's likely you're doing naughty things with your data model.
