Lieven Doclo is a Belgian consultant specialized in Java development. He's currently working for a Java-centric consultancy firm. He's the project lead for Spring Rich Client and has an interest in Java rich-client development.

Clustering Tomcat Servers with High Availability and Disaster Fallback

03.11.2011

There has been a lot of buzz lately on high availability and clustering. Most developers don't care, and why should they? These features should be transparent to the application architecture and not something of concern to the developers of that application. But knowledge never hurts, so I immersed myself in the world of load balancing, heartbeats and virtual IP addresses. And you know what? Next time we need an infrastructure like this, I can at least sit down with the guys from the infrastructure department and know what the hell they are talking about.

So what exactly is a high-availability clustered infrastructure (HACI, as I'll call it from now on)? In essence, it should be a zero-downtime infrastructure (or at least perceived as one by the end user, which means never ever returning a default browser error page), capable of horizontal scaling when the need arises and without a single point of failure. It's the SLA writer's dream. A basic HACI setup looks like this:
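
                        users
                          |
                 virtual IP 10.0.5.99
                  /               \
           LB1 (active)      LB2 (passive)
                  \               /
          +--------+-----+-------+--------+
          |              |       |        |
        WEB1           WEB2    WEB3    BACKUP
      (Tomcat)      (Tomcat) (Tomcat) (503 page)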

The user enters through a virtual IP address, assigned to one of the two load balancers. Only one of the load balancers is active (the active master, LB1); the other one is there in the event LB1 fails (LB2, a passive slave). The two load balancers are redundant, i.e. they have the exact same configuration. The load balancers redirect all traffic to the real servers. This can be done through round-robin assignment or through other means like sticky sessions, where the same user is redirected to the same server each and every time within a session. Servers can be added at any moment and configured on the load balancers. Ideally, the load balancer configuration is aware of each server's hardware specification and balances the load accordingly, but that's beyond the scope of this article (it involves adding weights). If all servers balanced by the load balancer fail, a backup server should be used to handle all traffic coming from the load balancer. This can be a very lightweight server, whose only purpose is to provide a sensible error page to the user (something like 'Sorry, we are performing maintenance'). Again, perception and immediate feedback to the user is key. You don't want to show the user a plain 404 page. Of course, if the backup server goes down too, you're in trouble (and of course, by that time, warning bells should have gone off on every level in the hierarchy).

So how do you achieve this with as little effort as possible? If you want to try this out, I suggest you start by installing virtualization software like VirtualBox or VMware, so you can try out the configuration yourself. In this example, I'll be load-balancing 3 Tomcat servers with sticky sessions, using 2 load balancers in active-passive mode. I'm assuming all 3 Tomcat servers share the same hardware configuration, so they are each able to handle the same amount of traffic. I'm also throwing in a backup server in case all 3 Tomcat servers go down, serving a custom 503 page kindly informing the user of a catastrophic failure instead of dropping the standard 404 bomb.

You want to start off by assigning IP addresses to the servers. This will make your life a bit easier. We'll need 7 addresses: 3 for the Tomcat servers, 1 for the backup server, 2 for the load balancers and 1 virtual IP address to be shared between the load balancers (which will be the entry point for your users). So our assignment will be:

VIRTUAL IP   10.0.5.99    www.haci.local
LB1          10.0.5.100   lb1.haci.local      # MASTER
LB2          10.0.5.101   lb2.haci.local      # SLAVE
WEB1         10.0.5.102   web1.haci.local
WEB2         10.0.5.103   web2.haci.local
WEB3         10.0.5.104   web3.haci.local
BACKUP       10.0.5.105   backup.haci.local
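
If you don't have DNS for the .local names, a matching /etc/hosts block on each machine keeps things readable (a sketch mirroring the table above):

# /etc/hosts entries for the HACI setup
10.0.5.99    www.haci.local
10.0.5.100   lb1.haci.local
10.0.5.101   lb2.haci.local
10.0.5.102   web1.haci.local
10.0.5.103   web2.haci.local
10.0.5.104   web3.haci.local
10.0.5.105   backup.haci.local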

Setting up the web servers is easy. You just install Tomcat on each server and create a simple JSP file to be served to users (make a small change, like the background color, on each server so you can distinguish them). I won't be covering session replication between the Tomcat servers, as that would take me too far; if you want, you can configure the appropriate session replication and storage (using multicast or JDBC, for example). A simple test page might look like the sketch below.
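
For example, a minimal test page showing which server and session you landed on (the index.jsp name and the webapps/ROOT location are my choice, not prescribed by the article):

<%-- webapps/ROOT/index.jsp: identifies which Tomcat served the request --%>
<%@ page contentType="text/html" %>
<html>
  <head><title>HACI test page</title></head>
  <!-- change the color per server, e.g. #ddddff on web1, #ffdddd on web2 -->
  <body style="background-color: #ddddff">
    <p>Served by: <%= java.net.InetAddress.getLocalHost().getHostName() %></p>
    <p>Session id: <%= session.getId() %></p>
  </body>
</html>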

The backup server I'm using is a basic LAMP server that returns a simple 503 page for every request it gets. The 503 status code is important, because it reflects the current state of the system: service temporarily unavailable.

For the load balancers I'll be using 2 applications: HAProxy and keepalived. HAProxy is going to handle the load balancing, while keepalived will handle the failover between the two load balancers.

First, we're going to configure HAProxy on both LB1 and LB2. Installing HAProxy is quite easy on an Ubuntu system: just do a sudo apt-get install haproxy and you're off. After the install, back up the current HAProxy config and start editing away.

 

cp /etc/haproxy.cfg /etc/haproxy.cfg_orig
cat /dev/null > /etc/haproxy.cfg
vi /etc/haproxy.cfg

 

The content of the config, reflecting our setup, should become something like this (same config on LB1 and LB2):

 

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    #debug
    #quiet
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option redispatch
    retries 3
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

frontend http-in
    bind 10.0.5.99:80
    default_backend servers

backend servers
    mode http
    stats enable
    stats auth someuser:somepassword
    balance roundrobin
    # sticky sessions: prefix the Tomcat JSESSIONID cookie with the server name
    cookie JSESSIONID prefix
    option httpclose
    option forwardfor
    # health check: each web server must actually serve /check.txt (see below)
    option httpchk HEAD /check.txt HTTP/1.0
    # the Tomcats are listening on port 80 in this setup (adjust if yours use 8080)
    server web1 10.0.5.102:80 cookie haci_web1 check
    server web2 10.0.5.103:80 cookie haci_web2 check
    server web3 10.0.5.104:80 cookie haci_web3 check
    server webbackup 10.0.5.105:80 backup
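
Note that the httpchk line means HAProxy will only consider a web server healthy if /check.txt actually exists on it; without that file, every Tomcat gets marked down and you'll see 503s (see Anton's comment below). Create it on each web server; assuming Tomcat's default ROOT webapp location (adjust the path to your install):

echo "up" > /var/lib/tomcat6/webapps/ROOT/check.txt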

 

After this, enable HAProxy on both LB1 and LB2 by editing /etc/default/haproxy:

 

# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"

 

So much for the HAProxy configuration. We can't start it up yet, as LB1 and LB2 aren't listening on the virtual IP address yet.

Next, we'll configure the failover of the load balancers using keepalived. Installing it on Ubuntu is as easy as it was for HAProxy: sudo apt-get install keepalived. But its configuration is slightly different on the two load balancers. First, we need to allow both servers to bind to the shared IP address even when it isn't (yet) assigned to them. Add the following line to /etc/sysctl.conf:

 

net.ipv4.ip_nonlocal_bind=1

 

And run

 

sysctl -p
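
sysctl -p echoes back the settings it applied, so you should see:

net.ipv4.ip_nonlocal_bind = 1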

 

Now we configure keepalived so that LB1 acts as the main load balancer and binds to the shared IP address, while LB2 is on standby, ready to take over whenever LB1 goes down. The chk_haproxy script below adds 2 to a node's priority as long as HAProxy is running, so LB1 normally runs at an effective priority of 103 (101 + 2) and LB2 at 102 (100 + 2). If HAProxy dies on LB1, its priority drops back to 101, making LB2 the highest-priority node, so LB2 takes over the virtual IP even though the LB1 machine itself is still up.

The configuration for LB1 looks like this (edit /etc/keepalived/keepalived.conf):

 

vrrp_script chk_haproxy {          # Requires keepalived-1.1.13
    script "killall -0 haproxy"    # cheaper than pidof
    interval 2                     # check every 2 seconds
    weight 2                       # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101                   # 101 on master, 100 on backup
    virtual_ipaddress {
        10.0.5.99
    }
    track_script {
        chk_haproxy
    }
}

 

Start up keepalived and check whether it is listening on the virtual IP address.

 

/etc/init.d/keepalived start
ip addr sh eth0

 

It should return something like this, indicating it is listening on the virtual IP address (note the 10.0.5.99/32 entry):

 

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.100/24 brd 10.0.5.255 scope global eth0
    inet 10.0.5.99/32 scope global eth0
    inet6 fe80::20c:29ff:fea5:5b93/64 scope link
       valid_lft forever preferred_lft forever

 

Next, we configure LB2. The configuration is almost the same, except for the priority.

 

vrrp_script chk_haproxy {          # Requires keepalived-1.1.13
    script "killall -0 haproxy"    # cheaper than pidof
    interval 2                     # check every 2 seconds
    weight 2                       # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 100                   # 101 on master, 100 on backup
    virtual_ipaddress {
        10.0.5.99
    }
    track_script {
        chk_haproxy
    }
}

 

Start up keepalived and check the network interface.

 

/etc/init.d/keepalived start
ip addr sh eth0

 

It should return something like this, indicating it is not listening on the virtual IP address (no 10.0.5.99 entry):

 

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.101/24 brd 10.0.5.255 scope global eth0
    inet6 fe80::20c:29ff:fea5:5b93/64 scope link
       valid_lft forever preferred_lft forever

 

Now, start up HAProxy on both LB1 and LB2.

 

/etc/init.d/haproxy start
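
Since the backend has stats enabled, you can also eyeball the state of the cluster through HAProxy's stats page (a sketch, assuming the default stats URI and the credentials from the config above):

# HAProxy's default stats URI, protected by the stats auth credentials
curl -u someuser:somepassword "http://10.0.5.99/haproxy?stats"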

 

Now you can issue requests to 10.0.5.99 (or www.haci.local), which will go to LB1, which in turn will load-balance the request to WEB1, WEB2 or WEB3. You can test the load balancing by turning off WEB1 (or whichever server you're currently on). You can also test the backup server by turning off all the main web servers (WEB1, WEB2 and WEB3). And you can test the load balancer failover by turning off LB1. At that point LB2 will kick in and act as the master, load balancing all requests. When you turn LB1 back on, it'll take over the master role once again. HAProxy also allows you to add extra servers very easily, reloading the configuration without breaking existing sessions. See the HAProxy documentation for more info, or this question on ServerFault: http://serverfault.com/questions/165883/is-there-a-way-to-add-more-backend-server-to-haproxy-without-restarting-haproxy. You can watch the balancing from the command line with something like the sketch below.
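
A quick way to watch the round-robin behaviour from any machine that can reach the virtual IP (a sketch; it greps for the "Served by" line from the test JSP sketched earlier, and works because curl sends no session cookie):

# fire 10 requests at the virtual IP; the reported server should rotate
for i in $(seq 1 10); do
    curl -s http://10.0.5.99/ | grep "Served by"
done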

Cheap and effective. Most enterprise shops have hardware load balancers, which offer these possibilities and more, but if you're on a tight budget or need to simulate a HACI environment for development purposes (a lesson here: always simulate your production environment when testing during development), this might be the sane option.

To finish, I'll quickly explain how to set up the backup server (a simple LAMP server).

Create a vhost configuration in Apache for www.haci.local (or any other domain pointing to the virtual IP address) and set up mod_rewrite for it:

 

RewriteEngine On
RewriteCond %{REQUEST_URI} !\.(css|gif|ico|jpg|js|png|swf|txt)$ [NC]
RewriteCond %{REQUEST_URI} !^/503\.php$
RewriteRule .* /503.php [L]
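
For completeness, a minimal vhost these rules could live in might look like this (the ServerName, DocumentRoot and the assumption that mod_rewrite is enabled via a2enmod rewrite are mine; adjust to your layout):

<VirtualHost *:80>
    ServerName www.haci.local
    DocumentRoot /var/www

    RewriteEngine On
    # let static assets through so the 503 page can still be decorated
    RewriteCond %{REQUEST_URI} !\.(css|gif|ico|jpg|js|png|swf|txt)$ [NC]
    RewriteCond %{REQUEST_URI} !^/503\.php$
    RewriteRule .* /503.php [L]
</VirtualHost>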

 

Then create the 503.php file and add this to the top of it:

 

<?php
header('HTTP/1.1 503 Service Unavailable');
header('Retry-After: 600');
?>
<html>
   <head><title>Unavailable</title></head>
   <body><p>Sorry, our servers are currently undergoing maintenance. Please check
back with us in a while. Thank you for your patience.</p></body>
</html>

 

You can decorate the 503.php file any way you like. You can even reference CSS, JavaScript and image files from it; that's what the first RewriteCond above is for, since those file types are excluded from the rewrite.

Now, back to my IDE. I'm getting withdrawal symptoms.

Published at DZone with permission of its author, Lieven Doclo.


Comments

Josh Marotti replied on Fri, 2011/03/11 - 12:24pm

This always drives me nuts. You spent all that time and configuration getting Tomcat clustered.

I'd put up JBoss servers, edit one config file to cluster them together, and spend the rest of my time working on my app...

Lieven Doclo replied on Fri, 2011/03/11 - 4:02pm

Josh, I haven't even started on the real clustering (read: session replication), but that's maybe a half-hour job. This article is mainly about setting up Tomcats (or any application server) in a fail-over, load-balanced environment. While I agree that JBoss has simpler configuration, I don't like the fact that JBoss is doing everything. I actually like the separation of concerns: my network admins can tinker with the load balancing parameters without ever having to touch one of my application server instances. And my application servers can be out-of-the-box, not requiring any configuration (if working in a stateless environment not requiring session replication) and not having to be aware they're in a load-balanced environment. But thanks anyway for pointing that out.

Slim Ouertani replied on Mon, 2011/03/14 - 1:39am

Thanks for this post, very interesting for me. I want to point out that load balancing is different from clustering.

Do you mean that you will do both approaches?

Richard Nduka replied on Tue, 2011/03/15 - 11:30am

Lieven, many thanks for this article. In fact, we are in the process of implementing something similar at the office. One funny problem we ran into, though, is that our application has scheduled jobs which each run every day. Running the application on 2 Tomcat instances, for example, had each of the jobs being executed twice :-). Is there any sophisticated way of dealing with such problems other than modifying the application code?

Thanks.

Martin Grotzke replied on Mon, 2011/04/04 - 1:30pm

Nice post! Regarding session replication, you might want to check out memcached-session-manager, which is a memcached-based session manager drop-in for Tomcat (open source). It supports replication/handling of both sticky and non-sticky sessions and works with Tomcat 6.x and 7.x.

Anton Pryamostanov replied on Sat, 2014/05/24 - 6:42am

Dear Lieven,

First of all, thanks for this comprehensive tutorial; there are not too many HAProxy tutorials out there.

Secondly, I need to mention that check.txt was not present in my Tomcat web app. Without this file, "option httpchk HEAD /check.txt HTTP/1.0" will give a 503. So I created the file and the whole thing started to work. Hope that it will help someone.
