
Greg Wilkins is the Chief Technical Officer (CTO) and one of the founders of Webtide. He was also a founder and CEO of Mort Bay Consulting. Greg has deep knowledge of all facets of software development, with 22 years of experience as a software developer, team leader, architect, trainer, and technical mentor in industry sectors ranging from telecommunications, finance, and realtime computing to internet applications. He is closely involved with the open source movement, being the creator of the Jetty web container, a co-founder of Apache Geronimo, and a committer or contributor to a number of other open source projects. Greg sits on the JCP Servlet Expert Group and is active in the OpenAjax Alliance. Greg received his B.S. Computer Science degree with 1st Class Honors from Sydney University, Australia.

Cometd-2 Throughput vs Latency


With the imminent release of cometd-2.0.0, it's time to publish some of our own lies, damned lies and benchmarks. It has been over 2 years since we published the 20,000 reasons that cometd scales, and in that time we have completely reworked both the client side and the server side of cometd, plus we have moved to Jetty 7.1.4 from Eclipse as the main web server for cometd.

Cometd is a publish/subscribe framework that delivers events from an HTTP server to the browser via comet server push techniques. Cometd-1 was developed in parallel with many of the ideas and techniques of comet itself, so the code base reflected some superseded ideas and old thinking, and was in need of a cleanup. Cometd-2 was a total redevelopment of all parts of the Java and JavaScript codebase and provides:

  • Improved Java API for both client and server side interaction.
  • Improved concurrency in the server and client code base.
  • Fully pluggable transports.
  • Support for a websocket transport (that works with the latest Chromium browsers).
  • Improved extensions.
  • More comprehensive testing and examples.
  • More graceful degradation under extreme load.
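To make the publish/subscribe model concrete, here is a minimal, self-contained sketch of a channel-based broker in plain Java. This is an illustration of the Bayeux-style model that cometd implements, not the cometd API itself; the class and method names are invented for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal channel-based publish/subscribe broker, illustrating the
// Bayeux-style model cometd implements (NOT the actual cometd API).
public class MiniPubSub {
    private final Map<String, List<Consumer<String>>> channels = new ConcurrentHashMap<>();

    // Register a listener on a named channel, e.g. "/chat/room1".
    public void subscribe(String channel, Consumer<String> listener) {
        channels.computeIfAbsent(channel, c -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Deliver a message to every subscriber of the channel.
    public void publish(String channel, String message) {
        for (Consumer<String> l : channels.getOrDefault(channel, List.of())) {
            l.accept(message);
        }
    }

    public static void main(String[] args) {
        MiniPubSub bus = new MiniPubSub();
        bus.subscribe("/chat/room1", m -> System.out.println("received: " + m));
        bus.publish("/chat/room1", "hello");
    }
}
```

In a chat benchmark like the one below, each room maps to one channel, so every message published to a room fans out to all clients subscribed to that room's channel.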

The result has been a dramatic increase in throughput while maintaining sub-second latencies and great scalability.

The chart above shows the preliminary results of recent benchmarking carried out by Simone Bordet for a 100 room chat server. The test was done on Amazon EC2 nodes with 2 x amd64 CPUs and 8GB of memory, running Ubuntu Linux 2.6.32 with Sun's 1.6.0_20-b02 JVM. Simone did some tuning of the Java heap and garbage collector, but the operating system was not customized other than to increase the file descriptor limits. The test used the HTTP long polling transport. A single server machine was used, and 4 identical machines generated the load using the cometd Java client that is bundled with the cometd release.

It is worth remembering that the latencies and throughput measured include the time in the client load generators, each running the full HTTP/cometd stack for many thousands of clients, when in a real deployment each client would have its own computer and browser. It is also noteworthy that the server is not just a dedicated comet server, but the fully featured Jetty Java Servlet container, and the cometd messages are handled within the rich application context it provides.

It can be seen from the chart above that the message rate has been significantly improved from the 3800/s achieved in 2008. All scenarios tested were able to achieve 10,000 messages per second with excellent latency. Only with 20,000 clients did the average latency start to climb rapidly once the message rate exceeded 8000/s. The top average server CPU usage was 140/200, and for the most part latencies were under 100ms over the Amazon network, which indicates that there is some additional capacity available on this server. Our experience of cometd in the wild indicates that you can expect another 50 to 200ms of network latency crossing the public internet, but due to the asynchronous design of cometd, the extra latency does not reduce throughput.

Below is an example of the raw output of one of the 4 load generators, which shows some of the capabilities of the Java cometd client, which can be used to develop load generators specific to your own application:

Statistics Started at Mon Jun 21 15:50:58 UTC 2010
Operative System: Linux 2.6.32-305-ec2 amd64
JVM : Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM runtime 16.3-b01 1.6.0_20-b02
Processors: 2
System Memory: 93.82409% used of 7.5002174 GiB
Used Heap Size: 2453.7236 MiB
Max Heap Size: 5895.0 MiB
Young Generation Heap Size: 2823.0 MiB
- - - - - - - - - - - - - - - - - - - -
Testing 2500 clients in 100 rooms
Sending 3000 batches of 1x50B messages every 8000µs
- - - - - - - - - - - - - - - - - - - -
Statistics Ended at Mon Jun 21 15:51:29 UTC 2010
Elapsed time: 30164 ms
Time in JIT compilation: 12 ms
Time in Young Generation GC: 0 ms (0 collections)
Time in Old Generation GC: 0 ms (0 collections)
Garbage Generated in Young Generation: 1848.7974 MiB
Garbage Generated in Survivor Generation: 0.0 MiB
Garbage Generated in Old Generation: 0.0 MiB
Average CPU Load: 109.96191/200
Outgoing: Elapsed = 30164 ms | Rate = 99 messages/s - 99 requests/s
Waiting for messages to arrive 74450/75081
All messages arrived 75081/75081
Messages - Success/Expected = 75081/75081
Incoming - Elapsed = 30470 ms | Rate = 2464 messages/s - 2368 responses/s (96.14%)
Messages - Wall Latency Distribution Curve (X axis: Frequency, Y axis: Latency):
@ _ 56 ms (19201, 25.57%)
@ _ 112 ms (33230, 44.26%) ^50%
@ _ 167 ms (10282, 13.69%)
@ _ 222 ms (3438, 4.58%) ^85%
@ _ 277 ms (2479, 3.30%)
@ _ 332 ms (1647, 2.19%)
@ _ 388 ms (1462, 1.95%) ^95%
@ _ 443 ms (971, 1.29%)
@ _ 498 ms (424, 0.56%)
@ _ 553 ms (443, 0.59%)
@ _ 609 ms (309, 0.41%)
@ _ 664 ms (363, 0.48%)
@ _ 719 ms (338, 0.45%) ^99%
@ _ 774 ms (289, 0.38%)
@ _ 829 ms (153, 0.20%) ^99.9%
@ _ 885 ms (46, 0.06%)
@ _ 940 ms (3, 0.00%)
@ _ 995 ms (1, 0.00%)
@ _ 1050 ms (1, 0.00%)
@ _ 1105 ms (1, 0.00%)
Messages - Wall Latency Min/Ave/Max = 1/120/1105 ms
Messages - Network Latency Min/Ave/Max = 1/108/1100 ms
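The incoming rate and the percentile markers (^50%, ^85%, and so on) in the output above can be recomputed directly from the reported numbers: the rate is the message count divided by the elapsed time, and each marker is placed on the first histogram bucket whose cumulative share reaches that percentile. A minimal sketch, with the bucket bounds and counts copied verbatim from the run above:

```java
// Recompute throughput and percentile markers from the load generator's
// latency histogram; bucket upper bounds (ms) and message counts are
// copied from the sample run above.
public class LatencyCheck {
    static final int[] BOUND = {56, 112, 167, 222, 277, 332, 388, 443, 498,
                                553, 609, 664, 719, 774, 829, 885, 940, 995, 1050, 1105};
    static final int[] COUNT = {19201, 33230, 10282, 3438, 2479, 1647, 1462, 971, 424,
                                443, 309, 363, 338, 289, 153, 46, 3, 1, 1, 1};

    static long total() {
        long t = 0;
        for (int c : COUNT) t += c;
        return t; // 75081 messages, matching "Success/Expected = 75081/75081"
    }

    // Upper bound of the first bucket whose cumulative share reaches pct.
    static int percentileBucket(double pct) {
        long cumulative = 0;
        for (int i = 0; i < COUNT.length; i++) {
            cumulative += COUNT[i];
            if (100.0 * cumulative / total() >= pct) return BOUND[i];
        }
        return BOUND[BOUND.length - 1];
    }

    public static void main(String[] args) {
        // 75081 messages over the 30470 ms incoming elapsed time -> 2464 msg/s
        System.out.println("rate = " + total() * 1000 / 30470 + " msg/s");
        System.out.println("p50  = " + percentileBucket(50.0) + " ms");  // the ^50% marker
        System.out.println("p99  = " + percentileBucket(99.0) + " ms");  // the ^99% marker
    }
}
```

Running this reproduces the reported 2464 messages/s and places the 50% marker at the 112 ms bucket and the 99% marker at the 719 ms bucket, matching the generator's output.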

As time permits, we would like to update our Java client to also support the websocket protocol, so that we can generate the load from 20,000 websocket clients and see how this new protocol may further improve throughput and latency.




Published at DZone with permission of its author, Greg Wilkins.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Michael Eric replied on Wed, 2012/09/26 - 3:36pm

Hi, this is very impressive. Do you have any machine resource usage stats (e.g. top, vmstat, netstat)? You mention that CPU usage is not high, but I am wondering if it affects other figures such as memory consumption and load average. I am also looking forward to the websocket benchmark.

