GapList – a lightning-fast List implementation
This article introduces GapList, an implementation which strives for combining the strengths of both ArrayList and LinkedList. We will show how it is implemented to offer efficient random access to elements by index (as ArrayList does) and at the same time efficient adding and removing elements to and from head and tail of the list (as LinkedList does).
Background
The Java Collections Framework belongs to the most used functionality in the Java standard library. Nearly each application uses it to handle some collections of data.
The framework comprises various interfaces and also implementations of these interfaces. By far the most popular interface is java.util.List which allows to handle an array-like list of elements which can grow and shrink as needed.
The Java standard library offers two main implementations of the List interface: ArrayList and LinkedList. As explained in http://docs.oracle.com/javase/tutorial/collections/implementations/list.html you will typically end up using ArrayList as general-purpose List implementation as it is usually faster and consumes less memory.
But the cited document also mentions that executing add/delete operations on the beginning or interior of an ArrayList will not perform very well. So the question arises: Is ArrayList really suited as general-purpose implementation of List?
Motivation
Before delving further into the details, we want to have a look at a situation which you could encounter in your daily programmer's life. It will illustrate that some needs cannot be met by using either ArrayList or LinkedList.
Consider the following use case: We write a software which has to analyze incoming events. The analysis is always done on a fixed size window containing the last events arrived. So to maintain these fixed size window, we need a data structure which supports efficient adding elements at the beginning and removing elements from the end. The analysis is then done by methods which require random access to the list - either because it is easier to implement or because they really need it.
Let's consider our two alternatives:
ArrayList would excel in the analysis part, but behave poorly in adding/removing the events
LinkedList would excel in adding and removing elements, but behave poorly in the analysis part
Which implementation would you choose? As a good programmer this will not be an easy choice because you know that your solution will not handle all cases very well. Of course you know that you should not care about premature optimization, but nevertheless it may give you at least a bad feeling. It may be that you do some performance measurements to keep you sleeping silently at night or you write your concerns in a design documents or in some Javadoc comments to show your co-workers that you have understood the implications of what you have done.
So whatever you do, the solution will not be an optimal one and you may have spent some time measuring performance, writing concerns or doing some brain work. And you may continue to wonder how you could realize a better solution which you could then be really be proud of.
But in fact you would like to have standard libraries which are fast enough so they can be used as building blocks for other components without compromises. And here GapList comes to the rescue: it will excel in all operations.
Review of ArrayList
The implementation of GapList is an improvement of ArrayList. We therefore first have a look at the implementation of ArrayList to see why the performance of some operations tends to be bad.
The following illustration shows how data is stored in an ArrayList.
All elements are stored in a Java array. The first element is at index 0, the second at index 1 and so on. To prevent new allocation on every insert, the allocated array is typically larger than the stored number of elements and a size variable is used to keep track of the used elements.
If we look at the illustration, it becomes quite obvious why different operations have different performance characteristics: operations at the end are very fast because no data must be copied, whereas operations at the beginning need an expensive copy operation.
If we analyze some operations in a row, we see that subsequent operations cannot profit from previous operations. So in the following example, the elements right from the insertion point have to be moved twice for two subsequent insert operations.
As a consequence, ArrayList is not suited for iterating over a list and adding or removing elements. If you really need to iterate over an ArrayList and modify it, you would better build up a new ArrayList instance during interating.
The fact that ArrayList offers a very asymmetric performance behavior can also become problematical if the operations are executed through a GUI where the user would notice that an operation at the end is ways faster than the same operation at the beginning of the list.
Introduction to GapList
To solve the issues brought out, we introduce GapList as another implementation of the java.util.List interface. As main features, GapList provides
Efficient access to elements by index
Constant time insertion at head and tail of list
Exploit the locality of reference often seen in applications
Let's see how GapList is implemented to offer these features.
If we compare how the different kind of inserts are handled by ArrayList, we can quickly come up with a solution to guarantee fast insertion both at the beginning and at the end of the list. Instead of moving all elements to gain space at index 0, we leave the existing elements in place and write the elements at the end of the allocated array if there is space left. So we basically use the array as a kind of rotating buffer.
For accessing the elements in the right order, we have to remember the start position of the first element and use a modulo operation to calculate the physical index from the logical one:
physIndex = (start + index) % capacity
To exploit the locality of reference, we allow a gap to be included in the storage of the list elements. The gap formed by the unused slots in the backing array can be anywhere in the list. There is at most one gap, but there can also be none.
This gap helps you to take advantage of the locality of reference to the list, so if you add an element to the middle of the list, a subsequent addition to the middle will be fast.
If a GapList has no gap, one is created if needed. If the gap is at a wrong place, it is moved. But if the operations happen near to each other, only few data will have to be copied.
GapList also allows removal of elements at the beginning and at the end without any moving of elements.
Removals in the middle are handled similar to insertions: an existing gap may be moved or vanish if no longer needed.
Perfomance of GapList
We now present benchmark figures comparing GapList, ArrayList, and LinkedList for some common operations.
Let's first have a look at the most import operation: retrieving elements.

We can see that the performance of GapList is even slightly better than ArrayList in retrieving elements. The performance of LinkedList is 2'000 times worse if accessing elements in a random way (note that all charts use a logarithmic axis).
The next diagram shows the performance for adding elements.

GapList features good performance irrespective of the insert location whereas ArrayList is 3'000 times slower if adds happen at the beginning of list.
In the case of completely random add and remove operations, GapList is slightly slower than ArrayList. This is what we would expect as there is no locality of reference GapList can exploit and the implementation of GapList is somehow more complex.
But what happens if subsequent operations happen next to each other? Or we even are iterating over the list?

We can see that the performance of GapList improves the more local the subsequent operations happen and GapList can become 100 times faster than ArrayList. It is also nearly as fast as LinkedList in iterating.
The performance of remove operations is similar to the add operations shown.
To summarize, we he have seen that GapList offers great performance in any test case:
it is as fast as ArrayList in retrieving elements which is the most important operation
it is as fast as LinkedList in adding/removing elements at head/tail of list
it is as fast as ArrayList in adding/removing elements at random locations
it is as fast as LinkedList in adding/removing elements by iterating
Features of GapList
GapList has been designed as drop-in replacement for both ArrayList and LinkedList:
GapList implements the interface java.util.Deque which is not implemented by ArrayList, but only by LinkedList.
All public methods provided by ArrayList are also implemented by GapList (ensureCapacty, trimToSize).
A known drawback of the Java Collections Framework is the fact that there is no support for primitive types. So if you need to store a bunch of integers, you will end up with a List<Integer>. This approach uses much more memory than storing the data in a simple int[].
GapList also offers implementations for all simple data types, e.g IntGapList.
As the classes for the primitive data types are generated out of the generic version using a templating mechanism, some bulk operations have been added directly to GapList. They include remove(), move(), copy(), but also sort(), rotate(), or shuffle().
GapList is not thread-safe. It must also be considered that exploiting the locality of reference only works if there is a locality: if two threads are working on different parts of the list, this optimization can even have a negative effect on performance as the gap has to be moved back and forth. Of course this can also happen with a single thread but it is more unlikely.
MaxList - a Use Case for GapList
We now come back to our example presented as motivation and show how GapList can help us to create a really simple and powerful implementation.
As both adding and removing are fast, we can easily use GapList as a fixed size rotating buffer. Elements can be added until the maximum size is reached. If another element should be added, the first element is removed before the addition.
As all our performance considerations are solved by GapList, the implementation of the described functionality is quite simple. You basically have to override the add methods to guarantee that the list will never grow beyond the defined limit.
add(T elem) {
if (size() == maxSize()) {
removeFirst();
}
super.add(elem)
}
Conclusion
GapList offers excellent performance in any situation which makes it superior to ArrayList and LinkedList. This allows to use GapList as drop-in replacement for both ArrayList and LinkedList.
GapList, MaxList and the implementations working with primitive data types are part of the Brownies Collections library and can be downloaded from http://www.magicwerk.org/collections.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





Comments
Jörg Buchberger replied on Tue, 2012/03/20 - 6:19am
Hi Thomas
why did you choose not to use java.util.ArrayDeque respectively a fixed-size sub-class thereof?
What is it, that bothered you about java.util's various Queue and Deque implementations?
Cheers.
Jörg
Thomas Mauch replied on Tue, 2012/03/20 - 7:36am
in response to:
Jörg Buchberger
Hi Jörg,
ArrayDeque and the other Queue and Deque implemenations (except LinkedList) do not implement the List interface so you can not access the elements by index.
ArrayDeque also does not allow null elements to be added.
The advantage of GapList is to offer both List and Deque interface with an excellent performance.
Regards,
Thomas
Nishant Kumar replied on Wed, 2012/03/21 - 7:43am
in response to:
Thomas Mauch
And how about ArrayBlockingQueue which implements both Queue and Collection? It is a simple circular array based queue implementation. The event example that you can gave would probably also need some concurrency checks.
Please mention the size of the list that you used to perform these benchmarks. It will help understand the usecsaes in which GapList shines. For example should I even care about all these differences when my list size is 10 or100?
Thomas Mauch replied on Wed, 2012/03/21 - 10:51am
in response to:
Nishant Kumar
Both ArrayDeque and ArrayBlocking implement Collection, but only List does offer access to the elements by index.
The benchmarks have been run against a collection with 100'000 elements, where each operation has been executed 50'000 times. This may seem a lot, but do you know always exactly how your component will be used?
If you have never more than 100 elements you could choose any collection you like. But what if the user starts working with bigger data sets? If you have chosen ArrayList as collection, the user would start to notice that an operation at the end is ways faster than the same operation at the beginning of the list. Looking at the benchmark we can see that he would suddenly have to wait 6 seconds if he is doing the insertions at the beginning.
So the excellent performance offered by GapList in any case supports you in writing software which scales well. And even if you have only 100 elements: why should you choose an implementation which can become quite easily noticeable slow if you have an alternative?
Thomas
Lance Semmens replied on Fri, 2012/03/23 - 11:18am
Thanks for the interesting read.
I would like to see another metric which is the size in memory for each list implementation. I'd like to see what the memory costs of the "gaps" are after multiple random adds and deletes. You can easily get this by ObjectOutputStream.writeObject() to a ByteArrayOutputStream and inspecting the size of the byteArray.
Lance.
Thomas Mauch replied on Sat, 2012/03/24 - 6:08am
in response to:
Lance Semmens
Hi Lance
The memory usage of GapList is comparable to ArrayList. Both implementations use an array which grows if more space is needed to hold all elements. The "gaps" do not need more memory, but merely organize the available space differently.
If you look at the illustrations in the article, you can see that also ArrayList has something like a "gap, but as this gap is always at the end of the array, it sound strange to name it like this.
By the way, the method you mention for measuring memory costs will not work for GapList because the method writeObject() only writes the number of elements followed by all elements to the serialization stream - and so does ArrayList.
Thomas
Thomas Chen replied on Wed, 2012/03/28 - 11:44am
Very Good, Thomas.
Could you tell me what tools do you use to make a performance test
Thomas Mauch replied on Thu, 2012/03/29 - 4:44am
in response to:
Thomas Chen
It's a self written simple test framework.
You define how to initialize and conduct your test. The framework then runs the first for a some time so the JVM can make its optimization etc. Then the test is run for a specified time and the execution time is collected (excluding initialization). The time for a single test run is then calculcated by dividing the execution time by the number of test runs completed.
Thomas
Nice Robot replied on Tue, 2012/05/08 - 10:57am
This is called a Circular Array.
Here's an API for a similar implementation:
http://docs.oracle.com/html/E15725_01/com/tangosol/util/CircularArrayList.html
Thomas Mauch replied on Wed, 2012/05/09 - 2:20am
in response to:
Nice Robot
Thanks for pointing that out.
The Oracle implementation of CircularArrayList is of course not publicly available, but there are also others, e.g.org.objectweb.proactive.core.util.CircularArrayList.As GapList, it offers constant time insertion at head and tail of list, but it does not exploit the locality of reference provided by the "gapped" implementation of GapList. CircularArrayList also only implements the List interface, but not Dequeue.
I run my benchmark for CircularArrayList and as you could expect, it is as fast as GapList in the add operations shown in the second chart of the article, but as worse as ArrayList in the add operations of the third chart.
Mikael Grev replied on Thu, 2012/07/19 - 6:01pm
Hello Thomas,
There's a bug in addAll(Collection). It leaves the list with all nulls.
I hope for an update.
Cheers,
Mikael Grev
Thomas Mauch replied on Tue, 2012/07/24 - 6:04pm
Hi Mikael
Thank you for your feedback.
I have uploaded a new release 0.9.1 which contains a fix for the bug you discovered and some enhancements.
Could you use the Google group brownies-collections for further feedback?
I just discovered your message by accident as I receive no message if someone posts a new comment - is there a way to get a notification?
Enjoy,
Thomas