Performance Zone is brought to you in partnership with:

I am founder and Master Developer of Plumbr, memory leaking detection tool. I enjoy solving problems, raising self-awareness and professional pride of developers around me. In my out-of-office time I am bookworm and computer games addict. Nikita is a DZone MVB and is not an employee of DZone and has posted 69 posts at DZone. You can read more from them at their website. View Full User Profile

On Java Collection Waste

  • submit to reddit

This post comes from Kairi Kangro at the Plumbr blog.

This article is about overhead posed by one of the most popular frameworks used – I bet there is close to no applications where the java.util.Collections is not used.

The article is based on the fact that the framework provides default values for e.g. initial size of the collections. So we had a hypothesis that most people do not bother managing the sizes of their collections by themselves, and therefore end up with under-utilised collections wasting memory. If this were so, we could implement a solution that tells people where their half-empty collections are created and what they could do to avoid wasting memory.


To test our hypothesis, we decided to measure the usage of  thirteen most common java.util.Collection members:

  • java.util.HashMap
  • java.util.WeakHashMap
  • java.util.IdentityHashMap
  • java.util.Hashtable
  • java.util.LinkedHashMap
  • java.util.HashSet
  • java.util.LinkedHashSet
  • java.util.ArrayList
  • java.util.concurrent.ArrayBlockingQueue
  • java.util.Vector
  • java.util.ArrayDeque
  • java.util.PriorityQueue
  • java.util.concurrent.ConcurrentHashMap

For all objects deriving from those classes we measured once every 30 seconds the amount of objects contained in the collection and the number of empty spaces in the array(s) underlying the collection. The amount of memory consumed of such empty spaces is a waste of the collection. This data was collected from 556 different real-world applications using a special version of Plumbr.


To estimate of the extent of the problem, we calculated the total waste over all collections after each 30 seconds, and took the maximal and average value of these totals for each session. As also seen from the following diagram, most of the sessions (397 out of 556) had a maximal summary waste of less than a megabyte, and only 32 sessions had a maximal summary waste of over 10 MB, with a maximum of around 60 MB.

ArrayList HashMap overhead

The diagram should be read similar to the following example: there was 291 applications wasting memory more than 0.1MB but less than 1MB.


Comparing the outcome to the heap sizes allocated, we concluded that the problem is not large enough, at least if  waste is defined as unused heap. What also made us drop the research in this direction was fragmentation – this waste was often spread over 100′s of different collections, making the optimization harder and more error prone.

The particular research did not go into details whether CPU overhead on dynamically increasing the underlying arrays was a big enough problem, this will be a story for another time. For which I can only recommend to follow us in Twitter to be notified when we are ready to publish.

Published at DZone with permission of Nikita Salnikov-tarnovski, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Shanmuga Reddy replied on Thu, 2014/02/13 - 5:35am

when an elemets are added in a list of types int and strings .while printing i need to print only string any suggestions folks



Ramesh Peetla replied on Thu, 2014/02/13 - 5:36am

how to reverse a string with out using String Library??????


Ramesh Peetla replied on Thu, 2014/02/13 - 5:37am in response to: Shanmuga Reddy


Shanmuga Reddy replied on Thu, 2014/02/13 - 5:38am in response to: Ramesh Peetla


Johan Parent replied on Thu, 2014/02/20 - 11:24am

Great article! 

I agree that depending in the usage patterns the standard Collection implementation that comes with the JDK can be suboptimal. The Pareto4j  project is a open source project under the ASL v2.0   

It provides:

  •  a tool to discover places where the JDK Collection implementation is suboptimal in terms of memory. 
  • a Collections implementation with a low memory footprint
Profilers obviously cover memory usage to in their analysis. But they give a broad and detailed view of the memory usage. Furthermore profilers avoid instrumenting the JDK classes for performance reasons.  Therefore one can not easily get a clear view when it comes to the impact of the Collection classes on a program's performance. The pareto4j inspection tool aims solely at Collection classes. It helps to quickly detect and report patterns in the usage of the Collections at runtime. The inspection happens through JMX and dumps to text files.No code changes are required to use the tool. A small java agent instruments the code at load time. Note that this is a development tool! The tool brings some CPU and memory overhead depending on the application. The tool can easily be configured using system properties.  As a test a whole range of large application have been analyzed with it; Findbugs, IntelliJ IDEA CE (12 & 13), Eclipse and weblogice 12.1.1 to name a few. All exhibit to some extends suboptimal use of the Collections API in one place are another.The Pareto4j project is inspired by the next presentation which, in a nutshell, explains that the general purpose implementation of the JDK can be suboptimal:

More information can be found on the project homepage

Full disclosure: I'm the author of the tool. Please do not hesitate to contact me in case of questions, problems or suggestions.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.