Peter is a DZone MVB and is not an employee of DZone and has posted 161 posts at DZone. You can read more from them at their website. View Full User Profile

A collection with billions of entries

08.09.2011
| 4164 views |
  • submit to reddit

There are a number of problems with having a large number of records in memory. One way around this is to use direct memory, but this is too low level for most developers. Is there a way to make this more friendly?

Limitations of large numbers of objects

  • The overhead per object is between 12 and 16 bytes for 64-bit JVMs. If the object is relatively small, this is significant and could be more than the data itself.
  • The GC pause time increases with the number of objects. Pause times can be around one second per GB of objects.
  • Collections and arrays only support two billion elements

Huge collections

One way to store more data and still follow object orientated principles is have wrappers for direct ByteBuffers.  This can be tedious to write, but very efficient.

What would be ideal is to have these wrappers generated automatically.

Small JavaBean Example

This is an example of JavaBean which would have far more overhead than actual data contained.
interface MutableByte {
    public void setByte(byte b);

    public byte getByte();
}

It is also small enough that I can create billions of these on my machine. This example creates a List<Bytes> with 16 billion elements.

final long length = 16_000_000_000L;
HugeArrayList<MutableByte> hugeList = new HugeArrayBuilder<MutableByte>() {{
    allocationSize = 4 * 1024 * 1024;
    capacity = length;
}}.create();

List<MutableByte> list = hugeList;
assertEquals(0, list.size());

hugeList.setSize(length);

// add a GC to see what the GC times are like.
System.gc();

assertEquals(Integer.MAX_VALUE, list.size());
assertEquals(length, hugeList.longSize());

byte b = 0;
for (MutableByte mb : list)
    mb.setByte(b++);

b = 0;
for (MutableByte mb : list) {
    byte b2 = mb.getByte();
    byte expected = b++;
    if (b2 != expected)
        assertEquals(expected, b2);
}
From start to finish, the heap memory used is as follows. with -verbosegc
0 sec - 3100 KB used
[GC 9671K->1520K(370496K), 0.0020330 secs]
[Full GC 1520K->1407K(370496K), 0.0063500 secs]
10 sec - 3885 KB used
20 sec - 4428 KB used
30 sec - 4428 KB used
  ... deleted ...
1380 sec - 4475 KB used
1390 sec - 4476 KB used
1400 sec - 4476 KB used
1410 sec - 4476 KB used
The only GC is one triggered explicitly. Without the System.gc(); no GC logs appear.

After 20 sec, the increase in memory used is from logging how much memory was used.

Conclusion

The library is relatively slow. Each get or set takes about 40 ns which really adds up when there are so many calls to make. I plan to work on it so it is much faster. ;)

On the upside, it wouldn't be possible to create 16 billion objects with the memory I have, nor could it be put in an ArrayList, so having it a little slow is still better than not working at all.

 

From http://vanillajava.blogspot.com/2011/08/collection-with-billions-of-entries.html

Published at DZone with permission of Peter Lawrey, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Artur Biesiadowski replied on Wed, 2011/08/10 - 3:22am

Ok, so it is huge wrapper around flyweight pattern.

Can you tell me what is the business case for having >2 billion doubles in huge list in memory in continous array?

Jozeph Debuille replied on Wed, 2011/08/10 - 6:35am

MutableByte is not a javabean. JavaBeans are reusable software components that can be manipulated visually in a builder. Stop doing this. And getters and setters are bad design.

Syed M Shaaf replied on Wed, 2011/08/10 - 7:32am

So accesing the (16 billion -1) is with an optimistic running time?

dieter von holten replied on Wed, 2011/08/10 - 9:50am

interesting know-how. - but rather useless unless you plan to actually use it: in an app this thousands of threads talking to the outside world through lightning fast datagrams... without any GC-pause ever!

 that might give an app which loses money faster than the current stock-dive..

 

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.