Peter is a DZone MVB and is not an employee of DZone and has posted 161 posts at DZone.

Memory alignment in C, C++ and Java

You might assume that reducing the size of a struct or class saves the same amount of memory. However, because of memory alignment in memory allocators, it can make no difference, or perhaps more difference than you might expect. This is because the amount of memory reserved is usually a multiple of the memory alignment. For Java and C this can be 8 or 16 bytes.


Size of memory reserved

These tests were performed in a 64-bit C program (gcc 4.5.2) and a 64-bit JVM (Oracle Java 7). In Java, direct memory is largely a wrapper for malloc and free.

Bytes     | C malloc() reserved | Java ByteBuffer.allocateDirect()
0 to 24   | 32 bytes            | 32 bytes + a ByteBuffer
25 to 40  | 48 bytes            | 48 bytes + a ByteBuffer
41 to 56  | 64 bytes            | 64 bytes + a ByteBuffer
57 to 72  | 80 bytes            | 80 bytes + a ByteBuffer
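
The pattern in this table can be reproduced with simple rounding. The sketch below is only a model, not a measurement: it assumes an 8-byte bookkeeping header per block, 16-byte alignment and a 32-byte minimum block, values chosen because they match the figures above. The class and method names are made up for illustration.

```java
// Hypothetical model of the reserved sizes in the table above.
// Assumptions (not taken from the article): 8-byte header,
// 16-byte alignment, 32-byte minimum block.
class MallocModel {
    static final long HEADER = 8;
    static final long ALIGNMENT = 16;
    static final long MIN_BLOCK = 32;

    // Round n up to the next multiple of alignment.
    static long roundUp(long n, long alignment) {
        return (n + alignment - 1) / alignment * alignment;
    }

    // Estimated bytes reserved for a request of the given size.
    static long reservedSize(long requested) {
        return Math.max(roundUp(requested + HEADER, ALIGNMENT), MIN_BLOCK);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 24, 25, 40, 41, 56, 57, 72}) {
            System.out.println(n + " bytes requested -> " + reservedSize(n) + " bytes reserved");
        }
    }
}
```

Under these assumptions, asking for one byte more than 24, 40 or 56 bytes pushes the block into the next 16-byte step.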

Constructing objects is a similar story


Number of fields | C class of int | C class of void * | Java class with int | Java class with Object references
1                | 32/16 bytes    | 32/16 bytes       | 16 bytes            | 16 bytes
2                | 32/16 bytes    | 32/16 bytes       | 24 bytes            | 24 bytes
3                | 32/16 bytes    | 32/32 bytes       | 24 bytes            | 24 bytes
4                | 32/16 bytes    | 48/32 bytes       | 32 bytes            | 32 bytes
5                | 32/32 bytes    | 48/48 bytes       | 32 bytes            | 32 bytes
6                | 32/32 bytes    | 64/48 bytes       | 40 bytes            | 40 bytes
7                | 48/32 bytes    | 64/64 bytes       | 40 bytes            | 40 bytes
8                | 48/32 bytes    | 80/64 bytes       | 48 bytes            | 48 bytes
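
The two Java columns follow a simple rule that can be sketched in a few lines. This is an estimate, not a measurement: it assumes a 12-byte object header, 4 bytes per int or (32-bit) reference field, and rounding up to 8 bytes. Those assumptions reproduce the Java figures in the table; the class name is hypothetical.

```java
// Hypothetical estimate of the Java object sizes in the table above.
// Assumptions: 12-byte header, 4-byte fields, 8-byte alignment.
class ObjectSizeTable {
    static final int HEADER = 12;
    static final int FIELD = 4;
    static final int ALIGNMENT = 8;

    // Estimated size of an object with the given number of 4-byte fields.
    static int objectSize(int fieldCount) {
        int raw = HEADER + FIELD * fieldCount;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        for (int fields = 1; fields <= 8; fields++) {
            System.out.println(fields + " fields -> " + objectSize(fields) + " bytes");
        }
    }
}
```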

Using a C struct/class on the stack is more efficient than the other approaches for a number of reasons, two of them being that there is no memory management header and no additional pointer/reference (not shown in the table).

Sun/Oracle and OpenJDK 6 and 7 JVMs will use 32-bit references together with 8-byte memory alignment to support heaps of up to 32 GB (8 * 4 G). Most JVMs are less than 32 GB in size, making this a useful optimisation. Note: Part of the reason that JVMs are usually 1 to 4 GB in size is that the worst-case full GC time is typically 1 second per GB of heap, and a 30 second full GC time is too long for most applications. The typical way around this full GC time problem is to keep the working size of the heap to a few GB and use an external database or heapless "direct" and "memory mapped" memory.
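
The 32 GB figure is just the number of distinct 32-bit values multiplied by the alignment, since every object starts on an 8-byte boundary. A small sketch of that arithmetic (the class and method names are illustrative only):

```java
// Arithmetic behind the "8 * 4 G = 32 GB" limit: a 32-bit reference
// can name 2^32 aligned slots, and each slot is `alignment` bytes apart.
class ReferenceRange {
    // Bytes addressable with references of the given width and alignment.
    static long addressableBytes(int referenceBits, int alignment) {
        return (1L << referenceBits) * alignment;
    }

    public static void main(String[] args) {
        long bytes = addressableBytes(32, 8);
        System.out.println((bytes >> 30) + " GB addressable with 32-bit references and 8-byte alignment");
    }
}
```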

Another solution for Java is to use a pauseless concurrent collector such as that provided by Azul. They claim excellent scalability beyond 40 GB of heap, but don't openly list their costs ;)


Why does this matter?

Say you have a class like this


class MyClass {
    int num;
    short value;
}

In C, how much memory is saved by changing num to a short, or how much more is consumed if you make it a long long? The answer is likely to be none at all (unless you have an array of these). In Java, it could make a difference as the alignment size is different. Conversely, if the C class/struct is 16 or 17 bytes, its size on the stack can be 16 or 32 bytes. Similarly, being 24 or 25 bytes can make the malloc'ed size 32 or 48 bytes.
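
For the Java side of the MyClass question, a rough estimate is sketched below. It assumes a 12-byte object header, 8-byte alignment, and that the JVM packs the fields without internal gaps. These are assumptions for illustration, and actual layouts depend on the JVM and its settings.

```java
// Rough estimate of the Java size of MyClass for different types of `num`.
// Assumptions: 12-byte header, fields packed without gaps, 8-byte alignment.
class MyClassSizes {
    static final int HEADER = 12;
    static final int ALIGNMENT = 8;

    // Estimated object size given the size in bytes of each field.
    static int estimate(int... fieldSizes) {
        int raw = HEADER;
        for (int size : fieldSizes) {
            raw += size;
        }
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        System.out.println("int num, short value   -> " + estimate(4, 2) + " bytes");
        System.out.println("short num, short value -> " + estimate(2, 2) + " bytes");
        System.out.println("long num, short value  -> " + estimate(8, 2) + " bytes");
    }
}
```

Under these assumptions, shrinking num to a short drops the estimate from 24 to 16 bytes, while widening it to a long leaves it at 24.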

The code

MemoryAlignment/main.cpp and



Published at DZone with permission of Peter Lawrey, author and DZone MVB.



Michael Bien replied on Sat, 2011/09/17 - 6:24pm

If you have to dynamically allocate direct BBs in performance-critical code you can slice them from bigger buffers. This not only saves memory, it is also much faster (for small buffers of course).

The code is straightforward... here is a factory I wrote for JOCL (Java OpenCL binding) which supports static and dynamic use cases.

example usage:

Ashwin Jayaprakash replied on Mon, 2011/09/19 - 1:24am

I've done some more testing -

Peter Lawrey replied on Fri, 2011/12/09 - 4:58am in response to: Michael Bien

A good suggestion. I don't use slice() as much as I would like to as they are not very re-usable. But perhaps I should think again.
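
For readers unfamiliar with the slicing idea discussed in this thread, here is a minimal sketch using only the standard java.nio.ByteBuffer API. It is not the JOCL factory mentioned above; the SliceAllocator class is hypothetical, it never frees slices, and it is not thread-safe.

```java
import java.nio.ByteBuffer;

// Hypothetical bump allocator: one large direct buffer is allocated once,
// and small buffers are carved out of it with slice().
class SliceAllocator {
    private final ByteBuffer pool;

    SliceAllocator(int capacity) {
        pool = ByteBuffer.allocateDirect(capacity);
    }

    // Returns a view of the next `size` bytes of the pool.
    ByteBuffer allocate(int size) {
        if (size < 0 || size > pool.remaining()) {
            throw new IllegalArgumentException("cannot allocate " + size + " bytes");
        }
        int start = pool.position();
        pool.limit(start + size);        // window is [start, start + size)
        ByteBuffer slice = pool.slice(); // shares memory with the pool
        pool.position(start + size);     // move past the slice
        pool.limit(pool.capacity());     // restore the full window
        return slice;
    }

    int remaining() {
        return pool.remaining();
    }

    public static void main(String[] args) {
        SliceAllocator allocator = new SliceAllocator(64);
        ByteBuffer a = allocator.allocate(16);
        ByteBuffer b = allocator.allocate(8);
        System.out.println(a.capacity() + " + " + b.capacity()
                + " bytes sliced, " + allocator.remaining() + " left");
    }
}
```

Each slice is a view over the same native memory, so the per-allocation overhead of a separate direct buffer is paid only once for the pool.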

John David replied on Thu, 2012/01/26 - 3:12am

I would venture to guess that memcpy(), being such a basic function, is heavily optimized with inline assembler. Perhaps they found a performance tweak, but for some reason it is not being used for stack->stack copies. I would have to take a look at Apple's version of glibc memcpy() and compare it to the glibc you are using.

