Hacking on GraphHopper - a Java road routing engine. Peter has posted 62 posts at DZone. You can read more from them at their website. View Full User Profile

Fast O(n) Integer Sorting Algorithm

03.02.2010
| 20038 views |
  • submit to reddit

Yesterday I learned that there is an O(n) integer sort algorithm (I should have read this before in a basic algorithm book :-/).

Now I wondered: is this necessary in real applications? E.g. somewhere in Java? Today I have taken the counting sort and I can argue: yes, you should use integer sort especially for large arrays!

And when in detail should you apply the fast integer sort? Apply it if

  • you have positive integer values to sort. The requirement ‘positive’ and ‘integer’ is necessary for the listed O(n) algorithm, but not if you implement your own possible better solution.
  • you have a limited interval for the integer values (preferable min and max=M should be known before you sort)
    E.g. if you know the maximum integer number in your array will be M=10^7 then you should use the integer sort if the array length n is roughly greater than M/2500 = 40000. This linear equation should hold true (for some values ;-) ), because quick sort is nearly independent of M and the time-offset for integer sort increases nearly linear with M as you can see in the graph

Now take a look at the graph where y=time in seconds for 10 runs and x=array length:

Conclusion

I would apply this sorting algorithm only for n>10^7 where the difference between quicksort and integer sort could lay in the range of seconds. The memory consumption was not measured but should be ~twice times higher for the fast integer sort.

Java Sourcecode

//class LinearSort
public static void main(String[] args) {

// init jvm
new LinearSort().start(1000, 10000, 10000);
new LinearSort().start(1000, 10000, 10000);

// run performance comparison
for (int maxInteger = 1000; maxInteger < 100000000; maxInteger *= 3) {
for (int arrLength = 1000; arrLength < 100000000; arrLength *= 3) {
System.gc();
new LinearSort().start(arrLength, maxInteger, 10);
}
}
}

private Random rand = new Random();

// stop watch for integer sort with *unknown* range. marked as Lin in the plot
private SimpleTimer linearStopWatch = new SimpleTimer();</pre>

// stop watch for integer sort with known range. marked as Lin' in the plot
private SimpleTimer linearKnownStopWatch = new SimpleTimer();

private SimpleTimer qSortStopWatch = new SimpleTimer();

private void start(int arrLength, int maxInteger, int times) {
for (int count = 0; count < times; count++) {
int[] list1 = new int[arrLength];
for (int i = 0; i < arrLength; i++) {
// do only allow positive integers until the specified 'max'-value
list1[i] = Math.abs(rand.nextInt(maxInteger));
}
linearStopWatch.start();
LinearSort.sort(list1);
linearStopWatch.pause();

int[] list2 = Arrays.copyOf(list1, arrLength);
qSortStopWatch.start();
Arrays.sort(list2);
qSortStopWatch.pause();

list2 = Arrays.copyOf(list1, arrLength);
linearKnownStopWatch.start();
LinearSort.sort(list2, 0, maxInteger);
linearKnownStopWatch.pause();
}

System.out.println(maxInteger + ";" + arrLength + ";" + linearStopWatch
+ ";" + linearKnownStopWatch
+ ";" + qSortStopWatch); // + ";" + qSortListStopWatch);
}

static int[] sort(int[] array, int min, int max) {
//the range is useful to minmize the memory usage
//countIntegers holds the number of each integer
int[] countIntegers = new int[max - min + 1];

for (int i = 0; i < array.length; i++) {
countIntegers[array[i] - min]++;
}

int insertPosition = 0;
//fill array in sorted order
for (int i = min; i <= max; i++) {
for (int j = 0; j < countIntegers[i - min]; j++) {
array[insertPosition++] = i;
}
}
return array;
}

static int[] sort(int[] array) {
int min, max = min = array[0];
//determine the max and min in the array
for (int i = 1; i < array.length; i++) {
if (array[i] < min)
min = array[i];

if (array[i] > max)
max = array[i];
}
return sort(array, min, max);
}

//class SimpleTimer

private long lastStart = -1;
private long time;

public void start() {
if (lastStart != -1)
throw new IllegalStateException("Call stop before!");

lastStart = System.currentTimeMillis();
}

public void pause() {
if (lastStart < 0)
throw new IllegalStateException("Call start before!");

time = time + (System.currentTimeMillis() - lastStart);
lastStart = -1;
}

public String toString() {
return time / 1000f + "";
}

From http://karussell.wordpress.com

Published at DZone with permission of its author, Peter Karussell.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags:

Comments

Bob Lee replied on Tue, 2010/03/02 - 10:52am

It's called a histogram sort.

Stephane Vaucher replied on Tue, 2010/03/02 - 11:18am

These O(n) algorithms are generally in O(n + (or *) k) where k is another aspect that you are leveraging to accelerate the sort. The speed up is due to the fact that you are not comparing elements, but rather that you are treating them as ordinal. In counting sort, you are preallocating memory for every possible value. This means that the algorithm cannot scale as the memory requirement should be too large. Bucket-sort, on the other hand, allows O(kn) sorting where you apply k O(n) sorts, but requires less memory.

> Now I wondered: is this necessary in real applications? E.g. somewhere in Java?

If your application has lots of calculations requiring sorting (e.g. scientific applications), and if this is a bottleneck, then of course, it is useful. You might  even combine different sorting algorithms if you want to achieve peak performance. If, on the other hand, you only program DB-bound (fetch the data and present the data)  applications, then no it is not useful since the DB has optimised sorting algorithms (and a order lower constant value). Rule of thumb: use a standard comparison-based sorting algorithm if n is small and doesn't affect performance.

Michael Bien replied on Tue, 2010/03/02 - 1:50pm

Here is a Java - OpenCL demo using the GPU for sorting I added two days ago to the demo repository:

http://github.com/mbien/jocl-demos/tree/master/src/com/mbien/opencl/demos/radixsort/

it overtakes dual pivot quicksort (jdk7) at 32k elements and is already more than 13 times faster with 8Mil. elements. (Heap2GPU and GPU2Heap data transfer included - no cheating).

(bitonic sort is in the repro too)

If you have sorting as bottleneck - its somethimes worth to think out of the box ;)

Alon Aizenberg replied on Wed, 2010/03/03 - 2:38am

Interesting..

When you study computer science in the university you actually prove that there is no sorting algorithm that runs with less than nlog(n) time...  i don't remember the exact proof..

 I need to deepen in this to understand how this is possible. Is it because of the prerequsite that the array is positive integer array?

Peter Karussell replied on Wed, 2010/03/03 - 3:20am in response to: Alon Aizenberg

yes, but more generally you can say, it is because of the range the integer values has to be (max and min value). An analogous sort can be even applied to (uniformly distributed) decimal values within the range [0,1) -> bucket sort.

and: the sorting isn't based on comparison. A comparison based sort always requires Omega(n log n) in the worst case.

See the wikipedia entries of radix sort and counting sort for more information (or the book "introduction to algorithms" from thomas cormen)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.