Thursday Code Puzzler: Efficient Duplicate Detection
Thursday is code puzzler day here at DZone. The idea is simple: solve the coding problem as efficiently as you can, in any language or framework that you find suitable.
Note: Even though there really is nothing stopping you from finding a solution to this on the internet, try to keep honest, and come up with your own answer. It's all about the participation!
Do you have code puzzlers that you'd like to share with the DZone community? If so, please submit here.
Efficient Duplicate Detection
Give an array of 1001 elements, which includes the numbers 1 to 1000 inclusive in random order, plus one duplicate number, write the most efficient algorithm to detect the duplicate number. Make sure you only access each item in the array a maximum of once.
Catch up on all our previous puzzlers here.






Comments
Mark Priest replied on Wed, 2012/09/19 - 11:56pm
Daniel Seidler replied on Thu, 2012/09/20 - 3:23am
Java
public static int gaussDuplicate(int[] array, int n) { int sum = n * (n + 1) / 2; for (int i : array) sum = sum - i; return -sum;}Call:Mohammad Aq replied on Thu, 2012/09/20 - 3:45am
Daniel code might be a tad more efficent.
Here's mine in Java too:
static int getDuplicate(int[] array) {
final int arraySize = array.length; final int arraySizeMinusOne = arraySize - 1; final int sumMinusDuplicate = ((arraySizeMinusOne * arraySizeMinusOne) + arraySizeMinusOne) / 2; int sum = 0; for (int i = 0; i < arraySize; i++) { sum += array[i]; }Vijay Nathani replied on Thu, 2012/09/20 - 4:48am
Groovy code using open source library Guava
import com.google.common.collect.* def duplicateNumber(numbers) { def f = HashMultiset.create(numbers) Multisets.removeOccurrences(f, HashMultiset.create(1..1000)) f.first() } println duplicateNumber(new ArrayList(1..1000).plus(1,10)) //prints 10Vijay Nathani replied on Thu, 2012/09/20 - 4:51am
Daniel's algorithm in Groovy
def duplicateNumber(numbers) { numbers.sum() - (1000 * 1001 / 2) } println duplicateNumber(new ArrayList(1..1000).plus(1,10)) //prints 10Nils Rudolph replied on Thu, 2012/09/20 - 6:36am
The puzzle say to solve it in the most efficient way. Daniels solution is pretty fast but it always iterate over the whole array. So i tried a different solution:
public static int detectDuplicateNumber(int[] numbers, int n) { boolean[] numbersSeen = new boolean[n]; for (int i : numbers){ if (numbersSeen[i-1]) { return i; } numbersSeen[i-1] = true; } return -1;}
I did a little micro benchmarking. In comparision to Daniels Solution it is 3 times faster if duplicated numbers are at the front at the array but it is also 3 times slower if they are at the end.
With randomized arrays Daniels Solution performs 2/3 of the tries faster.
I tried to combine both algorithms, but Daniels algorithm was always faster in the average case.
Ewald Horn replied on Thu, 2012/09/20 - 9:06am
in response to:
Nils Rudolph
Ah.
I had this as an interview question once, after solving it, the interviewer asked me if I could improve it - I couldn't.
Here is what he then showed me:
int numbers[] = {1,2,3,4,5,6,7,8,2,10,11,12,13,14,9}; for (int pos = 1; pos < numbers.length; pos++) { numbers[pos] = numbers[pos] ^ numbers[pos-1] ^ pos; }System.out.println("Duplicate is : " + numbers[numbers.length-1]);Good old XOR can be used if it's just numbers, the final result will be in the last position of the array. If you do the bitwise math, it starts making sense, but that's a LOT of 11111111 and 00000000 math!
Daniel Seidler replied on Thu, 2012/09/20 - 8:43am
in response to:
Nils Rudolph
Vijay Nathani replied on Thu, 2012/09/20 - 9:27am
in response to:
Ewald Horn
Ewald Horn replied on Thu, 2012/09/20 - 9:35am
in response to:
Vijay Nathani
Vijay Nathani replied on Thu, 2012/09/20 - 9:40am
in response to:
Ewald Horn
The algorithm seems to work only if the duplicate is 1. e.g. See code below.
def duplicate(numbers) { for (int pos = 1; pos < numbers.length; pos++) numbers[pos] = numbers[pos] ^ numbers[pos-1] ^ pos; numbers[numbers.length - 1] } println duplicate([1,2,1,3].toArray()) //prints 1. Right println duplicate([-1,2,-1,3].toArray()) //prints 1. Wrong. println duplicate([1000,2,1000,3].toArray()) //prints 1. Wrong. println duplicate([500,2,500,3].toArray()) //prints 1. Wrong.println duplicate([250,2,250,3].toArray()) //prints 1. Wrong.
Ewald Horn replied on Thu, 2012/09/20 - 10:13am
in response to:
Vijay Nathani
I just checked again:
int numbers[] = {4,2,3,4,5,6,7,8,1,10,11,12,13,14,9}; for (int pos = 1; pos < numbers.length; pos++) { numbers[pos] = numbers[pos] ^ numbers[pos-1] ^ pos; } System.out.println("Duplicate is : " + numbers[numbers.length-1]);It works fine in Java 1.6.0_33 - could it be a language difference?
Balazs Jakab replied on Thu, 2012/09/20 - 10:14am
I prefer Ewald's solution, but of course if will only work if the numbers satisfies the preconditions: "numbers 1 to 1000 inclusive in random order, plus one duplicate number". So if You add negative numbers or skip any of them from the list, it won't :)
Though I guess it could be a bit improved, since we could calculate the XOR-ed value of number 1-1000 (which is 1000 btw) in advance and store it as a constant in the code. Then we just need to XOR all the values in the array, at the end XOR the precalculated constant and voila, the result is the duplicated number.
Ewald Horn replied on Thu, 2012/09/20 - 10:19am
in response to:
Balazs Jakab
Hi Balazs, thank you, I can't take credit for the solution though - it's just something I learned about along the way. Is there no way to do the same for negative numbers as well?
Balazs Jakab replied on Thu, 2012/09/20 - 10:44am
in response to:
Ewald Horn
Hi!
I guess this algorithm works for negative numbers as well. You just need a continuous interval of numbers, for example -499 to 500 (The precalculated XOR-ed value for this interval is -8)
The trick is that if You XOR a number with itself the result is 0. Also if You XOR something with 0 the result is not changed. So in Your original solution all the numbers are XOR-ed twice (once from the array of numbers and once as the index), except of course the duplicated number which appear three times. Since pairs "kills" each other the full XOR-ed result is the duplicated number.
Ewald Horn replied on Thu, 2012/09/20 - 10:50am
in response to:
Balazs Jakab
Jagannathan Asokan replied on Fri, 2012/09/21 - 5:27am
simon harrer replied on Fri, 2012/09/21 - 5:58am
Vijay Nathani replied on Fri, 2012/09/21 - 6:20am
in response to:
Jagannathan Asokan
A set will be more efficient e.g. Groovy code
def duplicateNumber(numbers) { def f = new HashSet() numbers.find { !f.add(it) } }println duplicateNumber(new ArrayList(1..1000).plus(1,10)) //prints 10
Eric Jablow replied on Fri, 2012/09/21 - 2:57pm
In Java:
int duplicatedInteger(int[]array) { Set<Integer> set = new HashSet<Integer>() ; for( int i: array) { if (!set.add(i)) { return i ;} } return -1; //Will neer be executed. }Steve Ck replied on Sun, 2012/09/23 - 11:46pm
in response to:
Vijay Nathani
Vijay Nathani replied on Mon, 2012/09/24 - 12:22am
in response to:
Steve Ck
But the problem says that numbers can be in random order e.g. [ 5, 1, 3, 2, 10, 3]
Balazs Jakab replied on Mon, 2012/09/24 - 2:36am
in response to:
Vijay Nathani
Yes, they can be in random order, but all the numbers for a continuous range should be included. So if You're using numbers from 1 to 10, in Your list You should have 11 numbers: the 10 numbers 1, 2, ... 10 and one duplicate in any order. Your example misses numbers 4, 6, 7, 8, 9, so the algorithm with XORs wont work.
Vijay Nathani replied on Mon, 2012/09/24 - 3:19am
in response to:
Balazs Jakab
Balazs Jakab replied on Mon, 2012/09/24 - 6:10am
in response to:
Vijay Nathani