I've been a zone leader with DZone since 2008, and I'm crazy about community. Every day I get to work with the best that JavaScript, HTML5, Android and iOS has to offer, creating apps that truly make at difference, as principal front-end architect at Avego. James is a DZone Zone Leader and has posted 639 posts at DZone. You can read more from them at their website. View Full User Profile

Thursday Code Puzzler: Efficient Duplicate Detection

09.19.2012
| 5020 views |
  • submit to reddit

Thursday is code puzzler day here at DZone. The idea is simple: solve the coding problem as efficiently as you can, in any language or framework that you find suitable.

Note: Even though there really is nothing stopping you from finding a solution to this on the internet, try to keep honest, and come up with your own answer.  It's all about the participation!

Do you have code puzzlers that you'd like to share with the DZone community?  If so, please submit here. 

Efficient Duplicate Detection

Give an array of 1001 elements, which includes the numbers 1 to 1000 inclusive in random order, plus one duplicate number, write the most efficient algorithm to detect the duplicate number. Make sure you only access each item in the array a maximum of once. 

Catch up on all our previous puzzlers here.

Comments

Mark Priest replied on Wed, 2012/09/19 - 11:56pm

What should the output of the algorithm be?  The duplicate number and its second position in the array?

Daniel Seidler replied on Thu, 2012/09/20 - 3:23am

Java 

public static int gaussDuplicate(int[] array, int n) {
   int sum = n * (n + 1) / 2;
   for (int i : array) sum = sum - i;
   return -sum;} 
Call:
gaussDuplicate(randomarray, 1000) 
 

Mohammad Aq replied on Thu, 2012/09/20 - 3:45am

Daniel code might be a tad more efficent.

 

 Here's mine in Java too:

 static int getDuplicate(int[] array) {

		final int arraySize = array.length;
		final int arraySizeMinusOne = arraySize - 1;
		final int sumMinusDuplicate = ((arraySizeMinusOne * arraySizeMinusOne) + arraySizeMinusOne) / 2;
		int sum = 0;

		for (int i = 0; i < arraySize; i++) {
			sum += array[i];
		}

Vijay Nathani replied on Thu, 2012/09/20 - 4:48am

Groovy code using open source library Guava

import com.google.common.collect.*
def duplicateNumber(numbers) {
	def f = HashMultiset.create(numbers)
	Multisets.removeOccurrences(f, HashMultiset.create(1..1000))
	f.first()
}
println duplicateNumber(new ArrayList(1..1000).plus(1,10)) //prints 10

Vijay Nathani replied on Thu, 2012/09/20 - 4:51am

Daniel's algorithm in Groovy

def duplicateNumber(numbers) {
	numbers.sum() - (1000 * 1001 / 2)
}
println duplicateNumber(new ArrayList(1..1000).plus(1,10)) //prints 10

Nils Rudolph replied on Thu, 2012/09/20 - 6:36am

The puzzle say to solve it in the most efficient way. Daniels solution is pretty fast but it always iterate over the whole array. So i tried a different solution:

 

public static int detectDuplicateNumber(int[] numbers, int n) {
	boolean[] numbersSeen = new boolean[n];
	
	for (int i : numbers){
	   if (numbersSeen[i-1]) {
		   return i;
	   }
	   numbersSeen[i-1] = true;
   }
   return -1; 

 I did a little micro benchmarking. In comparision to Daniels Solution it is 3 times faster if duplicated numbers are at the front at the array but it is also 3 times slower if they are at the end.

With randomized arrays Daniels Solution performs 2/3 of the tries faster.

 I tried to combine both algorithms, but Daniels algorithm was always faster in the average case. 

 

Ewald Horn replied on Thu, 2012/09/20 - 9:06am in response to: Nils Rudolph

Ah.
I had this as an interview question once, after solving it, the interviewer asked me if I could improve it - I couldn't.
Here is what he then showed me:

int numbers[] = {1,2,3,4,5,6,7,8,2,10,11,12,13,14,9}; 

for (int pos = 1; pos < numbers.length; pos++) 
{ 
  numbers[pos] = numbers[pos] ^ numbers[pos-1] ^ pos; 
} 
System.out.println("Duplicate is : " + numbers[numbers.length-1]);  

Good old XOR can be used if it's just numbers, the final result will be in the last position of the array. If you do the bitwise math, it starts making sense, but that's a LOT of 11111111 and 00000000 math!

Daniel Seidler replied on Thu, 2012/09/20 - 8:43am in response to: Nils Rudolph

You can otimize your algorithm by using byte[] instead of boolean[]. 

Vijay Nathani replied on Thu, 2012/09/20 - 9:27am in response to: Ewald Horn

This algorithm will work here. The algorithm fails for negative numbers.

Ewald Horn replied on Thu, 2012/09/20 - 9:35am in response to: Vijay Nathani

True - and it won't work for floating point numbers either! The spec was "the numbers 1 to 1000 inclusive in random order, plus one duplicate number" so I'm happy with it, but I do admit that it's very far from perfect. Do you know more about bitwise operations - my knowledge is very basic I'm afraid and I'm curious to know if one can do something similar for negative numbers?

Vijay Nathani replied on Thu, 2012/09/20 - 9:40am in response to: Ewald Horn

The algorithm seems to work only if the duplicate is 1. e.g. See code below.

def duplicate(numbers) {
	for (int pos = 1; pos < numbers.length; pos++) 
		numbers[pos] = numbers[pos] ^ numbers[pos-1] ^ pos; 
	numbers[numbers.length - 1]
}
println duplicate([1,2,1,3].toArray())   //prints 1. Right
println duplicate([-1,2,-1,3].toArray()) //prints 1. Wrong.
println duplicate([1000,2,1000,3].toArray()) //prints 1. Wrong.
println duplicate([500,2,500,3].toArray()) //prints 1. Wrong.

println duplicate([250,2,250,3].toArray()) //prints 1. Wrong. 

Ewald Horn replied on Thu, 2012/09/20 - 10:13am in response to: Vijay Nathani

I just checked again:

 

    int numbers[] = {4,2,3,4,5,6,7,8,1,10,11,12,13,14,9};


    for (int pos = 1; pos < numbers.length; pos++)
    {
      numbers[pos] = numbers[pos] ^ numbers[pos-1] ^ pos;
    }

    System.out.println("Duplicate is : " + numbers[numbers.length-1]);

  

It works fine in Java 1.6.0_33 - could it be a language difference?

 

Balazs Jakab replied on Thu, 2012/09/20 - 10:14am

I prefer Ewald's solution, but of course if will only work if the numbers satisfies the preconditions: "numbers 1 to 1000 inclusive in random order, plus one duplicate number". So if You add negative numbers or skip any of them from the list, it won't :)
Though I guess it could be a bit improved, since we could calculate the XOR-ed value of number 1-1000 (which is 1000 btw) in advance and store it as a constant in the code. Then we just need to XOR all the values in the array, at the end XOR the precalculated constant and voila, the result is the duplicated number.

Ewald Horn replied on Thu, 2012/09/20 - 10:19am in response to: Balazs Jakab

Hi Balazs, thank you, I can't take credit for the solution though - it's just something I learned about along the way. Is there no way to do the same for negative numbers as well? 

Balazs Jakab replied on Thu, 2012/09/20 - 10:44am in response to: Ewald Horn

Hi!

 

I guess this algorithm works for negative numbers as well. You just need a continuous interval of numbers, for example -499 to 500 (The precalculated XOR-ed value for this interval is -8)
The trick is that if You XOR a number with itself the result is 0. Also if You XOR something with 0 the result is not changed. So in Your original solution all the numbers are XOR-ed twice (once from the array of numbers and once as the index), except of course the duplicated number which appear three times. Since pairs "kills" each other the full XOR-ed result is the duplicated number.

Ewald Horn replied on Thu, 2012/09/20 - 10:50am in response to: Balazs Jakab

Fascinating! I'll have to do some more bitwise exercises, it's amazing what one can do with the math. Thank you for sharing.

Jagannathan Asokan replied on Fri, 2012/09/21 - 5:27am

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;
public class EfficientDuplicateDetection {
public static void main(String[] args) {
ArrayList randomList = new ArrayList();
Random randomGen = new Random();
for(int i=0;i<1001;i++) {
randomList.add(new Integer(randomGen.nextInt(1000)));
}
HashMap duplicateIdentifier = new HashMap();
int countDuplicate =0;
for(int i=0;i<1001;i++) {
Integer listVal = (Integer)randomList.get(i);
if(duplicateIdentifier.containsKey(listVal)) {
System.out.println("Duplicate Found " + listVal);
countDuplicate++;
} else {
duplicateIdentifier.put(listVal, listVal);
//System.out.println(listVal);
}
}
System.out.println("Total Number of Duplicates  " + countDuplicate);
}
}

simon harrer replied on Fri, 2012/09/21 - 5:58am

 

def detect_duplicate(array)
  array.inject(:+) - 500500
end 

# Usage
my_array = (1..1000)
my_array << 6 
assert_equal 6, detect_duplicate(my_array)

 

Vijay Nathani replied on Fri, 2012/09/21 - 6:20am in response to: Jagannathan Asokan

A set will be more efficient e.g. Groovy code

def duplicateNumber(numbers) {
	def f = new HashSet()
	numbers.find { !f.add(it) }
}

println duplicateNumber(new ArrayList(1..1000).plus(1,10)) //prints 10 

Eric Jablow replied on Fri, 2012/09/21 - 2:57pm

In Java:

 

int duplicatedInteger(int[]array) { 

  Set<Integer> set = new HashSet<Integer>() ; 

  for( int i: array) {

    if (!set.add(i)) { return i ;}

  }

  return -1; //Will neer be executed. 

} 

 

Steve Ck replied on Sun, 2012/09/23 - 11:46pm in response to: Vijay Nathani

Vijay, I think your code is correct, but your tests on line 7-9 are incorrect.  A key part of the problem is that the range is continuous. When that is the case there are always pairs of values and positions that will XOR to zero for all but the duplicate.

Vijay Nathani replied on Mon, 2012/09/24 - 12:22am in response to: Steve Ck

But the problem says that numbers can be in random order e.g. [ 5, 1, 3, 2, 10, 3] 

Balazs Jakab replied on Mon, 2012/09/24 - 2:36am in response to: Vijay Nathani

Hi!
Yes, they can be in random order, but all the numbers for a continuous range should be included. So if You're using numbers from 1 to 10, in Your list You should have 11 numbers: the 10 numbers 1, 2, ... 10 and one duplicate in any order. Your example misses numbers 4, 6, 7, 8, 9, so the algorithm with XORs wont work.

Vijay Nathani replied on Mon, 2012/09/24 - 3:19am in response to: Balazs Jakab

Yes, you are right. The algorthim will work for all numbers except negative numbers. Thanks.

Balazs Jakab replied on Mon, 2012/09/24 - 6:10am in response to: Vijay Nathani

Well, actually - as I've wrote in a previous comment - it even works for negative numbers. You just have to be sure that indexes goes through the same range as randomly ordered numbers. So at the end each number appears twice: once from the list of randomly ordered numbers and once from the index. Except of course the duplicated number which appears three times and therefore remains at the end.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.