I've been a zone leader with DZone since 2008, and I'm crazy about community. Every day I get to work with the best that JavaScript, HTML5, Android and iOS has to offer, creating apps that truly make at difference, as principal front-end architect at Avego. James is a DZone Zone Leader and has posted 639 posts at DZone. You can read more from them at their website. View Full User Profile

Thursday Code Puzzler: Needles In the Haystack

05.16.2012
| 3600 views |
  • submit to reddit

By now you should be familiar with our Thursday Code Puzzler slot. The idea is simple: solve the coding problem as efficiently as you can, in any language or framework that you think is suitable.

Note: Even though there really is nothing stopping you from finding a solution to this on the internet, try to keep honest, and come up with your own answer.  It's all about the participation!

Find The Needles in the Haystack

This week it's another classic string based question - to find the number of occurances of a given String (needle) in another (haystack). So the method signature would look like:

public int findNeedleOccurances(String needle, String haystack)

Try to make the solution as efficient as possible, and if you do get to run it post up your timings, or even better, the Big O notation for your solution.

 

Comments

Chirag Visavadia replied on Thu, 2012/05/17 - 6:03am

public static final int findNeedleOccurances(String needle, String haystack) {

	        final int needleLength = needle.length() - 1;
		final int haystackLength = haystack.length() - 1;
		int needleCount = 0;

		/**
		 * Iterate through entire haystack string
		 */
		for (int i = 0; i <= haystackLength; i++) {

			/**
			 * If not enough char to compare then break : reduce unnecessary
			 * looping
			 */
			if (haystackLength - i < needleLength)
				break;

			/**
			 * Iterate through entire needle string
			 */
			for (int j = 0; j <= needleLength; j++) {

				/**
				 * If first char and last char not match then break : reduce
				 * unnecessary looping
				 */
				if (haystack.charAt(i + j) != needle.charAt(j) || haystack.charAt(i + needleLength) != needle.charAt(needleLength))
					break;

				/**
				 * Avoid to compare last char as we already compare in above
				 * condition
				 */
				if (needleLength - 1 == j) {
					needleCount++;
					break;
				}
			}
		}

		return needleCount;
	}

Ian Mcintosh replied on Thu, 2012/05/17 - 8:59am

def findNeedles(needle: String, haystack: String) = {

  haystack.sliding(needle.size).filter(_==needle).size

}

Mike P(Okidoky) replied on Thu, 2012/05/17 - 12:05pm

(I see that using Chromium on Linux I now have access to a wysiwyg editor. Although it has bugs, I could work around it.)

 For this sort of thing, I prefer operating on arrays directly, because it produces shorter easier faster performing code. Coming from an assmebler background way back when, I know that --a >= 0 is faster than ++a < end, so I made everything go backwards.

I had some other ideas on this that I didn't pursue. Some sort of hashing technique might be possible. If the search is performed repeatedly and the haystack stays the same, it should be possible to create a series of hash codes for each offset, at either all lenghths, or a set of lenghts. Then on a search the hash would be calculated on the needle and then all the candidates could be looked up more quickly, after which point you'd only be comparing char regions on those candidates. Or (and/or) perhaps it might be possible to break things down into some sort of tree/hierarchy.

A repeated search isn't a part of the requirement, so I'm leaving it with a brute force method with a focus on performance. The code would translate to assembler quite nicely as well, if performance is a requirement. In that case I would not want to compare 16 bit characters one by one, but using MMX instructions or larger 64 bit chunks, padded with the left overs.

Another idea just popped up. I could represent the strings differently. Instead of comparing character by character, I could take bit 0 (LSB) from 64 characters and put it in a long. The next long would represent bit 1 of those 64 characters. This would be "clipped" to the limit of what characters are available. Using an "and" instruction I could make it compare up to 64 characters with one significant bit. The chances that all bits 0 of all characters to compare match, almost always only happens when there is a match, probably. So two "ands" and one compare of two longs would probably eliminate an unmatched string. If matched, it proceeds with either a conventional string compare character by character, or do more of these bit-ized comparisons. The code below doesn't do any of this, but I'd like to share my thoughts about this, because I might not get around implementing this.

 

public static int findNeedleOccurances(String needle, String haystack)
{
	char[] n = needle.toCharArray(), h = haystack.toCharArray();
	int nl = needle.length(), count = 0;
	char ne = n[--nl];
	
outer:
	for (int hi = haystack.length(); hi > 0; )
	{
		if (h[--hi] != ne)
			continue;
		for (int hc = hi, nc = nl; nc > 0; )
		{
			if (h[--hc] != n[--nc])
				continue outer;
		}
		count++;
	}

	return count;
}

 

 

Vijay Nathani replied on Thu, 2012/05/17 - 11:07am

Groovy is given below

def findNeedleOccurances(String needle, String haystack) {
    return haystack.count(needle)
}
println findNeedleOccurances("a","baca")

 Basically, it is a builtin function in string class itself. So no code is needed.

Mike P(Okidoky) replied on Thu, 2012/05/17 - 11:42am in response to: Vijay Nathani

How does the Groovy library based implementation do the calculation? Is it possible to copy-paste that code here?

Vijay Nathani replied on Thu, 2012/05/17 - 10:40pm in response to: Mike P(Okidoky)

Groovy 1.8.4 in the file "src\main\org\codehaus\groovy\runtime\DefaultGroovyMethods.java" codes this function as:

public static int count(String self, String text) {
    int answer = 0;
    for (int idx = 0; true; idx++) {
        idx = self.indexOf(text, idx);
        if (idx >= 0) {
            ++answer;
        } else {
            break;
        }
    }
    return answer;
}

Mike P(Okidoky) replied on Sun, 2012/05/20 - 12:46am in response to: Vijay Nathani

That Groovy code has a bug, because haystack="abbabba", needle="abba", results in 1, while in fact it should result in 2. "abba" appears in "abbabba" twice, not once.

 

Vijay Nathani replied on Sun, 2012/05/20 - 3:31pm in response to: Mike P(Okidoky)

The groovy code does not have the bug. I ran it. The following code prints 2, not 1.

#!/usr/bin/env groovy

def findNeedleOccurances(String needle, String haystack) {
	return haystack.count(needle)
}
println findNeedleOccurances("abba","abbabba")

 

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.