A Regular Expression HashMap Implementation in Java
Below is an implementation of a Regular Expression HashMap. It works with key-value pairs which the key is a regular expression. It compiles the key (regular expression) while adding (i.e. putting), so there is no compile time while getting. Once getting an element, you don't give regular expression; you give any possible value of a regular expression.
As a result, this behaviour provides to map numerous values of a regular expression into the same value. The class does not depend to any external libraries, uses only default java.util. So, it will be used simply when a behaviour like that is required.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.regex.Pattern;
/**
* This class is an extended version of Java HashMap
* and includes pattern-value lists which are used to
* evaluate regular expression values. If given item
* is a regular expression, it is saved in regexp lists.
* If requested item matches with a regular expression,
* its value is get from regexp lists.
*
* @author cb
*
* @param <K> : Key of the map item.
* @param <V> : Value of the map item.
*/
public class RegExHashMap<K, V> extends HashMap<K, V> {
// list of regular expression patterns
private ArrayList<Pattern> regExPatterns = new ArrayList<Pattern>();
// list of regular expression values which match patterns
private ArrayList<V> regExValues = new ArrayList<V>();
/**
* Compile regular expression and add it to the regexp list as key.
*/
@Override
public V put(K key, V value) {
regExPatterns.add(Pattern.compile(key.toString()));
regExValues.add(value);
return value;
}
/**
* If requested value matches with a regular expression,
* returns it from regexp lists.
*/
@Override
public V get(Object key) {
CharSequence cs = new String(key.toString());
for (int i = 0; i < regExPatterns.size(); i++) {
if (regExPatterns.get(i).matcher(cs).matches()) {
return regExValues.get(i);
}
}
return super.get(key);
}
}
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)






Comments
Kaan yy replied on Wed, 2012/04/11 - 1:16am
Since your are extending hashmap; list of regExPatterns and regExValues are not needed. why don't you call superior put with the compiled key and value. this would also simplify the matching get.
in fact this is a must. currently you are discarding map interface with this implemantation. what if i call remove method...
Oliver Plohmann replied on Wed, 2012/04/11 - 2:30am
Hi Cagdas,
nice idea! What do you exactly mean with compiling a pattern? The get method iterates over all patterns till a matching one is found. So the get is not really logarithmically in time. Also made me think how to improve this, but this seems not easy ...
Lance Semmens replied on Wed, 2012/04/11 - 2:38am
This is a terrible implementation of a map.
1. You call it RegExHashMap but there is no hashing involved. Using the word hash should mean it has performance of O(1) but your's has performance of O(N).
2. Your put method does not call super.put() so all of the other methods on map (remove, size, putAll, values, entrySet etc) don't work. You would be better to implement java.util.Map and throw UnsupportedOperationException() for all other methods than what you are doing here.
3. Why not RegExHashMap<V> extends HashMap<CharSequence, V>?
4. new String(key.toString()). Please explain what new String achieves (string is immutable)
5. Your two lists are not needed if extending HashMap
6. Your implementation allows multiple entries with the same key
7. Your implementation throws NPE for null keys
I could go on and on...
Lance Semmens replied on Wed, 2012/04/11 - 2:59am
in response to:
Oliver Plohmann
Fabrizio Giudici replied on Wed, 2012/04/11 - 5:23am
Given that we're commenting regex maps, I've been using another home-made since a few time. It works for my specific projects that needs it, but I never validated it against the Map contract. I'd like to get some feedback just while you're still hot about the topic ;-). It is not optimized with pattern compilation, but it does super.put() and has a shortcut for gets whose arg is not a regexp (which in my use cases happens 80% of the time). My impression is that designing an implementation that is optimized for a lot of generic cases *and* it complies with Map contract is a hard task.
public class RegexTreeMap<Type> extends TreeMap<String, Type> { @Nonnull public static String escape (final @Nonnull String string) { final StringBuilder builder = new StringBuilder(); for (int i = 0; i < string.length(); i++) { final char c = string.charAt(i); if ("[\\^$.|?*+()".contains("" + c)) { builder.append('\\'); } builder.append(c); } return builder.toString(); } @Override public Type put (final @Nonnull String string, final Type value) { return super.put(Pattern.quote(string), value); } public Type putRegex (final @Nonnull String regex, final Type value) { return super.put(regex, value); } @Override public Type get (final @Nonnull Object value) { final String stringValue = (String)value; Type result = super.get(Pattern.quote(stringValue)); // first try a direct match that is fast int matchLength = 0; if (result == null) // otherwise returns the longest match { for (final Entry<String, Type> entry : super.entrySet()) { final String regex = entry.getKey(); if (stringValue.matches(regex) && (regex.length() > matchLength)) { result = entry.getValue(); matchLength = regex.length(); } } } return result; } }Liam Knox replied on Wed, 2012/04/11 - 8:43am
1. Why do this?
2. You shoud delegate and never inhertit ( not flexible )
Also what are you trying to do?
Map is a great 'private' classs. As a public return is a whore. Never expose it
Andrew Gilmartin replied on Wed, 2012/04/11 - 3:34pm
Lance Semmens replied on Thu, 2012/04/12 - 4:41am
in response to:
Fabrizio Giudici
Fabrizio Giudici: Your escape method is not needed. Java regexes treat anything wrapped in \Q and \E as a literal.
Eg "\\Q+-:(?*\\E".matches("+-:(?*") == true
Lance Semmens replied on Thu, 2012/04/12 - 5:06am
in response to:
Andrew Gilmartin
Andrew Gilmartin: I think that combining all of the regexes into a single regex (or'red together) will just end up with the same O(N) performance but in a single regex rather than looping muiltiple regexes.
It's pretty hard to get rid of the O(N) performance for this very general RegexMap. To improve performance, I think you would need to think about your incoming data and then try to group your regexes in buckets. You could then use a hash lookup for the bucket.
For example, you might use the first three characters to lookup the bucket.
Another example is for a URL, you might choose to split the input URL's on "/" and have a tree hierarchy of regexes that only apply to a single level at a time (instead of multiple regexes that work on the entire URL). I'm sure that SpringMVC has a solution to this, I wonder how they have done it?
Reto Bachmann-Gmür replied on Mon, 2012/12/17 - 5:12pm
The basic idea is that the first regex is converted to a DFA and with every new key that is added the DFA is modified to accept the strings that are matched by the new regex as well. The exit states are mapped to the values.
The code is here: https://github.com/retobg/regexmap. It currently only supports a very limited subset of the regex syntax and it certainly has bugs.