
Create Regular Expressions in XML With the Regexml Open Source Library

05.10.2010

Regular expressions are great at parsing portions of text out of a string or determining whether text matches a specific pattern. However, this power comes at a cost: regular expressions can be very complex to write, hard to document, and difficult to understand. The Regexml project provides a simple way to define and document complex regular expressions in XML. For example, this simple Regexml expression defines a U.S. ZIP (postal) code:

<regexml xmlns="http://schemas.regexml.org/expressions">
  <expression id="zipcode">
    <start/>
    <match equals="\d" min="5" capture="true"/> <!-- 5 digit zip code -->
    <group min="0">
      <match equals="-"/>
      <match equals="\d" min="4" capture="true"/> <!-- optional "plus 4" -->
    </group>
    <end/>
  </expression>
</regexml>

After consuming this XML, the Regexml library creates and caches a standard java.util.regex.Pattern object, which can be used to parse data out of text or to determine whether text matches the pattern. The capture attribute in the XML above marks the portions of the text that should be captured and made available to the client application. The equivalent regular expression looks like this:

^(\d{5})(?:-(\d{4}))?$
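To make the step from XML to a cached java.util.regex.Pattern more concrete, here is a small Java sketch. It is not the Regexml library's API or implementation, which this article does not show; the class and method names (RegexmlSketch, translate, wrap, load) are hypothetical. The attribute semantics are also an assumption inferred from the single example above: min="0" makes an element optional, any other min (except 1) becomes an exact repeat count, capture="true" wraps the element in a capturing group, and equals is passed through as a single regex atom. Under those assumptions the sketch turns the ZIP code definition into the equivalent expression shown above, caches the compiled Pattern by expression id, and reads the captured groups.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, NOT the Regexml library: translates a Regexml-style
// definition into a regex string using assumed attribute semantics.
public class RegexmlSketch {
    // Compiled patterns cached by expression id (mirrors the caching described above).
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    static final String ZIP_XML =
        "<regexml xmlns=\"http://schemas.regexml.org/expressions\">"
        + "<expression id=\"zipcode\"><start/>"
        + "<match equals=\"\\d\" min=\"5\" capture=\"true\"/>"
        + "<group min=\"0\"><match equals=\"-\"/>"
        + "<match equals=\"\\d\" min=\"4\" capture=\"true\"/></group>"
        + "<end/></expression></regexml>";

    // Recursively translates the element children of a node into regex syntax.
    static String translate(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() != Node.ELEMENT_NODE) continue; // skip text and comments
            Element e = (Element) children.item(i);
            switch (e.getTagName()) {
                case "start": sb.append('^'); break;
                case "end": sb.append('$'); break;
                case "match": sb.append(wrap(e.getAttribute("equals"), e)); break;
                case "group": sb.append(wrap("(?:" + translate(e) + ")", e)); break;
                default: throw new IllegalArgumentException("Unsupported element: " + e.getTagName());
            }
        }
        return sb.toString();
    }

    // Applies the assumed min/capture semantics; getAttribute returns "" when absent.
    static String wrap(String body, Element e) {
        String min = e.getAttribute("min");
        String quantified = body;
        if (min.equals("0")) quantified += "?";
        else if (!min.isEmpty() && !min.equals("1")) quantified += "{" + min + "}";
        return "true".equals(e.getAttribute("capture")) ? "(" + quantified + ")" : quantified;
    }

    // Parses the XML once per id, compiles the translated regex, and caches it.
    static Pattern load(String xml, String id) throws Exception {
        Pattern cached = CACHE.get(id);
        if (cached != null) return cached;
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList expressions = doc.getElementsByTagName("expression");
        for (int i = 0; i < expressions.getLength(); i++) {
            Element expr = (Element) expressions.item(i);
            if (id.equals(expr.getAttribute("id"))) {
                Pattern p = Pattern.compile(translate(expr));
                CACHE.put(id, p);
                return p;
            }
        }
        throw new IllegalArgumentException("No expression with id " + id);
    }

    public static void main(String[] args) throws Exception {
        Pattern zip = load(ZIP_XML, "zipcode");
        System.out.println(zip.pattern()); // prints ^(\d{5})(?:-(\d{4}))?$
        Matcher m = zip.matcher("95014-1234");
        if (m.matches()) {
            System.out.println(m.group(1) + " " + m.group(2)); // prints 95014 1234
        }
        System.out.println(zip.matcher("9501").matches()); // prints false
        System.out.println(zip == load(ZIP_XML, "zipcode")); // prints true (cached instance)
    }
}
```

Because each capture="true" element becomes a numbered capturing group, group(1) holds the five-digit ZIP code and group(2) the optional "plus 4" digits, which is null when the input has no extension.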

Though the traditional regular expression is far shorter, its brevity and cryptic symbols make it harder to read and understand. Of course, for simple expressions like this one, a traditional regular expression specified in the code may be most appropriate. However, as expressions become more complex, the ability to document them in-line, use whitespace to show hierarchy, and use expressive attributes rather than symbols can simplify maintenance and debugging. For more information about the open source Regexml project, see the overview and comprehensive introduction at:

 http://www.regexml.org/

Published at DZone with permission of its author, Dustin Callaway.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)