As principal partner of DataCurl LLC, Dan Wilson runs both the consulting practice and, a way to help employees start and stick with healthier lifestyles. Before launching DataCurl, Dan held numerous senior program and development positions in such industries as Technical Consulting, Health Care, Online Publishing and Government Contracting. Dan is an avid participant in technology communities; an Adobe Community Professional, manager of the Triangle ColdFusion User Group in Research Triangle Park, North Carolina, Managing Director of the popular Model-Glue framework and contributor to numerous open source projects based on ColdFusion, Flex and AIR platforms. Dan presents on ColdFusion, Flex and Rapid Development Techniques at popular conferences around the world. You can find his thoughts on ColdFusion, Flex, AIR and other technology matters at and some occasional ramblings on food at

So You Wanna Learn Regex? - Part 6

Welcome to So You Wanna Learn Regex? Part 6. OK, I know I said part 5 would be the last part in the series, but I just had to work this one out and wanted to share. Remember, if you want more tutorials about regex, especially more advanced ones than the Mickey Mouse ones here, go bug Ben. He knows more about this than I ever will and I hear he has a blog...

In our last exercise, we looked at cleaning up some data scripts.

In this exercise, we are going to reformat a configuration file from .ini style to ColdSpring MapFactory style. Specifically, I'm integrating CFFormProtect into an application and I want the config to be managed in ColdSpring with the rest of my configurations. Sure, I could go flapping around with copy+paste, smashing keys, burning tendons, but that seems so Junior Programmerish, doesn't it?


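Assume this set of declarations:

mouseMovement=1
usedKeyboard=1
timedFormSubmission=1
hiddenFormField=1
akismet=0
tooManyUrls=1
teststrings=1
projectHoneyPot=0
timedFormMinSeconds=5
timedFormMaxSeconds=3600
encryptionKey=JacobMuns0n
akismetAPIKey=
akismetBlogURL=
akismetFormNameField=
akismetFormEmailField=
akismetFormURLField=
akismetFormBodyField=
tooManyUrlsMaxUrls=6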

What we want is to turn: mouseMovement=1 into: <entry key="mouseMovement"><value>1</value></entry>

Note we've split a string delimited by an equals sign into some XML nodes.

So as you know, we define this pattern in the gobbledegook of regular expressions. When read one chunk at a time, these actually make sense. We'll go through the exercise, then look at why it worked.

In Eclipse, perform the following:

  1. Open a new file and paste the above set of declarations: (yes, the whole thing)
  2. Open the find dialogue (I use CTRL+F) and make sure the Regular Expression option is ticked
  3. Enter the following in the Find input: (.*[^=])=(.*)
  4. Enter the following in the Replace input: <entry key="$1"><value>$2</value></entry>
  5. Press Find and make sure the pattern matches what we want
  6. Lastly, press Replace All

You Should Have This:

<entry key="mouseMovement"><value>1</value></entry>
<entry key="usedKeyboard"><value>1</value></entry>
<entry key="timedFormSubmission"><value>1</value></entry>
<entry key="hiddenFormField"><value>1</value></entry>
<entry key="akismet"><value>0</value></entry>
<entry key="tooManyUrls"><value>1</value></entry>
<entry key="teststrings"><value>1</value></entry>
<entry key="projectHoneyPot"><value>0</value></entry>
<entry key="timedFormMinSeconds"><value>5</value></entry>
<entry key="timedFormMaxSeconds"><value>3600</value></entry>
<entry key="encryptionKey"><value>JacobMuns0n</value></entry>
<entry key="akismetAPIKey"><value></value></entry>
<entry key="akismetBlogURL"><value></value></entry>
<entry key="akismetFormNameField"><value></value></entry>
<entry key="akismetFormEmailField"><value></value></entry>
<entry key="akismetFormURLField"><value></value></entry>
<entry key="akismetFormBodyField"><value></value></entry>
<entry key="tooManyUrlsMaxUrls"><value>6</value></entry>

(If not, you missed a step. Look at the image and compare with what you have in your Find/Replace dialog. Make sure there is no extra whitespace in the find expression.)
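
For anyone working outside Eclipse, the same find/replace can be sketched in a few lines of Python. This is only an illustration, not part of the original exercise: the variable names are made up, the sample is a handful of the key=value lines from the result listing, and Python's re module writes the group references as \1 and \2 where the Eclipse dialog uses $1 and $2.

```python
import re

# A few of the .ini-style lines from the exercise (key=value, one per line).
ini_text = """mouseMovement=1
encryptionKey=JacobMuns0n
akismetAPIKey=
tooManyUrlsMaxUrls=6"""

# Same find pattern as typed into the Eclipse Find input.
FIND = r"(.*[^=])=(.*)"

# Same replacement, but with Python's \1 and \2 instead of Eclipse's $1 and $2.
REPLACE = r'<entry key="\1"><value>\2</value></entry>'

# "." does not match a newline by default, so each line is rewritten on its own.
xml_text = re.sub(FIND, REPLACE, ini_text)
print(xml_text)
```

Printing xml_text should give one entry element per input line, with an empty value element for akismetAPIKey.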

Blamo! The configuration data has changed from the .ini format to the ColdSpring XML format. Look at how much money you saved from having to ice down your wrists. Let's decode the code, shall we?

Here is the find portion of the regular expression: (.*[^=])=(.*)

  • ()  The first chunk is surrounded by parentheses. This means we'll be defining an assignable group (what regex references call a capturing group).
  • .*[^=]  Inside the first set of parentheses is .* meaning any run of characters, followed by [^=], which matches a single character that is not an equals sign. Together with the = that comes next, (.*[^=]) means: starting at the beginning, give me an assignable group of everything up to the equals sign. One caveat: .* is greedy, so on a line with more than one equals sign the group would run up to the last one. Every line here has exactly one, so that doesn't bite us.
  • =  Next we have the equals sign, because this is the boundary marking the second group to define.
  • (.*)  Next is another chunk surrounded by parentheses. This means we'll be defining another assignable group.
  • .*  Inside the second set of parentheses is .* meaning all characters. Since there is nothing else, we want everything up until the end of the line.
All of that defines the boundaries for a character-walking regular expression gnome to start at the beginning of each line, grab a first group of characters before the equals sign and a second group of characters after the equals sign, and name them $1 and $2 for us.
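
To see what lands in each group, here is a small Python sketch that applies the same pattern to single lines. The helper name split_line and the url= line are hypothetical, added only to show how the greedy .* behaves when a line has more than one equals sign.

```python
import re

# The find pattern from the exercise, compiled once.
pattern = re.compile(r"(.*[^=])=(.*)")

def split_line(line):
    """Return (group 1, group 2) for one key=value line, or None if it does not match."""
    m = pattern.match(line)
    return (m.group(1), m.group(2)) if m else None

print(split_line("timedFormMaxSeconds=3600"))     # ('timedFormMaxSeconds', '3600')
print(split_line("akismetAPIKey="))               # ('akismetAPIKey', '')
print(split_line("no equals sign here"))          # None

# Hypothetical line with two equals signs: the greedy .* splits at the last one.
print(split_line("url=http://example.com/?a=b"))  # ('url=http://example.com/?a', 'b')
```

If splitting at the first equals sign is what you need, a first group of ([^=]+) is one common alternative, though that is a different pattern from the one used in this exercise.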

Then in the Replace section, we used: <entry key="$1"><value>$2</value></entry>

  • This is just structured XML with the group numbers in the right places.


So in plain English, we asked the regular expression find/replace gnome to: Bust up each line into pre-equals-sign and post-equals-sign groups and then put the content for each group inside the XML literal.

I'm sure you can agree this was much easier than a copy/paste extravaganza... I hope you enjoyed this (extended) blog series on Regex. If you want more of these, go bug Ben Nadel... His brain is a Möbius strip of interesting regular expression patterns...


Published at DZone with permission of its author, Dan Wilson.
