Felix Dahlke is a software developer particularly interested in games and open source. He writes most of his code in JavaScript, C++ and Python. Felix is a DZone MVB, is not an employee of DZone, and has posted 16 posts at DZone. You can read more from him at his website.

Regular Expressions

05.25.2011

Earlier this month, I expressed my astonishment that the majority of software developers I’ve worked with in the last seven years don’t know the first thing about regular expressions:

fhd: It's amazing how many actual developers don't know regular expressions. It's like carpenters who don't know about hammers.

As you might have guessed, I regard regular expressions as a fundamental element of every programmer’s toolbox. However, I’m not good with metaphors, and I don’t know the first thing about carpentry, so the hammer comparison missed the point. Thomas Ferris Nicolaisen found a better analogy:

tfnico: @fhd I would rather say it's like carpenters who don't know circle-saws ;)

He’s right: regular expressions are a specialised way of working with text that is mostly relevant to programmers, not to everyone who works with text.

Most of the other replies I got indicated that although people knew (or had once known) regular expressions, they rarely used them nowadays. I think that’s a shame, so I decided to share what I do with regular expressions on a daily basis; maybe you’ll find it useful. I do use them in code occasionally, but what I do all the time, whether in an editor or an IDE, is search and replace. If you’re not at all familiar with regular expressions, I suggest this reference to make sense of the remainder of this post.

Searching

I sometimes mention that I grew up with UNIX, and that’s true. One of the first things I learned about programming was how to use Linux command-line tools like grep, which searches the contents of one or more files for lines matching a regular expression.

I can’t come up with a convincing example, because I rarely use regular expression search on its own; I mostly combine it with replacing. But imagine you’re searching for a specific string in your JavaScript code and forgot which string delimiter (' or ") you used. Here’s the grep command:

grep -R "[\"']Some string[\"']" /path/to/your/webapp

Naturally, you don’t have to grow a beard and become a CLI geek to harness the power of regular expression search. Here’s how you do the exact same thing in Eclipse:

[Screenshot: regular expression search in Eclipse]
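The same search can also be done from code. The following is a minimal Java sketch, not taken from the original post: the class name QuoteAgnosticSearch and the sample lines are made up for illustration, and List.of assumes Java 9 or later. It reuses the pattern from the grep example and calls Matcher.find(), which, like grep, reports a match anywhere in a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class QuoteAgnosticSearch {
    // The pattern from the grep example: "Some string" delimited by ' or ".
    // The double quote inside the character class is escaped for the Java literal.
    static final Pattern SOME_STRING = Pattern.compile("[\"']Some string[\"']");

    // Returns the lines that contain a match, similar to what grep prints.
    static List<String> search(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            // find() looks for a match anywhere in the line, not the whole line.
            if (SOME_STRING.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "var a = 'Some string';",
            "var b = \"Some string\";",
            "var c = 'Another string';");
        for (String hit : search(source)) {
            System.out.println(hit); // prints the first two lines
        }
    }
}
```

Like the grep command, this pattern also accepts mismatched delimiters such as 'Some string", which is usually acceptable for a quick search.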

Replacing

As mentioned above, I use regular expressions mostly for searching and replacing, a very powerful technique that has saved me countless hours of mind-numbing, repetitive typing. Have you ever heard a co-worker make the same keyboard sound many times in a row? Moving the cursor to the next line, placing it at the beginning, pressing CTRL+V, and again? I’m a lazy person, and I can’t stand repetitive typing tasks. Fortunately, you can avoid the majority of them with regular expressions.
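As an illustration of the kind of repetitive edit described above (this example is not from the original post), pasting the same text at the start of a run of lines can be done with a single replacement of the zero-width expression ^ in multiline mode. A minimal Java sketch; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;

public class PrefixLines {
    // (?m) enables MULTILINE mode, so ^ matches at the start of every line
    // instead of only at the start of the whole text. Replacing that empty
    // match inserts the prefix once per line.
    static String prefixEveryLine(String text, String prefix) {
        // quoteReplacement keeps characters such as $ and \ in the prefix literal.
        return text.replaceAll("(?m)^", Matcher.quoteReplacement(prefix));
    }

    public static void main(String[] args) {
        String block = "first line\nsecond line\nthird line";
        System.out.println(prefixEveryLine(block, "// "));
    }
}
```

Running main prints the three lines, each starting with "// ". Without quoteReplacement, a prefix containing $1 would be interpreted as a group reference instead of being inserted as-is.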

Here’s an example of how regular expression search and replace speeds up refactoring. We had a whole lot of test cases that looked like this:

assertThat(RomanNumerals.convert(1), is("I"));
...
assertThat(RomanNumerals.convert(5), is("V"));
...
assertThat(RomanNumerals.convert(10), is("X"));

Too much duplication, so we created a method assertRomanNumeralEquals() to get rid of that:

private static void assertRomanNumeralEquals(String roman, int arab) {
    assertThat(RomanNumerals.convert(arab), is(roman));
}

Eclipse was able to extract the method for us, but it wasn’t able to make all the assertThat() invocations use the new method instead. That’s where regular expression replacement comes in handy, even in a sophisticated IDE. I replaced the following expression:

assertThat\(RomanNumerals.convert\((.*)\),\ is\((".*")\)\);

With this:

assertRomanNumeralEquals(\2, \1);

This is how it looks in Eclipse (select the lines to which you want to apply this before opening the find/replace dialog):

[Screenshot: regular expression search and replace in Eclipse]

The expression might look a bit intimidating if you’re not used to regular expressions, but with some practice you will be writing expressions like this in no time.
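One way to make an expression like this less intimidating is to take it apart piece by piece. The sketch below is an illustration, not the author's code: it writes the same expression in Java with the Pattern.COMMENTS flag, which ignores unescaped whitespace and lets # start a comment running to the end of the line. Two adjustments are needed in this mode: the literal space is written as \x20, and the dot in RomanNumerals.convert is escaped so it only matches a dot. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedAssertPattern {
    // The search expression from the post, annotated piece by piece.
    // In COMMENTS mode, whitespace between tokens is ignored and "#"
    // starts a comment that ends at the next line break.
    static final Pattern OLD_ASSERT = Pattern.compile(
          "assertThat \\(                 # start of the old assertion\n"
        + "RomanNumerals \\. convert \\(    # the converter call\n"
        + "(.*)                            # group 1: the Arabic number\n"
        + "\\) , \\x20 is \\(               # the literal text '), is('\n"
        + "(\".*\")                        # group 2: the numeral, quotes included\n"
        + "\\) \\) ;                       # closing parentheses and semicolon\n",
        Pattern.COMMENTS);

    // Returns {number, numeral} for a matching line, or null if it does not match.
    static String[] parse(String line) {
        Matcher m = OLD_ASSERT.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = parse("assertThat(RomanNumerals.convert(5), is(\"V\"));");
        System.out.println(parts[0] + " " + parts[1]); // 5 "V"
    }
}
```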

In case you’re wondering, this is also possible on the command-line, with the sed command.
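The post doesn't show the sed command itself. Since the tests are Java, here instead is a sketch of the same rewrite done in Java, for example as a throwaway tool run over the contents of a test file; the class name is hypothetical. The expression is the one from the post with every backslash doubled for the Java string literal, and the replacement uses $2 and $1 because Java's replaceAll refers to capturing groups with a dollar sign.

```java
import java.util.regex.Pattern;

public class RomanAssertRewrite {
    // The search expression from the post as a Java string literal
    // (each regex backslash doubled, the quotes around group 2 escaped).
    static final Pattern OLD_ASSERT = Pattern.compile(
        "assertThat\\(RomanNumerals.convert\\((.*)\\),\\ is\\((\".*\")\\)\\);");

    // Java's replacement syntax refers to capturing groups as $1 and $2.
    static String rewrite(String source) {
        return OLD_ASSERT.matcher(source).replaceAll("assertRomanNumeralEquals($2, $1);");
    }

    public static void main(String[] args) {
        String tests = String.join("\n",
            "assertThat(RomanNumerals.convert(1), is(\"I\"));",
            "assertThat(RomanNumerals.convert(5), is(\"V\"));",
            "assertThat(RomanNumerals.convert(10), is(\"X\"));");
        // "." does not match line breaks by default, so each line is rewritten separately.
        System.out.println(rewrite(tests));
    }
}
```

Running main prints assertRomanNumeralEquals("I", 1); and the two corresponding lines for V and X. Reading and writing an actual file is left out; java.nio.file.Files.readString and Files.writeString (Java 11 and later) would be one way to add it.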

Conclusion

Regular expressions are a powerful tool for processing and editing text, whether in automated scripts or interactively. If you use them habitually, you will have learned something for life, because every reasonable editor and IDE supports them. However, regular expressions are not standardised across tools, so there are slight differences between the flavours used by Perl, Java, etc. You might have noticed that there are also some minor differences between grep and Eclipse in the first example above. This occasionally causes brief confusion, but it has never noticeably hurt my productivity.
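To make the point about flavour differences concrete, here is a small Java sketch (class and method names are hypothetical; the examples are not from the post). In a Java pattern, \1 refers back to group 1, as it does in grep, so it can require the opening and closing string delimiters to be the same. In a Java replacement string, however, groups are written $1; a backslash merely escapes the next character, so a sed-style \1 comes out as a literal 1.

```java
public class FlavorDifferences {
    // Pattern backreference: \1 (written "\\1" in a Java literal) must repeat
    // whatever group 1 matched, so both delimiters have to be the same.
    static boolean sameDelimiters(String s) {
        return s.matches("([\"'])Some string\\1");
    }

    // Replacement backreference: Java expects $1.
    static String replaceWithDollar(String s) {
        return s.replaceAll("(b)", "[$1]");
    }

    // A sed-style \1 in a Java replacement just escapes the "1".
    static String replaceWithBackslash(String s) {
        return s.replaceAll("(b)", "[\\1]");
    }

    public static void main(String[] args) {
        System.out.println(sameDelimiters("'Some string'"));  // true
        System.out.println(sameDelimiters("'Some string\"")); // false
        System.out.println(replaceWithDollar("abc"));         // a[b]c
        System.out.println(replaceWithBackslash("abc"));      // a[1]c
    }
}
```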

Speaking of productivity: although regular expressions will probably not make you write code faster, they can significantly increase refactoring speed, and refactoring is what I find myself doing most of the time. How much time do you spend actually writing new code? And how much time do you spend editing existing code? In my case, I think the ratio is at least 1:10. If you are able to refactor fast, you will refactor more often, which is likely to improve the design and maintainability of your code.

If, however, you decide to ignore regular expressions until you find a situation in which you really need them (which might never happen, since you can always find a workaround), you are entering a vicious circle: you are not very familiar with them, so when you are faced with a problem, they don’t come to mind and you don’t use them. And if you don’t use them regularly, you will never become familiar with them. Searching and replacing is an ideal way to break that circle, so I suggest you try it.

 

From http://ubercode.de/blog/regular-expressions

Published at DZone with permission of Felix Dahlke, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Comments

darryl west replied on Fri, 2011/05/27 - 12:35am

If you don't know the basics of RE, you are simply a "fake" programmer, nothing more than a script kitty. Man up and learn the basics.

Matthieu Brouillard replied on Fri, 2011/05/27 - 4:27am

I fully agree with you that a lot of developers don't use or know enough about such a powerful tool as regular expressions.

For your particular case, you could have achieved the same result by using step-by-step pure refactoring within Eclipse, for example. Doing this is a good exercise during katas to learn the real power of your preferred IDE's refactoring functionality.

Slava Lo replied on Sun, 2011/05/29 - 2:45am

"With the great power (of RE) comes great responsibility" (c)
Next time you create a marvelous silver-bullet regular expression, think about how long it is going to take you to understand it again 3-6 months down the track, when some bug needs to be fixed. Also think about other developers in the team who may need to understand your regexp to implement a change request.

Felix Dahlke replied on Mon, 2011/05/30 - 4:35am in response to: Slava Lo

That's very true. I considered myself to be quite good at reading regex, but when I had to read some Perl code recently that used obscure things like lookahead/lookbehind and conditionals, it took ages for me to understand what it was supposed to do :)

Philippe Lhoste replied on Mon, 2011/05/30 - 8:24am

"think about other developers in the team who may need to understand your regexp to implement a change request"

That's also true when you use some design pattern / algorithm / clever code.

Good developers should know about these tools and should have no difficulty understanding them...

Of course, the concise code that regexes or other tools provide doesn't mean comments should be skipped. Good documentation of code is still a must.

I rarely use regexes in Java code, but when I do, beyond the simplest expressions, I take care to explain them.

I do often use them in tools, though; the example given in the post is indeed a common scenario. Regexes can also be useful in a UI (even if that is a bit advanced), for example to filter columns in a table.

Alosh Bennett replied on Thu, 2011/06/09 - 2:51am

I felt liberated the day I learnt RE. Most of the REs I use are use-and-throw: grepping through logs, or occasionally modifying code. Whatever little RE is used inside the code (in the Java world) is, and should always be, supplemented by a comment, usually a LOT bigger than the expression.

 

What is the pitfall of falling in love with RE? You tend to treat everything as a string, which inherently means you lose the semantics.
