
Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j, building sophisticated solutions to challenging data problems. When he's not with customers, Mark is a developer on Neo4j and writes about his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB, is not an employee of DZone, and has posted 527 posts at DZone. You can read more from them at their website.

Regular Expressions: Non-Greedy Matching


I was playing around with some football data earlier in the week and I wanted to try and extract just the name ‘Rooney’ from the following bit of text:

Rooney 8′, 27′

My initial regular expression was the following, which annoyingly also captures the time of the first goal:

> "Rooney 8′, 27′".match(/(.*)\s\d(.*)/)[1]
=> "Rooney 8′,"

It works fine if the player has only scored one goal…

> "Rooney 8′".match(/(.*)\s\d(.*)/)[1]
=> "Rooney"

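…but a space followed by a digit (the “\s\d” part of the regex) appears twice in the text. Because the first part, “(.*)”, is greedy and grabs as much as it can, only the last occurrence is matched by “\s\d(.*)”, and everything before it, including the first goal time, gets captured by the first group.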
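To see where the greedy match actually lands, it can help to print both capture groups. Here is a minimal Ruby sketch reusing the text and regex from above; the `scan` call is only an illustrative addition for counting the space-digit pairs.

```ruby
text = "Rooney 8′, 27′"

# The space-then-digit pair the pattern hinges on occurs twice in the text.
puts text.scan(/\s\d/).length   # 2 (" 8" and " 2")

# Greedy (.*) first swallows the whole string, then backtracks one character
# at a time until \s\d can match; that happens at the LAST pair, " 2".
m = text.match(/(.*)\s\d(.*)/)
puts m[1]   # Rooney 8′,
puts m[2]   # 7′
```

The second group only gets “7′” because the “2” was consumed by `\d`.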

One way around this is to make the first part of the regex non-greedy (lazy), so that it matches as little as possible rather than as much as possible.

We can do that by using the lazy version of ‘*’, which is ‘*?’:

> "Rooney 8′, 27′".match(/(.*?)\s\d(.*)/)[1]
=> "Rooney"
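Printing both groups again shows that the lazy group now stops at the first space-digit pair. The same trailing ‘?’ works on the other quantifiers too; the `minutes` string below is just a hypothetical example for comparing greedy and lazy forms.

```ruby
text = "Rooney 8′, 27′"

# Lazy (.*?) starts empty and grows one character at a time, so it stops
# at the FIRST place where \s\d can match, " 8".
m = text.match(/(.*?)\s\d(.*)/)
puts m[1]   # Rooney
puts m[2]   # ′, 27′

# Every greedy quantifier has a lazy twin formed by appending '?'.
minutes = "27′"
puts minutes[/\d+/]      # 27 (greedy +)
puts minutes[/\d+?/]     # 2  (lazy +?: one digit already satisfies it)
puts minutes[/\d{1,2}/]  # 27 (greedy {1,2})
puts minutes[/\d{1,2}?/] # 2  (lazy {1,2}?)
```

Laziness does not change where a match starts (still the leftmost possible position); it only changes how far each quantifier extends. That is why a lazy quantifier at the very end of a pattern, like `\d+?` here, matches the minimum it is allowed to.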

Of course, you could also argue that I’m being very lazy by using “.*” in the first place, and you probably have a point! The following regular expression is more explicit, allowing only letters, whitespace and hyphens in the name, and achieves the same thing:

> "Rooney 8′, 27′".match(/([A-Za-z\s-]+)\s\d(.*)/)[1]
=> "Rooney"
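One trade-off with the explicit class is worth knowing: `[A-Za-z]` only covers unaccented ASCII letters, and because the pattern is not anchored, a name containing any other character can be silently truncated rather than failing to match. This sketch uses hypothetical player lines (only the Rooney line comes from the original data) and compares the explicit pattern with a variant using Ruby's `\p{L}` (any Unicode letter) property.

```ruby
# Hypothetical inputs; only "Rooney 8′, 27′" comes from the original data.
lines = ["Rooney 8′, 27′", "Van Persie 12′", "Oxlade-Chamberlain 45′", "Agüero 90′"]

ascii_only = /([A-Za-z\s-]+)\s\d(.*)/  # the explicit pattern from above
any_letter = /([\p{L}\s-]+)\s\d(.*)/   # \p{L} matches any Unicode letter

lines.each do |line|
  # Print the name captured by each pattern side by side.
  puts "#{line}: #{line.match(ascii_only)[1]} | #{line.match(any_letter)[1]}"
end
```

For the Agüero line the ASCII-only class returns just “ero”, because the match simply starts after the “ü”, while the `\p{L}` version returns the full name.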

As a side note, I find that when I’m playing around with regular expressions it really helps to have a bunch of test cases that I can run after each change, to make sure I haven’t inadvertently broken everything.
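Here is a minimal sketch of that habit in plain Ruby, with no test framework assumed. It checks a small table of inputs and expected names against each version of the regex from this post; the table and the `failures` helper are hypothetical.

```ruby
# Hypothetical test table: input line => expected captured name.
CASES = {
  "Rooney 8′"      => "Rooney",
  "Rooney 8′, 27′" => "Rooney",
  "Van Persie 12′" => "Van Persie",
}.freeze

# Returns the cases a regex gets wrong (no match, or the wrong first group).
def failures(regex)
  CASES.reject { |input, expected| (m = input.match(regex)) && m[1] == expected }
end

[/(.*)\s\d(.*)/, /(.*?)\s\d(.*)/, /([A-Za-z\s-]+)\s\d(.*)/].each do |regex|
  bad = failures(regex)
  puts bad.empty? ? "#{regex.inspect}: all passed" : "#{regex.inspect}: failed #{bad.keys.inspect}"
end
```

Run after each tweak, the greedy version is reported as failing on the two-goal line while the lazy and explicit versions pass.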

regexper.com is also really helpful for visualising what the regular expression is actually doing!

Published at DZone with permission of Mark Needham, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)