Q&A - Greedy matching in regular expressions
This came in the mail, and I thought other folks might be interested.
Hi Roy. I need to check a line of HTML and make the value of the style attribute lowercase. I've tried to come up with a regex that will work, but I keep making the entire line of HTML lowercase instead of just the stuff in the style value. I can't get the match to end with the correct quote; instead it goes to the last quote on the line. So something like this:
[Tag style="WIDTH:20px; color:blue;" href="blah.com/PageTWO"]
I want to change to this:
[Tag style="width:20px; color:blue;" href="blah.com/PageTWO"]
But instead I get this:
[Tag style="width:20px; color:blue;" href="blah.com/pagetwo"]
Because the match ends with the end quote of the href.
If you can point me in the right direction (or have something like this lying around), I would GREATLY appreciate it.
Answer:
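It's called "greedy matching": by default a quantifier such as '*' matches as much as it possibly can, so the match runs to the *last* quote on the line.
Try adding a "?" after the quantifier (probably '*'). That makes the quantifier lazy, so the match ends at the *first* place it can, which here is the first closing quote.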
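Applied to the line of HTML from the question, the idea might look like the sketch below. The question doesn't say which language or regex engine is in use, so Python is used here purely for illustration; the pattern and the helper name lowercase_style are assumptions, not something from the original mail.

```python
import re

line = '[Tag style="WIDTH:20px; color:blue;" href="blah.com/PageTWO"]'

# Lazy quantifier: (.*?) stops at the first closing quote after style="
# instead of running on to the last quote on the line.
STYLE_VALUE = re.compile(r'(style=")(.*?)(")')

def lowercase_style(text):
    # Lowercase only the captured attribute value; leave everything else untouched.
    return STYLE_VALUE.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), text
    )

print(lowercase_style(line))
# [Tag style="width:20px; color:blue;" href="blah.com/PageTWO"]
```

Dropping the "?" from the pattern reproduces the problem from the question: the href value gets lowercased too.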
For example, given the following string as input:
"abcdfgdrbdtargd"
The following regex (greedy by default) will match everything up to the last 'd':
(.*d)
However, this lazy regex will find several matches, the first of which is "abcd":
(.*?d)
(You can do without the parentheses if you want.)
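To see the difference on the sample string, here is a small sketch, again in Python as an assumed language; the "?" modifier itself behaves the same way in most modern regex engines.

```python
import re

text = "abcdfgdrbdtargd"

# Greedy: .* grabs as much as possible, so the match ends at the last 'd'.
greedy = re.findall(r".*d", text)

# Lazy: .*? grabs as little as possible, so each match ends at the next 'd'.
lazy = re.findall(r".*?d", text)

print(greedy)  # ['abcdfgdrbdtargd']
print(lazy)    # ['abcd', 'fgd', 'rbd', 'targd']
```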
I'd also suggest adding two good regex mailing lists to your arsenal instead of sending help messages to various people:
http://groups.yahoo.com/group/dotnetregex/
http://lists.aspadvice.com/SignUp/list.aspx?l=68&c=16
There are people there who know a whole lot more about regular expressions than I do.