Strip HTML tags from a string using regular expressions
Tuesday, May 13, 2003 at 9:41AM
Paschal asked me to find a simple solution for stripping HTML tags from a given string using regular expressions.
The solution is quite simple:
1. Retrieve all the HTML tags using this pattern: <(.|\n)*?>
2. Replace them with an empty string and return the result
Here's a C# function that does this:
private string StripHTML(string htmlString){
    // Requires: using System.Text.RegularExpressions;
    // This pattern matches everything found inside HTML tags:
    // (.|\n) -> any character or a newline
    // *?     -> 0 or more occurrences, non-greedy, meaning that the match
    //           stops at the first available '>' it sees, not at the last one
    // (if it stopped at the last one, a single match would run from the first
    // tag to the last and swallow the text and tags in between)
    // Thanks to Oisin and Hugh Brown for helping on this one...
    string pattern = @"<(.|\n)*?>";
    return Regex.Replace(htmlString, pattern, string.Empty);
}
Or with just one line of code:
string stripped = Regex.Replace(textBox1.Text, @"<(.|\n)*?>", string.Empty);
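To see why the non-greedy `*?` matters, here is a small sketch that runs the pattern from this post next to a greedy variant. The class name `StripDemo`, the method names, and the sample string are made up for illustration; only the lazy pattern comes from the post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripDemo
{
    // Pattern from the post: lazy match from '<' to the nearest '>'.
    public static string StripLazy(string html)
    {
        return Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
    }

    // Greedy variant, shown for comparison only: a single match runs
    // from the first '<' all the way to the last '>' in the input.
    public static string StripGreedy(string html)
    {
        return Regex.Replace(html, @"<(.|\n)*>", string.Empty);
    }
}
```

With the input "<p>Hello <b>big</b>\nworld</p>", StripLazy returns "Hello big\nworld", while StripGreedy returns an empty string, because the whole input sits between the first '<' and the last '>'.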





Reader Comments (10)
wow this works great! THX
Hi Roy,
If you were gay I'd marry you man... you are so awesome looking!
Kev
Thank you very much!
Works for my purposes! Thank you for sharing.
Hello,
nice and simple, save me a lot of time.
Thanks !!!
Darko
Thank u very much.. this thread is very useful
Thanks a lot, you saved me!
Will this work if the text contains < or >? I guess it will strip these also. What can be done to consider this scenario?
@Yogi: If your characters are not escaped as &lt; and &gt;, then you would need to use a far more complex regular expression. There is one such example here: http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx
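As a rough middle ground for the question above, one could require a letter right after the '<' (or '</'), so that a stray "<" followed by a space or a digit is left alone. This is only a sketch under that assumption, not the pattern from the linked article; `TagStripper`, `TagPattern` and `StripTagsOnly` are hypothetical names. It will still misfire on text like "a <b and c> d", and it does not handle HTML comments or a '>' inside an attribute value. The final `WebUtility.HtmlDecode` call is an optional extra step that turns escaped entities such as &lt; back into plain characters.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical stricter pattern (an assumption, not from the post):
    // '<', an optional '/', one ASCII letter, then anything up to the next '>'.
    private const string TagPattern = @"</?[A-Za-z][^>]*>";

    public static string StripTagsOnly(string html)
    {
        // Remove only the things that look like opening or closing tags.
        string withoutTags = Regex.Replace(html, TagPattern, string.Empty);

        // Optional: decode entities such as &lt; and &amp; into plain text.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

For example, StripTagsOnly("<p>1 < 2 and 3 > 2</p>") returns "1 < 2 and 3 > 2", and StripTagsOnly("<b>a &lt; b</b>") returns "a < b".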
Great. It works in JavaScript: str.replace(/<(.|\n)*?>/g,'')
thanks Roy!!