Twitter: @RoyOsherove
Tuesday, May 13, 2003

Strip HTML tags from a string using regular expressions

Paschal asked me to find a simple solution for stripping HTML tags from a given string using regular expressions.

The solution is quite simple:

1. Retrieve all the HTML tags using this pattern: <(.|\n)*?>

2. Replace them with an empty string and return the result

Here's a C# function that does this:

// Requires: using System.Text.RegularExpressions;
private string StripHTML(string htmlString)
{
    // This pattern matches every HTML tag, from '<' to its closing '>'.
    // (.|\n) -> any character, or a newline (which '.' alone does not match)
    // *?     -> zero or more occurrences, matched non-greedily, meaning
    //           that the match stops at the first available '>' it sees,
    //           and not at the last one.
    //           (A greedy match would run from the first '<' to the last '>'
    //           and swallow the text sitting between separate tags.)
    // Thanks to Oisin and Hugh Brown for helping on this one...
    string pattern = @"<(.|\n)*?>";

    return Regex.Replace(htmlString, pattern, string.Empty);
}

Or with just one line of code:

string stripped = Regex.Replace(textBox1.Text,@"<(.|\n)*?>",string.Empty);
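To see what the non-greedy quantifier buys you, here is a small sketch that runs the same pattern next to a greedy variant. The class name StripHtmlDemo, the Strip helper and the sample strings are made up for illustration and are not part of the original function. The last call shows a limitation: a literal '<' or '>' in plain text is treated as if it delimited a tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class StripHtmlDemo
{
    // The lazy pattern from the post.
    public const string LazyPattern = @"<(.|\n)*?>";

    // Greedy variant, shown only for comparison.
    public const string GreedyPattern = @"<(.|\n)*>";

    public static string Strip(string html, string pattern)
    {
        return Regex.Replace(html, pattern, string.Empty);
    }

    public static void Main()
    {
        string html = "<p>Hello <b>big</b>\nworld</p>";

        // Lazy: each tag is removed on its own, the text survives.
        Console.WriteLine(Strip(html, LazyPattern));

        // Greedy: one match from the first '<' to the last '>',
        // so the whole string is removed.
        Console.WriteLine(Strip(html, GreedyPattern).Length);

        // Limitation: unescaped '<' and '>' in ordinary text are
        // treated as a tag, so "< 2 and 3 >" is removed as well.
        Console.WriteLine(Strip("1 < 2 and 3 > 2", LazyPattern));
    }
}
```

If the input may contain unescaped angle brackets in ordinary text, this simple pattern is probably not enough on its own.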

 


Reader Comments (10)

wow this works great! THX

December 6, 2010 | Unregistered Commenter m_sobo

Hi Roy,

If you were gay I'd marry you man... you are so awesome looking!

Kev

December 28, 2010 | Unregistered Commenter Kevin Johnston

Thank you very much!

December 30, 2010 | Unregistered Commenter Anh Dung

Works for my purposes! Thank you for sharing.

March 2, 2011 | Unregistered Commenter Grady Persell

Hello,

nice and simple, save me a lot of time.

Thanks !!!

Darko

March 19, 2011 | Unregistered Commenter Imputo

Thank u very much.. this thread is very useful

April 28, 2011 | Unregistered Commenter DineshKumar

Thanks a lot, you saved me!

May 19, 2011 | Unregistered Commenter Charlie

Will this work if the text contains < or >? I guess it will strip these also. What can be done to handle this scenario?

June 30, 2011 | Unregistered Commenter Yogi

@Yogi: If your characters are not escaped as &lt; and &gt;, then you would need to use a far more complex regular expression. There is one such example here: http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx

July 21, 2011 | Unregistered Commenter DRead

Great. It works in JavaScript too: str.replace(/<(.|\n)*?>/g,'')

Thanks, Roy!!

March 24, 2012 | Unregistered Commenter Seba
