
Problem: HTML from a selection inside the WebBrowser control differs from the HTML of the original page

Here's a problem I've been trying to deal with, involving the WebBrowser control in .NET 2.0 WinForms.

I'm trying to get the selected text's underlying HTML representation, and it works, kind of.


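// Requires a reference to Microsoft.mshtml (using mshtml;)
IHTMLDocument2 doc2 = browser.Document.DomDocument as IHTMLDocument2;
IHTMLTxtRange range = doc2.selection.createRange() as IHTMLTxtRange;

// Plain text of the current selection
string plainTextSelection = range.text;

// HTML of the selection, as reported by the range
string selectionHtmlText = range.htmlText;

// HTML of the whole document
string fullHtmlText = browser.DocumentText;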


My problem is that the HTML text from the selection range is different from the original HTML underlying the whole document. The two variables, selectionHtmlText and fullHtmlText, contain different HTML for the same content. Both obviously render the same result, but one of them was generated dynamically by the web browser control (or so it seems).

So, in the range's HTML text you might get upper-cased instead of lower-cased HTML tags, a different number of newline characters, different formatting of single and double quotes, and all sorts of other bad stuff, including HTML tag attributes in a different order.
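To make the mismatch concrete, here is a minimal sketch. The two sample strings are hypothetical (they are not captured from a real page), and the `Normalize` helper is an illustration only, not part of the WebBrowser or mshtml API. It shows that a literal search for the selection's HTML inside the document's HTML fails, and that the strings only line up after a crude normalization of case, quotes, and whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

static class SelectionHtmlDemo
{
    // Hypothetical helper, for comparison only: lowercases the whole string
    // (text content included), drops quote characters, and collapses runs of
    // whitespace to a single space. It does NOT handle reordered attributes.
    public static string Normalize(string html)
    {
        string s = html.ToLowerInvariant();
        s = s.Replace("\"", "").Replace("'", "");
        s = Regex.Replace(s, @"\s+", " ");
        return s.Trim();
    }

    static void Main()
    {
        // Hypothetical sample of what the page source might contain...
        string fullHtmlText = "<p>Hello <b class=\"x\">world</b>,\r\n  again</p>";
        // ...and of what range.htmlText might return for the selected part.
        string selectionHtmlText = "<B class=x>world</B>";

        // Literal search of the raw strings: prints False
        Console.WriteLine(fullHtmlText.IndexOf(selectionHtmlText, StringComparison.Ordinal) >= 0);

        // Search after normalizing both sides: prints True
        Console.WriteLine(Normalize(fullHtmlText).Contains(Normalize(selectionHtmlText)));
    }
}
```

This only makes the two strings comparable. It does not recover the original markup, and it breaks down as soon as the attribute order differs.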

Does anyone know of a way to get the "real" HTML from a selection inside the web browser control?
