Roy Osherove

System.Text.Encoding.Tip

I learned this the hard way, so you don't have to. This applies to anyone who uses System.IO to read ANSI files that mix Hebrew and English text.

Now, when I attempted to do this, I used the StreamReader class, which is pretty simple to use:

StreamReader r = new StreamReader(myFile);

string MyText = r.ReadToEnd();

r.Close();

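Ah! But there's one important caveat you'll notice when you try to display MyText: the Hebrew text inside will vanish without a trace, leaving you with a big heart attack... The reason is that StreamReader assumes UTF-8 when you don't tell it otherwise, and the bytes of ANSI Hebrew letters are not valid UTF-8.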

To fix this problem, you need to specify the encoding the file was written in. You do this by passing one of the encodings exposed as static properties of System.Text.Encoding, like so:

StreamReader r = new StreamReader(myFile, System.Text.Encoding.Default);

What I found is that passing any other encoding does not work for these files: it either truncates the text or displays garbage. Since Encoding.Default returns the encoding for the code page your system is currently using, it saves you the trouble of figuring this out yourself. I have no borderline cases to check against, but it seems to work perfectly for me.
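
To see the difference for yourself, here is a minimal sketch that writes a small Hebrew/English ANSI file and reads it back both ways. The class and method names (AnsiHebrewDemo, ReadWith) are made up for illustration. It asks for the Hebrew code page 1255 explicitly instead of relying on Encoding.Default, because Encoding.Default only points at an ANSI code page on the .NET Framework; on .NET Core and .NET 5+ it returns UTF-8. It also assumes the CodePagesEncodingProvider type is available, which is the case on .NET Core 3.0 and later.

```csharp
using System;
using System.IO;
using System.Text;

public static class AnsiHebrewDemo
{
    // Hypothetical helper: reads the whole file using the given encoding.
    public static string ReadWith(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Legacy code pages are opt-in on .NET Core / .NET 5+.
        // On the .NET Framework this call needs the System.Text.Encoding.CodePages
        // package, or the line can simply be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding hebrewAnsi = Encoding.GetEncoding(1255); // Windows-1255 (Hebrew)
        string original = "Hello \u05E9\u05DC\u05D5\u05DD"; // "Hello" plus a Hebrew word

        string path = Path.GetTempFileName();
        try
        {
            // Simulate the ANSI file: one byte per character, no byte order mark.
            File.WriteAllBytes(path, hebrewAnsi.GetBytes(original));

            // No encoding specified: StreamReader decodes the bytes as UTF-8.
            string noEncoding;
            using (var reader = new StreamReader(path))
            {
                noEncoding = reader.ReadToEnd();
            }

            // Encoding specified: the Hebrew letters survive the round trip.
            string withEncoding = ReadWith(path, hebrewAnsi);

            Console.WriteLine(noEncoding == original);   // False
            Console.WriteLine(withEncoding == original); // True
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

On a Hebrew Windows machine running the .NET Framework, Encoding.Default should hand back this same code page, which is why it worked for me without my having to know the number.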

Conclusion: when reading an ANSI file, or if you're having problems reading any text file format, first try passing in Encoding.Default, and only then try the other encodings.
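
If you want to automate the "try one, then the next" routine, something along these lines might do as a starting point. This is only a rough sketch: the helper name ReadWithFirstCleanEncoding is invented here, and its test for a bad decode (the presence of the U+FFFD replacement character) is my own assumption about what "garbage" looks like. It will catch bytes that are invalid for UTF-8, but it cannot tell one single-byte code page from another, because almost every byte decodes to something in those. It also does not strip a byte order mark.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingFallback
{
    // Hypothetical helper: decodes the file with each candidate in order and
    // returns the first result that contains no U+FFFD replacement characters.
    public static string ReadWithFirstCleanEncoding(string path, params Encoding[] candidates)
    {
        byte[] bytes = File.ReadAllBytes(path);

        foreach (Encoding candidate in candidates)
        {
            string text = candidate.GetString(bytes);

            // The built-in encodings substitute U+FFFD for bytes they cannot
            // decode, so its absence is a rough sign of a clean decode.
            if (text.IndexOf('\uFFFD') < 0)
            {
                return text;
            }
        }

        throw new InvalidDataException("No candidate encoding decoded the file cleanly.");
    }
}
```

Following the advice above, you would call it with Encoding.Default first, for example ReadWithFirstCleanEncoding(myFile, Encoding.Default, Encoding.UTF8), and still eyeball the result.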