>>13
You don't have to guess. A standards-compliant (X)HTML file will normally contain nothing before the meta tag that specifies the charset except bytes that decode as plain ASCII, so you can read up to that tag safely and then decode the rest with the charset it names.
Read http://www.joelonsoftware.com/articles/Unicode.html; it explains this better than I could.
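To make that concrete, here is a rough sketch of the idea in Python (not Perl, but the approach carries over). It scans only the first few KB of the raw bytes for a charset declaration in a meta tag and uses that to decode the whole file. The names sniff_charset and decode_html, the 4096-byte window, and the utf-8 fallback are my own assumptions, and the regex is a simplification, not a full HTML parser.

```python
import re

# Simplified pattern covering both common forms:
#   <meta charset="utf-8">
#   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)

def sniff_charset(raw: bytes, default: str = "utf-8", window: int = 4096) -> str:
    """Return the charset named in a meta tag near the start of raw, else default."""
    # Everything before the meta tag is expected to be plain ASCII,
    # so matching on the undecoded bytes is safe.
    match = META_CHARSET.search(raw[:window])
    if match is None:
        return default  # assumption: fall back to utf-8 when nothing is declared
    return match.group(1).decode("ascii").lower()

def decode_html(raw: bytes) -> str:
    """Decode the whole document using the declared charset."""
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw.decode(sniff_charset(raw), errors="replace")
```

If the page declares a charset name that Python does not know, decode_html raises LookupError, so you may want to catch that and fall back to the default.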
>>1
If you can't solve it, just work around it: decode the HTML entities and save the result. HTML::Entities on CPAN does this (see decode_entities).
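For illustration only, here is the same workaround sketched in Python, where the standard library's html.unescape plays the role of decode_entities. The function names and the choice to write the output as UTF-8 are my assumptions, not something from your script.

```python
import html

def decode_entities(text: str) -> str:
    """Replace named and numeric HTML entities with the characters they stand for."""
    return html.unescape(text)

def save_decoded(text: str, path: str) -> None:
    """Decode the entities in text and write the result to path as UTF-8."""
    # Assumption: UTF-8 output, since it can represent any decoded character.
    with open(path, "w", encoding="utf-8") as out:
        out.write(decode_entities(text))
```

Something like save_decoded(page_text, "out.html") would then give you a file with the real characters in it instead of the entities.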