In the XML parser I’ve been writing for RSS/Atom feeds, I’ve run into what many people have found: bizarre encoding issues when displaying data from the database on a web page. Since the searches I did on Google didn’t explain this well, I’ll explain it here.
Issue: you have UTF-8 data coming from a source, and you put it into a utf8_general_ci column of a MySQL table. You then read the data back from the database and display it as HTML/XHTML. Instead of characters such as curly double quotes or long dashes, you get euro signs and accented letters (for example â€œ), usually a string of two or three of them in place of each correct character.
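The symptom can be reproduced without a database at all. The sketch below is only an illustration: it assumes the mbstring extension is available, the helper name misreadAsWindows1252 is made up for this example, and it assumes the bytes are being misread as Windows-1252 somewhere along the way, which is the usual cause of the euro-sign pattern but not necessarily the only one.

```php
<?php
// Hypothetical helper: treat UTF-8 bytes as if they were Windows-1252 text
// and convert that "text" to UTF-8, as a misconfigured layer might do.
function misreadAsWindows1252(string $utf8Bytes): string
{
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

$leftQuote = "\xE2\x80\x9C"; // U+201C, left curly double quote, in UTF-8
$longDash  = "\xE2\x80\x94"; // U+2014, long dash, in UTF-8

echo bin2hex($leftQuote), "\n";            // e2809c
echo misreadAsWindows1252($leftQuote), "\n"; // â€œ
echo misreadAsWindows1252($longDash), "\n";  // â€ followed by a right curly quote
```

Each three-byte character turns into three separate characters, which is why the garbage shows up in short runs that start with â€.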
Potential solution: use utf8_encode and htmlentities in PHP to clean the data before it goes into the database. This does not work. Why? utf8_encode assumes its input is ISO-8859-1, so running it on data that is already UTF-8 encodes every byte above ASCII code 127 a second time, and htmlentities then escapes the wrong characters that result. See here for the full code.
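To see why the utf8_encode step backfires, here is a minimal sketch. It is not the full code mentioned above. It assumes mbstring is enabled, and both function names are hypothetical. encodeAsLatin1 performs the same ISO-8859-1 to UTF-8 conversion that utf8_encode does, and toUtf8Once is one possible guard that skips the conversion when the input already validates as UTF-8.

```php
<?php
// Hypothetical helper: the same ISO-8859-1 to UTF-8 conversion that
// utf8_encode performs, done with mbstring.
function encodeAsLatin1(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical guard: only convert input that is not already valid UTF-8.
function toUtf8Once(string $bytes): string
{
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : encodeAsLatin1($bytes);
}

$quote = "\xE2\x80\x9C"; // left curly quote, already valid UTF-8

echo bin2hex($quote), "\n";                 // e2809c
echo bin2hex(encodeAsLatin1($quote)), "\n"; // c3a2c280c29c (encoded twice)
echo bin2hex(toUtf8Once($quote)), "\n";     // e2809c (left alone)
```

The three original bytes become six after the unconditional conversion, so the damage is already in the string before it reaches the database. The guard is a heuristic rather than a guarantee, since a short ISO-8859-1 string can occasionally happen to be valid UTF-8 as well.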