Showing entries 1 to 2
Displaying posts with tag: encoding
Character Sets: Migrating to utf8mb4 with pt_online_schema_change

Modern applications often handle data in many different languages. This is true even of applications that offer a user-facing interface in only one language. Many users may, for example, need to enter names that use Latin characters with diacritics; others may need to enter text containing Chinese or Japanese characters. Even if every user can work with an application localized for a single language, the application may still have to handle data from a wide variety of languages.

Additionally, the increased use of mobile phones has led to changes in communication behaviour, including a vastly increased use of standardized characters intended to convey emotions, often called “emojis” or “emoticons.” Originally, such information was conveyed using ASCII text, such as “:-)” to indicate happiness, but, as noted, this has changed, with many devices automatically converting such …
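
Emoji matter for the utf8mb4 migration because most of them lie outside Unicode's Basic Multilingual Plane and need four bytes in UTF-8, while MySQL's older `utf8` character set (also called `utf8mb3`) stores at most three bytes per character. The Python sketch below only illustrates that byte arithmetic; the function names are hypothetical and the code does not talk to MySQL.

```python
from __future__ import annotations


def utf8_byte_lengths(text: str) -> list[tuple[str, int]]:
    """Return each character paired with the number of bytes it takes in UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character is in the Basic Multilingual Plane (U+0000..U+FFFF),
    i.e. needs at most 3 bytes in UTF-8. Supplementary characters such as most
    emoji need 4 bytes and therefore require utf8mb4."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # Latin letter, accented Latin letter, CJK ideograph, emoji (U+1F600)
    sample = "a\u00e9\u4e2d\U0001F600"
    for ch, n in utf8_byte_lengths(sample):
        print(f"U+{ord(ch):04X}: {n} byte(s)")
    print("fits in utf8mb3:", fits_in_utf8mb3(sample))
```

A check like `fits_in_utf8mb3` is a quick way to see whether existing data would survive in a `utf8` column; anything that fails it needs a utf8mb4 column.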

Battling XHTML :: Storing UTF-8 data in MySQL

In the XML parser I’ve been writing for RSS/Atom feeds, I’ve run into what many people have found: bizarre encoding problems when displaying data from the database on a web page. Since the Google searches I did didn’t explain this well, I’ll explain it here.

Issue: you have UTF-8 data coming from a source, and you store it in a MySQL table column with the utf8_general_ci collation. When you read the data back and display it as HTML/XHTML, instead of characters such as curly double quotes or long dashes you get euro signs or umlaut-style characters, usually several of them in place of each correct character.
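
One common cause of this symptom, assumed here rather than taken from the post, is that the stored bytes are valid UTF-8 but something downstream (the database connection or the page's declared charset) reads them as Windows-1252. Each multi-byte UTF-8 character then appears as two or three separate characters, and the byte 0x80 maps to the euro sign. The Python sketch below reproduces and reverses that; `as_mojibake` and `repair_mojibake` are hypothetical names.

```python
def as_mojibake(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Windows-1252 (cp1252)."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(garbled: str) -> str:
    """Undo as_mojibake(): turn the characters back into the original bytes
    and decode those bytes as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    # left double quote (U+201C), em dash (U+2014), e with acute (U+00E9)
    for original in ["\u201c", "\u2014", "\u00e9"]:
        garbled = as_mojibake(original)
        print(f"{original!r} -> {garbled!r} -> {repair_mojibake(garbled)!r}")
```

The left double quote, for example, is the bytes E2 80 9C, which cp1252 shows as `â€œ`. Not every character round-trips: the right double quote (U+201D) ends in byte 0x9D, which Python's cp1252 codec leaves undefined, so `as_mojibake` raises `UnicodeDecodeError` for it.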

Potential solution: use utf8_encode() and htmlentities() in PHP to clean the data before it goes into the database. This does not work. Why? Those characters lie above ASCII code 126, outside the plain ASCII range, and utf8_encode() treats its input as ISO-8859-1, so running it on data that is already UTF-8 encodes it a second time. See here for the full code chart: …
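
PHP's utf8_encode() is documented as converting ISO-8859-1 to UTF-8. The Python sketch below is a stand-in, not PHP: the hypothetical `latin1_to_utf8` mimics that conversion to show the double encoding, and the hypothetical `to_numeric_refs` shows that characters above ASCII code 126 can still be written in HTML as numeric character references such as `&#8220;`.

```python
import html


def latin1_to_utf8(data: bytes) -> bytes:
    """Mimic the conversion utf8_encode() documents: read every byte as one
    ISO-8859-1 character and re-encode the result as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def to_numeric_refs(text: str) -> str:
    """Escape <, >, &, and quotes, then replace every non-ASCII character
    with a numeric character reference such as &#8220;."""
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    original = "\u00e9".encode("utf-8")   # b'\xc3\xa9': already valid UTF-8
    doubled = latin1_to_utf8(original)     # b'\xc3\x83\xc2\xa9': encoded twice
    print(original, "->", doubled, "->", doubled.decode("utf-8"))
    print(to_numeric_refs("\u201cquoted\u201d \u2014 caf\u00e9"))
```

Numeric references keep the page pure ASCII, but storing plain UTF-8 and declaring UTF-8 consistently end to end avoids needing either transformation.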
