Showing entries 1 to 6
Displaying posts with tag: utf-8
WordPress and UTF-8

Update: WordPress 4.2 has full UTF-8 support! There’s no need to upgrade manually any more.

For many years, MySQL supported only a small part of UTF-8: plane 0, commonly referred to as the “Basic Multilingual Plane” or BMP. Unicode is divided into “planes”, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.
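To make the plane distinction concrete, here is a minimal Python sketch. The helper name `outside_bmp` is hypothetical and not part of WordPress or MySQL. It lists the characters whose code point lies above U+FFFF, which are the ones outside plane 0 and the ones that need four bytes in UTF-8.

```python
def outside_bmp(text):
    """Return the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# 'é' (U+00E9) is in the BMP; the emoji (U+1F600) is in plane 1.
sample = "caf\u00e9 \U0001F600"

for ch in sample:
    # Show how many bytes each character occupies once encoded as UTF-8.
    print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s) in UTF-8")

print(outside_bmp(sample))
```

A check like this could be used to spot text that a BMP-only column would not be able to hold.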

It has always been possible to store all UTF-8 characters in the latin1 character set, though latin1 has shortcomings. While it recognises the connection between upper and lower case characters in Latin alphabets (such as English, French and German), it doesn’t recognise the same connection for other alphabets. For example, it doesn’t know …

[Read more]
How To – Configure MySQL to Use UTF-8

Background Knowledge

Using the UTF-8 character set allows for the use of any language: it can represent every character in the Unicode character set and is backward compatible with ASCII. It can also be used on any platform and sent through many different systems without corruption. These advantages are why so many are making the switch.
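As a small illustration of the ASCII compatibility mentioned above, the following Python sketch (the sample strings are arbitrary) shows that pure ASCII text produces identical bytes in both encodings, while an accented letter becomes a multi-byte sequence.

```python
ascii_text = "MySQL"
accented = "Qu\u00e9bec"  # contains 'é', which is not an ASCII character

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_bytes)

# The accented letter takes two bytes, so the byte string is one longer.
encoded = accented.encode("utf-8")
print(encoded, len(accented), len(encoded))
```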

The following instructions were performed on a Debian Squeeze v6.04 AMD64 system using MySQL v14.14 Distrib 5.1.61.

Solution – Server Configuration

By default, MySQL is currently configured to use the “latin1” character set. Here’s how to change the MySQL configuration to use the UTF-8 character set and collation.

  1. Check MySQL’s current configuration by running the following two SQL statements.

[Read more]
UTF-8 with MySQL and LAMP

A recent question on a mailing list asked about best practices for UTF-8 with PHP/MySQL. The following are the configurations I have used in my multi-language projects.

MySQL UTF-8 Configuration

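# my.cnf
# (the [mysqld]/[client] grouping is an assumption; the listing shows no section headers)
[mysqld]
default_character_set = utf8
character_set_client  = utf8
character_set_server  = utf8

[client]
default_character_set = utf8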

PHP UTF-8 Configuration

default_charset = "utf-8"

Apache UTF-8 Configuration

AddDefaultCharset UTF-8
AddCharset UTF-8 .htm

HTML file UTF-8 Configuration

 <meta charset="utf-8">

PHP file UTF-8 Configuration

header('Content-type: text/html; charset=UTF-8');

MySQL connection (extra precaution)


Shell UTF-8

And last but not least, even editing files in the shell can be affected (e.g. UTF-8 data to be …

[Read more]
Find multi-byte characters in a table

Multi-byte characters can cause quite a few problems for the unsuspecting DBA or webmaster. Most of the time, all you need to do to figure out how to fix the problem is detect which database records have UTF-8 data in them. Scanning records manually is not an option. Try the following query to find strings with multi-byte [...]
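The query itself is cut off in this excerpt and is not reproduced here. The underlying idea can still be sketched in Python: a string contains multi-byte characters exactly when its UTF-8 byte length differs from its character count. The helper name `has_multibyte` and the sample rows are hypothetical. In MySQL the same comparison is commonly written with LENGTH() (bytes) and CHAR_LENGTH() (characters), though that may not be the query the post goes on to show.

```python
def has_multibyte(value):
    """True if value needs more bytes than characters when encoded as UTF-8."""
    return len(value.encode("utf-8")) != len(value)


# Hypothetical (id, text) rows standing in for database records.
rows = [
    (1, "plain ascii"),
    (2, "na\u00efve"),        # 'ï' takes two bytes
    (3, "price: 5\u20ac"),    # the euro sign takes three bytes
]

flagged = [row_id for row_id, text in rows if has_multibyte(text)]
print(flagged)
```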

Battling XHTML :: Storing UTF-8 data in MySQL

In the XML parser that I’ve been writing for RSS/Atom feeds I’ve encountered what many people have found: bizarre encoding issues when displaying the data from the database on a webpage. Since this is not really well explained by the searches I did on Google, I’ll explain it here.

Issue: you have UTF-8 data coming from a source, and you put it into a utf8_general_ci column of a MySQL database table. You read the data back from the database and display it as HTML/XHTML. Instead of getting things like double backquotes or long dashes, you get euro signs or umlaut-like characters, usually strings of them, in place of the correct text.
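The symptom described above can be reproduced in a few lines of Python. This is only a sketch under one assumption: that somewhere between the database and the page, the UTF-8 bytes are being read as Windows-1252 (a close relative of latin1). The excerpt's own diagnosis is truncated, so this may not be the exact cause the author found. The function names are hypothetical.

```python
def misread_as_cp1252(text):
    """Encode text as UTF-8, then wrongly decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Undo the misreading by reversing the two steps."""
    return garbled.encode("cp1252").decode("utf-8")


# Accented letters, a long dash and an opening curly quote.
original = "d\u00e9j\u00e0 vu \u2014 \u201cquoted"
garbled = misread_as_cp1252(original)

print(garbled)                      # strings of accented letters and euro signs
print(repair(garbled) == original)  # the damage is reversible in this case
```

Each multi-byte character turns into two or three unrelated characters, which matches the runs of euro signs and accented letters in the description.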

Potential solution: use utf8_encode and htmlentities in PHP to clean the data before it goes into the database. This does not work. Why? Those characters are not covered by HTML standards since they are above ASCII code 126. See here for the full code chart: …

[Read more]
French Characters Not Rendering Correctly

Background Knowledge

The MySQL database (v4.0.23) uses the default character set, “latin1”. When the database was created I had no knowledge of character sets; otherwise it would have been “UTF-8”.
The web pages use the “UTF-8” character set.


Data queried from a MySQL database that contains French accented characters does not render correctly in the browser, even after applying PHP htmlentities().

Example code: $string = htmlentities($string, ENT_QUOTES, "UTF-8");


The queried data was entered into the database using the “ISO-8859-1” character set. I …
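The mismatch described here can be illustrated with a short Python sketch. It assumes the stored bytes really are ISO-8859-1, as the excerpt states. The helper name `latin1_to_utf8` is hypothetical, and this is not necessarily the fix the post goes on to describe.

```python
def latin1_to_utf8(raw):
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Bytes as they would be stored for ISO-8859-1 input: 'ç' is the single byte 0xE7.
stored = "Fran\u00e7ais".encode("iso-8859-1")

try:
    # Treating the stored bytes as UTF-8 fails, because 0xE7 followed by 'a'
    # is not a valid UTF-8 sequence.
    stored.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(valid_utf8)
print(latin1_to_utf8(stored))
```

Once the bytes are converted, a page declared as UTF-8 can display them as intended.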

[Read more]