Background Knowledge
Using the character set UTF-8 allows for the use of any language, can represent every character in the Unicode character set and is backward compatibility with ASCII. Not to mention is can handle any platform and be sent through many different systems without corruption. With such advantages this is why so many are making the switch.
The following instructions were done on Debian Squeeze v6.04 AMD64 operating system using MySQL v14.14 Distrib 5.1.61.
Solution – Server Configuration
At present MySQL is configured by default to use “latin1″ character set. Here’s how to change MySQL configuration to use UTF-8 character set and collation.
-
Check MySQL’s current configuration, run the following two SQL statements.
1 2
SHOW VARIABLES LIKE '%collation%'; SHOW VARIABLES LIKE '%char%';
Results of SHOW VARIABLES LIKE ‘%collation%’
+----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+
Results of SHOW VARIABLES LIKE ‘%char%’
+--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+
- Edit the MySQL configuration file, “my.cnf” or add an
addition config file (e.g.
/etc/mysql/conf.d/char_collation_set.cnf).
To locate the “my.cnf” file on Linux can be done by using the shell/terminal command “locate my.cnf” without double quotes.
If locate is not available, you can use “find / -name ‘my.cnf’” to locate the my.cnf.
Make sure to replace any references to “latin1″.
Example of additional configuration changes.
Note: The following settings set the character set and collation for the database, server and database connections only.
[mysqld] default-character-set=utf8 default-collation=utf8_general_ci character-set-server=utf8 collation-server=utf8_general_ci init-connect='SET NAMES utf8' character-set-filesystem=utf8 [client] default-character-set=utf8
- Restart the MySQL server (daemon). This is done by running at the shell/terminal “/etc/init.d/mysql stop” and then /etc/init.d/mysql start”.
- Check the settings by doing step 1 again.
Solution – Creating Tables with Character Set and Collation
CREATE TABLE `content` ( `id` int(11) NOT NULL auto_increment, `language` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `title` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL default '', PRIMARY KEY (`id`) ) ENGINE=MyISAM CHARACTER SET utf8 COLLATE utf8_general_ci;
Solution via PHP PDO UTF-8 Connection to MySQL
The following is an example of connecting to MySQL using PHP PDO. The UTF-8 character set is enforced for the connection within the connection options using the SQL statement “SET NAMES utf8″.
1 2 3 4 5 |
$szMysqlDSN = 'mysql:host=localhost;dbname=DB_NAME_HERE'; $szMysqlDBUser = 'username_here'; $szMysqlDBPassword = 'password_here'; $aOptions = array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"); $obMysqlDBHandle = new PDO($mysqlDSN, $mysqlDBUser, $mysqlDBPassword, $aOptions); |
Solution via PHP PDO Import Data from Latin1 to UTF-8
MySQL can convert from one character set to another. In order to do this you must know what character set your converting from. The character set to convert from is set by “CHARACTER SET ‘latin1′”. To see what character sets are supported, run the following SQL statement “SHOW CHARACTER SET;”.
Note: Make sure the database and table have been correctly set to the desired character set to convert to.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
LOAD DATA INFILE '/var/tmp/TEST.TXT' INTO TABLE programs CHARACTER SET 'latin1' FIELDS TERMINATED BY '|' LINES TERMINATED BY '\r\n' (`parent_id`, @`languages_id`, @`countries_id`, @`programs_types_id`, `origin_airdate`, `theatrical_run`, `title10`, `title15`, `title25`, `title50`, `title80`) SET created = now(), languages_id = (SELECT id FROM languages WHERE english_name=@languages_id), countries_id = (SELECT id FROM countries WHERE fyi_code=@countries_id), |
Solution – How to Check Database Table/Table Columns Character Set and Collation.
Keep in mind that you can have database server, database table and database table columns character sets and collations. All can be set differently. Make sure all are set as desired.
How to check the database table and database table columns. Replace “tblname” with desired table name.
1 |
SHOW CREATE TABLE tblname; |
Solution – How To View UTF-8 Characters
To ensure you can view all characters, you must ensure their MySQL client and/or SSH client is set to render the corresponding character set.
Using MySQL Workbench, make sure to use SQL “SET NAMES utf8;” before each query.
Using a SSH client such as PuTTY or KiTTY, set the session under “Window” -> “Translation” character set” to “UTF-8″.
Source: Configuring MySQL To Use UTF-8
Source: MySQL 5.1 Reference Manual :: 5.1.3. Server System
Variables
Source: Getting out of MySQL Character Set Hell
Source: Convert latin1 to UTF-8 in MySQL
Source: MySQL 5.1 Reference Manual 10.1.13.1. Unicode
Character Sets
Source: MySQL 5.1 Reference Manual 13.2.6. LOAD DATA INFILE
Syntax