Showing entries 1 to 7
Displaying posts with tag: latin1 (reset)
MySQL Character encoding – part 2

In MySQL Character encoding – part 1 we stated that the myriad of ways in which character encoding can be controlled can lead to many situations where your data may not be available as expected.

UTF8 was designed on a placemat in a New Jersey diner one night in September or so 1992.

Setting MySQL Client and Server Character encoding.

Lets restart MySQL with the correct setting for our purpose, UTF8. Here we can see the setting in the MySQL configuration file, in this case /etc/mysql/my.cnf.

character-set-server = utf8

This change is then reflected in the session and global variables once the instance is restarted with the new configuration parameter.

mysql> SELECT …
[Read more]
MySQL Character encoding – part 1

Breaking and unbreaking your data

Recently at FOSDEM, Maciej presented “Breaking and unbreaking your data”, a presentation about the potential problems you can incur regarding character encoding whilst working with MySQL. In short, there are a myriad of places where character encoding can be controlled, which gives ample opportunity for the system to break and for text to become unrecoverable.


The slides from the presentation are available on slideshare.

Character Encoding – MySQL DevRoom – FOSDEM 2015 from

[Read more]
That's not my name! A story about character sets

When computers were still using large black text oriented screens or no screens at all, a computer only knew how to store a limited set of characters. Then it was normal to store a name with the more complicated characters replaced by more basic characters. The ASCII standard was used to make communication between multiple systems (or applications) easier. Storing characters as ASCII needs little space and is quite strait forward.

Then DOS used CP850 and CP437 and so on to make it possible to use language /location specific characters.
Then ISO8859-1, ISO8859-15 and more of these character sets were defined as standard.

And now there is Unicode: UTF-8, UTF-16, UCS2, etc. which allow you to …

[Read more]
Migrating MySQL latin1 to utf8 – The process

Having covered the preparation and character set options of performing a latin1 to utf8 MySQL migration, just how do you perform the migration correctly.

Example Case

Just to recap, we have the following example table and data.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 …
[Read more]
Charset support in MySQL is really not all that complex

The headline is flame-bait, don’t take it. I just wanted to point something out about character sets and collations in MySQL.

To the uninitiated, it may seem overwhelming. Everything has a character set! Everything has a collation! And they act weirdly! The server has one. The database has one (oh, and it changes magically as I USE different databases.) Every table has one, and columns too. Is that all? NO! My connection has one! Kill me now!

Relax. In truth, only one kind of thing actually has a charset/collation. That is values. And values are stored in columns. The only thing that really has a charset/collation is a column.[1]

What about all the rest of those things — connection, database, server, table? Those are just defaults, which determine what charset/collation a value gets if it isn’t overridden. So if the table’s default charset is utf8, and you add a column without saying what …

[Read more]
Migrating MySQL latin1 to utf8 – Character Set Options

Continuing on from preparation in our MySQL latin1 to utf8 migration let us first understand where MySQL uses character sets. MySQL defines the character set at 4 different levels for the structure of data.

  • Instance
  • Schema
  • Table
  • Column

In MySQL 5.1, the default character set is latin1. If not specified, this is what you will get. For example.

mysql> create table test1(c1 varchar(10) not null);
mysql> show create table test1\G
Create Table: CREATE TABLE `test1` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

If you want all tables in your instance to always be a default of utf8, you can changed the server variable character_set_server. This can be set dynamically.

mysql> set global …
[Read more]
Migrating MySQL latin1 to utf8 – Preparation

Before undertaking such migration the first step is a lesson in understanding more about how latin1 and utf8 work and interact in MySQL. latin1 in a common and historical character set used in MySQL. utf8 (first available in MySQL Version 4.1) is an encoding supporting multiple bytes and is the system default in MySQL 5.0

  • latin1 is a single byte character set.
  • utf8 is a 1-3 byte character set depending on the size of the character. NOTE: MySQL utf8 does not support the RFC 3629 4 byte sequences

MySQL variables

MySQL has a number of different system variables to consider, the following is the default representation in MySQL 5.1

mysql> show global variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value …
[Read more]
Showing entries 1 to 7