Planet MySQL

Displaying posts with tag: utf8

Mar

2022

Migrating to utf8mb4: Things to Consider

Posted by Sveta Smirnova of MySQL Performance Blog on Tue 29 Mar 2022 12:00 UTC
Tags:

utf8, Character Sets, unicode, Insight for DBAs, Insight for Developers, MySQL, utf8mb4, mysql-and-variants, MySQL character se

The utf8mb4 character set is the new default as of MySQL 8.0, and this change neither affects existing data nor forces any upgrades.

Migration to utf8mb4 has many advantages including:

It can store more symbols, including emojis
It has new collations for Asian languages
It is faster than utf8mb3

Still, you may wonder how migration affects your existing data. This blog covers multiple aspects of it.

Storage Requirements

As the name suggests, one character in the utf8mb4 character set can take up to four bytes. This is more than utf8mb3, which takes at most three bytes per character, and more than many other MySQL character sets.
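
To make the storage arithmetic concrete, here is a minimal Python sketch (not from the original post) that measures how many bytes a few sample characters need in UTF-8, the encoding behind utf8mb4. The sample characters and the helper name utf8_len are illustrative choices.

```python
# Sample characters chosen to need 1, 2, 3 and 4 bytes in UTF-8 (illustrative only).
samples = ["a", "é", "€", "😀"]


def utf8_len(ch: str) -> int:
    """Return the number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # Print the code point rather than the character to stay terminal-friendly.
    print(f"U+{ord(ch):04X} takes {utf8_len(ch)} byte(s)")
```

Only the last sample needs four bytes, so it is the only one of the four that a three-byte character set such as utf8mb3 cannot hold.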

Fortunately, utf8mb3 is a subset of …

[Read more]

Apr

2018

MySQL Performance : 8.0 and UTF8 impact

Posted by Dimitri Kravtchuk on Wed 25 Apr 2018 22:58 UTC
Tags:

utf8, MySQL, Performance

The world is moving to UTF8, and MySQL 8.0 now has utf8mb4 as its default charset. To be honest, though, I was pretty surprised by how sensitive the "charset" topic can be: you may easily hit a huge performance overhead just by using "odd" config settings around your client/server charset and collation. To avoid any potential charset mismatch between client and server, MySQL has long had an excellent option, "skip-character-set-client-handshake", which forces every client connection to be "aligned" with the server settings (for more details see the ref. manual: https://dev.mysql.com/doc/refman/8.0/en/server-options.html#option_mysqld_character-set-client-handshake). This option is NOT set by default, to leave you free to choose the charsets used on the client and server sides. However, in my …

[Read more]

Jan

2017

Sushi = Beer ?! An introduction of UTF8 support in MySQL 8.0

Posted by MySQL Server Dev Team on Fri 13 Jan 2017 14:16 UTC
Tags:

collation, utf8, upgrades, cjk, MySQL, mysql8.0

In MySQL 8.0 our plan is to drastically improve support for utf8. While utf8 support itself dates back to MySQL 4.1, there exist some limitations. The “sushi = beer” problem in the title refers to Bug #76553. Sushi and beer don’t even go well together, at least not to my taste:-) I will use this bug as an example to explain issues with utf8 collations in the past and our plans for utf8 support going forward.…

Sep

2015

Importing the Unicode Character Database in MySQL

Posted by Daniel van Eeden on Mon 07 Sep 2015 07:28 UTC
Tags:

utf8, unicode, MySQL, utf8mb4

In Python it is easy to find out the name of a Unicode character and some of its properties. The module that does this is called unicodedata.

An example:

>>> import unicodedata
>>> unicodedata.name('☺')
'WHITE SMILING FACE'

This module uses the data as released in the UnicodeData.txt file from the unicode.org website.

So if UnicodeData.txt is a 'database', then we should be able to import it into MySQL and use it!

I wrote a small Python script to automate this. The basic steps are:

Download UnicodeData.txt
Create a unicodedata.ucd table
Use LOAD DATA LOCAL INFILE to load the data
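
As a rough illustration of what each loaded row contains, the sketch below parses one record in the semicolon-separated layout of UnicodeData.txt. It is not the author's script: the sample line is typed in by hand, and the function name and dictionary keys are hypothetical.

```python
import unicodedata

# Hand-written sample record in the semicolon-separated layout of UnicodeData.txt.
SAMPLE_LINE = "263A;WHITE SMILING FACE;So;0;ON;;;;;N;;;;;"


def parse_ucd_line(line: str) -> dict:
    """Split one record and keep only its first three fields."""
    fields = line.rstrip("\n").split(";")
    return {
        "code_point": int(fields[0], 16),  # hexadecimal string -> integer
        "name": fields[1],
        "category": fields[2],
    }


row = parse_ucd_line(SAMPLE_LINE)
# Cross-check the parsed name against Python's own unicodedata module.
assert unicodedata.name(chr(row["code_point"])) == row["name"]
print(row)
```

Because each record stores the code point as a hexadecimal string rather than the character itself, the file is plain ASCII and can be loaded without any character set conversion worries.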

This isn't difficult, especially because the file doesn't have the actual characters in it. It is …

[Read more]

Mar

2015

MySQL Character encoding – part 2

Posted by dba square on Tue 03 Mar 2015 09:46 UTC
Tags:

conference, Development, configuration, character encoding, utf8, FOSDEM, schema, latin1, MySQL, Managing MySQL, Development with MySQL, Varia

In MySQL Character encoding – part 1 we stated that the myriad of ways in which character encoding can be controlled can lead to many situations where your data may not be available as expected.

UTF8 was designed on a placemat in a New Jersey diner one night in September or so 1992.

Setting MySQL Client and Server Character encoding.

Let's restart MySQL with the correct setting for our purpose, UTF8. Here we can see the setting in the MySQL configuration file, in this case /etc/mysql/my.cnf.

character-set-server = utf8

This change is then reflected in the session and global variables once the instance is restarted with the new configuration parameter.

mysql> SELECT …

[Read more]

Feb

2015

MySQL Character encoding – part 1

Posted by dba square on Thu 12 Feb 2015 17:38 UTC
Tags:

conference, character encoding, utf8, FOSDEM, latin1, MySQL, Managing MySQL, Development with MySQL, Varia

Breaking and unbreaking your data

Recently at FOSDEM, Maciej presented “Breaking and unbreaking your data”, a presentation about the potential problems you can incur regarding character encoding whilst working with MySQL. In short, there are a myriad of places where character encoding can be controlled, which gives ample opportunity for the system to break and for text to become unrecoverable.

The slides from the presentation are available on slideshare.

Character Encoding – MySQL DevRoom – FOSDEM 2015 from …

[Read more]

Oct

2013

utf8 data on latin1 tables: converting to utf8 without downtime or double encoding

Posted by MySQL Performance Blog on Wed 16 Oct 2013 05:00 UTC
Tags:

utf8, Insight for DBAs, MySQL, latin1 tables, utf8 horror stories

Here’s a problem some or most of us have encountered. You have a latin1 table defined like the one below, and your application is storing utf8 data in the column over a latin1 connection. Obviously, double encoding occurs. Now your development team has decided to use utf8 everywhere, but during the process you can afford little to no downtime while keeping your stored data valid.

CREATE TABLE `t` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
master> SET NAMES latin1;
master> INSERT INTO t (c) VALUES ('¡Celebración!');
master> SELECT id, c, HEX(c) FROM t;
+----+-----------------+--------------------------------+
| id | c               | HEX(c)                         |
+----+-----------------+--------------------------------+
|  3 | ¡Celebración!   | C2A143656C656272616369C3B36E21 |
+----+-----------------+--------------------------------+
1 row in set (0.00 sec)
master> SET …
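
The byte-level effect can be reproduced outside the database. The following Python sketch is an illustration rather than part of the original post: it shows the bytes that end up stored, what a naive latin1-to-utf8 conversion would do to them, and how the extra encoding step can be reversed. It assumes Python's latin-1 codec as a stand-in for MySQL's latin1, which agrees with it for the bytes used here.

```python
text = "¡Celebración!"

# The application sends UTF-8 bytes; a latin1 connection stores them unchanged.
stored = text.encode("utf-8")
print(stored.hex().upper())  # C2A143656C656272616369C3B36E21

# A naive charset conversion treats every stored byte as one latin1 character
# and encodes it again, which produces double-encoded data.
double_encoded = stored.decode("latin-1").encode("utf-8")
print(double_encoded.hex().upper())

# Undoing the extra step recovers the original text.
recovered = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(recovered == text)
```

The first hex string matches the HEX(c) output in the table above, which confirms that the column holds raw UTF-8 bytes under a latin1 label.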

[Read more]

Sep

2013

Using 4-byte UTF-8 (aka 3-byte UNICODE) in MariaDB and MySQL

Posted by Anders Karlsson on Fri 27 Sep 2013 12:51 UTC
Tags:

utf8, unicode, mariadb, MySQL

As I wrote in a previous post, MariaDB / MySQL has some issues with the standard UTF-8 encoding. Their utf8 character set limits us to 3 UTF-8 bytes, or 2 UNICODE bytes if you want to look at it that way. This is slightly limiting, but for languages it is usually pretty much OK, although there are some little-used languages in the 3-byte UNICODE range. But in addition to languages, you will be missing symbols, such as smileys!

Help is on the way though, in the utf8mb4 character set that is part of both MariaDB and MySQL. This character set is just like the one called simply utf8, except that it accepts all the UNICODE characters with up to 3 UNICODE bytes, or 4 bytes using the UTF-8 encoding.
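
One way to tell whether a given string would need utf8mb4 is to look for code points above U+FFFF, which are exactly the ones that take 4 bytes in UTF-8. The helper below is a minimal sketch of that check; the function name needs_utf8mb4 is made up for this example.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character has a code point above U+FFFF (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("Celebración €"))     # characters of at most 3 bytes each
print(needs_utf8mb4("smile \U0001F600"))  # contains a 4-byte smiley
```

A check like this could be run over application data before deciding whether the 3-byte utf8 character set is enough.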

This means that there are more limits to how long a column might be when using utf8mb4 compared …

[Read more]

Sep

2013

How MariaDB and MySQL makes life with UTF-8 a bit too easy. And how to fix it...

Posted by Anders Karlsson on Fri 27 Sep 2013 08:17 UTC
Tags:

utf8, unicode, MySQL

UNICODE is getting more and more traction, and most new applications, at least web applications, support UNICODE. I have written about UNICODE and related stuff before in Character sets, Collations, UTF-8 and all that, but before I go into some more specifics, and some issues and fixes, let me tell you about UNICODE, UTF-8 and how MySQL interprets it. See the blog post linked above for more information on a subject that is, surprisingly, even more boring: collations.

So, let's begin with UNICODE. UNICODE is a character set that is very complete; you should be able to make yourself understood in any language using the characters from this vast character set. This is not to say that all characters from all languages are in UNICODE; some are missing here and there, and sometimes new characters make their way into …

[Read more]

May

2013

The Dangers in Changing Default Character Sets on Tables

Posted by Jonathon Coombes on Tue 28 May 2013 00:42 UTC
Tags:

utf8, alter, character, sets, table, MySQL

The ALTER TABLE statement syntax is explained in the manual at:

http://dev.mysql.com/doc/refman/5.6/en/alter-table.html

To put it simply, there are two ways you can alter the table to use a new character set.

1. ALTER TABLE tablename DEFAULT CHARACTER SET utf8;

This will alter the table to use the new character set as the default, but as a safety mechanism, it will only change the default character set in the table definition. That is, existing character columns keep their old character set. For example:

mysql> create table mybig5 (id int not null auto_increment primary key,
-> subject varchar(100) ) engine=innodb default charset big5;
Query OK, 0 rows affected (0.81 sec)

mysql> show create table …

[Read more]
