Showing entries 1 to 5
Displaying posts with tag: utf8mb4 (reset)
A Guide to Better Understanding MySQL Charset Levels

We usually receive and see some questions regarding the charset levels in MySQL, especially after the deprecation of utf8mb3 and the new default uf8mb4. If you understand how the charset works on MySQL but have some questions regarding this change, please check out Migrating to utf8mb4: Things to Consider by Sveta Smirnova.Some of the questions […]

Migrating to utf8mb4: Things to Consider

The utf8mb4 character set is the new default as of MySQL 8.0, and this change neither affects existing data nor forces any upgrades.

Migration to utf8mb4 has many advantages including:

  • It can store more symbols, including emojis
  • It has new collations for Asian languages
  • It is faster than utf8mb3

Still, you may wonder how migration affects your existing data. This blog covers multiple aspects of it.

Storage Requirements

As the name suggests, the maximum number of bytes that one character can take with character set utf8mb4 is four bytes. This is larger than the requirements for utf8mb3 which takes three bytes and many other MySQL character sets.

Fortunately, utf8mb3 is a subset of …

[Read more]
Troubleshooting Issues with MySQL Character Sets Q & A

In this blog, I will provide answers to the Q & A for the Troubleshooting Issues with MySQL Character Sets webinar.

First, I want to thank everybody for attending the March 9 MySQL character sets troubleshooting webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: We’ve had some issues converting tables from

utf8

  to

utf8mb4

. Our issue was that the collation we wanted to use –

[Read more]
MySQL 5.7, utf8mb4 and the load data infile

In this post, I’ll discuss how MySQL 5.7 handles UTF8MB4 and the load data infile.

Many of my clients have told me that they do not like using the LOAD DATA INFILE statement and prefer to manually parse and load the data. The main reason they do it is issues with the character sets, specifically UTF8MB4 and the load data infile. This was surprising to me as nowadays everyone uses UTF8. MySQL 5.7 (as well as 5.6) has full support for UTF8MB4, which should fix any remaining issues (i.e., you can now load new emoji, like ).

Last week I was investigating an interesting case where we were loading data and got the following error:

mysql -e 'select version()'
+-----------+
| version() |
+-----------+
| 5.7.12    |
+-----------+
$ mysql -vvv testdb < load_data.sql
ERROR 1300 (HY000) at line 1: Invalid …
[Read more]
Importing the Unicode Character Database in MySQL

In Python it is easily possible to findout the name of a Unicode character and findout some properties about that character. The module which does that is called unicodedata.

An example:

>>> import unicodedata
>>> unicodedata.name('☺')
'WHITE SMILING FACE'

This module uses the data as released in the UnicodeData.txt file from the unicode.org website.

So if UnicodeData.txt is a 'database', then we should be able to import it into MySQL and use it!

I wrote a small Python script to automate this. The basic steps are:

  • Download UnicodeData.txt
  • Create a unicodedata.ucd table
  • Use LOAD DATA LOCAL INFILE to load the data

This isn't difficult especially because the file doesn't have the actual characters in it. It is …

[Read more]
Showing entries 1 to 5