Showing entries 1 to 8
Displaying posts with tag: Character Sets
A Guide to Better Understanding MySQL Charset Levels

We often receive questions about the charset levels in MySQL, especially since the deprecation of utf8mb3 and the switch to utf8mb4 as the new default. If you understand how charsets work in MySQL but have questions about this change, please check out Migrating to utf8mb4: Things to Consider by Sveta Smirnova. Some of the […]
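To make the idea of charset levels concrete, here is a minimal Python sketch (not MySQL code) of how a column's character set falls back through the levels when it is not set explicitly: column, then table, then database, then server. The function name `effective_charset` is hypothetical. The sketch models only the defaults applied when objects are created; it ignores the connection-level variables such as character_set_client, and it ignores the case where only a COLLATE clause is given.

```python
from typing import Optional


def effective_charset(
    server: str,
    database: Optional[str] = None,
    table: Optional[str] = None,
    column: Optional[str] = None,
) -> str:
    """Hypothetical helper: the most specific explicit setting wins.

    Order checked: column, then table, then database; if none is set,
    the server default applies.
    """
    for level in (column, table, database):
        if level is not None:
            return level
    return server


# MySQL 8.0 uses utf8mb4 as the server default character set.
print(effective_charset("utf8mb4"))                                      # utf8mb4
print(effective_charset("utf8mb4", database="latin1"))                   # latin1
print(effective_charset("utf8mb4", database="latin1", column="utf8mb3"))  # utf8mb3
```

In MySQL itself the fallback happens once, at CREATE time: changing a database default later does not rewrite the character set of tables that already exist.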

Understanding How MySQL Collation and Charset Settings Impact Performance

This blog was originally published in February 2019 and was updated in September 2023.

Web applications rely on databases to power everything from e-commerce platforms to social media networks to streaming services. MySQL is one of the most popular database management systems and plays a pivotal role in the functionality and performance of web applications.

In today’s blog, I’ll take a look at MySQL collation and charset settings to shed light on how they impact the performance of web applications and how to use them to effectively communicate with your users.

Understanding Character Sets and Encoding in MySQL

Character sets and encoding in MySQL play a vital role in how data is stored and retrieved in a database. A character set is a collection of characters with unique representations for each character, such as letters, numbers, and symbols, that define how data is …

[Read more]
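The distinction between a character and its encoded bytes is easy to see in a few lines of Python. This sketch uses Python's built-in UTF-8 codec, the encoding that MySQL's utf8mb4 implements; the helper name `utf8_width` is only for illustration. Each character has a single code point, but UTF-8 stores it in one to four bytes.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 needs to store a single character."""
    return len(ch.encode("utf-8"))


# A Latin letter, a letter with a diacritic, a CJK character, and an emoji.
for ch in ["A", "é", "中", "😀"]:
    print(f"{ch!r}  U+{ord(ch):04X}  {utf8_width(ch)} byte(s)  {ch.encode('utf-8').hex()}")
```

The four samples take 1, 2, 3, and 4 bytes respectively. Only the last one needs more than the three bytes that utf8mb3 allows.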
Migrating to utf8mb4: Things to Consider

The utf8mb4 character set is the new default as of MySQL 8.0, and this change neither affects existing data nor forces any upgrades.

Migration to utf8mb4 has many advantages including:

  • It can store more symbols, including emojis
  • It has new collations for Asian languages
  • It is faster than utf8mb3

Still, you may wonder how migration affects your existing data. This blog covers multiple aspects of it.

Storage Requirements

As the name suggests, a character in the utf8mb4 character set can take at most four bytes. That is more than utf8mb3, which needs at most three bytes per character, and more than many other MySQL character sets require.

Fortunately, utf8mb3 is a subset of …

[Read more]
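The extra byte per character matters most for indexes. The Python sketch below uses hypothetical helper names to compute the worst-case size of an index on a VARCHAR column under each character set. It then compares that size with InnoDB's index key prefix limits: 767 bytes for the REDUNDANT and COMPACT row formats, and 3072 bytes for DYNAMIC and COMPRESSED. It assumes every character takes the maximum width and ignores length bytes.

```python
# Worst-case bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}

# InnoDB index key prefix limits in bytes.
LIMIT_COMPACT = 767   # REDUNDANT / COMPACT row formats
LIMIT_DYNAMIC = 3072  # DYNAMIC / COMPRESSED row formats


def worst_case_key_bytes(charset: str, char_length: int) -> int:
    """Bytes an index on a VARCHAR(char_length) column may need in the worst case."""
    return MAX_BYTES_PER_CHAR[charset] * char_length


def fits_index(charset: str, char_length: int, limit: int = LIMIT_COMPACT) -> bool:
    """True if a full-column index stays within the given prefix limit."""
    return worst_case_key_bytes(charset, char_length) <= limit


def longest_indexable(charset: str, limit: int = LIMIT_COMPACT) -> int:
    """Longest VARCHAR length whose full-column index stays within the limit."""
    return limit // MAX_BYTES_PER_CHAR[charset]


for cs in ("utf8mb3", "utf8mb4"):
    print(cs, worst_case_key_bytes(cs, 255), fits_index(cs, 255), longest_indexable(cs))
```

An index on VARCHAR(255) needs 765 bytes under utf8mb3, which fits within 767. Under utf8mb4 the same index needs 1020 bytes and does not fit. This is why VARCHAR(191) appears so often in utf8mb4 schemas that use the older row formats.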
Replicating from MySQL 8.0 to MySQL 5.7

In this blog post, we'll discuss how to set up replication from MySQL 8.0 to MySQL 5.7. There are situations in which this configuration can help. For example, during a MySQL upgrade, it can be useful to have a master running the newer MySQL version replicate to a slave running the older version, as a rollback plan. Another example is upgrading a master-master replication topology.

Officially, replication is only supported between consecutive major MySQL versions, and only from a lower version master to a higher version slave. Here is an example of a supported scenario:

5.7 master –> 8.0 slave

while the opposite is not supported:

8.0 master –> 5.7 slave

In this blog post, I’ll walk through how to overcome the …

[Read more]
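The official support rule quoted in this excerpt is simple enough to state as code. The helper below is a hypothetical Python illustration, not anything shipped with MySQL. Its SERIES list covers only the versions discussed here, and it treats replication within the same series as the trivially supported case.

```python
# Release series covered by this post, oldest first; 8.0 follows 5.7.
SERIES = ["5.6", "5.7", "8.0"]


def officially_supported(master: str, slave: str) -> bool:
    """Same series, or a slave exactly one series newer than its master."""
    m, s = SERIES.index(master), SERIES.index(slave)
    return s == m or s == m + 1


for master, slave in [("5.7", "8.0"), ("8.0", "5.7"), ("5.6", "8.0")]:
    print(f"{master} master -> {slave} slave: {officially_supported(master, slave)}")
```

Only the first pair is officially supported. The 8.0 to 5.7 direction is the unsupported setup this post works around, and 5.6 to 8.0 skips a series.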
Character Sets: Migrating to utf8mb4 with pt-online-schema-change

Modern applications often feature the use of data in many different languages. This is often true even of applications that only offer a user-facing interface in a single language. Many users may, for example, need to enter names which, although using Latin characters, feature diacritics; in other cases, they may need to enter text which contains Chinese or Japanese characters. Even if a user is capable of using an application localized for only one language, it may be necessary to deal with data from a wide variety of languages.

Additionally, increased use of mobile phones has led to changes in communication behaviour; this includes a vastly increased use of standardized characters intended to convey emotions, often called “emojis” or “emoticons.” Originally, such information was conveyed using ASCII text, such as “:-)” to indicate happiness, but, as noted, this has changed, with many devices automatically converting such …

[Read more]
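Before migrating, it can help to know which existing values actually require utf8mb4. In the minimal Python sketch below, the function name and the sample values are hypothetical; in practice the values would come from a query against your own table. Names with diacritics and Chinese or Japanese text fit in utf8mb3. Characters outside the Basic Multilingual Plane, such as most emojis, do not.

```python
from typing import List


def chars_needing_utf8mb4(text: str) -> List[str]:
    """Return the characters in text that utf8mb3 cannot store.

    utf8mb3 uses at most three bytes per character, which covers only the
    Basic Multilingual Plane (code points up to U+FFFF). Anything above that,
    including most emojis, needs utf8mb4.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Hypothetical column values: diacritics, CJK text, an ASCII emoticon, an emoji.
values = ["José", "東京", "Happy :-)", "Happy 😀"]
for value in values:
    print(repr(value), chars_needing_utf8mb4(value))
```

Only the last value reports a character, 😀 (U+1F600). The ASCII emoticon and the accented and CJK characters are all storable in utf8mb3.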
Troubleshooting Issues with MySQL Character Sets Q & A

In this blog, I will answer the questions from the Troubleshooting Issues with MySQL Character Sets webinar.

First, I want to thank everybody for attending the March 9 MySQL character sets troubleshooting webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: We’ve had some issues converting tables from utf8 to utf8mb4. Our issue was that the collation we wanted to use –

[Read more]
DBJ – MySQL Character Sets

In our latest article at Database Journal, we talk about character sets in MySQL. What are they? How do they affect searching? How do they affect data that is inserted or updated? How can I set and control them for an application or globally in my database? And what, pray tell, is collation? We answer all these questions and more.

Database Journal – Understanding MySQL Character Sets

JOIN Performance & Charsets

We have written before about the importance of using numeric types as keys, but maybe you've inherited a schema that you can't change or have chosen string types as keys for a specific reason. Either way, the character sets used on joined columns can have a significant impact on the performance of your queries.

Take the following example, using the InnoDB storage engine:

    CREATE TABLE `t1` (
      `char_id` char(6) NOT NULL,
      `v` varchar(128) NOT NULL,
      PRIMARY KEY (`char_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE `t2` (
      `id` int UNSIGNED NOT NULL AUTO_INCREMENT,
[Read more]
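The excerpt cuts off before the second table is complete, but the general point still applies. When joined columns use different character sets or collations, MySQL has to convert values before comparing them, and that conversion can keep it from using the index on the converted side. The Python sketch below illustrates one way to catch this. It checks join column pairs against metadata shaped like rows from information_schema.COLUMNS (TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME). The rows are invented: the t2 column name and its utf8mb4 character set are hypothetical, because the original definition of t2 is truncated.

```python
from typing import Dict, List, Tuple

Column = Dict[str, str]

# Hypothetical metadata shaped like rows from information_schema.COLUMNS.
COLUMNS: List[Column] = [
    {"TABLE_NAME": "t1", "COLUMN_NAME": "char_id",
     "CHARACTER_SET_NAME": "utf8", "COLLATION_NAME": "utf8_general_ci"},
    {"TABLE_NAME": "t2", "COLUMN_NAME": "char_id",
     "CHARACTER_SET_NAME": "utf8mb4", "COLLATION_NAME": "utf8mb4_general_ci"},
]


def lookup(table: str, column: str) -> Column:
    """Find the metadata row for table.column."""
    for row in COLUMNS:
        if row["TABLE_NAME"] == table and row["COLUMN_NAME"] == column:
            return row
    raise KeyError(f"{table}.{column}")


def join_mismatches(pairs: List[Tuple[str, str, str, str]]) -> List[str]:
    """Report join column pairs whose character set or collation differ."""
    problems = []
    for lt, lc, rt, rc in pairs:
        left, right = lookup(lt, lc), lookup(rt, rc)
        for attr in ("CHARACTER_SET_NAME", "COLLATION_NAME"):
            if left[attr] != right[attr]:
                problems.append(
                    f"{lt}.{lc} ({left[attr]}) vs {rt}.{rc} ({right[attr]}): {attr} differs"
                )
    return problems


for msg in join_mismatches([("t1", "char_id", "t2", "char_id")]):
    print(msg)
```

Here both the character set and the collation differ, so the check reports two lines. Aligning the column definitions lets the join compare values directly.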