Planet MySQL
Showing entries 1 to 10 of 18 8 Older Entries

Displaying posts with tag: utf8

Importing the Unicode Character Database in MySQL

In Python it is easy to find out the name of a Unicode character and some of its properties. The module that does this is called unicodedata.

An example:

>>> import unicodedata
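The excerpt cuts off after the import. As a minimal sketch of the kind of lookup the module supports (the helper name `describe` and the sample characters are chosen here for illustration and are not from the original post):

```python
import unicodedata


def describe(ch):
    """Return the (name, general category) pair for a single character."""
    return unicodedata.name(ch), unicodedata.category(ch)


# Print the name and category of a few sample characters.
for ch in ("A", "\u00e9", "\u20ac"):
    print(repr(ch), describe(ch))
```

`unicodedata.name` raises `ValueError` for characters without a name unless a default is passed as a second argument.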

This module uses the data as released in the UnicodeData.txt file from the website.

So if UnicodeData.txt is a 'database', then we should be able to import it into MySQL and use it!
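To illustrate why the file can be treated as a table: each line of UnicodeData.txt holds 15 semicolon-separated fields. The sketch below parses one such line into a dictionary that could be fed to an INSERT statement. The column names in `FIELDS` and the helper `parse_line` are assumptions made for this example; they are not the schema or the script used in the original post.

```python
# Hypothetical column names for the 15 fields of a UnicodeData.txt line.
FIELDS = (
    "code_point", "name", "general_category", "combining_class",
    "bidi_class", "decomposition", "decimal_value", "digit_value",
    "numeric_value", "bidi_mirrored", "unicode_1_name", "iso_comment",
    "uppercase", "lowercase", "titlecase",
)


def parse_line(line):
    """Split one UnicodeData.txt line into a dict keyed by FIELDS."""
    values = line.rstrip("\n").split(";")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    row = dict(zip(FIELDS, values))
    # The code point is stored as hexadecimal text; convert it to an integer.
    row["code_point"] = int(row["code_point"], 16)
    return row


# The entry for U+0041 as it appears in UnicodeData.txt.
SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
print(parse_line(SAMPLE))
```

Empty fields are kept as empty strings here; a real import would probably map them to NULL.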

I wrote a small …

MySQL Character encoding – part 2

In MySQL Character encoding – part 1 we stated that the myriad of ways in which character encoding can be controlled can lead to many situations where your data may not be available as expected.

UTF8 was designed on a placemat in a New Jersey diner one night in September or so 1992.

Setting MySQL Client and Server Character encoding.

Let's restart MySQL with the correct setting for our purpose, UTF8. Here …

MySQL Character encoding – part 1

Breaking and unbreaking your data

Recently at FOSDEM, Maciej presented “Breaking and unbreaking your data”, a presentation about the potential problems you can incur regarding character encoding whilst working with MySQL. In short, there are a myriad of places where character encoding can be controlled, which gives ample opportunity for the system to break and for text to become unrecoverable.

The slides from the presentation are available on …

utf8 data on latin1 tables: converting to utf8 without downtime or double encoding

Here’s a problem some or most of us have encountered. You have a latin1 table defined like below, and your application is storing utf8 data to the column on a latin1 connection. Obviously, double encoding occurs. Now your development team has decided to use utf8 everywhere, but during the conversion you can afford little to no downtime while keeping your stored data valid.

CREATE TABLE `t` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c` text,
  PRIMARY KEY (`id`)
) DEFAULT CHARSET=latin1;
master> SET NAMES latin1;
master> INSERT INTO t (c) VALUES ('¡Celebración!'); …
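The effect of reading UTF-8 bytes as latin1 can be reproduced outside the database. The sketch below is an illustration only, not the conversion procedure from the original post. It uses Python's `latin-1` codec as a stand-in for MySQL's latin1, which is closer to Windows-1252; the two agree for the bytes in this sample. The function names are invented for the example.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Reverse the misreading: recover the original bytes and decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "\u00a1Celebraci\u00f3n!"    # '¡Celebración!'
garbled = misread_as_latin1(original)   # each non-ASCII character becomes two

# Converting the garbled text to UTF-8 again yields the double-encoded bytes.
double_encoded = garbled.encode("utf-8")

print(garbled)
print(len(original.encode("utf-8")), len(double_encoded))
print(repair(garbled) == original)
```

The repair step only works while the misread bytes are still intact, which is why the order of conversion steps matters.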
Using 4-byte UTF-8 (aka 3-byte UNICODE) in MariaDB and MySQL

As I wrote in a previous post, MariaDB / MySQL has some issues with its standard UTF-8 encoding. This UTF-8 encoding limits us to 3 UTF-8 bytes, or 2 UNICODE bytes if you want to look at it that way. This is slightly limiting, but for languages it is usually pretty much OK, although there are some little-used languages in the 3-byte UNICODE range. But in addition to languages, you will be missing symbols, such as smileys!
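The byte counts can be checked directly. As a small sketch (the helper names are made up for this example), the code below measures the UTF-8 length of a few characters and tests whether a string stays within the 3-byte range, that is, within code points up to U+FFFF:

```python
def utf8_length(ch):
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text):
    """True if every character encodes to at most 3 UTF-8 bytes (<= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# 'a', e with acute, the euro sign, and a smiley from outside the 3-byte range.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print("U+%04X" % ord(ch), utf8_length(ch))

print(fits_in_three_bytes("caf\u00e9"))
print(fits_in_three_bytes("caf\u00e9 \U0001F600"))
```

A check like `fits_in_three_bytes` could be used to find out whether existing data needs the 4-byte character set at all.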

Help is on the way though, in the utf8mb4 character set that is part of …

How MariaDB and MySQL makes life with UTF-8 a bit too easy. And how to fix it...

UNICODE is getting more and more traction, and most new applications, at least web applications, support UNICODE. I have written about UNICODE and related stuff before in Character sets, Collations, UTF-8 and all that, but before I go into some more specifics, and some issues and fixes, let me tell you about UNICODE, UTF-8 and how MySQL interprets it. See the blog post linked to above for more information on the surprisingly even more boring subject of Collations.

So, let's begin with UNICODE. UNICODE …

The Dangers in Changing Default Character Sets on Tables

The ALTER TABLE statement syntax is explained in the manual at:

To put it simply, there are two ways you can alter the table to use a new character set.


This will alter the table to use the new character set as the default, but as a safety mechanism, it will only change the default character set in the table definition. That is, existing character columns will each keep their old character set. …

Believe говорит по русский - (Believe's talking Russian)

It has been a very, very long working weekend for the technical team at Believe...
From MariaDB 5.2 to 5.5.28, a few QP regressions still need to be fixed.

From latin1 to utf8 very few indexes as …

That's not my name! A story about character sets

When computers were still using large black text-oriented screens or no screens at all, a computer only knew how to store a limited set of characters. Then it was normal to store a name with the more complicated characters replaced by more basic characters. The ASCII standard was used to make communication between multiple systems (or applications) easier. Storing characters as ASCII needs little space and is quite straightforward.

Then DOS used CP850 and CP437 and so on to make it …

Adding a case insensitive, distinct unicode collation

Every once in a while questions like the one in MySQL Bug #60843 or Bug #19567 come up:

What collation should I use if I want case insensitive behavior but also want all accented letters to be treated as distinct from their base letters?

or shorter, as the reporter of bug #60843 put it:

I need something like utf8_bin + ci

utf8_general_ci and utf8_unicode_ci unfortunately do not provide this behavior and utf8_bin is obviously not case insensitive.
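The requested behavior can be illustrated outside MySQL. The sketch below is a Python analogy, not a MySQL collation and not the solution from the original post. It builds two comparison keys: `ci_key` ignores case but keeps accents, which is the behavior asked for, and `ci_ai_key` also strips accents, roughly what the question wants to avoid. Both function names are invented for this example.

```python
import unicodedata


def ci_key(text):
    """Case insensitive, accent sensitive comparison key."""
    return unicodedata.normalize("NFC", text).casefold()


def ci_ai_key(text):
    """Case insensitive and accent insensitive key, shown for contrast."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (the accents) left over after decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(ci_key("\u00c9cole") == ci_key("\u00e9cole"))    # differ only in case
print(ci_key("ecole") == ci_key("\u00e9cole"))         # differ in accent
print(ci_ai_key("ecole") == ci_ai_key("\u00c9cole"))   # accents ignored
```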

…

Planet MySQL © 1995, 2015, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.