Showing entries 1 to 14

Displaying posts with tag: utf8

utf8 data on latin1 tables: converting to utf8 without downtime or double encoding

Here’s a problem some or most of us have encountered. You have a latin1 table defined like the one below, and your application is storing utf8 data in the column over a latin1 connection. Obviously, double encoding occurs. Now your development team has decided to use utf8 everywhere, but during the process you can afford little to no downtime and must keep your stored data valid.

CREATE TABLE `t` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
master> SET NAMES latin1;
master> INSERT INTO t (c) VALUES ('¡Celebración!');
master> SELECT id, c, HEX(c) FROM t;
+----+-----------------+--------------------------------+
| id | c               | HEX(c)                         |
+----+-----------------+--------------------------------+
|
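The double encoding described above can be reproduced outside MySQL. What follows is a minimal Python sketch, not MySQL code, and it assumes Python's latin-1 codec as a stand-in for MySQL's latin1 (MySQL actually implements latin1 as cp1252, but the two agree on every byte in this example). The helper names are hypothetical. It shows UTF-8 bytes arriving over a latin1 connection, the extra layer added when those latin1 characters are converted to utf8, and how decoding once, mapping back to the raw bytes and decoding again recovers the original string.

```python
# Minimal sketch of utf8 data stored through a latin1 connection.
# Assumption: Python's "latin-1" codec stands in for MySQL's latin1 (really cp1252);
# both map the bytes used here (ASCII, 0xA1, 0xB3, 0xC2, 0xC3) identically.


def store_over_latin1(text: str) -> str:
    """Hypothetical helper: the client sends UTF-8 bytes, the latin1 connection
    labels them latin1, so every byte is kept as a separate latin1 character."""
    return text.encode("utf-8").decode("latin-1")


def naive_convert_to_utf8(latin1_value: str) -> bytes:
    """Hypothetical helper: re-encode each latin1 character as UTF-8, which is
    what a plain character-set conversion of the column does."""
    return latin1_value.encode("utf-8")


def repair(double_encoded: bytes) -> str:
    """Undo one layer: decode as UTF-8, map back to the raw bytes, decode again."""
    return double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")


original = "¡Celebración!"
stored = store_over_latin1(original)
converted = naive_convert_to_utf8(stored)

print(original.encode("utf-8").hex().upper())  # C2A143656C656272616369C3B36E21
print(stored)                                   # Â¡CelebraciÃ³n!
print(converted.hex().upper())                  # C382C2A143656C656272616369C383C2B36E21
print(repair(converted) == original)            # True
```

The same idea, treating the stored bytes as raw bytes rather than as latin1 characters, is behind the common MySQL technique of changing the column to a binary type (such as BLOB) before changing it to a utf8 text type, so that the bytes are relabeled rather than converted.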
Using 4-byte UTF-8 (aka 3-byte UNICODE) in MariaDB and MySQL
As I wrote in a previous post, MariaDB and MySQL have some issues with the standard UTF-8 encoding. Their UTF-8 encoding limits us to 3 UTF-8 bytes, or 2 UNICODE bytes if you want to look at it that way. This is slightly limiting, but for languages it is usually pretty much OK, although there are some little-used languages in the 3-byte UNICODE range. But in addition to languages, you will be missing symbols, such as smileys!

Help is on the way though, in the utf8mb4 character set that is part of both MariaDB and MySQL. This is a character set just like the one simply called utf8, except that it accepts all UNICODE characters of up to 3 UNICODE bytes, or 4 bytes in the UTF-8 encoding.

This means

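To make these byte limits concrete, here is a small Python sketch (an illustration, not MySQL code) that reports how many UTF-8 bytes each character needs and whether a string would fit MySQL's 3-byte utf8, which covers only code points up to U+FFFF. The function names are hypothetical; the 3-byte rule follows from the description above.

```python
def utf8_byte_lengths(text: str) -> list[tuple[str, int]]:
    """Return (character, number of UTF-8 bytes) for each character in text."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes (U+0000..U+FFFF).
    Anything beyond that needs utf8mb4."""
    return all(ord(ch) <= 0xFFFF for ch in text)


sample = "aé€😀"
for ch, n in utf8_byte_lengths(sample):
    print(f"U+{ord(ch):04X} {n} byte(s)")
# U+0061 1 byte(s)
# U+00E9 2 byte(s)
# U+20AC 3 byte(s)
# U+1F600 4 byte(s)

print(fits_mysql_utf8("aé€"))  # True
print(fits_mysql_utf8(sample))  # False: the smiley needs 4 bytes
```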
How MariaDB and MySQL makes life with UTF-8 a bit too easy. And how to fix it...
UNICODE is getting more and more traction, and most new applications, at least web applications, support UNICODE. I have written about UNICODE and related stuff before in Character sets, Collations, UTF-8 and all that, but before I go into some more specific issues, and fixes, let me tell you about UNICODE, UTF-8 and how MySQL interprets it. See the blog post linked above for more information on collations, a subject that is, surprisingly, even more boring.

So, let's begin with UNICODE. UNICODE is a very complete character set: you should be able to make yourself understood in any language using the characters from this vast character set. This is not to say that all characters from all languages are in UNICODE,

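Since the discussion turns on how UTF-8 maps UNICODE code points to bytes, here is a minimal Python sketch of the standard UTF-8 bit layout (RFC 3629), checked against Python's built-in encoder. It illustrates the encoding scheme itself, not anything MySQL-specific, and `encode_code_point` is a hypothetical name.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit layout from RFC 3629."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:  # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for ch in "Aé€😀":
    mine = encode_code_point(ord(ch))
    assert mine == ch.encode("utf-8")  # agrees with Python's own encoder
    print(f"U+{ord(ch):04X} -> {mine.hex(' ').upper()}")
# U+0041 -> 41
# U+00E9 -> C3 A9
# U+20AC -> E2 82 AC
# U+1F600 -> F0 9F 98 80
```

The four branches correspond to the 1-, 2-, 3- and 4-byte forms; MySQL's original utf8 stops at the third one.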
The Dangers in Changing Default Character Sets on Tables

The ALTER TABLE statement syntax is explained in the manual at:

http://dev.mysql.com/doc/refman/5.6/en/alter-table.html

To put it simply, there are two ways you can alter the table to use a new character set.

1. ALTER TABLE tablename DEFAULT CHARACTER SET utf8;

This will alter the table to use the new character set as the default, but as a safety mechanism, it will only change the table definition for the default character set. That is, existing character columns keep their old character set, which is defined per column. For example:

mysql> create table mybig5 (id int not null auto_increment primary key,      
    -> subject varchar(100) ) engine=innodb default charset

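The distinction above, where changing the table default leaves existing columns alone, can be captured in a toy model. This Python sketch is not MySQL code: `Table`, `add_column`, `alter_default_charset` and `convert_to` are hypothetical names, and the column names and charsets are illustrative. For comparison, MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET` form changes the existing character columns as well as the default.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy model of a table: a default charset plus a charset per column."""
    default_charset: str
    columns: dict[str, str] = field(default_factory=dict)

    def add_column(self, name: str, charset: str | None = None) -> None:
        # A new column without an explicit charset takes the current table default.
        self.columns[name] = charset or self.default_charset

    def alter_default_charset(self, charset: str) -> None:
        # Like ALTER TABLE ... DEFAULT CHARACTER SET: only the default changes.
        self.default_charset = charset

    def convert_to(self, charset: str) -> None:
        # Like ALTER TABLE ... CONVERT TO CHARACTER SET: default and columns change.
        self.default_charset = charset
        for name in self.columns:
            self.columns[name] = charset


t = Table("latin1")
t.add_column("subject")
t.alter_default_charset("utf8")
t.add_column("body")
print(t.columns)  # {'subject': 'latin1', 'body': 'utf8'}

t.convert_to("utf8")
print(t.columns)  # {'subject': 'utf8', 'body': 'utf8'}
```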
That's not my name! A story about character sets
When computers were still using large black text-oriented screens or no screens at all, a computer only knew how to store a limited set of characters. It was normal then to store a name with the more complicated characters replaced by more basic ones. The ASCII standard was used to make communication between multiple systems (or applications) easier. Storing characters as ASCII needs little space and is quite straightforward.

Then DOS used CP850, CP437 and so on to make it possible to use language- and location-specific characters.
Then ISO8859-1, ISO8859-15 and more of these character sets were defined as standards.

And now there is Unicode: UTF-8, UTF-16, UCS2, etc.

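To see how a name ends up wrong, the Python sketch below encodes one name with several of the character sets mentioned above and then reads UTF-8 bytes back with the wrong one. The name "Müller" is an illustrative assumption; ascii, cp437, cp850, iso8859_1, iso8859_15 and utf-8 are Python's built-in codec names.

```python
name = "Müller"  # illustrative name, not taken from the original post

for codec in ("ascii", "cp437", "cp850", "iso8859_1", "iso8859_15", "utf-8"):
    try:
        print(f"{codec:>10}: {name.encode(codec).hex(' ').upper()}")
    except UnicodeEncodeError:
        # Plain ASCII has no ü, which is why names used to be simplified.
        print(f"{codec:>10}: cannot encode")

# Bytes written as UTF-8 but read back as ISO 8859-1 change the name.
print(name.encode("utf-8").decode("iso8859_1"))  # MÃ¼ller
```

The same ü is byte 0x81 in the DOS code pages, 0xFC in ISO 8859-1 and the two bytes C3 BC in UTF-8, so any system that guesses the wrong character set shows a different name.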
Adding a case insensitive, distinct unicode collation

Every once in a while questions like the one in MySQL Bug #60843 or Bug #19567 come up:

What collation should I use if I want case-insensitive behavior but also want all accented letters to be treated as distinct from their base letters?

Or, more briefly, as the reporter of bug #60843 put it:

I need something like utf8_bin + ci

utf8_general_ci and utf8_unicode_ci unfortunately do not provide this behavior and utf8_bin is obviously not case insensitive.

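The requested behavior can be stated as a comparison key. The Python sketch below contrasts three hypothetical keys: a binary key (like utf8_bin), a case- and accent-insensitive key (roughly how utf8_general_ci treats these Latin letters), and the requested case-insensitive but accent-distinct key. It only illustrates the desired semantics; it is not how a MySQL collation is defined, and the labels `ai_ci` and `as_ci` are illustrative.

```python
import unicodedata
from typing import Callable


def binary_key(s: str) -> str:
    # Like a *_bin collation: compare code points exactly.
    return s


def accent_insensitive_key(s: str) -> str:
    # Case- and accent-insensitive: fold case, decompose, drop combining marks.
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_distinct_key(s: str) -> str:
    # The requested behavior: fold case, but keep accented letters distinct.
    return unicodedata.normalize("NFC", s.casefold())


def same(a: str, b: str, key: Callable[[str], str]) -> bool:
    return key(a) == key(b)


for a, b in [("resume", "RESUME"), ("resume", "résumé"), ("Résumé", "RÉSUMÉ")]:
    print(f"{a} / {b}: bin={same(a, b, binary_key)} "
          f"ai_ci={same(a, b, accent_insensitive_key)} "
          f"as_ci={same(a, b, accent_distinct_key)}")
# resume / RESUME: bin=False ai_ci=True as_ci=True
# resume / résumé: bin=False ai_ci=True as_ci=False
# Résumé / RÉSUMÉ: bin=False ai_ci=True as_ci=True
```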

How To – Configure MySQL to Use UTF-8

Background Knowledge

Using the UTF-8 character set allows for the use of any language: it can represent every character in the Unicode character set and is backward compatible with ASCII. Not to mention it can handle any platform and be sent through many different systems without corruption. These advantages are why so many are making the switch.

The following instructions were carried out on the Debian Squeeze v6.04 AMD64 operating system using MySQL v14.14 Distrib 5.1.61.

Solution – Server Configuration

At present, MySQL is configured by default to use the “latin1” character set. Here’s how to change the MySQL configuration to use the UTF-8 character set and collation.

  • To check MySQL’s current configuration, run the following two SQL statements:

    SHOW VARIABLES LIKE '%collation%';
    SHOW
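Once the configuration has been changed, the output of those SHOW VARIABLES statements can be checked mechanically. The Python sketch below assumes the rows have already been fetched as (Variable_name, Value) pairs by whatever client you use; the sample rows and the function name are hypothetical. It skips character_set_filesystem (usually left at binary) and character_set_system (not meant to be changed).

```python
# Variables that are not expected to follow the utf8 setting.
SKIP = {"character_set_filesystem", "character_set_system"}


def non_utf8_settings(rows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the character_set_* / collation_* variables whose value is not utf8-based."""
    bad = []
    for name, value in rows:
        if name in SKIP:
            continue
        if name.startswith(("character_set_", "collation_")) and not value.startswith("utf8"):
            bad.append((name, value))
    return bad


# Hypothetical rows, shaped like SHOW VARIABLES output.
rows = [
    ("character_set_client", "utf8"),
    ("character_set_connection", "utf8"),
    ("character_set_database", "latin1"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8"),
    ("character_set_server", "latin1"),
    ("character_set_system", "utf8"),
    ("collation_connection", "utf8_general_ci"),
    ("collation_database", "latin1_swedish_ci"),
    ("collation_server", "latin1_swedish_ci"),
]

for name, value in non_utf8_settings(rows):
    print(f"{name} is still {value}")
```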

Migrating MySQL latin1 to utf8 – The process

Having covered the preparation and character set options of performing a latin1 to utf8 MySQL migration, just how do you perform the migration correctly?

Example Case

Just to recap, we have the following example table and data:

mysql> select c,length(c),char_length(c),charset(c), hex(c) from conv.test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61
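The length(c) and char_length(c) columns in that example differ as soon as a character needs more than one byte: MySQL's LENGTH() counts bytes and CHAR_LENGTH() counts characters. As a rough Python analogue, the sketch below computes both, plus the hex of the bytes, for a few illustrative values stored as latin1 and as utf8. The `describe` function and the codec mapping are assumptions; Python's latin-1 codec stands in for MySQL's latin1 (cp1252), which agrees for these characters.

```python
# Hypothetical mapping from MySQL charset names to Python codec names.
CODECS = {"latin1": "latin-1", "utf8": "utf-8"}


def describe(value: str, charset: str) -> tuple[int, int, str]:
    """Rough analogue of (length(c), char_length(c), hex(c)) for value stored in charset."""
    raw = value.encode(CODECS[charset])
    return len(raw), len(value), raw.hex().upper()


# "Ã©" is how the UTF-8 bytes of é read when they are interpreted as latin1.
for value in ("a", "é", "Ã©"):
    for charset in ("latin1", "utf8"):
        print(f"{value!r:6} {charset:6} {describe(value, charset)}")
```

Note that "Ã©" stored as latin1 has exactly the bytes of "é" stored as utf8 (C3A9), which is the situation described in the first post on this page; converting it to utf8 then yields four bytes for what should be one character.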

Charset support in MySQL is really not all that complex

The headline is flame-bait; don’t take it. I just wanted to point something out about character sets and collations in MySQL.

To the uninitiated, it may seem overwhelming. Everything has a character set! Everything has a collation! And they act weirdly! The server has one. The database has one (oh, and it changes magically as I USE different databases.) Every table has one, and columns too. Is that all? NO! My connection has one! Kill me now!

Relax. In truth, only one kind of thing actually has a charset/collation. That is values. And values are stored in columns. The only thing that really has a charset/collation is a column.[1]

What about all the rest of those things: connection, database, server, table? Those are just defaults, which determine what charset/collation a

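The point that only columns really carry a charset, with everything above them acting as a default, can be written as a lookup chain. This Python sketch is a hypothetical model: `resolve_charset` takes the explicit settings from the column outward (None meaning "not specified") and returns the first one that is set, falling back to the server default. It models only the character set a new column gets when it is created and ignores collations and the connection.

```python
from typing import Optional


def resolve_charset(column: Optional[str], table: Optional[str],
                    database: Optional[str], server: str) -> str:
    """Return the charset a new column ends up with: the first explicit setting
    from the column outward, falling back to the server default."""
    for setting in (column, table, database):
        if setting is not None:
            return setting
    return server


print(resolve_charset(None, None, None, "latin1"))       # latin1
print(resolve_charset(None, None, "utf8", "latin1"))     # utf8
print(resolve_charset("ascii", "utf8", None, "latin1"))  # ascii
```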
Migrating MySQL latin1 to utf8 – Character Set Options

Continuing on from the preparation in our MySQL latin1 to utf8 migration, let us first understand where MySQL uses character sets. MySQL defines the character set at 4 different levels for the structure of data.

• Instance
• Schema
• Table
• Column

In MySQL 5.1, the default character set is latin1. If not specified, this is what you will get. For example:

mysql> create table test1(c1 varchar(10) not null);
mysql> show create table test1\G
Create Table: CREATE TABLE `test1` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

If you want all tables in your instance to always default to utf8, you can change the server variable

Migrating MySQL latin1 to utf8 – Preparation

Before undertaking such a migration, the first step is a lesson in understanding more about how latin1 and utf8 work and interact in MySQL. latin1 is a common and historical character set used in MySQL. utf8 (first available in MySQL Version 4.1) is an encoding supporting multiple bytes and is the system default in MySQL 5.0.

• latin1 is a single-byte character set.
• utf8 is a 1-3 byte character set, depending on the size of the character. NOTE: MySQL utf8 does not support the RFC 3629 4-byte sequences.

MySQL variables

MySQL has a number of different system variables to consider. The following is the default representation in MySQL 5.1:

mysql> show global variables like '%char%';

From Russia with Blogs: PlanetMySQL in Russian

My colleague Lenz might have forgotten to post before he disappeared on a well-deserved vacation, but we've enabled Russian as a choice in PlanetMySQL. Feel free to start submitting your Russian language blogs.

Russian Language PlanetMySQL: http://ru.planet.mysql.com
New feed submissions: http://ru.planet.mysql.com/new

We haven't completely translated all the strings yet (that's my fault; I need to stringify the vote stuff) but we're getting there!

(EDIT: LenZ is not on vacation... in fact he is at PHPDay2009 in Verona, Italy... sorry LenZ)

How To Access MySQL from Oracle With ODBC and SQL

The Oracle gateway for ODBC provides almost seamless data integration between Oracle and other RDBMS. I won’t argue about its performance, limits, or relevance. It serves a few purposes; set it up and you’ll be able, for example, to create database links between Oracle and MySQL. After all, wouldn’t it be nice if you could run some of the following SQL statements?

• select o.col1, m.col1 from oracle_tab o, mysql_tab@mysql m where o.col1=m.col1;
• insert into oracle_tab (select * from mysql_tab@mysql);

This post is intended to share, the same way

French Characters Not Rendering Correctly

Background Knowledge

The MySQL database (http://www.mysql.com/) v4.0.23 uses the default character set, “Latin1”. When the database was created I had no knowledge of character sets; otherwise it would have been “UTF-8”.
The web pages use the “UTF-8” character set.

Problem

Data queried from a MySQL database (http://www.mysql.com/) that contains French accented characters does not render correctly in the browser, even after applying PHP’s htmlentities().

Example code: $string = htmlentities($string, ENT_QUOTES, "UTF-8");

Solution

The queried data from the database was inputted using the
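The mismatch described here, latin1 bytes served on a UTF-8 page, can be reproduced in Python. This sketch illustrates the problem and one generic fix (transcoding the latin1 bytes to UTF-8 before output); it is not necessarily the approach the post goes on to describe, and the sample string is hypothetical.

```python
# Hypothetical value as it would be stored in a latin1 column.
latin1_bytes = "Château à l'été".encode("latin-1")

# Reading latin1 bytes as UTF-8 fails: the accented letters are single bytes here
# (â = 0xE2, à = 0xE0, é = 0xE9), which are not valid UTF-8 sequences on their own.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

# Lenient decoding shows roughly what the browser renders: replacement characters.
print(latin1_bytes.decode("utf-8", errors="replace"))

# Transcoding fixes it: decode with the charset the data is really in,
# then encode as UTF-8 for the page.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(utf8_bytes.decode("utf-8"))  # Château à l'été
```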

Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.