Home |  MySQL Buzz |  FAQ |  Feeds |  Submit your blog feed |  Feedback |  Archive |  Aggregate feed RSS 2.0 English Deutsch Español Français Italiano 日本語 Русский Português 中文
Showing entries 1 to 6

Displaying posts with tag: unicode (reset)

WordPress and UTF-8
Employee +2 Vote Up -0Vote Down

For many years, MySQL had only supported a small part of UTF-8, a section commonly referred to as plane 0, the “Basic Multilingual Plane”, or the BMP. The UTF-8 spec is divided into “planes“, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.

It has always been possible to store all UTF-8 characters in the latin1 character set, though latin1 has shortcomings. While it recognises the connection between upper and lower case characters in Latin alphabets (such as English, French and German), it doesn’t recognise the same connection for other alphabets. For example, it doesn’t know that ‘Ω’ and ‘ω’ are the upper and lower-case

  [Read more...]
Using 4-byte UTF-8 (aka 3-byte UNICODE) in MariaDB and MySQL
+1 Vote Up -0Vote Down
As I wrote in a previous post, MariaDB / MySQL has some issues with the standard UTF-8 encoding there. This UTF-8 encoding limits us to 3 UTF-8 bytes or 2 UNICODE bytes if you want to look at it that way. This is slightly limiting, but for languages it is usually pretty much OK, although there are some little used languages in the 3 byte UNICODE range. But in addition to languages, you will be missing symbols, such as smileys!

Help is on the way though, in the utf8mb4 character set that is part of both MariaDB and MySQL. This is a character set that is just like the one just called utf8, except this one accepts all the UNICODE characters with up to 3 UNICODE bytes, or 4 bytes using the UTF-8 encoding.

This means



  [Read more...]
How MariaDB and MySQL makes life with UTF-8 a bit too easy. And how to fix it...
+1 Vote Up -0Vote Down
UNICODE is getting more and more traction and most new applications, at least web applications, support UNICODE. I have written about UNICODE and related stuff before in Character sets, Collations, UTF-8 and all that but before I go into some more specific and some issues, and fixes, let me tell you about UNICODE, UTF-8 and how MySQL interprets it. See the blogpost linked to above for more information on the subject, surprisingly even more boring, on Collations.

So, let's begin with UNICODE. UNICODE is a character set that is very complete, you should be able to make yourself understood in any language using the characters from this vast character set. This is not to say that all characters from all languages are in UNICODE,

  [Read more...]
Adding a case insensitive, distinct unicode collation
+6 Vote Up -1Vote Down

Every once in a while questions like the one in MySQL Bug #60843 or Bug #19567 come up:

What collation should i use if i want case insensitive behavior but also want all accented letter to be treated as distinct to their base letters?

or shorter, as the reporter of bug #60843 put it:

I need something like utf8_bin + ci

utf8_general_ci and utf8_unicode_ci unfortunately do not provide this behavior and utf8_bin is obviously not case insensitive.

read more

Guidelines for generating XML
+0 Vote Up -0Vote Down

Over the last little while I've come across quite a few XML feed generators written in PHP, with varying degrees of 'correctness'. Even though generating XML should be very simple, there's still quite a bit of pitfalls I feel every PHP or (insert your language)-developer should know about.

1. You are better off using an XML library

This is the first and foremost rule. Most people end up generating their xml using simple string concatenation, while there are many dedicated tools out there that really help you generate your own XML.

In PHP land the best example is XMLWriter. It is actually quite easy to use:

  •  
  • $xmlWriter = new
  •   [Read more...]
    Unicode coming to PHP 6
    +0 Vote Up -0Vote Down
    The move from PHP 5 to PHP 6 will be a painful one. But once it’s done, I hope that it will be easier to handle safe web development for a global, multi-language internet. After all these years, we still … Continue reading →
    Showing entries 1 to 6

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.