Planet MySQL
Showing entries 1 to 6

Displaying posts with tag: unicode (reset)

WordPress and UTF-8

For many years, MySQL supported only a small part of UTF-8, a section commonly referred to as plane 0, the “Basic Multilingual Plane”, or the BMP. Unicode is divided into “planes”, and plane 0 contains the most commonly used characters. For a long time, this was reasonably sufficient for MySQL’s purposes, and WordPress made do with this limitation.

It has always been possible to store all UTF-8 characters in the latin1 character set, though latin1 has shortcomings. While it recognises the connection between upper and lower case …

Using 4-byte UTF-8 (aka 3-byte UNICODE) in MariaDB and MySQL

As I wrote in a previous post, MariaDB / MySQL has some issues with its standard UTF-8 encoding. This UTF-8 encoding limits us to 3 UTF-8 bytes, or 2 UNICODE bytes if you want to look at it that way. This is slightly limiting, but for languages it is usually pretty much OK, although there are some little-used languages in the 3-byte UNICODE range that it cannot represent. But in addition to languages, you will be missing symbols, such as smileys!
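To make the byte counts above concrete, here is a minimal Python sketch (not taken from the original post) that reports how many UTF-8 bytes a character needs and whether a string stays within the 3-byte range. The helper names `utf8_len` and `fits_in_3_byte_utf8` are made up for this illustration.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(text: str) -> bool:
    """Return True if every character lies in plane 0 (code point <= U+FFFF),
    which is the range that encodes to at most 3 UTF-8 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Sample characters: ASCII letter, accented Latin letter, CJK ideograph, smiley.
samples = ["a", "\u00e9", "\u65e5", "\U0001F600"]
for ch in samples:
    # The plane number is the code point divided by 0x10000.
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), plane {ord(ch) >> 16}")
```

Only the last sample, the smiley at U+1F600, lies outside plane 0 and needs a fourth byte, which is why it would not fit in a 3-byte UTF-8 character set.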

Help is on the way though, in the utf8mb4 character set that is part of …

How MariaDB and MySQL makes life with UTF-8 a bit too easy. And how to fix it...

UNICODE is getting more and more traction, and most new applications, at least web applications, support UNICODE. I have written about UNICODE and related stuff before in Character sets, Collations, UTF-8 and all that, but before I go into some more specific issues, and fixes, let me tell you about UNICODE, UTF-8 and how MySQL interprets it. See the blog post linked above for more information on the, surprisingly even more boring, subject of Collations.

So, let's begin with UNICODE. UNICODE …

Adding a case insensitive, distinct unicode collation

Every once in a while questions like the one in MySQL Bug #60843 or Bug #19567 come up:

What collation should I use if I want case insensitive behavior but also want all accented letters to be treated as distinct from their base letters?

or shorter, as the reporter of bug #60843 put it:

I need something like utf8_bin + ci

utf8_general_ci and utf8_unicode_ci unfortunately do not provide this behavior and utf8_bin is obviously not case insensitive.
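The requested comparison semantics can be sketched outside the database. The following Python snippet is only an illustration of the idea and not how MySQL implements collations: one key function folds case but keeps accents (the "utf8_bin + ci" wish), the other also strips accents, which is roughly analogous to what the reporter does not want. Both function names are invented for this example.

```python
import unicodedata


def ci_accent_sensitive_key(s: str) -> str:
    """Comparison key that ignores case but keeps accented letters distinct."""
    # NFC puts equivalent spellings of an accented letter into one form first.
    return unicodedata.normalize("NFC", s).casefold()


def ci_accent_insensitive_key(s: str) -> str:
    """Comparison key that ignores both case and accents."""
    # NFD splits an accented letter into base letter + combining mark,
    # so the marks can be dropped before case folding.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


words = ["resume", "R\u00e9sum\u00e9", "R\u00c9SUM\u00c9"]
for w in words:
    print(w, "|", ci_accent_sensitive_key(w), "|", ci_accent_insensitive_key(w))
```

Under the first key the two accented spellings compare equal to each other but differ from "resume"; under the second key all three compare equal.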

…
Guidelines for generating XML

Over the last little while I've come across quite a few XML feed generators written in PHP, with varying degrees of 'correctness'. Even though generating XML should be very simple, there are still quite a few pitfalls I feel every PHP or (insert your language)-developer should know about.

1. You are better off using an XML library

This is the first and foremost rule. Most people end up generating their XML using simple string concatenation, while there are many dedicated tools out there that really help you generate your own XML.
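As a rough illustration of why this matters, here is a small sketch in Python rather than PHP, using the standard library's `xml.etree.ElementTree`. The element names, sample values, and function names are assumptions made up for this example. The library escapes special characters for you, while naive concatenation produces output that does not parse.

```python
import xml.etree.ElementTree as ET


def build_item(title: str, link: str) -> str:
    """Build a feed item with an XML library; text is escaped automatically."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    return ET.tostring(item, encoding="unicode")


def build_item_naive(title: str, link: str) -> str:
    """Build the same item with string concatenation; nothing is escaped."""
    return "<item><title>" + title + "</title><link>" + link + "</link></item>"


# Hypothetical input containing characters that are special in XML.
title = "Tips & tricks for <XML>"
link = "http://example.com/?a=1&b=2"

print(build_item(title, link))

try:
    ET.fromstring(build_item_naive(title, link))
except ET.ParseError as exc:
    print("naive output is not well-formed:", exc)
```

The library version survives a round trip through a parser with the original title intact, whereas the concatenated version is rejected because of the raw `&` and `<` characters.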

In PHP land the best example is …

Unicode coming to PHP 6

The move from PHP 5 to PHP 6 will be a painful one. But once it’s done, I hope that it will be easier to handle safe web development for a global, multi-language internet. After all these years, we still …


Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.