This is the English translation of an article in my German blog. This article, like
the German original, is licensed CC-BY-SA. The English translation has been
kindly provided by Tobias Klausmann.
Recently, I had to explain this to several people, so here is a
writeup on the blog for easier reference. The question: I have
content in my database that my application can read and write
successfully, but if I use mysqldump to transfer the data
to a new system, all the non-ASCII characters such as umlauts are
destroyed. This happens if you save data to a DB with the wrong
text encoding label.
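To make the symptom concrete, here is a toy model in plain Python. It does not talk to MySQL at all. It assumes (this setup is not spelled out above) that the application sends UTF-8 bytes over a connection labelled latin1, that the server stores those bytes unchanged, and that the dump is written as UTF-8. Python's "latin-1" codec stands in for MySQL's latin1 here, which is close enough for this one character.

```python
# Toy model of the failure, not MySQL itself.
# Assumption: the application speaks UTF-8, but connection and column
# are both labelled latin1, so the bytes are stored without conversion.

name = "Köhntopp"

# What the application really sends and what ends up stored: UTF-8 bytes.
stored = name.encode("utf-8")

# The application reads the same bytes back and treats them as UTF-8
# again, so the round trip looks perfectly fine.
app_view = stored.decode("utf-8")

# A dump written as UTF-8 trusts the latin1 label: it interprets every
# stored byte as one latin1 character and converts that to UTF-8.
dumped = stored.decode("latin-1").encode("utf-8")

# Anyone reading the dump correctly as UTF-8 sees the damage.
dump_view = dumped.decode("utf-8")

print(app_view)
print(dump_view)
```

The application never notices the problem because it makes the same mistake in both directions. Only a tool that believes the label, such as the dump in this model, exposes it.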
In MySQL, every string has a label that describes the character
encoding the string was written in (and should be interpreted
in). The string _latin1"Köhntopp" is thus (hopefully) the
byte sequence K-0xF6-hntopp, and the string _utf8"Köhntopp"
should consequently be K-0xC3 0xB6-hntopp. Problems arise as soon
as the label (_latin1 or _utf8) does not match the encoding
inside the string (0xF6 vs. 0xC3 0xB6).
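The byte sequences above can be checked with a short Python sketch. It only illustrates the encodings. The variable names are made up for this example, and Python's codecs stand in for MySQL's labels.

```python
name = "Köhntopp"

# The same text in the two encodings discussed above.
latin1_bytes = name.encode("latin-1")  # K, 0xF6, hntopp
utf8_bytes = name.encode("utf-8")      # K, 0xC3 0xB6, hntopp

# Label matches content: the text comes back intact.
ok_latin1 = latin1_bytes.decode("latin-1")
ok_utf8 = utf8_bytes.decode("utf-8")

# Label says latin1, content is UTF-8: each of the two bytes
# becomes a character of its own.
mislabelled_as_latin1 = utf8_bytes.decode("latin-1")

# Label says UTF-8, content is latin1: 0xF6 is not valid UTF-8,
# so it is replaced by U+FFFD here instead of raising an error.
mislabelled_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes, utf8_bytes)
print(mislabelled_as_latin1, mislabelled_as_utf8)
```

The bytes themselves are never wrong in either case. Only the interpretation chosen by the label is.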
This is outlined in more detail in "Handling character sets", and you should
read that article before you continue.
Continue reading "MySQL is destroying my Umlauts"
Feb 11, 2012