I am curious about character sets and collations, especially how databases use them. I recently had some time to play with them, and today I ran some tests on MySQL. Time permitting, I will run the same tests on SQL Server, Oracle, and PostgreSQL. For now I am only dealing with Simplified Chinese; I may take up Traditional Chinese later.
Here is my setup:
1. I created a table that stores Simplified Chinese characters in different character sets, along with the collation used, the pinyin, the number of strokes, and the tone value. There are 126 collations in MySQL, only 10 of which are suited for Simplified Chinese.
2. I used the Chinese version of the golden rule,