I have now seen a few people ask about using MySQL's FULLTEXT
indexing with Asian languages such as Chinese, Japanese and
Korean (hereafter CJK); however, there doesn't seem
to be a good centralised article that covers it.
The information is out there, I just don't think it has been well
presented yet.
As I have recently done a bunch of research on this topic for a
customer, I figured it might be a good opportunity to make my
debut in the MySQL blogosphere.
So here we go...
I'll open by saying that attempting to use FULLTEXT with CJK text
in MySQL 5.0 will be unsuccessful.
From the CJK FAQ in the MySQL manual:
"For FULLTEXT searches, we need to know where words begin and
end. With Western languages, this is rarely a problem because
most (if not all) of these use an easy-to-identify word boundary
— the space character. However, this is not …
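To make the word-boundary problem concrete, here is a minimal Python sketch of splitting text on whitespace. It is a simplified stand-in for a space-delimited word splitter, not MySQL's actual FULLTEXT parser, and the function name `whitespace_tokens` is hypothetical.

```python
def whitespace_tokens(text: str) -> list[str]:
    # Hypothetical helper: split on runs of whitespace, the way a naive
    # space-delimited word splitter would find word boundaries.
    return text.split()

english = "MySQL supports full-text search"
chinese = "我喜欢数据库"  # "I like databases"; no spaces between words

print(whitespace_tokens(english))  # ['MySQL', 'supports', 'full-text', 'search']
print(whitespace_tokens(chinese))  # ['我喜欢数据库']
```

With no spaces to split on, the whole Chinese sentence comes back as a single token, so in this sketch a search for any one word inside it has nothing separate to match against.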
With all due respect to Monty (and I mean that — much respect is due), I have some serious issues with his portrayal of the 5.1 release. I hate to make my first entry on Planet MySQL about a controversy, but he encouraged people to blog about their experience with 5.1, so that’s what I’ll do here.
Overall Quality
As a long-time user, I am very confident that the quality of 5.1 GA far exceeds that of the initial 5.0 GA release (5.0.15). In fact, I would go further and suggest that the MySQL organization has, if anything, been too conservative about declaring 5.1 GA.
It’s obviously true that there are still many open bugs. However, no software is bug-free, especially not software with a codebase as large as MySQL's. So the question is not if they are bug free, but are the …