The Korean MySQL Power User Group gets a special guest speaker next weekend: Mark Callaghan (Small Datum, @markcallaghan, and formerly High Availability MySQL). The meetup is on Oct 31 2015 at 4pm, at 4:33’s offices in Gangnam; the nearest train stop is Samseong station, Line 2 (the post requires a Cafe Naver login). I’ve been to many of their meetups, and I think this is a great opportunity for many DBAs to learn more about how Mark helps make MySQL and MongoDB better …
If you are a MySQL power user in Korea, it’s well worth joining the Korean MySQL Power User Group. This is a group led by senior DBAs at many Korean companies. From what I gather, there is experience there using MySQL, MariaDB, Percona Server and Galera Cluster (many on various 5.5, some on 5.6, and quite a few testing 10.0). No one is using WebScaleSQL (yet?). The discussion group is rather active, and I’ve got a profile there (I get questions translated for me).
For some months now, I have been exchanging emails with Matt, one of the senior DBAs behind the popular messaging service KakaoTalk (yes, they are powered by MariaDB). Today I got some good news: the book, published entirely in Korean and titled Real MariaDB, is now available.
I have seen a few people now ask about using MySQL's FULLTEXT indexing with Asian languages such as Chinese, Japanese and Korean (herein referred to as CJK). However, there doesn't seem to be a good centralised article that covers it.
The information is out there; I just don't think it has been well presented yet.
As I have recently done a bunch of research on this topic for a customer, I figured it might be a good opportunity to make my debut in the MySQL blogosphere.
So here we go...
I'll open by saying that attempting to use FULLTEXT with CJK text in MySQL 5.0 will be unsuccessful.
From the CJK FAQ in the MySQL manual:
"For FULLTEXT searches, we need to know where words begin and end. With Western languages, this is rarely a problem because most (if not all) of these use an easy-to-identify word boundary — the space character. However, this is not …