I have seen a few people now ask about using MySQL's FULLTEXT
indexing with Asian languages such as Chinese, Japanese and
Korean (hereafter CJK); however, there doesn't seem
to be a good centralised article that covers it.
The information is out there; I just don't think it has been well
presented yet.
As I have recently done a bunch of research on this topic for a
customer, I figured it might be a good opportunity to make my
debut in the MySQL blogosphere.
So here we go...
I'll open by saying that attempting to use FULLTEXT with CJK text
in MySQL 5.0 will be unsuccessful.
From the CJK FAQ in the MySQL manual:
"For FULLTEXT searches, we need to know where words begin and
end. With Western languages, this is rarely a problem because
most (if not all) of these use an easy-to-identify word boundary
— the space character. However, this is not …
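The FAQ puts its finger on the core problem. A Western-style word parser treats whatever sits between spaces as a word, and CJK text has no spaces between words. The Python sketch below illustrates this; it is not MySQL's actual parser. The helper names `whitespace_tokens` and `char_bigrams` are hypothetical. Indexing overlapping character pairs (bigrams) appears only as one commonly used workaround, not necessarily the approach this article goes on to recommend.

```python
def whitespace_tokens(text: str) -> list[str]:
    # Western-style word parsing: a "word" is anything between whitespace.
    return text.split()


def char_bigrams(text: str) -> list[str]:
    # Hypothetical workaround: index every overlapping pair of characters,
    # so no knowledge of real word boundaries is needed.
    grams: list[str] = []
    for run in text.split():
        if len(run) < 2:
            grams.append(run)
        else:
            grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams


if __name__ == "__main__":
    print(whitespace_tokens("full text search"))  # three separate words
    print(whitespace_tokens("全文搜索"))            # one unbroken "word"
    print(char_bigrams("全文搜索"))                 # 全文, 文搜, 搜索
```

The Chinese phrase 全文搜索 ("full-text search") survives whitespace splitting as a single token, so a search for 搜索 ("search") alone could never match it. The bigram list, by contrast, contains 搜索 as its own entry.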
I came across Sphinx today via the MySQL Performance Blog (which has some good entries you might want to check out). It is an Open Source Full Text SQL Search Engine. It can be installed as a MySQL storage engine, and from what I hear it can beat the pants off MySQL's built-in full text search in some cases.
From the web site:
Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting …
MySQL has supported FULLTEXT indexes since version 3.23.23. VARCHAR
and TEXT columns that have been indexed with FULLTEXT can be
used with special SQL statements that perform the full text
search in MySQL.
To get started you need to define the FULLTEXT index
on some columns. Like other indexes, FULLTEXT
indexes can contain multiple columns. Here's how you might add a
FULLTEXT index to some table columns:
ALTER TABLE news ADD FULLTEXT(headline, story);
Once you have a FULLTEXT index, you can search it
using the MATCH ... AGAINST syntax. For
example:
SELECT headline, story FROM news WHERE MATCH (headline,story) AGAINST ('Hurricane');
The result of this query is automatically sorted by relevancy.
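Conceptually, a FULLTEXT index is an inverted index. It maps each word to the rows that contain it, which lets MATCH ... AGAINST find and rank rows without scanning every story. The Python sketch below models that idea for the `news` example. The rows and the helper names (`build_index`, `match_against`) are hypothetical. The relevance score is a simple occurrence count, not MySQL's actual weighting formula.

```python
import re
from collections import defaultdict

# Hypothetical rows standing in for the article's `news` table:
# row_id -> (headline, story)
NEWS = {
    1: ("Hurricane nears coast", "The hurricane is expected to make landfall tonight."),
    2: ("Markets rally", "Stocks rose sharply on Tuesday."),
    3: ("Storm update", "Residents prepare as the hurricane strengthens."),
}


def words(text):
    # Lower-case and split on non-word characters (spaces, punctuation).
    return re.findall(r"\w+", text.lower())


def build_index(rows):
    # Inverted index: word -> {row_id: occurrences across the indexed columns}.
    index = defaultdict(lambda: defaultdict(int))
    for row_id, columns in rows.items():
        for column in columns:
            for w in words(column):
                index[w][row_id] += 1
    return index


def match_against(index, query):
    # Toy relevance: total occurrences of the query words in each row.
    # MySQL weights terms differently; this only shows the lookup-and-rank idea.
    scores = defaultdict(int)
    for w in words(query):
        for row_id, count in index.get(w, {}).items():
            scores[row_id] += count
    # Highest score first; ties broken by row id for a stable order.
    return sorted(scores, key=lambda r: (-scores[r], r))


if __name__ == "__main__":
    idx = build_index(NEWS)
    print(match_against(idx, "Hurricane"))  # [1, 3]
```

Searching for "Hurricane" returns row 1 before row 3. The word appears twice in row 1 (once in the headline, once in the story) and only once in row 3, which mirrors the relevance-first ordering MySQL applies.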
MATCH
The MATCH function is used to …
I've found the two-person shower.
No, it does not have two shower heads, but instead takes two
people to use. One person gets to take the shower, the other
person has to operate the controls since the shower goes from hot
to cold every 10 seconds. Even with an operator, the person
taking the shower gets scalded and/or frozen every few
seconds.
Excellent plumbing job!
BTW, we are working on a 5.0 release candidate for MySQL. Sure,
this has nothing to do with showers, hot or cold, but it will
make Zack happy that I added this to a blog entry, since the LJ
"post multiplier" sank the planetmysql site and of course nothing
was mentioned about 5.0 in any of the multiplied posts.
On a different note, I am wondering if Brad is using fulltext for
his new auto-tag feature or if he just created his own inverted
index. If he did, did he use HASH or BTREE indexes on
the column?