Planet MySQL

Displaying posts with tag: cjk

Jun

2017

Posted by MySQL Server Dev Team on Fri 23 Jun 2017 13:19 UTC
Tags:

cjk, datatypes, MySQL

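In my previous post, I introduced the new Japanese utf8mb4 collation added in MySQL 8.0.1. This collation (utf8mb4_ja_0900_as_cs) implements accent sensitivity (distinguishing seion, dakuon and handakuon, e.g. は/ば/ぱ) and case sensitivity (distinguishing small kana such as those used in yōon and sokuon) as defined by CLDR 30.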

Today I am introducing the new "kana-sensitive" collation utf8mb4_ja_0900_as_cs_ks, which can distinguish hiragana from katakana. …
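The sensitivity levels described here can be illustrated with a toy model. The sketch below is not MySQL's UCA/CLDR implementation: it splits each kana into a base character plus separate accent (dakuten/handakuten), case (small kana) and kana-type (katakana) features, and the hypothetical `kana_equal` function compares only the features a given collation is sensitive to. Roughly, `kana=False` models utf8mb4_ja_0900_as_cs and `kana=True` models utf8mb4_ja_0900_as_cs_ks. The small-kana table is an assumption that covers only common hiragana forms.

```python
import unicodedata

# Hypothetical table for illustration: common small hiragana -> full-size form.
SMALL_TO_LARGE = {
    "ぁ": "あ", "ぃ": "い", "ぅ": "う", "ぇ": "え", "ぉ": "お",
    "っ": "つ", "ゃ": "や", "ゅ": "ゆ", "ょ": "よ", "ゎ": "わ",
}


def char_features(ch: str) -> tuple[str, str, bool, bool]:
    """Split one character into (base, accent marks, is_small, is_katakana)."""
    is_katakana = 0x30A1 <= ord(ch) <= 0x30F6
    if is_katakana:
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana forms.
        ch = chr(ord(ch) - 0x60)
    # NFD splits voiced kana, e.g. "が" -> "か" + U+3099 (combining dakuten).
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    is_small = base in SMALL_TO_LARGE
    base = SMALL_TO_LARGE.get(base, base)
    return base, marks, is_small, is_katakana


def kana_equal(a: str, b: str, accent: bool = True, case: bool = True,
               kana: bool = True) -> bool:
    """Compare a and b; each flag set to True makes the comparison sensitive to it."""
    def key(s: str) -> list[tuple[str, str, bool, bool]]:
        keys = []
        for ch in s:
            base, marks, small, kata = char_features(ch)
            keys.append((base,
                         marks if accent else "",
                         small if case else False,
                         kata if kana else False))
        return keys
    return key(a) == key(b)


# Roughly utf8mb4_ja_0900_as_cs: accent- and case-sensitive, kana-insensitive.
print(kana_equal("かた", "カタ", kana=False))  # True
# Roughly utf8mb4_ja_0900_as_cs_ks: hiragana and katakana now differ.
print(kana_equal("かた", "カタ"))              # False
print(kana_equal("は", "ば", accent=False))    # True
print(kana_equal("や", "ゃ"))                  # False
```

Each flag switches off one feature independently, which mirrors how the _as, _cs and _ks suffixes each add one more kind of difference to the comparison.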


Jun

2017

MySQL 8.0: Japanese collation for utf8mb4

Posted by MySQL Server Dev Team on Fri 23 Jun 2017 13:17 UTC
Tags:

cjk, datatypes, MySQL

In MySQL 8.0.1, in addition to the as_cs collations for utf8mb4, which are case- and accent-sensitive, we have added a collation for Japanese.

About utf8mb4_ja_0900_as_cs

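The rules for collating and sorting Japanese text are complex. Japanese mixes hiragana, katakana, kanji and the Latin alphabet, and some characters exist in both full-width and half-width forms. So how should ‘あ’, ‘ア’, ‘a’ and ‘ｱ’ be sorted?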
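One way to see why these four characters are tricky is to compute a toy primary-level key. In the sketch below (an illustration, not MySQL's algorithm), the hypothetical `primary_key` folds half-width forms with NFKC normalization, maps katakana to hiragana and drops voicing marks. Under this key ‘あ’, ‘ア’ and ‘ｱ’ collapse to the same value while ‘a’ stays distinct, so a collation needs additional, lower-priority levels to tell the kana variants apart.

```python
import unicodedata
from collections import defaultdict


def primary_key(s: str) -> str:
    """Toy primary key: fold width (NFKC), katakana -> hiragana, drop voicing marks."""
    out = []
    # NFKC maps half-width katakana (e.g. U+FF71 "ｱ") to full-width ("ア").
    for ch in unicodedata.normalize("NFKC", s):
        if 0x30A1 <= ord(ch) <= 0x30F6:
            ch = chr(ord(ch) - 0x60)  # katakana -> hiragana
        # Keep only the base character of the canonical decomposition.
        out.append(unicodedata.normalize("NFD", ch)[0])
    return "".join(out)


groups = defaultdict(list)
for ch in ["あ", "ア", "a", "ｱ"]:
    groups[primary_key(ch)].append(ch)
print(dict(groups))  # {'あ': ['あ', 'ア', 'ｱ'], 'a': ['a']}
```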

The Unicode Collation Algorithm (UCA / Unicode Collation …


Jun

2017

MySQL 8.0: Kana-sensitive collation for Japanese

Posted by MySQL Server Dev Team on Thu 15 Jun 2017 11:05 UTC
Tags:

cjk, datatypes, MySQL

In my previous post, I wrote about the new Japanese collation for utf8mb4 introduced in MySQL 8.0.1. This collation (utf8mb4_ja_0900_as_cs) implements accent / case sensitivity for Japanese as defined by CLDR 30.

Today, I am writing about our new utf8mb4_ja_0900_as_cs_ks collation which includes support for kana sensitivity.…

Jan

2017

Sushi = Beer ?! An introduction of UTF8 support in MySQL 8.0

Posted by MySQL Server Dev Team on Fri 13 Jan 2017 14:16 UTC
Tags:

collation, utf8, upgrades, cjk, MySQL, mysql8.0

In MySQL 8.0, our plan is to drastically improve support for utf8. While utf8 support itself dates back to MySQL 4.1, it has some limitations. The “sushi = beer” problem in the title refers to Bug #76553. Sushi and beer don’t even go well together, at least not to my taste :-) I will use this bug as an example to explain past issues with utf8 collations and our plans for utf8 support going forward.…
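The symptom can be reproduced with a toy weight function. The sketch assumes, as Bug #76553 is usually explained, that the older UCA 4.0.0-based utf8mb4 collations (such as utf8mb4_unicode_ci) give every character outside the Basic Multilingual Plane the same weight, 0xFFFD. The hypothetical `legacy_weight` uses the code point itself as the weight for all other characters, which is a simplification rather than real UCA weights.

```python
SUSHI = "\U0001F363"  # U+1F363 SUSHI
BEER = "\U0001F37A"   # U+1F37A BEER MUG


def legacy_weight(ch: str) -> int:
    """Toy weight: every supplementary character collapses to 0xFFFD (assumption)."""
    cp = ord(ch)
    return 0xFFFD if cp > 0xFFFF else cp


def legacy_equal(a: str, b: str) -> bool:
    """Compare two strings by their toy legacy weights."""
    return [legacy_weight(c) for c in a] == [legacy_weight(c) for c in b]


print(legacy_equal(SUSHI, BEER))  # True: the "sushi = beer" symptom
print(SUSHI == BEER)              # False: the code points differ
```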

May

2015

InnoDB Full-Text Search: N-gram Parser

Posted by MySQL Server Dev Team on Thu 14 May 2015 15:47 UTC
Tags:

cjk, Full Text Search, MySQL

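The default InnoDB full-text parser is ideal for Latin-based languages, where whitespace or other word delimiters separate tokens. In languages such as Chinese, Japanese and Korean (CJK), however, there is no fixed delimiter between individual words, and each word is made up of a combination of several characters. These languages therefore need a different way of handling word tokens.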

In MySQL 5.7.6, we are able to provide a new pluggable full-text parser that offers an n-gram parser for use with CJK …
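The core idea of n-gram tokenization is to index every run of n consecutive characters instead of whitespace-delimited words. The following sketch assumes a token size of 2 (the default of MySQL's `ngram_token_size` option) and treats whitespace as a boundary that no n-gram crosses; the function `ngram_tokens` is hypothetical and is not the parser's actual code.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Return every n-character run, never crossing whitespace."""
    tokens = []
    for chunk in text.split():  # whitespace separates independent chunks
        # A chunk shorter than n yields no tokens (the range is empty).
        for i in range(len(chunk) - n + 1):
            tokens.append(chunk[i:i + n])
    return tokens


print(ngram_tokens("全文検索"))  # ['全文', '文検', '検索']
print(ngram_tokens("ab cd"))     # ['ab', 'cd']
```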


Dec

2008

FULLTEXT and Asian Languages with MySQL 5.0

Posted by Lachlan Mulcahy on Tue 16 Dec 2008 03:09 UTC
Tags:

fulltext, sun, Chinese, 5.0, asian, cjk, japanese, korean, MySQL

I have seen a few people now ask about using MySQL's FULLTEXT indexing with Asian languages such as Chinese, Japanese and Korean (herein referred to as CJK); however, there doesn't seem to be a good centralised article that covers it.

The information is out there, I just don't think it has been well presented yet.

As I have recently done a bunch of research on this topic for a customer, I figured it might be a good opportunity to make my debut in the MySQL blogosphere.

So here we go...

I'll open by saying that attempting to use FULLTEXT with CJK text in MySQL 5.0 will be unsuccessful.

From the CJK FAQ in the MySQL manual:

"For FULLTEXT searches, we need to know where words begin and end. With Western languages, this is rarely a problem because most (if not all) of these use an easy-to-identify word boundary — the space character. However, this is not …
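The word-boundary problem the FAQ describes is easy to demonstrate. In this sketch, Python's `str.split()` stands in for a tokenizer that finds words only at whitespace (a simplification of the real FULLTEXT parser), and the sample sentences are arbitrary: the English sentence splits into words, while the unspaced Japanese sentence becomes one single token.

```python
english = "I live in Tokyo"
japanese = "私は東京に住んでいます"  # "I live in Tokyo", written without spaces

# str.split() with no arguments splits on runs of whitespace only.
print(english.split())   # ['I', 'live', 'in', 'Tokyo']
print(japanese.split())  # ['私は東京に住んでいます']
```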

