Showing entries 1 to 10 of 14
4 Older Entries »
Displaying posts with tag: fulltext (reset)
InnoDB 全文検索 : MeCab Parser

(この記事は InnoDB Full-Text: MeCab ParserYoshiaki Yamasaki が翻訳したものです)

このブログ記事で紹介した一般的なCJK(中国語、日本語、韓国語)のサポートに加えて、私達はMeCabパーサーも追加しました。MeCabは日本語の形態素解析エンジンで、私達は今 …

[Read more]
InnoDB 全文検索 : N-gram Parser

(この記事は InnoDB Full-Text : N-gram ParserYoshiaki Yamasaki が翻訳したものです)

デフォルトのInnoDB全文検索パーサー(構文解析プログラム)は、空白がトークン(語句)もしくは単語の区切りとなっているラテン語ベースの言語に対して理想的です。しかし、個々の単語の区切り文字が存在せず、それぞれの単語は複数の文字で構成できる中国語・日本語・韓国語(CJK)のような言語には向いていません。そこで、私たちは異なった方法で単語/トークンを識別する方法を必要とします。
私は今 …

[Read more]
InnoDB Full-Text: MeCab Parser

In addition to our general CJK support, as detailed in this blog post, we’ve also added a MeCab parser. MeCab is a Japanese morphological analyzer, and we now have a full-text plugin parser based on it!

How Would I Use It?

  1. Set the mecab_rc_file option — mecab_rc_file is a read-only system variable pertaining to the MeCab parser. The mecabrc file that it points to is a configuration file required by MeCab, …
[Read more]
InnoDB Full-Text : N-gram Parser

The default InnoDB full-text parser is ideal for latin based languages where whitespace is the token or word separator, but for languages like Chinese, Japanese, and Korean (CJK)—where there is no fixed separators for individual words, and each word can be compromised of multiple characters—we need a different way to handle the word tokens. I’m now very happy to say that in MySQL 5.7.6 we’ve made use of the new pluggable full-text parser support in order to provide you with an n-gram parser that can be used with CJK!

[Read more]
Advanced JSON for MySQL

What is JSON

JSON is an text based, human readable format for transmitting data between systems, for serializing objects and for storing document store data for documents that have different attributes/schema for each document. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL

It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with. Up until recently, JSON is just a TEXT document. I said up until recently, so what has changed? The biggest thing is that there are new JSON UDF by Sveta Smirnova, which are part of the MySQL 5.7 Labs releases. Currently the JSON UDF are up to version 0.0.4. While these new UDF are a welcome edition to the MySQL database, they don’t solve the really tough …

[Read more]
Advanced JSON for MySQL: indexing and aggregation for highly complex JSON documents

What is JSON
JSON is an text based, human readable format for transmitting data between systems, for serializing objects and for storing document store data for documents that have different attributes/schema for each document. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL
It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with. Up until recently, JSON is just a TEXT document. I said up until recently, so what has changed? The biggest thing is that there are new JSON UDF by Sveta Smirnova, which are part of the MySQL 5.7 Labs releases. Currently the JSON UDF are up to version 0.0.4. While these new UDF are a welcome edition to the MySQL database, they don't solve the really tough …

[Read more]
Today is the day in which MyISAM is no longer needed

Of course, this is just a catchy title. As far as I know not all system tables can be converted to InnoDB yet (e.g. grant tables), which makes the header technically false. MyISAM is a very simple engine, and that has some inherent advantages (no transactional overhead, easier to “edit” manually, usually less space footprint on disk), but also some very ugly disadvantages: not crash safe, no foreign keys, only full-table locks, consistency problems, bugs in for large tables,… The 5.7.5 “Milestone 15” release, presented today at the Oracle Open World has an impressive list of changes, which I will need some time to digest, like an in-development ( …

[Read more]
Fresh dogfood: Migrating to InnoDB fulltext search on bugs.mysql.com

Even frequent visitors to bugs.mysql.com can sometimes miss the little note in the bottom right corner of each page:

Page generated in 0.017 sec. using MySQL 5.6.11-enterprise-commercial-advanced-log

That text changed this past weekend, going from MySQL Enterprise 5.6.10 to 5.6.11.  But more importantly, the collection of MyISAM tables which support the bugs system were also converted to InnoDB.  There’s a little story to tell here about eating this particular helping of dogfood which also amplifies changelog comments, so here it is:

We like to keep bugs.mysql.com on a current release of MySQL, and we started looking to upgrade from 5.5.27 shortly after GA.  In doing so, …

[Read more]
OpenSQLCamp Lightning Talk Videos

OpenSQLCamp was a huge success! Not many folks have blogged about what they learned there….if you missed it, all is not lost. We did take videos of most of the sessions (we only had 3 video cameras, and 4 rooms, and 2 sessions were not recorded).

All the videos have been processed, and I am working on uploading them to YouTube and filling in details for the video descriptions. Not all the videos are up right now….right now all the lightning talks are up.

[Read more]
Comparison Between Solr And Sphinx Search Servers (Solr Vs Sphinx – Fight!)

In the past few weeks I've been implementing advanced search at Plaxo, working quite closely with Solr enterprise search server. Today, I saw this relatively detailed comparison between Solr and its main competitor Sphinx (full credit goes to StackOverflow user mausch who had been using Solr for the past 2 years). For those still confused, Solr and Sphinx are similar to MySQL FULLTEXT search, or for those even more confused, think Google (yeah, this is a bit of a stretch, I know).

Similarities

  • Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
  • Both have a long list of high-traffic sites …
[Read more]
Showing entries 1 to 10 of 14
4 Older Entries »